Thoroughly understand the browser's caching mechanism

The browser caching mechanism is what we usually call the HTTP caching mechanism, and it works on the basis of the cache identifiers carried in HTTP messages. So before analyzing browser caching, here is a brief introduction to HTTP messages, which come in two kinds:

HTTP request (Request) message. Its format is: request line, then HTTP headers (general headers, request headers, entity headers), then the request message body (usually present only for methods such as POST).

HTTP response (Response) message. Its format is: status line, then HTTP headers (general headers, response headers, entity headers), then the response message body.

Note: general headers are header fields supported by both request and response messages, for example Cache-Control, Connection, Date, Pragma, Transfer-Encoding, Upgrade, and Via. Entity headers are the header fields that describe the entity (the message body), for example Allow, Content-Encoding, Content-Language, Content-Length, Content-Location, Content-MD5, Content-Range, Content-Type, Expires, Last-Modified, and extension headers. For ease of understanding, general headers, request/response headers, and entity headers are all simply referred to as HTTP headers here.
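To make the message layout concrete, here is a minimal sketch that splits a raw response message into its status line, headers, and body. The function name parse_http_response and the sample message are made up for illustration, and real HTTP parsing has more edge cases than this handles.

```python
def parse_http_response(raw: str):
    """Split a raw HTTP response into (status line, headers, body)."""
    # A blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Hypothetical sample message, not taken from a real server.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Cache-Control: max-age=600\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html></html>"
)
status_line, headers, body = parse_http_response(raw)
print(status_line)
print(headers["cache-control"])
```

The cache identifiers discussed in the rest of this article all live in the headers dictionary produced here.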

We will not go deeper into these concepts here; this is only a brief introduction, and interested readers can study them further on their own.

Caching Process Analysis

The browser and the server communicate in a request-response pattern: the browser initiates an HTTP request and the server responds to it. When the browser makes its first request and receives the result, it uses the cache identifiers in the HTTP headers of the response message to decide whether to cache the result. If so, it stores both the request result and the cache identifiers in the browser cache.

From this process we can tell:

Each time the browser initiates a request, it first looks up the result of that request, along with its cache identifier, in the browser cache.
Each time the browser gets the result of a returned request, it stores the result and the cache identifier in the browser cache.

These two conclusions are the key to the browser caching mechanism: together they ensure that every request both reads from and writes to the cache. Once we understand the rules by which the browser uses its cache, everything else falls into place, and this article analyzes those rules in detail. To make this easier to follow, we divide the caching process into two parts according to whether an HTTP request has to be sent to the server again: forced caching and negotiation caching.

Force Cache

Forced caching is the process of looking up the request result in the browser cache and deciding, according to the caching rules attached to that result, whether to use it. There are three main cases (leaving the negotiation cache process aside for now):

The cached result and cache identifier do not exist. The forced cache fails, and the request goes directly to the server, exactly as for a first request.

The cached result and cache identifier exist, but the result has expired. The forced cache fails, and the negotiation cache is used (analyzed later).

The cached result and cache identifier exist, and the result has not expired. The forced cache takes effect, and the cached result is returned directly.
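The three cases can be sketched as a single decision function. This is only an illustration under simplifying assumptions: CacheEntry, forced_cache_decision, and the returned strings are hypothetical names, and expiry is modeled as a plain lifetime in seconds rather than the real header rules covered later.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    body: bytes        # the cached request result
    stored_at: float   # time the result was stored, in seconds
    max_age: float     # assumed lifetime of the result, in seconds


def forced_cache_decision(cache: dict, url: str, now: float) -> str:
    entry = cache.get(url)
    if entry is None:
        # Case 1: nothing cached, behave like a first request.
        return "request-server"
    if now - entry.stored_at >= entry.max_age:
        # Case 2: cached but expired, fall through to the negotiation cache.
        return "negotiate"
    # Case 3: cached and still valid, use it without contacting the server.
    return "use-cache"


cache = {"/index.html": CacheEntry(b"<html></html>", stored_at=1000.0, max_age=600.0)}
print(forced_cache_decision(cache, "/index.html", now=1200.0))  # use-cache
print(forced_cache_decision(cache, "/index.html", now=1700.0))  # negotiate
print(forced_cache_decision(cache, "/style.css", now=1200.0))   # request-server
```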

So what are the caching rules for forced caching?

When the browser sends a request, the server returns the caching rules in the HTTP headers of the response message together with the request result. The fields that control forced caching are Expires and Cache-Control, and Cache-Control takes priority over Expires.

Expires

Expires is the HTTP/1.0 field for controlling web page caching. Its value is the expiration time of the cached result returned by the server: when the request is made again, if the client’s time is earlier than the value of Expires, the cached result is used directly.
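As a rough sketch of this comparison, the check below parses an Expires value with Python's standard email.utils module and compares it with the client's clock. The function name is_fresh_by_expires and the sample dates are assumptions for illustration; treating an unparseable value as already expired is a common convention that this sketch follows.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh_by_expires(expires_header: str, client_now: datetime) -> bool:
    """Return True if the client's time is earlier than the Expires time."""
    try:
        expires_at = parsedate_to_datetime(expires_header)
        return client_now < expires_at
    except (TypeError, ValueError):
        # Values that are not valid dates (for example "0") count as expired.
        return False


# Hypothetical header value and client times.
expires = "Wed, 21 Oct 2015 07:28:00 GMT"
before = datetime(2015, 10, 21, 7, 0, 0, tzinfo=timezone.utc)
after = datetime(2015, 10, 21, 8, 0, 0, tzinfo=timezone.utc)
print(is_fresh_by_expires(expires, before))  # True
print(is_fresh_by_expires(expires, after))   # False
```

Note that the result depends entirely on client_now, which is exactly the weakness discussed next.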

Expires is a field in HTTP/1.0, but browsers now use HTTP/1.1 by default, so is page caching still controlled by Expires in HTTP/1.1?

In HTTP/1.1, Expires has been superseded by Cache-Control. The reason is that Expires works by comparing the client’s time with the time returned by the server, so if the two clocks disagree for any reason (for example, a wrong time zone setting, or an inaccurate clock on either side), the forced cache becomes invalid right away, and then its existence is meaningless. So how does Cache-Control handle this?

Cache-Control

In HTTP/1.1, Cache-Control is the most important rule and is mainly used to control web page caching. Its main values are:

public: the content may be cached by both the client and proxy servers
private: the content may be cached only by the client; this is often described as the default behavior, although the specification does not define a default value
no-cache: the client may cache the content, but whether the cache is used must be decided through negotiation cache validation
no-store: the content is not cached at all, so neither forced caching nor negotiation caching is used
max-age=xxx (xxx is a number): the cached content expires after xxx seconds
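A minimal sketch of reading these directives out of a Cache-Control value might look like the following. The function name parse_cache_control is made up for illustration, and splitting on commas is a simplification that would mishandle quoted arguments containing commas.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or True}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directives without "=" (public, no-store, ...) are stored as True.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else True
    return directives


print(parse_cache_control("public, max-age=600"))
print(parse_cache_control("no-store"))
```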

Next, let’s look at an example response that carries both an expires field and a Cache-Control field. From it we can see:

The value of expires in the HTTP response message is a point in time, which is an absolute value.
The value of Cache-Control in the HTTP response message is max-age=600, which is a relative value.

Since Cache-Control takes priority over expires, caching follows the value of Cache-Control directly: if the request is made again within 600 seconds, the cached result is used and the forced cache takes effect.

Note: when it cannot be determined whether the client’s clock is synchronized with the server’s, Cache-Control is a better choice than expires, so when both are present only Cache-Control takes effect.
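The priority rule can be sketched as a function that computes how many seconds a response stays valid. The name freshness_lifetime, the lower-cased header dictionary, and the sample values are assumptions for illustration; real browsers apply further rules (such as heuristics when neither field is present) that are left out here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers: dict, response_time: datetime) -> float:
    """Seconds the response stays valid; max-age wins over Expires."""
    for part in headers.get("cache-control", "").split(","):
        name, sep, arg = part.strip().partition("=")
        if name.lower() == "max-age" and sep and arg.strip().isdigit():
            # Relative value: independent of the client's clock setting.
            return float(arg)
    expires = headers.get("expires")
    if expires:
        try:
            # Absolute value: lifetime is the distance to the Expires time.
            delta = parsedate_to_datetime(expires) - response_time
            return max(0.0, delta.total_seconds())
        except (TypeError, ValueError):
            return 0.0
    return 0.0


# Hypothetical response received at 12:00 UTC.
received = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
both = {"cache-control": "max-age=600", "expires": "Mon, 01 Jan 2024 13:00:00 GMT"}
only_expires = {"expires": "Mon, 01 Jan 2024 13:00:00 GMT"}
print(freshness_lifetime(both, received))          # 600.0
print(freshness_lifetime(only_expires, received))  # 3600.0
```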

After understanding the process of forced caching, let’s take the question a step further:

Where is the browser’s cache stored, and how can I tell if forced caching is in effect in my browser?

Taking a request to this blog as an example: in the Network panel of the browser’s developer tools, a request whose status code is shown in gray is being served from the forced cache, and the Size column of that request shows where the cache is stored, either from memory cache or from disk cache.

What do from memory cache and from disk cache represent? When would you use from disk cache and when would you use from memory cache?

from memory cache means the cache in memory is used, and from disk cache means the cache on the hard disk is used. The browser reads the caches in the order memory -> disk.

I have stated the conclusion directly, but many readers may not find it obvious, so let’s analyze how the cache is read in more detail, again using my blog as the example.

To answer the question, we need to understand the memory cache (from memory cache) and the hard disk cache (from disk cache):

Memory cache (from memory cache): the memory cache has two characteristics, fast reads and a limited lifetime:
Fast reads: the memory cache stores the compiled and parsed files directly in the memory of the process, taking up some of that process’s memory, so that they can be read quickly the next time they are needed.
Limited lifetime: once the process is closed, the memory of the process is cleared.
Hard disk cache (from disk cache): the hard disk cache writes the cached content directly to files on the hard disk. Reading it requires I/O on those files and then re-parsing the content, so reads are more involved and slower than from the memory cache.

In the browser, files such as js and images are typically placed directly in the memory cache once they have been parsed and executed, so when the page is refreshed they only need to be read from memory (from memory cache). css files, on the other hand, are stored in files on the hard disk, so each time the page is rendered they have to be read from the hard disk cache (from disk cache).
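The memory -> disk read order and the limited lifetime of the memory cache can be illustrated with a toy model. Everything here is hypothetical: TwoTierCache is not how any browser is implemented, and a plain dictionary stands in for the files on disk.

```python
class TwoTierCache:
    """Toy model of a cache that is read in the order memory -> disk."""

    def __init__(self):
        self.memory = {}  # lost when the process is closed
        self.disk = {}    # stand-in for files on the hard disk

    def get(self, url: str):
        if url in self.memory:
            return self.memory[url], "from memory cache"
        if url in self.disk:
            return self.disk[url], "from disk cache"
        return None, "miss"

    def close_process(self):
        # Closing the process clears its memory; the disk entries survive.
        self.memory.clear()


cache = TwoTierCache()
cache.memory["/app.js"] = b"console.log(1)"
cache.disk["/app.js"] = b"console.log(1)"
cache.disk["/style.css"] = b"body{}"

print(cache.get("/app.js")[1])     # from memory cache
print(cache.get("/style.css")[1])  # from disk cache
cache.close_process()
print(cache.get("/app.js")[1])     # from disk cache
```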

Negotiation Cache

Negotiation caching is the process that follows once the forced cache has expired: the browser sends a request to the server carrying the cache identifier, and the server decides based on that identifier whether the cache may be used. There are two main cases:

The negotiation cache takes effect: the server returns 304.

The negotiation cache fails: the server returns 200 together with the request result.

Similarly, the identifiers for the negotiation cache are returned to the browser in the HTTP headers of the response message together with the request result. The fields that control the negotiation cache are Last-Modified / If-Modified-Since and Etag / If-None-Match, where Etag / If-None-Match has a higher priority than Last-Modified / If-Modified-Since.

Last-Modified / If-Modified-Since

Last-Modified is returned by the server in its response and gives the time at which the resource file was last modified on the server.

If-Modified-Since is sent by the client when it requests the resource again. It carries the Last-Modified value returned by the previous response, telling the server when the resource was last modified as of that response. When the server receives a request whose headers contain If-Modified-Since, it compares that value with the last modified time of the resource on the server. If the resource’s last modified time on the server is later than the value of If-Modified-Since, the server returns the resource again with status code 200; otherwise it returns 304, meaning the resource has not been updated and the cached file can continue to be used.
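The server-side comparison can be sketched as follows. The function name respond_if_modified_since, the lower-cased request header dictionary, and the sample dates are assumptions for illustration, not the behavior of any particular server.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond_if_modified_since(request_headers: dict, resource_mtime: datetime):
    """Return (status code, response headers) for a conditional request."""
    # HTTP dates have one-second resolution, so drop the microseconds.
    mtime = resource_mtime.replace(microsecond=0)
    response_headers = {"Last-Modified": format_datetime(mtime, usegmt=True)}
    since = request_headers.get("if-modified-since")
    if since:
        try:
            if mtime <= parsedate_to_datetime(since):
                # Not modified after the client's copy: keep using the cache.
                return 304, response_headers
        except (TypeError, ValueError):
            pass  # an unparseable date is ignored and the resource is resent
    return 200, response_headers


# Hypothetical resource last modified at 12:00 UTC on 1 January 2024.
mtime = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
same = {"if-modified-since": "Mon, 01 Jan 2024 12:00:00 GMT"}
older = {"if-modified-since": "Sun, 31 Dec 2023 12:00:00 GMT"}
print(respond_if_modified_since(same, mtime)[0])   # 304
print(respond_if_modified_since(older, mtime)[0])  # 200
```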

Etag / If-None-Match

Etag is a unique identifier of the current resource file, generated by the server and returned in its response.

If-None-Match is sent by the client when it requests the resource again. It carries the Etag value returned by the previous response, telling the server which unique identifier the resource had as of that response. When the server receives a request whose headers contain If-None-Match, it compares that value with the Etag of the resource on the server. If they match, it returns 304, meaning the resource has not been updated and the cached file can continue to be used; if they do not match, it returns the resource file again with status code 200.
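Here is a minimal sketch of that comparison. Generating the Etag from a hash of the content is only one possible choice and is an assumption of this example, since servers are free to generate identifiers in other ways; make_etag and respond_if_none_match are hypothetical names.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an identifier from the content (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond_if_none_match(request_headers: dict, body: bytes):
    """Return (status code, response headers, response body)."""
    etag = make_etag(body)
    raw = request_headers.get("if-none-match", "")
    # The header may list several identifiers separated by commas.
    candidates = [tag.strip() for tag in raw.split(",") if tag.strip()]
    if etag in candidates:
        # Identifiers match: no update, so no body needs to be sent.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


body = b"hello world"
current = make_etag(body)
print(respond_if_none_match({"if-none-match": current}, body)[0])     # 304
print(respond_if_none_match({"if-none-match": '"stale"'}, body)[0])  # 200
```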

Note: Etag / If-None-Match has a higher priority than Last-Modified / If-Modified-Since; if both are present, only Etag / If-None-Match takes effect.
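This priority can be sketched as a single function in which the date comparison is consulted only when no If-None-Match is present. The name negotiate and the sample values are hypothetical, and the sketch returns only the status code.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def negotiate(request_headers: dict, current_etag: str, resource_mtime: datetime) -> int:
    """Return 304 or 200; If-None-Match takes priority over If-Modified-Since."""
    if_none_match = request_headers.get("if-none-match")
    if if_none_match is not None:
        tags = [tag.strip() for tag in if_none_match.split(",")]
        # When this field is present, the date field is not consulted at all.
        return 304 if current_etag in tags else 200
    since = request_headers.get("if-modified-since")
    if since is not None:
        try:
            if resource_mtime.replace(microsecond=0) <= parsedate_to_datetime(since):
                return 304
        except (TypeError, ValueError):
            pass
    return 200


mtime = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
request = {
    "if-none-match": '"old-version"',
    "if-modified-since": "Mon, 01 Jan 2024 12:00:00 GMT",
}
# The date alone would give 304, but the Etag differs, so the result is 200.
print(negotiate(request, '"new-version"', mtime))
```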


By lzz
