Monday, September 04, 2006

Simple things first

With default configuration, web servers and web browsers do a good job of keeping "fresh" static documents near you by saving them on the computer's hard disk. A web page that, with all its related documents, would take one second to download fully from the web can be retrieved from disk in a matter of milliseconds, about 100 times faster. This caching is controlled through HTTP headers or equivalent HTML meta tags.

Suppose a server with default configuration hosts one web page containing some text and an image. The first time you visit the page, your browser asks for the HTML page and receives it through the HTTP protocol [to view HTTP headers you can use tools like Proxomitron]:

GET /cache/static1/testpage.html HTTP/1.0

HTTP/1.0 200 OK
Last-Modified: Sun, 03 Sep 2006 20:22:00 GMT
ETag: "26832-ae-60b32a00"

In this case Last-Modified is the last modification date as reported by the operating system to the web server, and ETag [Entity Tag] is a caching identifier created by the web server. While receiving the entity, the browser parses the HTML tags and asks for all related documents:

GET /cache/static1/antbird.jpg HTTP/1.0

HTTP/1.0 200 OK
Last-Modified: Sun, 03 Sep 2006 20:22:00 GMT
ETag: "26830-1981-60b32a00"
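
As a rough illustration of where these two headers could come from, here is a minimal Python sketch. The function name `validators_for` and the inode-size-mtime layout of the tag are assumptions made for this example; the server in the transcripts above may well build its ETag differently.

```python
import os
from email.utils import formatdate


def validators_for(path):
    """Return (last_modified, etag) header values for a file on disk."""
    st = os.stat(path)
    mtime = int(st.st_mtime)
    # HTTP dates are always expressed in GMT.
    last_modified = formatdate(mtime, usegmt=True)
    # Assumed layout: inode, size and modification time in hex, quoted.
    etag = '"%x-%x-%x"' % (st.st_ino, st.st_size, mtime)
    return last_modified, etag
```

Any change to the file's size or modification time produces a different tag, which is all a cache needs in order to notice that its copy is out of date.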

Finally, if the browser cache is enabled, all documents are saved in a special folder on the client computer's hard disk. The next time you visit the page, it is retrieved almost instantly from the cache, without asking the web server. However, after some visits the page becomes "stale" and, depending on browser configuration, a check with the server is done:

GET /cache/static1/testpage.html HTTP/1.0
If-Modified-Since: Sun, 03 Sep 2006 20:22:00 GMT
If-None-Match: "26832-ae-60b32a00"

HTTP/1.0 304 Not Modified
ETag: "26832-ae-60b32a00"

The browser asks for the test page only if it was modified since the Last-Modified date received when the page was cached. If the test page was not modified and the caching identifier still matches the one on the server, the browser receives a Not Modified answer with no HTML, saving bandwidth. What about the antbird image? That depends on the browser's cache implementation, and web designers must assume nothing.
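
The server side of this check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `conditional_status` and the plain dictionary of request headers are hypothetical, and a real web server handles more cases (weak tags, `*`, other conditional headers).

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, last_modified, etag):
    """Return 304 if every validator the client sent still matches, else 200."""
    checks = []

    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The client may send several tags separated by commas.
        client_tags = [tag.strip() for tag in if_none_match.split(",")]
        checks.append(etag in client_tags)

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            modified = parsedate_to_datetime(last_modified)
            checks.append(modified <= since)
        except (TypeError, ValueError):
            # Unparseable or incomparable dates: send the full entity.
            checks.append(False)

    # No validators at all means an ordinary, unconditional request.
    return 304 if checks and all(checks) else 200
```

Called with the two request headers from the transcript above and the validators originally sent for the test page, the function returns 304; change either the tag or the modification date on the server side and it returns 200.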

2 Comments:

At 7:45 PM, September 04, 2006, Blogger Rui Baptista said...

The way browsers check freshness depends on configuration. By default, Explorer checks both the page and its content when the page is first visited in the current session, and later may check the page only. Mozilla browsers check only the page.
But all this can change with user configuration.

 
At 8:58 PM, September 04, 2006, Blogger Rui Baptista said...

An easy way to solve the "changed content" problem is to give new names to the content. That way the page itself must be changed [to use the new name] and things should work (needs testing).
In many cases (like CSS) this may not be the best approach.
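
One possible way to generate such new names automatically is to derive them from the file's content, sketched here in Python. The function name `versioned_name` and the choice of an 8-character SHA-256 prefix are assumptions for illustration only, not something tested in this post.

```python
import hashlib
from pathlib import Path


def versioned_name(path):
    """Return the file name with a short content hash before the extension."""
    p = Path(path)
    # The digest changes whenever the bytes of the file change.
    digest = hashlib.sha256(p.read_bytes()).hexdigest()[:8]
    return f"{p.stem}.{digest}{p.suffix}"
```

With this scheme an unchanged image keeps its name and stays cached, while an edited one gets a new name that the page must then reference.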

 
