Friday, September 22, 2006

activity log, to do list

Now my test pages have a how-to and a much more intuitive interface: to figure out what's going on there is no need to trace HTTP headers, just watch the image, which is intended to change only when the cache is not used. A reason to believe was also included for my anti-MSIE friends.

Before moving on to [script generated] dynamics the following tests will be done:

  • Expires with no validator
  • Last-Modified vs ETag

Monday, September 18, 2006

META cache

This should be a post about using the html tag META with an equivalent to Expires to define an expiration time. But that makes no sense in most situations! Usually we want to say "check this every day" or "every 5 minutes" and that implies an expiration time relative to last access, something we can't do with META Expires — only absolute expiration times are possible. So it's time to move on.

HTTP 1.1 defines the header Cache-Control to extend the basic mechanism "cache validator / expiration time". There is an alternative to Expires, the max-age directive. The interesting point about max-age is that it sets the expiration time relative to last access.
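
To make the difference concrete, here is a minimal Python sketch [not what any browser actually runs]. The function names and the two visit times are made up for illustration, and it assumes max-age is counted from the moment the copy was fetched.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def expiry_from_expires(expires_value):
    """Absolute expiration: the same instant for every visitor."""
    return parsedate_to_datetime(expires_value)


def expiry_from_max_age(fetched_at, max_age):
    """Relative expiration: max_age seconds after this copy was fetched."""
    return fetched_at + timedelta(seconds=max_age)


# Two hypothetical visits, one day apart.
first = datetime(2006, 9, 18, 9, 0, tzinfo=timezone.utc)
second = first + timedelta(days=1)

# One fixed deadline: useful for the first visit, already past for the second.
absolute = expiry_from_expires("Mon, 18 Sep 2006 09:05:00 GMT")
print(absolute > first, absolute > second)

# max-age=300 gives every fetched copy its own five minutes.
print(expiry_from_max_age(first, 300))
print(expiry_from_max_age(second, 300))
```

With Expires both visits share one deadline; with max-age each copy gets its own.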

None of the browsers tested treats meta Cache-Control as a true equivalent of HTTP Cache-Control, although most come close. With max-age set to a positive value:

  • Explorer ignores meta Cache-Control, or at least max-age in it.
  • Mozillas check with the server after the max-age period, but also make one check every two requests while the cached copy is fresh (they should use the cache).
  • Opera almost did it: the page is served from cache for the max-age period, then checked, and then another period starts… if the page was modified; for unmodified items Opera checks on every new request until refresh.

Was expecting something better… more tests around META cache:

  • Cache-Control with max-age=0: ignored by Explorer and respected by everyone else (no way to test IE7 release candidate: XP++ only).
  • Expires with future date: working with Opera and Explorer (using cache before expiration time and checking every request after time). Mozillas "check even requests" before timeout and check every request after.
  • Expires with past date: works for all, Explorer since 01 Jan 1980!
  • Pragma with no-cache: works for all.

Humm, seems time to move back… to Expires with past date, or Pragma with no-cache. Those two are used to keep the page fresh, requesting a check for each revisit. It is good to be checked, but there is a drawback: latency [and some bandwidth consumption, a 304 can use hundreds of bytes].

But not even the old Pragma can be trusted. Proxies do not read HTML meta tags; if a proxy is in the way, chances are that it keeps the page and returns a hit the next time the page is requested:

GET /cache/static3/pragma.html HTTP/1.1

HTTP/1.1 200 OK
Last-Modified: Sun, 17 Sep 2006 01:45:28 GMT
ETag: "16217-6a7-697fb47b"
X-Cache: MISS from localhost

GET /cache/static3/pragma.html HTTP/1.1
If-Modified-Since: Sun, 17 Sep 2006 01:45:28 GMT
If-None-Match: "16217-6a7-697fb47b"

HTTP/1.1 304 Not Modified
Last-Modified: Sun, 17 Sep 2006 01:45:28 GMT
ETag: "16217-6a7-697fb47b"
X-Cache: HIT from localhost

This is not a big deal for sparsely visited pages: in the proxy cache they are soon overwritten by pages with more hits. For people who have some server space provided by their ISP, and can't run scripts or set expiration with HTTP, "meta Pragma no-cache" is still the way to keep pages fresh. Other items like images can't be expired this way, so here is the tip: never change other items without changing their name. Alter the links in the HTML pages; on the next visit the new links will be detected and the changed items will be requested.
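
One way to automate the renaming tip is to put a short hash of the file content in the name, so the name changes exactly when the content does. A minimal Python sketch; versioned_name and the sample bytes are hypothetical, and SHA-1 is just an arbitrary choice of hash:

```python
import hashlib


def versioned_name(name, content):
    """Return name with a short content hash inserted before the extension."""
    digest = hashlib.sha1(content).hexdigest()[:8]
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension: just append the hash
        return f"{name}.{digest}"
    return f"{stem}.{digest}.{ext}"


old = versioned_name("antbird.jpg", b"first version of the image")
new = versioned_name("antbird.jpg", b"edited version of the image")

# Rewrite the link in the page so the next visit requests the new name.
page = f'<img src="{old}">'
page = page.replace(old, new)
print(page)
```

The page itself still has to be revalidated [that is what the Pragma is for], but everything it links to can then be cached for as long as you like.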

Thursday, September 14, 2006

setting Expires with web server

Expiration date can be set in web server configuration for static resources [dynamic resources can have Expires set by scripts].

Apache Expires can be set in the configuration file [httpd.conf] or with .htaccess files. In any case expires_module must be active. While .htaccess files are more flexible, allowing changes without a server restart [and are sometimes the only choice available], there is a performance impact: each time a file is requested Apache must process the .htaccess file in its directory as well as every one that exists in the parent directories up to the server root.

The following example ensures that all "*-stable.pdf" files are cacheable for one year from the time they are requested. To use inside httpd.conf <Directory> or <Location> directives, or in .htaccess files:

<IfModule mod_expires.c>
  ExpiresActive On
  <FilesMatch "-stable\.pdf$">
    ExpiresDefault "access plus 1 year"
  </FilesMatch>
</IfModule>
[actually, <FilesMatch> cannot be used inside <Location> :/]
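
For reference, this is roughly what "access plus 1 year" works out to. As far as I know mod_expires sends both an Expires header and a Cache-Control max-age; the Python sketch below only imitates that arithmetic. expiry_headers and the 365-day year are my assumptions, not Apache code:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

ONE_YEAR = 365 * 24 * 60 * 60  # seconds; assumes a 365-day year


def expiry_headers(access_time, lifetime=ONE_YEAR):
    """Build expiration headers relative to the time of the request."""
    expires = access_time + timedelta(seconds=lifetime)
    return {
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": f"max-age={lifetime}",
    }


# Hypothetical request time.
requested = datetime(2006, 9, 14, 12, 0, 0, tzinfo=timezone.utc)
print(expiry_headers(requested))
```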

IIS Expires can be set with IIS Manager [control panel / administrative tools]. Setting Expires for a server or directory applies it to all files underneath:

  • Select server, directory or file properties with menus.
  • In properties dialog select the HTTP headers tab.
  • Select a relative [to access date] or absolute expiration time.

Sunday, September 10, 2006

Old style cache control

What if you want your page to be checked once per day, whatever the browser settings are? Or every time the page is visited, although this is a bad idea for static files? HTTP/1.0 has the Expires header, which allows web servers to specify a date/time after which the entity should be considered out of date and checked by browsers. Using Expires the web designer can control how fresh each entity is, instead of relying on the good will of browsers and proxies, while still getting the advantages of caching.

Now imagine two users sharing their ISP's proxy. As result of one happy coincidence they are going to visit a new page with expiration set by the server; first request is made by Mr.Fox:

GET /cache/static2/testpage.html HTTP/1.1
User-Agent: Mozilla/5.0 Gecko/20060410 Firefox/1.0.8

HTTP/1.1 200 OK
Last-Modified: Fri, 08 Sep 2006 20:09:37 GMT
ETag: "4a1f-4b9-c99ec240"
Expires: Sun, 10 Sep 2006 06:12:57 GMT
X-Cache: MISS from localhost

The proxy has never seen that page, so it will fetch it from the server or another proxy and send it to the browser, reporting (or omitting) a "MISS". While the page is considered fresh Mr.Fox can navigate anywhere and, if he returns to the page, will get it from his own cache. What's more, if Mr.Smith requests the same page in that period of time he will get it from the proxy cache:

GET /cache/static2/testpage.html HTTP/1.0
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0)

HTTP/1.0 200 OK
Last-Modified: Fri, 08 Sep 2006 20:09:37 GMT
ETag: "4a1f-4b9-c99ec240"
Expires: Sun, 10 Sep 2006 06:12:57 GMT
X-Cache: HIT from localhost

Notice that, if the page's period of validity is 60 seconds and Mr.Smith visits it 40 seconds after Mr.Fox, Mr.Smith's browser will see it as valid for 20 seconds. After the page is considered out of date, if someone using the proxy tries to access it, the proxy will check it with the server and the answer will be available to any user:

GET /cache/static2/testpage.html HTTP/1.0
If-Modified-Since: Fri, 08 Sep 2006 20:09:37 GMT
If-None-Match: "4a1f-4b9-c99ec240"
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0)

HTTP/1.0 304 Not Modified
ETag: "4a1f-4b9-c99ec240"
Expires: Sun, 10 Sep 2006 06:20:03 GMT
X-Cache: MISS from localhost
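
The 60 / 40 / 20 seconds arithmetic can be sketched in a few lines of Python. The request times below are hypothetical [the logs do not show them]; only the Expires value comes from the first trace:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def remaining_freshness(expires_value, now):
    """Seconds a copy is still fresh for whoever receives it at time now."""
    remaining = (parsedate_to_datetime(expires_value) - now).total_seconds()
    return max(0.0, remaining)


expires = "Sun, 10 Sep 2006 06:12:57 GMT"

# Assumed: Mr.Fox asked 60 seconds before expiration, Mr.Smith 40 seconds later.
fox = datetime(2006, 9, 10, 6, 11, 57, tzinfo=timezone.utc)
smith = datetime(2006, 9, 10, 6, 12, 37, tzinfo=timezone.utc)
late = datetime(2006, 9, 10, 6, 15, 0, tzinfo=timezone.utc)

print(remaining_freshness(expires, fox))    # 60.0
print(remaining_freshness(expires, smith))  # 20.0
print(remaining_freshness(expires, late))   # 0.0, time to revalidate
```

Because Expires is an absolute date, the proxy does not have to adjust anything: the copy it hands to Mr.Smith carries the same deadline.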

Saturday, September 09, 2006

browser cache strategy

Opera [7.5] is very straightforward: for pages without explicit expiration it checks at regular, configurable periods of time starting on last check.

Explorer [6.0] has 4 options: every visit, never, per start or default. Per start means per instance: open another window and everything will be checked again. Default seems to be per start plus some heuristic checks, for entities without expiration.

Netscape [7.1] also has 4 options: every visit, never, per session or default. I was not able to see per session working, but default checks recent entities much more frequently than older ones; very recently modified entities are always checked for modifications.

My Firefox [0.9] behaves like Netscape default.
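
I can't see inside the browsers, but one heuristic that would produce this "recent entities are checked more often" behaviour is the one suggested in the HTTP/1.1 spec: treat an entity without expiration as fresh for a fraction of the time since it was last modified. A minimal Python sketch, where heuristic_lifetime and the 1/10 fraction are assumptions and not confirmed Netscape or Firefox internals:

```python
from datetime import datetime, timedelta, timezone


def heuristic_lifetime(last_modified, fetched_at, divisor=10):
    """Guess a freshness lifetime as a fraction of the entity's age."""
    age_when_fetched = fetched_at - last_modified
    # A Last-Modified in the future gives no freshness at all.
    return max(timedelta(0), age_when_fetched / divisor)


now = datetime(2006, 9, 9, 12, 0, tzinfo=timezone.utc)

# Modified an hour ago: rechecked after a few minutes.
print(heuristic_lifetime(now - timedelta(hours=1), now))   # 0:06:00
# Modified ten days ago: left alone for a whole day.
print(heuristic_lifetime(now - timedelta(days=10), now))   # 1 day, 0:00:00
```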

Thursday, September 07, 2006

Woow, how time flies!

This blog thing took me a long time with the template. Now the main block and sidebar adjust to window width, with a max-width working in IE thanks to Svend Tofte. I don't like style hacks but this one is very clear and makes IE do something that it should be doing lta.

A nice toggle system for logs was also implemented; it is a little fat but allows multiple toggles in one post. Believe me, I am not anti M$IE, but it is optimized for non-IE browsers: in IE it should show the top of the box after a toggle, and drawing glitches can be observed.

Now for what matters, my testing environment has grown:

  • Apache and IIS/Personal web servers [alternate]
  • IIS uses htdocs folders as virtual folders
  • Both can run PHP
  • Squid web cache proxy
  • Proxomitron to monitor HTTP headers
  • Explorer, Netscape, Firefox and Opera browsers

Browsers get from Proxomitron, which gets from Squid, which gets from Apache. Let's see what happens with two users using the same proxy server:

First, Firefox gets the entities and they are stored in proxy cache. When Netscape asks for the same entities they are delivered from proxy and the server is not contacted. This happens even with refresh, something I was not expecting.

Monday, September 04, 2006

Simple things first

With the default configuration, web servers and web browsers do a good job of keeping "fresh" static documents near you by saving them on the computer's hard disk. A web page that, with all related documents, would take one second to be fully downloaded from the web can be retrieved from disk in some milliseconds, about 100 times faster. This can be implemented through HTTP headers or equivalent HTML meta tags.

Suppose a server with the default configuration holds one web page with some text and an image. The first time you visit the page your browser will ask for the HTML page and will receive it through the HTTP protocol [to view HTTP headers you can use tools like Proxomitron]:

GET /cache/static1/testpage.html HTTP/1.0

HTTP/1.0 200 OK
Last-Modified: Sun, 03 Sep 2006 20:22:00 GMT
ETag: "26832-ae-60b32a00"

In this case Last-Modified is the last modified date as reported by the operating system to the web server and ETag [Entity Tag] is a caching identifier created by the web server. While receiving the entity, browser will parse HTML tags and ask for all related documents:

GET /cache/static1/antbird.jpg HTTP/1.0

HTTP/1.0 200 OK
Last-Modified: Sun, 03 Sep 2006 20:22:00 GMT
ETag: "26830-1981-60b32a00"

Finally, if the browser cache is enabled, all documents are saved in a special folder on the client computer's hard disk. The next time you visit the page it is retrieved almost instantly from cache, without asking the web server. However, after some visits the page begins to be "stale" and, depending on browser configuration, a check with the server will be done:

GET /cache/static1/testpage.html HTTP/1.0
If-Modified-Since: Sun, 03 Sep 2006 20:22:00 GMT
If-None-Match: "26832-ae-60b32a00"

HTTP/1.0 304 Not Modified
ETag: "26832-ae-60b32a00"

The browser requests the test page with If-Modified-Since set to the Last-Modified date received when the page was cached. If the test page was not modified and the server has a matching caching identifier, the browser receives a Not Modified answer with no HTML, saving bandwidth. What about the antbird image? That depends on the browser's cache implementation and web designers must assume nothing.
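
The server side of that exchange can be sketched in Python. This is only my reading of the rule [every validator the browser sends must still match], not Apache's actual code, and status_for is a hypothetical name:

```python
from email.utils import parsedate_to_datetime


def status_for(request_headers, etag, last_modified):
    """Return 304 if every validator sent by the browser still matches, else 200."""
    checks = []

    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        tags = [tag.strip() for tag in if_none_match.split(",")]
        checks.append(etag in tags)

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        checks.append(parsedate_to_datetime(last_modified) <= since)

    # No validators at all means a plain first request: send the full page.
    return 304 if checks and all(checks) else 200


etag = '"26832-ae-60b32a00"'
modified = "Sun, 03 Sep 2006 20:22:00 GMT"
revisit = {"If-Modified-Since": modified, "If-None-Match": etag}

print(status_for({}, etag, modified))       # 200
print(status_for(revisit, etag, modified))  # 304
```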

Sunday, September 03, 2006

Web Cache Logs

Computer-related cache technology is the use of a faster and smaller memory type to accelerate a slower and larger memory type. Web pages and other documents are frequently cached closer to the client through browser, proxy, or server caches. By storing frequently accessed documents closer to the client, bandwidth consumption, server load, and latency can be reduced.

Usually a cache of recently visited web pages is managed by your web browser. Some browsers are configured to use an external proxy web cache, a server that routes all web requests and can cache frequently accessed pages for everyone in an organization. Web servers can cache computing or database intensive dynamic pages.