Chapter 7 - Other Request Phases
One of the web's virtues is its Uniform Resource Identifier (URI) and Uniform Resource Locator (URL) standards. End users never know for sure what is sitting behind a URI. It could be a static file, a dynamic script, a proxied request, or something even more esoteric. The file or program behind a URI may change over time, but this too is transparent to the end user.
Much of Apache's power and flexibility comes from its highly configurable URI translation phase, which comes relatively early in the request cycle, after the post_read_request and before the header_parser phases. During this phase, the URI requested by the remote browser is translated into a physical filename, which may in turn be returned directly to the browser as a static document or passed on to a CGI script or Apache API module for processing. During URI translation, each module that has declared its interest in handling this phase is given a chance to modify the URI. The first module to handle the phase (i.e., return something other than a status of DECLINED) terminates the phase, so modules that follow it get no chance to change the URI.

By default, two URI translation handlers are installed in stock Apache distributions. The mod_alias module looks for the existence of several directives that may apply to the current URI. These include Alias, ScriptAlias, Redirect, AliasMatch, and other directives. If it finds one, it uses the directive's value to map the URI to a file or directory somewhere on the server's physical filesystem. Otherwise, the request falls through to the default URI translation handler, which simply appends the URI to the value of the DocumentRoot configuration directive, forming a file path relative to the document root.

The optional mod_rewrite module implements a much more comprehensive URI translator that allows you to slice and dice URIs in various interesting ways. It is extremely powerful but uses a series of pattern matching conditions and substitution rules that can be difficult to get right.

Once a translation handler has done its work, Apache walks along the returned filename path in the manner described in Chapter 4, Content Handlers, finding where the path part of the URI ends and the additional path information begins. This phase of processing is performed internally and cannot be modified by the module API.
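The rule that the first handler to return anything other than DECLINED ends the phase can be pictured as a simple loop. What follows is a minimal sketch in plain Perl, not Apache's actual implementation: the status constants are stand-ins, and the alias table, the document root, and the subroutines alias_trans, default_trans, and translate are hypothetical names invented for this illustration.

```perl
use strict;
use warnings;

# Stand-ins for Apache's status codes; nothing here talks to a real server.
use constant OK       => 0;
use constant DECLINED => -1;

# Hypothetical configuration, chosen only for the illustration.
my %ALIASES       = ('/icons/' => '/usr/local/apache/icons/');
my $DOCUMENT_ROOT = '/home/httpd/htdocs';

# An Alias-style handler: it claims the request only if a prefix matches.
sub alias_trans {
    my $req = shift;
    for my $prefix (sort keys %ALIASES) {
        if (index($req->{uri}, $prefix) == 0) {
            $req->{filename} = $ALIASES{$prefix}
                             . substr($req->{uri}, length $prefix);
            return OK;
        }
    }
    return DECLINED;
}

# The fallback handler: append the URI to the document root.
sub default_trans {
    my $req = shift;
    $req->{filename} = $DOCUMENT_ROOT . $req->{uri};
    return OK;
}

# Run the handlers in order; the first one that does not decline
# ends the phase, and later handlers are never called.
sub translate {
    my ($uri, @handlers) = @_;
    my $req = { uri => $uri };
    for my $handler (@handlers) {
        my $status = $handler->($req);
        return $req->{filename} unless $status == DECLINED;
    }
    return;    # every handler declined
}

print translate('/icons/folder.gif', \&alias_trans, \&default_trans), "\n";
print translate('/index.html',       \&alias_trans, \&default_trans), "\n";
```

The first request is claimed by the alias handler and the second falls through to the default handler, which is the same division of labor the text describes for mod_alias and the core translation handler.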
In addition to their intended role in transforming URIs, translation handlers are sometimes used to associate certain types of URIs with specific upstream handlers. We'll see examples of this later in the chapter when we discuss creating custom proxy services in the section "Handling Proxy Requests."

A Very Simple Translation Handler

Let's look at an example. Many of the documents browsed on a web site are files that are located under the configured DocumentRoot. That is, the requested URI is a filename relative to a directory on the hard disk. Just so you can see how simple a translation handler's job can be, we present a Perl version of Apache's default translation handler found in the http_core module.

    package Apache::DefaultTrans;

    use Apache::Constants qw(:common BAD_REQUEST);
    use Apache::Log ();

    sub handler {
        my $r = shift;
        my $uri = $r->uri;

        # Reject URIs that lack a leading slash or contain a "*".
        # index() returns -1 when the character is absent, so test for >= 0.
        if ($uri !~ m:^/: or index($uri, '*') >= 0) {
            $r->log->error("Invalid URI in request ", $r->the_request);
            return BAD_REQUEST;
        }

        # Map the URI onto a path under the document root.
        $r->filename($r->document_root . $r->uri);

        return OK;
    }

    1;
    __END__
The handler begins by subjecting the requested URI to a few sanity checks, making sure that it begins with a slash and doesn't contain any * characters. If the URI fails either test, we log an error and return BAD_REQUEST. Otherwise, we join the document root and the URI into a complete file path, store it in the request object with the filename() method, and return OK.

We don't check at this point whether the file exists or can be opened. This is the job of handlers further down the request chain.

To install this handler, just add the following directive to the main part of your perl.conf configuration file (or any other Apache configuration file, if you prefer):

    PerlTransHandler Apache::DefaultTrans

Beware. You probably won't want to keep this handler installed for long. Because it overrides other translation handlers, you'll lose the use of Alias, ScriptAlias, and other standard directives.

A Practical Translation Handler

Here's a slightly more complex example. Consider a web-based system for archiving software binaries and source code. On a nightly basis an automated system will copy changed and new files from a master repository to multiple mirror sites. Because of the vagaries of the Internet, it's important to confirm that the entire file, and not just a fragment of it, is copied from one mirror site to the other.

One technique for solving this problem would be to create an MD5 checksum for each file and store the information on the repository. After the mirror site copies the file, it checksums the file and compares it against the master checksum retrieved from the repository. If the two values match, then the integrity of the copied file is confirmed.

In this section, we'll begin a simple system to retrieve precomputed MD5 checksums from an archive of files. To retrieve the checksum for a file, you simply append the extension .cksm to the end of its URI.
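The mirror-side comparison described above might look roughly like the following. This is only a sketch under stated assumptions: it uses the Digest::MD5 module, and the subroutine names file_md5 and copy_is_intact are hypothetical. The demonstration at the end checks a throwaway temporary file rather than a real mirrored archive.

```perl
use strict;
use warnings;
use Digest::MD5 ();
use File::Temp qw(tempfile);

# Return the hex MD5 digest of a file's contents.
sub file_md5 {
    my $path = shift;
    open my $fh, '<', $path or die "Couldn't open $path: $!\n";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# True if the local copy matches the checksum published by the master.
sub copy_is_intact {
    my ($path, $master_checksum) = @_;
    return lc($master_checksum) eq file_md5($path);
}

# Demonstration: a temporary file holding the three bytes "abc",
# whose MD5 digest is a standard published test vector.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
binmode $tmp;
print $tmp "abc";
close $tmp;

print copy_is_intact($tmpname, '900150983cd24fb0d6963f7d28e17f72')
    ? "intact\n"
    : "corrupt\n";
```

In a real mirror, the second argument would be the checksum fetched from the master site, which is exactly what the .cksm URIs are meant to supply.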
For example, if the archived file you wish to retrieve is:

    /archive/software/cookie_cutter.tar.gz

then you can retrieve a text file containing its MD5 checksum by fetching this URI:

    /archive/software/cookie_cutter.tar.gz.cksm

The checksum files will be precomputed and stored in a physical directory tree that parallels the document hierarchy. For example, if the document itself is physically stored in:

    /home/httpd/htdocs/archive/software/cookie_cutter.tar.gz

then its checksum will be stored in a parallel tree in this file:

    /home/httpd/checksums/archive/software/cookie_cutter.tar.gz

The job of the URI translation handler is to map requests for /file/path/filename.cksm files into the physical file /home/httpd/checksums/file/path/filename. When called from a browser, the results look something like the screenshot in Figure 7-1.

Figure 7-1. A checksum file retrieved by Apache::Checksum1

As often happens with Perl programs, the problem takes longer to state than
to solve. Example 7-1 shows a translation handler,
Apache::Checksum1, that accomplishes this task. The structure is similar
to other Apache Perl modules. After the usual preamble, the handler()
subroutine shifts the Apache request object off the call stack and uses it to
recover the URI of the current request, which is stashed in the local variable
$uri. It also reads the ChecksumDir configuration variable, falling back to a built-in default, and resolves the result relative to the server root.

Now the subroutine checks whether this URI needs special handling. It does this by attempting a string substitution which will replace the .cksm URI with a physical path to the corresponding file in the checksums directory tree. If the substitution returns a false value, then the requested URI does not end with the .cksm extension and we return DECLINED, leaving the URI for Apache's other translation handlers to work on. Otherwise, $uri now holds the physical path to the checksum file, so we pass it to the request object's filename() method and return OK.

Example 7-1. A URI Translator for Checksum Files

    package Apache::Checksum1;
    # file: Apache/Checksum1.pm

    use strict;
    use Apache::Constants qw(:common);
    use constant DEFAULT_CHECKSUM_DIR => '/usr/tmp/checksums';

    sub handler {
        my $r = shift;
        my $uri = $r->uri;

        # Top of the checksum tree, resolved relative to the server root.
        my $cksumdir = $r->dir_config('ChecksumDir') || DEFAULT_CHECKSUM_DIR;
        $cksumdir = $r->server_root_relative($cksumdir);

        # Only URIs ending in .cksm are ours; leave the rest to other handlers.
        return DECLINED unless $uri =~ s!^(.+)\.cksm$!$cksumdir$1!;

        $r->filename($uri);
        return OK;
    }

    1;
    __END__

The configuration for this translation handler should look something like this:

    # checksum translation handler directives
    PerlTransHandler Apache::Checksum1
    PerlSetVar       ChecksumDir /home/httpd/checksums
    <Directory /home/httpd/checksums>
      ForceType text/plain
    </Directory>

This configuration declares a URI translation handler with the PerlTransHandler directive and sets the Perl configuration variable ChecksumDir to /home/httpd/checksums, the top of the checksum tree. We also need a <Directory> section to force all files in the checksums directory to be of type text/plain. Otherwise, the default MIME type checker will try to use each checksum file's extension to determine its MIME type.

There are a couple of important points about this configuration section. First, the PerlTransHandler and PerlSetVar directives are located in the main section of the configuration file, not in a <Directory>, <Location>, or <Files> section.
This is because the URI translation phase runs very early in the request processing cycle, before Apache has a definite URI or file path to use in selecting an appropriate <Directory>, <Location>, or <Files> section to take its configuration from. For the same reason, PerlTransHandler is not allowed in .htaccess files, although you can use it in virtual host sections.

The second point is that the ForceType directive is located in a <Directory> section rather than in a <Location> block. The reason for this is that the <Location> section refers to the requested URI, which is not changed by this particular translation handler. To apply access control rules and other options to the physical file path returned by the translation handler, you must use <Directory> or <Files>.

To set up the checksum tree, you'll have to write a script that will recurse through the web document hierarchy (or a portion of it) and create a mirror directory of checksum files. In case you're interested in implementing a system like this one, Example 7-2 gives a short script named checksum.pl that does this. It uses the File::Find module to walk the tree of source files, the MD5 module to generate MD5 checksums, and File::Path and File::Basename for filename manipulations. New checksum files are only created if the checksum file doesn't exist or the modification time of the source file is more recent than that of an existing checksum file.

You call the script like this:

    % checksum.pl -source ~www/htdocs -dest ~www/checksums

Replace ~www/htdocs and ~www/checksums with the paths to the web document tree and the checksums directory on your system.

Example 7-2. checksum.pl Creates a Parallel Tree of Checksum Files

    #!/usr/local/bin/perl

    use File::Find;
    use File::Path;
    use File::Basename;
    use IO::File;
    use MD5;
    use Getopt::Long;
    use strict;
    use vars qw($SOURCE $DESTINATION $MD5);

    GetOptions('source=s'      => \$SOURCE,
               'destination=s' => \$DESTINATION) || die <<USAGE;
    Usage: $0
    Create a checksum tree.
     Options:
         -source      <path>  File tree to traverse [.]
         -destination <path>  Destination for checksum tree [TMPDIR]
    Option names may be abbreviated.
    USAGE

    $SOURCE      ||= '.';
    $DESTINATION ||= $ENV{TMPDIR} || '/tmp';
    die "Must specify absolute destination directory" unless $DESTINATION =~ m!^/!;
    $MD5 = MD5->new;

    find(\&wanted, $SOURCE);

    # This routine is called for each node (directory or file) in the
    # source tree. On entry, $_ contains the filename,
    # and $File::Find::name contains its full path.
    sub wanted {
        return unless -f $_ && -r _;
        my $modtime = (stat _)[9];
        my ($source, $dest, $url);
        $source = $File::Find::name;
        # \Q...\E keeps regex metacharacters in the source path literal.
        ($dest = $source) =~ s/^\Q$SOURCE\E/$DESTINATION/o;
        # Skip files whose checksum is already up to date.
        return if -e $dest && $modtime <= (stat $dest)[9];
        ($url = $source) =~ s/^\Q$SOURCE\E//o;
        make_checksum($_, $dest, $url);
    }

    # This routine is called with the source file, the destination in which
    # to write the checksum, and a URL to attach as a comment to the checksum.
    sub make_checksum {
        my ($source, $dest, $url) = @_;
        my $sfile = IO::File->new($source) || die "Couldn't open $source: $!\n";
        mkpath dirname($dest);   # create the intermediate directories
        my $dfile = IO::File->new(">$dest") || die "Couldn't open $dest: $!\n";
        $MD5->reset;
        $MD5->addfile($sfile);
        print $dfile $MD5->hexdigest(), "\t$url\n";   # write the checksum
    }

    __END__

Copyright © 1999 by O'Reilly & Associates, Inc.