Chapter 7 - Other Request Phases
The HTTP proxy protocol was originally designed to allow users unfortunate enough to be stuck behind a firewall to access external web sites. Instead of connecting to the remote server directly, an action forbidden by the firewall, users point their browsers at a proxy server located on the firewall machine itself. The proxy goes out and fetches the requested document from the remote site and forwards the retrieved document to the user.

Nowadays most firewall systems have a web proxy built right in so there's no need for dedicated proxying servers. However, proxy servers are still useful for a variety of purposes. For example, a caching proxy (of which Apache is one example) will store frequently requested remote documents in a disk directory and return the cached documents directly to the browser instead of fetching them anew. Anonymizing proxies take the outgoing request and strip out all the headers that can be used to identify the user or his browser. By writing Apache API modules that participate in the proxy process, you can achieve your own special processing of proxy requests.

The proxy request/response protocol is nearly the same as vanilla HTTP. The major difference is that instead of requesting a server-relative URI in the request line, the client asks for a full URL, complete with scheme and host. In addition, a few optional HTTP headers beginning with Proxy- may be added to the request.
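The difference in the request line can be checked mechanically. The following is a minimal standalone sketch, not part of Apache or mod_perl: classify_request_line() is a hypothetical helper that splits a request line and reports whether the target is a full URL (proxy form) or a server-relative URI (ordinary form).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not an Apache or mod_perl API): split an HTTP
# request line and report whether its target is a full URL (proxy form)
# or a server-relative URI (ordinary form).
sub classify_request_line {
    my ($line) = @_;
    my ($method, $target, $protocol) = split ' ', $line;
    return unless defined $protocol;    # malformed request line

    # A proxy request names the scheme and host, e.g. http://host/path
    if ($target =~ m{^([a-z][a-z0-9+.\-]*)://([^/:]+)}i) {
        return { type => 'proxy', scheme => lc $1, host => lc $2 };
    }
    return { type => 'direct' };
}

for my $line ('GET /foo/index.html HTTP/1.0',
              'GET http://www.modperl.com/foo/index.html HTTP/1.0') {
    my $info = classify_request_line($line);
    print "$line => $info->{type}\n";
}
```

Inside a real handler none of this parsing is needed, since the request object's proxyreq() method already reports whether the request arrived in proxy form.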
For example, a normal (nonproxy) HTTP request sent by a browser might look like this:

    GET /foo/index.html HTTP/1.0
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
    Pragma: no-cache
    Connection: Keep-Alive
    User-Agent: Mozilla/2.01 (WinNT; I)
    Host: www.modperl.com:80

In contrast, the corresponding HTTP proxy request will look like this:

    GET http://www.modperl.com/foo/index.html HTTP/1.0
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
    Pragma: no-cache
    User-Agent: Mozilla/2.01 (WinNT; I)
    Host: www.modperl.com:80
    Proxy-Connection: Keep-Alive

Notice that the URL in the request line of an HTTP proxy request includes the scheme and hostname. This information enables the proxy server to initiate a connection to the distant server. To generate this type of request, the user must configure his browser so that HTTP and, optionally, FTP requests are proxied to the server. This usually involves setting values in the browser's preference screens. An Apache server will be able to respond to this type of request if it has been compiled with the mod_proxy module. This module is part of the core Apache distribution but is not compiled in by default.

You can interact with Apache's proxy mechanism at the translation handler phase. There are two types of interventions you can make. You can take an ordinary (nonproxy) request and change it into one so that it will be handled by Apache's standard proxy module, or you can take an incoming proxy request and install your own content handler for it so that you can examine and possibly modify the response from the remote server.

Invoking mod_proxy for Nonproxy Requests

We'll look first at Apache::PassThru, an example of how to turn an ordinary request into a proxy request. Because this technique uses Apache's mod_proxy module, this module will have to be compiled and installed in order for this example to run on your system. The idea behind the example is simple.
Requests for URIs beginning with a certain path will be dynamically transformed into proxy requests. For example, we might transform requests for URLs beginning with /CPAN/ into requests for http://www.perl.com/CPAN/.

The configuration for this example uses a PerlSetVar directive to set a variable named PerlPassThru. A typical entry in the configuration file will look like this:

    PerlTransHandler Apache::PassThru
    PerlSetVar PerlPassThru '/CPAN/   => http://www.perl.com/CPAN/,\
                             /search/ => http://www.altavista.digital.com/'

The PerlPassThru variable contains a string representing a series of URI=>proxy pairs, separated by commas. A backslash at the end of a line can be used to split the string over several lines, improving readability (the ability to use backslash as a continuation character is actually an Apache configuration file feature but not a well-publicized one). In this example, we map the URI /CPAN/ to http://www.perl.com/CPAN/ and /search/ to http://www.altavista.digital.com/. For the mapping to work correctly, local directory names should end with a slash in the manner shown in the example.

The code for Apache::PassThru is given in Example 7-10. The handler() subroutine begins by retrieving the request object and calling its proxyreq() method to determine whether the current request is a proxy request:

    sub handler {
        my $r = shift;
        return DECLINED if $r->proxyreq;
If this is already a proxy request, we don't want to alter it in any way, so we decline the transaction. Otherwise, we retrieve the value of PerlPassThru, split it into its key/value components with a pattern match, and store the result in a hash named %mappings:

        my $uri = $r->uri;
        my %mappings = split /\s*(?:,|=>)\s*/, $r->dir_config('PerlPassThru');

We now loop through each of the local paths, looking for a match with the current request's URI. If a match is found, we perform a string substitution to replace the local path with the corresponding proxy URI. Otherwise, we continue to loop:

        for my $src (keys %mappings) {
            next unless $uri =~ s/^$src/$mappings{$src}/;
            $r->proxyreq(1);
            $r->uri($uri);
            $r->filename("proxy:$uri");
            $r->handler('proxy-server');
            return OK;
        }
        return DECLINED;
    }
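To see what the split and the substitution produce, the two steps can be run outside Apache. The following is a minimal sketch under stated assumptions: parse_mappings() and rewrite_uri() are hypothetical names, the sample string maps /CPAN/ to http://www.perl.com/CPAN/ to match the transformation described earlier, and the loop is a variation on the handler's rather than a copy of it. It quotes the local path with \Q...\E and tries longer paths first, two changes the original code does not make.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same split pattern as the handler: commas and "=>" both act as separators,
# so the flat list alternates local path, proxy URL, local path, ...
sub parse_mappings {
    my ($config) = @_;
    $config =~ s/^\s+|\s+$//g;    # assumption: trim stray whitespace first
    return split /\s*(?:,|=>)\s*/, $config;
}

# Hypothetical variation on the handler's loop: \Q...\E keeps characters such
# as "." in a local path from being read as regex syntax, and sorting by
# length tries the longest prefix first so overlapping paths behave predictably.
sub rewrite_uri {
    my ($uri, %mappings) = @_;
    for my $src (sort { length($b) <=> length($a) } keys %mappings) {
        return $uri if $uri =~ s/^\Q$src\E/$mappings{$src}/;
    }
    return;    # no mapping applies; a handler would return DECLINED here
}

my %mappings = parse_mappings(
    '/CPAN/ => http://www.perl.com/CPAN/, /search/ => http://www.altavista.digital.com/'
);

for my $uri ('/CPAN/README', '/search/index.html', '/other/page.html') {
    my $proxy_uri = rewrite_uri($uri, %mappings);
    print "$uri => ", (defined $proxy_uri ? $proxy_uri : '(declined)'), "\n";
}
```

The ordering matters only when one configured path is a prefix of another. Because keys() returns hash keys in no particular order, the unsorted loop could pick either mapping in that case.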
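If the URI substitution succeeds, there are four steps we need to take to transform this request into something that mod_proxy will handle. The first two are obvious, but the others are less so. First, we set the proxy request flag to a true value by calling $r->proxyreq(1). Second, we replace the request URI with the rewritten proxy URI by calling $r->uri($uri). Third, we set the request filename to the string proxy: followed by the URI, which is the form of filename that mod_proxy looks for. Fourth, we set the content handler to proxy-server, the handler name that mod_proxy registers.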
If we turned the local path into a proxy request, we return OK to tell Apache that the URI has been translated. Otherwise, we return DECLINED so that the remaining translation handlers get their turn.

Example 7-10. Invoking Apache's Proxy Request Mechanism from Within a Translation Handler

    package Apache::PassThru;
    # file: Apache/PassThru.pm

    use strict;
    use Apache::Constants qw(:common);

    sub handler {
        my $r = shift;
        return DECLINED if $r->proxyreq;
        my $uri = $r->uri;
        my %mappings = split /\s*(?:,|=>)\s*/, $r->dir_config('PerlPassThru');
        for my $src (keys %mappings) {
            next unless $uri =~ s/^$src/$mappings{$src}/;
            $r->proxyreq(1);
            $r->uri($uri);
            $r->filename("proxy:$uri");
            $r->handler('proxy-server');
            return OK;
        }
        return DECLINED;
    }

    1;
    __END__

As public concern about the ability of web servers to track people's surfing sessions grows, anonymizing proxies are becoming more popular. An anonymizing proxy is similar to an ordinary web proxy, except that certain HTTP headers that provide identifying information, such as the Referer, Cookie, User-Agent, and From fields, are quietly stripped from the request before it is forwarded to the remote server. Not only is this identifying information removed, but the identity of the requesting host is obscured. The remote server knows only the hostname and IP address of the proxy machine, not the identity of the machine the user is browsing from.

You can write a simple anonymizing proxy in the Apache Perl API in all of 18 lines (including comments). The source code listing is shown in Example 7-11. Like the previous example, it uses Apache's mod_proxy, so that module must be installed before this example will run correctly.
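The module defines a file-scoped array named @Remove that lists the request headers to be stripped: User-Agent, Cookie, From, and Referer. The handler() subroutine begins by retrieving the request object and calling its proxyreq() method. Unlike the previous example, which skipped proxy requests, this handler declines everything that is not a proxy request.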
If proxyreq() returns true, we know that we are in the midst of a proxy request. We loop through each of the fields to be stripped and delete them from the incoming headers table by using the request object's header_in() method to set the field to undef. We then return OK.

To activate the anonymizing proxy, install it as a URI translation handler as before:

    PerlTransHandler Apache::AnonProxy

An alternative that works just as well is to call the module during the header parsing phase (see the discussion of this phase earlier). In some ways, this makes more sense because we aren't doing any actual URI translation, but we are modifying the HTTP header. Here is the appropriate directive:

    PerlHeaderParserHandler Apache::AnonProxy

The drawback to using PerlHeaderParserHandler like this is that, unlike PerlTransHandler, the directive is allowed in directory configuration sections and .htaccess files. But directory configuration sections are irrelevant in proxy requests, so the directive will silently fail if placed in one of these sections. The directive should go in the main part of one of the configuration files or in a virtual host section.

Example 7-11. A Simple Anonymizing Proxy

    package Apache::AnonProxy;
    # file: Apache/AnonProxy.pm

    use strict;
    use Apache::Constants qw(:common);

    my @Remove = qw(user-agent cookie from referer);

    sub handler {
        my $r = shift;
        return DECLINED unless $r->proxyreq;
        foreach (@Remove) {
            $r->header_in($_ => undef);
        }
        return OK;
    }

    1;
    __END__

In order to test that this handler was actually working, we set up a test Apache server as the target of the proxy requests and added the following entry to its configuration file:

    CustomLog logs/nosy_log "%h %{Referer}i %{User-Agent}i %{Cookie}i %U"

This created a "nosy" log that contains entries for the Referer, User-Agent, and Cookie fields.
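The stripping logic can also be exercised offline, before a second server is involved. The following is a minimal sketch that uses a plain hash in place of Apache's incoming headers table. strip_identifying_headers() is a hypothetical name, and the sample header values are borrowed from the requests and log entries shown in this section.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @Remove = qw(user-agent cookie from referer);

# Hypothetical stand-in for the handler's loop. HTTP header names are
# case-insensitive, so names are compared in lowercase here; a plain Perl
# hash, unlike Apache's headers table, does not do that for us.
sub strip_identifying_headers {
    my ($headers) = @_;    # hash ref: header name => value
    my %doomed = map { $_ => 1 } @Remove;
    my %kept;
    for my $name (keys %$headers) {
        next if $doomed{ lc $name };
        $kept{$name} = $headers->{$name};
    }
    return \%kept;
}

my $clean = strip_identifying_headers({
    'Accept'     => 'image/gif, image/jpeg, */*',
    'User-Agent' => 'Mozilla/4.04 [en] (X11; I; Linux 2.0.33 i686)',
    'Referer'    => 'http://prego/',
    'Cookie'     => 'POMIS=10074',
    'Host'       => 'www.modperl.com:80',
});

# Only the headers that do not identify the user should remain.
print "$_: $clean->{$_}\n" for sort keys %$clean;
```

This checks only the choice of headers. Whether the proxy really forwards the reduced set still has to be confirmed against a live target server.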
Before installing the anonymizing proxy module, entries in this log looked like this (the lines have been wrapped to fit on the page):

    192.168.2.5 http://prego/ Mozilla/4.04 [en] (X11; I; Linux 2.0.33 i686) -
        /tkdocs/tk_toc.ht
    192.168.2.5 http://prego/ Mozilla/4.04 [en] (X11; I; Linux 2.0.33 i686)
        POMIS=10074 /perl/hangman1.pl

In contrast, after installing the anonymizing proxy module, all the identifying information was stripped out, leaving only the IP address of the proxy machine:

    192.168.2.5 - - - /perl/hangman1.pl
    192.168.2.5 - - - /icons/hangman/h0.gif
    192.168.2.5 - - - /cgi-bin/info2www

Copyright © 1999 by O'Reilly & Associates, Inc.