Chapter 6 - Authentication and Authorization / Access Control with mod_perl

A limitation of using pattern matching to identify robots is that it only catches the robots that you know about and that identify themselves by name. A few devious robots masquerade as users by using user agent strings that identify themselves as conventional browsers. To catch such robots, you'll have to be more sophisticated. A trick that some

The module starts out much like the previous examples:

    package Apache::SpeedLimit;

    use strict;
    use Apache::Constants qw(:common);
    use Apache::Log ();
    use IPC::Shareable ();
    use vars qw(%DB);

Because it needs to track the number of hits each client makes on the site,
Apache::SpeedLimit faces the problem of maintaining a persistent variable
across multiple processes. Here, because performance is an issue in a script
that will be called for every URI on the site, we solve the problem by tying
a hash to shared memory using IPC::Shareable. The tied variable, %DB, is declared as a package global with use vars.

    sub handler {
        my $r = shift;
        return DECLINED unless $r->is_main;  # don't handle sub-requests

        my $speed_limit = $r->dir_config('SpeedLimit')   || 10;  # Accesses per minute
        my $samples     = $r->dir_config('SpeedSamples') || 10;  # Sampling threshold (hits)
        my $forgive     = $r->dir_config('SpeedForgive') || 20;  # Forgive after this period (minutes)

The handler() subroutine first fetches some configuration variables. The recognized directives are SpeedLimit, the number of accesses per minute that any client is allowed to make; SpeedSamples, the number of hits that the client must make before the module starts calculating statistics; and SpeedForgive, a "statute of limitations" on breaking the speed limit. If the client pauses for SpeedForgive minutes before trying again, the module will forgive it and treat the access as if it were the very first one.

A small but important detail is the second line in the handler, where the subroutine declines the transaction unless is_main() returns true. It is possible for this handler to be invoked as the result of an internal subrequest, for example, when Apache is rapidly iterating through the contents of an automatically indexed directory to determine the MIME types of each of the directory's files. We do not want such subrequests to count against the user's speed limit totals, so we ignore any request that isn't the main one. is_main() returns true for the main request, false for subrequests.

There's an even better reason for the is_main() check: the very next thing the handler does is call lookup_uri() to look up the requested file's content type so that it can ignore requests for image files.
Without the check, the handler would recurse infinitely:

        my $content_type = $r->lookup_uri($r->uri)->content_type;
        return OK if $content_type =~ m:^image/:i;  # ignore images

The rationale for the check for image files is that when a browser renders a graphics-intensive page, it generates a flurry of requests for inline images that can easily exceed the speed limit. We don't want to penalize users for this, so we ignore requests for inline images. It's necessary to make a subrequest to fetch the requested file's MIME type because access control handlers ordinarily run before the MIME type checker phase.

If we are dealing with a nonimage document, then it should be counted against
the client's total. In the next section of the module, we tie a hash named %DB to shared memory:

        tie %DB, 'IPC::Shareable', 'SPLM', {create => 1, mode => 0644}
            unless defined %DB;

The next task is to create a unique ID for the client to use as a key into the hash:

        my($ip, $agent) = ($r->connection->remote_ip, $r->header_in('User-Agent'));
        my $id  = "$ip:$agent";
        my $now = time()/60;  # minutes since the epoch

The client's IP address alone would be adequate in a world of one desktop PC per user, but the existence of multiuser systems, firewalls, and web proxies complicates the issue, making it possible for multiple users to appear to originate at the same IP address. This module's solution is to create an ID that consists of the IP address concatenated with the User-Agent field. As long as Microsoft and Netscape release new browsers every few weeks, this combination will spread clients out sufficiently for this to be a practical solution. A more robust solution could make use of the optional cookie generated by Apache's mod_usertrack module, but we didn't want to make this example overly complex. A final preparatory task is to fetch the current time and scale it to minute units.

        tied(%DB)->shlock;
        my($first, $last, $hits, $locked) = split ' ', $DB{$id};

Now we update the user's statistics and calculate the current fetch speed. In preparation for working with the shared data, we call the tied hash's shlock() method, locking the data structure for writing. Next, we look up the user's statistics and split them into individual fields.
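The client ID and the space-separated record format described above can be tried outside Apache. The following is a minimal sketch, assuming a plain in-memory hash (%store) in place of the tied %DB and made-up client values; client_id, pack_record, and unpack_record are hypothetical helper names for illustration, not part of Apache::SpeedLimit.

```perl
use strict;
use warnings;

# Plain hash standing in for the shared-memory %DB (assumption for illustration).
my %store;

# Build the lookup key the same way the handler does: IP address plus User-Agent.
sub client_id {
    my ($ip, $agent) = @_;
    return "$ip:$agent";
}

# Serialize the four statistics into one space-separated string.
sub pack_record {
    my ($first, $last, $hits, $locked) = @_;
    return join ' ', $first, $last, $hits, $locked;
}

# Recover the four fields; a client with no record yields an empty list.
sub unpack_record {
    my ($record) = @_;
    return split ' ', (defined $record ? $record : '');
}

# Made-up client values (assumptions, not taken from the text).
my $id  = client_id('192.0.2.10', 'ExampleBrowser/1.0');
my $now = time() / 60;    # minutes since the epoch, as in the handler

$store{$id} = pack_record($now, $now, 0, 0);
my ($first, $last, $hits, $locked) = unpack_record($store{$id});
print "hits=$hits locked=$locked\n";
```

Because an unknown client produces an empty list, $first comes back undefined, which is the condition the handler later uses to recognize a first-time visitor.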
At this point in the code, we enter a block named CASE. Just before entering the block, we set a variable named $result to OK and fetch a log object into $l:

        my $result = OK;
        my $l = $r->server->log;
        CASE: {

The first case we consider is when the $first field is empty:

            unless ($first) {  # we're seeing this client for the first time
                $l->debug("First request from $ip. Initializing speed counter.");
                $first = $last = $now;
                $hits = $locked = 0;
                last CASE;
            }

In this case, we can safely assume that this is the first time we're seeing this client. Our action is to initialize the fields and exit the block.

The second case occurs when the interval between the client's current and last accesses is longer than the grace period:

            if ($now - $last > $forgive) {
                # beyond the grace period. Treat like first
                $l->debug("$ip beyond grace period. Reinitializing speed counter.");
                $last = $first = $now;
                $hits = $locked = 0;
                last CASE;
            }

In this case, we treat this access as a whole new session and reinitialize all the fields to their starting values. This "forgives" the client, even if it previously was locked out.

At this point, we can bump up the number of hits and update the last access time. If the number of hits is too small to make decent statistics, we just exit the block at this point:

            $last = $now; $hits++;
            if ($hits < $samples) {
                $l->debug("$ip not enough samples to calculate speed.");
                last CASE;
            }

Otherwise, if the user is already locked out, we set the result code to FORBIDDEN and exit the block:

            if ($locked) {  # already locked out, so forbid access
                $l->debug("$ip locked");
                $result = FORBIDDEN;
                last CASE;
            }

If the client isn't yet locked out, then we calculate its average fetch speed by dividing the number of accesses it has made by the time interval between now and its first access. If this value exceeds the speed limit, we set the $locked flag and set the result code to FORBIDDEN:

            my $interval = $now - $first;
            $l->debug("$ip speed = ", $hits/$interval);
            if ($hits/$interval > $speed_limit) {
                $l->debug("$ip exceeded speed limit. Blocking.");
                $locked = 1;
                $result = FORBIDDEN;
                last CASE;
            }
        }

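The decision logic walked through above can be exercised without Apache or shared memory. The following is a minimal sketch, assuming the statistics are passed around as an array reference and the verdict is a plain string; check_speed and the configuration keys limit, samples, and forgive are hypothetical names for illustration. Unlike the handler, the sketch treats a zero-length interval as exceeding the limit so that it never divides by zero; that guard is an assumption of the sketch, not part of the module.

```perl
use strict;
use warnings;

# Apply the speed-limit rules to one request.
# $rec is [first, last, hits, locked]; all times are in minutes.
# Returns the updated record and a verdict string.
sub check_speed {
    my ($rec, $now, %cfg) = @_;
    my ($first, $last, $hits, $locked) = @$rec;

    # First sighting, or a pause longer than the grace period: start over.
    if (!$first || $now - $last > $cfg{forgive}) {
        return ([$now, $now, 0, 0], 'OK');
    }

    # Count this hit and record the access time.
    $last = $now;
    $hits++;

    # Too few hits for meaningful statistics.
    return ([$first, $last, $hits, $locked], 'OK') if $hits < $cfg{samples};

    # Already locked out.
    return ([$first, $last, $hits, $locked], 'FORBIDDEN') if $locked;

    # Average speed since the first access; a zero interval counts as too fast
    # (assumption of this sketch, to avoid dividing by zero).
    my $interval = $now - $first;
    if ($interval <= 0 || $hits / $interval > $cfg{limit}) {
        return ([$first, $last, $hits, 1], 'FORBIDDEN');
    }
    return ([$first, $last, $hits, 0], 'OK');
}

# Simulate a client that sends a request every 0.01 minute (0.6 seconds).
my %cfg = (limit => 20, samples => 5, forgive => 30);
my $rec = [];
for my $i (0 .. 6) {
    my $verdict;
    ($rec, $verdict) = check_speed($rec, 1000 + $i * 0.01, %cfg);
    printf "request %d: %s\n", $i + 1, $verdict;
}
```

With these settings the first five requests are allowed and the sixth is the first to be refused: it is the fifth counted hit, arriving about 0.05 minute after the first access, for an average of roughly 100 accesses per minute against a limit of 20.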
At the end of the module, we check the result code. If it's FORBIDDEN, we log the reason. We then write the updated statistics back to %DB, unlock the shared memory, and return the result code:

        $r->log_reason("Client exceeded speed limit.", $r->filename)
            if $result == FORBIDDEN;
        $DB{$id} = join " ", $first, $now, $hits, $locked;
        tied(%DB)->shunlock;

        return $result;
    }

To apply the Apache::SpeedLimit module to your entire site, you would create a configuration file entry like the following. Apache does not allow a comment on the same line as a directive, so each comment sits on its own line:

    <Location />
      PerlAccessHandler Apache::SpeedLimit
      # max 20 accesses/minute
      PerlSetVar        SpeedLimit   20
      # 5 hits before doing statistics
      PerlSetVar        SpeedSamples  5
      # amnesty after 30 minutes
      PerlSetVar        SpeedForgive 30
    </Location>

Example 6-4. Blocking Greedy Clients

    package Apache::SpeedLimit;
    # file: Apache/SpeedLimit.pm

    use strict;
    use Apache::Constants qw(:common);
    use Apache::Log ();
    use IPC::Shareable ();
    use vars qw(%DB);

    sub handler {
        my $r = shift;
        return DECLINED unless $r->is_main;  # don't handle sub-requests

        my $speed_limit = $r->dir_config('SpeedLimit')   || 10;  # Accesses per minute
        my $samples     = $r->dir_config('SpeedSamples') || 10;  # Sampling threshold (hits)
        my $forgive     = $r->dir_config('SpeedForgive') || 20;  # Forgive after this period (minutes)

        my $content_type = $r->lookup_uri($r->uri)->content_type;
        return OK if $content_type =~ m:^image/:i;  # ignore images

        tie %DB, 'IPC::Shareable', 'SPLM', {create => 1, mode => 0644}
            unless defined %DB;

        my($ip, $agent) = ($r->connection->remote_ip, $r->header_in('User-Agent'));
        my $id  = "$ip:$agent";
        my $now = time()/60;  # minutes since the epoch

        # lock the shared memory while we work with it
        tied(%DB)->shlock;
        my($first, $last, $hits, $locked) = split ' ', $DB{$id};
        my $result = OK;
        my $l = $r->server->log;
        CASE: {
            unless ($first) {  # we're seeing this client for the first time
                $l->debug("First request from $ip. Initializing speed counter.");
                $first = $last = $now;
                $hits = $locked = 0;
                last CASE;
            }

            if ($now - $last > $forgive) {
                # beyond the grace period. Treat like first
                $l->debug("$ip beyond grace period. Reinitializing speed counter.");
                $last = $first = $now;
                $hits = $locked = 0;
                last CASE;
            }

            # update the values now
            $last = $now; $hits++;
            if ($hits < $samples) {
                $l->debug("$ip not enough samples to calculate speed.");
                last CASE;
            }

            if ($locked) {  # already locked out, so forbid access
                $l->debug("$ip locked");
                $result = FORBIDDEN;
                last CASE;
            }

            my $interval = $now - $first;
            $l->debug("$ip speed = ", $hits/$interval);
            if ($hits/$interval > $speed_limit) {
                $l->debug("$ip exceeded speed limit. Blocking.");
                $locked = 1;
                $result = FORBIDDEN;
                last CASE;
            }
        }

        $r->log_reason("Client exceeded speed limit.", $r->filename)
            if $result == FORBIDDEN;
        $DB{$id} = join " ", $first, $now, $hits, $locked;
        tied(%DB)->shunlock;

        return $result;
    }

    1;
    __END__

Footnotes

4 The mod_rewrite module may also be worth perusing. Its rewrite rules can be based on the User-Agent field, time of day, and other variables.

5 On systems that don't have IPC::Shareable available, a tied DBM file might also work, but you'd have to open and close it each time the module is called. This would have performance implications. A better solution would be to store the information in a DBI database, as described in Chapter 5, Maintaining State. Windows systems use a single-process server, and don't have to worry about this issue.

6 An alternative approach would be to use a PerlChildInitHandler
to tie the

Copyright © 1999 by O'Reilly & Associates, Inc.