Writing Apache Modules with Perl and C

By:	Lincoln Stein and Doug MacEachern
Published:	O'Reilly & Associates, Inc. - March 1999

Chapter 4 - Content Handlers / Apache::Registry
Apache::Registry Traps

There are a number of traps and pitfalls that you can fall into when using Apache::Registry. This section warns you about them.

It helps to know how Apache::Registry works in order to understand why the traps are there. When the server is asked to return a file that is handled by the Apache::Registry content handler (in other words, a script!), Apache::Registry first looks in an internal cache of compiled subroutines that it maintains. If it doesn't find a subroutine that corresponds to the script file, it reads the contents of the file and repackages it into a block of code that looks something like this:

package $mangled_package_name;
use Apache qw(exit);
sub handler {
  #line 1 $original_filename
  contents of the file
}

$mangled_package_name is a version of the script's URI which has been modified in such a way as to turn it into a legal Perl package name while keeping it distinct from all other compiled Apache::Registry scripts. For example, the guestbook.cgi script shown in the last section would be turned into a cached subroutine in the package Apache::ROOT::perl::guestbook_2ecgi. The compiled code is then cached for later use.
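The name mangling can be approximated in a few lines. This is only a sketch under our own assumptions, not the actual Apache::Registry source: mangle_uri() is a hypothetical helper, and it ignores edge cases such as path components that begin with a digit.

```perl
use strict;
use warnings;

# Hypothetical helper: approximates how a script URI could be turned into
# a legal, distinct Perl package name. Not the real Apache::Registry code.
sub mangle_uri {
    my ($uri) = @_;
    # Replace every character that is neither alphanumeric nor a slash
    # with an underscore followed by its two-digit hex code.
    $uri =~ s/([^A-Za-z0-9\/])/sprintf("_%02x", ord $1)/eg;
    # Turn each run of slashes into the package separator.
    $uri =~ s{/+}{::}g;
    return "Apache::ROOT$uri";
}

print mangle_uri('/perl/guestbook.cgi'), "\n";
# prints Apache::ROOT::perl::guestbook_2ecgi
```

Because a literal underscore is itself escaped (as _5f), two different URIs cannot collapse into the same package name in this sketch.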

Before Apache::Registry even comes into play, mod_perl fiddles with the environment to make it appear as if the script were being called under the CGI protocol. For example, the $ENV{QUERY_STRING} environment variable is initialized with the contents of Apache::args(), and $ENV{SERVER_NAME} is filled in from the value returned by Apache::server_hostname(). This behavior is controlled by the PerlSetupEnv directive, which is On by default. If your scripts do not need to use CGI %ENV variables, turning this directive Off will reduce memory overhead slightly.

In addition to caching the compiled script, Apache::Registry also stores the script's last modification time. It checks the stored time against the current modification time before executing the cached code. If it detects that the script has been modified more recently than the last time it was compiled, it discards the cached code and recompiles the script.
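The modification-time check can be pictured with the following sketch. The %cache hash and the cached_handler() function are hypothetical names of our own; the real module keeps its bookkeeping differently, but the logic of "recompile only when the file is newer than the cached copy" is the same idea.

```perl
use strict;
use warnings;

my %cache;    # hypothetical cache: filename => { mtime => ..., code => ... }

# Return a compiled subroutine for $file, recompiling only when the file
# has been modified more recently than the cached copy was compiled.
sub cached_handler {
    my ($file) = @_;
    my $mtime = (stat $file)[9];
    die "Cannot stat $file: $!" unless defined $mtime;

    my $entry = $cache{$file};
    if (!$entry || $entry->{mtime} < $mtime) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        my $source = do { local $/; <$fh> };    # slurp the whole script
        close $fh;
        # Wrap the script body in an anonymous subroutine and compile it.
        my $code = eval "sub { $source\n}";
        die "Compile error in $file: $@" if $@;
        $entry = $cache{$file} = { mtime => $mtime, code => $code };
    }
    return $entry->{code};
}
```

Calling cached_handler() repeatedly on an unchanged file returns the same code reference each time; touching the file causes the next call to compile it afresh.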

The first and most common pitfall when using Apache::Registry is to forget that the code will be persistent across many sessions. Perl CGI programmers commonly make profligate use of globals, allocate mammoth memory structures without disposing of them, and open filehandles and never close them. They get away with this because CGI scripts are short-lived. When the CGI transaction is done, the script exits, and everything is cleaned up automatically.

Not so with Apache::Registry scripts (or any other Apache Perl module, for that matter). Globals persist from invocation to invocation, big data structures will remain in memory, and open files will remain open until the Apache child process has exited or the server itself is shut down.

Therefore, it is vital to code cleanly. You should never depend on a global variable being uninitialized in order to determine when a subroutine is being called for the first time. In fact, you should reduce your dependency on globals in general. Close filehandles when you are finished with them, and make sure to kill (or at least wait on) any child processes you may have launched.
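To make the danger concrete, here is a small sketch that can be run outside the server. The names fragile_handler() and robust_handler() are hypothetical; calling either one twice in the same process stands in for two requests served by the same Apache child.

```perl
use strict;
use warnings;
use vars qw($counter);

# Fragile: assumes $counter is undefined at the start of every request.
# In a persistent interpreter that is true only for the first request.
sub fragile_handler {
    $counter = 0 unless defined $counter;
    return ++$counter;
}

# Robust: explicitly resets the global for the duration of each request.
sub robust_handler {
    local $counter = 0;
    return ++$counter;
}
```

Calling fragile_handler() twice returns 1 and then 2, whereas robust_handler() returns 1 each time.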

Perl provides two useful tools for writing clean code. use strict turns on checks that make it harder to use global variables unintentionally. Variables must either be lexically scoped (with my) or qualified with their complete package names. The only way around these restrictions is to declare variables you intend to use as globals at the top of the script with use vars. This code snippet shows how:

use strict;
use vars qw{$INIT $DEBUG @NAMES %HANDLES};

We have used strict in many of the examples in the preceding sections, and we strongly recommend it for any Perl script you write.

The other tool is Perl runtime warnings, which can be turned on in Apache::Registry scripts by including a -w switch on the #! line, or within other modules by setting the magic $^W variable to true. You can even enable warnings globally by setting $^W to true inside the server's Perl startup script, if there is one (see Chapter 2).

-w will catch a variety of errors, dubious programming constructs, typos, and other sins. Among other things, it will warn when a bareword (a string without surrounding quotation marks) conflicts with a subroutine name, when a variable is used only once, and when a lexical variable is inappropriately shared between an outer and an inner scope (a horrible problem which we expose in all its gory details a few paragraphs later).

-w may also generate hundreds of "Use of uninitialized value" messages at run-time, which will fill up your server error log. Many of these warnings can be hard to track down. If there is no line number reported with the warning, or if the reported line number is incorrect,² try using Perl's #line token described in the perlsyn manual page and in Chapter 9 under "Special Global Variables, Subroutines, and Literals."

It may also be helpful to see a full stack trace of the code which triggered the warning. The cluck() function found in the standard Carp module will give you this functionality. Here is an example:

use Carp ();
local $SIG{__WARN__} = \&Carp::cluck;

Note that -w checks are done at runtime, which may slow down script execution. In production mode, you may wish to turn warnings off altogether or localize warnings using the $^W global variable described in the perlvar manpage.
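Localizing $^W might look like the following sketch. The subroutine names are hypothetical, and the example assumes the code is compiled without the lexical warnings pragma, so that $^W alone decides whether warnings are issued.

```perl
use strict;

$^W = 1;    # same effect as running with -w

# Warns when an argument is undefined.
sub noisy_sum {
    my ($x, $y) = @_;
    return $x + $y;
}

# Identical arithmetic, but warnings are off until the subroutine returns.
sub quiet_sum {
    my ($x, $y) = @_;
    local $^W = 0;
    return $x + $y;
}
```

Because the change is made with local, the previous value of $^W is restored automatically when quiet_sum() exits, so the rest of the script keeps its warnings.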

Another subtle mod_perl trap that lies in wait for even experienced programmers involves the sharing of lexical variables between outer and inner named subroutines. To understand this problem, consider the following innocent-looking code:

#!/usr/local/bin/perl -w

for (0..3) {
   bump_and_print();
}

sub bump_and_print {
   my $a = 1;
   sub bump {
      $a++;
      print "In the inner scope, \$a is $a\n";
   }
   print "In the outer scope, \$a is $a\n";
   bump();
}

When you run this script, it generates the following inexplicable output:

Variable "$a" will not stay shared at ./test.pl line 12.
In the outer scope, $a is 1
In the inner scope, $a is 2
In the outer scope, $a is 1
In the inner scope, $a is 3
In the outer scope, $a is 1
In the inner scope, $a is 4
In the outer scope, $a is 1
In the inner scope, $a is 5

For some reason the variable $a has become "unstuck" from its my() declaration in bump_and_print() and has taken on a life of its own in the inner subroutine bump(). Because of the -w switch, Perl complains about this problem during the compilation phase, with the terse warning that the variable "will not stay shared." This behavior does not happen if the inner subroutine is made into an anonymous subroutine. It only affects named inner subroutines.
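For comparison, here is a sketch of the same script with the inner subroutine rewritten as an anonymous subroutine stored in a lexical variable. The return value is our own addition, included only to make the result easy to check.

```perl
#!/usr/local/bin/perl -w
use strict;

sub bump_and_print {
    my $a = 1;
    # An anonymous subroutine is created anew on each call, so it
    # captures the current $a rather than the first one.
    my $bump = sub {
        $a++;
        print "In the inner scope, \$a is $a\n";
    };
    print "In the outer scope, \$a is $a\n";
    $bump->();
    return $a;
}

bump_and_print() for 0..3;
```

This version prints 1 in the outer scope and 2 in the inner scope on every pass through the loop, and Perl no longer issues the "will not stay shared" warning.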

The rationale for the peculiar behavior of lexical variables and ways to avoid it in conventional scripts are explained in the perldiag manual page. When using Apache::Registry this bug can bite you when you least expect it. Because Apache::Registry works by wrapping the contents of a script inside a handler() function, inner named subroutines are created whether you want them or not. Hence, this piece of code will not do what you expect:

#!/usr/local/bin/perl
use CGI qw/param header/;

my $name = param('name');
print header('text/plain');
print_body();
exit 0;

sub print_body {
   print "The contents of \$name is $name.\n";
}

The first time you run it, it will run correctly, printing the value of the name CGI parameter. However, on subsequent invocations the script will appear to get "stuck" and remember the values of previous invocations. This is because the lexically scoped $name variable is being referenced from within print_body(), which, when running under Apache::Registry, is a named inner subroutine. Because multiple Apache processes are running, each process will remember a different value of $name, resulting in bizarre and arbitrary behavior.
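The effect can be reproduced outside the server with a sketch that wraps the script by hand. Here handler() merely imitates the wrapper that Apache::Registry generates, its argument stands in for param('name'), and body_text() is a hypothetical replacement for print_body() that returns its text instead of printing it.

```perl
use strict;
use warnings;

# Stand-in for the handler() wrapper built around a Registry script.
sub handler {
    my $name = shift;
    return body_text();

    # Inside handler(), this is a named inner subroutine. It stays bound
    # to the $name that belonged to the first call of handler().
    sub body_text {
        return "The contents of \$name is $name.\n";
    }
}

print handler('fred');     # first "request"
print handler('wilma');    # second "request" still reports fred
```

Both calls print "The contents of $name is fred." and, with warnings enabled, Perl reports at compile time that $name will not stay shared.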

Perl may be fixed someday to do the right thing with inner subroutines. In the meantime, there are several ways to avoid this problem. Instead of making the outer variable lexically scoped, you can declare it to be a package global, as this snippet shows:

use strict;
use vars '$name';
$name = param('name');

Because globals are global, they aren't subject to weird scoping rules.

Alternatively, you can pass the variable to the subroutine as an argument and avoid sharing variables between scopes altogether. This example shows that variant:

my $name = param('name');
print_body($name);
sub print_body {
  my $name = shift;
  print "The contents of \$name is $name.\n";
}

Finally, you can put the guts of your application into a library and use or require it. The Apache::Registry script then becomes only a hook that invokes the library:

#!/usr/local/bin/perl
require "my_application_guts";
do_everything();

The shared lexical variable problem is a good reason to use the -w switch during Apache::Registry script development and debugging. If you see warnings about a variable not remaining shared, you have a problem, even if the ill effects don't immediately manifest themselves.

Another problem that you will certainly run into involves the use of custom libraries by Apache::Registry scripts. When you make an editing change to a script, Apache::Registry notices the recent modification time and reloads the script. However, the same isn't true of any library file that you load into the script with use or require. If you make a change to a required file, the script will continue to run the old version of the file until the script itself is recompiled for some reason. This can lead to confusion and much hair-tearing during development!

You can avoid going bald by using Apache::StatINC, a standard part of the mod_perl distribution. It watches over the contents of the internal Perl %INC hash and reloads any files that have changed since the last time it was invoked. Installing Apache::StatINC is easy. Simply install it as the PerlInitHandler for any directory that is managed by Apache::Registry. For example, here is an access.conf entry that installs both Apache::Registry and Apache::StatINC:
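The reloading idea itself is simple enough to sketch in plain Perl. The code below is our own illustration rather than the Apache::StatINC source; %last_seen and reload_changed() are hypothetical names.

```perl
use strict;
use warnings;

my %last_seen;    # hypothetical record: %INC key => mtime at last check

# Reload any file listed in %INC whose modification time has advanced
# since the previous call. Returns the %INC keys that were reloaded.
sub reload_changed {
    my @reloaded;
    for my $key (sort keys %INC) {
        my $path = $INC{$key};
        next unless defined $path && !ref $path;
        my $mtime = (stat $path)[9];
        next unless defined $mtime;
        if (defined $last_seen{$key} && $mtime > $last_seen{$key}) {
            delete $INC{$key};    # forget the old copy so require reloads it
            require $key;
            push @reloaded, $key;
        }
        $last_seen{$key} = $mtime;
    }
    return @reloaded;
}
```

The first call only records modification times. Later calls stat each file again and reload the ones that have changed, which is why the cost grows with the number of loaded files.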

Alias /perl/ /usr/local/apache/perl/
<Location /perl>
 SetHandler      perl-script
 PerlHandler     Apache::Registry
 PerlInitHandler Apache::StatINC
 PerlSendHeader  On
 Options         +ExecCGI
</Location>

Because Apache::StatINC operates at a level above the level of individual scripts, any nonstandard library locations added by the script with use lib or by directly manipulating the contents of @INC will be ignored. If you want these locations to be monitored by Apache::StatINC, you should make sure that they are added to the library search path before invoking the script. You can do this either by setting the PERL5LIB environment variable before starting up the Apache server (for instance, in the server startup script), or by placing a use lib line in your Perl start-up file, as described in Chapter 2.
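A start-up file entry of this kind might look like the following sketch. The directory shown is a hypothetical location; substitute wherever your own libraries live.

```perl
# Perl start-up file (see Chapter 2 for how it is loaded)
use strict;

# Hypothetical library directory: add it to @INC before any script runs,
# so that files loaded from it can be monitored.
use lib '/usr/local/apache/lib/perl';

1;
```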

When you use Apache::StatINC, there is a slight overhead for performing a stat on each included file every time a script is run. This overhead is usually negligible, but it can become noticeable on a heavily loaded server. In this case, you may want to forgo it and instead manually force the embedded Perl interpreter to reload all its compiled scripts by restarting the server with apachectl. In order for this to work, the PerlFreshRestart directive must be turned on in the Apache configuration file. If you haven't done so already, add this line to perl.conf or one of the other configuration files:

PerlFreshRestart On

You can try reloading compiled scripts in this way whenever things seem to have gotten themselves into a weird state. This will reset all scripts to known initial settings and allow you to investigate problems systematically. You might also want to stop the server completely and restart it using the -X switch. This forces the server to run as a single process in the foreground. Interacting with a single process rather than multiple ones makes it easier to debug misbehaving scripts. In a production environment, you'll want to do this on a test server in order to avoid disrupting web services.

Footnotes

² Certain uses of the eval operator and "here" documents are known to throw off Perl's line numbering.
