There are several input operators we'll discuss here because they parse as terms. Sometimes we call them pseudoliterals because they act like quoted strings in many ways. (Output operators like print parse as list operators and are discussed in Chapter 29, "Functions".)
First of all, we have the command input operator, also known as the backtick operator, because it looks like this:

    $info = `finger $user`;

A string enclosed by backticks (grave accents, technically) first undergoes variable interpolation just like a double-quoted string. The result is then interpreted as a command line by the system, and the output of that command becomes the value of the pseudoliteral. (This is modeled after a similar operator in Unix shells.) In scalar context, a single string consisting of all the output is returned. In list context, a list of values is returned, one for each line of output. (You can set $/ to use a different line terminator.)
The command is executed each time the pseudoliteral is evaluated. The numeric status value of the command is saved in $? (see Chapter 28, "Special Names" for the interpretation of $?, also known as $CHILD_ERROR). Unlike the csh version of this command, no translation is done on the return data--newlines remain newlines. Unlike in any of the shells, single quotes in Perl do not hide variable names in the command from interpretation. To pass a $ through to the shell you need to hide it with a backslash. The $user in our finger example above is interpolated by Perl, not by the shell. (Because the command undergoes shell processing, see Chapter 23, "Security", for security concerns.)
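As a minimal sketch of the context and status behavior described above, the following runs a child process in both contexts and then inspects $?. It assumes a Unix-like shell, and it uses the running perl binary ($^X) as a stand-in command so that nothing depends on finger being installed; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Assumption: the running perl ($^X) stands in for an external command.
my $perl = $^X;

# Scalar context: all of the output comes back as a single string.
my $all = `"$perl" -le "print for 1..3"`;

# List context: one element per line of output, newlines included.
my @lines = `"$perl" -le "print for 1..3"`;

# $? holds the wait status; the exit code lives in the high byte.
my $ignored   = `"$perl" -e "exit 3"`;
my $exit_code = $? >> 8;

printf "%d bytes, %d lines, exit code %d\n",
    length($all), scalar(@lines), $exit_code;
```

Note that the command runs once for each backtick expression, so the scalar and list assignments above start two separate child processes.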
The generalized form of backticks is qx// (for "quoted execution"), but the operator works exactly the same way as ordinary backticks. You just get to pick your quote characters. As with similar quoting pseudofunctions, if you happen to choose a single quote as your delimiter, the command string doesn't undergo double-quote interpolation:

    $perl_info  = qx(ps $$);            # that's Perl's $$
    $shell_info = qx'ps $$';            # that's the shell's $$
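The difference between the two delimiters can be observed directly. This sketch assumes a Unix-like system where commands are handed to /bin/sh and an echo command is available; it substitutes echo for ps so that the output is just the process ID.

```perl
use strict;
use warnings;

# Assumption: a Unix-like system with /bin/sh and an echo command.
chomp(my $perl_pid  = qx(echo $$));     # Perl interpolates its own $$
chomp(my $shell_pid = qx'echo $$');     # single quotes: the shell expands $$
chomp(my $escaped   = qx(echo \$\$));   # backslashes pass $$ through to the shell

print "perl: $perl_pid  shell: $shell_pid  escaped: $escaped\n";
```

Only the first value should match Perl's own $$; the other two are the IDs of the shells started to run each command.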
The most heavily used input operator is the line input operator, also known as the angle operator or the readline function (since that's what it calls internally). Evaluating a filehandle in angle brackets (STDIN, for example) yields the next line from the associated filehandle. (The newline is included, so according to Perl's criteria for truth, a freshly input line is always true, up until end-of-file, at which point an undefined value is returned, which is conveniently false.) Ordinarily, you would assign the input value to a variable, but there is one situation where an automatic assignment happens. If and only if the line input operator is the only thing inside the conditional of a while loop, the value is automatically assigned to the special variable $_. The assigned value is then tested to see whether it is defined. (This may seem like an odd thing to you, but you'll use the construct frequently, so it's worth learning.) Anyway, the following lines are equivalent:

    while (defined($_ = <STDIN>)) { print $_; }   # the longest way
    while ($_ = <STDIN>) { print; }               # explicitly to $_
    while (<STDIN>) { print; }                    # the short way
    for (;<STDIN>;) { print; }                    # while loop in disguise
    print $_ while defined($_ = <STDIN>);         # long statement modifier
    print while $_ = <STDIN>;                     # explicitly to $_
    print while <STDIN>;                          # short statement modifier

Remember that this special magic requires a while loop. If you use the input operator anywhere else, you must assign the result explicitly if you want to keep the value:

    while (<FH1> && <FH2>) { ... }                # WRONG: discards both inputs
    if (<STDIN>) { print; }                       # WRONG: prints old value of $_
    if ($_ = <STDIN>) { print; }                  # suboptimal: doesn't test defined
    if (defined($_ = <STDIN>)) { print; }         # best

When you're implicitly assigning to $_ in a while loop, this is the global variable by that name, not one localized to the while loop. You can protect an existing value of $_ this way:

    while (local $_ = <STDIN>) { print; }         # use local $_

Any previous value is restored when the loop is done. $_ is still a global variable, though, so functions called from inside that loop could still access it, intentionally or otherwise. You can avoid this, too, by declaring a lexical variable:

    while (my $line = <STDIN>) { print $line; }   # now private

(Both of these while loops still implicitly test for whether the result of the assignment is defined, because my and local don't change how assignment is seen by the parser.)
The filehandles STDIN, STDOUT, and STDERR are predefined and pre-opened. Additional filehandles may be created with the open or sysopen functions. See those functions' documentation in Chapter 29, "Functions" for details on this.
In the while loops above, we were evaluating the line input operator in a scalar context, so the operator returns each line separately. However, if you use the operator in a list context, a list consisting of all remaining input lines is returned, one line per list element. It's easy to make a large data space this way, so use this feature with care:

    $one_line = <MYFILE>;   # Get first line.
    @all_lines = <MYFILE>;  # Get the rest of the lines.

There is no while magic associated with the list form of the input operator, because the condition of a while loop always provides a scalar context (as does any conditional).
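To see the scalar and list forms side by side, along with the implicit defined test, here is a small sketch. It assumes an in-memory file (available in Perl 5.8 and later) as a stand-in for STDIN or MYFILE, and the data is invented so that the last line is a bare "0", which is false but defined.

```perl
use strict;
use warnings;

# Assumption: an in-memory file stands in for STDIN or MYFILE.
my $data = "first\nsecond\n0";    # last line is "0" with no trailing newline

open(my $fh, "<", \$data) or die "Can't open in-memory file: $!";
my $one_line = <$fh>;             # scalar context: just the next line
my @rest     = <$fh>;             # list context: every remaining line
close($fh);

# The while condition tests definedness, not truth, so the final "0"
# line is still processed.
open($fh, "<", \$data) or die "Can't open in-memory file: $!";
my $count = 0;
while (my $line = <$fh>) {
    $count++;
}
close($fh);

print "while loop read $count lines; list context returned ",
    scalar(@rest), "\n";
```

Had the loop tested plain truth instead, it would have stopped before the final "0" line, which is the reason the "suboptimal" form above is worth avoiding.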
Using the null filehandle within the angle operator is special; it emulates the command-line behavior of typical Unix filter programs such as sed and awk. When you read lines from <>, it magically gives you all the lines from all the files mentioned on the command line. If no files were mentioned, it gives you standard input instead, so your program is easy to insert into the middle of a pipeline of processes.
Here's how it works: the first time <> is evaluated, the @ARGV array is checked, and if it is null, $ARGV[0] is set to "-", which when opened gives you standard input. The @ARGV array is then processed as a list of filenames. More explicitly, the loop:

    while (<>) {
        ...                     # code for each line
    }

is equivalent to the following Perl-like pseudocode:

    @ARGV = ('-') unless @ARGV;     # assume STDIN iff empty
    while (@ARGV) {
        $ARGV = shift @ARGV;        # shorten @ARGV each time
        if (!open(ARGV, $ARGV)) {
            warn "Can't open $ARGV: $!\n";
            next;
        }
        while (<ARGV>) {
            ...                     # code for each line
        }
    }

except that it isn't so cumbersome to say, and will actually work. It really does shift array @ARGV and put the current filename into the global variable $ARGV. It also uses the special filehandle ARGV internally--<> is just a synonym for the more explicitly written <ARGV>, which is a magical filehandle. (The pseudocode above doesn't work because it treats <ARGV> as nonmagical.)
You can modify @ARGV before the first <> as long as the array ends up containing the list of filenames you really want. Because Perl uses its normal open function here, a filename of "-" counts as standard input wherever it is encountered, and the more esoteric features of open are automatically available to you (such as opening a "file" named "gzip -dc < file.gz|"). Line numbers ($.) continue as if the input were one big happy file. (But see the example under eof in Chapter 29, "Functions" for how to reset line numbers on each file.)
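The behavior described above, including the continuing line numbers, can be checked with a short sketch. It assumes two scratch files created with the standard File::Temp module and assigns their names to @ARGV by hand; the file contents are invented for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: two scratch files stand in for files named on the command line.
my @names;
for my $text ("a1\na2\n", "b1\n") {
    my ($tmp_fh, $name) = tempfile(UNLINK => 1);
    print {$tmp_fh} $text;
    close($tmp_fh) or die "Can't close $name: $!";
    push @names, $name;
}

@ARGV = @names;              # set before the first <> is evaluated

my @seen;
while (<>) {
    chomp;
    push @seen, "$.:$_";     # $. keeps counting across both files
}

print "@seen\n";
print "left in \@ARGV: ", scalar(@ARGV), "\n";
```

By the time the loop finishes, @ARGV has been shifted down to nothing, which is why a later <> would fall back to standard input.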
If you want to set @ARGV to your own list of files, go right ahead:

    # default to README file if no args given
    @ARGV = ("README") unless @ARGV;

If you want to pass switches into your script, you can use one of the Getopt::* modules or put a loop on the front like this:

    while (@ARGV and $ARGV[0] =~ /^-/) {
        $_ = shift;
        last if /^--$/;
        if (/^-D(.*)/) { $debug = $1 }
        if (/^-v/)     { $verbose++  }
        ...             # other switches
    }
    while (<>) {
        ...             # code for each line
    }

The <> symbol will return false only once. If you call it again after this, it will assume you are processing another @ARGV list, and if you haven't set @ARGV, it will input from STDIN.
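For the switch handling mentioned above, here is a sketch of the Getopt::* alternative using the standard Getopt::Std module. The command line is hypothetical and is assigned to @ARGV only so the example runs by itself. One difference from the hand-written loop: getopts records -v as a simple flag rather than counting repetitions.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical command line, set here only so the sketch is self-contained.
@ARGV = ('-v', '-Dtrace', '--', 'input.txt');

my %opts;
getopts('D:v', \%opts)       # D takes a value, v is a plain flag
    or die "usage: $0 [-v] [-Dvalue] [file ...]\n";

my $debug   = $opts{D};
my $verbose = $opts{v} ? 1 : 0;

# The switches and the "--" have been removed from @ARGV, so only
# filenames remain for a following while (<>) loop.
print "debug=$debug verbose=$verbose files=@ARGV\n";
```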
If the string inside the angle brackets is a scalar variable (for example, <$foo>), that variable contains an indirect filehandle, either the name of the filehandle to input from or a reference to such a filehandle. For example:

    $fh = \*STDIN;
    $line = <$fh>;

or:

    open($fh, "<data.txt") or die "Can't open data.txt: $!";
    $line = <$fh>;
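Indirect filehandles are what make it possible to hand a filehandle to a subroutine. The sketch below assumes in-memory files (Perl 5.8 and later) in place of real ones; first_line and the handle name NAMED are hypothetical names chosen for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reads one line through whatever handle it is given.
sub first_line {
    my ($fh) = @_;
    my $line = <$fh>;        # $fh is an indirect filehandle
    return $line;
}

# Assumption: in-memory files stand in for real ones.
my $text = "alpha\nbeta\n";
open(my $lexical, "<", \$text) or die "Can't open in-memory file: $!";
open(NAMED, "<", \$text)       or die "Can't open in-memory file: $!";

my $from_lexical = first_line($lexical);    # handle stored in a scalar
my $from_glob    = first_line(\*NAMED);     # reference to a named handle

print $from_lexical, $from_glob;
```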
You might wonder what happens to a line input operator if you put something fancier inside the angle brackets. What happens is that it mutates into a different operator. If the string inside the angle brackets is anything other than a filehandle name or a scalar variable (even if there are just extra spaces), it is interpreted as a filename pattern to be "globbed".[19] The pattern is matched against the files in the current directory (or the directory specified as part of the fileglob pattern), and the filenames so matched are returned by the operator. As with line input, names are returned one at a time in scalar context, or all at once in list context. The latter usage is more common; you often see things like:

    @files = <*.xml>;

As with other kinds of pseudoliterals, one level of variable interpolation is done first, but you can't say <$foo> because that's an indirect filehandle as explained earlier. In older versions of Perl, programmers would insert braces to force interpretation as a fileglob: <${foo}>. These days, it's considered cleaner to call the internal function directly as glob($foo), which is probably the right way to have invented it in the first place. So instead you'd write

    @files = glob("*.xml");

if you despise overloading the angle operator for this. Which you're allowed to do.
[19]Fileglobs have nothing to do with the previously mentioned typeglobs, other than that they both use the * character in a wildcard fashion. The * character has the nickname "glob" when used like this. With typeglobs, you're globbing symbols with the same name from the symbol table. With a fileglob, you're doing wildcard matching on the filenames in a directory, just as the various shells do.
Whether you use the glob function or the old angle-bracket form, the fileglob operator also does while magic like the line input operator, assigning the result to $_. (That was the rationale for overloading the angle operator in the first place.) For example, if you wanted to change the permissions on all your C code files, you might say:

    while (glob "*.c") {
        chmod 0644, $_;
    }

which is equivalent to:

    while (<*.c>) {
        chmod 0644, $_;
    }

The glob function was originally implemented as a shell command in older versions of Perl (and in even older versions of Unix), which meant it was comparatively expensive to execute and, worse still, wouldn't work exactly the same everywhere. Nowadays it's a built-in, so it's more reliable and a lot faster. See the description of the File::Glob module in Chapter 32, "Standard Modules" for how to alter the default behavior of this operator, such as whether to treat spaces in its operand (argument) as pathname separators, whether to expand tildes or braces, whether to be case insensitive, and whether to sort the return values--amongst other things.
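The chmod loop above can be tried without touching real source files. This sketch assumes a scratch directory made with the standard File::Temp module and a few hypothetical filenames; it changes into that directory because glob patterns are matched relative to the current directory.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Assumption: a scratch directory with invented filenames.
my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir($dir) or die "Can't chdir to $dir: $!";
for my $name (qw(a.c b.c notes.txt)) {
    open(my $out, ">", $name) or die "Can't create $name: $!";
    close($out) or die "Can't close $name: $!";
}

my @changed;
while (glob "*.c") {              # each match is assigned to $_
    chmod(0644, $_) or warn "Can't chmod $_: $!";
    push @changed, $_;
}

chdir($start) or die "Can't chdir back to $start: $!";
print "changed: @changed\n";
```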
Of course, the shortest and arguably the most readable way to do the chmod command above is to use the fileglob as a list operator:

    chmod 0644, <*.c>;

A fileglob evaluates its (embedded) operand only when starting a new list. All values must be read before the operator will start over. In a list context, this isn't important because you automatically get them all anyway. In a scalar context, however, the operator returns the next value each time it is called, or a false value if you've just run out. Again, false is returned only once. So if you're expecting a single value from a fileglob, it is much better to say:

    ($file) = <blurch*>;    # list context

than to say:

    $file = <blurch*>;      # scalar context

because the former returns all matched filenames and resets the operator, whereas the latter alternates between returning filenames and returning false.
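The alternation is easy to observe when the same scalar-context fileglob is evaluated repeatedly. The sketch below assumes a scratch directory containing a single hypothetical file named blurch1.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Assumption: a scratch directory holding exactly one matching file.
my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir($dir) or die "Can't chdir to $dir: $!";
open(my $out, ">", "blurch1") or die "Can't create blurch1: $!";
close($out) or die "Can't close blurch1: $!";

# Scalar context: the same operator is evaluated four times and
# alternates between a filename and the undefined value.
my @scalar_results;
for my $try (1 .. 4) {
    my $file = <blurch*>;
    push @scalar_results, defined $file ? $file : 'undef';
}

# List context: all matches are returned at once, so the first is reliable.
my ($first) = <blurch*>;

chdir($start) or die "Can't chdir back to $start: $!";
print "scalar: @scalar_results\n";
print "list:   $first\n";
```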
If you're trying to do variable interpolation, it's definitely better to use the glob operator because the older notation can cause confusion with the indirect filehandle notation. This is where it becomes apparent that the borderline between terms and operators is a bit mushy:
    @files = <$dir/*.[ch]>;         # Works, but avoid.
    @files = glob("$dir/*.[ch]");   # Call glob as function.
    @files = glob $some_pattern;    # Call glob as operator.
We left the parentheses off of the last example to illustrate that glob can be used either as a function (a term) or as a unary operator; that is, a prefix operator that takes a single argument. The glob operator is an example of a named unary operator, which is just one kind of operator we'll talk about in the next chapter. Later, we'll talk about pattern-matching operators, which also parse like terms but behave like operators.
Copyright © 2001 O'Reilly & Associates. All rights reserved.