Take A Few Minutes to Learn - PERL


The Few Minutes to Learn does NOT include installation of tools needed to edit or run these scripts. Once these tools are installed, start your timer...

You may copy and paste the code to your own files and test it yourself. Expand on these basic concepts, one step at a time, to create more complex scripts.

Each section contains a fully formed code segment. No additions are necessary, though you may test your own additions within this basic framework. You may combine code segments by starting with the Basic PERL framework and copying in the code found in other code segments, where appropriate.


Basic PERL Framework

Regular Expressions

Subroutines & Packages

Simple Report Formats

Compound Report Formats

Resources

Searchable PERL technical reference manual

List of PERL Resources

THE PERL Home Page

Barry B. Floyd - PERL book marks

To interpret any of the scripts, type the command:

    perl script.pl 
where script.pl is the name of your file containing the PERL code.

In general, PERL has a syntax similar in many ways to the C programming language. If you are familiar with C, learning PERL will be easier. However, there are many significant differences and some additions to the language, as well.

NOTE: the pasted code may have a space inserted at the beginning of each line in your file. This will not cause problems, EXCEPT for the DOT . at the end of each report format block. You must make sure the DOT is in the first column after copying and pasting the code to your file.


Basic PERL Framework

 

#!/usr/local/bin/perl
#
# Read in the data file
# Print out HTML formatted lines
#

$file = 'data.txt' ;                  # Name the file
open(INFO, "<$file" ) ;               # Open the file
@lines = <INFO> ;                     # Read it into an array
close(INFO) ;                         # Close the file

print "<HTML> <HEAD> <TITLE> PERL output </TITLE> </HEAD>\n" ;
print "  <BODY>\n" ;

foreach $line (@lines)                # assign @lines to $line, one at a time
{                                     # braces {} are required, bracket code
   print "\n   <P> $line   </P>" ;    # print formatted lines to screen 
}

print "\n  </BODY>\n</HTML>\n" ;

#
# DONE
#
      

data.txt


This is the first line of text.
And a second line of text.
--------- 
A third line of text.
Now a fourth line of text.
--------- 
The fifth line of text.
Finally, the sixth line of text.



This code segment starts with a comment block, each line preceded by a # sign. The line #!/usr/local/bin/perl is not required, but if set to the correct path, will enable you to run your scripts by simply typing the script's file name (instead of perl filename.pl). This is system dependent.

Each statement must end with a ; except for the program flow control statements. They are followed by the open and close braces { }.

The second sub-block demonstrates how simple it is to open a file and read data into an array. The open statement defines a file handle INFO and specifies the source as the name stored in the variable $file. The < sign indicates that the file is opened for input. A > sign would have indicated the file is opened for output/overwrite. A >> sign would have indicated the file is opened for output/append. The next line assigns the content of the file to an array variable @lines - with the 'end of line' character \n used as the default record separator.
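The three open modes can be tried side by side. The sketch below is not part of the original script: the file name demo_out.txt is hypothetical, and use strict, use warnings, my and the || die checks are additions.

```perl
use strict ;
use warnings ;

my $file = 'demo_out.txt' ;                  # hypothetical file name

open(OUT, ">$file") || die " $file $! \n" ;  # > creates or overwrites
print OUT "first line\n" ;
close(OUT) ;

open(OUT, ">>$file") || die " $file $! \n" ; # >> appends to the end
print OUT "second line\n" ;
close(OUT) ;

open(IN, "<$file") || die " $file $! \n" ;   # < opens for input
my @lines = <IN> ;                           # one element per line
close(IN) ;

print "Read ", scalar(@lines), " lines\n" ;
print @lines ;

unlink($file) ;                              # remove the demo file
```

Changing the second open from >> to > would leave only "second line" in the file.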

There are three types of variables used in PERL, two of which are demonstrated here. Variables preceded by a $ (scalars) may contain one value of any type (e.g. int, char, string, etc.). Variables preceded by a @ (arrays) may contain one or more values of any type (e.g. int, char, string, etc.) stored as a one-dimensional array. Variables preceded by a % (associative arrays, also called hashes) may contain one or more key/value pairs of any type - where the 1st value of each pair is the key or index to the second value (e.g. use the associative array ('red', 0x00f, 'blue', 0x0f0, 'green', 0xf00...) to look up 'red' and get 0x00f).
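A minimal sketch of the three variable types, reusing the color list from the paragraph above. The variable names are made up for illustration, and use strict, use warnings and my are additions not found in the original scripts.

```perl
use strict ;
use warnings ;

my $count  = 3 ;                                             # $ one value
my @colors = ('red', 'blue', 'green') ;                      # @ ordered list
my %map    = ('red', 0x00f, 'blue', 0x0f0, 'green', 0xf00) ; # % key/value pairs

print "count is $count\n" ;
print "second color is $colors[1]\n" ;       # array indexes start at 0
printf "red maps to 0x%03x\n", $map{'red'} ; # look up a value by its key

foreach my $key (sort keys %map)             # visit every key in the hash
{
  printf "%-5s => 0x%03x\n", $key, $map{$key} ;
}
```

Note that a single element is always written with a $, even when it comes from an array ($colors[1]) or an associative array ($map{'red'}).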

The sub-block of code after the print statements demonstrates one of the several program flow control statements (e.g. for and if). The foreach statement successively takes each value in the array @lines and assigns it to the variable $line. The code associated with this, and other, control statements is contained within the open and close braces { }. The print statement displays the text found between the double-quotes ". PERL scans the text for special characters and interprets them and the related variable names. Thus, the variable $line is expanded to display its contents, not the literal text string $line. NOTE: If you change the value of $line you also change the corresponding value in the array @lines.
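The NOTE about $line changing @lines can be checked with a short sketch. The sample data is made up, and use strict, use warnings and my are additions.

```perl
use strict ;
use warnings ;

my @lines = ("one\n", "two\n", "three\n") ;  # as if read from a file

foreach my $line (@lines)        # $line is an alias for each element in turn
{
  chomp($line) ;                 # remove the trailing end-of-line character
  $line = uc($line) ;            # this also changes the element in @lines
}

print join(", ", @lines), "\n" ;
```

After the loop the array itself holds the upper-case text, even though only $line was assigned to.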








Regular Expressions


#!/usr/local/bin/perl
#
# Read in the data file
# Print out HTML formatted lines
#

$file = 'data.txt' ;                  # Name the file
open(INFO, "<$file") ;                # Open the file
@lines = <INFO> ;                     # Read it into an array
close(INFO) ;                         # Close the file

$long_line = join("\n", @lines ) ;    # Convert array to variable
                                      # retaining end-of-line characters

@dash_sep = split ( /---------/, $long_line ) ; # split long_line at dashes

print "<HTML> <HEAD> <TITLE> PERL output </TITLE> </HEAD>\n" ;
print "  <BODY>\n" ;

foreach $line (@dash_sep)             # assign @dash_sep to $line, one at a time
{ 
  $line =~ s/line of text\./text fragment. /g ; # substitute text (\. is a literal dot)
  $line =~ s/\n//g ;                            # remove end of line characters

 if ($line =~ /^\s*A third text/ )              # test for 3rd text fragment (\s* skips leading spaces)
 {
    print "    <HR> <BR> \n" ;
 }

  print "    <P> $line </P> \n" ;
}

print "  </BODY>\n</HTML>\n" ;

#
# DONE
#



This code segment demonstrates how easy it is to convert from one type of variable to another. The $long_line one-value variable is assigned the contents of the many-valued variable @lines. Where @lines was delimited by the end-of-line character \n, the @dash_sep variable contains a --------- delimited list. The split command introduces the much used concept of regular expressions.
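The conversion between an array and a single value can be sketched as a round trip with join and split. The sample data is made up; chomp is used here (an addition, not in the original script) so that each element contributes exactly one end-of-line character.

```perl
use strict ;
use warnings ;

my @lines = ("first\n", "second\n", "third\n") ; # as if read from a file
chomp(@lines) ;                                  # drop the \n from each element

my $long_line = join("\n", @lines) ;             # array  -> one value
my @again     = split(/\n/, $long_line) ;        # one value -> array

print "long_line holds ", length($long_line), " characters\n" ;
print "again holds ", scalar(@again), " elements\n" ;
```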

Regular expressions can make your average PERL script cryptic beyond compare to any other language (except maybe APL). A whole book could be written about regular expressions - but not here. A basic regular expression contains two or three forward-slashes /. These may be preceded by a simple (often one-letter) command, and may be followed by one or more modifiers (again, often one-letter). Between the first and second forward-slash / you will find the before stuff (the pattern to search for). Between the second and third forward-slash /, when present, you will find the after stuff (the replacement). Within the parens of the split command, a regexp with one pair of / forward slashes is used. This pair is implicitly preceded by the match command (i.e. m/---------/). One may also use the translate command tr, which takes two lists of characters and changes every occurrence of each character in the first list to the corresponding character in the second list.
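A short sketch of the match pattern inside split and of the translate command. The sample strings are made up, and use strict, use warnings and my are additions.

```perl
use strict ;
use warnings ;

my $text  = "alpha---------beta---------gamma" ;
my @parts = split(/---------/, $text) ;  # same as split(m/---------/, $text)

my $word = "a-b-c" ;
$word =~ tr/-/_/ ;                       # translate every - to _

my $shout = "hello" ;
$shout =~ tr/a-z/A-Z/ ;                  # translate a range of characters

print join(" | ", @parts), "\n" ;
print "$word $shout\n" ;
```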

Wherever a regular expression is applied to a variable the =~ sign is used (e.g. $line is searched and changed in place by the substitution that replaces "line of text." with "text fragment. "). The s/ is the substitution command for the regular expression. The /g is the global modifier (i.e. at all occurrences of ...).

The if statement demonstrates comparison using regular expressions. Only two forward-slashes are used. The ^ character following the first forward-slash is one of many special characters used in regular expressions. This one means at the beginning of the line. By this point the substitution has already changed "line of text." to "text fragment.", so the test looks for "A third text". Thus, if the text "A third text" occurs at the beginning of the line $line then print "<HR> <BR> \n";
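The substitution and the ^ test can be tried on a single line of sample text. This is a sketch with made-up variable names; use strict, use warnings and my are additions.

```perl
use strict ;
use warnings ;

my $line = "A third line of text. Now a fourth line of text." ;

# s///g returns the number of substitutions it made
my $changed = ( $line =~ s/line of text\./text fragment./g ) ;

my $at_start = ( $line =~ /^A third text/ ) ? "yes" : "no" ; # at the beginning
my $anywhere = ( $line =~ /Now a fourth/ )  ? "yes" : "no" ; # anywhere in $line
my $anchored = ( $line =~ /^Now a fourth/ ) ? "yes" : "no" ; # ^ makes this fail

print "$changed substitutions: $line\n" ;
print "at start: $at_start, anywhere: $anywhere, anchored: $anchored\n" ;
```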








Subroutines & Packages


#!/usr/local/bin/perl
#
# Declare and Define subroutine MY_SUB
# call MY_SUB three times

#
# subroutine MY_SUB
#
sub MY_SUB
{
  local ( $param_1, $param_2 ) = @_ ;
  print "\n Called MY_SUB \n \t $param_1 \n \t $param_2 \n" ;
}

#
# MAIN part of the program 
#
for ( $i=0; $i<3; $i++ )
{
  &MY_SUB ( "text", 123 ) ;
}

#
# DONE
#

Output from running the script
 
 Called MY_SUB 
         text 
         123 

 Called MY_SUB 
         text 
         123 

 Called MY_SUB 
         text 
         123 


This code segment demonstrates the declaration, definition and calling of the PERL subroutine MY_SUB. The reserved word sub followed by a user-provided name (i.e. MY_SUB) marks the beginning of a subroutine definition. The subroutine code is enclosed in a pair of { braces }.

Within the main part of the program, MY_SUB is called with a list of parameters enclosed in ( parentheses ). These are stored in the special variable @_, an array containing the values in the order they are listed. Like any array, you may also address the elements in @_ using [ bracketed ] indexes into the array. Indexes start at zero, so $_[0] is the first parameter and $_[1] is the second.
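A small sketch of reading parameters directly from @_. The subroutine name DESCRIBE is hypothetical, and use strict, use warnings and my are additions.

```perl
use strict ;
use warnings ;

sub DESCRIBE
{
  my $count = scalar(@_) ;       # number of parameters passed in
  return "$count parameters, first: $_[0], second: $_[1]" ;
}

my $result = DESCRIBE("text", 123) ;
print "$result\n" ;
```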

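The reserved word local is introduced in subroutine MY_SUB. local gives a variable a temporary value that lasts until the subroutine returns, at which point any previous value is restored. Thus, the main part of the program (or another subroutine) could define $param_1 without affecting the values stored in MY_SUB's local instance of $param_1, and MY_SUB does not disturb theirs. Note that any subroutine called from within MY_SUB does see the local values. The related reserved word my (PERL 5) creates a variable that is visible only inside the enclosing block, and is usually the better choice for subroutine parameters.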
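The behavior of local can be compared with the reserved word my in a short sketch. The subroutine and variable names here are hypothetical, and my, our, use strict and use warnings are additions not used in the original script.

```perl
use strict ;
use warnings ;

our $name = "global" ;                 # a package (global) variable

sub SHOW       { return $name ; }      # reads whatever $name currently holds
sub WITH_LOCAL { local $name = "local" ; return SHOW() ; }
sub WITH_MY    { my    $name = "my"    ; return SHOW() ; }

my $seen_local = WITH_LOCAL() ;        # SHOW sees the temporary value
my $seen_my    = WITH_MY() ;           # SHOW still sees the global value

print "via local: $seen_local\n" ;
print "via my:    $seen_my\n" ;
print "afterward: $name\n" ;           # the global value is restored
```

The value set with local is visible to SHOW because SHOW is called while WITH_LOCAL is still running; the value set with my never leaves WITH_MY.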


From the PERL manual pages.

To declare subroutines:
sub NAME; # A "forward" declaration.
sub NAME(PROTO); # ditto, but with prototypes
sub NAME BLOCK # A declaration and a definition.
sub NAME(PROTO) BLOCK # ditto, but with prototypes

To define an anonymous subroutine at runtime:
$subref = sub BLOCK;

To import subroutines:
use PACKAGE qw(NAME1 NAME2 NAME3);

To call subroutines:
NAME(LIST); # & is optional with parens.
NAME LIST; # Parens optional if predeclared/imported.
&NAME; # Passes current @_ to subroutine.

Packages perform a similar task to the reserved word local. A package is simply a set of PERL statements assigned to a user-defined name space. In simple PERL programs, the default and often the only name space is internally called main.
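A minimal sketch of two name spaces holding a variable of the same name. The package name Counter and its subroutine add are hypothetical, and our, use strict and use warnings are additions.

```perl
use strict ;
use warnings ;

{
  package Counter ;              # statements in this block belong to Counter
  our $total = 0 ;
  sub add { $total += $_[0] ; return $total ; }
}

our $total = 100 ;               # this $total belongs to the main name space

Counter::add(5) ;                # call a subroutine by its full name
Counter::add(7) ;

print "Counter total: $Counter::total\n" ;
print "main total:    $total\n" ;
```

The two $total variables never interfere with each other, because each name is looked up in its own name space.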








Simple Report Formats


#!/usr/local/bin/perl
#
# Open a file create/write
# Print out a simple formatted report
#

$col_1="column one text 789_12345" ;
$col_2="column two text 789_12345" ;
$col_3=123456789.12345 ;

$note_1 = "Note 1: The code segment demonstrates the method of generating a formatted report using several variables containing text and numbers. " ;

$note_2 = "Note 2: The code segment demonstrates the method of generating a formatted report using several variables containing text and numbers. " ;

$note_3 = "Note 3: The code segment demonstrates \n the method of generating a \n formatted report using several \n variables containing text \n and numbers. " ;

open ( REPORT_FILE, ">report.txt" ) || die " report.txt $! \n " ;

# a dot marks the end of the format
format REPORT_FILE =
123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456
+--------------------------------------------------------------------------+
|                               Top of Report                              |
+--------------------------------------------------------------------------+
|Column One              |Column Two              |Column Three            |
+------------------------+------------------------+------------------------+
|                        |                        |                        |
| @<<<<<<<<<<<<<<<<<<<<< | @<<<<<<<<<<<<<<<<<<<<< | @##################.## |
$col_1, $col_2, $col_3
| @<<<<<<<<<<<<<<<<<<<<< | @<<<<<<<<<<<<<<<<<<<<< | @##################.## |
$col_1, $col_2, $col_3
| @<<<<<<<<<<<<<<<<<<<<< | @<<<<<<<<<<<<<<<<<<<<< | @##################.## |
$col_1, $col_2, $col_3
|                        |                        |                        |
+------------------------+------------------------+------------------------+
           ^|||||||||||||||||||||||||||||||||||||||
           $note_1
 ~         ^|||||||||||||||||||||||||||||||||||||||
           $note_1
 ~         ^|||||||||||||||||||||||||||||||||||||||
           $note_1
 ~         ^|||||||||||||||||||||||||||||||||||||||
           $note_1
 ~         ^|||||||||||||||||||||||||||||||||||||||
           $note_1

 ~~        ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
           $note_2

           @*
           $note_3

+--------------------------------------------------------------------------+
.
# make sure the dot is in column one

write REPORT_FILE ;

print "\n\nALL DONE\n\n" ;
#
# DONE
#



The code segment demonstrates a method of generating a formatted report using several variables containing text and numbers.

An output file is opened for writing and assigned the file handle REPORT_FILE. The "|| die..." statement reads "or die and print ...". In this case, if the output file report.txt can not be opened for writing then the name of the file is printed to the STDERR (screen) device followed by the value of the special variable $! (error number or error string).
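To see what the || die message looks like without stopping the script, the failure can be trapped with eval. This is a sketch only: eval is not used in the original script, and the path below is a hypothetical one assumed not to exist.

```perl
use strict ;
use warnings ;

my $file = 'no_such_directory/report.txt' ;  # hypothetical, assumed missing

# eval traps the die and leaves its message in the special variable $@
my $ok = eval { open(REPORT_FILE, ">$file") || die " $file $! \n" ; 1 ; } ;
my $message = $@ ;

if ( ! $ok )
{
  print "open failed:$message" ;
}
```

Because the message ends with \n, PERL prints it as is; a message that does not end with \n has the script name and line number appended.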

The formatting of a simple report begins with the statement format FILEHANDLE= and ends with a dot (period) on its own line then followed by the statement write FILEHANDLE. Everything in between is either text to be printed or format lines followed by variable lists.

The format lines contain fields starting with one of three symbols:

    @   (regular field), 
    @*  (multi-line field), 
    ^   (filling fields). 
    
These start of field symbols are followed by one of five formatting symbols, each representing one character of text to be displayed:
    <   (left justified), 
    >   (right justified), 
    |   (centered), 
    #   (numbers),
    .   (decimal placement).
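The field symbols listed above can be tried without a file handle by calling the built-in formline function, which appends its result to the special variable $^A. This is a sketch, not the method used in the report script; the [ ] brackets are literal text added only to make the field edges visible.

```perl
use strict ;
use warnings ;

# one field of each kind, each six characters wide
my $picture = '[@<<<<<] [@>>>>>] [@|||||] [@##.##]' . "\n" ;

$^A = "" ;                                    # clear the accumulator
formline($picture, "ab", "cd", "ef", 3.14159) ;
my $out = $^A ;                               # copy the formatted line
$^A = "" ;

print $out ;
```

The first field is left justified, the second right justified, the third centered, and the fourth shows the number with two decimal places.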
    

$note_1 is spread out over the five ^ formatted lines provided, with breaks at word boundaries. The leading ~ symbol causes blank lines to be omitted, if there is more formatted space than characters to display. If there are more characters to display than allotted formatted space then the excess characters are omitted from the display. In either case, the displayed text will be centered.

$note_2 is spread out over as many formatted lines as needed, with breaks at word boundaries. The leading ~~ symbols cause just enough formatted space to be created for the amount of text to be displayed. The format line is repeated as needed. This text will be left justified, since the field uses the < symbol.

$note_3 is spread out over lines, separated at the embedded \n new line special character. However, the hanging indent found in displaying $note_1 and $note_2 is not found in the display of $note_3. Only the first line is indented.
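The repeating ~~ line can be tried on its own. In this sketch the output goes to a file handle opened on a variable (an in-memory file), which is an addition used only so the result can be captured and printed; the handle name MEM and the sample text are made up.

```perl
use strict ;
use warnings ;

my $original = "This note is long enough that it has to wrap across several lines." ;
my $note     = $original ;       # a ^ field consumes the text it prints
my $buffer   = "" ;

open(MEM, '>', \$buffer) || die " in-memory file $! \n" ;

format MEM =
~~ ^<<<<<<<<<<<<<<<<<<<
$note
.

write MEM ;                      # the ~~ line repeats until $note is used up
close(MEM) ;

print $buffer ;
```

Each output line holds at most 20 characters of text, and no blank line is produced once the text runs out.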









Compound Report Formats


#!/usr/local/bin/perl
#
# Open a file create/write
# Print out a compound formatted report
#

$| = 1 ;  # if non-zero, flushes the output buffer after each print or write

$col_1  = "column one text 789_12345" ;
$col_2  = "column two text 789_12345" ;
$col_3  = 123456789.12345 ;
$note_1 = "Note 1: The code segment demonstrates the method of generating a formatted compound report using several variables containing text and numbers. " ;
$name   = "Name of Report" ;

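# extract each element of the time array and place in a formatted string
@t = localtime($^T) ;            # creates a 9 element array of time info.
@d = ("Sun.", "Mon.", "Tues.", "Wed.", "Thur.", "Fri.", "Sat.") ;
$t[4] += 1 ;                     # month is stored as 0-11, make it 1-12
$t[5] %= 100 ;                   # year is stored as years since 1900, keep 2 digits
$m = "AM" ;
if ( $t[2] >= 12 ) { $t[2] -= 12 ; $m = "PM" ; }
if ( $t[2] == 0 )  { $t[2]  = 12 ; }   # show 12, not 0, at midnight and noon
$the_date = sprintf ( "%s %d/%d/%02d %d:%02d:%02d %s",
                      $d[$t[6]], $t[4], $t[3], $t[5], $t[2], $t[1], $t[0], $m ) ;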

open ( REPORT, ">report.txt" ) || die " report.txt $! \n " ;

#------------------- 
# a dot in the 1st column marks the end of the format
#------------------- 
format TOP =
+--------------------------------------------------------------------------+
| @<<<<<<<<<<<<<<<<<<<<<<<<<   @|||||||||||||||||||||||||||   Page: @>>>>> |
$the_date, $name, $%
+--------------------------------------------------------------------------+
.

#-------------------
# a dot in the 1st column marks the end of the format
#------------------- 
format HEAD =
+--------------------------------------------------------------------------+
|Column One              |Column Two              |Column Three            |
+------------------------+------------------------+------------------------+
.

#-------------------
# a dot in the 1st column marks the end of the format
#------------------- 
format SUMMARY =
+------------------------+------------------------+------------------------+
|                             The sub-total equals: @##################.## |
$sub_total
.

#-------------------
# a dot in the 1st column marks the end of the format
#------------------- 
format DATA =
| @<<<<<<<<<<<<<<<<<<<<< | @<<<<<<<<<<<<<<<<<<<<< | @##################.## |
$col_1, $col_2, $col_3
.

#-------------------
# a dot in the 1st column marks the end of the format
#------------------- 
format NOTES =
|                           The grand total equals: @##################.## |
$grand_total
+--------------------------------------------------------------------------+
|  ~~        ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<                      |
$note_1
+--------------------------------------------------------------------------+
.


$old_handle = select(REPORT); #in this case $old_handle = STDOUT

$^ = TOP;             # sets "TOP" of page format

for ($i=0; $i<5; $i++)
{
  $~ = HEAD;          # sets current format
  write;              # write formatted header

  $sub_total = 0;

  $~ = DATA;          # sets current format

  for ($j=0; $j<10; $j++)
  {
    write;            # write formatted data
    $sub_total += $col_3 ;
  }

  $grand_total += $sub_total ;

  $~ = SUMMARY ;      # sets current format
  write;              # write formatted summary
}

$~ = NOTES ;          # sets current format
write ;               # write formatted notes

select($old_handle) ; # in this case, STDOUT
print "\n ALL DONE \n\n" ;

#
# DONE
#



This code segment demonstrates a method of generating a formatted compound report using several variables and several formats. A number of special variables are also employed in this report as well as the localtime function, one of many standard functions.

Note that the file handle REPORT and the format names (e.g. TOP, HEAD, ...) are not the same, unlike in the Simple Report Format. With multiple formats associated with one report (i.e. file handle) you must specify the current format and select the default file handle (see the last two paragraphs of this section for details).

The $| special variable is either zero or non-zero. If $| is set to a non-zero value then the output buffer of the currently selected file handle is flushed after each print or write statement. The $% special variable, found in the TOP format, stores the current printed page number; the page length defaults to 60 lines. The $^T special variable stores the time at which the script started running. The $^ special variable stores the "top of page" format name. The $~ special variable stores the name of the current format, separate from the "top of page" format name.

The lines of code following the # extract each element comment establish the local date/time at which the report was generated and place it in a formatted string. The [ ] square brackets enclose the numbered array position of an array element. The -= arithmetic operator subtracts the right value from the variable and assigns the result to the variable.
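The nine elements returned by localtime can be named and inspected with a short sketch. The variable names are made up, and use strict, use warnings and my are additions. Note that the month element counts from 0 and the year element counts from 1900.

```perl
use strict ;
use warnings ;

my @t = localtime($^T) ;         # $^T is the time the script started
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = @t ;

# adjust the month (0-11) and the year (years since 1900) before display
my $stamp = sprintf("%d/%d/%d %02d:%02d:%02d",
                    $mon + 1, $mday, $year + 1900, $hour, $min, $sec) ;

print "elements: ", scalar(@t), "\n" ;
print "started:  $stamp\n" ;
```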

The $old_handle = select(REPORT); statement sets $old_handle to the name of the currently selected file handle before establishing the new file handle (i.e. REPORT). This lets the programmer save and later reset the file handle. The select function's returned value (the old file handle) can be ignored, thus using select only to select a new file handle (e.g. select($old_handle); as found towards the end of the code segment).
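A minimal sketch of saving and restoring the selected file handle. The handle MEM is opened on a variable (an in-memory file), which is an addition used only so that the redirected text can be shown afterward.

```perl
use strict ;
use warnings ;

my $buffer = "" ;
open(MEM, '>', \$buffer) || die " in-memory file $! \n" ;

my $old_handle = select(MEM) ;   # returns the previously selected handle
print "sent to MEM\n" ;          # print with no handle uses the selected one
select($old_handle) ;            # restore the previous handle
close(MEM) ;

print "old handle was $old_handle\n" ;
print "captured: $buffer" ;
```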

The nested for loops provide a simple framework to create a compound report, with sub-totals and grand totals. The sets of $~ special variables and write statements perform the function of switching between the previously defined formats and printing the formatted text (saved to report.txt). The currently stored variable's value (e.g. $sub_total), as referenced within a format, is printed for each occurrence of the write statement.
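The switching between formats can be reduced to a small sketch with two formats and one file handle. The format names ROW and TOTAL, the sample prices, and the in-memory file handle MEM are all hypothetical; use strict, use warnings and my are additions.

```perl
use strict ;
use warnings ;

my %prices = ('apple', 1.5, 'pear', 2.25) ;   # hypothetical sample data
my ($item, $price, $total) = ("", 0, 0) ;
my $buffer = "" ;

format ROW =
@<<<<<<<< @##.##
$item, $price
.

format TOTAL =
Total:    @##.##
$total
.

open(MEM, '>', \$buffer) || die " in-memory file $! \n" ;
my $old_handle = select(MEM) ;           # write now goes to MEM

$~ = "ROW" ;                             # sets current format
foreach my $key (sort keys %prices)
{
  ($item, $price) = ($key, $prices{$key}) ;
  write ;                                # write one formatted row
  $total += $price ;
}

$~ = "TOTAL" ;                           # switch formats, same file handle
write ;                                  # write formatted total

select($old_handle) ;                    # back to the previous handle
close(MEM) ;
print $buffer ;
```

Each write uses whatever values the variables hold at that moment, which is why $item and $price are assigned just before the write inside the loop.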








