
<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>The Meta-Data Methods</title><meta name="generator" content="DocBook XSL Stylesheets V1.75.2"><link rel="home" href="index.html" title="Lire Developer's Manual"><link rel="up" href="ch02.html" title="Chapter 2. Writing a New DLF Converter"><link rel="prev" href="ch02s04.html" title="Adding a Constructor"><link rel="next" href="ch02s06.html" title="Registering Your DLF Converter with the Lire Framework"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">The Meta-Data Methods</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="ch02s04.html">Prev</a> </td><th width="60%" align="center">Chapter 2. Writing a New DLF Converter</th><td width="20%" align="right"> <a accesskey="n" href="ch02s06.html">Next</a></td></tr></table><hr></div><div class="section" title="The Meta-Data Methods"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id402617"></a>The Meta-Data Methods</h2></div></div></div><p>The <span class="interface">Lire::DlfConverter</span> interface
          requires two kinds of methods. First, it requires methods
          which provide information to the framework on your
          converter. Second, it requires methods which will actually
          implement the conversion process. This section documents
          the former first, then the latter.
        </p><div class="section" title="The DLF Converter Name"><div class="titlepage"><div><div><h3 class="title"><a name="id402633"></a>The DLF Converter Name</h3></div></div></div><p>The method <code class="methodname">name()</code> should
            return the name of our DLF converter. It is this name
            that is passed to the <span class="command"><strong>lr_log2report</strong></span>
            command. This name must be unique among all registered
            converters and should be restricted to alphanumeric
            characters (hyphens, periods and underscores can also be
            used).
          </p><p>We will name our converter
            <code class="literal">common_syslog</code>:
            </p><pre class="programlisting">


sub name {
    return "common_syslog";
}            

            </pre><p>
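          </p><p>The naming restriction above can be checked mechanically.
            The following is only a minimal sketch: the helper
            <code class="function">is_valid_converter_name</code> is
            hypothetical and is not part of the Lire API, and it does not
            check uniqueness among registered converters.
            </p><pre class="programlisting">

use strict;
use warnings;

# Hypothetical helper, not part of Lire: returns 1 when $name contains
# only alphanumerics, hyphens, periods and underscores, 0 otherwise.
sub is_valid_converter_name {
    my ( $name ) = @_;

    return 0 unless defined $name;
    return $name =~ /\A[A-Za-z0-9._-]+\z/ ? 1 : 0;
}

print is_valid_converter_name( "common_syslog" ) ? "ok\n" : "invalid\n";
print is_valid_converter_name( "common syslog" ) ? "ok\n" : "invalid\n";

            </pre><p>The first call accepts the name used in this chapter;
            the second rejects a name containing a space.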
          </p></div><div class="section" title="Providing Information To Users"><div class="titlepage"><div><div><h3 class="title"><a name="id402670"></a>Providing Information To Users</h3></div></div></div><p>The next two required methods are used to give more
            detailed information on your converter to the users. The
            converter's <code class="methodname">title()</code> and
            <code class="methodname">description()</code> can be used to
            display information about your converter from the user
            interface or to generate documentation.
          </p><p>The <code class="methodname">title()</code> should simply
            return a string:
            </p><pre class="programlisting">

sub title {
    return "Common Log Format embedded in Syslog DLF Converter";
}

            </pre><p>
          </p><p>The <code class="methodname">description()</code> method
            should return a <span class="application">DocBook</span>
            fragment describing your converter and the log formats it
            supports. If you don't know
            <span class="application">DocBook</span>, just restrict yourself
            to using <code class="sgmltag-element">para</code> elements to make
            paragraphs:
            </p><pre class="programlisting">

sub description {
   return &lt;&lt;EOD;
&lt;para&gt;This DLF Converter extracts a web server's request and error
information from a syslog file.
&lt;/para&gt;
&lt;para&gt;The requests and errors should be logged under the
&lt;literal&gt;httpd&lt;/literal&gt; program name. The errors are mapped to the
&lt;type&gt;syslog&lt;/type&gt; schema, the requests are mapped to the
&lt;type&gt;www&lt;/type&gt; schema.
&lt;/para&gt;
&lt;para&gt;Syslog records from programs other than
&lt;literal&gt;httpd&lt;/literal&gt; are ignored.
&lt;/para&gt;
EOD
}

            </pre><p>
          </p></div><div class="section" title="Providing Information to the Framework"><div class="titlepage"><div><div><h3 class="title"><a name="id402727"></a>Providing Information to the Framework</h3></div></div></div><p>Two other meta-data methods are used by the framework
            itself. The first one specifies which DLF schemas your
            DLF converter converts to:

            </p><pre class="programlisting">

sub schemas {
    return ( "www", "syslog" );
}

            </pre><p>

            In our case, we are converting to the <span class="type">syslog</span>
            and <span class="type">www</span> schemas. As stated in our
            converter's description, we will map the web server's
            error messages to the <span class="type">syslog</span> schema and the
            request logs to the <span class="type">www</span> schema. Other
            alternatives would have been to map only the request
            information to the <span class="type">www</span> schema or to map all the
            non-request records to the <span class="type">syslog</span> schema.
            The rationale behind the current choice (besides this
            being an example) is that it makes it convenient to process
            one log file to obtain a report containing the requests
            and errors from our web server. For that use case, it is
            best to ignore the records that are not related to the web server.
          </p><p>The other method affects how the conversion process
            will be handled. <span class="application">Lire</span> offers two modes of conversion, the
            line-oriented one and the file-oriented one. (Both are
            described in the next section.) If your log file is
            line-oriented (each line is one log record), like most log
            files are, you should use the line-oriented conversion
            mode:
            </p><pre class="programlisting">

sub handle_log_lines {
    return 1;
}

            </pre><p>
          </p></div><div class="section" title="The Conversion Methods"><div class="titlepage"><div><div><h3 class="title"><a name="id402797"></a>The Conversion Methods</h3></div></div></div><p>The actual conversion process is handled through three
            methods: <code class="methodname">init_dlf_converter()</code>,
            <code class="methodname">finish_conversion()</code> and either
            <code class="methodname">process_log_file()</code> or
            <code class="methodname">process_log_line()</code>, depending on
            the conversion mode (as determined by
            <code class="methodname">handle_log_lines()</code>'s return value).
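          </p><p>The order in which these methods are called can be
            sketched with a small driver. This is an illustration of the
            contract described here, not the way
            <code class="classname">Lire::DlfConverterProcess</code> is
            actually implemented: the
            <code class="function">run_conversion</code> function and the
            <code class="classname">CallLogger</code> package are
            hypothetical, and <code class="literal">undef</code> stands in
            for the process object.
            </p><pre class="programlisting">

use strict;
use warnings;

# Hypothetical driver showing the call order only.
sub run_conversion {
    my ( $converter, $process, $fh ) = @_;

    $converter-&gt;init_dlf_converter( $process );
    if ( $converter-&gt;handle_log_lines() ) {
        # Line-oriented mode: one call per log line.
        while ( defined( my $line = &lt;$fh&gt; ) ) {
            $line =~ s/\r?\n\z//;
            $converter-&gt;process_log_line( $process, $line );
        }
    } else {
        # File-oriented mode: the converter reads the file handle itself.
        $converter-&gt;process_log_file( $process, $fh );
    }
    $converter-&gt;finish_conversion( $process );
    return;
}

# Tiny stand-in converter that records which methods were called.
package CallLogger;

sub new {
    my ( $class ) = @_;
    my $self = { calls =&gt; [] };
    return bless $self, $class;
}
sub handle_log_lines   { return 1 }
sub init_dlf_converter { my ( $self ) = @_; push @{ $self-&gt;{calls} }, "init"; return }
sub process_log_line   { my ( $self ) = @_; push @{ $self-&gt;{calls} }, "line"; return }
sub finish_conversion  { my ( $self ) = @_; push @{ $self-&gt;{calls} }, "finish"; return }

package main;

my $data = "first\nsecond\n";
open( my $fh, "&lt;", \$data ) or die "cannot open in-memory file: $!";
my $converter = CallLogger-&gt;new();
run_conversion( $converter, undef, $fh );
print join( " ", @{ $converter-&gt;{calls} } ), "\n";

            </pre><p>With the two-line input above, the stand-in converter
            records one initialization call, two line calls and one
            finalization call, in that order.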
          </p><div class="section" title="Conversion Initialization"><div class="titlepage"><div><div><h4 class="title"><a name="id402823"></a>Conversion Initialization</h4></div></div></div><p>The method
              <code class="methodname">init_dlf_converter()</code> will be
              called once before the log file is processed. It should
              be used to initialize the state of your converter. Since
              our DLF converter needs neither initialization nor
              configuration, the method is simply empty:

              </p><pre class="programlisting">

sub init_dlf_converter {
    my ( $self, $process ) = @_;

    return;
}

              </pre><p>
            </p><p>The <code class="varname">$process</code> parameter which is
              passed to all the processing methods is an instance of
              <code class="classname">Lire::DlfConverterProcess</code>. This
              is the object which is driving the conversion process
              and it defines several methods which you will use in the
              actual conversion process.
            </p></div><div class="section" title="Conversion Finalization"><div class="titlepage"><div><div><h4 class="title"><a name="id402856"></a>Conversion Finalization</h4></div></div></div><p>The method
              <code class="methodname">finish_conversion()</code> will be
              called once after the log file has been completely
              processed. This method is mostly of use to stateful
              converters, that is, DLF converters which generate DLF
              records from more than one line. Since this is not our
              case, we simply leave the method empty:

              </p><pre class="programlisting">

sub finish_conversion {
    my ( $self, $process ) = @_;

    return;
}

              </pre><p>
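            </p><p>For contrast, here is a minimal sketch of how a
              stateful converter might use these two methods. It is not
              part of our example converter: the
              <code class="classname">StatefulSketch</code> and
              <code class="classname">FakeProcess</code> packages and the
              <code class="literal">pending</code> key are hypothetical, and
              the rule that one record spans two lines is made up for the
              illustration. Only the
              <code class="methodname">write_dlf()</code> call mirrors a
              method described later in this section.
              </p><pre class="programlisting">

use strict;
use warnings;

# Stand-in for Lire::DlfConverterProcess so that the sketch runs alone.
package FakeProcess;

sub new {
    my ( $class ) = @_;
    my $self = { dlf =&gt; [] };
    return bless $self, $class;
}

sub write_dlf {
    my ( $self, $schema, $dlf ) = @_;
    push @{ $self-&gt;{dlf} }, [ $schema, $dlf ];
    return;
}

# Hypothetical stateful converter: every record spans two lines.
package StatefulSketch;

sub new {
    my ( $class ) = @_;
    return bless {}, $class;
}

sub init_dlf_converter {
    my ( $self, $process ) = @_;

    $self-&gt;{pending} = undef;    # reset the state before each log file
    return;
}

sub process_log_line {
    my ( $self, $process, $line ) = @_;

    if ( defined $self-&gt;{pending} ) {
        # Second half of a record: combine it with the first and write it.
        my $message = $self-&gt;{pending} . " " . $line;
        $process-&gt;write_dlf( "syslog", { message =&gt; $message } );
        $self-&gt;{pending} = undef;
    } else {
        $self-&gt;{pending} = $line;
    }
    return;
}

sub finish_conversion {
    my ( $self, $process ) = @_;

    # Flush a record whose second line never arrived.
    if ( defined $self-&gt;{pending} ) {
        $process-&gt;write_dlf( "syslog", { message =&gt; $self-&gt;{pending} } );
        $self-&gt;{pending} = undef;
    }
    return;
}

package main;

my $process   = FakeProcess-&gt;new();
my $converter = StatefulSketch-&gt;new();
$converter-&gt;init_dlf_converter( $process );
$converter-&gt;process_log_line( $process, $_ ) for ( "a", "b", "c" );
$converter-&gt;finish_conversion( $process );
print scalar( @{ $process-&gt;{dlf} } ), " records written\n";

              </pre><p>Here the third line has no partner, so it is only
              written when <code class="methodname">finish_conversion()</code>
              flushes the pending state.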
            </p></div><div class="section" title="The DLF Conversion Process"><div class="titlepage"><div><div><h4 class="title"><a name="id402880"></a>The DLF Conversion Process</h4></div></div></div><p>Whether you are using the file-oriented or
              line-oriented conversion mode, the principles are the
              same. You extract information from the log file and
              create DLF records from it. Your DLF converter
              communicates with the framework by calling methods on
              the <code class="classname">Lire::DlfConverterProcess</code>
              object which is passed as a parameter to your methods.
            </p><p>Here is the complete code of our conversion method:
              </p><pre class="programlisting">

use Lire::Apache qw/parse_common/;

sub process_log_line {
    my ( $self, $process, $line ) = @_;

    # Parse the syslog envelope; eval traps the parser's errors.
    my $sys_rec = eval { $self-&gt;{syslog_parser}-&gt;parse( $line ) };
    if ( $@ ) {
        $process-&gt;error( $@, $line );
        return;
    } elsif ( $sys_rec-&gt;{process} ne 'httpd' ) {
        $process-&gt;ignore_log_line( $line, "not an httpd record" );
        return;
    } else {
        # Try the message as a Common Log Format request first.
        my $common_dlf = {};
        eval { parse_common( $sys_rec-&gt;{content}, $common_dlf ) };
        if ( $@ ) {
            # Not a request: treat it as a server error message.
            $sys_rec-&gt;{message} = $sys_rec-&gt;{content};
            $process-&gt;write_dlf( "syslog", $sys_rec );
        } else {
            $process-&gt;write_dlf( "www", $common_dlf );
        }
    }
    return;
}


              </pre><p>
            </p><p>The first thing that should be noted is that in the
              line-oriented conversion mode, the method
              <code class="methodname">process_log_line()</code> will be
              called once for each line in the log file.
            </p><p>Secondly, the actual parsing of the line is done
              using two functions: <code class="function">parse_common</code>
              and <code class="classname">Lire::Syslog</code>'s
              <code class="methodname">parse</code>. These functions simply
              use regular expressions to extract the appropriate
              information from the line and put it in a hash
              reference. What is important is that they
              already use the schema's field names as key names.
            </p><p>Finally, you can see that there are four different
              methods available on the <code class="varname">$process</code> object to
              report different kinds of information:

              </p><div class="variablelist"><dl><dt><span class="term">Reporting Error</span></dt><dd><p>The example uses the
                      <code class="function">eval</code> statement to trap
                      errors during the syslog record parsing. If the
                      line cannot be parsed as a valid syslog record,
                      it is an error and it is reported through the
                      <code class="methodname">error()</code> method. The
                      first parameter is the error message and the
                      second one is the line to which the error is
                      associated. This last parameter is optional.
                    </p></dd><dt><span class="term">Ignoring Information</span></dt><dd><p>When the syslog event doesn't come from the
                      <span class="command"><strong>httpd</strong></span> process, we ignore the
                      line. Ignored lines are reported to the framework
                      by using the
                      <code class="methodname">ignore_log_line()</code>
                      method. The first parameter is the line which is
                      ignored. The second optional parameter gives the
                      reason why the line was ignored.
                    </p></dd><dt><span class="term">Creating DLF Records</span></dt><dd><p>Finally, DLF records are created by using
                      the <code class="methodname">write_dlf()</code> method.
                      Its first parameter is the schema to which the
                      DLF record complies. This schema must be one
                      that is listed by your converter's
                      <code class="methodname">schemas()</code> method. The
                      second parameter is the DLF data contained in a
                      hash reference. The DLF record is created
                      by taking, for each field in the schema, the value
                      stored under the same name in the hash. (In the
                      <span class="type">syslog</span> schema, the field which
                      contains the actual log message is called
                      <em class="structfield"><code>message</code></em>, which is
                      why we assign the <span class="property">content</span>
                      value to the <span class="property">message</span> key.)
                      Missing fields
                      or fields whose value is
                      <code class="literal">undef</code> will contain the
                      special <code class="literal">LR_NA</code> missing value
                      marker. Keys in the hash that don't map to a
                      schema field are simply ignored.
                    </p><p>In our example, we distinguish between the
                      server's error message (mapped to the
                      <span class="type">syslog</span> schema) and the request
                      information (mapped to the <span class="type">www</span>
                      schema) based on whether
                      <code class="function">parse_common</code> succeeded in
                      parsing the line.
                    </p></dd><dt><span class="term">Saving Log Line</span></dt><dd><p>Another possibility, not shown in our
                      example, is to ask that the line be saved for
                      later processing. This is mostly of use to
                      converters that maintain state between lines. In
                      such cases, it is quite common that
                      related lines are missing from the end of
                      the log file. When that happens, you save the lines
                      and they will automatically be seen by the next run
                      of your converter on the same DLF store. This
                      option is only available in the line-oriented
                      mode of conversion.
                    </p></dd></dl></div><p>
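            </p><p>The mapping rules given above for
              <code class="methodname">write_dlf()</code> can be sketched
              as follows. This is an illustration only: the
              <code class="function">build_dlf_record</code> helper is
              hypothetical, the field list is made up for the example and
              is not the real <span class="type">syslog</span> schema, and
              the <code class="literal">LR_NA</code> marker is shown as a
              plain string.
              </p><pre class="programlisting">

use strict;
use warnings;

# Illustrative field list only; the real schema defines its own fields.
my @fields = qw/timestamp hostname process message/;

# Hypothetical helper mimicking the mapping described for write_dlf():
# one value per schema field, LR_NA when the key is missing or undef.
sub build_dlf_record {
    my ( $fields, $dlf ) = @_;

    return [ map { defined $dlf-&gt;{$_} ? $dlf-&gt;{$_} : "LR_NA" } @$fields ];
}

my $record = build_dlf_record( \@fields, {
    hostname =&gt; "www1",
    process  =&gt; "httpd",
    message  =&gt; undef,        # undef becomes LR_NA
    content  =&gt; "ignored",    # not a schema field, so it is dropped
} );
print join( " ", @$record ), "\n";

              </pre><p>The missing <code class="literal">timestamp</code>
              key and the undefined <code class="literal">message</code>
              value both end up as <code class="literal">LR_NA</code>, while
              the extra <code class="literal">content</code> key does not
              appear in the record.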
            </p><div class="section" title="File-Oriented Conversion"><div class="titlepage"><div><div><h5 class="title"><a name="id403097"></a>File-Oriented Conversion</h5></div></div></div><p>The same principles apply when you are using the
                file-oriented mode of conversion. This mode will
                usually be used for binary log formats or formats which
                aren't line-oriented, like XML.
              </p><p>For demonstration purposes, the following code could be
                added to transform our line-oriented converter into a
                file-oriented one:

                </p><pre class="programlisting">

sub handle_log_lines { 
    return 0;
}

sub process_log_file {
    my ( $self, $process, $fh ) = @_;
    
    my $line;
    while ( defined( $line = &lt;$fh&gt; ) ) {
        chomp $line;
        $self-&gt;process_log_line( $process, $line );
    }
}


                </pre><p>
              </p><p>The difference between the above code and using
                the line-oriented mode is that the framework won't be
                aware of the number of log lines processed and your
                converter might have trouble processing log
                files which use a different line-ending convention
                than the host you are running on. The bottom line is that
                you should use the line-oriented conversion mode when
                your log format is line-oriented.
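              </p><p>If you nevertheless read lines yourself in
                file-oriented mode, the line-ending problem mentioned above
                can be reduced by stripping the terminator explicitly
                instead of relying on <code class="function">chomp</code>.
                The <code class="function">strip_eol</code> helper below is
                a hypothetical sketch, not a Lire function, and it only
                handles LF and CRLF terminators.
                </p><pre class="programlisting">

use strict;
use warnings;

# Hypothetical helper: removes a trailing LF or CRLF regardless of the
# host's own convention. chomp() only removes the current value of $/.
sub strip_eol {
    my ( $line ) = @_;

    $line =~ s/\r?\n\z//;
    return $line;
}

my @lines = map { strip_eol( $_ ) } ( "unix line\n", "dos line\r\n" );
print join( "|", @lines ), "\n";

                </pre><p>Both sample lines come out without any trailing
                carriage return or newline.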
              </p></div></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="ch02s04.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="ch02.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="ch02s06.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Adding a Constructor </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> Registering Your DLF Converter with the <span class="application">Lire</span> Framework</td></tr></table></div></body></html>