
<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Chapter 4. Writing a New DLF Analyser</title><meta name="generator" content="DocBook XSL Stylesheets V1.75.2"><link rel="home" href="index.html" title="Lire Developer's Manual"><link rel="up" href="pt02.html" title="Part II. Using the Lire Framework"><link rel="prev" href="ch03.html" title="Chapter 3. Writing a DLF Schema"><link rel="next" href="ch04s02.html" title="Writing an Analyser"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter 4. Writing a New DLF Analyser</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="ch03.html">Prev</a> </td><th width="60%" align="center">Part II. Using the <span class="application">Lire</span> Framework</th><td width="20%" align="right"> <a accesskey="n" href="ch04s02.html">Next</a></td></tr></table><hr></div><div class="chapter" title="Chapter 4. Writing a New DLF Analyser"><div class="titlepage"><div><div><h2 class="title"><a name="chap:writing-analyser"></a>Chapter 4. 
Writing a New DLF Analyser</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="ch04.html#sect:writing-categoriser">Writing a Categoriser</a></span></dt><dd><dl><dt><span class="section"><a href="ch04.html#id403947">Defining The Extended Schema </a></span></dt><dt><span class="section"><a href="ch04.html#id404012">Defining the Categoriser</a></span></dt><dt><span class="section"><a href="ch04.html#id404050">Categoriser Configuration</a></span></dt><dt><span class="section"><a href="ch04.html#id404101">Categoriser Implementation</a></span></dt></dl></dd><dt><span class="section"><a href="ch04s02.html">Writing an Analyser</a></span></dt><dt><span class="section"><a href="ch04s03.html">DLF Analyser API</a></span></dt></dl></div><p>In <span class="application">Lire</span>, a DLF Analyser is a plugin that can extract or
        derive data from other DLF data. The idea is that such
        analyses do not depend on the underlying log format: they
        can be computed using only the data normalised in the DLF
        schema. 
      </p><p>For example, an analyser could assign a category based
        on the URL that was visited (such as the 'Public' or
        'Private' category). This categorising operation doesn't
        depend on the log format but only on the presence of the
        <em class="structfield"><code>requested_page</code></em> field in the schema.
        This would be an example of a special kind of analyser, a Lire
        DLF Categoriser: a simpler analyser that creates
        new fields based on a single DLF record.
      </p><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>The <code class="filename">doc/examples</code> directory in the source
          distribution contains the complete code for this categoriser.
        </p></div><p>There is also a more generic kind of analyser that creates data
        in another DLF stream based on arbitrary queries on the
        source DLF schema. An example of this kind is an analyser that
        constructs session summaries from the www requests. It reads the
        DLF records of the <span class="type">www</span> DLF schema and creates
        <span class="type">www-user_session</span> DLF records from them.
      </p><p>Writing an analyser is similar to writing a DLF converter,
      so consult <a class="xref" href="ch02.html" title="Chapter 2. Writing a New DLF Converter">Chapter 2, <i>Writing a New DLF Converter</i></a> for the details concerning
        registration and the use of configuration.
      </p><div class="section" title="Writing a Categoriser"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sect:writing-categoriser"></a>Writing a Categoriser</h2></div></div></div><p>Categorisers are the simplest form of analyser. In this
          section, we will show how to write a
          categoriser that uses regular expressions to assign a
          category to each <span class="type">www</span> requested page.
        </p><div class="section" title="Defining The Extended Schema"><div class="titlepage"><div><div><h3 class="title"><a name="id403947"></a>Defining The Extended Schema </h3></div></div></div><p>A categoriser writes DLF in an extended schema. An
            extended schema is an extension of a base schema. If you
            are familiar with SQL, you can think of it as an inner join with
            the main schema. That is, each record in the main schema
            will also have the fields of the extended schema.
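          </p><p>The join analogy can be illustrated with plain Perl hashes.
            This is only a sketch: the record values are made up for
            illustration, and it says nothing about how
            <span class="application">Lire</span> performs the join internally.
          </p><pre class="programlisting">

use strict;
use warnings;

# A record from the base schema (the value is made up for illustration).
my $base_record = { requested_page =&gt; '/private/report.html' };

# The fields contributed by the extended schema for that same record.
my $extension = { category =&gt; 'Private' };

# Seen through the extended schema, the record carries both sets of fields.
my %joined = ( %$base_record, %$extension );

print join( ', ', map { "$_=$joined{$_}" } sort keys %joined ), "\n";

            </pre><p>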
          </p><p>In our case, the extended schema is very simple: it
            only adds one <em class="structfield"><code>category</code></em> field to
            the <span class="type">www</span> schema.
          </p><p>Defining an extended schema is identical to writing a
          DLF Schema, with the exception that we use a different top-level
          element. You should consult <a class="xref" href="ch03.html" title="Chapter 3. Writing a DLF Schema">Chapter 3, <i>Writing a DLF Schema</i></a> for all the details.
          Here is the extended schema that our categoriser will use:
            
            </p><pre class="programlisting">

&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE lire:extended-schema PUBLIC
  "-//LogReport.ORG//DTD Lire DLF Schema Markup Language V1.1//EN"
  "http://www.logreport.org/LDSML/1.1/ldsml.dtd"&gt;
&lt;lire:extended-schema id="www-category" base-schema="www" 
 xmlns:lire="http://www.logreport.org/LDSML/"&gt;

 &lt;lire:title&gt;Category Extended Schema for WWW service&lt;/lire:title&gt;

 &lt;lire:description&gt;
  &lt;para&gt;This is an extended schema for the WWW service which adds a
    category field based on the regexp matched by the requested_page.
  &lt;/para&gt;
 &lt;/lire:description&gt;

 &lt;lire:field name="category" type="string" label="Category"&gt;
  &lt;lire:description&gt;
   &lt;para&gt;This field contains the page category.&lt;/para&gt;
  &lt;/lire:description&gt;
 &lt;/lire:field&gt;
&lt;/lire:extended-schema&gt;

            </pre><p>
          </p><p>The difference from a regular DLF schema is that it
            starts with the <code class="sgmltag-element">extended-schema</code> tag,
            whose <code class="sgmltag-attribute">base-schema</code> attribute
            names the DLF schema or derived DLF schema that
            is extended.
          </p></div><div class="section" title="Defining the Categoriser"><div class="titlepage"><div><div><h3 class="title"><a name="id404012"></a>Defining the Categoriser</h3></div></div></div><p>Like a DLF Converter, the categoriser is an object
            deriving from a base class which defines the categoriser
            interface. In the categoriser case, that interface is
            <code class="interfacename">Lire::DlfCategoriser</code>. 
            The categoriser also has to provide some meta-information
            to the framework. Here is the code for all of this:

            </p><pre class="programlisting">

package MyAnalysers::PageCategoriser;

use strict;
use warnings;

use base qw/Lire::DlfCategoriser/;

sub new {
    return bless {}, shift;
}

sub name {
    return 'page-categoriser';
}

sub title {
    return "A page categoriser";
}

sub description {
    return "&lt;para&gt;A categoriser that assigns categories based on a map
    of regular expressions to categories.&lt;/para&gt;";
}

sub src_schema {
    return "www";
}

sub dst_schema {
    return "www-category";
}


            </pre><p>

            The methods that differ from the DLF converter case are
            <code class="methodname">src_schema</code>, which specifies the
            schema to which fields are added, and
            <code class="methodname">dst_schema</code>, which gives the schema
            specifying the fields that will be added.
          </p></div><div class="section" title="Categoriser Configuration"><div class="titlepage"><div><div><h3 class="title"><a name="id404050"></a>Categoriser Configuration</h3></div></div></div><p>Our categoriser will assign categories based on
            a mapping from regular expressions to category names. To be
            useful, this mapping should be configurable. Like all
            plugins in <span class="application">Lire</span>, DLF categorisers can use the Lire
            Configuration Specification Markup Language to define the
            configuration data they use (see <a class="xref" href="ch08.html" title="Chapter 8. The Lire Report Configuration Specification Markup Language">Chapter 8, <i>The Lire Report Configuration Specification Markup Language</i></a> for the full details). 
            The convention is that if there is a parameter named
            <code class="constant"><em class="replaceable"><code>yourname</code></em>_properties</code>,
            it is considered the configuration specification for the
            plugin <code class="varname">yourname</code>. When such a parameter exists, a
            button appears in the <span class="command"><strong>lire</strong></span>
            user interface so that the user can configure your plugin's data.
          </p><p>In our categoriser case, we will define a list of records
            which lets the user define any number of pairs, each made of a
            regular expression and a category name:

            </p><pre class="programlisting">

&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE lrcsml:config-spec PUBLIC
  "-//LogReport.ORG//DTD Lire Report Configuration Specification Markup Language V1.0//EN"
  "http://www.logreport.org/LRCSML/1.1/lrcsml.dtd"&gt;
&lt;lrcsml:config-spec xmlns:lrcsml="http://www.logreport.org/LRCSML/"
                    xmlns:lrcml="http://www.logreport.org/LRCML/"&gt;

 &lt;lrcsml:list name="page-categoriser_properties"&gt;
  &lt;lrcsml:summary&gt;Page Categoriser Configuration&lt;/lrcsml:summary&gt;

  &lt;lrcsml:description&gt;
   &lt;para&gt;This is a list of regexps that will be applied in order,
    along with the category that should be assigned when each regexp matches.
   &lt;/para&gt;
  &lt;/lrcsml:description&gt;

  &lt;lrcsml:record name="regex2category"&gt;
   &lt;lrcsml:summary&gt;The Regexp-Category Association&lt;/lrcsml:summary&gt;
   &lt;lrcsml:string name="regex"&gt;
    &lt;lrcsml:summary&gt;Regex&lt;/lrcsml:summary&gt;
    &lt;lrcsml:description&gt;
     &lt;para&gt;The regular expression to test.&lt;/para&gt;
    &lt;/lrcsml:description&gt;
   &lt;/lrcsml:string&gt;

   &lt;lrcsml:string name="category"&gt;
    &lt;lrcsml:summary&gt;Category&lt;/lrcsml:summary&gt;
    &lt;lrcsml:description&gt;
     &lt;para&gt;This field contains the category that should be assigned.&lt;/para&gt;
    &lt;/lrcsml:description&gt;
   &lt;/lrcsml:string&gt;
  &lt;/lrcsml:record&gt;
 &lt;lrcml:param name="page-categoriser_properties"&gt;
  &lt;lrcml:param name="regex2category"&gt;
   &lt;lrcml:param name="regex"&gt;.*&lt;/lrcml:param&gt;
   &lt;lrcml:param name="category"&gt;Unknown&lt;/lrcml:param&gt;
  &lt;/lrcml:param&gt;
 &lt;/lrcml:param&gt;
 &lt;/lrcsml:list&gt;
&lt;/lrcsml:config-spec&gt;

            </pre><p>

            This specification also sets a default value: a list containing one
            catch-all regex with the category 'Unknown'. The user can
            add other entries before it. An alternative
            implementation could define a field specifying the
            default category to assign when no regular expression matches.
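          </p><p>The fallback logic of that alternative can be sketched in
            plain Perl. This is not code from the
            <span class="application">Lire</span> distribution: the
            <code class="methodname">categorise_page</code> helper, its
            <code class="varname">$default</code> argument and the rules are
            hypothetical and only illustrate the idea.
          </p><pre class="programlisting">

use strict;
use warnings;

# Hypothetical helper: returns the category of the first matching regex,
# or $default when none of the regexes match.
sub categorise_page {
    my ( $page, $rules, $default ) = @_;

    foreach my $rule ( @$rules ) {
        my ( $regex, $category ) = @$rule;
        return $category if $page =~ $regex;
    }
    return $default;
}

# Made-up rules; a catch-all entry is no longer needed.
my @rules = (
    [ qr{^/private/}, 'Private' ],
    [ qr{^/public/},  'Public' ],
);

print categorise_page( '/private/a.html', \@rules, 'Unknown' ), "\n";
print categorise_page( '/misc/b.html',    \@rules, 'Unknown' ), "\n";

            </pre><p>With a separate default, the order of the
            user-defined entries still matters, but the user can no longer
            shadow every rule by placing a catch-all entry too early in the list.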
          </p></div><div class="section" title="Categoriser Implementation"><div class="titlepage"><div><div><h3 class="title"><a name="id404101"></a>Categoriser Implementation</h3></div></div></div><p>Two methods are needed to implement the categoriser.
            The first is an initialisation method called
            <code class="methodname">initialise</code>. This method receives
            the configuration data entered by the user as its parameter.
          </p><p>In our case, we will compile the regular expressions
            for faster processing later on:

            </p><pre class="programlisting">

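sub initialise {
    my ( $self, $config ) = @_;

    # Compile each regular expression once, replacing the string in place.
    foreach my $map ( @$config ) {
        $map-&gt;[0] = qr/$map-&gt;[0]/;
    }

    # Keep the compiled regex/category pairs for categorise().
    $self-&gt;{'categories'} = $config;
    return;
}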

            </pre><p>
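            Because the regular expressions come from user configuration,
            one of them may be invalid, in which case
            <code class="literal">qr//</code> dies when the pattern is compiled.
            The following standalone sketch shows one way to detect this up
            front. The <code class="methodname">compile_rules</code> helper is
            hypothetical and the rules are made up; how such errors should be
            reported inside <span class="application">Lire</span> is not covered here.
          </p><pre class="programlisting">

use strict;
use warnings;

# Hypothetical helper: compiles each [ regex, category ] pair and collects
# the patterns that fail to compile instead of dying on the first one.
sub compile_rules {
    my ( $rules ) = @_;
    my ( @compiled, @errors );

    foreach my $rule ( @$rules ) {
        my ( $pattern, $category ) = @$rule;
        my $regex = eval { qr/$pattern/ };
        if ( defined $regex ) {
            push @compiled, [ $regex, $category ];
        } else {
            push @errors, $pattern;
        }
    }
    return ( \@compiled, \@errors );
}

my ( $compiled, $errors ) = compile_rules(
    [ [ '^/private/', 'Private' ], [ '(unclosed', 'Broken' ], [ '.*', 'Unknown' ] ]
);

print scalar( @$compiled ), " compiled, ", scalar( @$errors ), " rejected\n";

            </pre><p>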
          </p><p>The categorising is done in the
            <code class="methodname">categorise</code> method. This method
            receives as its parameter the DLF record to which the extended
            fields should be added. This DLF record is a hash
            reference containing one key for each of the fields
            defined in the source DLF schema. We simply assign the
            extended fields by adding new keys to the hash reference:

            </p><pre class="programlisting">

sub categorise {
    my ( $self, $dlf ) = @_;

    # The first regex that matches the requested page decides the category.
    foreach my $map ( @{$self-&gt;{'categories'}} ) {
        if ( $dlf-&gt;{'requested_page'} =~ $map-&gt;[0] ) {
            $dlf-&gt;{'category'} = $map-&gt;[1];
            return;
        }
    }
    return;
}

            </pre><p>

            That's all. As with the DLF converter, you'll need to
            register this analyser with the
            <code class="classname">Lire::PluginManager</code> (see <a class="xref" href="ch02s06.html" title="Registering Your DLF Converter with the Lire Framework">the section called &#8220;Registering Your DLF Converter with the <span class="application">Lire</span> Framework&#8221;</a> for more information).
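          </p><p>Before registering the plugin, the two methods can be
            exercised on their own. The sketch below uses a stand-in class,
            <code class="classname">PageCategoriserSketch</code>, which is a
            hypothetical name and deliberately does not inherit from
            <code class="interfacename">Lire::DlfCategoriser</code>, so that it
            runs without <span class="application">Lire</span> installed. The
            configuration and the records are made up, and the configuration is
            assumed to arrive as a list of regex/category pairs, as in the
            methods above.
          </p><pre class="programlisting">

use strict;
use warnings;

# Stand-in for the categoriser, without the Lire base class, so that the
# two methods can be tried outside the framework.
package PageCategoriserSketch;

sub new { return bless {}, shift }

sub initialise {
    my ( $self, $config ) = @_;

    # Compile into new pairs so that the caller's data is left untouched.
    $self-&gt;{'categories'} = [
        map { my ( $re, $cat ) = @$_; [ qr/$re/, $cat ] } @$config
    ];
    return;
}

sub categorise {
    my ( $self, $dlf ) = @_;

    foreach my $map ( @{ $self-&gt;{'categories'} } ) {
        if ( $dlf-&gt;{'requested_page'} =~ $map-&gt;[0] ) {
            $dlf-&gt;{'category'} = $map-&gt;[1];
            return;
        }
    }
    return;
}

package main;

my $categoriser = PageCategoriserSketch-&gt;new();
$categoriser-&gt;initialise( [ [ '^/private/', 'Private' ], [ '.*', 'Unknown' ] ] );

foreach my $page ( '/private/a.html', '/index.html' ) {
    my $dlf = { requested_page =&gt; $page };
    $categoriser-&gt;categorise( $dlf );
    print $page, ': ', $dlf-&gt;{'category'}, "\n";
}

            </pre><p>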
          </p></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="ch03.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="pt02.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="ch04s02.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 3. Writing a DLF Schema </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> Writing an Analyser</td></tr></table></div></body></html>