/usr/share/doc/lire/dev-manual/ch04.html is in lire-devel-doc 2:2.1.1-2.1.
<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Chapter 4. Writing a New DLF Analyser</title><meta name="generator" content="DocBook XSL Stylesheets V1.75.2"><link rel="home" href="index.html" title="Lire Developer's Manual"><link rel="up" href="pt02.html" title="Part II. Using the Lire Framework"><link rel="prev" href="ch03.html" title="Chapter 3. Writing a DLF Schema"><link rel="next" href="ch04s02.html" title="Writing an Analyser"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter 4. Writing a New DLF Analyser</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="ch03.html">Prev</a> </td><th width="60%" align="center">Part II. Using the <span class="application">Lire</span> Framework</th><td width="20%" align="right"> <a accesskey="n" href="ch04s02.html">Next</a></td></tr></table><hr></div><div class="chapter" title="Chapter 4. 
Writing a New DLF Analyser"><div class="titlepage"><div><div><h2 class="title"><a name="chap:writing-analyser"></a>Chapter 4. Writing a New DLF Analyser</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="ch04.html#sect:writing-categoriser">Writing a Categoriser</a></span></dt><dd><dl><dt><span class="section"><a href="ch04.html#id403947">Defining The Extended Schema </a></span></dt><dt><span class="section"><a href="ch04.html#id404012">Defining the Categoriser</a></span></dt><dt><span class="section"><a href="ch04.html#id404050">Categoriser Configuration</a></span></dt><dt><span class="section"><a href="ch04.html#id404101">Categoriser Implementation</a></span></dt></dl></dd><dt><span class="section"><a href="ch04s02.html">Writing an Analyser</a></span></dt><dt><span class="section"><a href="ch04s03.html">DLF Analyser API</a></span></dt></dl></div><p>In <span class="application">Lire</span>, a DLF Analyser is a plugin that can extract or
derive data from other DLF data. The idea is that these
analyses do not depend on the underlying log format; they
can be performed using only the data normalised in the DLF
schema.
</p><p>For example, an analyser could assign a category based
on the URL that was visited (such as assigning the 'Public' or
'Private' category). This categorising operation doesn't
depend on the log format but only on the presence of the
<em class="structfield"><code>requested_page</code></em> field in the schema.
This would be an example of a special kind of analyser, a Lire
DLF Categoriser. A categoriser is a simpler analyser that creates
new fields based on a single DLF record.
</p><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>The <code class="filename">doc/examples</code> directory in the source
distribution contains the complete code for this categoriser.
</p></div><p>There is also a more generic kind of analyser that creates data
in another DLF stream based on arbitrary queries on the
source DLF schema. An example of this kind is an analyser that
constructs session summaries from the www requests. It reads the
DLF records of the <span class="type">www</span> DLF schema and creates
<span class="type">www-user_session</span> DLF records from them.
</p><p>Writing an analyser is similar to writing a DLF converter,
so consult <a class="xref" href="ch02.html" title="Chapter 2. Writing a New DLF Converter">Chapter 2, <i>Writing a New DLF Converter</i></a> for the details concerning
registration and the use of configuration.
</p><div class="section" title="Writing a Categoriser"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sect:writing-categoriser"></a>Writing a Categoriser</h2></div></div></div><p>Categorisers are the simplest form of analyser. In this
section, we will show how to write a
categoriser that uses regular expressions to assign a
category to each <span class="type">www</span> requested page.
</p><div class="section" title="Defining The Extended Schema"><div class="titlepage"><div><div><h3 class="title"><a name="id403947"></a>Defining The Extended Schema </h3></div></div></div><p>A categoriser writes DLF in an extended schema. An
extended schema is an extension of a base schema. If you
are familiar with SQL, you can see it as an inner join with
the main schema: each record in the main schema
will also have the extension fields of the extended schema.
</p><p>In our case the extended schema is very simple: it
only adds one <em class="structfield"><code>category</code></em> field to
the <span class="type">www</span> schema.
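</p><p>As a rough illustration of the join analogy, the following
sketch shows a record seen through the extended schema as a plain Perl
hash. It does not use any Lire API: the
<em class="structfield"><code>client_host</code></em> key and all the
values are invented for the example, and only
<em class="structfield"><code>requested_page</code></em> and
<em class="structfield"><code>category</code></em> come from this chapter.
</p><pre class="programlisting">
use strict;
use warnings;

# Hypothetical www DLF record; only requested_page is named in this
# chapter, the other key and all values are made up for illustration.
my %www_record = (
    requested_page => '/private/report.html',
    client_host    => 'client.example.org',
);

# Field contributed by the www-category extended schema.
my %extension = ( category => 'Private' );

# The record seen through the extended schema carries both sets of fields.
my %extended_record = ( %www_record, %extension );

foreach my $field ( sort keys %extended_record ) {
    print "$field = $extended_record{$field}\n";
}
</pre><p>
The base record is left untouched; the extended view simply has one
more key.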
</p><p>Defining an extended schema is identical to writing a
DLF Schema, with the exception that we use a different top-level
element. You should consult <a class="xref" href="ch03.html" title="Chapter 3. Writing a DLF Schema">Chapter 3, <i>Writing a DLF Schema</i></a> for all the details.
Here is the extended schema that our categoriser will use:
</p><pre class="programlisting">
<?xml version="1.0"?>
<!DOCTYPE lire:extended-schema PUBLIC
  "-//LogReport.ORG//DTD Lire DLF Schema Markup Language V1.1//EN"
  "http://www.logreport.org/LDSML/1.1/ldsml.dtd">
<lire:extended-schema id="www-category" base-schema="www"
                      xmlns:lire="http://www.logreport.org/LDSML/">
  <lire:title>Category Extended Schema for WWW service</lire:title>
  <lire:description>
    <para>This is an extended schema for the WWW service which adds a
    category field based on the regexp matched by the requested_page.
    </para>
  </lire:description>
  <lire:field name="category" type="string" label="Category">
    <lire:description>
      <para>This field contains the page category.</para>
    </lire:description>
  </lire:field>
</lire:extended-schema>
</pre><p>
</p><p>The difference from a regular DLF schema is that it
starts with the <code class="sgmltag-element">extended-schema</code> tag,
whose <code class="sgmltag-attribute">base-schema</code> attribute
should name the DLF schema or derived DLF schema that
is extended.
</p></div><div class="section" title="Defining the Categoriser"><div class="titlepage"><div><div><h3 class="title"><a name="id404012"></a>Defining the Categoriser</h3></div></div></div><p>Like a DLF Converter, the categoriser is an object
deriving from a base class which defines the categoriser
interface. In the categoriser case, that interface is
<code class="interfacename">Lire::DlfCategoriser</code>.
The categoriser also has to provide some meta-information
to the framework. Here is the code for all of this:
</p><pre class="programlisting">
package MyAnalysers::PageCategoriser;

use strict;
use warnings;

use base qw/Lire::DlfCategoriser/;

sub new {
    return bless {}, shift;
}

# Plugin name; also the prefix of the page-categoriser_properties parameter.
sub name {
    return 'page-categoriser';
}

sub title {
    return "A page categoriser";
}

sub description {
    return "<para>A categoriser that assigns categories based on a map
of regular expressions to categories.</para>";
}

# Schema to which fields are added.
sub src_schema {
    return "www";
}

# Extended schema specifying the fields that are added.
sub dst_schema {
    return "www-category";
}
</pre><p>
The methods that differ from the DLF converter case are
<code class="methodname">src_schema</code>, which specifies the
schema to which fields are added, and
<code class="methodname">dst_schema</code>, which gives the schema
specifying the fields that will be added.
</p></div><div class="section" title="Categoriser Configuration"><div class="titlepage"><div><div><h3 class="title"><a name="id404050"></a>Categoriser Configuration</h3></div></div></div><p>Our categoriser will assign categories based on
a mapping from regular expressions to category names. To be
useful, this mapping should be configurable. Like all
plugins in <span class="application">Lire</span>, DLF categorisers can use the Lire
Configuration Specification Markup Language to define the
configuration data they use (see <a class="xref" href="ch08.html" title="Chapter 8. The Lire Report Configuration Specification Markup Language">Chapter 8, <i>The Lire Report Configuration Specification Markup Language</i></a> for the full details).
The convention is that a parameter named
<code class="constant"><em class="replaceable"><code>yourname</code></em>_properties</code>
is considered the configuration specification for the
plugin <code class="varname">yourname</code>. When such a parameter exists, a
button appears in the <span class="command"><strong>lire</strong></span>
user interface so that the user can configure your plugin's data.
</p><p>In our categoriser case, we will define a list of records
which will enable the user to define many pairs of regular
expression and category name:
</p><pre class="programlisting">
<?xml version="1.0"?>
<!DOCTYPE lrcsml:config-spec PUBLIC
  "-//LogReport.ORG//DTD Lire Report Configuration Specification Markup Language V1.0//EN"
  "http://www.logreport.org/LRCSML/1.1/lrcsml.dtd">
<lrcsml:config-spec xmlns:lrcsml="http://www.logreport.org/LRCSML/"
                    xmlns:lrcml="http://www.logreport.org/LRCML/">
  <lrcsml:list name="page-categoriser_properties">
    <lrcsml:summary>Page Categoriser Configuration</lrcsml:summary>
    <lrcsml:description>
      <para>This is a list of regexps that will be applied in order,
      along with the category that should be assigned when the regexp
      matches.
      </para>
    </lrcsml:description>
    <lrcsml:record name="regex2category">
      <lrcsml:summary>The Regexp-Category Association</lrcsml:summary>
      <lrcsml:string name="regex">
        <lrcsml:summary>Regex</lrcsml:summary>
        <lrcsml:description>
          <para>The regular expression to test.</para>
        </lrcsml:description>
      </lrcsml:string>
      <lrcsml:string name="category">
        <lrcsml:summary>Category</lrcsml:summary>
        <lrcsml:description>
          <para>This field contains the category that should be assigned.</para>
        </lrcsml:description>
      </lrcsml:string>
    </lrcsml:record>
    <lrcml:param name="page-categoriser_properties">
      <lrcml:param name="regex2category">
        <lrcml:param name="regex">.*</lrcml:param>
        <lrcml:param name="category">Unknown</lrcml:param>
      </lrcml:param>
    </lrcml:param>
  </lrcsml:list>
</lrcsml:config-spec>
</pre><p>
This specification also sets a list containing one
catch-all regex with the category 'Unknown'. The user can
add other entries before it. An alternative
implementation could define a field specifying the
default category to assign when no regular expression matches.
</p></div><div class="section" title="Categoriser Implementation"><div class="titlepage"><div><div><h3 class="title"><a name="id404101"></a>Categoriser Implementation</h3></div></div></div><p>Two methods are needed to implement the categoriser.
The first is an initialisation method called
<code class="methodname">initialise</code>. This method receives
as parameter the configuration data entered by the user.
</p><p>In our case, we will compile the regular expressions
for faster processing later on:
</p><pre class="programlisting">
sub initialise {
    my ( $self, $config ) = @_;

    # Compile each pattern once so that categorise() does not have to.
    foreach my $map ( @$config ) {
        $map->[0] = qr/$map->[0]/;
    }
    $self->{'categories'} = $config;

    return;
}
</pre><p>
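The effect of this compilation step can be checked in isolation.
The sketch below runs without Lire and assumes, as the method above
implies, that the configuration arrives as a reference to a list of
[ regex, category ] pairs; the sample patterns are invented.
</p><pre class="programlisting">
use strict;
use warnings;

# Assumed shape of the configuration data: a list of [ regex, category ]
# pairs. The patterns below are invented for this example.
my $config = [
    [ '^/private/', 'Private' ],
    [ '.*',         'Unknown' ],
];

# Same loop as in initialise(): replace each pattern string by a
# compiled regular expression object.
foreach my $map ( @$config ) {
    $map->[0] = qr/$map->[0]/;
}

foreach my $map ( @$config ) {
    print ref( $map->[0] ), " for category $map->[1]\n";
}
</pre><p>
After the loop, the first element of each pair is a
<code class="classname">Regexp</code> object rather than a string, so
matching against it later does not require recompiling the pattern.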
</p><p>The categorising is done in the
<code class="methodname">categorise</code> method. This method
receives as parameter the DLF record to which the extended
fields should be added. This DLF record is a hash
reference containing one key for each of the fields
defined in the source DLF schema. We simply assign the
extended fields by adding new keys to the hash reference:
</p><pre class="programlisting">
sub categorise {
    my ( $self, $dlf ) = @_;

    # The first regular expression that matches decides the category.
    foreach my $map ( @{$self->{'categories'}} ) {
        if ( $dlf->{'requested_page'} =~ $map->[0] ) {
            $dlf->{'category'} = $map->[1];
            return;
        }
    }

    return;
}
</pre><p>
That's all. As with the DLF converter, you'll need to
register this analyser with the
<code class="classname">Lire::PluginManager</code> (see <a class="xref" href="ch02s06.html" title="Registering Your DLF Converter with the Lire Framework">the section called “Registering Your DLF Converter with the <span class="application">Lire</span> Framework”</a> for more information).
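</p><p>Before registering the plugin, it can help to exercise the two
methods on their own. The following sketch is not part of the
<span class="application">Lire</span> distribution: the package name
<code class="classname">Sketch::PageCategoriser</code> is hypothetical,
it deliberately does not inherit from
<code class="interfacename">Lire::DlfCategoriser</code> so that it runs
without <span class="application">Lire</span> installed, and the
configuration and record values are invented. It also assumes the
configuration is a list of [ regex, category ] pairs.
</p><pre class="programlisting">
use strict;
use warnings;

package Sketch::PageCategoriser;

sub new { return bless {}, shift }

# Compile the patterns once, as in the chapter's initialise().
sub initialise {
    my ( $self, $config ) = @_;
    $self->{'categories'} = [ map { [ qr/$_->[0]/, $_->[1] ] } @$config ];
    return;
}

# The first matching pattern decides the category.
sub categorise {
    my ( $self, $dlf ) = @_;
    foreach my $map ( @{ $self->{'categories'} } ) {
        if ( $dlf->{'requested_page'} =~ $map->[0] ) {
            $dlf->{'category'} = $map->[1];
            return;
        }
    }
    return;
}

package main;

my $categoriser = Sketch::PageCategoriser->new();
$categoriser->initialise( [
    [ '^/private/', 'Private' ],
    [ '^/',         'Public'  ],
    [ '.*',         'Unknown' ],
] );

foreach my $page ( '/private/report.html', '/index.html', 'robots.txt' ) {
    my $dlf = { requested_page => $page };
    $categoriser->categorise( $dlf );
    print "$page: $dlf->{'category'}\n";
}
</pre><p>
Because the first match wins, the order of the entries matters: the
catch-all pattern has to come last, otherwise every page would be
assigned its category.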
</p></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="ch03.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="pt02.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="ch04s02.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 3. Writing a DLF Schema </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> Writing an Analyser</td></tr></table></div></body></html>