This file is indexed.

/usr/share/doc/python-pebl/html/tutorial.html is in python-pebl-doc 1.0.2-2.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>Tutorial &mdash; Pebl v1.0.1 documentation</title>
    <link rel="stylesheet" href="_static/default.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '',
        VERSION:     '1.0.1',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <link rel="top" title="Pebl v1.0.1 documentation" href="index.html" />
    <link rel="next" title="Developer’s Guide" href="devguide.html" />
    <link rel="prev" title="Installation" href="install.html" /> 
  </head>
  <body>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="devguide.html" title="Developer’s Guide"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="install.html" title="Installation"
             accesskey="P">previous</a> |</li>
        <li><a href="index.html">Pebl v1.0.1 documentation</a> &raquo;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body">
            
  <div class="section" id="tutorial">
<span id="id1"></span><h1>Tutorial<a class="headerlink" href="#tutorial" title="Permalink to this headline"></a></h1>
<p>This tutorial is organized as a series of examples that highlight some of
Pebl&#8217;s features.  It assumes that the reader is familiar with
Bayesian networks and the Python language and has read the <a class="reference internal" href="intro.html#intro"><em>Pebl Introduction</em></a>.</p>
<p>Pebl includes a Python library and a command-line application that is driven
by a configuration file.  The pebl application is limited compared to the
library but requires no programming.  Each example in this tutorial
includes a Python script and, when possible, a pebl configuration file that
runs the same analysis.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">This tutorial uses Pebl 1.0</p>
</div>
<div class="section" id="introducing-the-problem">
<h2>Introducing the Problem<a class="headerlink" href="#introducing-the-problem" title="Permalink to this headline"></a></h2>
<p>Bayesian networks have been used to model complex phenomena with non-linear,
multimodal relationships between high-dimensional variables. When used to model
gene regulatory networks, nodes usually represent the expression profile of
genes while edges represent dependencies between them. The learned networks can
be interpreted as causal models explaining the data and can hint at the
underlying biological mechanisms and functions.</p>
<p>For this tutorial, we use the Cell Cycle data from Spellman et al. [1] as an
example dataset.  This dataset contains 76 gene expression measurements of 6177
S. cerevisiae ORFs. The experiments include six time series using different
cell cycle synchronization methods. In this example, we ignore the temporal
aspect of the dataset and treat each measurement as an independent sample from
the underlying biological phenomenon.  Spellman et al. identified 800 genes as
cell cycle dependent, and we use a small 12-gene subset of those (as shown in Fig.
8.11 of [2]).</p>
<p><a class="reference external" href="_static/tutorial/pebl-tutorial-data1.txt">pebl-tutorial-data1.txt</a> contains
the gene expression measurements for our 12 genes.  Each row contains one
sample (the gene expression data from one microarray assay) and each column
contains the gene expression profile for a given gene over the 76 measurements.
The first row containing the variable names (the gene associated with the
measured ORF) is required for all pebl data files.</p>
</div>
<div class="section" id="first-example">
<h2>First Example<a class="headerlink" href="#first-example" title="Permalink to this headline"></a></h2>
<p>This is the simplest analysis in Pebl:</p>
<div class="highlight-python"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre>1
2
3
4
5
6
7</pre></div></td><td class="code"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pebl</span> <span class="kn">import</span> <span class="n">data</span>
<span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pebl.learner</span> <span class="kn">import</span> <span class="n">greedy</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">dataset</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">fromfile</span><span class="p">(</span><span class="s">&quot;pebl-tutorial-data1.txt&quot;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">dataset</span><span class="o">.</span><span class="n">discretize</span><span class="p">()</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">learner</span> <span class="o">=</span> <span class="n">greedy</span><span class="o">.</span><span class="n">GreedyLearner</span><span class="p">(</span><span class="n">dataset</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">ex1result</span> <span class="o">=</span> <span class="n">learner</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">ex1result</span><span class="o">.</span><span class="n">tohtml</span><span class="p">(</span><span class="s">&quot;example1-result&quot;</span><span class="p">)</span>
</pre></div>
</td></tr></table></div>
<p>The code above does the following:</p>
<blockquote>
<div><ul class="simple">
<li>Imports the required modules (lines 1-2)</li>
<li>Loads the dataset (line 3)</li>
<li>Discretizes the continuous values in the dataset (line 4)</li>
<li>Creates and runs a greedy learner with default parameters (lines 5-6)</li>
<li>Creates an HTML report of the results (line 7)</li>
</ul>
</div></blockquote>
<p>The same analysis can be run with the following configuration file:</p>
<div class="highlight-python"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre> 1
 2
 3
 4
 5
 6
 7
 8
 9
10</pre></div></td><td class="code"><div class="highlight"><pre><span class="p">[</span><span class="n">data</span><span class="p">]</span>
<span class="n">filename</span> <span class="o">=</span> <span class="n">pebl</span><span class="o">-</span><span class="n">tutorial</span><span class="o">-</span><span class="n">data1</span><span class="o">.</span><span class="n">txt</span>
<span class="n">discretize</span> <span class="o">=</span> <span class="mi">3</span>

<span class="p">[</span><span class="n">learner</span><span class="p">]</span>
<span class="nb">type</span> <span class="o">=</span> <span class="n">greedy</span><span class="o">.</span><span class="n">GreedyLearner</span>

<span class="p">[</span><span class="n">result</span><span class="p">]</span>
<span class="n">format</span> <span class="o">=</span> <span class="n">html</span>
<span class="n">outdir</span> <span class="o">=</span> <span class="n">example1</span><span class="o">-</span><span class="n">result</span>
</pre></div>
</td></tr></table></div>
<p>To use the configuration file above, save it as a text file (config1.txt) and
run the pebl application:</p>
<div class="highlight-python"><pre>$ /usr/local/bin/pebl run config1.txt</pre>
</div>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">The location of the pebl application may be different based on your installation method</p>
</div>
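<p>The configuration file can also be generated from Python rather than typed by
hand. The sketch below uses the standard-library <tt class="docutils literal"><span class="pre">configparser</span></tt> module (its
Python 3 name; it is not part of Pebl) and assumes that the pebl application
accepts the plain INI layout this module writes.</p>

```python
import configparser
import io

# Build the first example's configuration section by section.
config = configparser.ConfigParser()
config["data"] = {"filename": "pebl-tutorial-data1.txt", "discretize": "3"}
config["learner"] = {"type": "greedy.GreedyLearner"}
config["result"] = {"format": "html", "outdir": "example1-result"}

# Write to an in-memory buffer here; pass open("config1.txt", "w") instead
# to save the file that the pebl application is run with.
buffer = io.StringIO()
config.write(buffer)
config_text = buffer.getvalue()
print(config_text)
```

<p>Generating the file this way is mainly useful when many similar
configurations are needed, for example one per dataset.</p>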
<p>The result of this example is available <a class="reference external" href="_static/tutorial/example1-result/index.html">here</a>.  Keep in mind that structure
learning of Bayesian networks uses stochastic methods, so the results from
different runs will not be identical. Also, the results from this short run are
not realistic or spectacular; they do, however, serve as a good demonstration of
the features of the HTML report.</p>
<p>The result is organized into three tabs. The first tab shows the log scores
for the top networks and some overall statistics.  The second tab shows
the top 10 networks found, and the last tab shows consensus networks at
different confidence thresholds.  The consensus networks are built using the
estimated marginal posterior probability of each edge.</p>
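<p>One way to picture how a consensus network follows from edge probabilities is
the sketch below. It is not Pebl&#8217;s implementation: it assumes that the networks
found stand in for the whole posterior and that each log score is an
unnormalized log posterior probability. The function names and the toy
networks are made up for illustration.</p>

```python
import math


def edge_marginals(networks):
    """Estimate P(edge) from a list of (log_score, edges) pairs.

    Assumption: the listed networks stand in for the whole posterior and
    each log score is an unnormalized log posterior probability.
    """
    best = max(score for score, _ in networks)
    # Subtract the best score before exponentiating to avoid underflow.
    weights = [math.exp(score - best) for score, _ in networks]
    total = sum(weights)
    marginals = {}
    for weight, (_, edges) in zip(weights, networks):
        for edge in edges:
            marginals[edge] = marginals.get(edge, 0.0) + weight / total
    return marginals


def consensus_edges(networks, threshold):
    """Keep the edges whose estimated marginal reaches the threshold."""
    marginals = edge_marginals(networks)
    return sorted(edge for edge, p in marginals.items() if p >= threshold)


# Toy input: three networks over hypothetical nodes A, B and C.
toy_networks = [
    (-100.0, {("A", "B"), ("B", "C")}),
    (-100.0, {("A", "B")}),
    (-130.0, {("C", "A")}),
]
strict = consensus_edges(toy_networks, threshold=0.9)
loose = consensus_edges(toy_networks, threshold=0.4)
```

<p>Raising the threshold keeps only the edges that nearly all high-scoring
networks agree on, which is why the report shows several thresholds side by
side.</p>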
</div>
<div class="section" id="pebl-s-data-file-format">
<h2>Pebl&#8217;s Data File Format<a class="headerlink" href="#pebl-s-data-file-format" title="Permalink to this headline"></a></h2>
<p>Pebl uses tab-delimited text files for its data.  Each column represents a
variable and each row represents a sample or observation.  The data file can
contain any number of comment rows that begin with a &#8220;#&#8221;.  The first
non-comment row is expected to be a header row that specifies the name and type
of each variable. Pebl supports continuous, discrete and class variables. The
three variable types have different header formats as shown below:</p>
<table border="1" class="docutils">
<colgroup>
<col width="14%" />
<col width="51%" />
<col width="36%" />
</colgroup>
<thead valign="bottom">
<tr><th class="head">Type</th>
<th class="head">Header Format</th>
<th class="head">Examples</th>
</tr>
</thead>
<tbody valign="top">
<tr><td rowspan="2">continuous</td>
<td rowspan="2">&lt;variable-name&gt;[,continuous]</td>
<td rowspan="2">CLN2
CLN1,continuous</td>
</tr>
<tr></tr>
<tr><td>discrete</td>
<td>&lt;variable-name&gt;,discrete(&lt;variable-arity&gt;)</td>
<td>CLN2,discrete(2)
CLN1,discrete(3)</td>
</tr>
<tr><td>class</td>
<td>&lt;variable-name&gt;,class(&lt;class-labels&gt;)</td>
<td>CLN2,class(low,high)
phenotype,class(cancer,normal)</td>
</tr>
</tbody>
</table>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">Although Pebl accepts continuous values in the data file, they must be discretized before use.</p>
</div>
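<p>The header formats in the table can be recognized with a single regular
expression. The helper below is hypothetical (it is not Pebl&#8217;s parser) and only
illustrates how the three forms differ; it assumes that variable names contain
no commas.</p>

```python
import re

# name, optionally followed by ",continuous", ",discrete(N)" or ",class(a,b)"
_HEADER_RE = re.compile(
    r"^([^,]+)(?:,(continuous|discrete|class)(?:\(([^)]*)\))?)?$"
)


def parse_header_field(field):
    """Return (name, kind, detail) for one header field.

    Hypothetical helper for illustration; Pebl's own parser may differ.
    """
    match = _HEADER_RE.match(field.strip())
    if match is None:
        raise ValueError("unrecognized header field: %r" % field)
    name, kind, arg = match.groups()
    kind = kind or "continuous"  # the type suffix is optional for continuous
    if kind in ("discrete", "class") and arg is None:
        raise ValueError("%s needs an argument: %r" % (kind, field))
    if kind == "discrete":
        return name, kind, int(arg)        # arity
    if kind == "class":
        return name, kind, arg.split(",")  # class labels
    return name, kind, None


header = "CLN2\tCLN1,discrete(3)\tphenotype,class(cancer,normal)"
fields = [parse_header_field(f) for f in header.split("\t")]
```

<p>Here <tt class="docutils literal"><span class="pre">fields</span></tt> holds one tuple per column: a continuous variable, a discrete
variable of arity 3 and a class variable with two labels.</p>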
<p>Each measured value can be annotated with two indicators.  Append or prepend a
&#8220;!&#8221; to the value to indicate that the variable was intervened upon for that
sample or observation.  The intervention can be a gene knockdown, RNA silencing
or any perturbation that directly affects the value of that variable. Missing
values are indicated by using &#8220;X&#8221;. A missing value can be the result of a scratch on a
microarray slide or, if all the rows for a variable include &#8220;X&#8221;, a variable that
wasn&#8217;t measured.</p>
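<p>The two indicators can be read off a cell as in the sketch below. The
function is hypothetical and treats the two markers independently, which is an
assumption; the text above does not say whether they can be combined.</p>

```python
def parse_value(cell):
    """Split one data cell into (value, intervened, missing).

    Hypothetical helper for illustration; Pebl's own parser may differ.
    """
    text = cell.strip()
    # A "!" before or after the value marks an intervention.
    intervened = text.startswith("!") or text.endswith("!")
    text = text.strip("!")
    # "X" marks a missing value.
    missing = text == "X"
    value = None if missing else float(text)
    return value, intervened, missing


row = ["0.42", "!1.5", "2!", "X"]
parsed_row = [parse_value(cell) for cell in row]
```

<p>Each entry of <tt class="docutils literal"><span class="pre">parsed_row</span></tt> carries the numeric value (or <tt class="docutils literal"><span class="pre">None</span></tt> when
missing) together with the two flags.</p>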
<p>Each sample (row) can have a name which should be in the first column. This is
not used in learning a Bayesian network, but can be used to create subsets of
the data based on the sample names.</p>
<p><a class="reference external" href="_static/tutorial/pebl-tutorial-data2.txt">pebl-tutorial-data2.txt</a>  is the
discretized version of our data file with sample names and was created with the
following script:</p>
<div class="highlight-python"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre>1
2
3
4
5
6</pre></div></td><td class="code"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pebl</span> <span class="kn">import</span> <span class="n">data</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">dataset</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">fromfile</span><span class="p">(</span><span class="s">&quot;pebl-tutorial-data1.txt&quot;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">dataset</span><span class="o">.</span><span class="n">discretize</span><span class="p">(</span><span class="n">numbins</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="k">for</span> <span class="n">i</span><span class="p">,</span><span class="n">s</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">dataset</span><span class="o">.</span><span class="n">samples</span><span class="p">):</span>
<span class="gp">... </span>   <span class="n">s</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="s">&quot;sample-</span><span class="si">%d</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">i</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">dataset</span><span class="o">.</span><span class="n">tofile</span><span class="p">(</span><span class="s">&quot;pebl-tutorial-data2.txt&quot;</span><span class="p">)</span>
</pre></div>
</td></tr></table></div>
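<p>To give a feel for what discretizing into three bins means, the sketch below
uses equal-frequency binning, where each bin receives roughly the same number
of samples. This is an assumption made for illustration; the tutorial does not
state which method <tt class="docutils literal"><span class="pre">discretize()</span></tt> applies, and the function name here is made
up.</p>

```python
def discretize_equal_frequency(values, numbins=3):
    """Map continuous values to bin indices 0..numbins-1.

    Values are ranked and the ranks are cut into numbins groups of roughly
    equal size. Tied values may land in different bins in this simple version.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * numbins // len(values)
    return bins


# One made-up expression profile with six measurements.
profile = [0.1, 2.5, -1.0, 0.7, 3.2, -0.4]
binned = discretize_equal_frequency(profile, numbins=3)
```

<p>In the binned profile, 0 marks the lowest third of the measurements and 2
the highest third, so each variable ends up with arity 3.</p>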
</div>
<div class="section" id="second-example">
<h2>Second Example<a class="headerlink" href="#second-example" title="Permalink to this headline"></a></h2>
<p>In the first example, we used the default parameters for the greedy learner
(1000 iterations), which are inadequate for a dataset of this size. In this
example, we use custom stopping criteria:</p>
<div class="highlight-python"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre>1
2
3
4
5
6
7
8
9</pre></div></td><td class="code"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pebl</span> <span class="kn">import</span> <span class="n">data</span><span class="p">,</span> <span class="n">result</span>
<span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pebl.learner</span> <span class="kn">import</span> <span class="n">greedy</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">dataset</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">fromfile</span><span class="p">(</span><span class="s">&quot;pebl-tutorial-data2.txt&quot;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">learner1</span> <span class="o">=</span> <span class="n">greedy</span><span class="o">.</span><span class="n">GreedyLearner</span><span class="p">(</span><span class="n">dataset</span><span class="p">,</span> <span class="n">max_iterations</span><span class="o">=</span><span class="mi">1000000</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">learner2</span> <span class="o">=</span> <span class="n">greedy</span><span class="o">.</span><span class="n">GreedyLearner</span><span class="p">(</span><span class="n">dataset</span><span class="p">,</span> <span class="n">max_time</span><span class="o">=</span><span class="mi">120</span><span class="p">)</span> <span class="c"># in seconds</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">result1</span> <span class="o">=</span> <span class="n">learner1</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">result2</span> <span class="o">=</span> <span class="n">learner2</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">merged_result</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">merge</span><span class="p">(</span><span class="n">result1</span><span class="p">,</span> <span class="n">result2</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">merged_result</span><span class="o">.</span><span class="n">tohtml</span><span class="p">(</span><span class="s">&quot;example2-result&quot;</span><span class="p">)</span>
</pre></div>
</td></tr></table></div>
<p>The code above does the following:</p>
<blockquote>
<div><ul class="simple">
<li>Imports the required modules (lines 1-2)</li>
<li>Loads the discretized dataset (line 3)</li>
<li>Creates and runs two greedy learners with specified stopping criteria (lines 4-7)</li>
<li>Merges the two learner results and creates an HTML report (lines 8-9)</li>
</ul>
</div></blockquote>
<p>A Pebl configuration file can be used to create multiple learners, but they must
be of the same type and use the same parameters (stopping criteria in this
case). Thus, the Python script above cannot be replicated exactly with a
configuration file, but it can be approximated:</p>
<div class="highlight-python"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre> 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13</pre></div></td><td class="code"><div class="highlight"><pre><span class="p">[</span><span class="n">data</span><span class="p">]</span>
<span class="n">filename</span> <span class="o">=</span> <span class="n">pebl</span><span class="o">-</span><span class="n">tutorial</span><span class="o">-</span><span class="n">data2</span><span class="o">.</span><span class="n">txt</span>

<span class="p">[</span><span class="n">learner</span><span class="p">]</span>
<span class="nb">type</span> <span class="o">=</span> <span class="n">greedy</span><span class="o">.</span><span class="n">GreedyLearner</span>
<span class="n">numtasks</span> <span class="o">=</span> <span class="mi">2</span>

<span class="p">[</span><span class="n">greedy</span><span class="p">]</span>
<span class="n">max_iterations</span> <span class="o">=</span> <span class="mi">1000000</span>

<span class="p">[</span><span class="n">result</span><span class="p">]</span>
<span class="n">format</span> <span class="o">=</span> <span class="n">html</span>
<span class="n">outdir</span> <span class="o">=</span> <span class="n">example2</span><span class="o">-</span><span class="n">result</span>
</pre></div>
</td></tr></table></div>
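<p>The two kinds of stopping criteria used in this example, an iteration count
and a time budget, can be pictured with a toy hill-climbing loop. This is a
generic sketch and not Pebl&#8217;s <tt class="docutils literal"><span class="pre">GreedyLearner</span></tt>; the function, its arguments
(apart from the names <tt class="docutils literal"><span class="pre">max_iterations</span></tt> and <tt class="docutils literal"><span class="pre">max_time</span></tt>, which mirror the
script above) and the toy problem are assumptions made for illustration.</p>

```python
import random
import time


def greedy_search(score, neighbor, start,
                  max_iterations=None, max_time=None, seed=0):
    """Hill-climb from `start` until an iteration or time budget runs out.

    Toy illustration only; this is not Pebl's GreedyLearner.
    """
    if max_iterations is None and max_time is None:
        raise ValueError("give max_iterations, max_time, or both")
    rng = random.Random(seed)
    best, best_score = start, score(start)
    deadline = None if max_time is None else time.monotonic() + max_time
    iterations = 0
    while True:
        if max_iterations is not None and iterations >= max_iterations:
            break
        if deadline is not None and time.monotonic() >= deadline:
            break
        candidate = neighbor(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:  # accept improvements only
            best, best_score = candidate, candidate_score
        iterations += 1
    return best, best_score, iterations


# Toy problem: find the integer that maximizes -(x - 7)**2 by unit steps.
def toy_score(x):
    return -(x - 7) ** 2


def toy_neighbor(x, rng):
    return x + rng.choice((-1, 1))


by_count = greedy_search(toy_score, toy_neighbor, start=0, max_iterations=200)
by_time = greedy_search(toy_score, toy_neighbor, start=0, max_time=0.05)
```

<p>An iteration budget makes runs comparable across machines, while a time
budget bounds the wall-clock cost; which one is preferable depends on the
analysis.</p>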
</div>
<div class="section" id="third-example">
<h2>Third Example<a class="headerlink" href="#third-example" title="Permalink to this headline"></a></h2>
<p>For large datasets, we might wish to do multiple learner runs and use different
learners. The following example creates and runs 5 greedy and 5 simulated
annealing learners:</p>
<div class="highlight-python"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre>1
2
3
4
5
6
7</pre></div></td><td class="code"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pebl</span> <span class="kn">import</span> <span class="n">data</span><span class="p">,</span> <span class="n">result</span>
<span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pebl.learner</span> <span class="kn">import</span> <span class="n">greedy</span><span class="p">,</span> <span class="n">simanneal</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">dataset</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">fromfile</span><span class="p">(</span><span class="s">&quot;pebl-tutorial-data2.txt&quot;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">learners</span> <span class="o">=</span> <span class="p">[</span> <span class="n">greedy</span><span class="o">.</span><span class="n">GreedyLearner</span><span class="p">(</span><span class="n">dataset</span><span class="p">,</span> <span class="n">max_iterations</span><span class="o">=</span><span class="mi">1000000</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span> <span class="p">]</span> <span class="o">+</span> \
<span class="gp">... </span>           <span class="p">[</span> <span class="n">simanneal</span><span class="o">.</span><span class="n">SimulatedAnnealingLearner</span><span class="p">(</span><span class="n">dataset</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span> <span class="p">]</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">merged_result</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">merge</span><span class="p">([</span><span class="n">learner</span><span class="o">.</span><span class="n">run</span><span class="p">()</span> <span class="k">for</span> <span class="n">learner</span> <span class="ow">in</span> <span class="n">learners</span><span class="p">])</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">merged_result</span><span class="o">.</span><span class="n">tohtml</span><span class="p">(</span><span class="s">&quot;example3-result&quot;</span><span class="p">)</span>
</pre></div>
</td></tr></table></div>
<p>The code above is similar to the last example except that we create a list of
10 learners of two different types. The corresponding configuration file has
the same caveat as in the previous example:</p>
<div class="highlight-python"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre> 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13</pre></div></td><td class="code"><div class="highlight"><pre><span class="p">[</span><span class="n">data</span><span class="p">]</span>
<span class="n">filename</span> <span class="o">=</span> <span class="n">pebl</span><span class="o">-</span><span class="n">tutorial</span><span class="o">-</span><span class="n">data2</span><span class="o">.</span><span class="n">txt</span>

<span class="p">[</span><span class="n">learner</span><span class="p">]</span>
<span class="nb">type</span> <span class="o">=</span> <span class="n">greedy</span><span class="o">.</span><span class="n">GreedyLearner</span>
<span class="n">numtasks</span> <span class="o">=</span> <span class="mi">10</span>

<span class="p">[</span><span class="n">greedy</span><span class="p">]</span>
<span class="n">max_iterations</span> <span class="o">=</span> <span class="mi">1000000</span>

<span class="p">[</span><span class="n">result</span><span class="p">]</span>
<span class="n">format</span> <span class="o">=</span> <span class="n">html</span>
<span class="n">outdir</span> <span class="o">=</span> <span class="n">example3</span><span class="o">-</span><span class="n">result</span>
</pre></div>
</td></tr></table></div>
</div>
<div class="section" id="fourth-example">
<h2>Fourth Example<a class="headerlink" href="#fourth-example" title="Permalink to this headline"></a></h2>
<p>In the previous example, we ran 10 learners serially.  We can use Pebl&#8217;s
taskcontroller package to run these learners in parallel:</p>
<div class="highlight-python"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre> 1
 2
 3
 4
 5
 6
 7
 8
 9
10</pre></div></td><td class="code"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pebl</span> <span class="kn">import</span> <span class="n">data</span><span class="p">,</span> <span class="n">result</span>
<span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pebl.learner</span> <span class="kn">import</span> <span class="n">greedy</span><span class="p">,</span> <span class="n">simanneal</span>
<span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pebl.taskcontroller</span> <span class="kn">import</span> <span class="n">multiprocess</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">dataset</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">fromfile</span><span class="p">(</span><span class="s">&quot;pebl-tutorial-data2.txt&quot;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">learners</span> <span class="o">=</span> <span class="p">[</span> <span class="n">greedy</span><span class="o">.</span><span class="n">GreedyLearner</span><span class="p">(</span><span class="n">dataset</span><span class="p">,</span> <span class="n">max_iterations</span><span class="o">=</span><span class="mi">1000000</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span> <span class="p">]</span> <span class="o">+</span> \
<span class="gp">... </span>           <span class="p">[</span> <span class="n">simanneal</span><span class="o">.</span><span class="n">SimulatedAnnealingLearner</span><span class="p">(</span><span class="n">dataset</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span> <span class="p">]</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">tc</span> <span class="o">=</span> <span class="n">multiprocess</span><span class="o">.</span><span class="n">MultiProcessController</span><span class="p">(</span><span class="n">poolsize</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">results</span> <span class="o">=</span> <span class="n">tc</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">learners</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">merged_result</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">merge</span><span class="p">(</span><span class="n">results</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">merged_result</span><span class="o">.</span><span class="n">tohtml</span><span class="p">(</span><span class="s">&quot;example4-result&quot;</span><span class="p">)</span>
</pre></div>
</td></tr></table></div>
<p>In this example, we import the multiprocess module (line 3), create a
multiprocess task controller with a pool size of two processes (line 7), run
the learners using the task controller (line 8), and merge the results and
create an HTML report as before (lines 9-10).</p>
<p>The corresponding configuration file (with the caveats mentioned in the previous
examples) would be:</p>
<div class="highlight-python"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre> 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19</pre></div></td><td class="code"><div class="highlight"><pre><span class="p">[</span><span class="n">data</span><span class="p">]</span>
<span class="n">filename</span> <span class="o">=</span> <span class="n">pebl</span><span class="o">-</span><span class="n">tutorial</span><span class="o">-</span><span class="n">data2</span><span class="o">.</span><span class="n">txt</span>

<span class="p">[</span><span class="n">learner</span><span class="p">]</span>
<span class="nb">type</span> <span class="o">=</span> <span class="n">greedy</span><span class="o">.</span><span class="n">GreedyLearner</span>
<span class="n">numtasks</span> <span class="o">=</span> <span class="mi">10</span>

<span class="p">[</span><span class="n">greedy</span><span class="p">]</span>
<span class="n">max_iterations</span> <span class="o">=</span> <span class="mi">1000000</span>

<span class="p">[</span><span class="n">taskcontroller</span><span class="p">]</span>
<span class="nb">type</span> <span class="o">=</span> <span class="n">multiprocess</span><span class="o">.</span><span class="n">MultiProcessController</span>

<span class="p">[</span><span class="n">multiprocess</span><span class="p">]</span>
<span class="n">poolsize</span> <span class="o">=</span> <span class="mi">2</span>

<span class="p">[</span><span class="n">result</span><span class="p">]</span>
<span class="n">format</span> <span class="o">=</span> <span class="n">html</span>
<span class="n">outdir</span> <span class="o">=</span> <span class="n">example4</span><span class="o">-</span><span class="n">result</span>
</pre></div>
</td></tr></table></div>
<dl class="docutils">
<dt>Pebl provides three other task controllers:</dt>
<dd><ul class="first last simple">
<li><a class="reference internal" href="taskcontroller/xgrid.html#module-pebl.taskcontroller.xgrid" title="pebl.taskcontroller.xgrid: Xgrid task controller"><tt class="xref py py-mod docutils literal"><span class="pre">pebl.taskcontroller.xgrid</span></tt></a> for using Apple&#8217;s XGrid</li>
<li><a class="reference internal" href="taskcontroller/ipy1.html#module-pebl.taskcontroller.ipy1" title="pebl.taskcontroller.ipy1: IPython1 task controller"><tt class="xref py py-mod docutils literal"><span class="pre">pebl.taskcontroller.ipy1</span></tt></a> for using an IPython1 cluster</li>
<li><a class="reference internal" href="taskcontroller/ec2.html#module-pebl.taskcontroller.ec2" title="pebl.taskcontroller.ec2: Amazon EC2 task controller"><tt class="xref py py-mod docutils literal"><span class="pre">pebl.taskcontroller.ec2</span></tt></a> for using Amazon EC2</li>
</ul>
</dd>
</dl>
<p>All task controllers can be used with the pebl application and configuration
file; they differ only in the parameters they require. You can therefore do a
preliminary analysis on your desktop, perhaps with the multiprocess task
controller, and then run the full analysis on an Xgrid or on Amazon&#8217;s EC2 by
changing one line of code or a few lines in a configuration file. The EC2 task
controller is especially attractive for large analysis tasks because it lets you
rent computing resources as needed, without installing or configuring a
cluster.</p>
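<p>As a minimal sketch of how small that change is, the snippet below uses
Python&#8217;s standard <tt class="docutils literal"><span class="pre">configparser</span></tt> module to swap the task controller type in
an abbreviated copy of the configuration above. The helper
<tt class="docutils literal"><span class="pre">switch_taskcontroller</span></tt> is hypothetical, and the class name
<tt class="docutils literal"><span class="pre">ec2.EC2Controller</span></tt> is an assumption; consult the ec2 module reference for
the actual name and the parameters it requires.</p>

```python
# Python 3 spelling; on Python 2 the module is named ConfigParser.
import configparser

# Abbreviated copy of the configuration file shown above.
DESKTOP_CONFIG = """
[learner]
type = greedy.GreedyLearner
numtasks = 10

[taskcontroller]
type = multiprocess.MultiProcessController

[multiprocess]
poolsize = 2
"""


def switch_taskcontroller(config_text, controller_type):
    """Return a parser in which only the [taskcontroller] type is replaced."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    parser.set("taskcontroller", "type", controller_type)
    return parser


# "ec2.EC2Controller" is an assumed class name, used for illustration only.
cluster = switch_taskcontroller(DESKTOP_CONFIG, "ec2.EC2Controller")
```

<p>In practice you would probably edit the file by hand. The learner settings
stay untouched, and the controller-specific section (here
<tt class="docutils literal"><span class="pre">[multiprocess]</span></tt>) would give way to whatever parameters the new controller
needs.</p>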
</div>
<div class="section" id="a-note-on-interpreting-the-results">
<h2>A Note on Interpreting the Results<a class="headerlink" href="#a-note-on-interpreting-the-results" title="Permalink to this headline"></a></h2>
<p>There is no principled way to determine the optimal stopping criteria or
simulated annealing parameters for analyzing a given dataset.  One common
strategy is to construct consensus networks that show network features found
with high confidence. Pebl&#8217;s HTML reports show such &#8220;model-averaged&#8221; networks in
the third tab, and the <a class="reference internal" href="posterior.html#module-pebl.posterior" title="pebl.posterior: Posterior distribution"><tt class="xref py py-mod docutils literal"><span class="pre">pebl.posterior</span></tt></a> module has methods for creating
these programmatically.</p>
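<p>To make the idea of model averaging concrete, here is a rough sketch in plain
Python. It is not Pebl&#8217;s implementation and does not use the
<tt class="docutils literal"><span class="pre">pebl.posterior</span></tt> API: it assumes each network is given as a log score plus
a set of directed edges, weights each network by its normalized score, and
keeps the edges whose summed weight reaches a threshold. The function names,
the threshold and the example data are all hypothetical.</p>

```python
import math


def edge_confidences(scored_networks):
    """Map each edge to the score-weighted share of networks containing it.

    scored_networks: list of (log_score, set_of_edges) pairs.
    """
    # Subtract the best log score before exponentiating to avoid underflow.
    best = max(score for score, _ in scored_networks)
    weights = [math.exp(score - best) for score, _ in scored_networks]
    total = sum(weights)

    confidences = {}
    for weight, (_, edges) in zip(weights, scored_networks):
        for edge in edges:
            confidences[edge] = confidences.get(edge, 0.0) + weight / total
    return confidences


def consensus_edges(scored_networks, threshold=0.9):
    """Return the edges whose confidence meets the threshold."""
    confidences = edge_confidences(scored_networks)
    return set(edge for edge, conf in confidences.items() if conf >= threshold)


# Hypothetical log scores and edges, for illustration only.
networks = [
    (-100.0, {("A", "B"), ("B", "C")}),
    (-100.0, {("A", "B")}),
    (-110.0, {("C", "A")}),
]
consensus = consensus_edges(networks)
```

<p>With this toy input, only the edge from A to B appears in nearly all of the
weighted mass, so it is the only edge in the consensus set.</p>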
<p>Another common strategy is to check for stability of results. You begin with
some learning, save the results, do further learning, merge the two results and
see if the top networks and consensus networks have changed much. If they
remain relatively stable, you can assume that you&#8217;ve reached a good solution.
Keep in mind, however, that you can never guarantee that you&#8217;ve found the
optimal network (or that there is a single optimal network to be found) since
structure learning of Bayesian networks is a known NP-hard problem.</p>
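<p>One simple way to quantify &#8220;changed much&#8221; is to compare the edge sets of the
consensus networks before and after the additional learning. The sketch below
is an assumption about how you might do this, not something Pebl provides: it
computes the Jaccard overlap of two hypothetical edge sets, where 1.0 means
identical networks and 0.0 means no shared edges.</p>

```python
def edge_overlap(edges_before, edges_after):
    """Return the Jaccard overlap of two sets of directed edges."""
    union = edges_before.union(edges_after)
    if not union:
        return 1.0  # two empty networks are identical
    shared = edges_before.intersection(edges_after)
    return len(shared) / float(len(union))


# Hypothetical consensus edges before and after a further round of learning.
before = {("A", "B"), ("B", "C"), ("C", "D")}
after = {("A", "B"), ("B", "C"), ("D", "C")}
stability = edge_overlap(before, after)
```

<p>Here two of four distinct edges are shared, so the overlap is 0.5. Note that
a reversed edge counts as a change; whether that is appropriate depends on how
much weight you place on edge direction.</p>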
<p>In the examples above, we&#8217;ve been creating HTML reports of the results, but these
cannot be merged later. A better option is to save the result using the
<a class="reference internal" href="result.html#pebl.result.LearnerResult.tofile" title="pebl.result.LearnerResult.tofile"><tt class="xref py py-meth docutils literal"><span class="pre">pebl.result.LearnerResult.tofile()</span></tt></a> method and then later read it with
<a class="reference internal" href="result.html#pebl.result.fromfile" title="pebl.result.fromfile"><tt class="xref py py-func docutils literal"><span class="pre">pebl.result.fromfile()</span></tt></a>:</p>
<div class="highlight-python"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre>1
2
3
4
5
6
7
8</pre></div></td><td class="code"><div class="highlight"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pebl</span> <span class="kn">import</span> <span class="n">data</span><span class="p">,</span> <span class="n">result</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">result1</span> <span class="o">=</span> <span class="n">learner</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">result1</span><span class="o">.</span><span class="n">tofile</span><span class="p">(</span><span class="s">&quot;result1.pebl&quot;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">result1</span><span class="o">.</span><span class="n">tohtml</span><span class="p">(</span><span class="s">&quot;result1&quot;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">result2</span> <span class="o">=</span> <span class="n">otherlearner</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">result1</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">fromfile</span><span class="p">(</span><span class="s">&quot;result1.pebl&quot;</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">merged_result</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">merge</span><span class="p">(</span><span class="n">result1</span><span class="p">,</span> <span class="n">result2</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">merged_result</span><span class="o">.</span><span class="n">tofile</span><span class="p">(</span><span class="s">&quot;results_sofar.pebl&quot;</span><span class="p">)</span>
</pre></div>
</td></tr></table></div>
<p>A third strategy is to calculate a p-value for each scored network. This will be added to the tutorial shortly.</p>
</div>
<div class="section" id="a-note-on-scale-of-problems-pebl-can-solve">
<h2>A Note on Scale of Problems Pebl Can Solve<a class="headerlink" href="#a-note-on-scale-of-problems-pebl-can-solve" title="Permalink to this headline"></a></h2>
<p>Pebl imposes few limits on the scale of problems it can handle other than the
limits imposed by the hardware you use. The
<a class="reference internal" href="paramref.html#confparam-localscore_cache.maxsize"><tt class="xref std std-confparam docutils literal"><span class="pre">localscore_cache.maxsize</span></tt></a> configuration parameter can be used to
control the size of the main cache used by pebl. With a value appropriate to
your available memory, pebl can be used for quite large datasets. We have
successfully tested pebl with datasets of size (number of variables, number of
samples) = (10000, 10000) and (1000, 100000) on a machine with 2 GB of memory.</p>
<p>Pebl can handle such large datasets without crashing. However, because
structure learning is a known NP-hard problem, using pebl with datasets
containing more than a few hundred variables will likely give poor results due
to poor coverage of the search space.</p>
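<p>To get a feel for how quickly the search space grows, the following sketch
counts the labeled directed acyclic graphs on a given number of nodes using
Robinson&#8217;s recurrence. It is independent of Pebl, and the function names are
chosen here for illustration.</p>

```python
def binomial(n, k):
    """Binomial coefficient C(n, k) using exact integer arithmetic."""
    result = 1
    for i in range(1, k + 1):
        result = result * (n - k + i) // i
    return result


def count_dags(num_nodes):
    """Count labeled DAGs on num_nodes nodes with Robinson's recurrence."""
    counts = [1]  # exactly one DAG on zero nodes
    for n in range(1, num_nodes + 1):
        total = 0
        for k in range(1, n + 1):
            # Inclusion-exclusion over the k nodes that have no incoming edge.
            sign = 1 if k % 2 == 1 else -1
            total += sign * binomial(n, k) * 2 ** (k * (n - k)) * counts[n - k]
        counts.append(total)
    return counts[num_nodes]


# count_dags(3) == 25, count_dags(4) == 543, count_dags(5) == 29281
```

<p>The count grows super-exponentially with the number of variables, which is
why even a very long search visits only a small part of the space for networks
with hundreds of variables.</p>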
</div>
<div class="section" id="more-coming-soon">
<h2>More Coming Soon<a class="headerlink" href="#more-coming-soon" title="Permalink to this headline"></a></h2>
<p>I will be adding examples for using prior knowledge and for calculating p-values using a bootstrapping approach.</p>
</div>
<div class="section" id="learning-more">
<h2>Learning More<a class="headerlink" href="#learning-more" title="Permalink to this headline"></a></h2>
<p>This tutorial should have given you an overview of using Pebl. For further
information about specific components, consult the <a class="reference internal" href="apiref.html#apiref"><em>API Reference</em></a>, which
contains detailed information about all parts of pebl.  If you would like to
add code to pebl, consult the <a class="reference internal" href="devguide.html#devguide"><em>Developer&#8217;s Guide</em></a>.  Feel free to contact me (Abhik
Shah &lt;<a class="reference external" href="mailto:abhikshah&#37;&#52;&#48;gmail&#46;com">abhikshah<span>&#64;</span>gmail<span>&#46;</span>com</a>&gt;) with any questions or comments.</p>
</div>
<div class="section" id="bibliography">
<h2>Bibliography<a class="headerlink" href="#bibliography" title="Permalink to this headline"></a></h2>
<dl class="docutils">
<dt>[1] Spellman et al., (1998).  Comprehensive Identification of Cell</dt>
<dd>Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray
Hybridization.  Molecular Biology of the Cell 9, 3273-3297.</dd>
<dt>[2] Husmeier et al., Probabilistic Modeling in Bioinformatics and Medical</dt>
<dd>Informatics. Springer, 2004. <a class="reference external" href="http://books.google.com/books?id=ND8rjHNkJ-QC">http://books.google.com/books?id=ND8rjHNkJ-QC</a></dd>
</dl>
</div>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar">
        <div class="sphinxsidebarwrapper">
  <h3><a href="index.html">Table Of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">Tutorial</a><ul>
<li><a class="reference internal" href="#introducing-the-problem">Introducing the Problem</a></li>
<li><a class="reference internal" href="#first-example">First Example</a></li>
<li><a class="reference internal" href="#pebl-s-data-file-format">Pebl&#8217;s Data File Format</a></li>
<li><a class="reference internal" href="#second-example">Second Example</a></li>
<li><a class="reference internal" href="#third-example">Third Example</a></li>
<li><a class="reference internal" href="#fourth-example">Fourth Example</a></li>
<li><a class="reference internal" href="#a-note-on-interpreting-the-results">A Note on Interpreting the Results</a></li>
<li><a class="reference internal" href="#a-note-on-scale-of-problems-pebl-can-solve">A Note on Scale of Problems Pebl Can Solve</a></li>
<li><a class="reference internal" href="#more-coming-soon">More Coming Soon</a></li>
<li><a class="reference internal" href="#learning-more">Learning More</a></li>
<li><a class="reference internal" href="#bibliography">Bibliography</a></li>
</ul>
</li>
</ul>

  <h4>Previous topic</h4>
  <p class="topless"><a href="install.html"
                        title="previous chapter">Installation</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="devguide.html"
                        title="next chapter">Developer&#8217;s Guide</a></p>
  <h3>This Page</h3>
  <ul class="this-page-menu">
    <li><a href="_sources/tutorial.txt"
           rel="nofollow">Show Source</a></li>
  </ul>
<div id="searchbox" style="display: none">
  <h3>Quick search</h3>
    <form class="search" action="search.html" method="get">
      <input type="text" name="q" size="18" />
      <input type="submit" value="Go" />
      <input type="hidden" name="check_keywords" value="yes" />
      <input type="hidden" name="area" value="default" />
    </form>
    <p class="searchtip" style="font-size: 90%">
    Enter search terms or a module, class or function name.
    </p>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             >index</a></li>
        <li class="right" >
          <a href="py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="devguide.html" title="Developer’s Guide"
             >next</a> |</li>
        <li class="right" >
          <a href="install.html" title="Installation"
             >previous</a> |</li>
        <li><a href="index.html">Pebl v1.0.1 documentation</a> &raquo;</li> 
      </ul>
    </div>
    <div class="footer">
        &copy; Copyright 2008, Abhik Shah.
      Last updated on Apr 29, 2011.
      Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.0.7.
    </div>
  </body>
</html>