This file is indexed.

/usr/share/doc/python-patsy-doc/html/expert-model-specification.html is in python-patsy-doc 0.4.1-2.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>Model specification for experts and computers &mdash; patsy 0.4.1 documentation</title>
    
    <link rel="stylesheet" href="_static/classic.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    <link rel="stylesheet" href="_static/facebox.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    './',
        VERSION:     '0.4.1',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <script type="text/javascript" src="_static/show-code.js"></script>
    <script type="text/javascript" src="_static/facebox.js"></script>
    <link rel="top" title="patsy 0.4.1 documentation" href="index.html" />
    <link rel="next" title="Using Patsy in your library" href="library-developers.html" />
    <link rel="prev" title="Spline regression" href="spline-regression.html" /> 
  </head>
  <body role="document">
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="library-developers.html" title="Using Patsy in your library"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="spline-regression.html" title="Spline regression"
             accesskey="P">previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="index.html">patsy 0.4.1 documentation</a> &raquo;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body" role="main">
            
  <div class="section" id="model-specification-for-experts-and-computers">
<span id="expert-model-specification"></span><h1>Model specification for experts and computers<a class="headerlink" href="#model-specification-for-experts-and-computers" title="Permalink to this headline"></a></h1>
<p>While the formula language is great for interactive model-fitting and
exploratory data analysis, there are times when we want a different or
more systematic interface for creating design matrices. If you ever
find yourself writing code that pastes together bits of strings to
create a formula, then stop! And read this chapter.</p>
<p>Our first option, of course, is that we can go ahead and write some
code to construct our design matrices directly, just like we did in
the old days. Since this is supported directly by <a class="reference internal" href="API-reference.html#patsy.dmatrix" title="patsy.dmatrix"><code class="xref py py-func docutils literal"><span class="pre">dmatrix()</span></code></a> and
<a class="reference internal" href="API-reference.html#patsy.dmatrices" title="patsy.dmatrices"><code class="xref py py-func docutils literal"><span class="pre">dmatrices()</span></code></a>, it also works with any third-party library
functions that use Patsy internally. Just pass in an array_like or
a tuple <code class="docutils literal"><span class="pre">(y_array_like,</span> <span class="pre">X_array_like)</span></code> in place of the formula.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span class="gp">In [1]: </span><span class="kn">from</span> <span class="nn">patsy</span> <span class="kn">import</span> <span class="n">dmatrix</span>

<span class="gp">In [2]: </span><span class="n">X</span> <span class="o">=</span> <span class="p">[[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">10</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">20</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">2</span><span class="p">]]</span>

<span class="gp">In [3]: </span><span class="n">dmatrix</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="gh">Out[3]: </span><span class="go"></span>
<span class="go">DesignMatrix with shape (3, 2)</span>
<span class="go">  x0  x1</span>
<span class="go">   1  10</span>
<span class="go">   1  20</span>
<span class="go">   1  -2</span>
<span class="go">  Terms:</span>
<span class="go">    &#39;x0&#39; (column 0)</span>
<span class="go">    &#39;x1&#39; (column 1)</span>
</pre></div>
</div>
<p>By using a <a class="reference internal" href="API-reference.html#patsy.DesignMatrix" title="patsy.DesignMatrix"><code class="xref py py-class docutils literal"><span class="pre">DesignMatrix</span></code></a> with <a class="reference internal" href="API-reference.html#patsy.DesignInfo" title="patsy.DesignInfo"><code class="xref py py-class docutils literal"><span class="pre">DesignInfo</span></code></a> attached, we
can also specify custom names for our custom matrix (or even term
slices and so forth), so that we still get the nice output and such
that Patsy would otherwise provide:</p>
<div class="highlight-ipython"><div class="highlight"><pre><span class="gp">In [4]: </span><span class="kn">from</span> <span class="nn">patsy</span> <span class="kn">import</span> <span class="n">DesignMatrix</span><span class="p">,</span> <span class="n">DesignInfo</span>

<span class="gp">In [5]: </span><span class="n">design_info</span> <span class="o">=</span> <span class="n">DesignInfo</span><span class="p">([</span><span class="s">&quot;Intercept!&quot;</span><span class="p">,</span> <span class="s">&quot;Not intercept!&quot;</span><span class="p">])</span>

<span class="gp">In [6]: </span><span class="n">X_dm</span> <span class="o">=</span> <span class="n">DesignMatrix</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">design_info</span><span class="p">)</span>

<span class="gp">In [7]: </span><span class="n">dmatrix</span><span class="p">(</span><span class="n">X_dm</span><span class="p">)</span>
<span class="gh">Out[7]: </span><span class="go"></span>
<span class="go">DesignMatrix with shape (3, 2)</span>
<span class="go">  Intercept!  Not intercept!</span>
<span class="go">           1              10</span>
<span class="go">           1              20</span>
<span class="go">           1              -2</span>
<span class="go">  Terms:</span>
<span class="go">    &#39;Intercept!&#39; (column 0)</span>
<span class="go">    &#39;Not intercept!&#39; (column 1)</span>
</pre></div>
</div>
<p>Or if all we want to do is to specify column names, we could also just
use a <code class="xref py py-class docutils literal"><span class="pre">pandas.DataFrame</span></code>:</p>
<div class="highlight-ipython"><div class="highlight"><pre><span class="gp">In [8]: </span><span class="kn">import</span> <span class="nn">pandas</span>

<span class="gp">In [9]: </span><span class="n">df</span> <span class="o">=</span> <span class="n">pandas</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">10</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">20</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">2</span><span class="p">]],</span>
<span class="gp">   ...: </span>                      <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="s">&quot;Intercept!&quot;</span><span class="p">,</span> <span class="s">&quot;Not intercept!&quot;</span><span class="p">])</span>
<span class="gp">   ...: </span>

<span class="gp">In [10]: </span><span class="n">dmatrix</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
<span class="gh">Out[10]: </span><span class="go"></span>
<span class="go">DesignMatrix with shape (3, 2)</span>
<span class="go">  Intercept!  Not intercept!</span>
<span class="go">           1              10</span>
<span class="go">           1              20</span>
<span class="go">           1              -2</span>
<span class="go">  Terms:</span>
<span class="go">    &#39;Intercept!&#39; (column 0)</span>
<span class="go">    &#39;Not intercept!&#39; (column 1)</span>
</pre></div>
</div>
<p>However, there is also a middle ground between pasting together
strings and going back to putting together design matrices out of
string and baling wire. Patsy has a straightforward Python
interface for representing the result of parsing formulas, and you can
use it directly. This lets you keep Patsy&#8217;s normal advantages &#8211;
handling of categorical data and interactions, predictions, term
tracking, etc. &#8211; while using a nice high-level Python API. An example
of somewhere this might be useful is if, say, you had a GUI with a
tick box next to each variable in your data set, and wanted to
construct a formula containing all the variables that had been
checked, while letting Patsy deal with categorical data handling. Or
this would be the approach you&#8217;d take for doing stepwise regression,
where you need to programmatically add and remove terms.</p>
<p>Whatever your particular situation, the strategy is this:</p>
<ol class="arabic simple">
<li>Construct some factor objects (probably using <a class="reference internal" href="API-reference.html#patsy.LookupFactor" title="patsy.LookupFactor"><code class="xref py py-class docutils literal"><span class="pre">LookupFactor</span></code></a> or
<a class="reference internal" href="API-reference.html#patsy.EvalFactor" title="patsy.EvalFactor"><code class="xref py py-class docutils literal"><span class="pre">EvalFactor</span></code></a>),</li>
<li>Put them into some <a class="reference internal" href="API-reference.html#patsy.Term" title="patsy.Term"><code class="xref py py-class docutils literal"><span class="pre">Term</span></code></a> objects,</li>
<li>Put the <a class="reference internal" href="API-reference.html#patsy.Term" title="patsy.Term"><code class="xref py py-class docutils literal"><span class="pre">Term</span></code></a> objects into two lists, representing the
left- and right-hand side of your formula,</li>
<li>And then wrap the whole thing up in a <a class="reference internal" href="API-reference.html#patsy.ModelDesc" title="patsy.ModelDesc"><code class="xref py py-class docutils literal"><span class="pre">ModelDesc</span></code></a>.</li>
</ol>
<p>(See <a class="reference internal" href="formulas.html#formulas"><span>How formulas work</span></a> if you need a refresher on what each of these
things is.)</p>
<div class="highlight-ipython"><div class="highlight"><pre><span class="gp">In [11]: </span><span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>

<span class="gp">In [12]: </span><span class="kn">from</span> <span class="nn">patsy</span> <span class="kn">import</span> <span class="p">(</span><span class="n">ModelDesc</span><span class="p">,</span> <span class="n">EvalEnvironment</span><span class="p">,</span> <span class="n">Term</span><span class="p">,</span> <span class="n">EvalFactor</span><span class="p">,</span>
<span class="gp">   ....: </span>                   <span class="n">LookupFactor</span><span class="p">,</span> <span class="n">demo_data</span><span class="p">,</span> <span class="n">dmatrix</span><span class="p">)</span>
<span class="gp">   ....: </span>

<span class="gp">In [13]: </span><span class="n">data</span> <span class="o">=</span> <span class="n">demo_data</span><span class="p">(</span><span class="s">&quot;a&quot;</span><span class="p">,</span> <span class="s">&quot;x&quot;</span><span class="p">)</span>

<span class="go"># LookupFactor takes a dictionary key:</span>
<span class="gp">In [14]: </span><span class="n">a_lookup</span> <span class="o">=</span> <span class="n">LookupFactor</span><span class="p">(</span><span class="s">&quot;a&quot;</span><span class="p">)</span>

<span class="go"># EvalFactor takes arbitrary Python code:</span>
<span class="gp">In [15]: </span><span class="n">x_transform</span> <span class="o">=</span> <span class="n">EvalFactor</span><span class="p">(</span><span class="s">&quot;np.log(x ** 2)&quot;</span><span class="p">)</span>

<span class="go"># First argument is empty list for dmatrix; we would need to put</span>
<span class="go"># something there if we were calling dmatrices.</span>
<span class="gp">In [16]: </span><span class="n">desc</span> <span class="o">=</span> <span class="n">ModelDesc</span><span class="p">([],</span>
<span class="gp">   ....: </span>                 <span class="p">[</span><span class="n">Term</span><span class="p">([</span><span class="n">a_lookup</span><span class="p">]),</span>
<span class="gp">   ....: </span>                  <span class="n">Term</span><span class="p">([</span><span class="n">x_transform</span><span class="p">]),</span>
<span class="gp">   ....: </span>                  <span class="n">Term</span><span class="p">([</span><span class="n">a_lookup</span><span class="p">,</span> <span class="n">x_transform</span><span class="p">])])</span>
<span class="gp">   ....: </span>

<span class="go"># Create the matrix (or pass &#39;desc&#39; to any statistical library</span>
<span class="go"># function that uses patsy.dmatrix internally):</span>
<span class="gp">In [17]: </span><span class="n">dmatrix</span><span class="p">(</span><span class="n">desc</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>
<span class="gh">Out[17]: </span><span class="go"></span>
<span class="go">DesignMatrix with shape (6, 4)</span>
<span class="go">  a[a1]  a[a2]  np.log(x ** 2)  a[T.a2]:np.log(x ** 2)</span>
<span class="go">      1      0         1.13523                 0.00000</span>
<span class="go">      0      1        -1.83180                -1.83180</span>
<span class="go">      1      0        -0.04298                -0.00000</span>
<span class="go">      0      1         1.61375                 1.61375</span>
<span class="go">      1      0         1.24926                 0.00000</span>
<span class="go">      0      1        -0.04597                -0.04597</span>
<span class="go">  Terms:</span>
<span class="go">    &#39;a&#39; (columns 0:2)</span>
<span class="go">    &#39;np.log(x ** 2)&#39; (column 2)</span>
<span class="go">    &#39;a:np.log(x ** 2)&#39; (column 3)</span>
</pre></div>
</div>
<p>Notice that no intercept term is included. Implicit intercepts are a
feature of the formula parser, not the underlying representation. If you
want an intercept, include the constant <a class="reference internal" href="API-reference.html#patsy.INTERCEPT" title="patsy.INTERCEPT"><code class="xref py py-const docutils literal"><span class="pre">INTERCEPT</span></code></a> in your
list of terms (which is just sugar for <code class="docutils literal"><span class="pre">Term([])</span></code>).</p>
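<p>As a minimal sketch of the difference, the snippet below builds the same
one-term model with and without <code class="docutils literal"><span class="pre">INTERCEPT</span></code> and compares the resulting
column names. The data dictionary and the column name <code class="docutils literal"><span class="pre">&quot;x&quot;</span></code> are made
up for illustration.</p>

```python
from patsy import INTERCEPT, LookupFactor, ModelDesc, Term, dmatrix

# Illustrative data only; any dict-like with an "x" entry would do.
data = {"x": [1.0, 2.0, 3.0]}
x_term = Term([LookupFactor("x")])

# No implicit intercept: only the terms we list end up in the matrix.
without_intercept = ModelDesc([], [x_term])
# Asking for the intercept explicitly.
with_intercept = ModelDesc([], [INTERCEPT, x_term])

no_int_columns = dmatrix(without_intercept, data).design_info.column_names
int_columns = dmatrix(with_intercept, data).design_info.column_names

print(no_int_columns)
print(int_columns)
```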
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p class="last">Another option is to just pass your term lists directly to
<a class="reference internal" href="API-reference.html#patsy.design_matrix_builders" title="patsy.design_matrix_builders"><code class="xref py py-func docutils literal"><span class="pre">design_matrix_builders()</span></code></a>, and skip the <a class="reference internal" href="API-reference.html#patsy.ModelDesc" title="patsy.ModelDesc"><code class="xref py py-class docutils literal"><span class="pre">ModelDesc</span></code></a>
entirely &#8211; all of the high-level API functions like <a class="reference internal" href="API-reference.html#patsy.dmatrix" title="patsy.dmatrix"><code class="xref py py-func docutils literal"><span class="pre">dmatrix()</span></code></a>
accept <code class="xref py py-class docutils literal"><span class="pre">DesignMatrixBuilder</span></code> objects as well as
<a class="reference internal" href="API-reference.html#patsy.ModelDesc" title="patsy.ModelDesc"><code class="xref py py-class docutils literal"><span class="pre">ModelDesc</span></code></a> objects.</p>
</div>
<p>Example: say our data has 100 different numerical columns that we want
to include in our design &#8211; and we also have a few categorical
variables with a more complex interaction structure. Here&#8217;s one
solution:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="k">def</span> <span class="nf">add_predictors</span><span class="p">(</span><span class="n">base_formula</span><span class="p">,</span> <span class="n">extra_predictors</span><span class="p">):</span>
    <span class="n">desc</span> <span class="o">=</span> <span class="n">ModelDesc</span><span class="o">.</span><span class="n">from_formula</span><span class="p">(</span><span class="n">base_formula</span><span class="p">)</span>
    <span class="c"># Using LookupFactor here ensures that everything will work correctly even</span>
    <span class="c"># if one of the column names in extra_predictors is named like &quot;weight.in.kg&quot;</span>
    <span class="c"># or &quot;sys.exit()&quot; or &quot;LittleBobbyTables()&quot;.</span>
    <span class="n">desc</span><span class="o">.</span><span class="n">rhs_termlist</span> <span class="o">+=</span> <span class="p">[</span><span class="n">Term</span><span class="p">([</span><span class="n">LookupFactor</span><span class="p">(</span><span class="n">p</span><span class="p">)])</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">extra_predictors</span><span class="p">]</span>
    <span class="k">return</span> <span class="n">desc</span>
</pre></div>
</div>
<div class="highlight-ipython"><div class="highlight"><pre><span class="gp">In [18]: </span><span class="n">extra_predictors</span> <span class="o">=</span> <span class="p">[</span><span class="s">&quot;x</span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">i</span><span class="p">,)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">)]</span>

<span class="gp">In [19]: </span><span class="n">desc</span> <span class="o">=</span> <span class="n">add_predictors</span><span class="p">(</span><span class="s">&quot;np.log(y) ~ a*b + c:d&quot;</span><span class="p">,</span> <span class="n">extra_predictors</span><span class="p">)</span>

<span class="gp">In [20]: </span><span class="n">desc</span><span class="o">.</span><span class="n">describe</span><span class="p">()</span>
<span class="gh">Out[20]: </span><span class="go">&#39;np.log(y) ~ a + b + a:b + c:d + x0 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9&#39;</span>
</pre></div>
</div>
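<p>The same representation also supports removing terms, which is the other
half of what a stepwise procedure needs. The following is a sketch only:
<code class="docutils literal"><span class="pre">drop_term</span></code> is a hypothetical helper, not part of Patsy, and it
assumes that matching terms by the string returned from <code class="docutils literal"><span class="pre">Term.name()</span></code>
is good enough for your use.</p>

```python
from patsy import ModelDesc


def drop_term(desc, term_name):
    # Hypothetical helper: return a new ModelDesc without the right-hand-side
    # term whose name matches term_name. The input desc is left untouched.
    kept = [term for term in desc.rhs_termlist if term.name() != term_name]
    return ModelDesc(desc.lhs_termlist, kept)


desc = ModelDesc.from_formula("y ~ a + b + a:b")
smaller = drop_term(desc, "a:b")
description = smaller.describe()
print(description)
```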
<div class="section" id="the-factor-protocol">
<h2>The factor protocol<a class="headerlink" href="#the-factor-protocol" title="Permalink to this headline"></a></h2>
<p>If <a class="reference internal" href="API-reference.html#patsy.LookupFactor" title="patsy.LookupFactor"><code class="xref py py-class docutils literal"><span class="pre">LookupFactor</span></code></a> and <a class="reference internal" href="API-reference.html#patsy.EvalFactor" title="patsy.EvalFactor"><code class="xref py py-class docutils literal"><span class="pre">EvalFactor</span></code></a> aren&#8217;t enough for
you, then you can define your own factor class.</p>
<p>The full interface looks like this:</p>
<dl class="class">
<dt id="patsy.factor_protocol">
<em class="property">class </em><code class="descclassname">patsy.</code><code class="descname">factor_protocol</code><a class="headerlink" href="#patsy.factor_protocol" title="Permalink to this definition"></a></dt>
<dd><dl class="method">
<dt id="patsy.factor_protocol.name">
<code class="descname">name</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#patsy.factor_protocol.name" title="Permalink to this definition"></a></dt>
<dd><p>This must return a short string describing this factor. It will
be used to create column names, among other things.</p>
</dd></dl>

<dl class="attribute">
<dt id="patsy.factor_protocol.origin">
<code class="descname">origin</code><a class="headerlink" href="#patsy.factor_protocol.origin" title="Permalink to this definition"></a></dt>
<dd><p>A <a class="reference internal" href="API-reference.html#patsy.Origin" title="patsy.Origin"><code class="xref py py-class docutils literal"><span class="pre">patsy.Origin</span></code></a> if this factor has one; otherwise, just
set it to None.</p>
</dd></dl>

<dl class="method">
<dt id="patsy.factor_protocol.__eq__">
<code class="descname">__eq__</code><span class="sig-paren">(</span><em>obj</em><span class="sig-paren">)</span><a class="headerlink" href="#patsy.factor_protocol.__eq__" title="Permalink to this definition"></a></dt>
<dt id="patsy.factor_protocol.__ne__">
<code class="descname">__ne__</code><span class="sig-paren">(</span><em>obj</em><span class="sig-paren">)</span><a class="headerlink" href="#patsy.factor_protocol.__ne__" title="Permalink to this definition"></a></dt>
<dt id="patsy.factor_protocol.__hash__">
<code class="descname">__hash__</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#patsy.factor_protocol.__hash__" title="Permalink to this definition"></a></dt>
<dd><p>If your factor will ever contain categorical data or
participate in interactions, then it&#8217;s important to make sure
you&#8217;ve defined <code class="xref py py-meth docutils literal"><span class="pre">__eq__()</span></code> and
<code class="xref py py-meth docutils literal"><span class="pre">__ne__()</span></code> and that your type is
<span class="xref std std-term">hashable</span>. These methods will determine which factors
Patsy considers equal for purposes of redundancy elimination.</p>
</dd></dl>

<dl class="method">
<dt id="patsy.factor_protocol.memorize_passes_needed">
<code class="descname">memorize_passes_needed</code><span class="sig-paren">(</span><em>state</em>, <em>eval_env</em><span class="sig-paren">)</span><a class="headerlink" href="#patsy.factor_protocol.memorize_passes_needed" title="Permalink to this definition"></a></dt>
<dd><p>Return the number of passes through the data that this factor
will need in order to set up any <a class="reference internal" href="stateful-transforms.html#stateful-transforms"><span>Stateful transforms</span></a>.</p>
<p>If you don&#8217;t want to support stateful transforms, just return
0. In this case, <a class="reference internal" href="#patsy.factor_protocol.memorize_chunk" title="patsy.factor_protocol.memorize_chunk"><code class="xref py py-meth docutils literal"><span class="pre">memorize_chunk()</span></code></a> and
<a class="reference internal" href="#patsy.factor_protocol.memorize_finish" title="patsy.factor_protocol.memorize_finish"><code class="xref py py-meth docutils literal"><span class="pre">memorize_finish()</span></code></a> will never be called.</p>
<p><cite>state</cite> is an (initially) empty dict which is maintained by the
builder machinery, and that we can do whatever we like with. It
will be passed back in to all memorization and evaluation
methods.</p>
<p><cite>eval_env</cite> is an <a class="reference internal" href="API-reference.html#patsy.EvalEnvironment" title="patsy.EvalEnvironment"><code class="xref py py-class docutils literal"><span class="pre">EvalEnvironment</span></code></a> object, describing
the Python environment where the factor is being evaluated.</p>
</dd></dl>

<dl class="method">
<dt id="patsy.factor_protocol.memorize_chunk">
<code class="descname">memorize_chunk</code><span class="sig-paren">(</span><em>state</em>, <em>which_pass</em>, <em>data</em><span class="sig-paren">)</span><a class="headerlink" href="#patsy.factor_protocol.memorize_chunk" title="Permalink to this definition"></a></dt>
<dd><p>Called repeatedly with each &#8216;chunk&#8217; of data produced by the
<cite>data_iter_maker</cite> passed to <a class="reference internal" href="API-reference.html#patsy.design_matrix_builders" title="patsy.design_matrix_builders"><code class="xref py py-func docutils literal"><span class="pre">design_matrix_builders()</span></code></a>.</p>
<p><cite>state</cite> is the state dictionary. <cite>which_pass</cite> will be zero on
the first pass through the data, and eventually reach the
value you returned from <a class="reference internal" href="#patsy.factor_protocol.memorize_passes_needed" title="patsy.factor_protocol.memorize_passes_needed"><code class="xref py py-meth docutils literal"><span class="pre">memorize_passes_needed()</span></code></a>, minus
one.</p>
<p>Return value is ignored.</p>
</dd></dl>

<dl class="method">
<dt id="patsy.factor_protocol.memorize_finish">
<code class="descname">memorize_finish</code><span class="sig-paren">(</span><em>state</em>, <em>which_pass</em><span class="sig-paren">)</span><a class="headerlink" href="#patsy.factor_protocol.memorize_finish" title="Permalink to this definition"></a></dt>
<dd><p>Called once after each pass through the data.</p>
<p>Return value is ignored.</p>
</dd></dl>

<dl class="method">
<dt id="patsy.factor_protocol.eval">
<code class="descname">eval</code><span class="sig-paren">(</span><em>state</em>, <em>data</em><span class="sig-paren">)</span><a class="headerlink" href="#patsy.factor_protocol.eval" title="Permalink to this definition"></a></dt>
<dd><p>Evaluate this factor on the given <cite>data</cite>. Return value should
ideally be a 1-d or 2-d array or <code class="xref py py-func docutils literal"><span class="pre">Categorical()</span></code> object,
but this will be checked and converted as needed.</p>
</dd></dl>

</dd></dl>

<p>In addition, factor objects should be pickleable/unpickleable, so as
to allow models containing them to be pickled/unpickled. (Or, if for
some reason your factor objects are <em>not</em> safely pickleable, you
should consider giving them a <cite>__getstate__</cite> method which raises an
error, so that any users who attempt to pickle a model containing
your factors will get a clear failure immediately, instead of only
later when they try to unpickle.)</p>
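<p>One way to get that early failure is sketched below using only the
standard library. The class name <code class="docutils literal"><span class="pre">UnpicklableFactor</span></code>, the helper
<code class="docutils literal"><span class="pre">try_pickle</span></code> and the choice of <code class="docutils literal"><span class="pre">NotImplementedError</span></code> are
illustrative assumptions; any exception with a clear message would serve.</p>

```python
import pickle


class UnpicklableFactor(object):
    """Hypothetical factor wrapping something that cannot be pickled."""

    def __init__(self, label):
        self._label = label

    def name(self):
        return self._label

    def __getstate__(self):
        # Fail at pickling time with an explicit message, rather than
        # producing a pickle that only breaks when it is loaded.
        raise NotImplementedError(
            "%s objects cannot be pickled" % (type(self).__name__,))


def try_pickle(obj):
    """Return the error message raised while pickling, or None on success."""
    try:
        pickle.dumps(obj)
    except NotImplementedError as exc:
        return str(exc)
    return None


message = try_pickle(UnpicklableFactor("z"))
# The error also surfaces when the factor is nested inside a larger object.
nested_message = try_pickle({"terms": [UnpicklableFactor("z")]})
print(message)
```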
<div class="admonition warning">
<p class="first admonition-title">Warning</p>
<p class="last">Do not store evaluation-related state in
attributes of your factor object! The same factor object may
appear in two totally different formulas, or if you have two
factor objects which compare equal, then only one may be
executed, and which one this is may vary randomly depending
on how <a class="reference internal" href="API-reference.html#patsy.build_design_matrices" title="patsy.build_design_matrices"><code class="xref py py-func docutils literal"><span class="pre">build_design_matrices()</span></code></a> is called! Use only the
<cite>state</cite> dictionary for storing state.</p>
</div>
<p>The lifecycle of a factor object therefore looks like:</p>
<ol class="arabic simple">
<li>Initialized.</li>
<li><code class="xref py py-meth docutils literal"><span class="pre">memorize_passes_needed()</span></code> is called.</li>
<li><code class="docutils literal"><span class="pre">for</span> <span class="pre">i</span> <span class="pre">in</span> <span class="pre">range(passes_needed):</span></code><ol class="arabic">
<li><code class="xref py py-meth docutils literal"><span class="pre">memorize_chunk()</span></code> is called one or more times</li>
<li><code class="xref py py-meth docutils literal"><span class="pre">memorize_finish()</span></code> is called</li>
</ol>
</li>
<li><code class="xref py py-meth docutils literal"><span class="pre">eval()</span></code> is called zero or more times.</li>
</ol>
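<p>To make the lifecycle concrete, here is a small sketch of a factor that
centers a column on its mean, together with a stand-in driver that calls the
protocol methods in the order listed above. Both <code class="docutils literal"><span class="pre">CenteredColumn</span></code> and
<code class="docutils literal"><span class="pre">run_lifecycle</span></code> are hypothetical names; the driver only imitates the
call order and is not Patsy&#8217;s builder machinery (it passes <code class="docutils literal"><span class="pre">None</span></code> for
<cite>eval_env</cite>, and the factor returns plain lists rather than arrays).</p>

```python
class CenteredColumn(object):
    """Hypothetical factor: looks up a column and subtracts its mean."""

    origin = None  # a hand-built factor has no patsy.Origin

    def __init__(self, column):
        self._column = column

    def name(self):
        return "center(%s)" % (self._column,)

    # Equality and hashing depend only on the column name, so two factors
    # built for the same column are treated as the same factor.
    def __eq__(self, other):
        return (isinstance(other, CenteredColumn)
                and self._column == other._column)

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        return hash((CenteredColumn, self._column))

    def memorize_passes_needed(self, state, eval_env):
        # One pass to accumulate a sum and a count. All working data goes
        # into `state`, never onto `self`.
        state["total"] = 0.0
        state["count"] = 0
        return 1

    def memorize_chunk(self, state, which_pass, data):
        values = data[self._column]
        state["total"] += sum(values)
        state["count"] += len(values)

    def memorize_finish(self, state, which_pass):
        state["mean"] = state["total"] / state["count"]

    def eval(self, state, data):
        return [value - state["mean"] for value in data[self._column]]


def run_lifecycle(factor, chunks):
    """Stand-in driver (not part of Patsy) mimicking the call order."""
    state = {}
    passes_needed = factor.memorize_passes_needed(state, None)
    for which_pass in range(passes_needed):
        for chunk in chunks:
            factor.memorize_chunk(state, which_pass, chunk)
        factor.memorize_finish(state, which_pass)
    return [factor.eval(state, chunk) for chunk in chunks]


chunks = [{"x": [1.0, 2.0]}, {"x": [3.0, 6.0]}]
centered = run_lifecycle(CenteredColumn("x"), chunks)
print(centered)
```

<p>Because the mean lives in the <cite>state</cite> dictionary rather than on the
factor, two equal <code class="docutils literal"><span class="pre">CenteredColumn</span></code> objects can be swapped for one another
without changing the result.</p>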
</div>
<div class="section" id="alternative-formula-implementations">
<h2>Alternative formula implementations<a class="headerlink" href="#alternative-formula-implementations" title="Permalink to this headline"></a></h2>
<p>Even if you hate Patsy&#8217;s formulas altogether, to the extent that
you&#8217;re going to go and implement your own competing mechanism for
defining formulas, you can still use Patsy-based
interfaces. Unfortunately, this isn&#8217;t <em>quite</em> as clean as we&#8217;d like,
because for now there&#8217;s no way to define a custom
<code class="xref py py-class docutils literal"><span class="pre">DesignMatrixBuilder</span></code>. So you do still have to go through
Patsy&#8217;s formula-building machinery. But, this machinery simply
passes numerical data through unchanged, so in extremis you can:</p>
<ul class="simple">
<li>Define a special factor object that simply defers to your existing
machinery</li>
<li>Define the magic <code class="docutils literal"><span class="pre">__patsy_get_model_desc__</span></code> method on your
formula object. <a class="reference internal" href="API-reference.html#patsy.dmatrix" title="patsy.dmatrix"><code class="xref py py-func docutils literal"><span class="pre">dmatrix()</span></code></a> and friends check for the presence
of this method on any object that is passed in, and if found, it is
called (passing in the <a class="reference internal" href="API-reference.html#patsy.EvalEnvironment" title="patsy.EvalEnvironment"><code class="xref py py-class docutils literal"><span class="pre">EvalEnvironment</span></code></a>), and expected to
return a <a class="reference internal" href="API-reference.html#patsy.ModelDesc" title="patsy.ModelDesc"><code class="xref py py-class docutils literal"><span class="pre">ModelDesc</span></code></a>. And your <a class="reference internal" href="API-reference.html#patsy.ModelDesc" title="patsy.ModelDesc"><code class="xref py py-class docutils literal"><span class="pre">ModelDesc</span></code></a> can, of
course, include your special factor object(s).</li>
</ul>
<p>Put together, it looks something like this:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">patsy</span> <span class="kn">import</span> <span class="n">ModelDesc</span><span class="p">,</span> <span class="n">Term</span><span class="p">,</span> <span class="n">dmatrices</span>

<span class="k">class</span> <span class="nc">MyAlternativeFactor</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="c"># A factor object that simply returns the design matrix computed by</span>
    <span class="c"># the alternative formula machinery for one side of the formula</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">alternative_formula</span><span class="p">,</span> <span class="n">side</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">alternative_formula</span> <span class="o">=</span> <span class="n">alternative_formula</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">side</span> <span class="o">=</span> <span class="n">side</span>

    <span class="k">def</span> <span class="nf">name</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">side</span>

    <span class="k">def</span> <span class="nf">memorize_passes_needed</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">state</span><span class="p">,</span> <span class="n">eval_env</span><span class="p">):</span>
        <span class="c"># Nothing to memorize, so no passes over the data are needed</span>
        <span class="k">return</span> <span class="mi">0</span>

    <span class="k">def</span> <span class="nf">eval</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">state</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">alternative_formula</span><span class="o">.</span><span class="n">get_matrix</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">side</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">MyAlternativeFormula</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="o">...</span>

    <span class="k">def</span> <span class="nf">__patsy_get_model_desc__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">eval_env</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">ModelDesc</span><span class="p">([</span><span class="n">Term</span><span class="p">([</span><span class="n">MyAlternativeFactor</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">side</span><span class="o">=</span><span class="s">&quot;left&quot;</span><span class="p">)])],</span>
                         <span class="p">[</span><span class="n">Term</span><span class="p">([</span><span class="n">MyAlternativeFactor</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">side</span><span class="o">=</span><span class="s">&quot;right&quot;</span><span class="p">)])])</span>


<span class="n">my_formula</span> <span class="o">=</span> <span class="n">MyAlternativeFormula</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
<span class="c"># This ModelDesc has both a left and a right side, so use dmatrices()</span>
<span class="n">dmatrices</span><span class="p">(</span><span class="n">my_formula</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>
</pre></div>
</div>
<p>The only downside to this approach is that you can&#8217;t control the names
of individual columns. (A workaround would be to create multiple terms,
each with its own factor that returns a different piece of your
overall matrix.) If this is a problem for you, though, then let&#8217;s talk
&#8211; we can probably work something out.</p>
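<p>The multiple-terms workaround might be sketched as follows. This is only an illustration under stated assumptions: <code class="docutils literal"><span class="pre">ColumnFactor</span></code>, <code class="docutils literal"><span class="pre">ToyFormula</span></code>, <code class="docutils literal"><span class="pre">get_column()</span></code> and <code class="docutils literal"><span class="pre">factors()</span></code> are hypothetical names, the toy formula just looks columns up in a dict, and the <code class="docutils literal"><span class="pre">memorize_passes_needed</span></code> signature is assumed to follow the factor protocol. The Patsy-specific wrapping is left as a comment so that the sketch runs without Patsy installed.</p>

```python
class ColumnFactor(object):
    """Hypothetical factor that returns a single named column."""

    def __init__(self, alternative_formula, side, column_name):
        self.alternative_formula = alternative_formula
        self.side = side
        self.column_name = column_name

    def name(self):
        # With one factor per term, this name should end up as the column label.
        return self.column_name

    def memorize_passes_needed(self, state, eval_env):
        return 0

    def eval(self, state, data):
        return self.alternative_formula.get_column(
            self.side, self.column_name, data)


class ToyFormula(object):
    """Stand-in for an alternative formula implementation (hypothetical)."""

    columns = {"left": ["y"], "right": ["x1", "x2"]}

    def get_column(self, side, column_name, data):
        # A real implementation would compute this piece of its matrix.
        return list(data[column_name])

    def factors(self, side):
        return [ColumnFactor(self, side, name) for name in self.columns[side]]

    # With Patsy available, __patsy_get_model_desc__ could wrap each
    # factor in its own Term, for example:
    #     ModelDesc([Term([f]) for f in self.factors("left")],
    #               [Term([f]) for f in self.factors("right")])


formula = ToyFormula()
right_factors = formula.factors("right")
right_names = [f.name() for f in right_factors]
data = {"y": [0, 1], "x1": [1, 2], "x2": [3, 4]}
first_column = right_factors[0].eval({}, data)
```

<p>Because each term then contains exactly one single-column factor, the name of each column is determined by the corresponding factor&#8217;s <code class="docutils literal"><span class="pre">name()</span></code> method rather than by a shared side label.</p>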
</div>
</div>


          </div>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="footer" role="contentinfo">
        &copy; Copyright 2011-2015, Nathaniel J. Smith.
      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.3.1.
    </div>
  </body>
</html>