This file is indexed.

/usr/share/doc/flex-old-doc/flex_18.html is in flex-old-doc 2.5.4a-10.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
html2
<html>
<!-- Created on May 24, 2011 by texi2html 1.82
texi2html was written by: 
            Lionel Cons <Lionel.Cons@cern.ch> (original author)
            Karl Berry  <karl@freefriends.org>
            Olaf Bachmann <obachman@mathematik.uni-kl.de>
            and many others.
Maintained by: Many creative people.
Send bugs and suggestions to <texi2html-bug@nongnu.org>
-->
<head>
<title>Flex - a scanner generator: 18. Performance considerations</title>

<meta name="description" content="Flex - a scanner generator: 18. Performance considerations">
<meta name="keywords" content="Flex - a scanner generator: 18. Performance considerations">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="texi2html 1.82">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
pre.display {font-family: serif}
pre.format {font-family: serif}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: serif; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: serif; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.roman {font-family:serif; font-weight:normal;}
span.sansserif {font-family:sans-serif; font-weight:normal;}
ul.toc {list-style: none}
-->
</style>


</head>

<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">

<a name="Performance"></a>
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="flex_17.html#Options" title="Previous section in reading order"> &lt; </a>]</td>
<td valign="middle" align="left">[<a href="flex_19.html#C_002b_002b" title="Next section in reading order"> &gt; </a>]</td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left">[<a href="flex_17.html#Options" title="Beginning of this chapter or previous chapter"> &lt;&lt; </a>]</td>
<td valign="middle" align="left">[<a href="flex.html#Top" title="Up section"> Up </a>]</td>
<td valign="middle" align="left">[<a href="flex_19.html#C_002b_002b" title="Next chapter"> &gt;&gt; </a>]</td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left">[<a href="flex.html#Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="flex_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[Index]</td>
<td valign="middle" align="left">[<a href="flex_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<a name="Performance-considerations"></a>
<h1 class="chapter">18. Performance considerations</h1>

<p>The main design goal of <code>flex</code> is that it generate
high-performance scanners.  It has been optimized for dealing
well with large sets of rules.  Aside from the effects on
scanner speed of the table compression &lsquo;<samp>-C</samp>&rsquo; options outlined
above, there are a number of options/actions which degrade
performance.  These are, from most expensive to least:
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">REJECT
%option yylineno
arbitrary trailing context

pattern sets that require backing up
%array
%option interactive
%option always-interactive

'^' beginning-of-line operator
yymore()
</pre></td></tr></table>

<p>with the first three all being quite expensive and the
last two being quite cheap.  Note also that &lsquo;<samp>unput()</samp>&rsquo; is
implemented as a routine call that potentially does quite
a bit of work, while &lsquo;<samp>yyless()</samp>&rsquo; is a quite-cheap macro; so
if just putting back some excess text you scanned, use
&lsquo;<samp>yyless()</samp>&rsquo;.
</p>
<p><code>REJECT</code> should be avoided at all costs when performance is
important.  It is a particularly expensive option.
</p>
<p>Getting rid of backing up is messy and often may be an
enormous amount of work for a complicated scanner.  In
principal, one begins by using the &lsquo;<samp>-b</samp>&rsquo; flag to generate a
&lsquo;<tt>lex.backup</tt>&rsquo; file.  For example, on the input
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">%%
foo        return TOK_KEYWORD;
foobar     return TOK_KEYWORD;
</pre></td></tr></table>

<p>the file looks like:
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">State #6 is non-accepting -
 associated rule line numbers:
       2       3
 out-transitions: [ o ]
 jam-transitions: EOF [ \001-n  p-\177 ]

State #8 is non-accepting -
 associated rule line numbers:
       3
 out-transitions: [ a ]
 jam-transitions: EOF [ \001-`  b-\177 ]

State #9 is non-accepting -
 associated rule line numbers:
       3
 out-transitions: [ r ]
 jam-transitions: EOF [ \001-q  s-\177 ]

Compressed tables always back up.
</pre></td></tr></table>

<p>The first few lines tell us that there&rsquo;s a scanner state
in which it can make a transition on an &rsquo;o&rsquo; but not on any
other character, and that in that state the currently
scanned text does not match any rule.  The state occurs
when trying to match the rules found at lines 2 and 3 in
the input file.  If the scanner is in that state and then
reads something other than an &rsquo;o&rsquo;, it will have to back up
to find a rule which is matched.  With a bit of
head-scratching one can see that this must be the state it&rsquo;s in
when it has seen &quot;fo&quot;.  When this has happened, if
anything other than another &rsquo;o&rsquo; is seen, the scanner will
have to back up to simply match the &rsquo;f&rsquo; (by the default
rule).
</p>
<p>The comment regarding State #8 indicates there&rsquo;s a problem
when &quot;foob&quot; has been scanned.  Indeed, on any character
other than an &rsquo;a&rsquo;, the scanner will have to back up to
accept &quot;foo&quot;.  Similarly, the comment for State #9
concerns when &quot;fooba&quot; has been scanned and an &rsquo;r&rsquo; does not
follow.
</p>
<p>The final comment reminds us that there&rsquo;s no point going
to all the trouble of removing backing up from the rules
unless we&rsquo;re using &lsquo;<samp>-Cf</samp>&rsquo; or &lsquo;<samp>-CF</samp>&rsquo;, since there&rsquo;s no
performance gain doing so with compressed scanners.
</p>
<p>The way to remove the backing up is to add &quot;error&quot; rules:
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">%%
foo         return TOK_KEYWORD;
foobar      return TOK_KEYWORD;

fooba       |
foob        |
fo          {
            /* false alarm, not really a keyword */
            return TOK_ID;
            }
</pre></td></tr></table>

<p>Eliminating backing up among a list of keywords can also
be done using a &quot;catch-all&quot; rule:
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">%%
foo         return TOK_KEYWORD;
foobar      return TOK_KEYWORD;

[a-z]+      return TOK_ID;
</pre></td></tr></table>

<p>This is usually the best solution when appropriate.
</p>
<p>Backing up messages tend to cascade.  With a complicated
set of rules it&rsquo;s not uncommon to get hundreds of
messages.  If one can decipher them, though, it often only
takes a dozen or so rules to eliminate the backing up
(though it&rsquo;s easy to make a mistake and have an error rule
accidentally match a valid token.  A possible future <code>flex</code>
feature will be to automatically add rules to eliminate
backing up).
</p>
<p>It&rsquo;s important to keep in mind that you gain the benefits
of eliminating backing up only if you eliminate <em>every</em>
instance of backing up.  Leaving just one means you gain
nothing.
</p>
<p><var>Variable</var> trailing context (where both the leading and
trailing parts do not have a fixed length) entails almost
the same performance loss as <code>REJECT</code> (i.e., substantial).
So when possible a rule like:
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">%%
mouse|rat/(cat|dog)   run();
</pre></td></tr></table>

<p>is better written:
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">%%
mouse/cat|dog         run();
rat/cat|dog           run();
</pre></td></tr></table>

<p>or as
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">%%
mouse|rat/cat         run();
mouse|rat/dog         run();
</pre></td></tr></table>

<p>Note that here the special &rsquo;|&rsquo; action does <em>not</em> provide any
savings, and can even make things worse (see Deficiencies
/ Bugs below).
</p>
<p>Another area where the user can increase a scanner&rsquo;s
performance (and one that&rsquo;s easier to implement) arises from
the fact that the longer the tokens matched, the faster
the scanner will run.  This is because with long tokens
the processing of most input characters takes place in the
(short) inner scanning loop, and does not often have to go
through the additional work of setting up the scanning
environment (e.g., <code>yytext</code>) for the action.  Recall the
scanner for C comments:
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">%x comment
%%
        int line_num = 1;

&quot;/*&quot;         BEGIN(comment);

&lt;comment&gt;[^*\n]*
&lt;comment&gt;&quot;*&quot;+[^*/\n]*
&lt;comment&gt;\n             ++line_num;
&lt;comment&gt;&quot;*&quot;+&quot;/&quot;        BEGIN(INITIAL);
</pre></td></tr></table>

<p>This could be sped up by writing it as:
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">%x comment
%%
        int line_num = 1;

&quot;/*&quot;         BEGIN(comment);

&lt;comment&gt;[^*\n]*
&lt;comment&gt;[^*\n]*\n      ++line_num;
&lt;comment&gt;&quot;*&quot;+[^*/\n]*
&lt;comment&gt;&quot;*&quot;+[^*/\n]*\n ++line_num;
&lt;comment&gt;&quot;*&quot;+&quot;/&quot;        BEGIN(INITIAL);
</pre></td></tr></table>

<p>Now instead of each newline requiring the processing of
another action, recognizing the newlines is &quot;distributed&quot;
over the other rules to keep the matched text as long as
possible.  Note that <em>adding</em> rules does <em>not</em> slow down the
scanner!  The speed of the scanner is independent of the
number of rules or (modulo the considerations given at the
beginning of this section) how complicated the rules are
with regard to operators such as &rsquo;*&rsquo; and &rsquo;|&rsquo;.
</p>
<p>A final example in speeding up a scanner: suppose you want
to scan through a file containing identifiers and
keywords, one per line and with no other extraneous
characters, and recognize all the keywords.  A natural first
approach is:
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">%%
asm      |
auto     |
break    |
&hellip; etc &hellip;
volatile |
while    /* it's a keyword */

.|\n     /* it's not a keyword */
</pre></td></tr></table>

<p>To eliminate the back-tracking, introduce a catch-all
rule:
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">%%
asm      |
auto     |
break    |
... etc ...
volatile |
while    /* it's a keyword */

[a-z]+   |
.|\n     /* it's not a keyword */
</pre></td></tr></table>

<p>Now, if it&rsquo;s guaranteed that there&rsquo;s exactly one word per
line, then we can reduce the total number of matches by a
half by merging in the recognition of newlines with that
of the other tokens:
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">%%
asm\n    |
auto\n   |
break\n  |
&hellip; etc &hellip;
volatile\n |
while\n  /* it's a keyword */

[a-z]+\n |
.|\n     /* it's not a keyword */
</pre></td></tr></table>

<p>One has to be careful here, as we have now reintroduced
backing up into the scanner.  In particular, while <em>we</em> know
that there will never be any characters in the input
stream other than letters or newlines, <code>flex</code> can&rsquo;t figure
this out, and it will plan for possibly needing to back up
when it has scanned a token like &quot;auto&quot; and then the next
character is something other than a newline or a letter.
Previously it would then just match the &quot;auto&quot; rule and be
done, but now it has no &quot;auto&quot; rule, only a &quot;auto\n&quot; rule.
To eliminate the possibility of backing up, we could
either duplicate all rules but without final newlines, or,
since we never expect to encounter such an input and
therefore don&rsquo;t how it&rsquo;s classified, we can introduce one
more catch-all rule, this one which doesn&rsquo;t include a
newline:
</p>
<table><tr><td>&nbsp;</td><td><pre class="example">%%
asm\n    |
auto\n   |
break\n  |
&hellip; etc &hellip;
volatile\n |
while\n  /* it's a keyword */

[a-z]+\n |
[a-z]+   |
.|\n     /* it's not a keyword */
</pre></td></tr></table>

<p>Compiled with &lsquo;<samp>-Cf</samp>&rsquo;, this is about as fast as one can get a
<code>flex</code> scanner to go for this particular problem.
</p>
<p>A final note: <code>flex</code> is slow when matching NUL&rsquo;s,
particularly when a token contains multiple NUL&rsquo;s.  It&rsquo;s best to
write rules which match <em>short</em> amounts of text if it&rsquo;s
anticipated that the text will often include NUL&rsquo;s.
</p>
<p>Another final note regarding performance: as mentioned
above in the section How the Input is Matched, dynamically
resizing <code>yytext</code> to accommodate huge tokens is a slow
process because it presently requires that the (huge) token
be rescanned from the beginning.  Thus if performance is
vital, you should attempt to match &quot;large&quot; quantities of
text but not &quot;huge&quot; quantities, where the cutoff between
the two is at about 8K characters/token.
</p>
<hr size="6">
<table cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="flex_17.html#Options" title="Beginning of this chapter or previous chapter"> &lt;&lt; </a>]</td>
<td valign="middle" align="left">[<a href="flex_19.html#C_002b_002b" title="Next chapter"> &gt;&gt; </a>]</td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left">[<a href="flex.html#Top" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="flex_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[Index]</td>
<td valign="middle" align="left">[<a href="flex_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<p>
 <font size="-1">
  This document was generated on <i>May 24, 2011</i> using <a href="http://www.nongnu.org/texi2html/"><i>texi2html 1.82</i></a>.
 </font>
 <br>

</p>
</body>
</html>