/usr/share/doc/javacc/doc/tokenmanager.html is in javacc-doc 5.0-5.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<!--
Copyright (c) 2006, Sun Microsystems, Inc.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the Sun Microsystems, Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
THE POSSIBILITY OF SUCH DAMAGE.
-->
<head>
<title>JavaCC: TokenManager MiniTutorial</title>
<!-- Changed by: Michael Van De Vanter, 14-Jan-2003 -->
</head>
<body bgcolor="#FFFFFF" >
<h1>JavaCC [tm]: TokenManager MiniTutorial</h1>
<p>
The JavaCC [tm] lexical specification is organized into a set of "lexical
states". Each lexical state is named with an identifier. There is a
standard lexical state called DEFAULT. The generated token manager is
at any moment in one of these lexical states. When the token manager
is initialized, it starts off in the DEFAULT state, by default. The
starting lexical state can also be specified as a parameter while
constructing a token manager object.
</p>
<p>
Each lexical state contains an ordered list of regular expressions;
the order is derived from the order of occurrence in the input file.
There are four kinds of regular expressions: SKIP, MORE, TOKEN, and
SPECIAL_TOKEN.
</p>
<p>
All regular expressions that occur as expansion units in the grammar
are considered to be in the DEFAULT lexical state and their order of
occurrence is determined by their position in the grammar file.
</p>
<p>
A token is matched as follows: All regular expressions in the current
lexical state are considered as potential match candidates. The
token manager consumes the maximum number of characters from the input
stream possible that match one of these regular expressions. That is,
the token manager prefers the longest possible match. If there are
multiple longest matches (of the same length), the regular expression
that is matched is the one with the earliest order of occurrence in
the grammar file.
</p>
<p>
As mentioned above, the token manager is in exactly one state at any
moment. At this moment, the token manager only considers the regular
expressions defined in this state for matching purposes. After a match,
one can specify an action to be executed as well as a new lexical
state to move to. If a new lexical state is not specified, the token
manager remains in the current state.
</p>
<p>
The regular expression kind specifies what to do when a regular
expression has been successfully matched:
</p>
<dl>
<dt>SKIP</dt>
<dd>Simply throw away the matched string (after executing any lexical action).</dd>
<dt>MORE</dt>
<dd>Continue (to whatever the next state is) taking the matched string along. T
his string will be a prefix of the new matched string.</dd>
<dt>TOKEN</dt>
<dd>Create a token using the matched string and send it to the parser (or any caller).</dd>
<dt>SPECIAL_TOKEN</dt>
<dd>Creates a special token that does not participate in parsing. Already described earlier.</dd>
</dl>
<p>
(The mechanism of accessing special tokens is at the end of this
page)
</p>
<p>
Whenever the end of file <EOF> is detected, it causes the creation of
an <EOF> token (regardless of the current state of the lexical
analyzer). However, if an <EOF> is detected in the middle of a match
for a regular expression, or immediately after a MORE regular
expression has been matched, an error is reported.
</p>
<p>
After the regular expression is matched, the lexical action is
executed. All the variables (and methods) declared in the
TOKEN_MGR_DECLS region (see below) are available here for use. In
addition, the variables and methods listed below are also available
for use.
</p>
<p>
Immediately after this, the token manager changes state to that
specified (if any).
</p>
<p>
After that the action specified by the kind of the regular expression
is taken (SKIP, MORE, ... ). If the kind is TOKEN, the matched token
is returned. If the kind is SPECIAL_TOKEN, the matched token is saved
to be returned along with the next TOKEN that is matched.
</p>
<hr />
<p>
The following variables are available for use within lexical actions:
</p>
<ol>
<li>
StringBuffer image (READ/WRITE):
<br />
<br />
"image" (different from the "image" field of the matched token) is a
StringBuffer variable that contains all the characters that have been
matched since the last SKIP, TOKEN, or SPECIAL_TOKEN. You are free
to make whatever changes you wish to it so long as you do not assign
it to null (since this variable is used by the generated token manager
also). If you make changes to "image", this change is passed on to
subsequent matches (if the current match is a MORE). The content of
"image" *does not* automatically get assigned to the "image" field
of the matched token. If you wish this to happen, you must explicitly
assign it in a lexical action of a TOKEN or SPECIAL_TOKEN regular
expression.
<br />
<br />
Example:
<pre>
<DEFAULT> MORE : { "a" : S1 }
<S1> MORE :
{
"b"
{ int l = image.length()-1; image.setCharAt(l, image.charAt(l).toUpperCase()); }
^1 ^2
: S2
}
<S2> TOKEN :
{
"cd" { x = image; } : DEFAULT
^3
}
</pre>
In the above example, the value of "image" at the 3 points marked by
^1, ^2, and ^3 are:
<pre>
At ^1: "ab"
At ^2: "aB"
At ^3: "aBcd"
</pre>
</li>
<li> int lengthOfMatch (READ ONLY):
<br />
<br />
This is the length of the current match (is not cumulative over MORE's).
See example below. You should not modify this variable.
<br />
<br />
Example:
<br />
<br />
Using the same example as above, the values of "lengthOfMatch" are:
<pre>
At ^1: 1 (the size of "b")
At ^2: 1 (does not change due to lexical actions)
At ^3: 2 (the size of "cd")
</pre>
</li>
<li> int curLexState (READ ONLY):
<br />
<br />
This is the index of the current lexical state. You should not modify
this variable. Integer constants whose names are those of the lexical
state are generated into the ...Constants file, so you can refer to
lexical states without worrying about their actual index value.
</li>
<li> inputStream (READ ONLY):
<br />
<br />
This is an input stream of the appropriate type (one of
ASCII_CharStream, ASCII_UCodeESC_CharStream, UCode_CharStream, or
UCode_UCodeESC_CharStream depending on the values of options
UNICODE_INPUT and JAVA_UNICODE_ESCAPE). The stream is currently at
the last character consumed for this match. Methods of inputStream
can be called. For example, getEndLine and getEndColumn can be called
to get the line and column number information for the current match.
inputStream may not be modified.
</li>
<li> Token matchedToken (READ/WRITE):
<br />
<br />
This variable may be used only in actions associated with TOKEN and
SPECIAL_TOKEN regular expressions. This is set to be the token that
will get returned to the parser. You may change this variable and
thereby cause the changed token to be returned to the parser instead
of the original one. It is here that you can assign the value of
variable "image" to "matchedToken.image". Typically that's how your
changes to "image" has effect outside the lexical actions.
<br />
<br />
Example:
<br />
<br />
If we modify the last regular expression specification of the
above example to:
<pre>
<S2> TOKEN :
{
"cd" { matchedToken.image = image.toString(); } : DEFAULT
}
</pre>
Then the token returned to the parser will have its ".image" field
set to "aBcd". If this assignment was not performed, then the
".image" field will remain as "abcd".
</li>
<li> void SwitchTo(int):
<br />
<br />
Calling this method switches you to the specified lexical state. This
method may be called from parser actions also (in addition to being
called from lexical actions). However, care must be taken when using
this method to switch states from the parser since the lexical
analysis could be many tokens ahead of the parser in the presence of
large lookaheads. When you use this method within a lexical action,
you must ensure that it is the last statement executed in the action
(otherwise, strange things could happen). If there is a state change
specified using the ": state" syntax, it overrides all switchTo calls,
hence there is no point having a switchTo call when there is an
explicit state change specified. In general, calling this method
should be resorted to only when you cannot do it any other way. Using
this method of switching states also causes you to lose some of the
semantic checking that JavaCC does when you use the standard syntax.
</li>
</ol>
<hr />
<p>
Lexical actions have access to a set of class level declarations.
These declarations are introduced within the JavaCC file using the
following syntax:
</p>
<pre>
token_manager_decls ::=
"TOKEN_MGR_DECLS" ":"
"{" java_declarations_and_code "}"
</pre>
<p>
These declarations are accessible from all lexical actions.
</p>
<h2>Examples</h2>
<h3>Example 1: Comments</h3>
<pre>
SKIP :
{
"/*" : WithinComment
}
<WithinComment> SKIP :
{
"*/" : DEFAULT
}
<WithinComment> MORE :
{
<~[]>
}
</pre>
<h3>Example 2: String Literals with actions to print the length of the string</h3>
<pre>
TOKEN_MGR_DECLS :
{
int stringSize;
}
MORE :
{
"\"" {stringSize = 0;} : WithinString
}
<WithinString> TOKEN :
{
<STRLIT: "\""> {System.out.println("Size = " + stringSize);} : DEFAULT
}
<WithinString> MORE :
{
<~["\n","\r"]> {stringSize++;}
}
</pre>
<h2>How special tokens are sent to the parser</h2>
<p>
Special tokens are like tokens, except that they are permitted to
appear anywhere in the input file (between any two tokens). Special
tokens can be specified in the grammar input file using the reserved
word "SPECIAL_TOKEN" instead of "TOKEN" as in:
</p>
<pre>
SPECIAL_TOKEN :
{
<SINGLE_LINE_COMMENT: "//" (~["\n","\r"])* ("\n"|"\r"|"\r\n")>
}
</pre>
<p>
Any regular expression defined to be a SPECIAL_TOKEN may be accessed
in a special manner from user actions in the lexical and grammar
specifications. This allows these tokens to be recovered during
parsing while at the same time these tokens do not participate in the
parsing.
</p>
<p>
JavaCC has been bootstrapped to use this feature to automatically
copy relevant comments from the input grammar file into the generated
files.
</p>
<p>
Details:
</p>
<p>
The class Token now has an additional field:
</p>
<pre>
Token specialToken;
</pre>
<p>
This field points to the special token immediately prior to the
current token (special or otherwise). If the token immediately prior
to the current token is a regular token (and not a special token),
then this field is set to null. The "next" fields of regular tokens
continue to have the same meaning - i.e., they point to the next
regular token except in the case of the EOF token where the "next"
field is null. The "next" field of special tokens point to the
special token immediately following the current token. If the token
immediately following the current token is a regular token, the "next"
field is set to null.
</p>
<p>
This is clarified by the following example. Suppose you wish to print
all special tokens prior to the regular token "t" (but only those that
are after the regular token before "t"):
</p>
<pre>
if (t.specialToken == null) return;
// The above statement determines that there are no special tokens
// and returns control to the caller.
Token tmp_t = t.specialToken;
while (tmp_t.specialToken != null) tmp_t = tmp_t.specialToken;
// The above line walks back the special token chain until it
// reaches the first special token after the previous regular
// token.
while (tmp_t != null) {
System.out.println(tmp_t.image);
tmp_t = tmp_t.next;
}
// The above loop now walks the special token chain in the forward
// direction printing them in the process.
</pre>
</body>
</html>
|