/usr/share/doc/styx-doc/html/styx-5.html is in styx-doc 2.0.1-1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.66">
<TITLE>The Styx Handbook: Mapping Trees to Terms</TITLE>
<LINK HREF="styx-6.html" REL=next>
<LINK HREF="styx-4.html" REL=previous>
<LINK HREF="styx.html#toc5" REL=contents>
</HEAD>
<BODY>
<A HREF="styx-6.html">Next</A>
<A HREF="styx-4.html">Previous</A>
<A HREF="styx.html#toc5">Contents</A>
<HR>
<H2><A NAME="s5">5.</A> <A HREF="styx.html#toc5">Mapping Trees to Terms</A></H2>
<P>As outlined in the introduction, it is not only an advantage for a proper interface on a derivation tree
to use an (abstract) depth grammar, but also to switch from grammatical to algebraic notions. Basically
by this, non-terminals are mapped onto (abstract data-)<EM>types</EM> and productions are mapped onto
<EM>functions</EM>. Words in the domain of languages became <EM>terms</EM>. Grammars are treated as
<EM>signatures</EM> (loosely speaking "header files") of term algebras, then. This is far more than an
overall renaming, but a transition to a different more appropriate concept with different tools and
properties.</P>
<P>Within Styx, the transition from the concrete (surface) grammar to the corresponding term algebra is done
in one step, and one final outcome is the C interface of a concrete language. The abstract grammar is
somewhat bypassed, see the notes at the end of the section.</P>
<P>This section mainly defines how we derive the term algebra from the concrete grammar. Having this, the
C interface can be explained in more detail.</P>
<H2><A NAME="ss5.1">5.1</A> <A HREF="styx.html#toc5.1">Well-formed productions</A>
</H2>
<P>Before we can do our transformation, we have to place some requirements onto the concrete grammar first.
These conditions are tested by the Styx system and non-well-formed productions are diagnosed.</P>
<P>Within this and the following subsections, we ignore any keyword members in the production bodies. This
may or may not be indicated. Further, we treat the individual production rules (with keywords ignored)
as predicates.</P>
<P>A production is well-formed if it belongs to one of the following groups:</P>
<P>
<UL>
<LI><CODE>let X :nil#*:</CODE> where the production name starts with "nil", optionally
followed by a natural number, and the production contains
no (identifier) members.</LI>
<LI><CODE>let X :cons#*: Y Z</CODE> where the production name is "cons", optionally numbered,
and contains exactly two (identifier) members.</LI>
<LI><CODE>let X :ign#+ : Y</CODE> where the production name starts with "ign"i, followed by
a natural number and the production contains exactly
one (identifier) member.</LI>
<LI><CODE>let X :name : X1 .. Xn</CODE> where "name" does not start with "nil", "cons" or "ign".
No restriction apply to the production members.
</LI>
<LI><EM>EBNF extension (styx version >= 1.8):</EM></LI>
<LI><CODE>let X :name : X1 .. Xn</CODE> where "name" is not "none", "some" or
starts with "nil", "cons", "ign".
No restriction apply to the production members.</LI>
<LI><CODE>let X :none :</CODE> where the production contains no (identifier) members.</LI>
<LI><CODE>let X :some : Y</CODE>where the production contains exactly one (identifier) member.
</LI>
</UL>
</P>
<P>This grouping serves two purposes. The first two groups will be used to derive list-like productions,
while the "ign" production is used to define identity productions. The later typically occur with
expressions that have different levels binding strength or when likely classes of productions are excluded
or included into certain contexts. When producing the abstract grammar, we consider these non-terminals to
be equivalent. </P>
<P>As examples, see the definitions of Exp, Exp1-2 in the introduction and Exp, ExpDyck, ExpQuot, Exp0-4 in
the lexical Styx grammar itself. For lists, the context free grammar of Styx (Dfns, Prds, Mbrs) are proper
examples.</P>
<P>The last two groups will be used to derive option-like productions.</P>
<H2><A NAME="ss5.2">5.2</A> <A HREF="styx.html#toc5.2">An induced congruence relation</A>
</H2>
<P>We get rid of the superficial distinction between the different non-terminals by means of a congruence
relation over the non-terminal names induced by the special production names "cons#*","nil#*" and "ign#+".</P>
<P>The congruence relation is defined as follows:
<UL>
<LI>X <=> X</LI>
<LI>X <=> Y --> Y <=> X</LI>
<LI>X <=> Y && Y <=> Z --> X <=> Z</LI>
<LI>let X :ign#+: Y --> X <=> Y</LI>
<LI>let X :cons#*: Y Z --> X <=> Z </LI>
<LI>X <=> Y && let X :id: X1 .. Xn && let Y :id: Y1 .. Yn && 1 <= i <= n --> Xi <=> Yi</LI>
</UL>
</P>
<P>While the first three formulas define an equality by stating the properties of the relation
(reflexive,symmetric,transitive), the next two specify the equations induced by the concrete grammar.
By this, two non-terminals are treated as equivalent when they appear both on the left and right side
of an "ign" production or on the left side and on the end of a "cons" production (in which case they
both mean lists of the same type later).</P>
<P>The final rule makes a congruence from this equality. It states, that if we have two equivalent
non-terminals, that both contain productions with the same name, then the equality is extended over
the bodies of that productions by pairing each identifier successively and concluding the equality of
the so-yielded pairs (ignoring keyword members).</P>
<H2><A NAME="ss5.3">5.3</A> <A HREF="styx.html#toc5.3">Classes and Representatives</A>
</H2>
<P>What we have gained so far is that we have evtl. grouped different (terminal and non-terminal) identifiers
into the classes introduced by the above congruence relation. Using this relation each identifier corresponds
to a set of its equivalents.
As an example, "Exp2" in the introduction example expands to the set [Exp2] = {Exp,Exp1,Exp2}.
These classes will later be mapped to the abstract types of the term algebra to be produced.</P>
<P>[X] = { Y | Y <=> X }</P>
<P>We assign to each of these classes a unique name by picking the lexically smallest identifier as the
representative of the class. In our example, this is "Exp". We denote the so chosen representative
of [X] by X^.</P>
<H2><A NAME="ss5.4">5.4</A> <A HREF="styx.html#toc5.4">Compatibility Conditions</A>
</H2>
<P>Having set up equivalent identifiers, we now come to the productions. Basically, all we have to do is to
merge the productions of the different equivalent non-terminals and to drop the "ign" productions. But this
is only possible under additional conditions. Basically, what can go wrong is, that by the congruence
terminal and non-terminals have been concluded to be equivalent, that we cannot merge productions with same
names and different numbers of (identifier) members, and that lists would contain additional non-list
productions.</P>
<P>This leads to the following conditions:
<UL>
<LI>X <=> Y --> Type(X) = Type(Y), where Type(Z) = { terminal, nonterminal }</LI>
<LI>let X :id: X1 .. Xm && let Y :id: Y1 .. Yn && X <=> Y --> m = n</LI>
<LI>(let X :nil#*: || let X :cons#*: A B) -->
not exists P,prod: P <=> X && let P :prod: c && prod not in { ign#+, nil#*, cons#* }
</LI>
<LI><EM>EBNF extension (styx version >= 1.8):</EM></LI>
<LI>(let X^ :none: a || let X^ :some: b)
--> not exists P: P = let X^ :id: c && id =/= { ign#+, none, some }</LI>
<LI>(let X^ :none: a)
--> exists P: P = let X^ :some: b</LI>
<LI>(let X^ :some: a)
--> exists P: P = let X^ :none: b</LI>
</UL>
</P>
<P>While generating the abstract grammar, Styx will validate these compatibility conditions.</P>
<H2><A NAME="ss5.5">5.5</A> <A HREF="styx.html#toc5.5">Conversion to term algebras</A>
</H2>
<P>After all this preparation and conditions, we can finally convert the concrete grammar to a signature.</P>
<P>To do this, we map all non-terminals NT which does not have list-productions (those named "cons#*" or "nil#*")
or - optionally - option-productions (those named "none" or "some") to their representative names NT^.
Likely, all terminal names T are mapped to their representatives T^.
Collectively, these form the types of the algebra.</P>
<P>Every non-list and - optionally - non-option production (ignoring keywords) of the form "let X :prod: X1 .. Xn"
is mapped to a function "prod : |X1| .. |Xn| -> X^". "|Xi|" denotes here the right hand side translation of
the (non-)terminal names to types. The difference is, that we have to cope with list-production and - optionally -
option-production, which have been omitted earlier.
|X| is X^ if we have a non-list and - optionally - non-terminal or a terminal X.
If X is a non-terminal with a production "let A :cons#*: B C" and X <=> A, |X| is List(|B|).
(If we only have nil-productions, the translation is List(void)).
Optionally: If X is a non-terminal with a production "let A :some: B" and X <=> A, |X| is Option(|B|). </P>
<P>The set of the so-yielded functions forms the signature of the derived term algebra and what we finally get
as a data model for the (abstract) derivation tree is the initial term algebra that corresponds to this
signature.</P>
<P>To give another example for this derivation, here is the abstract of the Styx grammar itself:</P>
<P>
<PRE>
/* ------------------------------------------------------------------------ */
/* */
/* [styx.abs] Abstract Grammar */
/* */
/* ------------------------------------------------------------------------ */
LANGUAGE styx
TOKENS
Ide, Nat, Set, Seq
TYPES
styx = Start_Source(Source)
Source = root(OptNat, Ide, QlxDfn*, OptCfg)
OptCfg = non;
cfg(Dfn*, Conflict*)
QlxDfn = defd(Ide);
defn(QlxCat, QlxOpt, QlxGrp, Ide, QlxGrp, Exp);
igrp(Ide);
tgrp(Ide);
mgrp(Ide, Ide*);
xgrp(Ide)
QlxCat = comC;
indC;
letC;
tokC;
lanC;
ignC
QlxGrp = non;
pigrp;
pop;
igrp;
pgrp(Ide);
grp(Ide)
QlxOpt = ignca;
non
Exp = conc(Exp, Exp);
diff(Exp, Exp);
sequ(Seq);
plusn(Exp, Limit);
plus0(Exp);
dyck(Exp, Exp, Exp);
non;
opt(Exp);
range(Exp, Exp);
plus(Exp);
epat(Exp, Ide, Exp);
set(Set);
union(Exp, Exp);
quot(Exp, Exp);
ident(Ide);
star(Exp);
spat(Exp, Set, Exp)
OptNat = non;
nat(Nat)
Limit = range(Nat, OptNat);
ntime(Nat)
Dfn = defn(Cat, DfnOpt, Ide, Prd*)
Cat = letC;
bgnC
DfnOpt = non;
errnt
Lay = grp;
rec;
dft
Prd = prod(Lay, Ide, Mbr*)
Mbr = opt(Seq*, Mbr, Seq*);
dtok(Ide, Ide);
klst1(Seq*, Mbr, Seq*, Seq*);
tkm(Seq);
ntm(Ide);
klst0(Seq*, Mbr, Seq*, Seq*);
else
Conflict = defn(State, Token, Rule*)
State = nat(Nat);
ide(Ide);
seq(Seq)
Token = seq(Seq);
ide(Ide)
Rule = red(Ide, Ide)
</PRE>
</P>
<P>Two notes on notation: The form List(X) is denoted as X*, the form Option(X) as X?.
The functions are abbreviated for convenience extracting the result type,
so Exp = ... star(Exp) denotes the function star: Exp -> Exp. For constants, i.e.
functions with no parameters, the argument parenthesis are omitted, so "QlxOpt = non; ignca"
spells the two functions non: -> QlxOpt and ignca: -> QlxOpt.</P>
<P>There's an extra rule for the start production(s) one may deduce from the examples.</P>
<H2><A NAME="ss5.6">5.6</A> <A HREF="styx.html#toc5.6">A note on the implementation</A>
</H2>
<P>After all these definitions, we only have the mapping from a grammar to a signature. The mapping from the
concrete derivation tree onto corresponding terms is straight forward and will be informally explained
by having a look on the implementation.</P>
<P>One might expect that the derivation tree is copied to yield a term. But this is not the case. Instead, the
above introduced mapping has been carefully chosen to be done on the fly. So, what the Styx parser produces,
is the concrete derivation tree as shown in the introducing example. With delivering it, it's job is done.
The conventions defined in these section are implemented only within the C interface, which permits an
abstract access to the concrete derivation tree.</P>
<P>All the necessary normalization is done within the access functions, the "term destructors" of the
C interface. Looking closer at the structure of the derivation tree, one can already imagine what these
functions have to do.
Provided with a reference to the tree, they decent into it skipping every "ign" production they pass.
After this, they end at a normal production and have to check for the production identifier. If this is OK,
they start decomposing the children of the node into the result slot, thereby skipping all keywords and
comments they meet. That's all.</P>
<P>The advantage of doing so is, that while having a rather compact view on the derivation tree, complete source
information is still preserved and can be accessed from this abstract view whenever needed.</P>
<P>In practice, this interface have both been proven to be efficient as well during language design and
application. Often the design starts out with the abstract grammar finding an appropriate surface grammar,
or having the surface grammar already given, concentrates on extracting a proper abstract grammar, which can
be easily done just by assigning proper production names.</P>
<H2><A NAME="ss5.7">5.7</A> <A HREF="styx.html#toc5.7">Relation between the abstract grammar and the algebra.</A>
</H2>
<P>Only to prevent confusion between terms and words in abstract grammar which are sometimes loosely treated as
synonyms in the text, a few additional notes apply here.</P>
<P>When talking about terms, we're talking about values having specific types. These values are abstract in that
we do not offer details of their implementation through their interface. In fact, Styx can produce different
implementation of terms with an identical interface. Abstract data types are abstract with respect to their
implementation.</P>
<P>In contrary, abstract grammars are abstractions of the surface details of the concrete grammar. An abstract
grammar preserves the depth structure of the language, but simplifies the derivation tree by dropping
unnecessary details as done above, for example. One can easily see that the structure is preserved by
recognizing, that the mapping from the concrete derivation tree to it's abstract correspondent is a homomorphism.</P>
<P>The ".abs" file generated by Styx has both possible readings. On can read it as the abstract grammar as well
as the signature of the term algebra.</P>
<P>Referring to the first, we can write down words of the abstract grammar, too. The word "1+2*(3-4)/5" of the
introduction example would spell "add(1,div(mul(2,sub(3,4)),5))" in the abstract grammar. Superficially,
this looks precisely like a denoted term of the corresponding algebra.</P>
<HR>
<A HREF="styx-6.html">Next</A>
<A HREF="styx-4.html">Previous</A>
<A HREF="styx.html#toc5">Contents</A>
</BODY>
</HTML>
|