/usr/share/doc/happy/html/sec-glr-misc.html is in happy 1.19.4-1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 | <html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>3.4. Further information</title><link rel="stylesheet" type="text/css" href="fptools.css"><meta name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" href="index.html" title="Happy User Guide"><link rel="up" href="sec-glr.html" title="Chapter 3. Generalized LR Parsing"><link rel="prev" href="sec-glr-semantics.html" title="3.3. Including semantic results"><link rel="next" href="sec-AttributeGrammar.html" title="Chapter 4. Attribute Grammars"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">3.4. Further information</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="sec-glr-semantics.html">Prev</a> </td><th width="60%" align="center">Chapter 3. Generalized LR Parsing</th><td width="20%" align="right"> <a accesskey="n" href="sec-AttributeGrammar.html">Next</a></td></tr></table><hr></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sec-glr-misc"></a>3.4. Further information</h2></div></div></div><p>
Other useful information...
</p><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="sec-glr-misc-examples"></a>3.4.1. The GLR examples</h3></div></div></div><p>
The directory <code class="literal">examples/glr</code> contains several examples
from the small to the large. Please consult these or use them as a
base for your experiments.
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="sec-glr-misc-graphs"></a>3.4.2. Viewing forests as graphs</h3></div></div></div><p>
If you run the examples with <span class="application">GHC</span>, each
run will produce a file <code class="literal">out.daVinci</code>. This is a
graph in the format expected by the <span class="emphasis"><em>daVinci</em></span>
graph visualization tool.
(See <a class="ulink" href="http://www.informatik.uni-bremen.de/~davinci/" target="_top">http://www.informatik.uni-bremen.de/~davinci/</a>
for more information. Educational use licenses are currently
available without charge.)
</p><p>
We highly recommend looking at graphs of parse results - it really
helps to understand the results.
The graphs files are created with Sven Panne's library for
communicating with <span class="emphasis"><em>daVinci</em></span>, supplemented
with some extensions due to Callaghan. Copies of this code are
included in the examples directory, for convenience.
If you are trying to view large and complex graphs, contact Paul
Callaghan (there are tools and techniques to make the graphs more
manageable).
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="sec-glr-misc-applications"></a>3.4.3. Some Applications of GLR parsing</h3></div></div></div><p>
GLR parsing (and related techniques) aren't just for badly written
grammars or for things like natural language (NL) where ambiguity is
inescapable. There are applications where ambiguity can represent
possible alternatives in pattern-matching tasks, and the flexibility
of these parsing techniques and the resulting graphs support deep
analyses. Below, we briefly discuss some examples, a mixture from
our recent work and from the literature.
</p><div class="variablelist"><dl class="variablelist"><dt><span class="term">Gene sequence analysis</span></dt><dd><p>
Combinations of structures within gene sequences can be
expressed as a grammar, for example a "start" combination
followed by a "promoter" combination then the gene proper.
A recent undergraduate project has used this GLR implementation
to detect candiate matches in data, and then to filter these
matches with a mixture of local and global information.
</p></dd><dt><span class="term">Rhythmic structure in poetry</span></dt><dd><p>
Rhythmic patterns in (English) poetry obey certain rules,
and in more modern poetry can break rules in particular ways
to achieve certain effects. The standard rhythmic patterns
(eg. iambic pentameter) can be encoded as a grammar, and
deviations from the patterns also encoded as rules.
The neutral reading can be parsed with this grammar, to
give a forest of alternative matches. The forest can be
analysed to give a preferred reading, and to highlight
certain technical features of the poetry.
An undergraduate project in Durham has used this implementation
for this purpose, with promising results.
</p></dd><dt><span class="term">Compilers -- instruction selection</span></dt><dd><p>
Recent work has phrased the translation problem in
compilers from intermediate representation to an
instruction set for a given processor as a matching
problem. Different constructs at the intermediate
level can map to several combinations of machine
instructions. This knowledge can be expressed as a
grammar, and instances of the problem solved by
parsing. The parse forest represents competing solutions,
and allows selection of optimum solutions according
to various measures.
</p></dd><dt><span class="term">Robust parsing of ill-formed input</span></dt><dd><p>
The extra flexibility of GLR parsing can simplify parsing
of formal languages where a degree of `informality' is allowed.
For example, Html parsing. Modern browsers contain complex
parsers which are designed to try to extract useful information
from Html text which doesn't follow the rules precisely,
eg missing start tags or missing end tags.
Html with missing tags can be written as an ambiguous grammar,
and it should be a simple matter to extract a usable
interpretation from a forest of parses.
Notice the technique: we widen the scope of the grammar,
parse with GLR, then extract a reasonable solution.
This is arguably simpler than pushing an LR(1) or LL(1)
parser past its limits, and also more maintainable.
</p></dd><dt><span class="term">Natural Language Processing</span></dt><dd><p>
Ambiguity is inescapable in the syntax of most human languages.
In realistic systems, parse forests are useful to encode
competing analyses in an efficient way, and they also provide
a framework for further analysis and disambiguation. Note
that ambiguity can have many forms, from simple phrase
attachment uncertainty to more subtle forms involving mixtures
of word senses. If some degree of ungrammaticality is to be
tolerated in a system, which can be done by extending the
grammar with productions incorporating common forms of
infelicity, the degree of ambiguity increases further. For
systems used on arbitrary text, such as on newspapers,
it is not uncommon that many sentences permit several
hundred or more analyses. With such grammars, parse forest
techniques are essential.
Many recent NLP systems use such techniques, including
the Durham's earlier LOLITA system - which was mostly
written in Haskell.
</p></dd></dl></div></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="sec-glr-misc-workings"></a>3.4.4. Technical details</h3></div></div></div><p>
The original implementation was developed by Ben Medlock,
as his undergraduate final year project,
using ideas from Peter Ljungloef's Licentiate thesis
(see <a class="ulink" href="http://www.cs.chalmers.se/~peb/parsing" target="_top">http://www.cs.chalmers.se/~peb/parsing</a>, and
we recommend the thesis for its clear analysis of parsing
algorithms).
Ljungloef's version produces lists of parse trees, but Medlock
adapted this to produce an explicit graph containing parse structure
information. He also incorporated
the code into <span class="application">Happy</span>.
</p><p>
After Medlock's graduation, Callaghan extended the code to
incorporate semantic information, and made several improvements
to the original code, such as improved local packing and
support for hidden left recursion. The performance of the
code was significantly improved, after changes of representation
(eg to a chart-style data structure)
and technique. Medlock's code was also used in several student
projects, including analysis of gene sequences (Fischer) and
analysis of rhythmic patterns in poetry (Henderson).
</p><p>
The current code implements the standard GLR algorithm extended
to handle hidden left recursion. Such recursion, as in the grammar
below from Rekers [1992], causes the standard algorithm to loop
because the empty reduction <code class="literal">A -> </code> is always
possible and the LR parser will not change state. Alternatively,
there is a problem because an unknown (at the start of parsing)
number of <code class="literal">A</code>
items are required, to match the number of <code class="literal">i</code>
tokens in the input.
</p><pre class="programlisting">
S -> A Q i | +
A ->
</pre><p>
The solution to this is not surprising. Problematic recursions
are detected as zero-span reductions in a state which has a
<code class="literal">goto</code> table entry looping to itself. A special
symbol is pushed to the stack on the first such reduction,
and such reductions are done at most once for any token
alternative for any input position.
When popping from the stack, if the last token being popped
is such a special symbol, then two stack tails are returned: one
corresponding to a conventional pop (which removes the
symbol) and the other to a duplication of the special symbol
(the stack is not changed, but a copy of the symbol is returned).
This allows sufficient copies of the empty symbol to appear
on some stack, hence allowing the parse to complete.
</p><p>
The forest is held in a chart-style data structure, and this supports
local ambiguity packing (chart parsing is discussed in Ljungloef's
thesis, among other places).
A limited amount of packing of live stacks is also done, to avoid
some repetition of work.
</p><p>
[Rekers 1992] Parser Generation for Interactive Environments,
PhD thesis, University of Amsterdam, 1992.
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="sec-glr-misc-filter"></a>3.4.5. The <code class="option">--filter</code> option</h3></div></div></div><p>
You might have noticed this GLR-related option. It is an experimental
feature intended to restrict the amount of structure retained in the
forest by discarding everything not required for the semantic
results. It may or it may not work, and may be fixed in a future
release.
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="sec-glr-misc-limitations"></a>3.4.6. Limitations and future work</h3></div></div></div><p>
The parser supports hidden left recursion, but makes no attempt
to handle cyclic grammars that have rules which do not consume any
input. If you have a grammar like this, for example with rules like
<code class="literal">S -> S</code> or
<code class="literal">S -> A S | x; A -> empty</code>, the implementation will
loop until you run out of stack - but if it will happen, it often
happens quite quickly!
</p><p>
The code has been used and tested frequently over the past few years,
including being used in several undergraduate projects. It should be
fairly stable, but as usual, can't be guaranteed bug-free. One day
I will write it in Epigram!
</p><p>
If you have suggestions for improvements, or requests for features,
please contact Paul
Callaghan. There are some changes I am considering, and some
views and/or encouragement from users will be much appreciated.
Further information can be found on Callaghan's
<a class="ulink" href="http://www.dur.ac.uk/p.c.callaghan/happy-glr" target="_top">GLR parser
page</a>.
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="sec-glr-misc-acknowledgements"></a>3.4.7. Thanks and acknowledgements</h3></div></div></div><p>
Many thanks to the people who have used and tested this software
in its various forms, including Julia Fischer, James Henderson, and
Aarne Ranta.
</p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="sec-glr-semantics.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="sec-glr.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="sec-AttributeGrammar.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">3.3. Including semantic results </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> Chapter 4. Attribute Grammars</td></tr></table></div></body></html>
|