/usr/share/jed/doc/txt/dfa.txt is in jed-common 1:0.99.19-2.1.
This file is owned by root:root, with mode 0o644.
DFA-based Syntax Highlighting
=============================
DFA highlighting is an alternative syntax highlighting mechanism to
Jed's original simple one. It's a lot more powerful, but it takes up
more memory and makes the executable larger if it's compiled in.
It's also more difficult to design new highlighting modes for.
DFA highlighting works *alongside* Jed's standard highlighting system in
the sense that the user can choose which scheme is to be used on a
mode-by-mode basis.
Some examples of what DFA highlighting can do that the standard scheme
can't are:
- Correct separation of numeric tokens in C. The text `2+3' would
get highlighted as a single number by the old scheme, since `+' is
a valid numeric character (when preceded by an E). DFA
highlighting can spot that the `+' is not a valid numeric
character in _this_ instance, though, and correctly interpret it
as an operator.
- Enhanced HTML mode, in which tags containing mismatched quotes
(such as `<a href="filename>') can be highlighted in a different
colour from correctly formed tags.
- Much improved Perl mode, in general.
- PostScript mode, in which up to two levels of nested parentheses
can be detected inside a string constant.
Limitations
-----------
- Jed's DFA highlight rules work only on a line-by-line basis. Using
the DFA scheme, it is impossible to highlight multiline comments or
string literals.
- DFA rules replace the "traditional" highlighting scheme, so you
cannot combine highlighting of multiline tokens with
regular-expression based highlight rules.
Using DFA Highlighting
----------------------
If Jed is compiled with DFA highlighting enabled, it will define the
S-Lang preprocessor name `HAS_DFA_SYNTAX', and also define three
extra functions: `dfa_enable_highlight_cache', `dfa_define_highlight_rule'
and `dfa_build_highlight_table'. These are documented in Jed's ordinary
function help.
To implement a DFA highlighting scheme, you define a number of
highlighting rules using `dfa_define_highlight_rule', and then enable
the scheme using `dfa_build_highlight_table', which will build the
internal data structure (DFA table) that is actually used to do the
highlighting.
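As a minimal sketch of that sequence, assuming the mode has already
created a syntax table with the hypothetical name `demo', the calls
might look like this. The rules and colour choices are illustrative
only and are not taken from any mode shipped with Jed.

```slang
#ifdef HAS_DFA_SYNTAX
% "demo" is a hypothetical syntax table name; it is assumed to have
% been created by the mode before these calls are made.

% Decimal integers, shown in the colour assigned to numbers.
dfa_define_highlight_rule ("[0-9]+", "number", "demo");

% A few single-character operators.  Special characters are escaped,
% and each backslash is doubled because this is an S-Lang string.
dfa_define_highlight_rule ("[\\+\\-\\*/=<>]", "operator", "demo");

% Brackets, commas and semicolons as delimiters.
dfa_define_highlight_rule ("[\\(\\)\\[\\],;]", "delimiter", "demo");

% Build the DFA table from the rules defined above.
dfa_build_highlight_table ("demo");
#endif
```

The `#ifdef' guard keeps the file loadable in a Jed that was built
without DFA support.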
Generating the DFA table can take a long time, especially for complex
modes such as C (or even more so, PostScript). For this reason, the
DFA tables can be cached by the use of `dfa_enable_highlight_cache'.
You call this routine before defining any highlighting rules. If the
cache file exists, the DFA table will be loaded directly from it, and
the subsequent calls to `dfa_define_highlight_rule' and
`dfa_build_highlight_table' will do nothing. If the cache file does
not exist, then after Jed has built the DFA table it will attempt to
create the cache.
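The ordering described above might be sketched as follows. The table
name `demo' and the cache file name `demo.dfa' are hypothetical, and
the (file, table) argument order shown for
`dfa_enable_highlight_cache' is an assumption that should be checked
against the function help.

```slang
#ifdef HAS_DFA_SYNTAX
% Must come first: if the cache file (hypothetical name "demo.dfa")
% is found, the table is loaded from it and the calls below do nothing.
dfa_enable_highlight_cache ("demo.dfa", "demo");

% Only evaluated into a table when no cache file was found.
dfa_define_highlight_rule ("[0-9]+", "number", "demo");

% Builds the table; afterwards Jed attempts to create the cache file.
dfa_build_highlight_table ("demo");
#endif
```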
Cache files are searched along the set of paths specified by the
`Jed_Highlight_Cache_Path' variable. The default value for
`Jed_Highlight_Cache_Path' is $JED_ROOT/lib, which assumes that cache
files were created when the editor was installed via the optional
installation step
jed -batch -l preparse
On systems such as Unix, the average user has no permission to create
cache files in $JED_ROOT/lib. Hence, if the necessary cache files
were not created during the installation step, it may be advantageous
for the user to set the `Jed_Highlight_Cache_Dir' variable to a
directory where cache files may be created.
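A user's startup file could set this up roughly as shown below. The
directory is a hypothetical example and must already exist. Adding it
to `Jed_Highlight_Cache_Path' as well, and the use of a comma as the
path separator, are assumptions made for this sketch rather than
documented requirements.

```slang
% Hypothetical per-user cache directory; create it before starting Jed.
Jed_Highlight_Cache_Dir = "/home/user/.jed/dfa";

% Assumption: the search path is a comma-separated list, so prepend
% the new directory to make cache files created there findable later.
Jed_Highlight_Cache_Path = Jed_Highlight_Cache_Dir + "," + Jed_Highlight_Cache_Path;
```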
Highlighting Rules
------------------
Highlighting rules are basically regular expressions. You define
regular-expression patterns for the objects that you want to
highlight, and specify the colour in which each object should be
highlighted. Colours are specified as `keyword', `normal',
`operator', `delimiter' and so on.
A sample highlighting rule, from C mode, might look like this:
dfa_define_highlight_rule("0[xX][0-9A-Fa-f]*[LlUu]*", "number", "C");
This specifies that in the syntax table called `C', any object
matching the regular expression `0[xX][0-9A-Fa-f]*[LlUu]*' should be
highlighted in the colour assigned to numbers. This regular
expression matches C hexadecimal integer constants: a zero, an X (of
either case), a sequence of hex digits, and optionally L or U
suffixes of either case on the end (for `long' or `unsigned').
Regular expression syntax is as follows:
- A normal character matches itself. Normal characters include
everything except special characters, which are ^ $ | * + ? [ ] -
. ( ) and the backslash \.
- A character class [abcde] matches any one of the characters inside
it. Ranges can be specified with a dash, e.g. [a-e]. A character
class starting with a caret matches any single character _not_
inside it, e.g. [^a-e] matches anything except a, b, c, d or e.
- A period (.) matches any character.
- A character, or a character class, or a regular expression in
parentheses, can be followed by *, + or ?. If followed by * then
it will match any number of occurrences of the original
expression, including none at all; followed by + it will match any
number *not* including zero; followed by ? it will match zero or
one.
- Two regular expressions separated by | will match either one.
- A caret at the beginning of an expression causes it to match only
when at the beginning of a line. A dollar at the end causes it to
match only when at the end.
- If you want to match one of the special characters, you can remove
its special properties by placing a backslash before it. This
includes the backslash itself.
So, for example:
apple|banana        matches `apple' or `banana'
(apple|banana)?     matches `apple', `banana' or nothing
b[ae]d              matches `bad' or `bed'
[a-e]               matches `a', `b', `c', `d' or `e'
[a\-e]              matches `a', `-' or `e'
^#include           matches `#include', but only at the start
                    of a line
'[^']*'             matches any sequence of non-single-quotes
                    with a single-quote at each end, such as
                    a Pascal string literal
'[^']*$             matches any sequence of non-single-quotes
                    with a single-quote at the beginning and
                    occurring at the end of a line, such as
                    a Pascal string literal that the user has
                    not finished typing
To define a highlight rule, you think up the regular expression,
express it as an S-Lang string literal, and include it in a call to
`dfa_define_highlight_rule'.
CAUTION: S-Lang strings obey the same syntax as C strings. This
means that if you need a double quote or a backslash as part of your
regular expression, you have to put *another* backslash before it
when you write it as an S-Lang string. So the fifth example above
might read
dfa_define_highlight_rule ("[a\\-e]", ...);
with the backslash doubled. SLang-2 introduced a suffix notation for literal
strings, so now it is possible to avoid the doubling of backslashes by use
of the do-not-expand 'R' suffix. The above example can be written as
dfa_define_highlight_rule ("[a\-e]"R, ...);
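The same choice applies to any rule that escapes a special character.
As an illustrative sketch (the regular expression is an example, not
a rule quoted from C mode), a pattern with an escaped period could be
written in either notation:

```slang
% Either line defines the same rule; use one of them, not both.

% Ordinary string literal: the backslash before the period is doubled.
dfa_define_highlight_rule ("[0-9]+\\.[0-9]+", "number", "C");

% SLang-2 do-not-expand literal: the backslash is written once.
% dfa_define_highlight_rule ("[0-9]+\.[0-9]+"R, "number", "C");
```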
Extra Magical Bits
------------------
The second argument to `dfa_define_highlight_rule' is a colour name.
This colour name can be prefixed by a few special letters for extra
magical effects:
`Q' causes the match to be _quick_. Most of the time, the regular
expression matcher finds the _longest_ string starting at the
current position that matches something. A `Q' rule will match with
far higher priority, and will match the _shortest_ string possible.
For example, consider the expression `/\*.*\*/' which matches `/*',
then any sequence of characters, then `*/' - a one-line C comment.
The difficulty is that C comments do not nest, and a sequence like
/* comment */ not comment */
should only be highlighted as a comment up to the _first_ `*/'. The
normal longest-match heuristic will highlight the _whole_ thing as a
comment, which is wrong. You can get round this by defining the rule
as quick, like this:
dfa_define_highlight_rule("/\\*.*\\*/", "Qcomment", "C");
`P' denotes a _preprocessor-type_ rule. Preprocessor-type rules
state that not only should the matched text be given the specified
colour, but so should everything on the rest of the line, _except_
things in the comment colour. This allows comments on preprocessor
lines, with quite a high level of sophistication: defining, in C mode,
dfa_define_highlight_rule("^[ \t]*#", "PQpreprocess", "C");
will cause the following effects:
#define FLAG                  comes up in preprocessor colour
#define FLAG /* comment */    the comment is highlighted correctly
#include "/*sdfs*/"           the comment does _not_ get seen!
Finally, `K' defines a _keyword_ rule. In a keyword rule, the
matched text is compared to the active keyword tables for the syntax
scheme, and given the correct keyword colour if a match is found.
If no keyword matches the text, the text will be highlighted in the
colour that was _actually_ specified in the rule.
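A keyword rule for C-like identifiers might be sketched as below. It
assumes that the keyword tables for the `C' syntax scheme have already
been filled in by the mode. The pattern is an illustration, not a
quotation from C mode.

```slang
% Match identifiers.  A match that appears in the active keyword
% tables gets the keyword colour; any other identifier falls back to
% the colour named after the `K' prefix, here `normal'.
dfa_define_highlight_rule ("[A-Za-z_][A-Za-z_0-9]*", "Knormal", "C");
```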
Further Reading
---------------
If you want to design _really_ complicated highlighting schemes, a
full understanding of the principles and theory behind the DFA scheme
may be helpful. Most books on compiler theory give a good discussion
of this.