<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ -->
<head>
<title>ne’s manual: Regular Expressions</title>
<meta name="description" content="ne’s manual: Regular Expressions">
<meta name="keywords" content="ne’s manual: Regular Expressions">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
<link href="Command-Index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Reference.html#Reference" rel="up" title="Reference">
<link href="Automatic-Preferences.html#Automatic-Preferences" rel="next" title="Automatic Preferences">
<link href="Prefs.html#Prefs" rel="prev" title="Prefs">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.indentedblock {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
div.smalllisp {margin-left: 3.2em}
kbd {font-style:oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space:nowrap}
span.nolinebreak {white-space:nowrap}
span.roman {font-family:serif; font-weight:normal}
span.sansserif {font-family:sans-serif; font-weight:normal}
ul.no-bullet {list-style: none}
-->
</style>
</head>
<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<a name="Regular-Expressions"></a>
<div class="header">
<p>
Next: <a href="Automatic-Preferences.html#Automatic-Preferences" accesskey="n" rel="next">Automatic Preferences</a>, Previous: <a href="Menus.html#Menus" accesskey="p" rel="prev">Menus</a>, Up: <a href="Reference.html#Reference" accesskey="u" rel="up">Reference</a> [<a href="Command-Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="Regular-Expressions-1"></a>
<h3 class="section">3.8 Regular Expressions</h3>
<a name="index-Regular-Expressions"></a>
<p>Regular expressions are a powerful way of specifying complex search and
replace operations. <code>ne</code> supports the full regular expression
syntax on US-ASCII and 8-bit buffers, but has to impose a restriction on
character sets when searching in UTF-8 text. See <a href="UTF_002d8-Support.html#UTF_002d8-Support">UTF-8 Support</a>.
</p>
<a name="Syntax-1"></a>
<h4 class="subsection">3.8.1 Syntax</h4>
<p>The following section is taken (with minor modifications) from the GNU regular
expression library documentation and is Copyright © Free Software
Foundation.
</p>
<p>A regular expression describes a set of strings. The simplest case is one
that describes a particular string; for example, the string ‘<samp>foo</samp>’ when
regarded as a regular expression matches ‘<samp>foo</samp>’ and nothing else.
Nontrivial regular expressions use certain special constructs so that they
can match more than one string. For example, the regular expression
‘<samp>foo|bar</samp>’ matches either the string ‘<samp>foo</samp>’ or the string
‘<samp>bar</samp>’; the regular expression ‘<samp>c[ad]*r</samp>’ matches any of the strings
‘<samp>cr</samp>’, ‘<samp>car</samp>’, ‘<samp>cdr</samp>’, ‘<samp>caar</samp>’, ‘<samp>cadddar</samp>’ and all other
such strings with any number of ‘<samp>a</samp>’’s and ‘<samp>d</samp>’’s.
</p>
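<p>The idea of a regular expression as a description of a set of strings can be
tried out in a few lines. This is only a sketch: it uses Python's <code>re</code>
module as a stand-in (an assumption of this example, not the matcher <code>ne</code>
uses), and the helper name <code>describes</code> is made up for illustration.
</p>

```python
import re

# Python's re module stands in for ne's matcher here (an assumption of this
# sketch); fullmatch() asks whether the whole string belongs to the set.
def describes(pattern, text):
    return re.fullmatch(pattern, text) is not None

# Show which candidate strings belong to the set described by 'c[ad]*r'.
for s in ["cr", "car", "cdr", "caar", "cadddar", "cbr"]:
    print(s, describes(r"c[ad]*r", s))
```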
<p>Regular expressions have a syntax in which a few characters are special
constructs and the rest are <em>ordinary</em>. An ordinary character is a
simple regular expression which matches that character and nothing else. The
special characters are ‘<samp>$</samp>’, ‘<samp>^</samp>’, ‘<samp>.</samp>’, ‘<samp>*</samp>’, ‘<samp>+</samp>’,
‘<samp>?</samp>’, ‘<samp>[</samp>’, ‘<samp>]</samp>’, ‘<samp>(</samp>’, ‘<samp>)</samp>’, ‘<samp>|</samp>’ and ‘<samp>\</samp>’. Any other
character appearing in a regular expression is ordinary, unless a ‘<samp>\</samp>’
precedes it.
</p>
<p>For example, ‘<samp>f</samp>’ is not a special character, so it is ordinary,
and therefore ‘<samp>f</samp>’ is a regular expression that matches the string ‘<samp>f</samp>’
and no other string. (It does <em>not</em> match the string ‘<samp>ff</samp>’.) Likewise,
‘<samp>o</samp>’ is a regular expression that matches only ‘<samp>o</samp>’.
</p>
<p>Any two regular expressions <var>a</var> and <var>b</var> can be concatenated.
The result is a regular expression that matches a string if <var>a</var>
matches some amount of the beginning of that string and <var>b</var>
matches the rest of the string.
</p>
<p>As a simple example, we can concatenate the regular expressions
‘<samp>f</samp>’ and ‘<samp>o</samp>’ to get the regular expression ‘<samp>fo</samp>’,
which matches only the string ‘<samp>fo</samp>’. Still trivial.
</p>
<p>Note: special characters are treated as ordinary ones if they are in
contexts where their special meanings make no sense. For example,
‘<samp>*foo</samp>’ treats ‘<samp>*</samp>’ as ordinary since there is no preceding
expression on which the ‘<samp>*</samp>’ can act. It is poor practice to depend on
this behaviour; better to quote the special character anyway, regardless of
where it appears.
</p>
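<p>The advice to quote can be illustrated with another regular expression
implementation. The sketch below uses Python's <code>re</code> module (an
assumption of this example, not part of <code>ne</code>), which refuses an
unquoted leading ‘<samp>*</samp>’ instead of treating it as ordinary, while the
quoted form means the same thing everywhere. The helper name
<code>unquoted_is_accepted</code> is hypothetical.
</p>

```python
import re

# Quoted with a backslash, '*' is an ordinary character wherever it appears.
quoted = re.compile(r"\*foo")

def unquoted_is_accepted(pattern):
    # Python's re rejects a leading '*' outright (re.error) rather than
    # treating it as ordinary, so relying on that behaviour is not portable.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(quoted.fullmatch("*foo") is not None)
print(unquoted_is_accepted("*foo"))
```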
<p>The following are the characters and character sequences that have special
meaning within regular expressions. Any character not mentioned here is not
special; it stands for exactly itself for the purposes of searching and
matching.
</p>
<dl compact="compact">
<dt>‘<samp>.</samp>’</dt>
<dd><p>is a special character that matches any single character except a newline. Using
concatenation, we can make regular expressions like ‘<samp>a.b</samp>’, which matches
any three-character string which begins with ‘<samp>a</samp>’ and ends with
‘<samp>b</samp>’.
</p>
</dd>
<dt>‘<samp>*</samp>’</dt>
<dd><p>is not a construct by itself; it is a suffix, which means the preceding
regular expression is to be repeated as many times as possible. In
‘<samp>fo*</samp>’, the ‘<samp>*</samp>’ applies to the ‘<samp>o</samp>’, so ‘<samp>fo*</samp>’ matches
‘<samp>f</samp>’ followed by any number of ‘<samp>o</samp>’’s.
</p>
<p>The case of zero ‘<samp>o</samp>’’s is allowed: ‘<samp>fo*</samp>’ does match
‘<samp>f</samp>’.
</p>
<p>‘<samp>*</samp>’ always applies to the <em>smallest</em> possible preceding
expression. Thus, ‘<samp>fo*</samp>’ has a repeating ‘<samp>o</samp>’, not a repeating
‘<samp>fo</samp>’.
</p>
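<p>A short sketch of both points, again using Python's <code>re</code> module as
a stand-in (an assumption; <code>whole</code> is a made-up helper): the suffix
binds to the smallest preceding expression, and it repeats as many times as
possible.
</p>

```python
import re

def whole(pattern, text):
    # True when the pattern matches the entire text.
    return re.fullmatch(pattern, text) is not None

# 'fo*' repeats only the 'o'; '(fo)*' repeats the two-character group.
print(whole(r"fo*", "f"), whole(r"fo*", "fooo"), whole(r"fo*", "fofo"))
print(whole(r"(fo)*", "fofo"), whole(r"(fo)*", "fooo"))

# "As many times as possible": the match takes every available 'o'.
longest = re.match(r"fo*", "fooox").group(0)
print(longest)
```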
</dd>
<dt>‘<samp>+</samp>’</dt>
<dd><p>‘<samp>+</samp>’ is like ‘<samp>*</samp>’ except that at least one match for the preceding
pattern is required for ‘<samp>+</samp>’. Thus, ‘<samp>c[ad]+r</samp>’ does not match
‘<samp>cr</samp>’ but does match anything else that ‘<samp>c[ad]*r</samp>’ would match.
</p>
</dd>
<dt>‘<samp>?</samp>’</dt>
<dd><p>‘<samp>?</samp>’ is like ‘<samp>*</samp>’ except that it allows either zero or one match for
the preceding pattern. Thus, ‘<samp>c[ad]?r</samp>’ matches ‘<samp>cr</samp>’ or ‘<samp>car</samp>’
or ‘<samp>cdr</samp>’, and nothing else.
</p>
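<p>The three suffixes ‘<samp>*</samp>’, ‘<samp>+</samp>’ and ‘<samp>?</samp>’ can be
compared side by side on the same candidate strings. This is a sketch using
Python's <code>re</code> module (an assumption of this example); the helper
<code>accepted</code> is hypothetical.
</p>

```python
import re

def accepted(pattern, candidates):
    # Keep the candidates that the pattern matches in full.
    return [s for s in candidates if re.fullmatch(pattern, s)]

candidates = ["cr", "car", "cdr", "caar", "cadddar"]
star = accepted(r"c[ad]*r", candidates)      # zero or more
plus = accepted(r"c[ad]+r", candidates)      # one or more
optional = accepted(r"c[ad]?r", candidates)  # zero or one

print(star)
print(plus)
print(optional)
```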
</dd>
<dt>‘<samp>[ … ]</samp>’</dt>
<dd><p>‘<samp>[</samp>’ begins a <em>character set</em>, which is terminated by a ‘<samp>]</samp>’.
In the simplest case, the characters between the two form the set.
Thus, ‘<samp>[ad]</samp>’ matches either ‘<samp>a</samp>’ or ‘<samp>d</samp>’,
and ‘<samp>[ad]*</samp>’ matches any string of ‘<samp>a</samp>’’s and ‘<samp>d</samp>’’s
(including the empty string), from which it follows that
‘<samp>c[ad]*r</samp>’ matches ‘<samp>car</samp>’, <i>et cetera</i>.
</p>
<p>Character ranges can also be included in a character set, by writing two
characters with a ‘<samp>-</samp>’ between them. Thus, ‘<samp>[a-z]</samp>’ matches any
lower-case letter. Ranges may be intermixed freely with individual
characters, as in ‘<samp>[a-z$%.]</samp>’, which matches any lower-case letter or
‘<samp>$</samp>’, ‘<samp>%</samp>’ or period.
</p>
<p>Note that the usual special characters are not special any more inside a
character set. A completely different set of special characters exists
inside character sets: ‘<samp>]</samp>’, ‘<samp>-</samp>’ and ‘<samp>^</samp>’.
</p>
<p>To include a ‘<samp>]</samp>’ in a character set, you must make it
the first character. For example, ‘<samp>[]a]</samp>’ matches ‘<samp>]</samp>’ or ‘<samp>a</samp>’.
To include a ‘<samp>-</samp>’, you must use it in a context where it cannot possibly
indicate a range: that is, as the first character, or immediately
after a range.
</p>
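<p>The placement rules for ‘<samp>]</samp>’ and ‘<samp>-</samp>’ are easy to check
experimentally. The following sketch uses Python's <code>re</code> module, which
follows the same rules for these particular sets (Python as a stand-in is an
assumption of this example, and <code>members</code> is a made-up helper).
</p>

```python
import re

def members(char_set, alphabet):
    # Return the characters of alphabet that the character set matches.
    return "".join(ch for ch in alphabet if re.fullmatch(char_set, ch))

alphabet = "abz$%.]-*A"
print(members(r"[a-z$%.]", alphabet))  # a range mixed with single characters
print(members(r"[]a]", alphabet))      # ']' placed first is a member
print(members(r"[-a]", alphabet))      # '-' placed first is a member
print(members(r"[*.]", alphabet))      # '*' and '.' are not special in a set
```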
<p>Note that when searching in UTF-8 text, a character set may contain
US-ASCII characters only.
</p>
</dd>
<dt>‘<samp>[^ … ]</samp>’</dt>
<dd><p>‘<samp>[^</samp>’ begins a <em>complement character set</em>, which matches any
character except the ones specified. Thus, ‘<samp>[^a-z0-9A-Z]</samp>’ matches
all characters <em>except</em> letters and digits. Also in this case, when
searching in UTF-8 text a complemented character set may contain US-ASCII
characters only.
</p>
<p>‘<samp>^</samp>’ is not special in a character set unless it is the first character.
The character following the ‘<samp>^</samp>’ is treated as if it were first (it may
be a ‘<samp>-</samp>’ or a ‘<samp>]</samp>’).
</p>
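<p>A small sketch of complemented sets, using Python's <code>re</code> module as
a stand-in (an assumption of this example; all function names are
hypothetical):
</p>

```python
import re

def strip_alnum(text):
    # Keep only the characters matched by the complemented set.
    return "".join(re.findall(r"[^a-z0-9A-Z]", text))

def split_on_close_bracket(text):
    # In '[^]]' the ']' directly after '^' is treated as the first member.
    return re.findall(r"[^]]+", text)

def caret_is_member(ch):
    # A '^' that is not first is an ordinary member of the set.
    return re.fullmatch(r"[a^]", ch) is not None

print(repr(strip_alnum("ne 3.0, ok!")))
print(split_on_close_bracket("a]bc]d"))
```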
</dd>
<dt>‘<samp>^</samp>’</dt>
<dd><p>is a special character that matches the empty string – but only if at the
beginning of a line in the text being matched. Otherwise it fails to match
anything. Thus, ‘<samp>^foo</samp>’ matches a ‘<samp>foo</samp>’ that occurs at the
beginning of a line.
</p>
</dd>
<dt>‘<samp>$</samp>’</dt>
<dd><p>is similar to ‘<samp>^</samp>’ but matches only at the end of a line. Thus,
‘<samp>xx*$</samp>’ matches a string of one or more ‘<samp>x</samp>’’s at the end of a
line.
</p>
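<p>Both anchors can be tried on a small multi-line text. The sketch uses
Python's <code>re</code> module (an assumption of this example), where the
<code>re.MULTILINE</code> flag is needed to get the per-line behaviour
described here.
</p>

```python
import re

text = "foo bar\nbar foo\nxxx\nax"

# re.MULTILINE makes '^' and '$' match at the beginning and end of every
# line, as described above; without it Python's '^' only matches at the
# very start of the text.
starts = [m.start() for m in re.finditer(r"^foo", text, re.MULTILINE)]
tails = re.findall(r"xx*$", text, re.MULTILINE)

print(starts)
print(tails)
```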
</dd>
<dt>‘<samp>\</samp>’</dt>
<dd><p>has two functions: it quotes the above special characters (including
‘<samp>\</samp>’), and it introduces additional special constructs.
</p>
<p>Because ‘<samp>\</samp>’ quotes special characters, ‘<samp>\$</samp>’ is a regular
expression that matches only ‘<samp>$</samp>’, and ‘<samp>\[</samp>’ is a regular
expression that matches only ‘<samp>[</samp>’, and so on.
</p>
<p>For the most part, ‘<samp>\</samp>’ followed by any character matches only that
character. However, there are several exceptions: characters which, when
preceded by ‘<samp>\</samp>’, are special constructs. Such characters are always
ordinary when encountered on their own.
</p>
</dd>
<dt>‘<samp>|</samp>’</dt>
<dd><p>specifies an alternative. Two regular expressions <var>a</var> and <var>b</var> with
‘<samp>|</samp>’ in between form an expression that matches anything that either
<var>a</var> or <var>b</var> will match.
</p>
<p>Thus, ‘<samp>foo|bar</samp>’ matches either ‘<samp>foo</samp>’ or ‘<samp>bar</samp>’ but no other
string.
</p>
<p>‘<samp>|</samp>’ applies to the largest possible surrounding expressions. Only a
surrounding ‘<samp>( … )</samp>’ grouping can limit the grouping power of
‘<samp>|</samp>’.
</p>
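<p>The reach of ‘<samp>|</samp>’ with and without grouping can be sketched as
follows, using Python's <code>re</code> module as a stand-in (an assumption;
<code>whole</code> is a made-up helper):
</p>

```python
import re

def whole(pattern, text):
    # True when the pattern matches the entire text.
    return re.fullmatch(pattern, text) is not None

# Ungrouped, '|' splits the entire expression: 'foo' or 'barx'.
print(whole(r"foo|barx", "foo"), whole(r"foo|barx", "foox"))
# Grouped, the alternatives end at the parentheses: 'foox' or 'barx'.
print(whole(r"(foo|bar)x", "foox"), whole(r"(foo|bar)x", "foo"))
```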
</dd>
<dt>‘<samp>( … )</samp>’</dt>
<dd><p>is a grouping construct that serves three purposes:
</p>
<ol>
<li> To enclose a set of ‘<samp>|</samp>’ alternatives for other operations.
Thus, ‘<samp>(foo|bar)x</samp>’ matches either ‘<samp>foox</samp>’ or ‘<samp>barx</samp>’.
</li><li> To enclose a complicated expression for the postfix ‘<samp>*</samp>’ to operate on.
Thus, ‘<samp>ba(na)*</samp>’ matches ‘<samp>bananana</samp>’ <i>et cetera</i>, with any (zero or
more) number of ‘<samp>na</samp>’’s.
</li><li> To mark a matched substring for future reference.
</li></ol>
<p>This last application is not a consequence of the idea of a parenthetical
grouping; it is a separate feature that happens to be assigned as a second
meaning to the same ‘<samp>( … )</samp>’ construct because there is no
conflict in practice between the two meanings. Here is an explanation of
this feature:
</p>
</dd>
<dt>‘<samp>\<var>digit</var></samp>’</dt>
<dd><p>After the end of a ‘<samp>( … )</samp>’ construct, the matcher remembers the
beginning and end of the text matched by that construct. Then, later on in
the regular expression, you can use ‘<samp>\</samp>’ followed by <var>digit</var> to mean
“match the same text matched the <var>digit</var>’th time by the ‘<samp>(
… )</samp>’ construct.” The ‘<samp>( … )</samp>’ constructs are numbered
in order of commencement in the regexp.
</p>
<p>The strings matching the first nine ‘<samp>( … )</samp>’ constructs appearing
in a regular expression are assigned numbers 1 through 9 in order of their
beginnings.
‘<samp>\1</samp>’ through ‘<samp>\9</samp>’ may be used to refer to the text matched by
the corresponding ‘<samp>( … )</samp>’ construct.
</p>
<p>For example, ‘<samp>(.+)\1</samp>’ matches any non-empty string that is composed of
two identical halves. The ‘<samp>(.+)</samp>’ matches the first half, which may be
anything non-empty, but the ‘<samp>\1</samp>’ that follows must match exactly the same
text.
</p>
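<p>The two-identical-halves example can be checked with a short sketch. It uses
Python's <code>re</code> module, which supports the same
‘<samp>\<var>digit</var></samp>’ notation (Python as a stand-in is an assumption of
this example; the function names are hypothetical).
</p>

```python
import re

def is_doubled(text):
    # '(.+)' captures the first half; '\1' must repeat it exactly.
    return re.fullmatch(r"(.+)\1", text) is not None

def first_half(text):
    # Return the text remembered by the first group, or None if no match.
    m = re.fullmatch(r"(.+)\1", text)
    return m.group(1) if m else None

for s in ["abab", "abcabc", "abba", "aaa"]:
    print(s, is_doubled(s), first_half(s))
```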
</dd>
<dt>‘<samp>\b</samp>’</dt>
<dd><p>matches the empty string, but only if it is at the beginning or
end of a word. Thus, ‘<samp>\bfoo\b</samp>’ matches any occurrence of
‘<samp>foo</samp>’ as a separate word. ‘<samp>\bball(s|)\b</samp>’ matches
‘<samp>ball</samp>’ or ‘<samp>balls</samp>’ as a separate word.
</p>
</dd>
<dt>‘<samp>\B</samp>’</dt>
<dd><p>matches the empty string, provided it is <em>not</em> at the beginning or end
of a word.
</p>
</dd>
<dt>‘<samp>\&lt;</samp>’</dt>
<dd><p>matches the empty string, but only if it is at the beginning
of a word.
</p>
</dd>
<dt>‘<samp>\&gt;</samp>’</dt>
<dd><p>matches the empty string, but only if it is at the end of a word.
</p>
</dd>
<dt>‘<samp>\w</samp>’</dt>
<dd><p>matches any word-constituent character. These are US-ASCII letters,
numbers and the underscore, independently of the buffer encoding.
</p>
</dd>
<dt>‘<samp>\W</samp>’</dt>
<dd><p>matches any character that is not a word-constituent.
</p></dd>
</dl>
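<p>The word-related constructs above can be sketched with Python's
<code>re</code> module (an assumption of this example, not <code>ne</code>'s
matcher). Python has no ‘<samp>\&lt;</samp>’ or ‘<samp>\&gt;</samp>’, so only
‘<samp>\b</samp>’, ‘<samp>\B</samp>’ and ‘<samp>\w</samp>’ are shown; the
<code>re.ASCII</code> flag is used to mirror the US-ASCII definition of a
word-constituent character.
</p>

```python
import re

text = "foo foobar ball balls football"

# re.ASCII restricts \w and \b to US-ASCII letters, digits and underscore,
# mirroring the definition given for '\w' above.
words_foo = re.findall(r"\bfoo\b", text, re.ASCII)
ball_words = [m.group(0) for m in re.finditer(r"\bball(s|)\b", text, re.ASCII)]
inside = re.findall(r"\Bball", text, re.ASCII)
word_chars = re.findall(r"\w", "a_1-é", re.ASCII)

print(words_foo)
print(ball_words)
print(inside)
print(word_chars)
```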
<a name="Replacing-regular-expressions"></a>
<h4 class="subsection">3.8.2 Replacing regular expressions</h4>
<p>The replacement string also has some special features when doing a regular
expression search and replace. Exactly as during the search, ‘<samp>\</samp>’ followed
by <var>digit</var> stands for “the text matched the <var>digit</var>’th time by the
‘<samp>( … )</samp>’ construct in the search expression”. Moreover, ‘<samp>\0</samp>’
represents the whole string matched by the regular expression. Thus, for
instance, the replace string ‘<samp>\0\0</samp>’ has the effect of doubling any string
matched.
</p>
<p>Another example: if you search for ‘<samp>(a+)(b+)</samp>’, replacing with
‘<samp>\2x\1</samp>’, you will match any string composed of a series of ‘<samp>a</samp>’’s
followed by a series of ‘<samp>b</samp>’’s, and you will replace it with the
string obtained by moving the ‘<samp>b</samp>’’s in front of the ‘<samp>a</samp>’’s and
adding an ‘<samp>x</samp>’ in between. For instance, ‘<samp>aaaab</samp>’ will be matched and
replaced by ‘<samp>bxaaaa</samp>’.
</p>
<p>Note that the backslash character can escape itself. Thus, to put a
backslash in the replacement string, you have to use ‘<samp>\\</samp>’.
</p>
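<p>The replacement rules can be sketched with Python's <code>re.sub</code> (an
assumption of this example, not <code>ne</code>'s own replace command). Python
understands ‘<samp>\1</samp>’ to ‘<samp>\9</samp>’ and ‘<samp>\\</samp>’ in the same
way, but ‘<samp>\0</samp>’ does not mean the whole match there, so a small function
plays that role. All function names are hypothetical.
</p>

```python
import re

def swap_runs(text):
    # '\2x\1' puts the run of b's first, then 'x', then the run of a's.
    return re.sub(r"(a+)(b+)", r"\2x\1", text)

def double_matches(pattern, text):
    # In Python's replacement templates '\0' does not mean the whole match;
    # a function returning group 0 twice plays the role of ne's '\0\0'.
    return re.sub(pattern, lambda m: m.group(0) * 2, text)

def slashes_to_backslashes(text):
    # '\\' in the replacement stands for one literal backslash.
    return re.sub(r"/", r"\\", text)

print(swap_runs("aaaab"))
print(double_matches(r"[0-9]+", "ab12"))
print(slashes_to_backslashes("a/b"))
```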
<hr>
<div class="header">
<p>
Next: <a href="Automatic-Preferences.html#Automatic-Preferences" accesskey="n" rel="next">Automatic Preferences</a>, Previous: <a href="Menus.html#Menus" accesskey="p" rel="prev">Menus</a>, Up: <a href="Reference.html#Reference" accesskey="u" rel="up">Reference</a> [<a href="Command-Index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>