<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Regular expressions</title><link rel="stylesheet" type="text/css" href="manual.css" /><meta name="generator" content="DocBook XSL Stylesheets V1.79.1" /><link rel="home" href="index.html" title="RefDB handbook" /><link rel="up" href="ch14.html" title="Chapter 14. Tools for reference and notes management" /><link rel="prev" href="ch14s04.html" title="The query language" /><link rel="next" href="ch15.html" title="Chapter 15. Tools for bibliographies" /></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Regular expressions</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="ch14s04.html">Prev</a> </td><th width="60%" align="center">Chapter 14. Tools for reference and notes management</th><td width="20%" align="right"> <a accesskey="n" href="ch15.html">Next</a></td></tr></table><hr /></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="sect1-regular-expressions"></a>Regular expressions</h2></div></div></div><p>This section provides a brief overview of regular expressions. In the context of RefDB, we have to deal with two flavors of regular expressions: Unix-style and SQL. The former are more important as we use them to write queries. The latter are used sparingly, e.g. to search the filenames of databases.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>Some database engines like SQLite do not support Unix-style regular expressions. 
You have to use SQL regular expressions in this case.</p></div><p>The difference between a literal match and a regular expression match is that the latter allows some <span class="quote">“<span class="quote">fuzziness</span>”</span> in the search string. The former requires that the search string and the search result match character by character. Put simply, regular expressions let you search for strings that are similar to some extent, and you can specify exactly to what extent.</p><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a id="sect2-regular-expressions-unix"></a>Unix-style regular expressions</h3></div></div></div><p>Regular expressions distinguish between regular characters and special characters (meta characters). The simplest regular expressions actually don't look like regular expressions, as the following example shows:</p><div class="informalexample"><pre class="screen">foo</pre></div><p>This will search for the string <span class="quote">“<span class="quote">foo</span>”</span> at any position in the target elements. This would find strings like <span class="quote">“<span class="quote">foobar</span>”</span>, <span class="quote">“<span class="quote">lifoo</span>”</span>, or <span class="quote">“<span class="quote">lifoobar</span>”</span>. That is, if there are no meta characters, a simple string match is attempted, but at any position in the element. This is different from search strategies in some other databases where a full match or a left-match is attempted by default.</p><p>We can now replace one <span class="quote">“<span class="quote">o</span>”</span> in the above sample with a meta character. 
We use the <span class="quote">“<span class="quote">.</span>”</span> (dot), which matches any single character, including a newline, at that position:</p><div class="informalexample"><pre class="screen">f.o</pre></div><p>This will find strings like <span class="quote">“<span class="quote">fao</span>”</span> and <span class="quote">“<span class="quote">fdo</span>”</span>, as well as all strings of the previous example.</p><p>Another very common meta character is the <span class="quote">“<span class="quote">*</span>”</span>, which matches zero or more instances of the <span class="emphasis"><em>previous</em></span> character. Thus,</p><div class="informalexample"><pre class="screen">fo*</pre></div><p>will now find things like <span class="quote">“<span class="quote">fo</span>”</span>, <span class="quote">“<span class="quote">foooo</span>”</span>, but also <span class="quote">“<span class="quote">fbar</span>”</span> and <span class="quote">“<span class="quote">lifooobar</span>”</span>. The meta character <span class="quote">“<span class="quote">+</span>”</span> is similar, but requires at least one instance of the <span class="emphasis"><em>previous</em></span> character:</p><div class="informalexample"><pre class="screen">fo+</pre></div><p>This would retrieve all strings of the last example except <span class="quote">“<span class="quote">fbar</span>”</span>, as this contains the <span class="quote">“<span class="quote">o</span>”</span> zero times.</p><div class="informalexample"><pre class="screen">fo?</pre></div><p>The question mark meta character matches either zero or one instance of the <span class="emphasis"><em>previous</em></span> character. 
This would match <span class="quote">“<span class="quote">f</span>”</span> and <span class="quote">“<span class="quote">fo</span>”</span>. In <span class="quote">“<span class="quote">foo</span>”</span>, only the leading <span class="quote">“<span class="quote">fo</span>”</span> is part of the match; because the match is attempted at any position, a string containing <span class="quote">“<span class="quote">foo</span>”</span> is still found.</p><p>The meta characters <span class="quote">“<span class="quote">^</span>”</span> and <span class="quote">“<span class="quote">$</span>”</span> are important to determine the relation of the search string to the line start or line end:</p><div class="informalexample"><pre class="screen">^foo</pre></div><p>This will match <span class="quote">“<span class="quote">foo</span>”</span> only if it is located at the line start. Similarly,</p><div class="informalexample"><pre class="screen">foo$</pre></div><p>will find <span class="quote">“<span class="quote">foo</span>”</span> only when it is located at the line end. If you combine these two as in the next example:</p><div class="informalexample"><pre class="screen">^foo$</pre></div><p><span class="quote">“<span class="quote">foo</span>”</span> will be found only if this is the complete element, starting and ending the line.</p><p>The following list briefly explains some more terms which are helpful in regular expressions.</p><div class="variablelist"><dl class="variablelist"><dt><span class="term">()</span></dt><dd><p>Use the round brackets to group characters into a sequence. This is particularly useful with the above-mentioned meta characters *, +, and ?.</p><div class="informalexample"><pre class="screen">(foo)*</pre></div><p>This will match zero or more instances of the sequence <span class="quote">“<span class="quote">foo</span>”</span>. It will find e.g. 
<span class="quote">“<span class="quote">foo</span>”</span> and <span class="quote">“<span class="quote">foofoo</span>”</span>, but not <span class="quote">“<span class="quote">fofo</span>”</span>.</p></dd><dt><span class="term">[]</span></dt><dd><p>Matches any single character between the brackets.</p><div class="informalexample"><pre class="screen">[0-9]</pre></div><p>This will match any digit. Continuous ranges of characters can be indicated with a dash, as seen here.</p></dd><dt><span class="term">[^]</span></dt><dd><p>Matches any single character except the ones between the brackets.</p><div class="informalexample"><pre class="screen">[^abc]</pre></div><p>This will match any character except <span class="quote">“<span class="quote">a</span>”</span>, <span class="quote">“<span class="quote">b</span>”</span>, and <span class="quote">“<span class="quote">c</span>”</span>.</p></dd><dt><span class="term">\</span></dt><dd><p>The backslash escapes the <span class="emphasis"><em>following</em></span> meta character and treats it as a literal character.</p><div class="informalexample"><pre class="screen">\.</pre></div><p>This will match only the dot instead of any single character.</p></dd><dt><span class="term">\{n,m\}</span></dt><dd><p>This will find n to m repeats of the <span class="emphasis"><em>previous</em></span> character.</p><div class="informalexample"><pre class="screen">fo\{2,3\}</pre></div><p>This regular expression will find <span class="quote">“<span class="quote">foo</span>”</span> and <span class="quote">“<span class="quote">fooo</span>”</span>, but not <span class="quote">“<span class="quote">fo</span>”</span> or <span class="quote">“<span class="quote">foooo</span>”</span>.</p></dd></dl></div><p>For further information about regular expressions, see the regex chapter in the <a class="ulink" href="http://www.mysql.com" target="_top">MySQL documentation</a>.</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a 
id="sect2-regular-expressions-sql"></a>SQL regular expressions</h3></div></div></div><p>SQL regular expressions are much simpler, as there are only two metacharacters:</p><div class="variablelist"><dl class="variablelist"><dt><span class="term">%</span></dt><dd><p>matches any string</p></dd><dt><span class="term">_ (underscore)</span></dt><dd><p>matches any single character</p></dd></dl></div><p>In order to match a SQL regular expression special character literally, you have to escape it by doubling.</p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="ch14s04.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="ch14.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="ch15.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">The query language </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> Chapter 15. Tools for bibliographies</td></tr></table></div></body></html>
