
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
   "http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<!--Time-stamp: <2008-09-06 13:11:22 poser> -->
<HEAD>
   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
   <META NAME="Author" CONTENT="Bill Poser">

   <TITLE>Redet Reference Manual: Character Set</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFE2C0" VLINK="#0000EE" LINK="#AA0066" ALINK="#FF0000">

<H2><a name="charset">Character Set</a></H2>
<P>
<i>redet</i> itself works with Unicode and by default reads and writes UTF-8
Unicode. Whether regular expression matching
works with characters outside the 7-bit ASCII range in the test data or the
regular expression depends on whether the program that <i>redet</i> calls
works with Unicode. Whether characters are properly displayed in <i>redet</i>
windows depends on the fonts that you have installed.
</P>
<P>
Test data and comparison data may be read in encodings other than UTF-8, and 
test data, comparison data, and results of matches and substitutions may be
written in encodings other than UTF-8. For each type of data, an encoding is specified,
which is used both to read and to write that type of data. By default, these are
all set to UTF-8. The encodings may be changed interactively, via the <i>File</i>
menu, or via the initialization file commands <i>TestDataEncoding</i>, <i>ResultEncoding</i>,
and <i>ComparisonDataEncoding</i>. The set of encodings available is that supported
by your Tcl/Tk installation. A full installation of Tcl/Tk currently provides 81
encodings, with good coverage of Europe and East Asia.
</p>
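<p>
The idea of one encoding per type of data, used for both reading and writing, can be
pictured with a short Python sketch. This is only an illustration under stated assumptions,
not <i>redet</i>'s own code: the names <i>ENCODINGS</i>, <i>write_data</i> and <i>read_data</i>
are hypothetical, and <i>euc_jp</i> is Python's spelling of the codec name, which may differ
from the name shown in the encoding menu.
</p>
<pre>
```python
import os
import tempfile

# Hypothetical table: one encoding per type of data, all UTF-8 by default.
ENCODINGS = {"test": "utf-8", "comparison": "utf-8", "result": "utf-8"}

def write_data(path, kind, text):
    """Write text using the encoding currently set for this type of data."""
    with open(path, "w", encoding=ENCODINGS[kind], newline="") as f:
        f.write(text)

def read_data(path, kind):
    """Read text using the same encoding that is used to write this type."""
    with open(path, "r", encoding=ENCODINGS[kind], newline="") as f:
        return f.read()

# Change the test data encoding, much as one might from the File menu.
ENCODINGS["test"] = "euc_jp"
path = os.path.join(tempfile.mkdtemp(), "test_data.txt")
write_data(path, "test", "日本語")
round_trip = read_data(path, "test")
```
</pre>
<p>
Because the same table entry drives both directions, data written in a given encoding is
read back unchanged as long as the setting is not altered in between.
</p>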
<P>
This image of the encoding menu on my machine illustrates the encodings available.
Notice that both the current encoding (EUC-JP) and the encoding under consideration
(Big5) are highlighted, in different ways.
</p>
<br>
<div align="center">
<img src="Images/EncodingMenu.jpg" width="75%" alt="The Encoding Menu" border="2">
</div>
<br clear="all">
<p>
Since virtually every other encoding covers only a subset of Unicode, it is possible to attempt
to write out data in an encoding that does not support one or more of the Unicode
characters in the internal buffer. <i>Redet</i> detects this situation, aborts the write,
and prints a message indicating the problem and identifying the characters that
cannot be transcoded.
</p>
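<p>
A check of this kind can be sketched in Python as follows. This is a minimal illustration of
the general technique, not <i>redet</i>'s implementation; the function names
<i>untranscodable</i> and <i>write_checked</i> and the wording of the message are assumptions
made for the example.
</p>
<pre>
```python
import unicodedata

def untranscodable(text, encoding):
    """Return the distinct characters of text that the encoding cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad

def write_checked(path, text, encoding):
    """Write text only if every character can be transcoded; otherwise abort."""
    bad = untranscodable(text, encoding)
    if bad:
        listing = ", ".join(
            "U+%04X %s" % (ord(c), unicodedata.name(c, "unnamed")) for c in bad
        )
        # Abort before the file is opened, so nothing is written or truncated.
        raise ValueError("cannot write as %s: %s" % (encoding, listing))
    with open(path, "w", encoding=encoding) as f:
        f.write(text)
```
</pre>
<p>
Checking before the file is opened means that a failed write leaves any existing file untouched.
</p>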
<P>
Note that some programs that do handle Unicode only work with Unicode in certain locale
settings, while others work with Unicode regardless of the locale. Members of the
latter category include <i>Python</i> and <i>Pike</i>. Programs that support Unicode
only in certain locales include <i>GNU ed</i>, <i>GNU grep</i>, <i>GNU sed</i> and <i>mawk</i>.
If you want to test this, try <i>zh_TW.UTF-8</i> (Taiwan Chinese in UTF-8 encoding)
or <i>es_ES.UTF-8</i> (Castilian in Spain with UTF-8 encoding) for a locale in which
Unicode should be supported and <i>es_ES</i> (Castilian in Spain, with default ISO-8859-1
encoding) for a locale in which Unicode is not supported.
</P>
<P>
<i>Perl</i> can be made to handle Unicode in a variety of ways determined by the
setting of an environment variable or command-line flag. <i>Redet</i> runs <i>Perl</i>
in such a way as to use UTF-8 Unicode for all input and output, regardless of locale.
</P>
<P>
Non-ASCII characters can be entered using whatever entry methods the user's system
provides or using a Unicode <a href="regexps.html#charmap">character map</a> such as
<a href="http://gucharmap.sourceforge.net/">gucharmap</a>. Widgets are provided
for entering characters from the International Phonetic Alphabet since these are
scattered through several Unicode ranges and are therefore inconvenient to enter
using a general purpose Unicode character map. A widget is also provided
for entering characters by their Unicode codepoint. Finally, it is possible
to create custom character entry widgets by loading definitions from a file.
</p>
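<p>
Entry by codepoint amounts to converting a hexadecimal number into the corresponding character.
A minimal Python sketch follows; the function name <i>char_from_codepoint</i> is hypothetical,
and accepting both <i>U+0259</i> and <i>0259</i> is an assumption of the example rather than a
description of the input format of the <i>redet</i> widget.
</p>
<pre>
```python
def char_from_codepoint(spec):
    """Convert a hexadecimal code point such as 'U+0259' or '0259' to a character."""
    digits = spec.strip().upper()
    if digits.startswith("U+"):
        digits = digits[2:]
    return chr(int(digits, 16))

# U+0259 is LATIN SMALL LETTER SCHWA, one of the IPA vowels.
schwa = char_from_codepoint("U+0259")
```
</pre>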
<P>
As an aid to those working with Unicode, lists of Unicode ranges and general character properties
are available from the <i>Help</i> menu. Right-clicking over a character in the data,
result, comparison, or regular expression window causes the character code and name
to be displayed. The Unicode information is based on version 5.1 of the Unicode standard.
</p>
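<p>
The same kind of lookup can be reproduced outside <i>redet</i>. The Python sketch below uses the
standard <i>unicodedata</i> module; the function name <i>describe</i> is hypothetical, and the
module follows whichever version of the Unicode standard ships with the Python installation,
which may be newer than 5.1.
</p>
<pre>
```python
import unicodedata

def describe(ch):
    """Return the code point, name and general category of a single character."""
    code = "U+%04X" % ord(ch)
    name = unicodedata.name(ch, "(no name)")
    category = unicodedata.category(ch)
    return code, name, category

# The IPA velar nasal, a lowercase letter (general category Ll).
info = describe("ŋ")
```
</pre>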

<br>
<center><a href="help.html">Next</a></center>
<br>
<center><a href="Manual.html">Back to Table of Contents</a></center>
</body>
</html>