/usr/share/perl5/Unicode/MapUTF8.pod is in libunicode-maputf8-perl 1.11-2.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 | =head1 NAME
Unicode::MapUTF8 - Conversions to and from arbitrary character sets and UTF8
=head1 SYNOPSIS
use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);
# Convert a string in 'ISO-8859-1' to 'UTF8'
my $output = to_utf8({ -string => 'An example', -charset => 'ISO-8859-1' });
# Convert a string in 'UTF8' encoding to encoding 'ISO-8859-1'
my $other = from_utf8({ -string => 'Other text', -charset => 'ISO-8859-1' });
# List available character set encodings
my @character_sets = utf8_supported_charset;
# Add a character set alias
utf8_charset_alias({ 'ms-japanese' => 'sjis' });
# Convert between two arbitrary (but largely compatible) charset encodings
# (SJIS to EUC-JP)
my $utf8_string = to_utf8({ -string =>$sjis_string, -charset => 'sjis'});
my $euc_jp_string = from_utf8({ -string => $utf8_string, -charset => 'euc-jp' })
# Verify that a specific character set is supported
if (utf8_supported_charset('ISO-8859-1') {
# Yes
}
=head1 DESCRIPTION
Provides an adapter layer between core routines for converting
to and from UTF8 and other encodings. In essence, a way to give multiple
existing Unicode modules a single common interface so you don't have to know
the underlaying implementations to do simple UTF8 to-from other character set
encoding conversions. As such, it wraps the Unicode::String, Unicode::Map8,
Unicode::Map and Jcode modules in a standardized and simple API.
This also provides general character set conversion operation based on UTF8 - it is
possible to convert between any two compatible and supported character sets
via a simple two step chaining of conversions.
As with most things Perlish - if you give it a few big chunks of text to chew on
instead of lots of small ones it will handle many more characters per second.
By design, it can be easily extended to encompass any new charset encoding
conversion modules that arrive on the scene.
This module is intended to provide good Unicode support to versions of Perl
prior to 5.8. If you are using Perl 5.8.0 or later, you probably want to be
using the Encode module instead. This module B<does> work with Perl 5.8,
but Encode is the preferred method in that environment.
=head1 CHANGES
1.11 2005.10.10 Documentation changes. Addition of Build.PL support.
Added various build tests, LICENSE, Artistic_License.txt,
GPL_License.txt. Split documentation into seperate
.pod file. Added Japanese translation of POD.
1.10 2005.05.22 - Fixed bug in conversion of ISO-2022-JP to UTF-8.
Problem and fix found by Masahiro HONMA
<masahiro.honma@tsutaya.co.jp>.
Similar bugs in conversions of shift_jis and euc-jp
to UTF-8 fixed as well.
1.09 2001.08.22 - Fixed multiple typo occurances of 'uft'
where 'utf' was meant in code. Problem affected
utf16 and utf7 encodings. Problem found
by devon smith <devon@taller.PSCL.cwru.edu>
1.08 2000.11.06 - Added 'utf8_charset_alias' function to
allow for runtime setting of character
set aliases. Added several alternate
names for 'sjis' (shiftjis, shift-jis,
shift_jis, s-jis, and s_jis).
Corrected 'croak' messages for
'from_utf8' functions to appropriate
function name.
Tightened up initialization encapsulation
Corrected fatal problem in jcode from
unicode internals. Problem and fix
found by Brian Wisti <wbrian2@uswest.net>.
1.07 2000.11.01 - Added 'croak' to use Carp declaration to
fix error messages. Problem and fix
found by Brian Wisti
<wbrian2@uswest.net>.
1.06 2000.10.30 - Fix to handle change in stringification
of overloaded objects between Perl 5.005
and 5.6. Problem noticed by Brian Wisti
<wbrian2@uswest.net>.
1.05 2000.10.23 - Error in conversions from UTF8 to
multibyte encodings corrected
1.04 2000.10.23 - Additional diagnostic messages added
for internal error conditions
1.03 2000.10.22 - Bug fix for load time autodetction of
Unicode::Map8 encodings
1.02 2000.10.22 - Added load time autodetection of
Unicode::Map8 supported character set
encodings.
Fixed internal calling error for some
character sets with 'from_utf8'. Thanks
goes to Ilia Lobsanov
<ilia@lobsanov.com> for reporting this
problem.
1.01 2000.10.02 - Fixed handling of empty strings and
added more identification for error
messages.
1.00 2000.09.29 - Pre-release version
=head1 FUNCTIONS
=over 4
=item utf8_charset_alias({ $alias => $charset });
Used for runtime assignment of character set aliases.
Called with no parameters, returns a hash of defined aliases and the character sets
they map to.
Example:
my $aliases = utf8_charset_alias;
my @alias_names = keys %$aliases;
If called with ONE parameter, returns the name of the 'real' charset
if the alias is defined. Returns undef if it is not found in the aliases.
Example:
if (! utf8_charset_alias('VISCII')) {
# No alias for this
}
If called with a list of 'alias' => 'charset' pairs, defines those aliases for use.
Example:
utf8_charset_alias({ 'japanese' => 'sjis', 'japan' => 'sjis' });
Note: It will croak if a passed pair does not map to a character set
defined in the predefined set of character encoding. It is NOT
allowed to alias something to another alias.
Multiple character set aliases can be set with a single call.
To clear an alias, pass a character set mapping of undef.
Example:
utf8_charset_alias({ 'japanese' => undef });
While an alias is set, the 'utf8_supported_charset' function
will return the alias as if it were a predefined charset.
Overriding a base defined character encoding with an alias
will generate a warning message to STDERR.
=back
=over 4
=item utf8_supported_charset($charset_name);
Returns true if the named charset is supported (including
user defined aliases).
Returns false if it is not.
Example:
if (! utf8_supported_charset('VISCII')) {
# No support yet
}
If called in a list context with no parameters, it will return
a list of all supported character set names (including user
defined aliases).
Example:
my @charsets = utf8_supported_charset;
=back
=over 4
=item to_utf8({ -string => $string, -charset => $source_charset });
Returns the string converted to UTF8 from the specified source charset.
=back
=over 4
=item from_utf8({ -string => $string, -charset => $target_charset});
Returns the string converted from UTF8 to the specified target charset.
=back
=head1 VERSION
1.11 2005.10.10
=head1 TODO
Regression tests for Jcode, 2-byte encodings and encoding aliases
=head1 SEE ALSO
L<Unicode::String> L<Unicode::Map8> L<Unicode::Map> L<Jcode> L<Encode>
=head1 COPYRIGHT
Copyright 2000-2005, Benjamin Franz. All rights reserved.
=head1 AUTHOR
Benjamin Franz <snowhare@nihongo.org>
=head1 LICENSE
This program is free software; you can redistribute it
and/or modify it under the same terms and conditions as
Perl itself.
This means that you can, at your option, redistribute it and/or modify it under
either the terms the GNU Public License (GPL) version 1 or later, or under the
Perl Artistic License.
See http://dev.perl.org/licenses/
=head1 DISCLAIMER
THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE.
Use of this software in any way or in any form, source or binary,
is not allowed in any country which prohibits disclaimers of any
implied warranties of merchantability or fitness for a particular
purpose or any disclaimers of a similar nature.
IN NO EVENT SHALL I BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT,
SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OF THIS SOFTWARE AND ITS DOCUMENTATION (INCLUDING, BUT NOT
LIMITED TO, LOST PROFITS) EVEN IF I HAVE BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE
=cut
|