/usr/share/perl5/SQL/Tokenizer.pm is in libsql-tokenizer-perl 0.24-2.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
package SQL::Tokenizer;
use warnings;
use strict;
use 5.006002;
use Exporter;
our @ISA = qw(Exporter);
our @EXPORT_OK= qw(tokenize_sql);
our $VERSION= '0.24';
my $re= qr{
    (
        (?:--|\#)[\ \t\S]*      # single line comments
        |
        (?:<>|<=>|>=|<=|==|=|!=|!|<<|>>|<|>|\|\||\||&&|&|-|\+|\*(?!/)|/(?!\*)|\%|~|\^|\?)
                                # operators and tests
        |
        [\[\]\(\),;.]            # punctuation (parenthesis, comma)
        |
        \'\'(?!\')              # empty single quoted string
        |
        \"\"(?!\"")             # empty double quoted string
        |
        "(?>(?:(?>[^"\\]+)|""|\\.)*)+"
                                # anything inside double quotes, ungreedy
        |
        `(?>(?:(?>[^`\\]+)|``|\\.)*)+`
                                # anything inside backticks quotes, ungreedy
        |
        '(?>(?:(?>[^'\\]+)|''|\\.)*)+'
                                # anything inside single quotes, ungreedy.
        |
        /\*[\ \t\r\n\S]*?\*/      # C style comments
        |
        (?:[\w:@]+(?:\.(?:\w+|\*)?)*)
                                # words, standard named placeholders, db.table.*, db.*
        |
        (?: \$_\$ | \$\d+ | \${1,2} )
                                # dollar expressions - eg $_$ $3 $$
        |
        \n                      # newline
        |
        [\t\ ]+                 # any kind of white spaces
    )
}smx;

sub tokenize_sql {
    my ( $query, $remove_white_tokens )= @_;

    my @query= $query =~ m{$re}smxg;

    if ($remove_white_tokens) {
        @query= grep( !/^[\s\n\r]*$/, @query );
    }

    return wantarray ? @query : \@query;
}

sub tokenize {
    my $class= shift;
    return tokenize_sql(@_);
}

1;

=pod

=head1 NAME

SQL::Tokenizer - A simple SQL tokenizer.

=head1 VERSION

0.24

=head1 SYNOPSIS

    use SQL::Tokenizer qw(tokenize_sql);

    my $query= q{SELECT 1 + 1};
    my @tokens= SQL::Tokenizer->tokenize($query);

    # @tokens now contains ('SELECT', ' ', '1', ' ', '+', ' ', '1')

    @tokens= tokenize_sql($query);    # procedural interface

=head1 DESCRIPTION

SQL::Tokenizer is a simple tokenizer for SQL queries. It does not claim to be
a parser or query verifier. It just creates sane tokens from a valid SQL
query.

It supports SQL with comments like:

    -- This query is used to insert a message into
    -- logs table
    INSERT INTO log (application, message) VALUES (?, ?)

It also supports the C<''>, C<""> and C<\'> escaping methods, so tokenizing
queries like the one below should not be a problem:

    INSERT INTO log (application, message)
    VALUES ('myapp', 'Hey, this is a ''single quoted string''!')
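
The comment and quoting support described above can be tried with a short
script. This is only a sketch: the query text is made up for illustration, and
it assumes SQL::Tokenizer is installed and loadable.

```perl
use strict;
use warnings;
use SQL::Tokenizer qw(tokenize_sql);

# Hypothetical query: one "--" comment line and one string literal
# that contains a doubled single quote.
my $query = <<'SQL';
-- log a message
INSERT INTO log (message) VALUES ('it''s ok')
SQL

# A true second argument drops whitespace-only tokens (spaces, newlines).
my @tokens = tokenize_sql( $query, 1 );

# Print one token per line, bracketed so the boundaries are visible.
print "[$_]\n" for @tokens;
```

With this input, the whole comment line and the whole quoted literal
(including the doubled quote) should each come back as a single token.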

=head1 API

=over 4

=item tokenize_sql

    use SQL::Tokenizer qw(tokenize_sql);

    my @tokens= tokenize_sql($query);
    my $tokens= tokenize_sql($query);

    $tokens= tokenize_sql( $query, $remove_white_tokens );

C<tokenize_sql> can be imported into the current namespace on request. It
receives a SQL query, and returns an array of tokens if called in list context,
or an arrayref if called in scalar context.

=item tokenize

    my @tokens= SQL::Tokenizer->tokenize($query);
    my $tokens= SQL::Tokenizer->tokenize($query);

    $tokens= SQL::Tokenizer->tokenize( $query, $remove_white_tokens );

This is the only available class method. It receives a SQL query, and returns an
array of tokens if called in list context, or an arrayref if called in scalar
context.

If C<$remove_white_tokens> is true, whitespace-only tokens are removed from the
result.

=back
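
The calling conventions described in this section can be compared side by side.
The following is a minimal sketch, assuming the module is installed; the query
and the variable names are illustrative only.

```perl
use strict;
use warnings;
use SQL::Tokenizer qw(tokenize_sql);

# Illustrative query; any SQL string would do.
my $query = 'SELECT name FROM users WHERE id = ?';

# List context: a plain list of tokens, whitespace tokens included.
my @all = tokenize_sql($query);

# Scalar context: an array reference instead of a list.
my $ref = tokenize_sql($query);

# True second argument: whitespace-only tokens are filtered out.
my @words = tokenize_sql( $query, 1 );

# The class method takes the same arguments as the function.
my @via_class = SQL::Tokenizer->tokenize( $query, 1 );

printf "%d tokens with whitespace, %d without\n", scalar @all, scalar @words;
print join( '|', @words ), "\n";
```

For this query the counts should be 15 and 8, and the second line printed
should be C<SELECT|name|FROM|users|WHERE|id|=|?>.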

=head1 ACKNOWLEDGEMENTS

=over 4

=item

Evan Harris, for implementing Shell comment style and SQL operators.

=item

Charlie Hills, for spotting a lot of important issues I hadn't thought of.

=item

Jonas Kramer, for fixing MySQL quoted strings and treating dot as a punctuation character correctly.

=item

Emanuele Zeppieri, for asking to fix SQL::Tokenizer to support dollars as well.

=item

Nigel Metheringham, for extending the dollar sign support.

=item

Devin Withers, for making it not choke on CR+LF in comments.

=item

Luc Lanthier, for simplifying the regex and making it not choke on backslashes.

=back

=head1 AUTHOR

Copyright (c) 2007, 2008, 2009, 2010, 2011 Igor Sutton Lopes "<IZUT@cpan.org>". All rights
reserved.

This module is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.

=cut