NAME
Text::Soundex - Implementation of the Soundex Algorithm as Described by Knuth
SYNOPSIS
use Text::Soundex;

$code = soundex($name);    # Get the soundex code for a name.
@codes = soundex(@names);  # Get the list of codes for a list of names.

# This is how you define what you want to be returned if your
# input string has no identifiable codes within it.

# Make the change permanent.
$Text::Soundex::nocode = 'Z000';

# Make the change temporary.
{
local $Text::Soundex::nocode = 'Z000'; # Temporary change.
$code = soundex($name);
}
DESCRIPTION
This module implements the soundex algorithm as described by Donald Knuth in Volume 3 of The Art of Computer Programming. The algorithm is intended to hash words (in particular surnames) into a small space using a simple model which approximates the sound of the word when spoken by an English speaker. Each word is reduced to a four-character string, the first character being an upper-case letter and the remaining three being digits.

The value returned for strings which have no soundex encoding is set in the scalar $Text::Soundex::nocode.

For backward compatibility with older versions of this module the [...]

In scalar context soundex returns the code for its first argument; in list context it returns one code for each argument, so

@codes = soundex qw(Mike Stok);

leaves @codes containing ('M200', 'S320').

If you wish to use Text::Soundex with US Census data, select the NARA ruleset in one of two ways:
# First method:
use Text::Soundex qw(:NARA-Ruleset);
$code = soundex($name);
# Second method:
use Text::Soundex qw(soundex_nara);
$code = soundex_nara($name);
This is necessary, as the algorithm used by the US Censuses is slightly different from that defined by Knuth and others. The discrepancy can be shown using the name "Ashcraft":
print soundex("Ashcraft"), "\n"; # prints: A226
print soundex_nara("Ashcraft"), "\n"; # prints: A261
There is also a speed hit involved when using the NARA ruleset, as the encoding is slightly more complicated.
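The difference can be illustrated with a short pure-Perl sketch. It is not the module's own code: sketch_code and its $nara flag are hypothetical names, and the sketch assumes that the two rulesets differ only in how H and W are treated when they sit between two letters that share a digit. Under that assumption, the NARA variant codes such a pair once, whereas the Knuth variant codes both letters.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; $nara selects the H/W handling.
sub sketch_code {
    my ($name, $nara) = @_;
    my $code = uc $name;
    $code =~ tr/A-Z//cd;                  # keep only the letters A-Z
    return '' unless length $code;        # this sketch returns '' for "no code"
    my $first = substr($code, 0, 1);      # the leading letter is kept verbatim
    if ($nara) {
        # H and W get the marker digit 9 instead of 0; /s squeezes repeats.
        $code =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00900090111122222222334556/s;
        $code =~ s/(.)9\1/$1/g;           # same digit on both sides of H/W: merge
    } else {
        $code =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/s;
    }
    $code = substr($code, 1);             # discard the first letter's digit
    $code =~ tr/09//d;                    # uncoded letters drop out
    return substr($first . $code . '000', 0, 4);   # pad or cut to 4 characters
}

print sketch_code('Ashcraft', 0), "\n";   # prints: A226
print sketch_code('Ashcraft', 1), "\n";   # prints: A261
```

The extra substitution in the NARA branch is also a plausible reason for the speed hit mentioned above, although the sketch makes no claim about actual timings.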
EXAMPLES
Knuth's examples of various names and the soundex codes they map to are listed below:

Euler, Ellery -> E460
Gauss, Ghosh -> G200
Hilbert, Heilbronn -> H416
Knuth, Kant -> K530
Lloyd, Ladd -> L300
Lukasiewicz, Lissajous -> L222

so:

$code = soundex 'Knuth';          # $code contains 'K530'
@list = soundex qw(Lloyd Gauss);  # @list contains 'L300', 'G200'
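A minimal pure-Perl sketch of the encoding reproduces these codes. The names encode_one, sketch_soundex and the package variable $nocode are invented for this illustration (the last one standing in for $Text::Soundex::nocode); the module's real implementation may differ in detail.

```perl
use strict;
use warnings;

# Stand-in for $Text::Soundex::nocode; the name is local to this sketch.
our $nocode = 'Z000';

# Encode a single name (hypothetical helper, not the module's API).
sub encode_one {
    my ($name) = @_;
    my $code = uc $name;
    $code =~ tr/A-Z//cd;                  # keep only the letters A-Z
    return $nocode unless length $code;   # nothing encodable remains
    my $first = substr($code, 0, 1);      # the leading letter is kept verbatim
    # Letters become digit classes; /s squeezes runs of the same digit.
    $code =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/s;
    $code = substr($code, 1);             # discard the first letter's digit
    $code =~ tr/0//d;                     # vowels, H, W and Y carry no code
    return substr($first . $code . '000', 0, 4);   # pad or cut to 4 characters
}

# Scalar context: code of the first argument; list context: one code per name.
sub sketch_soundex {
    my @codes = map { encode_one($_) } @_;
    return wantarray ? @codes : $codes[0];
}

my $code = sketch_soundex('Knuth');           # scalar context
my @list = sketch_soundex(qw(Lloyd Gauss));   # list context
print "$code @list\n";                        # prints: K530 L300 G200
```

Padding with '000' before cutting to four characters is what turns a short result such as "L3" into "L300".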
UNDERNEATH THE COVERS (a word from Mark)
To ease use for the user, the XS version is transparently accessible via
# The following calls are split up by functionality.
# Always uses the 100% perl version.
... = Text::Soundex::soundex_noxs(...);
# Always uses the XS version. (7X faster)
... = Text::Soundex::soundex_xs(...);
# Use the XS version if possible, otherwise
# it will revert to the 100% perl version.
... = Text::Soundex::soundex(...);
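The "use the fast version if possible, otherwise fall back" idea can be sketched generically as below. This shows a common Perl pattern under stated assumptions, not how Text::Soundex actually selects its implementation: Hypothetical::Soundex::XS is a made-up module name, and encode_pure_perl is a placeholder that merely returns the upper-cased first letter rather than a real soundex code.

```perl
use strict;
use warnings;

# Placeholder for a slower pure-Perl implementation (not a real encoder).
sub encode_pure_perl {
    my ($name) = @_;
    return uc substr($name, 0, 1);
}

# Try to load a compiled implementation. The module name is invented, so on
# most systems the require fails inside the eval and $encode stays undefined.
my $encode = eval {
    require Hypothetical::Soundex::XS;
    Hypothetical::Soundex::XS->can('encode');
};

# Fall back to the pure-Perl routine when no compiled version was found.
$encode ||= \&encode_pure_perl;

print $encode->('knuth'), "\n";
```

Callers only ever see the code reference in $encode, so which implementation is in use stays invisible to them.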
LIMITATIONS
As the soundex algorithm was originally used a long time ago in the US, it considers only the English alphabet and pronunciation.

As it is mapping a large space (arbitrary length strings) onto a small
space (single letter plus 3 digits) no inference can be made about the
similarity of two strings which end up with the same soundex code. For
example, both
AUTHOR
This code was originally implemented by Mike Stok ( Ian Phillips ( Dave Carlson (