Motivation
There is always a need for speed. While tuning one of my applications in the field of OCR postprocessing, the profiler identified the LCS (Longest Common Subsequence) routine as the most time-consuming part. The big question was: is it possible to make the already fast Algorithm::Diff and Algorithm::Diff::XS even faster?
Reading the code again and again and trying some micro-optimizations brought only small progress.
OK, don't twiddle, try better algorithms.
So I spent a lot of time reading papers and then implementing the described algorithms. Each implementation took some time to debug. Some did not work as they should, i.e. they failed test cases. Some were slow in Perl.
After a while I had a collection of various implementations of the so-called Hunt-Szymanski algorithm and improvements of it. This is the same algorithm Algorithm::Diff is based on.
While tuning a bit-vector implementation I compared it to my already tuned version of Algorithm::Diff and merged part of the bit-vector variant's code back. The benchmark result surprised me: the tuned Algorithm::Diff was now only 10 percent slower than the bit-vector variant.
This motivated me to publish it as a reliable and portable pure Perl module under the name LCS::Tiny.
Here are the steps that made it so fast.
Remove features
Algorithm::Diff::LCSidx has two additional parameters:
- a code ref to a key generating function
- a code ref to a compare function
Usually they are not specified and the defaults are used, but there is still overhead for checking and calling them, as the sketch below illustrates.
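A hypothetical illustration of this overhead (lcs_keys and lcs_keys_tiny are made-up names, not the LCSidx internals): the flexible interface has to check for and call a code ref once per element, while the stripped-down variant uses each element directly.

use strict;
use warnings;

# Flexible interface: check for an optional key generator, call it per element.
sub lcs_keys {
    my ($seq, $keyGen) = @_;
    return map { defined $keyGen ? $keyGen->($_) : $_ } @$seq;
}

# Feature removed: each element is its own key - no check, no sub call.
sub lcs_keys_tiny {
    my ($seq) = @_;
    return @$seq;
}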
Inlining
When Algorithm::Diff::LCSidx is called, four subroutines are involved. Inlining them into a single subroutine reduces the lines of code and the call overhead at runtime. Of course the code may become less readable.
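A minimal sketch of the principle (generic names, not the actual Algorithm::Diff call chain):

# Before: a tiny helper, called once per element.
sub _key { return $_[0] }

sub count_with_helper {
    my %seen;
    $seen{ _key($_) }++ for @_;    # one sub call per element
    return \%seen;
}

# After inlining: the helper's body replaces the call.
sub count_inlined {
    my %seen;
    $seen{$_}++ for @_;            # same result, no call overhead
    return \%seen;
}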
Squeezing
Try to use fewer instructions. Use Devel::NYTProf and benchmarks to check the changes. This sort of tuning can be very time-consuming.
Here is an example from Algorithm::Diff:
sub _withPositionsOfInInterval
{
    my $aCollection = shift;    # array ref
    my $start       = shift;
    my $end         = shift;
    my $keyGen      = shift;
    my %d;
    my $index;
    for ( $index = $start ; $index <= $end ; $index++ )
    {
        my $element = $aCollection->[$index];
        my $key     = &$keyGen( $element, @_ );
        if ( exists( $d{$key} ) )
        {
            unshift ( @{ $d{$key} }, $index );
        }
        else
        {
            $d{$key} = [$index];
        }
    }
    return wantarray ? %d : \%d;
}
This is how it looks after squeezing:
my $bMatches;
unshift @{ $bMatches->{$b->[$_]} },$_ for $bmin..$bmax;
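The squeezed version behaves the same for the default case: dropping the key generator makes each element its own hash key, Perl's autovivification replaces the exists/else branch (unshift onto a not-yet-existing entry creates the array ref on the fly), and the postfix for removes the loop scaffolding.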
Resulting module - LCS::Tiny
As bit-vector solutions can have portability problems, I decided to release the now very fast non-bit-vector implementation on CPAN as LCS::Tiny.
Additional speed - LCS::BV
After the release of LCS::Tiny I got back to work on the bit-vector implementation, to make it portable across different word lengths and to support sequences longer than the word length (32 or 64 bits). I also tried out improvements for parts of the algorithm. One of them resulted in a significant acceleration and is now implemented in the released module LCS::BV.
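For readers unfamiliar with the technique, here is a minimal sketch of a bit-vector computation of the LCS length in pure Perl, following the recurrence of Crochemore et al. It assumes the first sequence is shorter than the native integer width, and llcs_bv and %mask are illustrative names, not the LCS::BV implementation.

use strict;
use warnings;

sub llcs_bv {
    my ($a, $b) = @_;            # array refs of symbols
    my $m = scalar @$a;          # assumed smaller than the word length

    # per-symbol match masks: bit $i is set if $a->[$i] eq symbol
    my %mask;
    for my $i (0 .. $m - 1) {
        $mask{ $a->[$i] } = ($mask{ $a->[$i] } // 0) | (1 << $i);
    }

    my $width = (1 << $m) - 1;   # keep $V inside an m-bit word
    my $V     = $width;          # all ones: nothing matched yet

    for my $sym (@$b) {
        my $M = $mask{$sym} // 0;
        # core recurrence: the addition carries one match through each run of ones
        $V = (($V + ($V & $M)) | ($V & ~$M)) & $width;
    }

    # the LCS length is the number of zero bits in the low $m bits of $V
    my $llcs = 0;
    for my $i (0 .. $m - 1) {
        $llcs++ unless $V & (1 << $i);
    }
    return $llcs;
}

# classic example: the LCS of 'abcabba' and 'cbabac' has length 4
print llcs_bv([split //, 'abcabba'], [split //, 'cbabac']), "\n";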
Benchmarks
Short sequences
A nice and typical average case for spelling correction is the comparison of 'Chrerrplzon' with 'Choerephon'.
use Algorithm::Diff;
use Algorithm::Diff::XS;
use String::Similarity;
use Benchmark qw(:all);

use LCS::Tiny;
use LCS;
use LCS::BV;

my @data = (
    [split(//,'Chrerrplzon')],
    [split(//,'Choerephon')]
);

my @strings = qw(Chrerrplzon Choerephon);

if (1) {
    cmpthese( 50_000, {
        'LCS' => sub {
            LCS->LCS(@data)
        },
        'LCSidx' => sub {
            Algorithm::Diff::LCSidx(@data)
        },
        'LCSXS' => sub {
            Algorithm::Diff::XS::LCSidx(@data)
        },
        'LCSbv' => sub {
            LCS::BV->LCS(@data)
        },
        'LCStiny' => sub {
            LCS::Tiny->LCS(@data)
        },
        'S::Sim' => sub {
            similarity(@strings)
        },
    });
}
Gives the following result:
$ perl 50_diff_bench.t
(warning: too few iterations for a reliable count)
              Rate     LCS  LCSidx LCStiny   LCSXS   LCSbv  S::Sim
LCS         7022/s      --    -71%    -82%    -87%    -87%    -99%
LCSidx     23923/s    241%      --    -40%    -55%    -57%    -98%
LCStiny    39683/s    465%     66%      --    -25%    -29%    -96%
LCSXS      53191/s    657%    122%     34%      --     -4%    -95%
LCSbv      55556/s    691%    132%     40%      4%      --    -94%
S::Sim   1000000/s  14140%   4080%   2420%   1780%   1700%      --
Here the pure Perl LCS::BV beats Algorithm::Diff::XS (written in C).
Worst case
This example shows the weakness of the algorithms on more complicated input: the short, repeated alphabet produces a huge number of match points, the worst case for Hunt-Szymanski style algorithms.
my @data3 = ([qw/a b d/ x 50], [qw/b a d c/ x 50]);    # 150 vs. 200 elements
$ perl 50_diff_bench.t
(warning: too few iterations for a reliable count)
            Rate     LCS LCStiny  LCSidx   LCSbv   LCSXS
LCS       35.2/s      --    -33%    -38%    -97%    -99%
LCStiny   52.7/s     50%      --     -7%    -95%    -98%
LCSidx    56.5/s     60%      7%      --    -94%    -98%
LCSbv     1020/s   2795%   1836%   1705%      --    -67%
LCSXS     3125/s   8766%   5828%   5428%    206%      --
Here the XS version is fastest, because the stressed part runs in pure C. The bit-vector implementation scales well because of its parallelism. The others fall back to worst-case performance, close to the traditional implementation.