Donnerstag, 4. Juni 2015

Loopify Recursions


During development it is sometimes convenient to use recursions. But in Perl recursions don't scale very well, they are consuming memory.

It happened to me, that one of my CPAN modules LCS made problems after extending the  test cases to large sequences (more than 200 elements) as input. Avoiding the warning with 

no warnings 'recursion';

is easy but not a good solution.

Here is the code of the traditional LCS algorithm using recursion. The first subroutine calculates the match matrix $c using a nested loop. At the end it calls _print_lcs() to backtrace the resulting LCS (Longest Common Sequence). See the full source code at github.

sub LCS {
  my ($self,$X,$Y) = @_;

  my $m = scalar @$X;
  my $n = scalar @$Y;

  my $c = [];
  my ($i,$j);
  for ($i=0;$i<=$m;$i++) {
    for ($j=0;$j<=$n;$j++) {
      $c->[$i][$j]=0;
    }
  }
  for ($i=1;$i<=$m;$i++) {
    for ($j=1;$j<=$n;$j++) {
      if ($X->[$i-1] eq $Y->[$j-1]) {
        $c->[$i][$j] = $c->[$i-1][$j-1]+1;
      }
      else {
        $c->[$i][$j] = max($c->[$i][$j-1], $c->[$i-1][$j]);
      }
    }
  }
  my $path = $self->_print_lcs($X,$Y,$c,$m,$n,[]);
  return $path;
}

sub max {
  ($_[0] > $_[1]) ? $_[0] : $_[1];
}

sub _print_lcs {
  my ($self,$X,$Y,$c,$i,$j,$L) = @_;

  if ($i==0 || $j==0) { return ([]); }
  if ($X->[$i-1] eq $Y->[$j-1]) {
    $L = $self->_print_lcs($X,$Y,$c,$i-1,$j-1,$L);
    push @{$L},[$i-1,$j-1];
  }
  elsif ($c->[$i][$j] == $c->[$i-1][$j]) {
    $L = $self->_print_lcs($X,$Y,$c,$i-1,$j,$L);
  }
  else {
    $L = $self->_print_lcs($X,$Y,$c,$i,$j-1,$L);
  }
  return $L;
}

Each recursion can also be implemented as a loop. So it should be possible to rewrite the subroutine _print_lcs() without calling itself.

First let's analyze the control flow. The recursion begins with the maximum values of $i=$m, and $j=$n, the length of the two sequences, which now are also indices of the matching matrix $c, i.e. the subroutine begins in bottom right corner and counts down $i and $j to the upper left corner of the matrix.

Along the way matching points are recorded in the array $L.

The recursive solutions has the convenience, that first the subresult can be calculated, and after that the current match is pushed at the end of the subresults. This way the resulting array is sorted from lower to higher while the recursion tracked back from higher to lower.

The loopified solution also counts down but now has to use unshift for recording the match points. See how simple the code looks now (full version at github):

sub _lcs {
  my ($self,$X,$Y,$c,$i,$j,$L) = @_;

  while ($i > 0 && $j > 0) {
    if ($X->[$i-1] eq $Y->[$j-1]) {
      unshift @{$L},[$i-1,$j-1];
      $i--;
      $j--;
    }
    elsif ($c->[$i][$j] == $c->[$i-1][$j]) {
      $i--;
    }
    else {
      $j--;
    }
  }
  return $L;
}

Keine Kommentare:

Kommentar veröffentlichen