Return-Path: <mjd@plover.com>
Mailing-List: contact hop-discuss-help@plover.com; run by ezmlm
Delivered-To: mailing list hop-discuss@plover.com
Received: (qmail 14304 invoked by uid 119); 21 Oct 2005 05:59:41 -0000
Message-ID: <20051021055941.14303.qmail@plover.com>
To: hop-discuss@plover.com
Subject: Re: Fibonacci 
In-reply-to: Your message of "Thu, 20 Oct 2005 18:51:48 PDT."
             <20051021015147.GD98193@kamidake.apricot.com> 
Date: Fri, 21 Oct 2005 01:59:41 -0400
From: Mark Jason Dominus <mjd@plover.com>


Caution: extensive and not-very-programming-related mathematics ahead.

Zed Lopez:
> On Thu, Oct 20, 2005 at 10:29:17AM -0400, Mark Jason Dominus wrote:
> > 
> > Back in April, I said:
> > 
> > > The other thing I wish I had managed to work in was that the
> > > really-best way to calculate Fibonacci numbers is
> > > 
> > >         sub fib { (_fib(@_))[1] }
> > > 
> > >         sub _fib {
> > >           my $n = shift;
> > >           if ($n == 0) { return (1, 0) }
> > >           my ($a, $b) = _fib(int($n/2)) ;
> > >           return $n%2 ? ((2*$b+$a)*$a, $a*$a+$b*$b) :
> > >                                      ($a*$a+$b*$b, $b*(2*$a-$b))
> > >         }
> > > 
> 
> That is neat. When reading
> http://mathworld.wolfram.com/FibonacciNumber.html in hopes of making
> sense of this, when I first saw formulae 54 and 55:
> 
> for n odd,  F(n) = F((n+1)/2)**2 + F((n-1)/2)**2
> for n even, F(n) = F(n/2 + 1)**2 - F(n/2 - 1)**2
> 
> I thought "Aha! That must be it!"  But on second glance, it's
> clearly not, of course. 

Actually it is, sort of.  But I got to this version by a different
path.  Here's the deal.  You start with the equations:

        F(n+1)   = F(n) + F(n-1)
        F(n)     = F(n)


You can express this in matrix form like this:

        [  F(n+1) ]      [  F(n)  ]   [ 1  1 ]
        [         ]  =   [        ] * [      ]
        [   F(n)  ]      [ F(n-1) ]   [ 1  0 ]

(See footnote 1 if you don't know about the matrices.)  

Let's call the

        [ 1 1 ]
        [ 1 0 ]

thing "A".    We can repeat this process, multiplying again by A, and get:

        [  F(n+2) ]      [  F(n)  ]   [ 1  1 ]   [ 1  1 ]
        [         ]  =   [        ] * [      ] * [      ]
        [  F(n+1) ]      [ F(n-1) ]   [ 1  0 ]   [ 1  0 ]

                                              2
                         [  F(n)  ]   [ 1  1 ]
                     =   [        ] * [      ]                 
                         [ F(n-1) ]   [ 1  0 ] 

and in general, since multiplying by A gets us one step further, 
multiplying by A several times gets us several steps further:

                                              k
        [ F(n+k) ]       [  F(n)  ]   [ 1  1 ]  
        [        ]   =   [        ] * [      ] 
        [F(n+k-1)]       [ F(n-1) ]   [ 1  0 ] 

Putting n = 1, we get:

                                           k
        [ F(k+1) ]       [  1  ]   [ 1  1 ]  
        [        ]   =   [     ] * [      ] 
        [  F(k)  ]       [  0  ]   [ 1  0 ] 


so if we could figure out a fast way to calculate powers of A, we
would have a fast algorithm for Fibonacci numbers.
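As a quick sanity check of that matrix equation (a Python sketch for illustration, not part of the original message), repeated multiplication by A really does walk up the sequence:

```python
# Verify that multiplying the vector [F(n+1), F(n)] by A = [[1,1],[1,0]]
# repeatedly steps along the Fibonacci sequence.

def mat_vec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return (m[0][0]*v[0] + m[0][1]*v[1],
            m[1][0]*v[0] + m[1][1]*v[1])

A = ((1, 1), (1, 0))
v = (1, 0)            # [F(1), F(0)]
fibs = [v[1]]
for _ in range(10):
    v = mat_vec(A, v)
    fibs.append(v[1])
print(fibs)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```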

Certainly we can calculate A^k by doing k-1 matrix multiplications,
each of which requires 8 ordinary multiplications and 4 additions
(footnote 2) and so we can compute F(k) in 8k-8 multiplications plus
4k-4 additions.  But the ordinary algorithm computes F(k) using 0
multiplications and k-1 additions, so this is not yet an improvement.

But as a surprisingly small number of computer programmers know, there
is a much better way to compute powers of anything.  This has a lot of
general utility, so I'm going to leave the matrices aside for a while
and just deal with ordinary numbers.  Suppose you want to write pow()
so that pow(n, k) = n**k.  The obvious way is:

        sub pow {
          my ($n, $k) = @_;
          my $total = 1;
          while ($k--) { $total *= $n }
          return $total;
        }

But there's a much better way.  Suppose that you need to calculate
2**12, say.  Sure, you can multiply 2*2*2*2*2*2*2*2*2*2*2*2.  That's
11 multiplications.  But it's a lot simpler to notice that 2**12 =
(2**6) * (2**6), calculate 2**6 once and then square it, which
requires no more than 6 multiplications.

And of course you can do better, because you can calculate 2**6 by
calculating 2**3 once (two multiplications) and squaring that, for a
total of 3 multiplications, so you've gotten to 2**12 with a grand
total of four multiplications instead of 11.

That is, you want 2**12, so you calculate 2**3 = 8, then 8*8=64, and
64*64 = 4096, and you're done.
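Written out as code (a trivial Python sketch of the chain just described):

```python
# The chain for 2**12: two multiplications to get 2**3, then two squarings.
x = 2 * 2 * 2   # 2**3 = 8    (multiplications 1 and 2)
x = x * x       # 2**6 = 64   (multiplication 3)
x = x * x       # 2**12       (multiplication 4)
print(x)  # 4096
```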

Now suppose you wanted 2**13 instead.   Well, that's just one extra
multiplication.  So a much better pow() algorithm is:

        sub pow {
          my ($n, $k) = @_;
          return 1 if $k == 0;
          my $sub = pow($n, int($k/2));
          my $ssub = $sub * $sub;
          if ($k % 2 == 1) { $n * $ssub }
          else             {      $ssub }
        }

When calculating 2**1000, for example, this method requires only 14
multiplications (9 squarings, plus 5 more for the other set bits of
1000), instead of the 999 that the obvious algorithm requires.
(Footnote 3)

Eliminating the recursion here is a bit of a pain.  There is no tail
recursion, so we might use the stack method (see chapter 5 of HOP):

        sub pow {
          my ($n, $k) = @_;
          my @stack;
          while ($k) { push @stack, $k; $k = int($k/2) }
          my $total = 1;
          while (@stack) {
            $k = pop @stack;
            if ($k % 2 == 1) { $total *= $total * $n }
            else             { $total *= $total      }
          }
          return $total;
        }

Now consider what happens when $k is 37, which is 100101 in binary.
The function puts the following values onto the stack:

        100101, 10010, 1001, 100, 10, 1

and then pops them off in reverse order.  Since the ($k % 2 == 1) test
is only looking at the final bit of each stack item, it should be
clear that what the stack is really doing here is just allowing us to
scan the bits of $k one at a time starting from the left end.  That
is, the bits we're considering in this example are just

        1, 0, 0, 1, 0, 1

and $k is indeed 100101.  So we can dispense with the stack if we just
find another way to scan the bits of $k from left to right.  One such
way is to use Perl's "sprintf" operator to get the bits:

        sub mpow {
          my ($n, $k) = @_;
          my @bits = split //, sprintf "%b", $k;
          my $total = 1;
          for my $bit (@bits) {
            $total *= $total;
            if ($bit) { $total *= $n }
          }
          return $total;
        }

This is how to calculate powers quickly.

The same general technique works just fine for calculating powers of a
matrix.  If A is a matrix and we want A**k, we keep an accumulator
matrix (corresponding to $total) which is initially the unit matrix:

        [ 1 0 ]
        [ 0 1 ]

and scan the bits of $k left to right.  At each bit, we multiply the
accumulator by itself; if the current bit is set, we also multiply by A.    
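Here is that procedure as a sketch in Python (illustrative, not from the original; the 2x2 multiply is written out longhand):

```python
# Exponentiation of a 2x2 matrix by scanning the bits of k from left
# to right, exactly as in mpow() above.

def mat_mul(x, y):
    """Multiply two 2x2 matrices (tuples of tuples)."""
    return ((x[0][0]*y[0][0] + x[0][1]*y[1][0],
             x[0][0]*y[0][1] + x[0][1]*y[1][1]),
            (x[1][0]*y[0][0] + x[1][1]*y[1][0],
             x[1][0]*y[0][1] + x[1][1]*y[1][1]))

def mat_pow(a, k):
    total = ((1, 0), (0, 1))           # unit matrix
    for bit in format(k, "b"):         # bits of k, left to right
        total = mat_mul(total, total)  # analogous to $total *= $total
        if bit == "1":
            total = mat_mul(total, a)  # analogous to $total *= $n
    return total

A = ((1, 1), (1, 0))
print(mat_pow(A, 10))  # ((89, 55), (55, 34)) -- that is, F(11), F(10), F(9)
```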

Now recall that the A we're interested in is:

        [ 1  1 ]
        [      ]
        [ 1  0 ]

If we calculate the first few powers of A by hand, we see that they
are:

        [ 1  1 ]   [ 2  1 ]   [ 3  2 ]   [ 5  3 ]   [ 8  5 ]   [ 13  8 ]
        [      ]   [      ]   [      ]   [      ]   [      ]   [       ]
        [ 1  0 ]   [ 1  1 ]   [ 2  1 ]   [ 3  2 ]   [ 5  3 ]   [  8  5 ]


which should lead us to conjecture that A**k is:

        [ F(k+1)  F(k)   ]
        [                ]
        [ F(k)    F(k-1) ]


This is true, and easily shown.  It's obviously true for k=1, and a
technique called "induction" shows that it's true in general.  

So what we take away from all this is:  if we could calculate powers of
A quickly, we could calculate Fibonacci numbers quickly.  But we *can*
calculate powers of A quickly, using the speedy method shown earlier.
For example, 

                 [ F(2k+1)  F(2k)   ]
        A**2k =  [                  ]
                 [ F(2k)    F(2k-1) ]
        

but A**2k also equals (A**k)**2, and we know what A**k looks like, so:

                          2              
        [ F(k+1)  F(k)   ]       [ F(2k+1)  F(2k)   ]
        [                ]   =   [                  ]
        [ F(k)    F(k-1) ]       [ F(2k)    F(2k-1) ]

We can evaluate the left-hand side, getting:

                      
   [ F(k+1)F(k+1) + F(k)F(k)   F(k+1)F(k) + F(k)F(k-1) ]   [ F(2k+1)  F(2k)   ]
   [                                                   ] = [                  ]
   [ F(k+1)F(k) + F(k)F(k-1)   F(k)F(k) + F(k-1)F(k-1) ]   [ F(2k)    F(2k-1) ]


And then equating the elements on either side:

        F(k+1)F(k+1) + F(k)F(k)  = F(2k+1)
        F(k+1)F(k) + F(k)F(k-1)  = F(2k)
        F(k)F(k) + F(k-1)F(k-1)  = F(2k-1)


which are the identities you mentioned back at the very beginning of
the article.  So yes, there is a close relation.
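A quick brute-force check of those three identities (a Python sketch, comparing against the ordinary iterative definition):

```python
# Check the identities F(k+1)^2 + F(k)^2 = F(2k+1), etc., for small k.

def fib(n):
    """Plain iterative Fibonacci: returns F(n)."""
    a, b = 1, 0   # F(1), F(0)
    for _ in range(n):
        a, b = a + b, a
    return b

for k in range(1, 20):
    assert fib(k+1)**2 + fib(k)**2           == fib(2*k+1)
    assert fib(k+1)*fib(k) + fib(k)*fib(k-1) == fib(2*k)
    assert fib(k)**2 + fib(k-1)**2           == fib(2*k-1)
print("identities hold")
```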

I started writing the _fib function you quoted at the top of the
message by doing the matrix arithmetic explicitly.  Say the 4 entries
in the current total matrix are:

        [ a b ]
        [     ]
        [ c d ]

Then:
                       [ a b ]   [ a b ]   [ a*a + b*c   a*b + b*d ]
        total*total =  [     ] * [     ] = [                       ]
                       [ c d ]   [ c d ]   [ a*c + c*d   b*c + d*d ]


And the other thing we need to know is:

                       [ a b ]   [ 1 1 ]   [ a + b   a ]
        total * A   =  [     ] * [     ] = [           ]
                       [ c d ]   [ 1 0 ]   [ c + d   c ]


So we'll have something like this:

        # Calculate Fibonacci numbers by efficient calculation of
        # entries of A**k
        #
        sub fib {
          my ($k) = @_;
          my @bits = split //, sprintf "%b", $k;
          my ($a, $b, $c, $d) = (1, 0, 0, 1);
          for my $bit (@bits) {
            # Analogous to $total *= $total
            ($a, $b, $c, $d) = ($a*$a+$b*$c,  $a*$b+$b*$d,
                                $a*$c+$c*$d,  $b*$c+$d*$d);
            if ($bit) {
              # Analogous to $total *= A
              ($a, $b, $c, $d) = ($a+$b, $a, $c+$d, $c);
            }
          }
          return $b;
        }

And in fact this works really well.   From here on it is just trimming
and cleanup.  The most obvious such cleanup is to observe that $c is
totally superfluous, since it's always equal to $b, so we can kill it:

        sub fib {
          my ($k) = @_;
          my @bits = split //, sprintf "%b", $k;
          my ($a, $b, $d) = (1, 0, 1);
          for my $bit (@bits) {
            # Analogous to $total *= $total
            ($a, $b, $d) = ($a*$a+$b*$b,  $a*$b+$b*$d,  $b*$b+$d*$d);
            if ($bit) {
              # Analogous to $total *= A
              ($a, $b, $d) = ($a+$b, $a, $b);
            }
          }
          return $b;
        }


The next cleanup is a little more mathematical.  A**k is:

        [ F(k+1)  F(k)   ]
        [                ]
        [ F(k)    F(k-1) ]


and in our program, the upper-left entry is represented by $a, the
upper-right by $b, and the lower right by $d.  This means that
a=F(k+1), b=F(k), and d=F(k-1) at all times.  But then by the
definition of the Fibonacci numbers, d = F(k-1) = F(k+1) - F(k) = a-b.
So we can get rid of d also, replacing it everywhere with a-b:

        sub fib {
          my ($k) = @_;
          my @bits = split //, sprintf "%b", $k;
          my ($a, $b) = (1, 0);
          for my $bit (@bits) {
            # Analogous to $total *= $total
            ($a, $b) = ($a*$a+$b*$b,  $a*$b+$b*($a-$b));
            if ($bit) {
              # Analogous to $total *= A
              ($a, $b) = ($a+$b, $a);
            }
          }
          return $b;
        }

Collapsing this to 

> > >           return $n%2 ? ((2*$b+$a)*$a, $a*$a+$b*$b) :
> > >                                      ($a*$a+$b*$b, $b*(2*$a-$b))

is straightforward.  When $n is even (so $bit is false) you can see
the $a*$a+$b*$b expression right there, and $b*(2*$a-$b) is just a
simplified version of $a*$b+$b*($a-$b).  In the $n odd case, you can
again see the  $a*$a+$b*$b expression, this time being assigned to $b,
and (2*$b+$a)*$a is just a simplification of $a*$a+$b*$b + $b*(2*$a-$b).
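To convince yourself the collapsed version is right, here is a direct transliteration into Python (a sketch for checking; the real thing is the Perl at the top of the message):

```python
# Transliteration of the collapsed recurrence: _fib(n) returns the
# pair (F(n+1), F(n)).

def _fib(n):
    if n == 0:
        return (1, 0)
    a, b = _fib(n // 2)
    if n % 2:
        return ((2*b + a) * a, a*a + b*b)
    else:
        return (a*a + b*b, b * (2*a - b))

def fib(n):
    return _fib(n)[1]

print([fib(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```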

So there you have it; as nearly as I can remember, that is exactly how
I did it.


> > I needed this for something or other today, so I did the exercise and
> > eliminated the recursion.  Here's the result:
> > 
> >         unsigned fib (unsigned n)
> >         {
> >           unsigned a = 1, b = 0;
> >           unsigned i = 0x8000;
> > 
> >           while (i) {
> >             unsigned aa;
> >             aa = n&i ? (2*b+a)*a : a*a+b*b;
> >             b  = n&i ?             a*a+b*b : b*(2*a-b);
> >             a  = aa;
> >             i >>= 1;
> >           }
> >           return b;
> >         }
> > 
> > (This is C, not Perl.  A Perl version is left as another exercise.)
> > 
> > (The 0x8000 will appear to the casual user to be a portability problem,
> > but it isn't.)
> 
> I'm still not sure why this works, 

I hope this is clearer now.  We are scanning the bits of n from left
to right.  The variable i selects the bit we want to use next.  It
starts with the leftmost bit of n, assuming that n is at most 16 bits
wide, and at each stage moves one bit to the right; that's the >>= 1.
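For comparison, here is the same loop transliterated into Python (an illustrative sketch; this is not the Perl version the original leaves as an exercise).  Since Python integers are unbounded, the mask can start at the top set bit of n rather than at a fixed 0x8000:

```python
# Same left-to-right bit scan as the C version, with i starting at the
# highest set bit of n instead of 0x8000.

def fib(n):
    a, b = 1, 0
    i = 1 << (n.bit_length() - 1) if n else 0
    while i:
        if n & i:
            a, b = (2*b + a) * a, a*a + b*b
        else:
            a, b = a*a + b*b, b * (2*a - b)
        i >>= 1
    return b

print(fib(10))  # 55
```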

> and why 0x8000 wouldn't be a portability problem,

Oh, I mostly just said that to keep Ben Tilly from chiming in.  But my
idea was this: It can only go wrong if the value of n exceeds 65535.
On a machine with 16-bit integers, this can't happen.  On a machine
with larger integers, you will get the wrong answer for n > 65535.
But since F(65536) is 45,498 bits long, you were going to get the
wrong answer anyway.  So the 0x8000 is not making anything any worse.

> or whether it's fair to call this the equivalent program with
> recursion eliminated. This is O(1), always going through its loop
> exactly 16 times. The Perl program is O(log n). I'll have to think
> about it some more.

Yes.  From a Perl point of view, I cheated on the thing with the
machine integers by limiting the input to 16 bits.  But from a C point
of view, I did the exact right thing.  Different languages have
different esthetics. 


FOOTNOTES

1.  A matrix is a mathematical object that behaves in some ways like a
    table of numbers, and in other ways like a special, complicated
    kind of number itself.  Matrices can be added, if they have the
    same number of rows and columns, and multiplied, according to a
    rather interesting rule.

    Suppose you have three factories, A, B, and C, and each one is
    producing two kinds of candy, say candy X and candy Y:

          X Y

        [ 4 3 ]  A 
        [ 8 1 ]  B 
        [ 7 0 ]  C

    The entries here indicate that factory A produces 4 tons of candy
    X and 3 tons of candy Y each month; other entries similarly.

    Now let's suppose that candies X and Y generate income, and
    manufacturing expenses, and cavities, as follows:

          inc. exp. cav.

        [ $125  $80  24 ]  X
        [ $150  $90  36 ]  Y
        

    That is, each ton of candy X costs us $80,000, brings in $125,000
    in revenue, and causes 24,000 cavities.  We want to know the
    total income, expenses, and cavities for each factory.  To
    find the total income generated by
    factory A, for example, we take 4 * $125 + 3 * $150 = $950.  To
    find the total cavities generated by factory B, we calculate 8 *
    24 + 1 * 36 = 228.  Other entries similarly; the complete result
    is:

        A       B       C
        
     [ 950    1150     875  ]  income
     [ 590     730     560  ]  expenses
     [ 204     228     168  ]  cavities

    This is the result of multiplying the two example matrices.  In
    general, you are allowed to multiply an AxB matrix by a CxD matrix
    only if B=C, and the result is then an AxD matrix.  Here we had a
    3x2 and a 2x3 matrix, so the result was a 3x3 matrix.
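    The footnote's example, checked mechanically (a Python sketch with
    a generic matrix multiply; note this product comes out with
    factories as rows, the transpose of the table above):

```python
# Multiply the 3x2 production matrix by the 2x3 per-ton matrix.

def mat_mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]

production = [[4, 3],   # factory A: tons of candy X, tons of candy Y
              [8, 1],   # factory B
              [7, 0]]   # factory C
per_ton = [[125, 80, 24],   # candy X: income, expenses, cavities
           [150, 90, 36]]   # candy Y

print(mat_mul(production, per_ton))
# [[950, 590, 204], [1150, 730, 228], [875, 560, 168]]
```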

    When you abstract away the factories and the candies, the numbers
    remain, and that's what mathematicians mean by multiplying
    matrices.  The important thing about the matrices for this article
    is that the multiplication is "associative", which means that
    (A*B)*C = A*(B*C) for all matrices A, B, and C, and, in
    particular, the notation A**k as shorthand for A*A*A*A...*A is
    well-defined.  

2.  The straightforward method for multiplying 2x2 matrices requires 8
    multiplications and 4 additions, but methods have been known since
    1969 for reducing the number of multiplications to 7 at the cost
    of increasing the additions to 15.  This might sometimes be a win.
    For an exhaustive treatment of this, see pages 499--501 of Knuth
    _Seminumerical Algorithms_, 3rd ed.

3.  Perhaps surprisingly, this is not optimal.  No algorithm is known
    that is optimal in all cases!  For example, consider the
    calculation of 2**15.  The algorithm given above says to calculate
    2**2, 2**3, 2**6, 2**7, 2**14, 2**15, for a total of 6
    multiplications.  But if you calculate 2**2, 2**3, 2**5, 2**10,
    2**15, you get away with only 5.  (You can get 2**15 by
    multiplying 2**5 and 2**10, and other items similarly.)  For an
    exhaustive treatment of this, see pages 463--481 of Knuth
    _Seminumerical Algorithms_, 3rd ed.
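    The two chains from this footnote, written out (Python sketch;
    each line below is one multiplication):

```python
# Binary-method chain for 2**15: 6 multiplications.
a2  = 2 * 2        # 2**2
a3  = a2 * 2       # 2**3
a6  = a3 * a3      # 2**6
a7  = a6 * 2       # 2**7
a14 = a7 * a7      # 2**14
a15 = a14 * 2      # 2**15

# Addition chain 1, 2, 3, 5, 10, 15: only 5 multiplications.
b2  = 2 * 2        # 2**2
b3  = b2 * 2       # 2**3
b5  = b3 * b2      # 2**5
b10 = b5 * b5      # 2**10
b15 = b10 * b5     # 2**15

print(a15, b15)  # 32768 32768
```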
