Tail recursion
When a function’s last action is calling itself, we say it’s tail recursive.
For instance, a tail-recursive implementation of gcd (the greatest common divisor) in Haskell is as follows:
gcd :: (Integral a) => a -> a -> a
gcd x y = gcd' (abs x) (abs y)
  where gcd' a 0 = a
        gcd' a b = gcd' b (a `rem` b)
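For example, gcd 12 18 unfolds as follows:

gcd 12 18
gcd' 12 18
gcd' 18 12
gcd' 12 6
gcd' 6 0
6

Note that each call directly returns the result of the next one; no pending computation accumulates between calls.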
Tail-recursive functions are interesting because compilers can optimize them easily, replacing the recursive implementation with a more performant iterative one. A tail-recursive implementation can execute an iterative process in constant space, even though the process is described by a recursive procedure.
The automatic optimization of tail recursion was popularized by Guy L. Steele Jr. (although replacing a JSR + RET sequence with a JMP was possibly known earlier).
If a function has a tail-recursive implementation, then special iteration constructs (e.g. while and for loops in your average imperative or object-oriented language) are merely syntactic sugar, since the iteration can be expressed by an ordinary function call.
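To illustrate, here is a minimal sketch of a generic while loop written as a tail-recursive Haskell function (the name while and its (condition, step, state) interface are my own illustration, not a standard library function):

-- A while loop as a tail-recursive function: keep applying step
-- to the state while cond holds, then return the final state.
while :: (a -> Bool) -> (a -> a) -> a -> a
while cond step x
  | cond x    = while cond step (step x)
  | otherwise = x

-- Usage: sum the numbers 1..10 with an (index, accumulator) state.
sumTo10 :: Int
sumTo10 = snd (while (\(i, _) -> i <= 10)
                     (\(i, s) -> (i + 1, s + i))
                     (1, 0))  -- evaluates to 55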
Further Examples:
A common definition of the length of a list that can be found in books is as follows:
length :: [a] -> Int
length [] = 0
length (x:xs) = 1 + length xs
This is not tail-recursive: the addition 1 + ... still has to happen after the recursive call returns. Since computing the length of a list requires traversing it to the end anyway, it makes sense to define length in a tail-recursive way:
length :: [a] -> Int
length xs = lenAcc xs 0
  where lenAcc []     n = n
        lenAcc (_:ys) n = lenAcc ys (n+1)
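For example, length [7,8,9] unfolds as follows (under lazy evaluation the accumulator is built up as unevaluated thunks, a point we return to below when discussing foldl):

length [7,8,9]
lenAcc [7,8,9] 0
lenAcc [8,9] (0+1)
lenAcc [9] ((0+1)+1)
lenAcc [] (((0+1)+1)+1)
3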
This definition is (except for the where clause) taken verbatim from Haskell's Prelude.
The standard definition of filter is also not tail-recursive:
filter _ [] = []
filter p (x:xs)
  | p x       = x : filter p xs
  | otherwise = filter p xs
We can define a tail-recursive version as follows:
filter f xs = filter' xs []
  where filter' [] rs = reverse rs
        filter' (x:xs) rs
          | f x       = filter' xs (x:rs)
          | otherwise = filter' xs rs
However, tail recursion imposes strictness, since only the very last call can return something. This implementation thus fails for infinite lists (e.g. we can't take 10 (filter even [1..])), which is generally undesirable.
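A quick experiment makes the difference concrete (a sketch; filterTR is a hypothetical name for the tail-recursive version above, renamed so it does not clash with the Prelude's filter):

-- filterTR: the tail-recursive filter from above, under a hypothetical name.
filterTR :: (a -> Bool) -> [a] -> [a]
filterTR f xs = filter' xs []
  where filter' [] rs = reverse rs
        filter' (y:ys) rs
          | f y       = filter' ys (y:rs)
          | otherwise = filter' ys rs

main :: IO ()
main = do
  print (take 10 (filter   even [1..]))  -- prints [2,4,6,8,10,12,14,16,18,20]
  print (take 10 (filterTR even [1..]))  -- diverges: filter' never reaches []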
The same happens with the standard map:
map f [] = []
map f (x:xs) = f x : map f xs
which may be defined tail-recursively as follows:
map f xs = map' xs []
  where map' [] rs = reverse rs
        map' (x:xs) rs = map' xs (f x : rs)
The second equation for map' is clearly tail-recursive, and since reverse is tail-recursive, the whole of map is. This has, however, the same problem as any tail-recursive function: it prevents returning a partial result under lazy evaluation.
Note that foldl is tail-recursive:
foldl f e [] = e
foldl f e (x:xs) = foldl f (f e x) xs
However, foldl is discouraged in Haskell because, even though it is tail-recursive, its accumulating parameter isn't evaluated before the recursive call, due to Haskell's lazy (normal-order) evaluation. For example, an execution of foldl (+) 0 [1,2,3,4] proceeds as follows:
foldl (+) 0 [1,2,3,4]
foldl (+) (0 + 1) [2,3,4]
foldl (+) ((0 + 1) + 2) [3,4]
foldl (+) (((0 + 1) + 2) + 3) [4]
foldl (+) ((((0 + 1) + 2) + 3) + 4) []
((((0 + 1) + 2) + 3) + 4)
(((1 + 2) + 3) + 4)
((3 + 3) + 4)
(6 + 4)
10
As can be seen, thunks are created and kept in memory until the end of the list is reached, and only then can they start to be evaluated. This is unnecessarily costly, can lead to stack overflows, and is contrary to what we would normally expect of a tail-recursive call. That's why there is foldl', a strict variant of foldl which forces evaluation of each thunk before recursing:
foldl' (+) 0 [1,2,3,4]
foldl' (+) (0 + 1) [2,3,4]
foldl' (+) (1 + 2) [3,4]
foldl' (+) (3 + 3) [4]
foldl' (+) (6 + 4) []
10
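A minimal sketch of how foldl' can be defined (GHC's actual implementation differs in some details) uses seq to force the accumulator to weak head normal form before the recursive call:

foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' f e []     = e
foldl' f e (x:xs) = let e' = f e x
                    in e' `seq` foldl' f e' xs  -- force e' before recursing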
Comparison of two factorial implementations in C
Given the following two simple definitions of a factorial function in C:
/* tail-recursive-factorial.c */
unsigned
fact(unsigned n, unsigned acc) {
    if (n == 0)
        return acc;
    return fact(n - 1, n * acc);
}
/* iterative-factorial.c */
unsigned
fact(unsigned n) {
    unsigned ret = 1;
    while (n != 0) {
        ret *= n--;
    }
    return ret;
}
Both are equivalent, provided we pass 1 as the accumulating parameter when calling the first one (that is, $5!$ would be fact(5, 1)). The first function is defined recursively, while the second one is defined iteratively. One might mistakenly assume the iterative one to be more efficient, but that need not be the case. For example, after calling GCC (version 7.3.1) with -O2 -S (to enable optimizations and generate assembly output), I get the following definitions of fact:
# iterative-factorial.s
fact:
.LFB11:
        .cfi_startproc
        test    edi, edi        # Test whether n == 0.
        mov     eax, 1          # ret = 1;
        je      .L4             # Go to .L4 if edi was 0.
        .p2align 4,,10
        .p2align 3
.L3:
        imul    eax, edi        # ret *= n;
        sub     edi, 1          # n--;
        jne     .L3             # If n != 0, loop once more.
        rep ret
        .p2align 4,,10
        .p2align 3
.L4:
        rep ret
        .cfi_endproc
# tail-recursive-factorial.s
fact:
.LFB0:
        .cfi_startproc
        test    edi, edi        # Test whether n == 0.
        mov     eax, esi        # Store the value of acc in eax.
        je      .L5
        .p2align 4,,10
        .p2align 3
.L2:
        imul    eax, edi        # acc *= n;
        sub     edi, 1          # n--;
        jne     .L2
.L5:
        rep ret
        .cfi_endproc
Aside from label differences and some alignment directives, both listings do exactly the same thing: multiply eax by edi, decrement edi, and loop until edi is 0.
Final remarks
A downside of tail recursion is that, since the code is compiled into a loop, there is no stack trace, which can be counterintuitive when debugging. This is why Python (purposely) doesn't optimize tail-recursive calls.
References:
- History of tail-call optimization (https://erlang.org/pipermail/erlang-questions/2006-August/022055.html)
- Structure and Interpretation of Computer Programs, pp. 35-36.
- Fold, Haskell wiki (https://wiki.haskell.org/Fold)
- Haskell-cafe mailing list discussion (https://mail.haskell.org/pipermail/haskell-cafe/2011-March/090237.html)
- Tail Recursion Elimination, Neopythonic (http://neopythonic.blogspot.com.ar/2009/04/tail-recursion-elimination.html)