Skip to main content

Accumulators

Efficiency of reverse

Let’s look at the standard solutions for (++) and reverse earlier:

(++) :: [a] -> [a] -> [a]
[] ++ ys = ys
(x : xs) ++ ys = x : (xs ++ ys)
reverse :: [a] -> [a]
reverse [] = []
reverse (x : xs) = reverse xs ++ [x]

What is the efficiency of reverse?

Answer: it has quadratic complexity, as (++) is linear in its left argument and reverse reduces to a linear chain of (++) invocations.

A better reverse

Let’s build the reversed list as we go in an additional, accumulating argument:

reverseAcc :: [a] -> [a] -> [a]
reverseAcc acc [] = acc
reverseAcc acc (x : xs) = reverseAcc (x : acc) xs
reverse :: [a] -> [a]
reverse = reverseAcc []
note
  • We need the extra creative step to move from reverse to reverseAcc ,
  • But the function reverseAcc uses the standard design principle for lists again;
  • We now traverse the list only once.

Accumulators in general

We have seen that reverse benefits significantly from introducing an accumulator.

Can the same idea be applied elsewhere?

The sum function

Applying the standard design pattern:

sum :: [Int] -> Int
sum [] = 0
sum (x : xs) = x + sum xs

Trying in GHCi with artificially limited stack size (ghci +RTS -K1M):

GHCi> sum [1 . . 100000]
*** Exception: stack overflow

Why is the function consuming so much stack space?

Equational reasoning

sum [1, 2, 3]
= sum (1 : 2 : 3 : []) -- removing syntactic sugar
= 1 + sum (2 : 3 : []) -- by definition of sum
= 1 + (2 + sum (3 : [])) -- by definition of sum
= 1 + (2 + (3 + sum [])) -- by definition of sum
= 1 + (2 + (3 + 0)) -- by definition of sum
= 1 + (2 + 3) -- by definition of (+)
= 1 + 5 -- by definition of (+)
= 6 -- by definition of (+)

We build an entire right-nested chain of additions until we reach the end of the list. Only then can we start reducing them.

An accumulator for the sum?

Can an accumulator help? Let’s try:

sumAcc :: Int -> [Int] -> Int
sumAcc acc [] = acc
sumAcc acc (x : xs) = sumAcc (x + acc) xs
sum :: [Int] -> Int
sum = sumAcc 0

Trying in stack-limited GHCi again:

GHCi> sum [1 . . 100000]
*** Exception: stack overflow

More equational reasoning

sum [1, 2, 3]
= sum (1 : 2 : 3 : []) -- removing syntactic sugar
= sumAcc 0 (1 : 2 : 3 : []) -- by definition of sum
= sumAcc (0 + 1) (2 : 3 : []) -- by definition of sumAcc
= sumAcc ((0 + 1) + 2) (3 : []) -- by definition of sumAcc
= sumAcc (((0 + 1) + 2) + 3) [] -- by definition of sumAcc
= ((0 + 1) + 2) + 3 -- by definition of sumAcc
= (1 + 2) + 3 -- by definition of (+)
= 3 + 3 -- by definition of (+)
= 6 -- by definition of (+)
= 1 + (2 + (3 + sum [])) -- by definition of sum
= 1 + (2 + (3 + 0)) -- by definition of sum
= 1 + (2 + 3) -- by definition of (+)
= 1 + 5 -- by definition of (+)
= 6 -- by definition of (+)

The problem

We build the expression

sumAcc (0 + 1) (2 : 3 : [])

Perhaps surprisingly, 0 + 1 is not reduced, because it is not needed until much later.

We therefore build a (left-nested) chain of additions again until we reach the end of the list ...

Accumulators are problematic in a lazy evaluation scenario: we update them often, but do not really need them evaluated for a long time.

Perhaps we can provide a hint to the compiler that we want this evaluated sooner?

Making the accumulator strict with bang patterns

sumAcc :: Int -> [Int] -> Int
sumAcc !acc [] = acc
sumAcc !acc (x : xs) = sumAcc (x + acc) xs
sum :: [Int] -> Int
sum = sumAcc 0

Note that we write !acc now. A bang pattern means that GHC will evaluate the passed value to its outermost constructor (but only to that!) even though the pattern is just a variable and would normally match without evaluation.

For an Int, evaluating to the outermost constructor means evaluating it completely.

Bang patterns language extension

Bang patterns require the BangPatterns language extension, which can be enabled via a compiler pragma:

{-# LANGUAGE BangPatterns #-}

at the top of the file.

Problem fixed

In our stack-restricted GHCi:

GHCi> sum [1 . . 100000]
5000050000

Or even:

GHCi> sum [1 . . 10000000]
50000005000000

This version of sum actually runs in constant space. The equational reasoning is similar to the previous accumulating version, only that the accumulator is now evaluated immediately on every update.

Make accumulators strict

Accumulators should nearly always be strict. Make them strict by default. There are few situations (such as reverse) where strictness of the accumulator is not required, but even there it is not harmful.

Always accumulators?

No!

The accumulating parameter pattern always has to traverse the entire list before producing a result! It cannot stop early or produce results incrementally. It can also not work for infinite lists:

GHCi> and (False : repeat True)
False
GHCi> take 10 (map (+ 1) [1 . . 10])
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

Both of these functions should be defined using the standard design pattern for lists – without an accumulator – to enable the behaviour observed above.