Decoupling Algorithms from Containers
An iterator decouples an algorithm from the container it runs on. Code written against an iterator does not care whether the data came from a list, a file, a database cursor, or a computation: it just asks for the next item. Alexander Stepanov, who designed the C++ STL, put iterators at the center of generic programming for exactly this reason.
In many languages this is a pattern you assemble by hand, with an
Iterator interface and classes that implement it. In
Python it is built into the language. Any object that follows the
iterator protocol works with for,
comprehensions, sum(), sorted(),
unpacking, and every function that takes an iterable. The pattern is
the language.
Two methods make up the protocol. An iterable has
__iter__(), which returns an iterator. An
iterator has __next__(), which returns the next item or
raises StopIteration. The for loop calls
these for you, so you almost never call them directly. Because every
container speaks this one protocol, a function written against an
iterable is automatically decoupled from the container.
You rarely need to write
__iter__()/__next__() by hand, because a
generator writes them for you. A function with a
yield statement returns an iterator that produces each
yielded value in turn, pausing and resuming its own state. A class
becomes iterable by writing __iter__() as a
generator:
# iterators.py
# Iterators and generators are built into Python.
from collections.abc import Iterable, Iterator
# A generator function is the easy way to produce an iterator:
def fibonacci(n: int) -> Iterator[int]:
a, b = 0, 1
for _ in range(n):
yield a
a, b = b, a + b
# A class becomes iterable by implementing __iter__, often as a
# generator:
class Countdown:
def __init__(self, start: int) -> None:
self.start = start
def __iter__(self) -> Iterator[int]:
n = self.start
while n > 0:
yield n
n -= 1
# Any function written against an iterable is decoupled from its
# source:
def total(numbers: Iterable[int]) -> int:
return sum(numbers)
print(list(fibonacci(8)))
print(list(Countdown(5)))
print(total(fibonacci(8))) # works on a generator
print(total([1, 2, 3, 4])) # and on a list
print(total(Countdown(5))) # and on a custom iterabletotal() is the decoupled algorithm Stepanov was
after: it works on the generator, the list, and the custom
Countdown without knowing or caring which. Generators
are also lazy. fibonacci(1_000_000) computes nothing
until you iterate, and produces one value at a time, so it works on
streams too large to hold in memory. A generator can even be
infinite: a while True loop that yields
forever, or itertools.count(), produces values on
demand with no end. You take only as many as you need (see
itertools.islice below), which a list could never
do.
Generic iterator algorithms ship in the standard library’s
itertools module: chain,
islice, groupby, takewhile,
and more, each consuming and producing iterators. Combined with
generator expressions, such as
(x * x for x in data if x > 0), they let you build
pipelines that stay lazy end to end. This is the “say what, not how”
style the STL aimed for, available out of the box.
Sometimes you want to wrap an existing iterator and change its behavior. That is the Decorator pattern: produce a new iterator with the same interface but added behavior. Here is one that enforces that every item is of an expected type, raising otherwise:
# typed_iterator.py
from collections.abc import Iterator
from typing import Any
class TypedIterator(Iterator[Any]):
def __init__(self, it: Iterator[Any], expected: type) -> None:
self.imp = it
self.expected = expected
def __next__(self) -> Any:
obj = next(self.imp)
if not isinstance(obj, self.expected):
raise TypeError(
f"TypedIterator for {self.expected} "
f"encountered {type(obj).__name__}")
return objSubclassing collections.abc.Iterator provides
__iter__() automatically, so only
__next__() is needed. A generator wraps an iterator
just as well and in fewer lines:
# typed_generator.py
# A generator that type-checks each item as it passes through.
from collections.abc import Iterable, Iterator
from typing import Any
def typed(it: Iterable[Any], expected: type) -> Iterator[Any]:
for obj in it:
if not isinstance(obj, expected):
raise TypeError(
f"expected {expected}, got {type(obj).__name__}")
yield obj
if __name__ == "__main__":
print(list(typed([1, 2, 3], int)))Use the class when the wrapper needs its own state or extra methods; use the generator when it does not. Either way, the result plugs into every place that accepts an iterator, because they all speak the same protocol.
Iterators are easy to test by collecting them into a list and
comparing. Worth covering: the generated sequences, the empty edge
cases, that a custom iterable can be iterated more than once, that a
function written against Iterable works on every
source, and that the type-checking wrappers raise on a mismatch.
# test_iterators.py
import pytest
from iterators import Countdown, fibonacci, total
from typed_generator import typed
from typed_iterator import TypedIterator
def test_fibonacci_sequence() -> None:
assert list(fibonacci(8)) == [0, 1, 1, 2, 3, 5, 8, 13]
assert list(fibonacci(0)) == []
assert list(fibonacci(1)) == [0]
def test_countdown_sequence() -> None:
assert list(Countdown(5)) == [5, 4, 3, 2, 1]
assert list(Countdown(0)) == []
def test_countdown_is_reiterable() -> None:
c = Countdown(3)
assert list(c) == [3, 2, 1]
assert list(c) == [3, 2, 1] # __iter__ yields a fresh generator
def test_total_over_any_iterable() -> None:
assert total([1, 2, 3, 4]) == 10
assert total(fibonacci(8)) == 33
assert total(Countdown(5)) == 15
def test_typed_generator_passes_and_rejects() -> None:
assert list(typed([1, 2, 3], int)) == [1, 2, 3]
with pytest.raises(TypeError):
list(typed([1, "two", 3], int))
def test_typed_iterator_passes_and_rejects() -> None:
assert list(TypedIterator(iter([1, 2, 3]), int)) == [1, 2, 3]
with pytest.raises(TypeError):
list(TypedIterator(iter([1, "two"]), int))evens(n) that yields the first
n even numbers, and confirm total() from
iterators.py sums them without modification.Countdown to also support
len(), then explain why a plain generator cannot.itertools.islice to take the first 10 values of
fibonacci(1_000_000) without computing the rest.