Object-Oriented Programming 2

Duck typing, inheritance, and protocols

Sébastien Boisgérault
Associate Professor, ITN Mines Paris – PSL

Duck Typing

If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

Case Study

The function below reads the content of a file object and writes it to another one:

def copy_file(input, output):
    data = input.read()
    output.write(data)

Let’s create a (very small) binary file image.png on our hard drive

>>> with open("image.png", mode="bw") as image_file:
...     image_file.write(b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01\x00\x00\x00\x01\x00\x01\x03\x00\x00\x00f\xbc:%\x00\x00\x00\x03PLTE\xb5\xd0\xd0c\x04\x16\xea\x00\x00\x00\x1fIDATh\x81\xed\xc1\x01\r\x00\x00\x00\xc2\xa0\xf7Om\x0e7\xa0\x00\x00\x00\x00\x00\x00\x00\x00\xbe\r!\x00\x00\x01\x9a`\xe1\xd5\x00\x00\x00\x00IEND\xaeB`\x82')
...

Then let’s use copy_file to create a copy named image-copy.png.

>>> input = open("image.png", mode="br")
>>> output = open("image-copy.png", mode="bw")
>>> copy_file(input, output)

Everything goes as planned! However, we could have saved ourselves the creation of the initial file and create an object similar to a file, but which stores its content in memory rather than on our hard drive.

>>> import io
>>> buffer = io.BytesIO(b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01\x00\x00\x00\x01\x00\x01\x03\x00\x00\x00f\xbc:%\x00\x00\x00\x03PLTE\xb5\xd0\xd0c\x04\x16\xea\x00\x00\x00\x1fIDATh\x81\xed\xc1\x01\r\x00\x00\x00\xc2\xa0\xf7Om\x0e7\xa0\x00\x00\x00\x00\x00\x00\x00\x00\xbe\r!\x00\x00\x01\x9a`\xe1\xd5\x00\x00\x00\x00IEND\xaeB`\x82')
>>> buffer.seek(0)

(The call buffer.seek(0) reposition the read/write cursor at the beginning of the file.)

We can then copy its content the same way as before

>>> input = buffer
>>> output = open("image-copy.png", mode="bw")
>>> copy_file(input, output)

In fact the original image is a blue-gray tile used by the OpenStreetMap mapping project (cf. “The smallest 256x256 single-color PNG file, and where you’ve seen it”).

The smallest 256x256 single-color PNG file

It is available online at the address https://www.mjt.me.uk/assets/images/smallest-png/openstreetmap.png. So we could have created an object similar to a file but that knows how to open web resources rather than manually copying its content.

>>> from urllib.request import urlopen
>>> url = "https://www.mjt.me.uk/assets/images/smallest-png/openstreetmap.png"
>>> input = urlopen(url)

Again, the copy between this remote file and its local copy takes place as before.

>>> output = open("image-copy.png", mode="bw")
>>> copy_file(input, output)

Protocols

What matters in the three previous use cases is not that the object input is a real file, but that it behaves like one. Here, precisely, the function copy_file needs an object input that:

has a method read,
which is invoked without arguments,
and returns an object of type bytes.

That’s all the function copy_file asks of its argument input for it to work: that it be sufficiently similar to a “real” file. We don’t ask that it be of a particular type, for example that it passes the isinstance test isinstance(input, File).

This less demanding type of typing is what in Python is called duck typing.

Note that for the moment, the constraints that argument input of the function copy_file must satisfy is only a (moral) contract between the designer of the function and its user: as long as the user respects the contract, everything will go as planned. We sometimes speak of protocol to refer to this kind of contract.

At this stage, the Python interpreter is not informed of this contract and does nothing to ensure that the mutual commitment is respected. It is up to the developer of the function to document this protocol and to the user to read this documentation and respect it.

Static Verification

There are tools that allow (partially) formalizing the contracts on which your programs depend, for example mypy.

In return for the work that will consist of describing the protocols, you will have a tool that informs you of certain violations of the contracts when writing the code, and not much later, during execution.

For example, we can formalize the two protocols associated with the arguments of our function copy_file

from typing import Protocol

class Readable(Protocol):
    def read(self) -> bytes:
        pass

class Writable(Protocol):
    def write(self, data: bytes):
        pass

Then annotate the type of the function arguments to indicate which protocol must be respected.

def copy_file(input: Readable, output: Writable):
    data = input.read()
    output.write(data)

If we use the client code

from urllib.request import urlopen
url = "https://www.mjt.me.uk/assets/images/smallest-png/openstreetmap.png"
input = urlopen(url)
output = open("image-copy.png", mode="bw")
copy_file(input, output)

Mypy will affirm that from its point of view, everything is fine

$ mypy main.py 
Success: no issues found in 1 source file

However, if we make a mistake by providing, for example as second argument to the function copy_file a file name rather than a file object

from urllib.request import urlopen
url = "https://www.mjt.me.uk/assets/images/smallest-png/openstreetmap.png"
input = urlopen(url)
output = open("image-copy.png", mode="bw")
copy_file(input, "image-copy.png")

Then mypy will inform us.

$ mypy main.py 
main.py: error: Argument 2 to "copy_file" has incompatible type "str"; expected "Writable"
Found 1 error in 1 file (checked 1 source file)

Inheritance

Let’s consider our “homemade” complex number class again.

class Complex:
    def __init__(self, real, imag):
        self.set_real(real)
        self.set_imag(imag)
    def get_real(self):
        return self._real
    def set_real(self, real):
        if isinstance(real, float):
            self._real = real
        else:
            raise TypeError(f"{real!r} is not a float")
    real = property(get_real, set_real)
    def get_imag(self):
        return self._imag
    def set_imag(self, imag):
        if isinstance(imag, float):
            self._imag = imag
        else:
            raise TypeError(f"{imag!r} is not a float")
    imag = property(get_imag, set_imag)
    def conjugate(self):
        return Complex(self._real, -self._imag)
    def __repr__(self):
        # ⚠️ weird output when self.imag < 0
        return f"({self._real}+{self._imag}j)"
    def __add__(self, other):
        return Complex(
            self._real + other._real, 
            self._imag + other._imag
        )

We will try to equip ourselves with a new complex number class, Complex2 whose instances will have a behavior that suits us better, without modifying the source code of Complex, but by exploiting its functionality to the maximum.

To do this, we will derive the class Complex2 from the class Complex; the new class will inherit the behavior of the previous class. At minimum, this means a declaration of the form

class Complex2(Complex):
    pass

At this stage, essentially no change in the behavior of complex numbers that are instances of it, because all methods of Complex2 are inherited from those of Complex:

>>> z = Complex2(0.5, 1.5)
>>> z
(0.5+1.5j)
>>> z.real
0.5
>>> z.real = -0.5
>>> z
(-0.5+1.5j)
>>> z.real = "Hello"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 11, in set_real
TypeError: 'Hello' is not a float
>>> w = Complex2.conjugate(z)
>>> w.real
0.5
>>> w.imag
-1.5
>>> z + z.conjugate()
(1+0j)

We even have

>>> isinstance(z, Complex)
True

As a consequence, we can substitute an instance of class Complex2 to a function that expects an instance of class Complex. The function is said to be polymorphic: it works with a given type of object, but also with derived types designed by the programmer.

The only visible changes between Complex and Complex2 are the tests that explicitly ask for the type of the complex object z and the test isinstance(z, Complex2).

>>> type(z) is Complex
False
>>> type(z) is Complex2
True
>>> isinstance(z, Complex2)

What motivates the introduction of a new complex number class initially is that we forgot to implement multiplication:

>>> Complex(1.0, 0.0) * Complex(0.0, 1.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for *: 'Complex' and 'Complex'

Let’s fix this oversight by adding a method __mul__ to the class Complex2

class Complex2(Complex):
    def __mul__(self, other):
        r1, i1 = self.real, self.imag
        r2, i2 = other.real, other.imag
        real = r1*r2 - i1*i2
        imag = r1*i2 + r2*i1
        return Complex2(real, imag)

>>> Complex2(1.0, 0.0) * Complex2(0.0, 1.0)
(0.0+1.0j)

It’s better! There is actually a subtle bug (do you see which one?) but we will wait a bit to fix it, we will soon be better placed to correct the problem.

In the meantime, we want to make our constructor a bit more versatile; we would like to be able to construct a complex number from any object that has numeric attributes real and imag, for example, a built-in complex number, instance of the class complex. With the Complex class, this doesn’t work:

>>> Complex(0.5+1.5j)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__() missing 1 required positional argument: 'imag'

And not more with the Complex2 class:

>>> Complex2(0.5+1.5j)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__() missing 1 required positional argument: 'imag'

Indeed, in the absence of a constructor __init__ that is its own, new complexes are instantiated using the inherited method __init__.

But we can define a new constructor __init__ that will have priority. To do this, we test if the first argument named real_or_complex has fields real and imag. If not, we interpret it as a complex number; otherwise, we use this argument as the real part and the second as the imaginary part.

class Complex2(Complex):
    def __init__(self, real_or_complex, imag=None):
        try:
            real = real_or_complex.real
            imag = real_or_complex.imag
        except AttributeError:
            real = real_or_complex
            imag = imag
        self.real = real
        self.imag = imag
    def __mul__(self, other):
        r1, i1 = self.real, self.imag
        r2, i2 = other.real, other.imag
        real = r1*r2 - i1*i2
        imag = r1*i2 + r2*i1
        return Complex2(real, imag)

Note that the last two lines of the constructor are a copy-paste from the parent constructor. We might as well directly call that one! We can either use the explicit syntax Complex.__init__(self, real, imag) or the super() construction as below:

class Complex2(Complex):
    def __init__(self, real_or_complex, imag=None):
        try:
            real = real_or_complex.real
            imag = real_or_complex.imag
        except AttributeError:
            real = real_or_complex
            imag = imag
        super().__init__(real, imag)
    def __mul__(self, other):
        r1, i1 = self.real, self.imag
        r2, i2 = other.real, other.imag
        real = r1*r2 - i1*i2
        imag = r1*i2 + r2*i1
        return Complex2(real, imag)

Now, the constructor of Complex2 accepts complex arguments:

>>> Complex2(0.5+1.5j)
(0.5+1.5j)
>>> Complex2(Complex(0.5, 1.5))
(0.5+1.5j)

It’s time to come back to the subtle bug we mentioned. By inheriting the method __add__ from the parent class Complex, we will unfortunately always get an instance of Complex when adding instances of Complex2.

>>> z = Complex2(0.5, 1.5)
>>> w = z + z
>>> type(w)
<class '__main__.Complex'>

It is possible to fix this directly by reimplementing __add__ in the derived class

class Complex2(Complex):
    def __init__(self, real_or_complex, imag=None):
        try:
            real = real_or_complex.real
            imag = real_or_complex.imag
        except AttributeError:
            real = real_or_complex
            imag = imag
        super().__init__(real, imag)
    def __add__(self, other):
        return Complex2(
            self.real + other.real, 
            self.imag + other.imag
        )
    def __mul__(self, other):
        r1, i1 = self.real, self.imag
        r2, i2 = other.real, other.imag
        real = r1*r2 - i1*i2
        imag = r1*i2 + r2*i1
        return Complex2(real, imag)

It works, but it means losing the benefit of what has already been implemented. We can be more subtle, call the parent class method for addition and correct the result type afterward, with our brand-new constructor:

class Complex2(Complex):
    def __init__(self, real_or_complex, imag=None):
        try:
            real = real_or_complex.real
            imag = real_or_complex.imag
        except AttributeError:
            real = real_or_complex
            imag = imag
        super().__init__(real, imag)
    def __add__(self, other):
        # sum = Complex.__add__(self, other) would also work.
        sum = super().__add__(other) 
        return Complex2(sum)
    def __mul__(self, other):
        r1, i1 = self.real, self.imag
        r2, i2 = other.real, other.imag
        real = r1*r2 - i1*i2
        imag = r1*i2 + r2*i1
        return Complex2(real, imag)

And now the sum behaves as expected

>>> z = Complex2(0.5, 1.5)
>>> w = z + z
>>> type(w)
<class '__main__.Complex2'>

Incidentally, let’s note that if a future generation of developers needs to take over our work and introduce a class Complex3 that derives from Complex2, they will face the same problem. To make their lives easier, we can use code that will adapt the type of the returned value to the type of self and which can therefore be inherited as-is in Complex3.

class Complex2(Complex):
    def __init__(self, real_or_complex, imag=None):
        try:
            real = real_or_complex.real
            imag = real_or_complex.imag
        except AttributeError:
            real = real_or_complex
            imag = imag
        super().__init__(real, imag)
    def __add__(self, other):
        ComplexType = type(self)
        sum = super().__add__(other)
        return ComplexType(sum)
    def __mul__(self, other):
        ComplexType = type(self)
        r1, i1 = self.real, self.imag
        r2, i2 = other.real, other.imag
        real = r1*r2 - i1*i2
        imag = r1*i2 + r2*i1
        return ComplexType(real, imag)

The Standard Library

`pathlib`

The pathlib module of the Python standard library provides path classes representing files and directories in a file system. More precisely

Path classes are divided into pure paths, which provide only file system manipulation operations without I/O, and concrete paths, which inherit from pure paths and also provide I/O operations.

In other words, pure paths (instances of PurePath) allow us to designate files but without accessing the file system properly. Instances of Path — which derive from PurePath — allow it.

Path classes are also distinguished depending on whether the file system is Windows or POSIX (Linux and MacOS), but we won’t worry about that here.

For example, on my Linux computer, I can designate the root of the file system with a pure path and use it to construct the (pure) path to the home directory of hypothetical users linus and boisgera:

>>> ROOT = PurePath("/")
>>> LINUS_HOMEDIR = ROOT / "home" / "linus"
>>> BOISGERA_HOMEDIR = ROOT / "home" / "boisgera"

But I can’t test if these directories actually exist:

>>> LINUS_HOMEDIR.exists()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'PurePosixPath' object has no attribute 'exists'
>>> BOISGERA_HOMEDIR.exists()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'PurePosixPath' object has no attribute 'exists'

However, I can do it after converting these to instances of Path:

>>> LINUS_HOMEDIR = Path(LINUS_HOMEDIR)
>>> BOISGERA_HOMEDIR = Path(BOISGERA_HOMEDIR)
>>> LINUS_HOMEDIR.exists()
False
>>> BOISGERA_HOMEDIR.exists()
True

Alternatively, and that’s probably the simplest, we could have started with a Path to designate the root

>>> ROOT = Path("/")
>>> LINUS_HOMEDIR = ROOT / "home" / "linus"
>>> BOISGERA_HOMEDIR = ROOT / "home" / "boisgera"
>>> LINUS_HOMEDIR.exists()
False
>>> BOISGERA_HOMEDIR.exists()
True

Since Path inherits from PurePath, Path instances can be used everywhere PurePath instances would work.

`random`

Introduction

The random module of the Python standard library generates pseudo-random numbers.

>>> import random

The random function of this module generates floating-point numbers uniformly distributed between $0.0$ and $1.0$.

>>> random.random()
0.17288416418484898
>>> random.random()
0.7270494197615684
>>> random.random()
0.22967289202282093

Multiple functions are provided to generate pseudo-random numbers following various probability distributions. For example, to generate numbers distributed according to the Gaussian with mean $\mu = 0.0$ and standard deviation $\sigma = 1.0$, we can invoke

>>> random.gauss(mu=0.0, sigma=1.0)
0.7010040262172509
>>> random.gauss(mu=0.0, sigma=1.0)
0.11430668630347102
>>> random.gauss(mu=0.0, sigma=1.0)
-0.49389733826503307

Object-Oriented Interface

Studying the source file random.py informs us that the classic module interface is only a thin veneer over an object architecture. The module defines a class Random, then creates a private instance _inst in this module. The “functions” of the module random like gauss are simply shortcuts to the methods of this instance

>>> random.random
<built-in method random of Random object at 0x55a5a09ad260>
>>> random.gauss
<bound method Random.gauss of <random.Random object at 0x55a5a09ad260>>
>>> r = random._inst
>>> type(r)
<class 'random.Random'>
>>> r.random
<built-in method random of Random object at 0x55a5a09ad260>
>>> r.gauss
<bound method Random.gauss of <random.Random object at 0x55a5a09ad260>>

The default method random generates random integers between $0$ and $2^{53} - 1$ (the probability of each integer being identical), then divides the result by $2^{53}$. The downside of this approach: random() returns a value that is always a multiple of $2^{-53}$. The floating-point number $2^{-1074}$, for example, which is the smallest strictly positive floating-point number, has no chance of ever being produced.

>>> r.random() * 2**53
4346481833061509.0
>>> r.random() * 2**53
6826402970501312.0
>>> r.random() * 2**53
5570978756682725.0

If that’s a problem for you, it is possible to correct this behavior as suggested by the documentation of the module random by defining a class derived from Random that overrides the method random

from math import ldexp

class AltRandom(random.Random):
    def random(self):
        mantissa = 0x10_0000_0000_0000 | self.getrandbits(52)
        exponent = -53
        x = 0
        while not x:
            x = self.getrandbits(32)
            exponent += x.bit_length() - 32
        return ldexp(mantissa, exponent)

Usage is immediate

>>> r = AltRandom()
>>> r.random()
0.2768487552410033
>>> r.random()
0.08881389087065399
>>> r.random()
0.28173863914986846

The values produced by the method random are no longer necessarily multiples of $2^{-53}$.

>>> r.random() * 2**53
6118147054761291.0
>>> r.random() * 2**53
1809975186779188.8
>>> r.random() * 2**53
6828617072759119.0

Since the other probability distributions use the method random as source of random values, we don’t need to reimplement anything else to benefit from this improved source of randomness.

>>> r.gauss(mu=0.0, sigma=1.0)
-0.28865100238160024
>>> r.gauss(mu=0.0, sigma=1.0)
-0.5190938357947126
>>> r.gauss(mu=0.0, sigma=1.0)
1.0356452612439027

`doctest`

Doctest is a unit testing module in the standard library. It checks that the examples in your documentation conform to the actual behavior of your code.

For example, with the code

# file: add.py
def add(x, y):
    """
    Numerical sum of two objects

    Usage:

    >>> add(1, 1)
    2
    >>> add(0.5, 0.25)
    0.75
    >>> add([1], [2])
    [3]
    """
    return x + y

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Executing the file tells you that among the three usage examples of your function add, the result for one of them is different from what was expected:

$ python add.py 
**********************************************************************
File "add.py", line 13, in __main__.add
Failed example:
    add([1], [2])
Expected:
    [3]
Got:
    [1, 2]
**********************************************************************
1 items had failures:
   1 of   3 in __main__.add
***Test Failed*** 1 failures.

Indeed, if we want an addition of lists “NumPy style”, then the current code is not the right one! Because + used on lists concatenates them instead of doing element-wise addition.

We diagnosed the problem, but we don’t have time to fix it right away. So we are going to temporarily suppress such erroneous results by marking known as erroneous with the symbol 🐛 (a bug). This will serve as a reminder!

To do this, we will derive from the class OutputChecker of doctest and override its method check_output to signal that any test whose result contains a bug symbol should be considered as passed. Then we will insert the resulting class in place of the OutputChecker class of doctest, to change the behavior of the module.

# file: doctest_patch.py
import doctest
_doctest_OutputChecker = doctest.OutputChecker

class OutputChecker(_doctest_OutputChecker):
    def check_output(self, want, got, optionflags):
        if "🐛" in want:
            return True
        else:
            return super().check_output(want, got, optionflags)

# 🐒 Monkey-patching
doctest.OutputChecker = OutputChecker

If we slightly modify the file add.py to mark our problematic test and import doctest_patch

# file: add.py

def add(x, y):
    """
    Numerical sum of two objects

    Usage:

    >>> add(1, 1)
    2
    >>> add(0.5, 0.25)
    0.75
    >>> add([1], [2])
    [3] 🐛
    """
    return x+y

if __name__ == "__main__":
    import doctest
    import doctest_patch
    doctest.testmod()

Then the tests run without error (no output means everything is fine).

$ python add.py

We can verify this by running the tests in “verbose” mode:

$ python add.py -v
Trying:
    add(1, 1)
Expecting:
    2
ok
Trying:
    add(0.5, 0.25)
Expecting:
    0.75
ok
Trying:
    add([1], [2])
Expecting:
    [3] 🐛
ok
1 items had no tests:
    __main__
1 items passed all tests:
   3 tests in __main__.add
3 tests in 2 items.
3 passed and 0 failed.
Test passed.