Custom Types

This section explains how to deal with a data type or a binary coding that bitstream does not support natively: how to define bitstream writer and reader functions and register them so that your custom types behave like native ones.

>>> import sys
>>> import bitstream
>>> from bitstream import BitStream

Our running example is the representation of unsigned integers as binary numbers; out of the box, bitstream only supports fixed-size unsigned integers (refer to Built-in Types / Integers for details).

Definition

For every type of data that we want bitstream to support, we need to specify at least one writer function that encodes the data as a bitstream and one reader function that decodes data out of bitstreams.

The signature of writers is

def writer(stream, data)

where

stream is a BitStream instance,
the type of data is writer-dependent.

A writer is free to decide what counts as valid data, but it is sensible to accept:

instances of a reference data type (or some of these instances),
data that can be safely converted to the reference data type,
sequences (lists, arrays, etc.) of the reference type (or assimilated).

A writer should raise an exception (ValueError or TypeError) when the data is invalid.

To write unsigned integers as binary numbers, for example, we can consider as valid any non-negative integer-like data (defined as anything that the constructor int accepts) as well as lists of such data.

>>> def write_uint(stream, data):
...     if isinstance(data, list):
...         for integer in data:
...             write_uint(stream, integer)
...     else:
...         integer = int(data)
...         if integer < 0:
...             error = "negative integers cannot be encoded"
...             raise ValueError(error)
...         bools = []
...         while integer:
...             bools.append(integer & 1)
...             integer = integer >> 1
...         bools.reverse()
...         stream.write(bools, bool)

This writer behaves as expected:

>>> stream = BitStream()
>>> write_uint(stream, 42)
>>> stream
101010
>>> write_uint(stream, [1, 2, 3])
>>> stream
10101011011
>>> write_uint(stream, -1)
Traceback (most recent call last):
...
ValueError: negative integers cannot be encoded
>>> write_uint(stream, {}) # doctest: +ELLIPSIS
Traceback (most recent call last):
...
TypeError: int() argument must be ..., not 'dict'

The signature of readers is:

def reader(stream, n=None)

where

stream is a BitStream instance,
n is a non-negative integer (or None).

The call reader(stream, n) should read n data items out of stream when n is an integer. bitstream does not require a specific type of container (list, array, string, etc.) for the result; the choice is yours, but for consistency you should pick a type of container that your writer also accepts.

The semantics of the call reader(stream) (when n=None) is up to you; for most built-in types, it returns a single (unboxed) datum from the stream, but there are sometimes good reasons to decide otherwise (see for example strings). Supporting this default case is not mandatory.

Actually, readers may support only a subset of the possible values of n; for example they may allow only n=1 and n=None. If a reader is called with an invalid value of n, a ValueError or TypeError exception should be raised. If instead the read fails because there is not enough data in the stream or more generally if the binary data cannot be decoded, a ReadError (from bitstream) should be raised.
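The argument checks described above can be factored into a small helper. The following is only a sketch: check_count and its allowed parameter are hypothetical names, not part of bitstream, and the helper covers only invalid values of n (the ReadError case, where the stream runs out of data, is not shown).

```python
import numbers

def check_count(n, allowed=None):
    """Validate the n argument of a reader (hypothetical helper).

    n must be None or a non-negative integer; if allowed is given,
    n must also be one of the values it contains.
    """
    if n is not None:
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(n, bool) or not isinstance(n, numbers.Integral):
            raise TypeError("n should be None or an integer, not {0!r}".format(n))
        if n < 0:
            raise ValueError("n should be non-negative, not {0!r}".format(n))
    if allowed is not None and n not in allowed:
        raise ValueError("unsupported argument n = {0!r}".format(n))
    return n
```

A reader that only supports n=1 and n=None could then start with check_count(n, allowed=(None, 1)).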

When we represent unsigned integers as binary numbers, while we can write multiple integers in the same stream, we cannot read unambiguously multiple integers from the stream: the code is not self-delimiting. For example 110 can be split as 1 then 10 and code for the integers 1 and 2 but also as 11 and 0 which represent the integers 3 and 0.
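The ambiguity can be checked by brute force in plain Python, without bitstream. In this sketch, splits and decodings are illustrative names, bits are represented as a string of "0" and "1" characters, and every chunk is simply read as a binary number.

```python
def splits(bits):
    """Yield every way to cut the string bits into non-empty chunks."""
    if not bits:
        yield []
        return
    for i in range(1, len(bits) + 1):
        head = bits[:i]
        for tail in splits(bits[i:]):
            yield [head] + tail

def decodings(bits):
    """Return every list of integers that the bit string could stand for."""
    return [[int(chunk, 2) for chunk in chunks] for chunks in splits(bits)]

candidates = decodings("110")
# candidates is [[1, 1, 0], [1, 2], [3, 0], [6]]
```

The lists [1, 2] and [3, 0] both appear among the candidates, so a reader has no way to tell which sequence was written.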

Thus, we design a reader that reads the whole stream as a single integer: we support only the cases n=1 and for convenience the default n=None with the same result.

A possible implementation of this reader is:

>>> def read_uint(stream, n=None):
...     if n is not None and n != 1:
...         error = "unsupported argument n = {0!r}".format(n)
...         raise ValueError(error)
...     # The code is not self-delimiting: consume the whole stream.
...     integer = 0
...     for _ in range(len(stream)):
...         integer = integer << 1
...         if stream.read(bool):
...             integer += 1
...     return integer

It behaves as expected:

>>> stream = BitStream()
>>> write_uint(stream, 42)
>>> read_uint(stream)
42
>>> write_uint(stream, [1, 2, 3]) 
>>> read_uint(stream)
27
>>> len(stream)
0
>>> write_uint(stream, 42)
>>> read_uint(stream, 1)
42
>>> write_uint(stream, 42)
>>> read_uint(stream, 2)
Traceback (most recent call last):
...
ValueError: unsupported argument n = 2

Registration

To fully integrate unsigned integers into bitstream, you need to associate a unique type identifier with the reader and/or writer. This type identifier is usually a type; a user-defined type with an empty definition will do:

>>> class uint(object):
...     pass

Once the type uint has been associated with the unsigned integer writer

>>> bitstream.register(uint, writer=write_uint)

we can use the write method of BitStream to encode unsigned integers

>>> stream = BitStream()
>>> stream.write(42, uint)
>>> stream
101010

>>> stream = BitStream()
>>> stream.write([2, 2, 2], uint)
>>> stream
101010

and also the shorter form that uses the BitStream constructor

>>> BitStream(42, uint)
101010
>>> BitStream([2, 2, 2], uint)
101010

Once the reader is registered

>>> bitstream.register(uint, reader=read_uint)

we can also use the read method of BitStream:

>>> BitStream(42, uint).read(uint)
42

Here, the uint type was merely an identifier for our reader and writer, but "real" types can be used too. If the type of the data you write is itself the type identifier of a registered writer, you don't need to specify the type identifier explicitly in writes.

For example, if we also associate our writer with Python integers:

>>> bitstream.register(int, writer=write_uint)
>>> if sys.version_info[0] == 2: # Python 2 has a 'long integer' type
...     bitstream.register(long, writer=write_uint)

then every Python integer will be automatically encoded with the write_uint writer

>>> BitStream(42)
101010
>>> BitStream([2, 2, 2])
101010

Factories

The coding of arbitrary unsigned integers as binary numbers doesn't allow us to represent unambiguously multiple numbers in a stream. However, if there is a known bound on the integers we use, we can assign a sufficient number of bits to each integer and pad the binary numbers with enough zeros on the left so that they all use the same number of bits; this code is self-delimiting.
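The padding idea can be sketched in plain Python, independently of bitstream. The names encode_fixed and decode_fixed are illustrative, bits are represented as a string of "0" and "1" characters, and num_bits is assumed to be a positive integer.

```python
def encode_fixed(integers, num_bits):
    """Encode non-negative integers with exactly num_bits bits each."""
    chunks = []
    for integer in integers:
        if integer < 0 or integer.bit_length() > num_bits:
            error = "{0!r} does not fit in {1} bits".format(integer, num_bits)
            raise ValueError(error)
        # Left-pad the binary digits with zeros up to num_bits characters.
        chunks.append(format(integer, "0{0}b".format(num_bits)))
    return "".join(chunks)

def decode_fixed(bits, num_bits):
    """Cut bits into chunks of num_bits characters and decode each one."""
    if len(bits) % num_bits != 0:
        raise ValueError("the number of bits is not a multiple of num_bits")
    return [int(bits[i:i + num_bits], 2) for i in range(0, len(bits), num_bits)]
```

With 3 bits per integer, [1, 2] is encoded as 001010 and [3, 0] as 011000: the two sequences no longer collide, and decode_fixed recovers each of them.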

However, doing this with bitstream as described so far would require us to define and register a new writer for every possible number of bits. Instead, we may register a single but configurable writer, defined by a writer factory.

Let's define a type identifier factory uint whose instances hold a number of bits:

>>> class uint(object):
...     def __init__(self, num_bits):
...         self.num_bits = num_bits

Then, we define a writer factory: given an instance of uint, it returns a stream writer:

>>> def write_uint_factory(instance):
...     num_bits = instance.num_bits
...     def write_uint(stream, data):
...         if isinstance(data, list):
...             for integer in data:
...                 write_uint(stream, integer)
...         else:
...             integer = int(data)
...             if integer < 0:
...                 error = "negative integers cannot be encoded"
...                 raise ValueError(error)
...             bools = []
...             for _ in range(num_bits):
...                 bools.append(integer & 1)
...                 integer = integer >> 1
...             bools.reverse()
...             stream.write(bools, bool)
...     return write_uint

Finally, we register this writer factory with bitstream:

>>> bitstream.register(uint, writer=write_uint_factory)

To select a writer, we use the appropriate type identifier:

>>> BitStream(255, uint(8))
11111111
>>> BitStream(255, uint(16))
0000000011111111
>>> BitStream(42, uint(8))
00101010
>>> BitStream(0, uint(16))
0000000000000000
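Note that the bit loop of the writer above only emits the num_bits lowest bits of each integer, so a value that is too large is written, without any error, as a different number. The sketch below mirrors that loop in plain Python to show the effect; low_bits and fits are illustrative names, and fits is one possible guard that a writer could apply before encoding, not something bitstream provides.

```python
def low_bits(integer, num_bits):
    """Mirror the writer's loop: return the num_bits lowest bits, high bit first."""
    bools = []
    for _ in range(num_bits):
        bools.append(bool(integer & 1))
        integer = integer >> 1
    bools.reverse()
    return bools

def fits(integer, num_bits):
    """Tell whether integer can be represented with num_bits bits."""
    return 0 <= integer < (1 << num_bits)

# 256 needs 9 bits: with 8 bits it yields the same bits as 0.
assert low_bits(256, 8) == low_bits(0, 8)
assert fits(255, 8) and not fits(256, 8)
```

A writer that should reject such values could raise a ValueError when fits(integer, num_bits) is false.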

The definition of a reader factory is similar:

>>> def read_uint_factory(instance):
...     num_bits = instance.num_bits
...     def read_uint(stream, n=None):
...         if n is None:
...             integer = 0
...             for _ in range(num_bits):
...                 integer = integer << 1
...                 if stream.read(bool):
...                     integer += 1
...             return integer
...         else:
...             integers = [read_uint(stream) for _ in range(n)]
...             return integers
...     return read_uint

Once the reader factory is registered

>>> bitstream.register(uint, reader=read_uint_factory)

we can use the family of type identifiers in reads too:

>>> stream = BitStream([0, 1, 2, 3, 4], uint(8))
>>> stream.read(uint(8))
0
>>> stream.read(uint(8), 1)
[1]
>>> stream.read(uint(8), 3)
[2, 3, 4]