Built-in Types¶
Bitstream reads and writes work out of the box for many Python data types. The library is also tightly integrated with NumPy, since NumPy is the library of choice for dealing with arrays of numeric data.
>>> import bitstream
>>> from bitstream import BitStream
>>> from numpy import *
Bools¶
Write single bits to a bitstream with the arguments True and False:
>>> stream = BitStream()
>>> stream.write(False, bool)
>>> stream.write(True , bool)
>>> stream
01
Read them back in the same order with
>>> stream.read(bool)
False
>>> stream.read(bool)
True
>>> stream
<BLANKLINE>
Write multiple bits at once with lists of bools:
>>> stream = BitStream()
>>> stream.write([], bool)
>>> stream
<BLANKLINE>
>>> stream.write([False], bool)
>>> stream.write([True] , bool)
>>> stream
01
>>> stream.write([False, True], bool)
>>> stream
0101
Alternatively, specify the data type bool as a keyword argument:
>>> stream = BitStream()
>>> stream.write(False, type=bool)
>>> stream.write(True , type=bool)
>>> stream
01
For single bools or lists of bools, this type information is optional:
>>> stream = BitStream()
>>> stream.write(False)
>>> stream.write(True)
>>> stream.write([])
>>> stream.write([False])
>>> stream.write([True])
>>> stream.write([False, True])
>>> stream
010101
To read one boolean from a stream, do
>>> stream.read(bool)
False
>>> stream
10101
and to read several booleans, use the second method argument
>>> stream.read(bool, 2)
[True, False]
>>> stream
101
Since the booleans are returned in a list when the second argument differs from the default value (which is None), stream.read(bool, 1) is not the same as stream.read(bool):
>>> copy = stream.copy()
>>> stream.read(bool, 1)
[True]
>>> copy.read(bool)
True
NumPy bool_ scalars or one-dimensional arrays can be used instead:
>>> bool_ # doctest: +ELLIPSIS
<... 'numpy.bool_'>
>>> stream = BitStream()
>>> stream.write(bool_(False) , bool)
>>> stream.write(bool_(True) , bool)
>>> stream
01
>>> stream = BitStream()
>>> empty = array([], dtype=bool)
>>> stream.write(empty, bool)
>>> stream
<BLANKLINE>
>>> stream.write(array([False]), bool)
>>> stream.write(array([True]) , bool)
>>> stream.write(array([False, True]), bool)
>>> stream
0101
For such data, the type information is also optional:
>>> stream = BitStream()
>>> stream.write(bool_(False))
>>> stream.write(bool_(True))
>>> stream.write(array([], dtype=bool))
>>> stream.write(array([False]))
>>> stream.write(array([True]))
>>> stream.write(array([False, True]))
>>> stream
010101
In fact, many more types can be used as booleans when the type information is explicit. For example, Python and NumPy numeric types are valid arguments: zero is considered false and any nonzero number is considered true.
>>> stream = BitStream()
>>> stream.write(0.0, bool)
>>> stream.write(1.0, bool)
>>> stream.write(pi , bool)
>>> stream.write(float64(0.0), bool)
>>> stream.write(float64(1.0), bool)
>>> stream.write(float64(pi) , bool)
>>> stream
011011
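The zero/nonzero rule can be checked in plain Python, without bitstream. The sketch below assumes that each value is turned into a bit by the built-in bool; the helper name bits_from_truthiness is made up for this illustration and is not part of the library.

```python
import math

def bits_from_truthiness(values):
    """Return one '0' or '1' character per value, using Python's bool()."""
    # Hypothetical helper: it only mimics the zero/nonzero rule described above.
    return "".join("1" if bool(value) else "0" for value in values)

bits = bits_from_truthiness([0.0, 1.0, math.pi, 0, 1, -3])
print(bits)  # 011011
```

Negative numbers are nonzero, so they also map to a true bit.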
Strings are also valid arguments, with a boolean value of True unless they are empty.
One-dimensional lists and NumPy arrays are considered holders of multiple data, each of which is converted to bool.
>>> bool(b"")
False
>>> bool(b" ")
True
>>> bool(b"A")
True
>>> bool(b"AAA")
True
>>> stream = BitStream()
>>> stream.write(b"", bool)
>>> stream.write(b" ", bool)
>>> stream.write(b"A", bool)
>>> stream.write(b"AAA", bool)
>>> stream
0111
>>> stream = BitStream()
>>> stream.write([b"", b" " , b"A", b"AAA"], bool)
>>> stream
0111
>>> stream = BitStream()
>>> stream.write(array([b"", b" " , b"A", b"AAA"]), bool)
>>> stream
0111
Any other sequence (strings, tuples, lists nested in lists, etc.) is treated as a single datum.
>>> stream = BitStream()
>>> stream.write( (), bool)
>>> stream.write( (0,), bool)
>>> stream.write((0, 0), bool)
>>> stream
011
>>> stream = BitStream()
>>> stream.write([[], [0], [0, 0]], bool)
>>> stream
011
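The nesting rule can be mimicked in plain Python. This is only a sketch, under the assumption that the outermost list alone is iterated and that each element is then passed to bool; as_bits is a hypothetical name, not a bitstream function.

```python
def as_bits(data):
    """A top-level list holds several data; anything else counts as one datum."""
    items = data if isinstance(data, list) else [data]
    return "".join(str(int(bool(item))) for item in items)

# Tuples are single data: empty is false, non-empty is true even if it only holds zeros.
print(as_bits(()), as_bits((0,)), as_bits((0, 0)))  # 0 1 1
# Inner lists are not unpacked further; each one contributes a single bit.
print(as_bits([[], [0], [0, 0]]))  # 011
```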
More generally, arbitrary custom "bool-like" instances, which have a __bool__ method (or __nonzero__ for Python 2) to handle the conversion to boolean, can also be used:
>>> class BoolLike(object):
...     def __init__(self, value):
...         self.value = bool(value)
...     def __bool__(self):
...         return self.value
...     __nonzero__ = __bool__
>>> false = BoolLike(False)
>>> true = BoolLike(True)
>>> stream = BitStream()
>>> stream.write(false, bool)
>>> stream.write(true, bool)
>>> stream.write([false, true], bool)
>>> stream
0101
BitStreams¶
A list of bools is not the most efficient way to represent a binary stream.
The best type is ... an instance of BitStream, of course!
Consider the stream
>>> stream = BitStream()
>>> stream.write(8 * [True], bool)
>>> stream
11111111
To read 2 bits out of stream as a bitstream, use
>>> stream.read(BitStream, 2)
11
Since this is a common use case, the BitStream type is assumed by default:
>>> new_stream = stream.read(n=2)
>>> type(new_stream) is BitStream
True
>>> new_stream
11
The simpler code below also works:
>>> new_stream = stream.read(2)
>>> type(new_stream) is BitStream
True
>>> new_stream
11
When the number of items to read is also left unspecified (n=None), the read empties the stream:
>>> stream.read()
11
>>> stream
<BLANKLINE>
Strings¶
In Python 2.7, strings are the structure of choice to represent bytes in memory.
Their type is str (or equivalently bytes, which is an alias).
Fortunately, it's straightforward to convert strings to bitstreams: create a stream from the string "ABCDE" with
>>> stream = BitStream(b"ABCDE")
To be totally explicit, the code above is equivalent to:
>>> stream = BitStream()
>>> stream.write(b"ABCDE", bytes)
Now, the content of the stream is
>>> stream
0100000101000010010000110100010001000101
It is the binary representation of the ASCII codes of the string characters, as unsigned 8-bit integers (see Integers for more details):
>>> char_codes = [ord(char) for char in "ABCDE"]
>>> char_codes
[65, 66, 67, 68, 69]
>>> stream == BitStream(char_codes, uint8)
True
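The same bit pattern can be rebuilt with the standard library alone, which makes the encoding easy to check by hand. The sketch below assumes Python 3, where iterating over a bytes object yields integers; bytes_to_bits is a hypothetical helper, not part of bitstream.

```python
def bytes_to_bits(data):
    """Concatenate the 8-bit code of every byte, most significant bit first."""
    return "".join(format(byte, "08b") for byte in data)

bits = bytes_to_bits(b"ABCDE")
print(bits)
# 0100000101000010010000110100010001000101
print(len(bits))  # 40
```

Each character contributes exactly 8 bits, so five characters give the 40 bits displayed by the stream above.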
There is no "single character" type in Python: characters are represented as bytes of length 1.
To read one or several characters from a bitstream, use the read method with the bytes type:
>>> stream.read(bytes, 1) # doctest: +BYTES
b'A'
>>> stream.read(bytes, 2) # doctest: +BYTES
b'BC'
Without an explicit number of characters, the bitstream is emptied
>>> stream.read(bytes) # doctest: +BYTES
b'DE'
but that works only if the bitstream contains a multiple of 8 bits.
>>> stream = BitStream(42 * [True])
>>> try:
... stream.read(bytes)
... except bitstream.ReadError:
... print("42 is not a multiple of 8")
42 is not a multiple of 8
To accept up to seven trailing bits instead, use the more explicit code:
>>> stream = BitStream(42 * [True])
>>> n = len(stream) // 8
>>> n
5
>>> stream.read(bytes, n) # doctest: +BYTES
b'\xff\xff\xff\xff\xff'
>>> stream
11
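The arithmetic behind this pattern is a plain integer division by 8. As an illustration that does not use bitstream, the sketch below splits a string of '0' and '1' characters into whole bytes and leftover bits; split_whole_bytes is a name invented for this example.

```python
def split_whole_bytes(bits):
    """Split a string of '0'/'1' characters into whole bytes and leftover bits."""
    n_bytes, _ = divmod(len(bits), 8)
    whole = bytes(int(bits[8 * i:8 * i + 8], 2) for i in range(n_bytes))
    leftover = bits[8 * n_bytes:]
    return whole, leftover

whole, leftover = split_whole_bytes(42 * "1")
print(whole)     # b'\xff\xff\xff\xff\xff'
print(leftover)  # 11
```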
Integers¶
First, let's clear something up: since Python integers can be of arbitrary size and there is no unique, convenient and commonly accepted representation for such integers (see the footnote at the end of this page), you cannot create a bitstream from Python integers by default.
>>> BitStream(1)
Traceback (most recent call last):
...
TypeError: unsupported type 'int'.
>>> BitStream(2**100) # doctest: +ELLIPSIS
Traceback (most recent call last):
...
TypeError: unsupported type ...
>>> BitStream(b"A").read(int)
Traceback (most recent call last):
...
TypeError: unsupported type 'int'.
You need to specify an integer type that determines which binary representation should be used. For example, to represent 1 as an unsigned 8-bit integer:
>>> BitStream(1, uint8)
00000001
>>> BitStream(uint8(1))
00000001
For integer sequences, there are even more ways to specify the integer type:
>>> BitStream([1,2,3], uint8)
000000010000001000000011
>>> BitStream([uint8(1), uint8(2), uint8(3)])
000000010000001000000011
>>> BitStream(array([1, 2, 3], dtype=uint8))
000000010000001000000011
Bitstream supports six integer types from NumPy:
- unsigned integers: uint8, uint16, uint32
- signed integers: int8, int16, int32
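As a reminder of what each width can hold, the ranges of these fixed-width types can be computed directly. This is a sketch in plain Python that assumes the usual two's complement convention for signed types; int_range is a hypothetical helper.

```python
def int_range(bits, signed):
    """Return (smallest, largest) value representable with the given width."""
    if signed:
        return -2 ** (bits - 1), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

for bits in (8, 16, 32):
    print(bits, int_range(bits, signed=False), int_range(bits, signed=True))
# 8 (0, 255) (-128, 127)
# 16 (0, 65535) (-32768, 32767)
# 32 (0, 4294967295) (-2147483648, 2147483647)
```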
The representation of unsigned integers is based on their decomposition as powers of 2. For example, since
>>> 13 == 1*2**3 + 1*2**2 + 0*2**1 + 1*2**0
True
we have
>>> BitStream(13, uint8)
00001101
In this scheme, only unsigned integers in the range 0-255 can be represented as 8-bit integers.
Out-of-bounds integers are accepted, but mapped to the correct range by a modulo 2**8 operation.
NumPy follows this convention
>>> 500 % 2**8
244
>>> uint8(500)
244
and so does bitstream
>>> BitStream(500, uint8)
11110100
>>> BitStream(244, uint8)
11110100
>>> BitStream(500, uint8).read(uint8)
244
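The wrap-around can be reproduced with integer arithmetic only. The following sketch is independent of bitstream and NumPy; to_unsigned_bits is a hypothetical helper that reduces the value modulo 2**width before formatting it.

```python
def to_unsigned_bits(value, width=8):
    """Binary string of value reduced modulo 2**width, most significant bit first."""
    return format(value % 2 ** width, "0{}b".format(width))

print(to_unsigned_bits(13))           # 00001101
print(to_unsigned_bits(500))          # 11110100
print(to_unsigned_bits(244))          # 11110100
print(int(to_unsigned_bits(500), 2))  # 244
```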
The representation of 16-bit and 32-bit unsigned integers follows the same approach:
>>> BitStream(2**10, uint16)
0000010000000000
>>> BitStream(uint16(2**10))
0000010000000000
For readers familiar with byte order: we use the big-endian representation by default for multi-byte integers. If you want to use the little-endian convention instead, NumPy provides the method newbyteorder for this:
>>> BitStream(uint16(2**10).newbyteorder())
0000000000000100
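The two byte orders can be compared with the standard int.to_bytes method. This is a sketch in plain Python, not the way bitstream computes the result; uint16_bits is a hypothetical helper.

```python
def uint16_bits(value, byteorder):
    """Bits of a 16-bit unsigned integer, with bytes laid out in the given order."""
    raw = value.to_bytes(2, byteorder)  # byteorder is "big" or "little"
    return "".join(format(byte, "08b") for byte in raw)

print(uint16_bits(2 ** 10, "big"))     # 0000010000000000
print(uint16_bits(2 ** 10, "little"))  # 0000000000000100
```

Only the order of the two bytes changes; the bits inside each byte keep the same order.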
Finally, for signed integers, we use the two's complement representation
>>> BitStream(0, int8)
00000000
>>> BitStream(1, int8)
00000001
>>> BitStream(-1, int8)
11111111
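Two's complement encoding and decoding can be sketched in a few lines of plain Python. The helper names below are hypothetical and the code is independent of bitstream; it only illustrates the convention named above.

```python
def to_twos_complement_bits(value, width=8):
    """Encode a signed integer on `width` bits by masking off the higher bits."""
    return format(value & (2 ** width - 1), "0{}b".format(width))

def from_twos_complement_bits(bits):
    """Decode a two's complement bit string; a leading 1 means a negative value."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 2 ** len(bits)
    return value

for number in (0, 1, -1, -128, 127):
    print(number, to_twos_complement_bits(number))
# 0 00000000
# 1 00000001
# -1 11111111
# -128 10000000
# 127 01111111
print(from_twos_complement_bits("11111111"))  # -1
```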
Floating-Point Numbers¶
Bitstream natively supports IEEE 754 double-precision floating-point numbers, which have a well-defined binary representation (see e.g. What Every Computer Scientist Should Know About Floating-Point Arithmetic).
>>> stream = BitStream()
>>> stream.write(0.0)
>>> stream.write([1.0, 2.0, 3.0])
>>> stream.write(arange(4.0, 10.0))
>>> len(stream)
640
>>> output = stream.read(float, 10)
>>> type(output) # doctest: +ELLIPSIS
<... 'numpy.ndarray'>
>>> all(output == arange(10.0))
True
Python's built-in float type and NumPy's float64 type may be used interchangeably:
>>> BitStream(1.0) == BitStream(1.0, float) == BitStream(1.0, float64)
True
Scalars, lists and arrays of floats are supported:
>>> BitStream(1.0) == BitStream([1.0]) == BitStream(ones(1))
True
The byte order is big endian:
>>> import struct
>>> PI_BE = struct.pack(b">d", pi)
>>> PI_BE # doctest: +BYTES
b'@\t!\xfbTD-\x18'
>>> BitStream(pi) == BitStream(PI_BE)
True
The NumPy newbyteorder method should be used beforehand (on a float64 or an array of floats) to get a little-endian representation instead.
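To see what the two byte orders look like at the bit level, the struct module is enough. The sketch below does not use bitstream or NumPy; float64_bits is a hypothetical helper that packs a double with either byte order and prints its bits.

```python
import math
import struct

def float64_bits(value, little_endian=False):
    """Bits of an IEEE 754 double, with the byte order chosen by the caller."""
    raw = struct.pack("<d" if little_endian else ">d", value)
    return "".join(format(byte, "08b") for byte in raw)

big = float64_bits(math.pi)
little = float64_bits(math.pi, little_endian=True)
print(struct.pack(">d", math.pi))  # b'@\t!\xfbTD-\x18'
print(len(big))                    # 64
print(big[:12])                    # 010000000000 (sign bit, then the 11-bit exponent)
print(little[-8:] == big[:8])      # True: the byte order is reversed
```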
1. Why not simply use the binary decomposition of integers? For example, since
>>> 13 == 1*2**3 + 1*2**2 + 0*2**1 + 1*2**0
True
you may be tempted to represent 13 as
>>> BitStream([True, True, False, True])
1101
But this scheme is ambiguous if we consider sequences of integers: 1101 could represent the integer 13 but also [1,5] or [3,1] or [3,0,1], etc.
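The ambiguity can be made concrete by listing every way to cut 1101 into consecutive chunks and reading each chunk as an unsigned integer. This is a small sketch in plain Python; all_readings is a name invented for the illustration.

```python
from itertools import combinations

def all_readings(bits):
    """Every way to cut a bit string into consecutive chunks, each read as an integer."""
    readings = []
    positions = range(1, len(bits))
    for n_cuts in range(len(bits)):
        for cuts in combinations(positions, n_cuts):
            bounds = (0,) + cuts + (len(bits),)
            chunks = [bits[a:b] for a, b in zip(bounds, bounds[1:])]
            readings.append([int(chunk, 2) for chunk in chunks])
    return readings

readings = all_readings("1101")
print(len(readings))  # 8
print(readings)
# [[13], [1, 5], [3, 1], [6, 1], [1, 1, 1], [1, 2, 1], [3, 0, 1], [1, 1, 0, 1]]
```

With a fixed width such as uint8, the chunk boundaries are known in advance and only one reading remains.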