If you are eager to learn some Python and do not know how to start, this post may give you some hints. I will develop a very simple Python package from scratch, exemplifying some Object-oriented Programming (OOP) techniques and concepts, and using a Test-Driven Development (TDD) approach.

The package will provide some classes to deal with binary numbers (see the Rationale section), but remember that it is just a toy project. Nothing in this package has been designed with performance in mind: it wants to be as clear as possible.

Rationale

Binary numbers are rather easy to understand, even if becoming familiar with them requires some time. I expect you to have knowledge of the binary numeral system. If you need to review them just take a look at the Wikipedia entry or one of the countless resources on Internet.

The package we are going to write will provide a class that represents binary numbers (Binary) and a class that represents binary numbers with a given bit size (SizeBinary). They shall provide basic binary operations like logical (and, or, xor), arithmetic (addition, subtraction, multiplication, division), shifts and indexing.

A quick example of what the package shall do:

>>> b = Binary('0101110001')
>>> hex(b)
'0x171'
>>> int(b)
369
>>> b[0]
'1'
>>> b[9]
'0'
>>> b.SHR()
'10111000'

Python and bases

Binary system is just a representation of numbers with base 2, just like hexadecimal (base 16) and decimal (base 10). Python can already natively deal with different bases, even if internally numbers are always stored as decimal integers. Let us check it

>>> a = 5
>>> a
5
>>> a = 0x5
>>> a
5
>>> a = 0b101
>>> a
5
>>> hex(0b101)
'0x5'
>>> bin(5)
'0b101'

As you can see Python understands some common bases out of the box, using the 0x prefix for hexadecimal numbers and the 0b for binary ones (and 0o) for octals). However the number is always printed in its base-10 form (5 in this case). This means however that a binary number cannot be indexed, since integers does not provide support for this operation

>>> 0b101[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not subscriptable

You can also use a different base when converting things to integers, through the base parameter

>>> a = int('101', base=2)
>>> a
5
>>> a = int('10', base=5)
>>> a
5

Test-driven development

Simple tasks are the best way to try and use new development methodologies, so this is a good occasion to start working with the so-called test-driven approach. Test-driven Development (TDD) basically means that the first thing you do when developing is to write some tests, that is programs that use what you are going to develop. The purpose of those programs is to test that your final product complies with a given behaviour. So they provide

  • Documentation for your API: they are examples of use of your package.
  • Regression checks: when you change the code to develop new features they shall not break the behaviour of the previous package versions.
  • TODO list: until all tests run successfully you have something still waiting to be implemented.

I suggest you to follow this post until we wrote some tests (section "Writing some tests" included), then write your own class, trying to make it pass all the tests. This way, actually developing something, you can really learn both TDD and Python. Then you can check your code against mine and perhaps provide a far better solution that the one found by me.

Development environment

Let us set up a simple environment for the upcoming development. First of all create a Python virtual environment

~$ virtualenv -p python3 venv

then activate the virtualenv and install py.test, which is what we will use to write tests.

~$ source venv/bin/activate
(venv)~$ pip install pytest

Then create a directory for the package, the __init__.py file and a directory for tests

(venv)~$ mkdir binary
(venv)~$ cd binary
(venv)~/binary$ touch __init__.py
(venv)~/binary$ mkdir tests

Finally, let us check that everything is correctly working. Py.test does not find any test so it should exit without errors.

(venv)~/binary$ py.test
===================== test session starts =====================
[...]
collected 0 items 

=======================  in 0.00 seconds ======================

where [...] will be filled with information about your execution environment.

Py.test

The approach used by py.test is very straightforward: to test your library you just have to write some functions that use it. Those functions shall run without raising any exception; if a test (a function) runs without raising an exception it passes, otherwise it fails. Let us start writing a very simple test to learn the basic syntax. Create a tests/test_binary.py file and write the following code

def test_first():
    pass

If you run py.test again you shall obtain this result

(venv)~/binary$ py.test
===================== test session starts =====================
[...]
collected 1 items 

tests/test_binary.py .

=================== 1 passed in 0.01 seconds ==================

if you prefer (as I do) you may use the -v verbose switch to get detailed information about what tests have been executed

(venv)~/binary$ py.test -v
===================== test session starts =====================
[...]
collected 1 items 

tests/test_binary.py::test_first PASSED

=================== 1 passed in 0.01 seconds ==================

By default py.test looks for Python files which name starts with test_, and this is why it processes our file tests/test_binary.py. For each file it runs all functions which name, again, starts with test_, and this is why test_first() has been executed.

This latter does nothing, so it runs without raising any exception and the test passes. Let us try to raise an exception

def test_first():
    raise ValueError

which gives the following py.test output

(venv)~/binary$ py.test -v
===================== test session starts =====================
[...]
collected 1 items 

tests/test_binary.py::test_first FAILED

=========================== FAILURES ==========================
__________________________ test_first _________________________

    def test_first():
>       raise ValueError
E       ValueError

tests/test_binary.py:2: ValueError
=================== 1 failed in 0.01 seconds ==================

To easily write tests that raise exceptions when failing we may use the assert Python statement, which shall be followed by an expression. If the expression returns a true value, assert does nothing, otherwise it raises an AssertionError exception. Let us do a quick check in the Python console

>>> assert True
>>> assert False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
>>>
>>> assert 1 == 1
>>> assert 1 == 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

So usually our tests will contain some code and one or more assertions. I prefer to have just one assertion for each test, except perhaps when testing the very same feature more than once (for example getting various elements from a list). This way you are immediately aware of what assertion raised the exception, that is you immediately know what feature does not work as expected.

Writing some tests

So now we will pretend we have already developed our Binary class and write some tests that check its behaviour. You will find the whole file in the Resources section at the end of the post, I will show here just some snippets.

Initialization

from binary import Binary

def test_binary_init_int():
    binary = Binary(6)
    assert int(binary) == 6

This is our first real test. First of all we import the class from the binary.py file (which doesn't exists yet). The test_binary_init_int() function shall (as the name suggests) initialize a Binary with an integer. The assertion checks that the newly created binary variable has a consistent integer representation, which is the number we used to initialize it.

We want to be able to initialize a Binary with a wide range of values: bit strings ('110'), binary strings ('0b110'), hexadecimal strings ('0x6'), hexadecimal values (0x6), lists of integers ([1,1,0]) and list of strings (['1','1','0']). The following tests check all those cases

def test_binary_init_bitstr():
    binary = Binary('110')
    assert int(binary) == 6

def test_binary_init_binstr():
    binary = Binary('0b110')
    assert int(binary) == 6

def test_binary_init_hexstr():
    binary = Binary('0x6')
    assert int(binary) == 6

def test_binary_init_hex():
    binary = Binary(0x6)
    assert int(binary) == 6

def test_binary_init_intseq():
    binary = Binary([1,1,0])
    assert int(binary) == 6

def test_binary_init_strseq():
    binary = Binary(['1','1','0'])
    assert int(binary) == 6

Finally, let us check that our Binary class cannot be initialized with a negative number. I decided to represent negative numbers through the two's complement technique, which however requires a predefined bit length. So for simple binaries I just discard negative numbers.

import pytest

[...]

def test_binary_init_negative():
    with pytest.raises(ValueError):
        binary = Binary(-4)

As you can see, now we have to check that our class raises an exception, but if we make the class raise it the test will fail. To let the test pass we shall check that the exception is raised but suppress it, and this can be done with pytest.raises, which is a suitable context manager.

Conversions

I want to check that my binary numbers can be correctly converted to integers (through int()), binary strings (through bin()), hexadecimals (through hex()) and to strings (through str()). I want the string representation to be a plain sequence of zeros and ones, that is the binary string representation without the 0b prefix.

Some examples (check the full code for the whole set of tests)

def test_binary_int():
    binary = Binary(6)
    assert int(binary) == 6

def test_binary_str():
    binary = Binary(6)
    assert str(binary) == '110'

Project structure

I warmly suggest to check this page for a project layout that allows py.test to work flawlessly. To avoid putting too many things in this post I am going to run py.test with a custom PYTHONPATH to make it correctly import the code. However, please remember that this setup is just a trick for simplicity's sake. Check this detailed post by Jeff Knupp to learn a lot about Python packaging and project layouts.

Writing the class

Trying to run the tests at this point just returns a big failure due to an import error, since the binary.py module does not exists yet.

(venv)~/binary$ PYTHONPATH=. py.test -v
===================== test session starts =====================
[...]
collected 0 items / 1 errors
============================ ERRORS ===========================
____________ ERROR collecting tests/test_binary.py ____________
tests/test_binary.py:2: in <module>
    from binary import Binary
E   ImportError: No module named 'binary'

=================== 1 error in 0.01 seconds ===================

Let us create the file binary.py in the project root (i.e. inside the binary/ directory you created, together with __init__.py and tests/) and start creating the Binary class.

class Binary:
    pass

Now running py.test shows that all tests can be run and that all them fail (I will just show the first one)

(venv)~/binary$ PYTHONPATH=. py.test -v
===================== test session starts =====================
[...]
collected 13 items 

tests/test_binary.py::test_binary_init_int FAILED
[...]

============================ FAILURES ==========================
_____________________ test_binary_init_int _____________________

    def test_binary_init_int():
>       binary = Binary(6)
E       TypeError: object() takes no parameters

tests/test_binary.py:5: TypeError
[...]
=================== 13 failed in 0.03 seconds ==================

So now you can start writing code and use your test battery to check if it works as expected. Obviously "as expected" means that all the tests you wrote pass, but this does not imply you covered all cases. TDD is an iterative methodology: when you find a bug or a missing feature you first write a good test or a set of tests that address the matter and then produce some code that make the tests pass.

At this point you are warmly encouraged to write the code by yourself and to check your product with the given battery of tests. Just download the first version of the tests file (test_binary_ver1.py) and put it in the tests/ directory. Then read it carefully to understand what the requirements are and start writing the class. When you think you are done with a part of it just run the tests and see if everything works well, then move on.

My solution

The most complex part of the class is the initialization, since I want it to accept a wide range of data types. Basically we have to deal with sequences (strings and lists) or with plain values. The latter ones shall be convertible to an integer, otherwise trying to initialize a binary number with them makes no sense. Since binary numbers are just a representation of integers I decided to store the value in the class as an integer inside the self._value attribute. Pay attention that this decision means that all leading zeros will be stripped from the number, i.e., Binary('000101') is equal to Binary('101'). This will be important for indexing and slicing.

This is the code

import collections

class Binary:
    def __init__(self, value=0):
        if isinstance(value, collections.Sequence):
            if len(value) > 2 and value[0:2] == '0b':
                self._value = int(value, base=2)
            elif len(value) > 2 and value[0:2] == '0x':
                self._value = int(value, base=16)
            else:
                self._value = int(''.join([str(i) for i in value]), base=2)
        else:
            try:
                self._value = int(value)
                if self._value < 0:
                    raise ValueError("Binary cannot accept negative numbers. Use SizedBinary instead")
            except ValueError:
                raise ValueError("Cannot convert value {} to Binary".format(value))

    def __int__(self):
        return self._value

Running the tests I get 4 failed, 9 passed in 0.02 seconds, which is a good score. The tests that still fail are test_binary_eq, test_binary_bin, test_binary_str and test_binary_hex. Since I still wrote no code for the conversions those failures were expected.

Let us review the code I wrote. I make use of the collections module to tell apart sequences from plain values in a pythonic way. If you do not know what Abstract Base Classes are, please check this post and the collections.abc module documentation.

WARNING: if you are using Python 3.2 or less you will find those classes in the collections module instead of collections.abc, which has been introduced with Python 3.3

Basically through isinstance(value, collections.abc.Sequence) we check that the incoming value behaves like a sequence, which is different from saying that it is a list, a string, or other sequences. The first case covers an incoming string in the form 0bXXXXX, which is converted to an integer through the int() function. The second case is the same but for hexadecimal strings in the form 0xXXXXX.

The third case covers a generic sequence of values that shall be individually convertible to 0 or 1. The code converts each element of the sequence to a string, joins them in a single string and converts it with base 2. This covers the case of a string of zeros and ones and the case of an iterable of integers, like a list for example.

If the incoming value is not a sequence it shall be convertible to an integer, which is exactly what the try part does. Here we also check if the value is negative and raise a suitable exception.

Finally, the __int__() method (one of the Python magic methods) is automatically called when we apply int() to our binary, just like we do in a lot of the tests. This method is basically the one responsible of providing a conversion to integer of a given class. In this case we just have to return the value we stored internally.

Obviously I didn't write this code in a single burst. I had to run the tests more than once to tune my code.

I already wrote the method that performs the conversion to an integer. Some tests however (namely test_binary_bin and test_binary_hex) still fail with the error message TypeError: 'Binary' object cannot be interpreted as an integer.

According to the official documentation, "If x is not a Python int object, it has to define an __index__() method that returns an integer." so this is what we are missing. As for __index__(), the documentation states that "In order to have a coherent integer type class, when __index__() is defined __int__() should also be defined, and both should return the same value."

So we just have to add

def __index__(self):
    return self.__int__()

inside the class, and we get two more successful tests.

To make test_binary_str pass we have to provide a magic method that converts the object into a string, which is

def __str__(self):
    return bin(self)[2:]

It makes use of the internal Python algorithm provided by bin() stripping the 0b prefix.

The last failing test is test_binary_eq which tests for equality between two Binary objects

def test_binary_eq():
    assert Binary(4) == Binary(4)

Out of the box, Python compares objects on a very low level, just checking if the two references point to the same object in memory. To make it smarter we have to provide the __eq__() method

def __eq__(self, other):
    return int(self) == int(other)

And now all the tests run successfully.

Binary operations

Now it is time to add new features to our Binary class. As already said, the TDD methodology wants us to first write the tests, then to write the code. Our class is missing some basic arithmetic and binary operations so the tests are (again you will find all tests in the attached file)

def test_binary_addition_int():
    assert Binary(4) + 1 == Binary(5)

def test_binary_addition_binary():
    assert Binary(4) + Binary(5) == Binary(9)

def test_binary_division_int():
    assert Binary(20) / 4 == Binary(5)

def test_binary_division_rem_int():
    assert Binary(21) / 4 == Binary(5)

def test_binary_get_bit():
    binary = Binary('0101110001')
    assert binary[0] == '1'
    assert binary[5] == '1'

def test_binary_not():
    assert ~Binary('1101') == Binary('10')

def test_binary_and():
    assert Binary('1101') & Binary('1') == Binary('1')

def test_binary_shl_pos():
    assert Binary('1101') << 5 == Binary('110100000')

The first two tests check that adding both an integer and a Binary to a Binary works as expected. The division checks two different cases: because integer division produces a remainder, which is not considered here. The test_binary_get_bit function tests indexing and is one of the few tests that contain more than one assertion. Please note that binary indexing, unlike standard sequence indexing in Python, starts from the rightmost element.

Bitwise and arithmetic operations are implemented using Python magic methods. Please check the official documentation for a complete list of operators and related methods.

Now download the file test_binary_ver2.py and use it to develop the new features.

The code that implements the required behaviour is

    def __and__(self, other):
        return Binary(self._value & Binary(other)._value)

    def __or__(self, other):
        return Binary(self._value | Binary(other)._value)

    def __xor__(self, other):
        return Binary(self._value ^ Binary(other)._value)

    def __lshift__(self, pos):
        return Binary(self._value << pos)

    def __rshift__(self, pos):
        return Binary(self._value >> pos)

    def __add__(self, other):
        return Binary(self._value + Binary(other)._value)

    def __sub__(self, other):
        return Binary(self._value - Binary(other)._value)

    def __mul__(self, other):
        return Binary(self._value * Binary(other)._value)

    def __truediv__(self, other):
        return Binary(int(self._value / Binary(other)._value))

    def __getitem__(self, item):
        return str(self)[-(item + 1)]

    def __invert__(self):
        return Binary([abs(int(i) - 1) for i in str(self)])

All methods are straightforward, except perhaps __getitem__() and __invert__(). The __invert__() method is called when performing a bitwise NOT operation (~) and is implemented avoiding negative numbers. A simple solution is to convert the Binary into a string and then reverse every digit (abs(int(i) - 1) returns 0 for '1' and 1 for '0').

The __getitem__() method is required to provide indexing, and as said shall index starting from the rightmost element and moving left. This is the reason of the -(item + 1) index. Please notice that this method will be changed later to provide full support to slicing.

Slicing

I want Binary to support slicing, just like lists. The difference between lists and my Binary type is that for the latter indexes start from the rightmost element. So when I get bits 3:7 of an 8-bit Binary I obtain the four leftmost ones. The desired behaviour is thus exemplified by

>>> b = Binary('01101010')
>>> b[4:7]
<binary.Binary object at 0x...> (110)
>>> b[1:3]
<binary.Binary object at 0x...> (101)

that can immediately be converted into tests

def test_binary_slice():
    assert Binary('01101010')[0:3] == Binary('10')
    assert Binary('01101010')[1:4] == Binary('101')
    assert Binary('01101010')[4:] == Binary('110')

You will find the new tests in the test_binary_ver3.py file.

Running the tests shows that the __getitem__() function does not provide support for slice objects, raising the exception TypeError: unsupported operand type(s) for +: 'slice' and 'int'. Checking the documentation of __getitem__() we notice that it shall manage both integers and slice objects (this is the missing part) and that it shall raise IndexError for illegal indexes to make for loops work. So I immediately add a test for this rule

def test_binary_illegal_index():
    with pytest.raises(IndexError):
        Binary('01101010')[7]

and other tests to match the correct behaviour (you will find them in the full code). Remember that leading zeros are stripped, so getting the index number 7 shall fail, since the binary number has just 7 digits, even if the incoming string has 8 characters.

Instead of trying to reimplement the whole list slicing behaviour with reversed indexes, it is much simpler to make use of it. We can just take the string version of our Binary, slice its reversed version and return the result (reversed again). The result of the slice can be a single element, however, so we have to check it against the Sequence class

def __getitem__(self, key):
        reversed_list = [int(i) for i in reversed(str(self))]
        sliced = reversed_list.__getitem__(key)
        if isinstance(sliced, collections.abc.Sequence):
            if len(sliced) > 0:
                return Binary([i for i in reversed(sliced)])
            else:
                return Binary(0)
        else:
            return Binary(sliced)

The first list comprehension returns a list of integers with the reversed version of the binary number (i.e. from Binary('01101') to [1, 0, 1, 1], remember that leading zeros are stripped). Then we delegate the slice to the list type, calling __getitem__(). The result of this call may be a sequence or a single element (integer), so we tell apart the two cases. In the first case we reverse the result again, in the second case we just return it. In both cases we create a Binary object. The check on the length of the list must be introduced because the slice may return no elements, but the Binary class does not accept an empty list.

Splitting binaries

The last feature I want to add is a split() function that divides the binary number in two binaries. The rightmost one shall have the given size in bits, while the leftmost just contains the remaining bits. The following tests exemplify the behaviour of split()

def test_binary_split_no_remainder():
    assert Binary('110').split(4) == (0, Binary('110'))

def test_binary_split_remainder():
    assert Binary('110').split(2) == (1, Binary('10'))

def test_binary_split_exact():
    assert Binary('100010110').split(9) == (0, Binary('100010110'))

def test_binary_split_leading_zeros():
    assert Binary('100010110').split(8) == (1, Binary('10110'))

These new tests are in the file test_binary_ver4.py.

The code that implements it is

def split(self, bits):
    return (self[bits:], self[:bits])

Resources

The code developed in this post can be found here:

  • The full code of the Binary class is here.
  • The last version of the tests file is here.

Final words

If you tried and write your own class before checking my solution I'm sure you experienced both some frustration when tests failed and a great joy when they finally passed. I'm also sure that you could appreciate the simplicity of TDD and perhaps understand why so many programmes adopt it.

In the next post I will guide you through the addition of the SizeBinary class, again following the TDD methodology.

Updates

2015-05-15 As suggested by Jacob Zimmerman in this comment the class lacks some methods to be a complete numeric class, most notably __radd__ and __rsub__. Indeed, my first goal was to show TDD so I did not add the whole series of reflected arithmetic operations. You will find all those methods here and try to implement them following the methodology shown in the post. Jacob also suggested to shorten the __str__() implementation, and I fixed it. Thanks Jacob!

2015-09-22 Christopher McCormack spotted an error about binary indexing: "starts from the leftmost element" should be "starts from the rightmost element". Now it has been fixed. Thanks Christopher!

2016-12-20 dndln found an error in the 'Binary operations' section. The test_binary_get_bit() test included the assertion assert binary[9] == '0' which cannot be successful, since leading zeros are stripped, as stated in the previous section 'My Solution'. The attached code files were already correct. Thanks a lot for pointing it out! `

Feedback

Feel free to use the blog Google+ page to comment the post. The GitHub issues page is the best place to submit corrections.