This post is available as an IPython Notebook here

Good Morning, Polymorphism

The term polymorphism, in the OOP lingo, refers to the ability of an object to adapt the code to the type of the data it is processing.

Polymorphism has two major applications in an OOP language. The first is that an object may provide different implementations of one of its methods depending on the type of the input parameters. The second is that code written for a given type of data may be used on data with a derived type, i.e. methods understand the class hierarchy of a type.

In Python polymorphism is one of the key concepts, and we can say that it is a built-in feature. Let us deal with it step by step.

First of all, you know that in Python the type of a variable is not explicitly declared. Beware that this does not mean that Python variables are untyped. On the contrary, everything in Python has a type, it just happens that the type is implicitly assigned. If you remember the last paragraph of the previous post, I stated that in Python variables are just pointers (using a C-like nomenclature), in other words they just tell the language where in memory a variable has been stored. What is stored at that address is not a business of the variable.

>>> a = 5
>>> a
5
>>> type(a)
<class 'int'>
>>> hex(id(a))
'0x83fe540'
>>> a = 'five'
>>> a
'five'
>>> type(a)
<class 'str'>
>>> hex(id(a))
'0xb70d6560'

This little example shows a lot about the Python typing system. The variable a is not statically declared, after all it can contain only one type of data: a memory address. When we assign the number 5 to it, Python stores in a the address of the number 5 (0x83fe540 in my case, but your result will be different). The type() built-in function is smart enough to understand that we are not asking about the type of a (which is always a reference), but about the type of the content. When you store another value in a, the string 'five', Python shamelessly replaces the previous content of the variable with the new address.

So, thanks to the reference system, Python type system is both strong and dynamic. The exact definition of those two concepts is not universal, so if you are interested be ready to dive into a broad matter. However, in Python, the meaning of those two words is the following:

  • type system is strong because everything has a well-defined type that you can check with the type() built-in function
  • type system is dynamic since the type of a variable is not explicitly declared, but changes with the content

Onward! We just scratched the surface of the whole thing.

To explore the subject a little more, try to define the simplest function in Python (apart from an empty function)

def echo(a):
    return a

The function works as expected, just echoes the given parameter

>>> echo(5)
5
>>> echo('five')
'five'

Pretty straightforward, isn't it? Well, if you come from a statically compiled language such as C or C++ you should be at least puzzled. What is a? I mean: what type of data does it contain? Moreover, how can Python know what it is returning if there is no type specification?

Again, if you recall the references stuff everything becomes clear: that function accepts a reference and returns a reference. In other words we just defined a sort of universal function, that does the same thing regardless of the input.

This is exactly the problem that polymorphism wants to solve. We want to describe an action regardless of the type of objects, and this is what we do when we talk among humans. When you describe how to move an object by pushing it, you may explain it using a box, but you expect the person you are addressing to be able to repeat the action even if you need to move a pen, or a book, or a bottle.

There are two main strategies you can apply to get code that performs the same operation regardless of the input types.

The first approach is to cover all cases, and this is a typical approach of procedural languages. If you need to sum two numbers that can be integers, float or complex, you just need to write three sum() functions, one bound to the integer type, the second bound to the float type and the third bound to the complex type, and to have some language feature that takes charge of choosing the correct implementation depending on the input type. This logic can be implemented by a compiler (if the language is statically typed) or by a runtime environment (if the language is dynamically typed) and is the approach chosen by C++. The disadvantage of this solution is that it requires the programmer to forecast all the possible situations: what if I need to sum an integer with a float? What if I need to sum two lists? (Please note that C++ is not so poorly designed, and the operator overloading technique allows to manage such cases, but the base polymorphism strategy of that language is the one exposed here).

The second strategy, the one implemented by Python, is simply to require the input objects to solve the problem for you. In other words you ask the data itself to perform the operation, reversing the problem. Instead of writing a bunch of functions that sum all the possible types in every possible combination you just write one function that requires the input data to sum, trusting that they know how to do it. Does it sound complex? It is not.

Let's look at the Python implementation of the + operator. When we write c = a + b, Python actually executes c = a.__add__(b). As you can see the sum operation is delegated to the first input variable. So if we write

def sum(a, b):
    return a + b

there is no need to specify the type of the two input variables. The object a (the object contained in the variable a) shall be able to sum with the object b. This is a very beautiful and simple implementation of the polymorphism concept. Python functions are polymorphic simply because they accept everything and trust the input data to be able to perform some actions.

Let us consider another simple example before moving on. The built-in len() function returns the length of the input object. For example

>>> l = [1, 2, 3]
>>> len(l)
3
>>> s = "Just a sentence"
>>> len(s)
15

As you can see it is perfectly polymorphic: you can feed both a list or a string to it and it just computes its length. Does it work with any type? let's check

>>> d = {'a': 1, 'b': 2}
>>> len(d)
2
>>> i = 5
>>> len(i)
Traceback  (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'int' has no len()

Ouch! Seems that the len() function is smart enough to deal with dictionaries, but not with integers. Well, after all, the length of an integer is not defined.

Indeed this is exactly the point of Python polymorphism: the integer type does not define a length operation. While you blame the len() function, the int type is at fault. The len() function just calls the __len__() method of the input object, as you can see from this code

>>> l.__len__()
3
>>> s.__len__()
15
>>> d.__len__()
2
>>> i.__len__()
Traceback  (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute '__len__'

Very straightforward: the 'int' object does not define any __len__() method.

So, to sum up what we discovered until here, I would say that Python polymorphism is based on delegation. In the following sections we will talk about the EAFP Python principle, and you will see that the delegation principle is somehow ubiquitous in this language.

Type Hard

Another real-life concept that polymorphism wants to bring into a programming language is the ability to walk the class hierarchy, that is to run code on specialized types. This is a complex sentence to say something we are used to do every day, and an example will clarify the matter.

You know how to open a door, it is something you learned in your early years. Under an OOP point of view you are an object (sorry, no humiliation intended) which is capable of interacting with a wood rectangle rotating on hinges. When you can open a door, however, you can also open a window, which, after all, is a specialized type of wood-rectangle-with-hinges, hopefully with some glass in it too. You are also able to open the car door, which is also a specialized type (this one is a mix between a standard door and a window). This shows that, once you know how to interact with the most generic type (basic door) you can also interact with specialized types (window, car door) as soon as they act like the ancestor type (e.g. as soon as they rotate on hinges).

This directly translates into OOP languages: polymorphism requires that code written for a given type may also be run on derived types. For example, a list (a generic list object, not a Python one) that can contain "numbers" shall be able to accept integers because they are numbers. The list could specify an ordering operation which requires the numbers to be able to compare each other. So, as soon as integers specify a way to compare each other they can be inserted into the list and ordered.

Statically compiled languages shall provide specific language features to implement this part of the polymorphism concept. In C++, for example, the language needs to introduce the concept of pointer compatibility between parent and child classes.

In Python there is no need to provide special language features to implement subtype polymorphism. As we already discovered Python functions accept any variable without checking the type and rely on the variable itself to provide the correct methods. But you already know that a subtype must provide the methods of the parent type, either redefining them or through implicit delegation, so as you can see Python implements subtype polymorphism from the very beginning.

I think this is one of the most important things to understand when working with this language. Python is not really interested in the actual type of the variables you are working with. It is interested in how those variables act, that is it just wants the variable to provide the right methods. So, if you come from statically typed languages, you need to make a special effort to think about acting like instead of being. This is what we called "duck typing".

Time to do an example. Let us define a Room class

class Room:
    def __init__(self, door):
        self.door = door

    def open(self):
        self.door.open()

    def close(self):
        self.door.close()

    def is_open(self):
        return self.door.is_open()

A very simple class, as you can see, just enough to exemplify polymorphism. The Room class accepts a door variable, and the type of this variable is not specified. Duck typing in action: the actual type of door is not declared, there is no "acceptance test" built in the language. Indeed, the incoming variable shall export the following methods that are used in the Room class: open(), close(), is_open(). So we can build the following classes

class Door:
    def __init__(self):
        self.status = "closed"

    def open(self):
        self.status = "open"

    def close(self):
        self.status = "closed"

    def is_open(self):
        return self.status == "open"


class BooleanDoor:
    def __init__(self):
        self.status = False

    def open(self):
        self.status = True

    def close(self):
        self.status = False

    def is_open(self):
        return self.status

Both represent a door that can be open or closed, and they implement the concept in two different ways: the first class relies on strings, while the second leverages booleans. Despite being two different types, both act the same way, so both can be used to build a Room object.

>>> door = Door()
>>> bool_door = BooleanDoor()
>>> room = Room(door)
>>> bool_room = Room(bool_door)

>>> room.open()
>>> room.is_open()
True
>>> room.close()
>>> room.is_open()
False

>>> bool_room.open()
>>> bool_room.is_open()
True
>>> bool_room.close()
>>> bool_room.is_open()
False

File Like Us

File-like objects are a concrete and very useful example of polymorphism in Python. A file-like object is a class (or the instance of a class) that acts like a file, i.e. it provides those methods a file object exposes.

Say for example that you code a class that parses an XML tree, and that you expect the XML code to be contained in a file. So your class accepts a file in its __init__() method, and reads the content from it

class XMLReader:
    def __init__(xmlfile):
        xmlfile.open()
        self.content = xmlfile.read()
        xmlfile.close()

[...]

The class works well until your application shall be modified to receive XML content from a network stream. To use the class without modifying it you shall write the stream in a temporary file and load the latter, but this sounds a little overkill. So you plan to change the class to accept a string, but this way you shall change every single code that uses the class to read a file, since now you shall open, read and close the file on your own, outside the class.

Polymorphism offers a better way. Why not storing the incoming stream inside an object that acts like a file, even if it is not an actual one? If you check the io module you will find that such an object has been already invented and provided in the standard Python library.

Other very useful file-like classes are those contained in the gzip, bz2, and zipfile modules (just to name some of the most used), which provide objects that allow you to manage compressed files just like plain files, hiding the decompression/compression machinery.

Unforgiveness

EAFP is a Python acronym that stands for easier to ask for forgiveness than permission. This coding style is highly pushed in the Python community because it completely relies on the duck typing concept, thus fitting well with the language philosophy.

The concept behind EAFP is fairly easy: instead of checking if an object has a given attribute or method before actually accessing or using it, just trust the object to provide what you need and manage the error case. This can be probably better understood by looking at some code. According to EAFP, instead of writing

if hasattr(someobj, 'open'):
    [...]
else:
    [...]

you shall write

try:
    someobj.open()
    [...]
except AttributeError:
    [...]

As you can see, the second snippet directly uses the method and deals with the possible AttributeError exception (by the way: managing exceptions is one of the top Black Magic Topics in Python, more on it in a future post. A very quick preview: I think we may learn something from Erlang - check this).

Why is this coding style pushed so much in the Python community? I think the main reason is that through EAFP you think polymorphically: you are not interested in knowing if the object has the open attribute, you are interested in knowing if the object can satisfy your request, that is to perform the open() method call.

Updates

2018-12-21: Adriaan Beiertz spotted an error in the BooleanDoor class. As the default status of the Door class is closed, the BooleanDoor should reflect this with a False status instead of a True one. Thanks Adriaan!

Movie Trivia

Section titles come from the following movies: Good Morning, Vietnam (1987), Die Hard (1988), Spies Like Us (1985), Unforgiven (1992).

Sources

You will find a lot of documentation in this Reddit post. Most of the information contained in this series come from those sources.

Feedback

The GitHub issues page is the best place to submit corrections.