Introduction

In the first three parts of this series of posts we developed together a calculator using a pure TDD methodology. In the previous post we added support for variables.

In this new post we will first add the exponentiation operation. The operator is challenging because it has a high priority, so we will need to find the right place for it in the chain of parse methods.

Then I will show you how I performed a refactoring of the code introducing a new version of the lexer that greatly simplifies the code of the parser.

Level 15 - Exponentiation

That is power. - Conan the Barbarian (1982)

The exponentiation operation is simple, and Python uses the double star operator to represent it

>>> 2**3
8

The main problem that we will face when implementing it is the priority of such an operation. Traditionally, this operator takes precedence over the basic arithmetic operations (sum, difference, multiplication, division). So if I write

>>> 1 + 2 ** 3
9

Python correctly computes 1 + (2 ** 3) and not (1 + 2) ** 3. As we did with multiplication and division, then, we will need to create a specific step to parse this operation.

In smallcalc, exponentiation will be associated with the symbol ^, so 2^3 will mean 2 to the power of 3 (2**3 in Python).
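To make the priorities explicit, this is the grammar we are aiming for, written in an informal EBNF-like notation. It is my own sketch of what the parse methods below implement, not something taken from the repository

expression     : term (('+' | '-') term)*
term           : exponentiation (('*' | '/') exponentiation)*
exponentiation : factor ('^' exponentiation)?
factor         : ('+' | '-') factor | '(' expression ')' | variable | integer

The exponentiation rule sits between term and factor, which is exactly what gives it a priority higher than multiplication and division but lower than unary operators and parentheses.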

Lexer

The lexer has a simple task, that of recognising the symbol '^' as a LITERAL token. The test goes into tests/test_calc_lexer.py

def test_get_tokens_understands_exponentiation():
    l = clex.CalcLexer()

    l.load('2 ^ 3')

    assert l.get_tokens() == [
        token.Token(clex.INTEGER, '2'),
        token.Token(clex.LITERAL, '^'),
        token.Token(clex.INTEGER, '3'),
        token.Token(clex.EOL),
        token.Token(clex.EOF)
    ]

Does your code already pass the test? If yes, why?

Parser

It's time to test the proper parsing of the exponentiation operation. Add this test to tests/test_calc_parser.py

def test_parse_exponentiation():
    p = cpar.CalcParser()
    p.lexer.load("2 ^ 3")

    node = p.parse_exponentiation()

    assert node.asdict() == {
        'type': 'exponentiation',
        'left': {
            'type': 'integer',
            'value': 2
        },
        'right': {
            'type': 'integer',
            'value': 3
        },
        'operator': {
            'type': 'literal',
            'value': '^'
        }
    }

As you can see, this test checks the method parse_exponentiation directly, so at this stage you just need to implement that method properly.

To allow the use of the exponentiation operator '^' in the calculator, however, we have to integrate it with the rest of the parse functions, so we will add three tests to the same file. The first one tests that the natural priority of the exponentiation operator is higher than that of multiplication

def test_parse_exponentiation_with_other_operators():
    p = cpar.CalcParser()
    p.lexer.load("3 * 2 ^ 3")

    node = p.parse_term()

    assert node.asdict() == {
        'type': 'binary',
        'operator': {
            'type': 'literal',
            'value': '*'
        },
        'left': {
            'type': 'integer',
            'value': 3
        },
        'right': {
            'type': 'exponentiation',
            'left': {
                'type': 'integer',
                'value': 2
            },
            'right': {
                'type': 'integer',
                'value': 3
            },
            'operator': {
                'type': 'literal',
                'value': '^'
            }
        }
    }

The second one checks that parentheses still change the priority

def test_parse_exponentiation_with_parenthesis():
    p = cpar.CalcParser()
    p.lexer.load("(3 + 2) ^ 3")

    node = p.parse_term()

    assert node.asdict() == {
        'type': 'exponentiation',
        'operator': {
            'type': 'literal',
            'value': '^'
        },
        'left': {
            'type': 'binary',
            'operator': {
                'type': 'literal',
                'value': '+'
            },
            'left': {
                'type': 'integer',
                'value': 3
            },
            'right': {
                'type': 'integer',
                'value': 2
            }
        },
        'right': {
            'type': 'integer',
            'value': 3
        }
    }

And the third one checks that unary operators still have a higher priority than the exponentiation operator

def test_parse_exponentiation_with_negative_base():
    p = cpar.CalcParser()
    p.lexer.load("-2 ^ 2")

    node = p.parse_exponentiation()

    assert node.asdict() == {
        'type': 'exponentiation',
        'operator': {
            'type': 'literal',
            'value': '^'
        },
        'left': {
            'type': 'unary',
            'operator': {
                'type': 'literal',
                'value': '-'
            },
            'content': {
                'type': 'integer',
                'value': 2
            }
        },
        'right': {
            'type': 'integer',
            'value': 2
        }
    }

My advice is to add the first test and try to make it pass by changing the code. If your change doesn't touch too much of the existing parse methods, chances are that the following two tests will pass as well.

Visitor

Last, we need to properly expose the exponentiation operation in the CLI, which means changing the Visitor to support nodes of type 'exponentiation'. The test that we need to add to tests/test_calc_visitor.py is

def test_visitor_exponentiation():
    ast = {
        'type': 'exponentiation',
        'left': {
            'type': 'integer',
            'value': 2
        },
        'right': {
            'type': 'integer',
            'value': 3
        },
        'operator': {
            'type': 'literal',
            'value': '^'
        }
    }

    v = cvis.CalcVisitor()
    assert v.visit(ast) == (8, 'integer')

And the change to the CalcVisitor class should be very easy, as we simply need to process a new type of node.


Solution

The lexer can process the exponentiation operator ^ out of the box as a LITERAL token, so no changes to the code are needed.

The test test_parse_exponentiation can be passed adding a PowerNode class.

Note: After I wrote and committed the solution I realised that the class called PowerNode should have been called ExponentiationNode, the former being a leftover of a previous incorrect nomenclature. I will eventually fix it in one of the refactoring steps, trying to convert a mistake into a good example of TDD.

class PowerNode(BinaryNode):
    node_type = 'exponentiation'

and a method parse_exponentiation to the parser

    def parse_exponentiation(self):
        left = self.parse_factor()

        next_token = self.lexer.peek_token()

        if next_token.type == clex.LITERAL and next_token.value == '^':
            operator = self._parse_symbol()
            right = self.parse_exponentiation()

            return PowerNode(left, operator, right)

        return left
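Before wiring the new method into the rest of the parser, note that parse_exponentiation calls itself to parse the right operand, which makes the operator right-associative, exactly like Python's **. If my reading of the code is correct, parsing 2 ^ 3 ^ 2 should therefore produce a tree equivalent to 2 ^ (3 ^ 2), something like

{
    'type': 'exponentiation',
    'left': {
        'type': 'integer',
        'value': 2
    },
    'right': {
        'type': 'exponentiation',
        'left': {
            'type': 'integer',
            'value': 3
        },
        'right': {
            'type': 'integer',
            'value': 2
        },
        'operator': {
            'type': 'literal',
            'value': '^'
        }
    },
    'operator': {
        'type': 'literal',
        'value': '^'
    }
}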

The new method allows the parser to explicitly parse the exponentiation operation, but when the operation is mixed with others the parser doesn't know how to deal with it, as parse_exponentiation is not called by any other method.

To pass the test_parse_exponentiation_with_other_operators test we need to change the method parse_term and call parse_exponentiation instead of parse_factor

    def parse_term(self):
        left = self.parse_exponentiation()

        next_token = self.lexer.peek_token()

        while next_token.type == clex.LITERAL\
                and next_token.value in ['*', '/']:
            operator = self._parse_symbol()
            right = self.parse_exponentiation()

            left = BinaryNode(left, operator, right)

            next_token = self.lexer.peek_token()

        return left

The full code of the CalcParser class is now

class CalcParser:

    def __init__(self):
        self.lexer = clex.CalcLexer()

    def _parse_symbol(self):
        t = self.lexer.get_token()
        return LiteralNode(t.value)

    def parse_integer(self):
        t = self.lexer.get_token()
        return IntegerNode(t.value)

    def _parse_variable(self):
        t = self.lexer.get_token()
        return VariableNode(t.value)

    def parse_factor(self):
        next_token = self.lexer.peek_token()

        if next_token.type == clex.LITERAL and next_token.value in ['-', '+']:
            operator = self._parse_symbol()
            factor = self.parse_factor()
            return UnaryNode(operator, factor)

        if next_token.type == clex.LITERAL and next_token.value == '(':
            self.lexer.discard_type(clex.LITERAL)
            expression = self.parse_expression()
            self.lexer.discard_type(clex.LITERAL)
            return expression

        if next_token.type == clex.NAME:
            t = self.lexer.get_token()
            return VariableNode(t.value)

        return self.parse_integer()

    def parse_exponentiation(self):
        left = self.parse_factor()

        next_token = self.lexer.peek_token()

        if next_token.type == clex.LITERAL and next_token.value == '^':
            operator = self._parse_symbol()
            right = self.parse_exponentiation()

            return PowerNode(left, operator, right)

        return left

    def parse_term(self):
        left = self.parse_exponentiation()

        next_token = self.lexer.peek_token()

        while next_token.type == clex.LITERAL\
                and next_token.value in ['*', '/']:
            operator = self._parse_symbol()
            right = self.parse_exponentiation()

            left = BinaryNode(left, operator, right)

            next_token = self.lexer.peek_token()

        return left

    def parse_expression(self):
        left = self.parse_term()

        next_token = self.lexer.peek_token()

        while next_token.type == clex.LITERAL\
                and next_token.value in ['+', '-']:
            operator = self._parse_symbol()
            right = self.parse_term()

            left = BinaryNode(left, operator, right)

            next_token = self.lexer.peek_token()

        return left

    def parse_assignment(self):
        variable = self._parse_variable()
        self.lexer.discard(token.Token(clex.LITERAL, '='))
        value = self.parse_expression()

        return AssignmentNode(variable, value)

    def parse_line(self):
        try:
            self.lexer.stash()
            return self.parse_assignment()
        except clex.TokenError:
            self.lexer.pop()
            return self.parse_expression()

The given test test_visitor_exponentiation requires the CalcVisitor to process nodes of type exponentiation. The code required to do this is

        if node['type'] == 'exponentiation':
            lvalue, ltype = self.visit(node['left'])
            rvalue, rtype = self.visit(node['right'])

            return lvalue ** rvalue, ltype

as a final case for the CalcVisitor class. The full code of the class is now

class CalcVisitor:

    def __init__(self):
        self.variables = {}

    def isvariable(self, name):
        return name in self.variables

    def valueof(self, name):
        return self.variables[name]['value']

    def typeof(self, name):
        return self.variables[name]['type']

    def visit(self, node):
        if node['type'] == 'integer':
            return node['value'], node['type']

        if node['type'] == 'variable':
            return self.valueof(node['value']), self.typeof(node['value'])

        if node['type'] == 'unary':
            operator = node['operator']['value']
            cvalue, ctype = self.visit(node['content'])

            if operator == '-':
                return - cvalue, ctype

            return cvalue, ctype

        if node['type'] == 'binary':
            lvalue, ltype = self.visit(node['left'])
            rvalue, rtype = self.visit(node['right'])

            operator = node['operator']['value']

            if operator == '+':
                return lvalue + rvalue, rtype
            elif operator == '-':
                return lvalue - rvalue, rtype
            elif operator == '*':
                return lvalue * rvalue, rtype
            elif operator == '/':
                return lvalue // rvalue, rtype

        if node['type'] == 'assignment':
            right_value, right_type = self.visit(node['value'])
            self.variables[node['variable']] = {
                'value': right_value,
                'type': right_type
            }

            return None, None

Intermezzo - Refactoring with tests

See? You just had to see it in context. - Scrooged (1988)

So our little project is growing, and the TDD methodology we are following gives us plenty of confidence in what we did. There are surely bugs we are not aware of, but we are sure that the cases we tested are correctly handled by our code.

As happens in many projects, at a certain point it's time for refactoring. We implemented solutions to the problems we found along the way, but are we sure we avoided duplicating code, that we chose the best algorithms, or, more simply, that the names we picked for our variables are clear?

Refactoring basically means changing the internal structure of something without changing its external behaviour, and tests are a priceless help in this phase. The tests we wrote are there to ensure that the previous behaviour does not change, or, if it does, that we are perfectly aware of it.

In this section, then, I want to guide you through a refactoring driven by tests. If you are following this series and writing your own code, this section will not add anything to the project, but I recommend reading it anyway, as it shows why tests are so important in a software project.

If you want to follow the refactoring on the repository you can create a branch on the tag context-manager-refactoring and work there. The steps you will find in the next sections start from that commit.

Context managers

The main issue the current code has is that the lexer cannot automatically recover a past status; that is, we cannot easily try to parse something and, when we discover that the initial guess is wrong, go back in time and start over.

Let's consider the method parse_line

    def parse_line(self):
        try:
            self.lexer.stash()
            return self.parse_assignment()
        except clex.TokenError:
            self.lexer.pop()
            return self.parse_expression()

Since a line can contain either an assignment or an expression, the first thing this function does is to save the lexer status with stash and to try to parse an assignment. If the line is not an assignment, somewhere in the code the TokenError exception is raised, and parse_line restores the previous status of the lexer and tries to parse an expression.

The same thing happens in other methods like parse_term

    def parse_term(self):
        left = self.parse_exponentiation()

        next_token = self.lexer.peek_token()

        while next_token.type == clex.LITERAL\
                and next_token.value in ['*', '/']:
            operator = self._parse_symbol()
            right = self.parse_exponentiation()

            left = BinaryNode(left, operator, right)

            next_token = self.lexer.peek_token()

        return left

where the use of lexer.peek_token and the while loop show that the lexer class requires too much control from its user.

Back to parse_line: it's clear that the code works, but it is not immediately easy to understand what the function does and when the old status is recovered. I would really prefer something like

# PSEUDOCODE
     def parse_line(self):
        ATTEMPT:
            return self.parse_assignment()

        return self.parse_expression()

where I used a pseudo-keyword ATTEMPT: to signal that somehow the lexer status is automatically stored at the beginning and retrieved at the end of it.

There's a very powerful concept in Python that allows us to write code like this, and it is called context manager. I won't go into the theory and syntax of context managers here, please refer to the Python documentation or your favourite course/book/website to discover how context managers work.
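For the purposes of this post it's enough to remember that a class becomes a context manager by implementing __enter__ and __exit__, and that returning True from __exit__ suppresses the exception raised inside the with block. A minimal, self-contained example (generic Python, not project code) is

class Guard:
    def __enter__(self):
        # executed when the with block is entered
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # executed when the with block is left, receiving details
        # of the exception raised inside it, if any.
        # Returning True suppresses that exception.
        return exc_type is ValueError


with Guard():
    raise ValueError

print("Still alive")  # the ValueError was suppressed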

If I can add context manager features to the lexer the code of parse_line might become

     def parse_line(self):
        with self.lexer:
            return self.parse_assignment()

        return self.parse_expression()

Lexer

The first move is to transform the lexer into a context manager that does nothing. The test in tests/test_calc_lexer.py is

def test_lexer_as_context_manager():
    l = clex.CalcLexer()
    l.load('abcd')

    with l:
        assert l.get_token() == token.Token(clex.NAME, 'abcd')

When this works, we have to be sure that the lexer does not restore the previous status after the with statement if the code inside it ran without errors. The new test is

def test_lexer_as_context_manager_does_not_restore_the_status_if_no_error():
    l = clex.CalcLexer()
    l.load('3 * 5')

    with l:
        assert l.get_token() == token.Token(clex.INTEGER, 3)

    assert l.get_token() == token.Token(clex.LITERAL, '*')

Conversely, we need to be sure that the status is restored when the code inside the with statement fails, which is the whole point of the context manager. This is tested by

def test_lexer_as_context_manager_restores_the_status_if_token_error():
    l = clex.CalcLexer()
    l.load('3 * 5')

    with l:
        l.get_token()
        l.get_token()
        raise clex.TokenError

    assert l.get_token() == token.Token(clex.INTEGER, 3)

When these three tests pass, we have a fully working context manager lexer, that reacts to TokenError exceptions by going back to the previously stored status.
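The post doesn't show the lexer changes, but a minimal sketch of what the two new methods might look like, assuming the stash and pop methods we already use in parse_line (the real implementation may also need to drop the saved status on success), is

    def __enter__(self):
        # save the current status, so that it can be restored
        # if the code inside the with block fails
        self.stash()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is TokenError:
            # restore the status saved by __enter__ and
            # suppress the exception by returning True
            self.pop()
            return True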

Parser

If the context manager lexer works as intended we should be able to replace the code of the parser without changing any test. The new code for parse_line is the one I showed before

    def parse_line(self):
        with self.lexer:
            return self.parse_assignment()

        return self.parse_expression()

and it works flawlessly.

The context manager part of the lexer, however, works if the code inside the with statement raises a TokenError exception when it fails. That exception is a signal to the context manager that the parsing path is not leading anywhere and it shall go back to the previous state.
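The resulting control flow is the same one provided by contextlib.suppress in the standard library: a return inside the with block leaves the function, while a suppressed exception makes execution fall through to the code after the block. The difference is that our lexer also restores its status. A generic illustration (not project code):

from contextlib import suppress


def first_that_works(attempts):
    # try each callable in order; a ValueError means "try the next one"
    for attempt in attempts:
        with suppress(ValueError):
            return attempt()


print(first_that_works([lambda: int("nope"), lambda: int("42")]))  # prints 42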

Manage literals

The method _parse_symbol is often used after some checks like

        if next_token.type == clex.LITERAL and next_token.value in ['-', '+']:
            operator = self._parse_symbol()

I would prefer to include the checks in the method itself, so that it can be used inside a with statement. First of all the method can be renamed to _parse_literal, and since it is an internal method I don't expect any test to fail

    def _parse_literal(self):
        t = self.lexer.get_token()
        return LiteralNode(t.value)

The method should also raise a TokenError when the token is not a LITERAL, or when its value is not among the expected ones

    def _parse_literal(self, values=None):
        t = self.lexer.get_token()

        if t.type != clex.LITERAL:
            raise clex.TokenError

        if values and t.value not in values:
            raise clex.TokenError

        return LiteralNode(t.value)

Note that, thanks to the default value of the values parameter, I didn't change the current behaviour: the whole battery of tests still passes without errors.

Parsing factors

The next method that we can start changing is parse_factor. The first pattern this function tries to parse is a unary node

        if next_token.type == clex.LITERAL and next_token.value in ['-', '+']:
            operator = self._parse_literal()
            factor = self.parse_factor()
            return UnaryNode(operator, factor)

which may be converted to

        with self.lexer:
            operator = self._parse_literal(['+', '-'])
            content = self.parse_factor()
            return UnaryNode(operator, content)

while still passing the whole test suite.

The second pattern is an expression surrounded by parentheses

        if next_token.type == clex.LITERAL and next_token.value == '(':
            self.lexer.discard_type(clex.LITERAL)
            expression = self.parse_expression()
            self.lexer.discard_type(clex.LITERAL)
            return expression

and this is easily converted to the new syntax as well. Note that _parse_literal, unlike discard_type, also checks the value of the token, so a missing closing parenthesis now raises TokenError and makes the lexer restore its status

        with self.lexer:
            self._parse_literal(['('])
            expression = self.parse_expression()
            self._parse_literal([')'])
            return expression

Parsing exponentiation operations

To change the method parse_exponentiation we first need to make _parse_variable raise a TokenError in case of a wrong token

    def _parse_variable(self):
        t = self.lexer.get_token()

        if t.type != clex.NAME:
            raise clex.TokenError

        return VariableNode(t.value)

then we can change the method we are interested in

    def parse_exponentiation(self):
        left = self.parse_factor()

        with self.lexer:
            operator = self._parse_literal(['^'])
            right = self.parse_exponentiation()

            return PowerNode(left, operator, right)

        return left

If the literal after the factor is not '^', _parse_literal raises TokenError, the with statement restores the lexer status, and the method falls through to return left, which replaces the old peek-based check. Doing this last substitution I also notice that there is some duplicated code in parse_factor, and I replace it with a call to _parse_variable. The test suite keeps passing, giving me confidence that what I did does not change the behaviour of the code (at least the behaviour that is covered by tests).

Parsing terms

Now, the method parse_term will be problematic. To implement this function I used a while loop that keeps calling the method parse_exponentiation as long as the separating token is a LITERAL with value * or /

    def parse_term(self):
        left = self.parse_exponentiation()

        next_token = self.lexer.peek_token()

        while next_token.type == clex.LITERAL\
                and next_token.value in ['*', '/']:
            operator = self._parse_literal()
            right = self.parse_exponentiation()

            left = BinaryNode(left, operator, right)

            next_token = self.lexer.peek_token()

        return left

This is not a pure recursive call, then, and replacing the code with the context manager lexer would result in errors, because the context manager doesn't loop. The same situation is replicated in parse_expression.

This is another typical situation that we face when refactoring code. We realise that the required change is made of multiple steps and that multiple tests will fail until all the steps are completed.

There is no single solution to this problem, but TDD gives you some hints to deal with it. The most important "rule" that I follow when I work in a TDD environment is that there should be at most one failing test at a time. And when a code change makes multiple tests fail there is a simple way to get back to that situation: comment out tests.

Yes, you should temporarily get rid of tests, so you can concentrate on writing code that passes the subset of active tests. Then you will add one test at a time, fixing the code or the tests according to your needs. When you refactor, it might be necessary to change the tests as well, as sometimes we test parts of the code that are not exactly an external boundary.

I can now change the code of the parse_term function introducing the context manager

    def parse_term(self):
        left = self.parse_exponentiation()

        with self.lexer:
            operator = self._parse_literal(['*', '/'])
            right = self.parse_exponentiation()

            return BinaryNode(left, operator, right)

        return left

and the test suite runs with one single failing test, test_parse_term_with_multiple_operations. The reason is that the body of the with statement parses a single operation and then returns, so any further operation on the same line is never processed. I now have to work on the code with this in mind.

I decided to go for a pure recursive approach (no more while loops), which is what standard language parsers do. After working on it the new version of parse_term is

    def parse_term(self):
        left = self.parse_factor()

        with self.lexer:
            operator = self._parse_literal(['*', '/'])
            right = self.parse_term()

            return BinaryNode(left, operator, right)

        return left

And this change makes 3 tests fail: the same test_parse_term_with_multiple_operations that was failing before, plus test_parse_exponentiation_with_other_operators and test_parse_exponentiation_with_parenthesis. The last two actually exercise parse_exponentiation through parse_term, which doesn't call it any more. This means that I can temporarily comment them out and concentrate on the first one.

What I discover is that changing the code to use the recursive approach changes the output of the functions. The previous output of parse_term applied to 2 * 3 / 4 was

{
    'type': 'binary',
    'left': {
        'type': 'binary',
        'left': {
            'type': 'integer',
            'value': 2
        },
        'right': {
            'type': 'integer',
            'value': 3
        },
        'operator': {
            'type': 'literal',
            'value': '*'
        }
    },
    'right': {
        'type': 'integer',
        'value': 4
    },
    'operator': {
        'type': 'literal',
        'value': '/'
    }
}

that is, the multiplication was stored first. Moving to a recursive approach makes the parse_term function return this

{
    'type': 'binary',
    'left': {
        'type': 'integer',
        'value': 2
    },
    'right': {
        'type': 'binary',
        'left': {
            'type': 'integer',
            'value': 3
        },
        'right': {
            'type': 'integer',
            'value': 4
        },
        'operator': {
            'type': 'literal',
            'value': '/'
        }
    },
    'operator': {
        'type': 'literal',
        'value': '*'
    }
}

With real division this structure, once visited, would return the same value as the previous one, since the order in which multiplications and divisions are applied wouldn't change the result. With the integer division our visitor performs the order can actually matter ((2 * 3) // 4 is 1, while 2 * (3 // 4) is 0), and we will have to pay the same attention when operators with different priority are involved, like sum and multiplication. For this test, though, which only checks the shape of the tree, we can agree to accept the new structure.

This means that we may change the test and make it pass. Let me stress it once more: we have to understand why the test doesn't pass, and once we understand the reason, and decide it is acceptable, we can change the test.

Tests are not immutable, they are mere safeguards that raise alarms when you change the behaviour. It's up to you to deal with the alarm and to decide what to do.

Once the test has been modified and the test suite passes, it's time to uncomment the first of the two remaining tests, test_parse_exponentiation_with_other_operators. This test uses parse_term to parse a string that contains an exponentiation, but the new code of the method parse_term doesn't call the parse_exponentiation function. So the test fails.

Parsing exponentiation

That test tries to parse a string that contains a multiplication and an exponentiation, so the method that we should use to process it is parse_term. The current version of parse_term, however, doesn't consider exponentiation, so the new code is

    def parse_term(self):
        left = self.parse_exponentiation()

        with self.lexer:
            operator = self._parse_literal(['*', '/'])
            right = self.parse_term()

            return BinaryNode(left, operator, right)

        return left

which makes all the active tests pass.

There is still one commented test, test_parse_exponentiation_with_parenthesis, that now passes with the new code.

Parsing expressions

The new version of parse_expression has the same issue we found with parse_term, that is, the recursive approach changes the output.

    def parse_expression(self):
        left = self.parse_term()

        with self.lexer:
            operator = self._parse_literal(['+', '-'])
            right = self.parse_expression()

            left = BinaryNode(left, operator, right)

        return left

As before, we have to decide if the change is acceptable or if it is a real error. As happened for parse_term, the test can be safely changed to match the new output of the code (keeping in mind, once again, that operators like subtraction are sensitive to the order of evaluation).

Level 16 - Float numbers

So far, our calculator can handle only integer values, so it's time to add support for float numbers. This change shouldn't be complex: floating point numbers are easy to parse as they are basically two integer numbers separated by a dot.

Lexer

To test if the lexer understands floating point numbers it's enough to add this to tests/test_calc_lexer.py

def test_get_tokens_understands_floats():
    l = clex.CalcLexer()

    l.load('3.6')

    assert l.get_tokens() == [
        token.Token(clex.FLOAT, '3.6'),
        token.Token(clex.EOL),
        token.Token(clex.EOF)
    ]

Parser

To support float numbers it's enough to add that feature to the method that we use to parse integers. We can rename parse_integer to parse_number, fix the test test_parse_integer, and add test_parse_float

def test_parse_integer():
    p = cpar.CalcParser()
    p.lexer.load("5")

    node = p.parse_number()

    assert node.asdict() == {
        'type': 'integer',
        'value': 5
    }


def test_parse_float():
    p = cpar.CalcParser()
    p.lexer.load("5.8")

    node = p.parse_number()

    assert node.asdict() == {
        'type': 'float',
        'value': 5.8
    }

Visitor

The same thing that we did for the parser is valid for the visitor. We just need to copy the test for the integer numbers and adapt it

def test_visitor_float():
    ast = {
        'type': 'float',
        'value': 12.345
    }

    v = cvis.CalcVisitor()
    assert v.visit(ast) == (12.345, 'float')

This, however, leaves a bug in the visitor. The Test-Driven Development methodology can help you write and change your code, but it cannot completely prevent bugs. Quite simply, if you don't test a case, TDD can't do anything to find and remove the bugs hiding in it.

The bug I noticed after a while is that the visitor doesn't correctly manage an operation between integers and floats, labelling the result as an integer. For example, if you sum 4 and 5.1 you should get 9.1 with type float. To test this behaviour we can add this code

def test_visitor_expression_sum_with_float():
    ast = {
        'type': 'binary',
        'left': {
            'type': 'float',
            'value': 5.1
        },
        'right': {
            'type': 'integer',
            'value': 4
        },
        'operator': {
            'type': 'literal',
            'value': '+'
        }
    }

    v = cvis.CalcVisitor()
    assert v.visit(ast) == (9.1, 'float')

Solution

The first thing the lexer needs is a label to identify FLOAT tokens

FLOAT = 'FLOAT'

then the method _process_integer can be extended to process float numbers as well. To do this the method is renamed to _process_number, the regular expression is modified, and the token type is chosen according to the presence of a dot.

    def _process_number(self):
        regexp = re.compile(r'[\d\.]+')

        match = regexp.match(
            self._text_storage.tail
        )

        if not match:
            return None

        token_string = match.group()

        token_type = FLOAT if '.' in token_string else INTEGER

        return self._set_current_token_and_skip(
            token.Token(token_type, token_string)
        )
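A side note: the pattern [\d\.]+ is permissive, as it also matches malformed sequences like 1.2.3 or a lone dot, which would later crash the float conversion in the parser. If we wanted to be stricter we could use something like the following, which is my own suggestion and not the code of the repository

        # digits, optionally followed by a dot and at least one more digit
        regexp = re.compile(r'\d+(\.\d+)?')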

Remember that the get_token function has to be modified to use the new name of the method. The new code is

    def get_token(self):
        eof = self._process_eof()
        if eof:
            return eof

        eol = self._process_eol()
        if eol:
            return eol

        self._process_whitespace()

        name = self._process_name()
        if name:
            return name

        number = self._process_number()
        if number:
            return number

        literal = self._process_literal()
        if literal:
            return literal

Moving on to the parser, we first need to add a new type of node

class FloatNode(ValueNode):
    node_type = 'float'

The new version of parse_integer, renamed parse_number, shall deal with both cases, but also raise the TokenError exception if the parsing fails, so that it cooperates with the context manager lexer

    def parse_number(self):
        t = self.lexer.get_token()

        if t.type == clex.INTEGER:
            return IntegerNode(int(t.value))
        elif t.type == clex.FLOAT:
            return FloatNode(float(t.value))

        raise clex.TokenError

The change to support float nodes in the visitor is trivial: we just need to handle them alongside the integer case

    def visit(self, node):
        if node['type'] in ['integer', 'float']:
            return node['value'], node['type']

To fix the missing type promotion when dealing with integers and floats it's enough to add

            if ltype == 'float':
                rtype = ltype

just before evaluating the operator in the binary nodes. Promoting only when ltype is float is enough because the method already returns rtype: when the right operand is a float the result is already labelled correctly, so we only need to intervene when the float is on the left. The full code of the method visit is then

    def visit(self, node):
        if node['type'] in ['integer', 'float']:
            return node['value'], node['type']

        if node['type'] == 'variable':
            return self.valueof(node['value']), self.typeof(node['value'])

        if node['type'] == 'unary':
            operator = node['operator']['value']
            cvalue, ctype = self.visit(node['content'])

            if operator == '-':
                return - cvalue, ctype

            return cvalue, ctype

        if node['type'] == 'binary':
            lvalue, ltype = self.visit(node['left'])
            rvalue, rtype = self.visit(node['right'])

            operator = node['operator']['value']

            if ltype == 'float':
                rtype = ltype

            if operator == '+':
                return lvalue + rvalue, rtype
            elif operator == '-':
                return lvalue - rvalue, rtype
            elif operator == '*':
                return lvalue * rvalue, rtype
            elif operator == '/':
                return lvalue // rvalue, rtype

        if node['type'] == 'assignment':
            right_value, right_type = self.visit(node['value'])
            self.variables[node['variable']] = {
                'value': right_value,
                'type': right_type
            }

            return None, None

        if node['type'] == 'exponentiation':
            lvalue, ltype = self.visit(node['left'])
            rvalue, rtype = self.visit(node['right'])

            return lvalue ** rvalue, ltype

Final words

This post showed you how powerful the TDD methodology is when it comes to refactoring, or in general when the code has to be changed. Remember that tests can be changed if there are good reasons, and that the main point is to understand what's happening in your code and in the cases that you already tested.

The code I developed in this post is available on the GitHub repository tagged with part4.

Feedback

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.