The Digital Cat - Python3https://www.thedigitalcatonline.com/2023-09-03T19:00:00+02:00Adventures of a curious cat in the land of programmingFirst-class objects in Python - Higher-order functions, wrappers, and factories2021-03-09T16:00:00+00:002022-09-18T23:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2021-03-09:/blog/2021/03/09/first-class-objects-in-python/<p>My new book "First-class objects in Python" is out! Grab your <strong>FREE</strong> copy <a href="https://www.thedigitalcat.academy/freebie-first-class-objects">here</a>!</p><div class="imageblock"><img src="/images/first-class-objects-in-python.jpg"></div>Mau: a lightweight markup language2021-02-22T10:00:00+00:002021-02-25T18:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2021-02-22:/blog/2021/02/22/mau-a-lightweight-markup-language/<h2 id="what-is-mau-fb3c">What is Mau?<a class="headerlink" href="#what-is-mau-fb3c" title="Permanent link">¶</a></h2><p>Mau is a lightweight markup language heavily inspired by AsciiDoc that makes it very easy to write blog posts or books.</p><p>The main goal of Mau is to provide a customisable markup language, reusing the good parts of AsciiDoc and providing a pure Python 3 implementation.</p><p>You can find Mau's source code on <a href="https://github.com/Project-Mau/mau">GitHub</a>.</p><h2 id="why-not-markdown-or-asciidoc-b535">Why not Markdown or AsciiDoc?<a class="headerlink" href="#why-not-markdown-or-asciidoc-b535" title="Permanent link">¶</a></h2><p>Markdown is a very good format, and I have used it for all the posts in this blog so far. Over time, though, I grew increasingly dissatisfied with it, because it lacks some features and offers little room for customisation. 
When I wrote the second version of my book "Clean Architectures in Python" I considered using Markdown (through Pelican), but I couldn't find a good way to create tips and warnings. Recently, Python Markdown added a feature that allows you to specify the file name for the source code, but the resulting HTML cannot easily be changed, which makes it difficult to achieve the graphical output I wanted.</p><p>AsciiDoc started as a Python project, but was then abandoned and eventually resurrected by Dan Allen with Asciidoctor. AsciiDoc has a lot of features and I consider it superior to Markdown, but Asciidoctor is a Ruby program, which made it difficult for me to use. In addition, the standard output of Asciidoctor is a nice single HTML page, but again customising it is a pain: I had to struggle to add my Google Analytics code and a <code>sitemap.xml</code> to the book site.</p><p>I simply thought I could try to write my own tool, in a language that I know well (Python). It works, and I learned a lot writing it, so I'm definitely happy. I'd be delighted, though, to know that it can be useful to other people as well.</p><h2 id="pelican-f581">Pelican<a class="headerlink" href="#pelican-f581" title="Permanent link">¶</a></h2><p>A reader for Mau source files is available for Pelican; you can find the code at <a href="https://github.com/getpelican/pelican-plugins/pull/1327">https://github.com/getpelican/pelican-plugins/pull/1327</a>. Simply add the code to your Pelican plugins directory and activate it by adding <code>"mau_reader"</code> to <code>PLUGINS</code> in your file <code>pelicanconf.py</code>. 
The Mau reader processes only files with the <code>.mau</code> extension, so you can use Markdown/reStructuredText and Mau at the same time.</p><h2 id="development-f3c5">Development<a class="headerlink" href="#development-f3c5" title="Permanent link">¶</a></h2><p>If you are interested, you can star the project on its <a href="https://github.com/Project-Mau/mau">GitHub page</a>, start using it, or contribute ideas, code, and bug fixes.</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>TDD in Python with pytest - Part 52020-09-21T10:30:00+02:002021-03-06T19:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2020-09-21:/blog/2020/09/21/tdd-in-python-with-pytest-part-5/<p>This is the fifth and last post in the series "TDD in Python with pytest" where I develop a simple project following a strict TDD methodology. The posts come from my book <a href="https://leanpub.com/clean-architectures-in-python">Clean Architectures in Python</a> and have been revised to get rid of some poor naming choices made in the version published in the book.</p>
<p>You can find the first post <a href="https://www.thedigitalcatonline.com/blog/2020/09/10/tdd-in-python-with-pytest-part-1/">here</a>.</p>
<p>In this post I will conclude the discussion about mocks by introducing patching.</p>
<h2 id="patching">Patching<a class="headerlink" href="#patching" title="Permanent link">¶</a></h2>
<p>Mocks are very simple to introduce in your tests whenever your objects accept classes or instances from outside. In that case, as shown in the previous posts, you just have to instantiate the class <code>Mock</code> and pass the resulting object to your system. However, when the external classes used by your library are hardcoded, this simple trick does not work: you have no way to pass a fake object instead of the real one.</p>
<p>This is exactly the case addressed by patching. Patching, in a testing framework, means to replace a globally reachable object with a mock, thus achieving the goal of having the code run unmodified, while part of it has been hot swapped, that is, replaced at run time.</p>
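<p>To make the difference concrete, here is a minimal sketch, assuming a hypothetical class <code>Reporter</code> that receives its collaborator from outside and a hypothetical function <code>hardcoded_report</code> that calls <code>os.path.abspath</code> directly. The first can simply be given a <code>Mock</code>; the second can only be tested by patching the globally reachable name, which <code>patch</code> restores when the <code>with</code> block ends.</p>

```python
import os
from unittest.mock import Mock, patch


class Reporter:
    """Hypothetical class that receives its collaborator from outside."""

    def __init__(self, resolve_path):
        self.resolve_path = resolve_path

    def report(self, path):
        return self.resolve_path(path)


def hardcoded_report(path):
    # Hypothetical function with a hardcoded dependency: nothing to inject
    return os.path.abspath(path)


# Injection: a Mock can be passed in directly, no patching needed
resolver = Mock(return_value='/fake/abs/path')
injected_result = Reporter(resolver).report('../file.txt')

# Hardcoded dependency: replace the globally reachable name instead
original_abspath = os.path.abspath
with patch('os.path.abspath') as abspath_mock:
    abspath_mock.return_value = '/fake/abs/path'
    patched_result = hardcoded_report('../file.txt')

# When the with block ends, patch puts the original function back
restored = os.path.abspath is original_abspath
```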
<h3 id="a-warm-up-example">A warm-up example<a class="headerlink" href="#a-warm-up-example" title="Permanent link">¶</a></h3>
<p>Clone the repository <code>fileinfo</code> that you can find <a href="https://github.com/lgiordani/fileinfo">here</a> and move to the branch <code>develop</code>. As I did for the project <code>simple_calculator</code>, the branch <code>master</code> contains the full solution, and I use it to maintain the repository, but if you want to code along you need to start from scratch. If you prefer, you can fork the repository on GitHub and work on your own copy.</p>
<div class="highlight"><pre><span></span><code>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/lgiordani/fileinfo
<span class="nb">cd</span><span class="w"> </span>fileinfo
git<span class="w"> </span>checkout<span class="w"> </span>--track<span class="w"> </span>origin/develop
</code></pre></div>
<p>Create a virtual environment following your preferred process and install the requirements</p>
<div class="highlight"><pre><span></span><code>pip<span class="w"> </span>install<span class="w"> </span>-r<span class="w"> </span>requirements/dev.txt
</code></pre></div>
<p>You should at this point be able to run</p>
<div class="highlight"><pre><span></span><code>pytest<span class="w"> </span>-svv
</code></pre></div>
<p>and get an output like</p>
<div class="highlight"><pre><span></span><code>=============================== test session starts ===============================
platform linux -- Python XXXX, pytest-XXXX, py-XXXX, pluggy-XXXX --
fileinfo/venv3/bin/python3
cachedir: .cache
rootdir: fileinfo, inifile: pytest.ini
plugins: cov-XXXX
collected 0 items
============================== no tests ran in 0.02s ==============================
</code></pre></div>
<p>Let us start with a very simple example. Patching can be complex to grasp at the beginning so it is better to start learning it with trivial use cases. The purpose of this library is to develop a simple class that returns information about a given file. The class shall be instantiated with the file path, which can be relative.</p>
<p>The starting point is the class with the method <code>__init__</code>. If you want you can develop the class using TDD, but for the sake of brevity I will not show here all the steps that I followed. This is the set of tests I have in <code>tests/test_fileinfo.py</code></p>
<div class="highlight"><span class="filename">tests/test_fileinfo.py</span><pre><span></span><code><span class="kn">from</span> <span class="nn">fileinfo.fileinfo</span> <span class="kn">import</span> <span class="n">FileInfo</span>

<span class="k">def</span> <span class="nf">test_init</span><span class="p">():</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">fi</span><span class="o">.</span><span class="n">filename</span> <span class="o">==</span> <span class="n">filename</span>

<span class="k">def</span> <span class="nf">test_init_relative</span><span class="p">():</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">relative_path</span> <span class="o">=</span> <span class="s1">'../</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">relative_path</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">fi</span><span class="o">.</span><span class="n">filename</span> <span class="o">==</span> <span class="n">filename</span>
</code></pre></div>
<p>and this is the code of the class <code>FileInfo</code> in the file <code>fileinfo/fileinfo.py</code></p>
<div class="highlight"><span class="filename">fileinfo/fileinfo.py</span><pre><span></span><code><span class="kn">import</span> <span class="nn">os</span>

<span class="k">class</span> <span class="nc">FileInfo</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">path</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">original_path</span> <span class="o">=</span> <span class="n">path</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">filename</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">basename</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>
</code></pre></div>
<p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/fileinfo/tree/first-version">first-version</a></p>
<p>As you can see the class is extremely simple, and the tests are straightforward. So far I didn't add anything new to what we discussed in the previous posts.</p>
<p>Now I want the method <code>get_info</code> to return a tuple with the file name, the original path the class was instantiated with, and the absolute path of the file. Pretending we are in the directory <code>/some/absolute/path</code>, the class should work as shown here</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="s1">'../book_list.txt'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">fi</span><span class="o">.</span><span class="n">get_info</span><span class="p">()</span>
<span class="p">(</span><span class="s1">'book_list.txt'</span><span class="p">,</span> <span class="s1">'../book_list.txt'</span><span class="p">,</span> <span class="s1">'/some/absolute/book_list.txt'</span><span class="p">)</span>
</code></pre></div>
<p>You can quickly realise that you have a problem writing the test. There is no easy way to test something like "the absolute path", since its value depends on the directory from which the test is run. Let us try to write part of the test</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_info</span><span class="p">():</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">original_path</span> <span class="o">=</span> <span class="s1">'../</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">fi</span><span class="o">.</span><span class="n">get_info</span><span class="p">()</span> <span class="o">==</span> <span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">original_path</span><span class="p">,</span> <span class="s1">'???'</span><span class="p">)</span>
</code></pre></div>
<p>where the <code>'???'</code> string highlights that I cannot put something sensible to test the absolute path of the file.</p>
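<p>To see why no fixed string can replace <code>'???'</code>, recall that, according to the Python documentation, on most platforms <code>os.path.abspath(path)</code> is equivalent to <code>normpath(join(os.getcwd(), path))</code>. The sketch below uses a hypothetical helper, <code>abspath_from</code>, that applies the same rule to an arbitrary POSIX working directory, so the same relative path resolves to different absolute paths depending on where you are.</p>

```python
import os
import posixpath


def abspath_from(cwd, path):
    # Hypothetical helper: mimics os.path.abspath for an arbitrary POSIX
    # working directory, following the documented equivalence
    # abspath(path) == normpath(join(getcwd(), path))
    return posixpath.normpath(posixpath.join(cwd, path))


from_example_dir = abspath_from('/some/absolute/path', '../book_list.txt')
from_other_dir = abspath_from('/home/user/tests', '../book_list.txt')

# The real function applies the same rule to the actual working directory
real = os.path.abspath('../book_list.txt')
expected_real = os.path.normpath(os.path.join(os.getcwd(), '../book_list.txt'))
```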
<p>Patching is the way to solve this problem. You know that the function will use some code to get the absolute path of the file. So, within the scope of this test only, you can replace that code with something different and perform the test. Since the replacement code has a known outcome, writing the test is now possible.</p>
<p>Patching, thus, means to inform Python that during the execution of a specific portion of the code you want a globally accessible module/object replaced by a mock. Let's see how we can use it in our example</p>
<div class="highlight"><span class="filename">tests/test_fileinfo.py</span><pre><span></span><code><span class="kn">from</span> <span class="nn">unittest.mock</span> <span class="kn">import</span> <span class="n">patch</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">test_get_info</span><span class="p">():</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">original_path</span> <span class="o">=</span> <span class="s1">'../</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="k">with</span> <span class="n">patch</span><span class="p">(</span><span class="s1">'os.path.abspath'</span><span class="p">)</span> <span class="k">as</span> <span class="n">abspath_mock</span><span class="p">:</span>
        <span class="n">test_abspath</span> <span class="o">=</span> <span class="s1">'some/abs/path'</span>
        <span class="n">abspath_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_abspath</span>
        <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">fi</span><span class="o">.</span><span class="n">get_info</span><span class="p">()</span> <span class="o">==</span> <span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">original_path</span><span class="p">,</span> <span class="n">test_abspath</span><span class="p">)</span>
</code></pre></div>
<p>You can clearly see the context in which the patching happens, as it is enclosed in a <code>with</code> statement. Inside this statement the function <code>os.path.abspath</code> is replaced by a mock created by the function <code>patch</code> and called <code>abspath_mock</code>. So, while Python executes the lines of code enclosed by the statement <code>with</code>, the name <code>os.path.abspath</code> refers to <code>abspath_mock</code>, and any call to <code>os.path.abspath</code> is actually a call to the mock.</p>
<p>The first thing we can do, then, is to give the mock a known <code>return_value</code>. This way we solve the issue that we had with the initial code, that is using an external component that returns an unpredictable result. The line</p>
<div class="highlight"><span class="filename">tests/test_fileinfo.py</span><pre><span></span><code><span class="kn">from</span> <span class="nn">unittest.mock</span> <span class="kn">import</span> <span class="n">patch</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">test_get_info</span><span class="p">():</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">original_path</span> <span class="o">=</span> <span class="s1">'../</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="k">with</span> <span class="n">patch</span><span class="p">(</span><span class="s1">'os.path.abspath'</span><span class="p">)</span> <span class="k">as</span> <span class="n">abspath_mock</span><span class="p">:</span>
        <span class="n">test_abspath</span> <span class="o">=</span> <span class="s1">'some/abs/path'</span>
<span class="hll">        <span class="n">abspath_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_abspath</span>
</span>        <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">fi</span><span class="o">.</span><span class="n">get_info</span><span class="p">()</span> <span class="o">==</span> <span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">original_path</span><span class="p">,</span> <span class="n">test_abspath</span><span class="p">)</span>
</code></pre></div>
<p>instructs the patching mock to return the given string as a result, regardless of the argument it receives and of the actual file under consideration.</p>
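<p>The following sketch, which is not part of the <code>fileinfo</code> repository, isolates this behaviour: inside the <code>with</code> block the name <code>os.path.abspath</code> is the mock itself, every call returns the configured <code>return_value</code> whatever the argument, and the mock records the calls it receives.</p>

```python
import os
from unittest.mock import patch

with patch('os.path.abspath') as abspath_mock:
    abspath_mock.return_value = 'some/abs/path'
    # Inside the block the name os.path.abspath refers to the mock itself
    name_is_mock = os.path.abspath is abspath_mock
    # Calling it returns return_value, whatever the argument
    first = os.path.abspath('../somefile.ext')
    second = os.path.abspath('/completely/different/path')

# The mock also kept track of the calls it received
calls_made = abspath_mock.call_count
last_argument = abspath_mock.call_args[0][0]
```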
<p>The code that makes the test pass is</p>
<div class="highlight"><span class="filename">fileinfo/fileinfo.py</span><pre><span></span><code><span class="k">class</span> <span class="nc">FileInfo</span><span class="p">:</span>
    <span class="p">[</span><span class="o">...</span><span class="p">]</span>
    <span class="k">def</span> <span class="nf">get_info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">(</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">filename</span><span class="p">,</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">original_path</span><span class="p">,</span>
            <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">abspath</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">original_path</span><span class="p">)</span>
        <span class="p">)</span>
</code></pre></div>
<p>When this code is executed by the test the function <code>os.path.abspath</code> is replaced at run time by the mock that we prepared there, which basically ignores the input value <code>self.original_path</code> and returns the fixed value it was instructed to use.</p>
<p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/fileinfo/tree/patch-with-context-manager">patch-with-context-manager</a></p>
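<p>Putting the pieces together, this is a self-contained sketch that copies the class and the test into a single script and runs the test function directly instead of through pytest. It relies only on the standard library; the real project keeps the class and the test in separate modules.</p>

```python
import os
from unittest.mock import patch


class FileInfo:
    # Self-contained copy of the class built so far in the post
    def __init__(self, path):
        self.original_path = path
        self.filename = os.path.basename(path)

    def get_info(self):
        return (
            self.filename,
            self.original_path,
            os.path.abspath(self.original_path),
        )


def test_get_info():
    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)
    with patch('os.path.abspath') as abspath_mock:
        test_abspath = 'some/abs/path'
        abspath_mock.return_value = test_abspath
        fi = FileInfo(original_path)
        assert fi.get_info() == (filename, original_path, test_abspath)


# Running the test function directly instead of through pytest
test_get_info()
```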
<p>It is worth at this point discussing outgoing messages again. The code that we are considering here is a clear example of an outgoing query, as the method <code>get_info</code> is not interested in changing the status of the external component. In the previous post we reached the conclusion that testing the return value of outgoing queries is pointless and should be avoided. With <code>patch</code> we are replacing the external component with something that we know, using it to test that our object correctly handles the value returned by the outgoing query. We are thus not testing the external component, as it has been replaced, and we are definitely not testing the mock, as its return value is already known.</p>
<p>Obviously, to write the test you have to know that you are going to use the function <code>os.path.abspath</code>, so patching is somewhat a "less pure" practice in TDD. In pure OOP/TDD you are only concerned with the external behaviour of the object, and not with its internal structure. This example, however, shows that this pure approach has some limitations that you have to cope with, and patching is a clean way to do it.</p>
<h2 id="the-patching-decorator">The patching decorator<a class="headerlink" href="#the-patching-decorator" title="Permanent link">¶</a></h2>
<p>The function <code>patch</code> we imported from the module <code>unittest.mock</code> is very powerful, as it can temporarily replace an external object. If the replacement can stay active for the whole test, there is a cleaner way to inject your mocks, which is to use <code>patch</code> as a function decorator.</p>
<p>This means that you can decorate the test function, passing to the decorator the same argument you would pass if <code>patch</code> was used in a <code>with</code> statement. This, however, requires a small change in the signature of the test function, as it has to receive an additional argument, which will become the mock.</p>
<p>Let's change <code>test_get_info</code>, removing the statement <code>with</code> and decorating the function with <code>patch</code></p>
<div class="highlight"><span class="filename">tests/test_fileinfo.py</span><pre><span></span><code><span class="nd">@patch</span><span class="p">(</span><span class="s1">'os.path.abspath'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_get_info</span><span class="p">(</span><span class="n">abspath_mock</span><span class="p">):</span>
    <span class="n">test_abspath</span> <span class="o">=</span> <span class="s1">'some/abs/path'</span>
    <span class="n">abspath_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_abspath</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">original_path</span> <span class="o">=</span> <span class="s1">'../</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">fi</span><span class="o">.</span><span class="n">get_info</span><span class="p">()</span> <span class="o">==</span> <span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">original_path</span><span class="p">,</span> <span class="n">test_abspath</span><span class="p">)</span>
</code></pre></div>
<p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/fileinfo/tree/patch-with-function-decorator">patch-with-function-decorator</a></p>
<p>As you can see the decorator <code>patch</code> works like a big <code>with</code> statement for the whole function. The argument <code>abspath_mock</code> passed to the test becomes internally the mock that replaces <code>os.path.abspath</code>. Obviously this way you replace <code>os.path.abspath</code> for the whole function, so you have to decide case by case which form of the function <code>patch</code> you need to use.</p>
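<p>A quick way to convince yourself that the decorator behaves like a <code>with</code> statement wrapping the whole function is the following sketch, where <code>check_inside</code> and <code>seen</code> are hypothetical names used only for illustration. The patch is applied when the decorated function is called, not when it is defined, and it is undone when the function returns.</p>

```python
import os
from unittest.mock import patch

original_abspath = os.path.abspath
seen = {}


@patch('os.path.abspath')
def check_inside(abspath_mock):
    # While the decorated function runs, the mock replaces os.path.abspath
    abspath_mock.return_value = 'some/abs/path'
    seen['is_mock'] = os.path.abspath is abspath_mock
    seen['value'] = os.path.abspath('anything')


# Decorating the function did not patch anything yet
before_call = os.path.abspath is original_abspath
check_inside()
# The original function is back once the call returns
after_call = os.path.abspath is original_abspath
```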
<h2 id="multiple-patches">Multiple patches<a class="headerlink" href="#multiple-patches" title="Permanent link">¶</a></h2>
<p>You can patch more than one object in the same test. For example, consider the case where the method <code>get_info</code> calls <code>os.path.getsize</code> in addition to <code>os.path.abspath</code> in order to return the size of the file. You have at this point two different outgoing queries, and you have to replace both with mocks to make your class work during the test.</p>
<p>This can be easily done with an additional <code>patch</code> decorator</p>
<div class="highlight"><span class="filename">tests/test_fileinfo.py</span><pre><span></span><code><span class="nd">@patch</span><span class="p">(</span><span class="s1">'os.path.getsize'</span><span class="p">)</span>
<span class="nd">@patch</span><span class="p">(</span><span class="s1">'os.path.abspath'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_get_info</span><span class="p">(</span><span class="n">abspath_mock</span><span class="p">,</span> <span class="n">getsize_mock</span><span class="p">):</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">original_path</span> <span class="o">=</span> <span class="s1">'../</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="n">test_abspath</span> <span class="o">=</span> <span class="s1">'some/abs/path'</span>
    <span class="n">abspath_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_abspath</span>
    <span class="n">test_size</span> <span class="o">=</span> <span class="mi">1234</span>
    <span class="n">getsize_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_size</span>
    <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">fi</span><span class="o">.</span><span class="n">get_info</span><span class="p">()</span> <span class="o">==</span> <span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">original_path</span><span class="p">,</span> <span class="n">test_abspath</span><span class="p">,</span> <span class="n">test_size</span><span class="p">)</span>
</code></pre></div>
<p>Please note that the decorator which is nearest to the function is applied first. Always remember that the decorator syntax with <code>@</code> is a shortcut to replace the function with the output of the decorator, so two decorators result in</p>
<div class="highlight"><pre><span></span><code><span class="nd">@decorator1</span>
<span class="nd">@decorator2</span>
<span class="k">def</span> <span class="nf">myfunction</span><span class="p">():</span>
    <span class="k">pass</span>
</code></pre></div>
<p>which is a shortcut for</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">myfunction</span><span class="p">():</span>
    <span class="k">pass</span>
<span class="n">myfunction</span> <span class="o">=</span> <span class="n">decorator1</span><span class="p">(</span><span class="n">decorator2</span><span class="p">(</span><span class="n">myfunction</span><span class="p">))</span>
</code></pre></div>
<p>This explains why, in the test code, the function receives first <code>abspath_mock</code> and then <code>getsize_mock</code>. The first decorator applied to the function is the patch of <code>os.path.abspath</code>, which appends the mock that we call <code>abspath_mock</code>. Then the patch of <code>os.path.getsize</code> is applied and this appends its own mock.</p>
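<p>The ordering can be verified with a short sketch. Here <code>decorator1</code> and <code>decorator2</code> are hypothetical decorators that only record when they are applied, and <code>check_order</code> is a hypothetical function that reports which patched name each received mock replaces.</p>

```python
import os
from unittest.mock import patch

applied = []


def decorator1(func):
    # Hypothetical decorator: records its application, returns func unchanged
    applied.append('decorator1')
    return func


def decorator2(func):
    applied.append('decorator2')
    return func


@decorator1
@decorator2
def myfunction():
    pass


@patch('os.path.getsize')
@patch('os.path.abspath')
def check_order(first_mock, second_mock):
    # The decorator nearest to the function (abspath) provides the first mock
    return (first_mock is os.path.abspath, second_mock is os.path.getsize)


order_result = check_order()
```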
<p>The code that makes the test pass is</p>
<div class="highlight"><span class="filename">fileinfo/fileinfo.py</span><pre><span></span><code><span class="k">class</span> <span class="nc">FileInfo</span><span class="p">:</span>
    <span class="p">[</span><span class="o">...</span><span class="p">]</span>
    <span class="k">def</span> <span class="nf">get_info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">(</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">filename</span><span class="p">,</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">original_path</span><span class="p">,</span>
            <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">abspath</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">original_path</span><span class="p">),</span>
            <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">getsize</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">original_path</span><span class="p">)</span>
        <span class="p">)</span>
</code></pre></div>
<p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/fileinfo/tree/multiple-patches">multiple-patches</a></p>
<p>We can write the above test using two <code>with</code> statements as well</p>
<div class="highlight"><span class="filename">tests/test_fileinfo.py</span><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_info</span><span class="p">():</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">original_path</span> <span class="o">=</span> <span class="s1">'../</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="k">with</span> <span class="n">patch</span><span class="p">(</span><span class="s1">'os.path.abspath'</span><span class="p">)</span> <span class="k">as</span> <span class="n">abspath_mock</span><span class="p">:</span>
        <span class="n">test_abspath</span> <span class="o">=</span> <span class="s1">'some/abs/path'</span>
        <span class="n">abspath_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_abspath</span>
        <span class="k">with</span> <span class="n">patch</span><span class="p">(</span><span class="s1">'os.path.getsize'</span><span class="p">)</span> <span class="k">as</span> <span class="n">getsize_mock</span><span class="p">:</span>
            <span class="n">test_size</span> <span class="o">=</span> <span class="mi">1234</span>
            <span class="n">getsize_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_size</span>
            <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
            <span class="k">assert</span> <span class="n">fi</span><span class="o">.</span><span class="n">get_info</span><span class="p">()</span> <span class="o">==</span> <span class="p">(</span>
                <span class="n">filename</span><span class="p">,</span>
                <span class="n">original_path</span><span class="p">,</span>
                <span class="n">test_abspath</span><span class="p">,</span>
                <span class="n">test_size</span>
            <span class="p">)</span>
</code></pre></div>
<p>Using more than one <code>with</code> statement, however, makes the code difficult to read, in my opinion, so in general I prefer to avoid complex <code>with</code> trees unless I really need to limit the scope of the patching.</p>
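<p>If you prefer to avoid nesting, a single <code>with</code> statement can manage several context managers at once. The following is a minimal sketch, assuming a hypothetical function <code>get_info</code> that stands in for <code>FileInfo.get_info</code> so that the snippet runs on its own; both patches are active inside the same block and are undone together when it exits.</p>

```python
import os
from unittest.mock import patch


def get_info(path):
    # Hypothetical stand-in for FileInfo.get_info, used only in this sketch
    return (
        os.path.basename(path),
        path,
        os.path.abspath(path),
        os.path.getsize(path),
    )


# Two patches, one with statement, no nesting
with patch('os.path.abspath') as abspath_mock, \
        patch('os.path.getsize') as getsize_mock:
    abspath_mock.return_value = 'some/abs/path'
    getsize_mock.return_value = 1234
    info = get_info('../somefile.ext')

# Both patches have been removed here, but the mocks still remember their calls
abspath_mock.assert_called_with('../somefile.ext')
getsize_mock.assert_called_with('../somefile.ext')
```

<p>From Python 3.10 the context managers can also be grouped in parentheses instead of using a backslash continuation.</p>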
<h2 id="checking-call-parameters">Checking call parameters<a class="headerlink" href="#checking-call-parameters" title="Permanent link">¶</a></h2>
<p>When you patch a function, its internal algorithm is not executed, as the mock just returns the values it has been instructed to return. This is connected to what we said about testing external systems, so everything is good, but while we don't want to test the internals of the module <code>os.path</code>, we want to be sure that we are passing the correct values to the external methods.</p>
<p>This is why mocks provide methods like <code>assert_called_with</code> (and other similar methods), through which we can check the values passed to a patched method when it is called. Let's add the checks to the test:</p>
<div class="highlight"><span class="filename">tests/test_fileinfo.py</span><pre><span></span><code><span class="nd">@patch</span><span class="p">(</span><span class="s1">'os.path.getsize'</span><span class="p">)</span>
<span class="nd">@patch</span><span class="p">(</span><span class="s1">'os.path.abspath'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_get_info</span><span class="p">(</span><span class="n">abspath_mock</span><span class="p">,</span> <span class="n">getsize_mock</span><span class="p">):</span>
    <span class="n">test_abspath</span> <span class="o">=</span> <span class="s1">'some/abs/path'</span>
    <span class="n">abspath_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_abspath</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">original_path</span> <span class="o">=</span> <span class="s1">'../</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="n">test_size</span> <span class="o">=</span> <span class="mi">1234</span>
    <span class="n">getsize_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_size</span>
    <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
    <span class="n">info</span> <span class="o">=</span> <span class="n">fi</span><span class="o">.</span><span class="n">get_info</span><span class="p">()</span>
    <span class="n">abspath_mock</span><span class="o">.</span><span class="n">assert_called_with</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
    <span class="n">getsize_mock</span><span class="o">.</span><span class="n">assert_called_with</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">info</span> <span class="o">==</span> <span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">original_path</span><span class="p">,</span> <span class="n">test_abspath</span><span class="p">,</span> <span class="n">test_size</span><span class="p">)</span>
</code></pre></div>
<p>As you can see, I first invoke <code>fi.get_info</code>, storing the result in the variable <code>info</code>, then check that the patched methods have been called with the correct parameters, and finally assert the format of its output.</p>
<p>The test passes, confirming that we are passing the correct values.</p>
<p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/fileinfo/tree/addding-checks-for-input-values">addding-checks-for-input-values</a></p>
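<p>To get a feeling of what these checks do when they fail, here is a small standalone sketch with a plain <code>Mock</code> playing the role of the patched <code>os.path.getsize</code> (the paths are arbitrary). Note that <code>assert_called_with</code> only looks at the most recent call, while <code>assert_any_call</code> searches the whole history of calls.</p>

```python
from unittest.mock import Mock

# A standalone mock playing the role of the patched os.path.getsize
getsize_mock = Mock(return_value=1234)
getsize_mock('../somefile.ext')

# Passes silently: the last call used exactly this argument
getsize_mock.assert_called_with('../somefile.ext')

# Fails with an AssertionError that shows expected and actual calls
mismatch_error = None
try:
    getsize_mock.assert_called_with('somefile.ext')
except AssertionError as exc:
    mismatch_error = exc

# assert_called_with only inspects the most recent call...
getsize_mock('../otherfile.ext')
getsize_mock.assert_called_with('../otherfile.ext')

# ...while assert_any_call searches every recorded call
getsize_mock.assert_any_call('../somefile.ext')
```

<p>If you also need to make sure that the external method was called exactly once, <code>assert_called_once_with</code> performs both checks at the same time.</p>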
<h2 id="patching-immutable-objects">Patching immutable objects<a class="headerlink" href="#patching-immutable-objects" title="Permanent link">¶</a></h2>
<p>The most widespread implementation of Python is CPython, which is written, as the name suggests, in C. Part of the standard library is also written in C, while the rest is written in Python itself.</p>
<p>The objects (classes, modules, functions, etc.) that are implemented in C are shared between interpreters, and this requires those objects to be immutable, so that you cannot alter them at runtime from a single interpreter.</p>
<p>An example of this immutability can be easily given using a Python console:</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">a</span> <span class="o">=</span> <span class="mi">1</span>
<span class="o">>>></span> <span class="n">a</span><span class="o">.</span><span class="n">conjugate</span> <span class="o">=</span> <span class="mi">5</span>
<span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span>
<span class="n">File</span> <span class="s2">"<stdin>"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span>
<span class="ne">AttributeError</span><span class="p">:</span> <span class="s1">'int'</span> <span class="nb">object</span> <span class="n">attribute</span> <span class="s1">'conjugate'</span> <span class="ow">is</span> <span class="n">read</span><span class="o">-</span><span class="n">only</span>
</code></pre></div>
<p>Here I'm trying to replace a method with an integer, which is pointless per se, but clearly shows the issue we are facing.</p>
<p>What does this immutability have to do with patching? What <code>patch</code> actually does is temporarily replace an attribute of an object (a method of a class, a class of a module, etc.), which means that if we try to replace an attribute of an immutable object the patching action will fail.</p>
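<p>The difference can be sketched with <code>patch.object</code>, which replaces an attribute of a given object. In this minimal example <code>Greeter</code> is a hypothetical class defined in Python, while <code>str</code> is a built-in type implemented in C; the exact error message depends on the Python version, so the code only checks that a <code>TypeError</code> is raised (the behaviour shown is that of CPython).</p>

```python
from unittest.mock import patch


class Greeter:
    # A class defined in Python: its attributes can be replaced at runtime
    def greet(self):
        return "hello"


# Patching a method of a pure-Python class works
with patch.object(Greeter, 'greet', return_value="patched"):
    patched_greeting = Greeter().greet()

# Patching a method of a built-in type implemented in C fails
patch_error = None
try:
    with patch.object(str, 'upper', return_value="patched"):
        pass
except TypeError as exc:
    patch_error = exc
```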
<p>A typical example of this problem is the module <code>datetime</code>, which is also one of the best candidates for patching, since the output of time functions is by definition time-varying.</p>
<p>Let me show the problem with a simple class that logs operations. I will temporarily break the TDD methodology by writing the class first and the tests afterwards, so that you can appreciate the problem.</p>
<p>Create a file called <code>logger.py</code> and put the following code in it:</p>
<div class="highlight"><span class="filename">fileinfo/logger.py</span><pre><span></span><code><span class="kn">import</span> <span class="nn">datetime</span>
<span class="k">class</span> <span class="nc">Logger</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">messages</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">def</span> <span class="nf">log</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">message</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">messages</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">(),</span> <span class="n">message</span><span class="p">))</span>
</code></pre></div>
<p>This is pretty simple, but testing this code is problematic, because the method <code>log</code> produces results that depend on the actual execution time. The call to <code>datetime.datetime.now</code> is however an outgoing query, and as such it can be replaced by a mock with <code>patch</code>.</p>
<p>If we try to do it, however, we will have a bitter surprise. Here is the test code, which you can put in <code>tests/test_logger.py</code>:</p>
<div class="highlight"><span class="filename">tests/test_logger.py</span><pre><span></span><code><span class="kn">from</span> <span class="nn">unittest.mock</span> <span class="kn">import</span> <span class="n">patch</span>
<span class="kn">from</span> <span class="nn">fileinfo.logger</span> <span class="kn">import</span> <span class="n">Logger</span>
<span class="nd">@patch</span><span class="p">(</span><span class="s1">'datetime.datetime.now'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_log</span><span class="p">(</span><span class="n">mock_now</span><span class="p">):</span>
    <span class="n">test_now</span> <span class="o">=</span> <span class="mi">123</span>
    <span class="n">test_message</span> <span class="o">=</span> <span class="s2">"A test message"</span>
    <span class="n">mock_now</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_now</span>
    <span class="n">test_logger</span> <span class="o">=</span> <span class="n">Logger</span><span class="p">()</span>
    <span class="n">test_logger</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">test_message</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">test_logger</span><span class="o">.</span><span class="n">messages</span> <span class="o">==</span> <span class="p">[(</span><span class="n">test_now</span><span class="p">,</span> <span class="n">test_message</span><span class="p">)]</span>
</code></pre></div>
<p>When you try to execute this test you will get the following error</p>
<div class="highlight"><pre><span></span><code><span class="n">TypeError</span><span class="o">:</span><span class="w"> </span><span class="n">can</span><span class="s1">'t set attributes of built-in/extension type '</span><span class="n">datetime</span><span class="o">.</span><span class="na">datetime</span><span class="err">'</span>
</code></pre></div>
<p>which is raised because patching tries to replace the function <code>now</code> in <code>datetime.datetime</code> with a mock, and since <code>datetime.datetime</code> is a built-in type implemented in C, and thus immutable, this operation fails.</p>
<p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/fileinfo/tree/initial-logger-not-working">initial-logger-not-working</a></p>
<p>There are several ways to address this problem. All of them, however, rely on the fact that an immutable object can be reached through something mutable: the namespace of a module can be changed at runtime, and subclassing an immutable type gives you a mutable class.</p>
<p>The easiest example in this case is the module <code>datetime</code> itself. In the function <code>test_log</code> we tried to patch the object <code>datetime.datetime.now</code> directly, that is, an attribute of the built-in type <code>datetime.datetime</code>. The file <code>logger.py</code>, however, imports <code>datetime</code>, so this latter becomes a local symbol in the module <code>logger</code>. This is exactly the key for our patching. Let us change the code to:</p>
<div class="highlight"><span class="filename">tests/test_logger.py</span><pre><span></span><code><span class="nd">@patch</span><span class="p">(</span><span class="s1">'fileinfo.logger.datetime.datetime'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_log</span><span class="p">(</span><span class="n">mock_datetime</span><span class="p">):</span>
    <span class="n">test_now</span> <span class="o">=</span> <span class="mi">123</span>
    <span class="n">test_message</span> <span class="o">=</span> <span class="s2">"A test message"</span>
    <span class="n">mock_datetime</span><span class="o">.</span><span class="n">now</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_now</span>
    <span class="n">test_logger</span> <span class="o">=</span> <span class="n">Logger</span><span class="p">()</span>
    <span class="n">test_logger</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">test_message</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">test_logger</span><span class="o">.</span><span class="n">messages</span> <span class="o">==</span> <span class="p">[(</span><span class="n">test_now</span><span class="p">,</span> <span class="n">test_message</span><span class="p">)]</span>
</code></pre></div>
<p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/fileinfo/tree/correct-patching">correct-patching</a></p>
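<p>If you run the test now, you can see that the patching works. What we did was to inject our mock in <code>fileinfo.logger.datetime.datetime</code> instead of <code>datetime.datetime.now</code>. Two things changed in our test. First, we reach the target through the name <code>datetime</code> imported by the file <code>logger.py</code>, that is, through the namespace where our code actually looks it up. Second, we replace the whole class <code>datetime.datetime</code>, which is an attribute of the module <code>datetime</code> and can be changed, instead of its method <code>now</code>, which is an attribute of the immutable class. If you try to patch <code>fileinfo.logger.datetime.datetime.now</code> you will find that it is still immutable. Keep in mind that importing does not copy a module: <code>fileinfo.logger.datetime</code> is the same object as the global module <code>datetime</code>, so while the patch is active every piece of code that uses <code>datetime.datetime</code> sees the mock.</p>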
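<p>Subclassing can be sketched as well. The idea is to replace <code>datetime.datetime</code> with a subclass that overrides only <code>now</code>, so that the rest of the class (constructors, comparisons, <code>isinstance</code> checks) keeps working, which is not the case when the whole class is replaced by a mock. <code>FixedDatetime</code>, <code>FIXED_NOW</code>, and <code>timestamp_message</code> are hypothetical names used only for this sketch, where <code>timestamp_message</code> stands in for <code>Logger.log</code>. Since the snippet is a single file, the patch target is <code>datetime.datetime</code>; in the project it would be <code>fileinfo.logger.datetime.datetime</code>.</p>

```python
import datetime
from unittest.mock import patch

FIXED_NOW = datetime.datetime(2021, 3, 9, 16, 0, 0)


class FixedDatetime(datetime.datetime):
    # A subclass defined in Python is mutable, unlike the built-in type
    @classmethod
    def now(cls, tz=None):
        # Always return the same instant, ignoring the real clock
        return FIXED_NOW


def timestamp_message(message):
    # Hypothetical stand-in for Logger.log: looks up datetime.datetime at call time
    return (datetime.datetime.now(), message)


# Replace the class in the (mutable) module with our subclass
with patch('datetime.datetime', FixedDatetime):
    stamped = timestamp_message("A test message")
```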
<p>Another possible solution to this problem is to create a function that invokes the immutable object and returns its value. This function can be easily patched, because it is a plain Python function defined in our own code, so replacing it does not touch the immutable object. This solution, however, requires changing the source code to allow testing, which is far from optimal. Obviously it is better to introduce a small change in the code and have it tested than to leave it untested, but whenever possible I try to avoid solutions that introduce code which wouldn't be required without tests.</p>
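<p>A minimal sketch of this approach follows. Here the wrapper is a hypothetical static method <code>_now</code> added to <code>Logger</code> only to make the clock patchable; since <code>Logger</code> is a class defined in Python, <code>patch.object</code> can replace that method without touching the immutable <code>datetime.datetime</code>.</p>

```python
import datetime
from unittest.mock import patch


class Logger:
    def __init__(self):
        self.messages = []

    @staticmethod
    def _now():
        # Hypothetical wrapper that exists only to make the clock patchable
        return datetime.datetime.now()

    def log(self, message):
        self.messages.append((self._now(), message))


# The wrapper lives in a mutable Python class, so patching it works
with patch.object(Logger, '_now', return_value=123):
    test_logger = Logger()
    test_logger.log("A test message")
```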
<h2 id="mocks-and-proper-tdd">Mocks and proper TDD<a class="headerlink" href="#mocks-and-proper-tdd" title="Permanent link">¶</a></h2>
<p>Following a strict TDD methodology means writing a test before writing the code that passes that test. This can be done because we use the object under test as a black box, interacting with it through its API, and thus not knowing anything about its internal structure.</p>
<p>When we mock systems we break this assumption. In particular we need to open the black box every time we need to patch a hardcoded external system. Let's say, for example, that the object under test creates a temporary directory to perform some data processing. This is a detail of the implementation and we are not supposed to know it while testing the object, but since we need to mock the creation of the directory to avoid interaction with the external system (storage) we need to become aware of what happens internally.</p>
<p>This also means that writing a test for the object before writing the implementation of the object itself is difficult. Quite often, thus, such objects are still built with TDD, but iteratively: the mocks are introduced after the code has been written.</p>
<p>While this is a violation of the strict TDD methodology, I don't consider it a bad practice. TDD helps us to write better code consistently, but good code can be written even without tests. The real outcome of TDD is a test suite that is capable of detecting regressions or the removal of important features in the future. This means that breaking strict TDD for a small part of the code (patching objects) will not affect the real result of the process; it will only change the way we achieve it.</p>
<h2 id="a-warning">A warning<a class="headerlink" href="#a-warning" title="Permanent link">¶</a></h2>
<p>Mocks are a good way to approach parts of the system that are not under test but that are still part of the code that we are running. This is particularly true for parts of the code that we wrote, whose internal structure is ultimately known. When the external system is complex and completely detached from our code, mocking starts to become complicated and the risk is that we spend more time faking parts of the system than actually writing code.</p>
<p>In these cases we have definitely crossed the barrier between unit testing and integration testing. You may see mocks as the bridge between the two, as they allow you to keep unit-testing parts that are naturally connected ("integrated") with external systems, but there is a point where you need to recognise that it is time to change approach.</p>
<p>This threshold is not fixed, and I can't give you a rule to recognise it, but I can give you some advice. First of all, keep an eye on how many things you need to mock to make a test run, as an increasing number of mocks in a single test is definitely a sign of something wrong in the testing approach. My rule of thumb is that when I have to create more than 3 mocks, an alarm goes off in my mind and I start questioning what I am doing.</p>
<p>The second piece of advice is to always consider the complexity of the mocks. You may find yourself patching a class but then having to create monsters like <code>cls_mock().func1().func2().func3.assert_called_with(x=42)</code>, which is a sign that the part of the system that you are mocking is deep into some code that you cannot really access, because you don't know its internal mechanisms.</p>
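<p>To see where such chains come from, consider the following sketch, where <code>send_answer</code> is a hypothetical piece of code under test that reaches deep into an external class. The assertion has to replay the whole chain of calls, which works because a mock returns the same <code>return_value</code> every time it is called; be aware that replaying the chain also records new calls on the intermediate mocks.</p>

```python
from unittest.mock import Mock


def send_answer(client_cls):
    # Hypothetical code under test that digs into an external class
    client_cls().func1().func2().func3(x=42)


cls_mock = Mock()
send_answer(cls_mock)

# The assertion must mirror the whole chain of calls made by the code
cls_mock().func1().func2().func3.assert_called_with(x=42)
```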
<p>The third piece of advice is to consider mocks as "hooks" that you throw at the external system, and that break its hull to reach its internal structure. These hooks are obviously against the assumption that we can interact with a system knowing only its external behaviour, or its API. As such, you should keep in mind that each mock you create is a step back from this perfect assumption, thus "breaking the spell" of the decoupled interaction. Doing this makes it increasingly complex to create mocks, and this will help you stay aware of what you are doing (or overdoing).</p>
<h2 id="final-words">Final words<a class="headerlink" href="#final-words" title="Permanent link">¶</a></h2>
<p>Mocks are a very powerful tool that allows us to test code that contains outgoing messages. In particular they allow us to test the arguments of outgoing commands. Patching is a good way to overcome the fact that some external components are hardcoded in our code and are thus unreachable through the arguments passed to the classes or the methods under analysis.</p>
<h2 id="updates">Updates<a class="headerlink" href="#updates" title="Permanent link">¶</a></h2>
<p>2021-03-06 GitHub user <a href="https://github.com/4myhw">4myhw</a> spotted an inconsistency between the code on GitHub and the code in the post. Thanks!</p>
<p>2022-11-19 GitHub user <a href="https://github.com/rioj7">rioj7</a> found and corrected a typo. Thanks!</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>TDD in Python with pytest - Part 42020-09-17T11:30:00+02:002020-09-17T11:30:00+02:00Leonardo Giordanitag:www.thedigitalcatonline.com,2020-09-17:/blog/2020/09/17/tdd-in-python-with-pytest-part-4/<p>This is the fourth post in the series "TDD in Python with pytest" where I develop a simple project following a strict TDD methodology. The posts come from my book <a href="https://leanpub.com/clean-architectures-in-python">Clean Architectures in Python</a> and have been reviewed to get rid of some bad naming choices of the version published in the book.</p>
<p>You can find the first post <a href="https://www.thedigitalcatonline.com/blog/2020/09/10/tdd-in-python-with-pytest-part-1/">here</a>.</p>
<p>In this post I will discuss a very interesting and useful testing tool: mocks.</p>
<h2 id="basic-concepts">Basic concepts<a class="headerlink" href="#basic-concepts" title="Permanent link">¶</a></h2>
<p>As we saw in the previous post, the relationship between the component that we are testing and other components of the system can be complex. Sometimes idempotency and isolation are not easy to achieve, and testing outgoing commands requires checking the parameters sent to the external component, which is not trivial.</p>
<p>The main difficulty comes from the fact that your code is actually using the external system. When you run it in production the external system will provide the data that your code needs and the whole process can work as intended. During testing, however, you don't want to be bound to the external system, for the reasons explained in the previous post, but at the same time you need it to make your code work.</p>
<p>So, you face a complex issue. On the one hand your code is connected to the external system (be it hardcoded or chosen programmatically), but on the other hand you want it to run without the external system being active (or even present).</p>
<p>This problem can be solved with the use of mocks. A mock, in the testing jargon, is an object that simulates the behaviour of another (more complex) object. Wherever your code connects to an external system, during testing you can replace the latter with a mock, pretending the external system is there and properly checking that your component behaves as intended.</p>
<h2 id="first-steps">First steps<a class="headerlink" href="#first-steps" title="Permanent link">¶</a></h2>
<p>Let us try and work with a mock in Python and see what it can do. First of all, fire up a Python shell and import the library:</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="kn">from</span> <span class="nn">unittest</span> <span class="kn">import</span> <span class="n">mock</span>
</code></pre></div>
<p>The main object that the library provides is <code>Mock</code> and you can instantiate it without any argument</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">m</span> <span class="o">=</span> <span class="n">mock</span><span class="o">.</span><span class="n">Mock</span><span class="p">()</span>
</code></pre></div>
<p>This object has the peculiar property of creating methods and attributes on the fly when you require them. Let us first look inside the object to get an idea of what it provides</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="nb">dir</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>
<span class="p">[</span>
<span class="s1">'assert_any_call'</span><span class="p">,</span> <span class="s1">'assert_called_once_with'</span><span class="p">,</span>
<span class="s1">'assert_called_with'</span><span class="p">,</span> <span class="s1">'assert_has_calls'</span><span class="p">,</span>
<span class="s1">'attach_mock'</span><span class="p">,</span> <span class="s1">'call_args'</span><span class="p">,</span> <span class="s1">'call_args_list'</span><span class="p">,</span>
<span class="s1">'call_count'</span><span class="p">,</span> <span class="s1">'called'</span><span class="p">,</span> <span class="s1">'configure_mock'</span><span class="p">,</span>
<span class="s1">'method_calls'</span><span class="p">,</span> <span class="s1">'mock_add_spec'</span><span class="p">,</span> <span class="s1">'mock_calls'</span><span class="p">,</span>
<span class="s1">'reset_mock'</span><span class="p">,</span> <span class="s1">'return_value'</span><span class="p">,</span> <span class="s1">'side_effect'</span>
<span class="p">]</span>
</code></pre></div>
<p>As you can see there are some methods which are already defined in the object <code>Mock</code>. Let's try to read a non-existent attribute:</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span>
<span class="o"><</span><span class="n">Mock</span> <span class="n">name</span><span class="o">=</span><span class="s1">'mock.some_attribute'</span> <span class="nb">id</span><span class="o">=</span><span class="s1">'140222043808432'</span><span class="o">></span>
<span class="o">>>></span> <span class="nb">dir</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>
<span class="p">[</span>
<span class="s1">'assert_any_call'</span><span class="p">,</span> <span class="s1">'assert_called_once_with'</span><span class="p">,</span>
<span class="s1">'assert_called_with'</span><span class="p">,</span> <span class="s1">'assert_has_calls'</span><span class="p">,</span>
<span class="s1">'attach_mock'</span><span class="p">,</span> <span class="s1">'call_args'</span><span class="p">,</span> <span class="s1">'call_args_list'</span><span class="p">,</span>
<span class="s1">'call_count'</span><span class="p">,</span> <span class="s1">'called'</span><span class="p">,</span> <span class="s1">'configure_mock'</span><span class="p">,</span>
<span class="s1">'method_calls'</span><span class="p">,</span> <span class="s1">'mock_add_spec'</span><span class="p">,</span> <span class="s1">'mock_calls'</span><span class="p">,</span>
<span class="s1">'reset_mock'</span><span class="p">,</span> <span class="s1">'return_value'</span><span class="p">,</span> <span class="s1">'side_effect'</span><span class="p">,</span>
<span class="s1">'some_attribute'</span>
<span class="p">]</span>
</code></pre></div>
<p>As you can see this class is somewhat different from what you are used to. First of all, its instances do not raise an <code>AttributeError</code> when asked for a non-existent attribute, but they happily return another instance of <code>Mock</code> itself. Second, the attribute you tried to access has now been created inside the object and accessing it returns the same mock object as before.</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span>
<span class="o"><</span><span class="n">Mock</span> <span class="n">name</span><span class="o">=</span><span class="s1">'mock.some_attribute'</span> <span class="nb">id</span><span class="o">=</span><span class="s1">'140222043808432'</span><span class="o">></span>
</code></pre></div>
<p>Mock objects are callables, which means that they may act both as attributes and as methods. If you try to call the mock, it just returns another mock with a name that includes parentheses to signal its callable nature</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="o"><</span><span class="n">Mock</span> <span class="n">name</span><span class="o">=</span><span class="s1">'mock.some_attribute()'</span> <span class="nb">id</span><span class="o">=</span><span class="s1">'140247621475856'</span><span class="o">></span>
</code></pre></div>
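<p>The mock returned by such a call is not a new object every time: it is stored in the attribute <code>return_value</code>, which is created on first use and then reused. A quick check in plain Python (the names <code>first</code> and <code>second</code> are arbitrary):</p>

```python
from unittest.mock import Mock

m = Mock()

# Calling the same mock twice gives back the very same child mock
first = m.some_attribute()
second = m.some_attribute()

print(first is second)                          # True
print(first is m.some_attribute.return_value)   # True
print(m.some_attribute.call_count)              # 2
```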
<p>As you can understand, such objects are the perfect tool to mimic other objects or systems, since they may expose any API without raising exceptions. To use them in tests, however, we need them to behave just like the original, which implies returning sensible values or performing real operations.</p>
<h2 id="simple-return-values">Simple return values<a class="headerlink" href="#simple-return-values" title="Permanent link">¶</a></h2>
<p>The simplest thing a mock can do for you is to return a given value every time you call one of its methods. This is configured by setting the attribute <code>return_value</code> of a mock object:</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="mi">42</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="mi">42</span>
</code></pre></div>
<p>Now, as you can see, the object does not return a mock object any more; instead it just returns the static value stored in the attribute <code>return_value</code>. Since in Python everything is an object you can return here any type of value: simple types like an integer or a string, more complex structures like dictionaries or lists, classes that you defined, instances of those, or functions.</p>
<p>Pay attention that what the mock returns is exactly the object that it is instructed to use as return value. If the return value is a callable such as a function, calling the mock will return the function itself and not the result of the function. Let me give you an example</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="k">def</span> <span class="nf">print_answer</span><span class="p">():</span>
<span class="o">...</span>     <span class="nb">print</span><span class="p">(</span><span class="s2">"42"</span><span class="p">)</span>
<span class="o">...</span>
<span class="o">>>></span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">print_answer</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="o"><</span><span class="n">function</span> <span class="n">print_answer</span> <span class="n">at</span> <span class="mh">0x7f8df1e3f400</span><span class="o">></span>
</code></pre></div>
<p>As you can see calling <code>some_attribute</code> just returns the value stored in <code>return_value</code>, that is the function itself. This is not exactly what we were aiming for. To make the mock call the object that we use as a return value we have to use a slightly more complex attribute called <code>side_effect</code>.</p>
<h2 id="complex-return-values">Complex return values<a class="headerlink" href="#complex-return-values" title="Permanent link">¶</a></h2>
<p>The <code>side_effect</code> attribute of mock objects is a very powerful tool. It accepts three different flavours of objects: callables, iterables, and exceptions, and the mock changes its behaviour accordingly.</p>
<p>If you pass an exception the mock will raise it</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="o">.</span><span class="n">side_effect</span> <span class="o">=</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'A custom value error'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span>
<span class="n">File</span> <span class="s2">"<stdin>"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span>
<span class="n">File</span> <span class="s2">"/usr/lib/python3.6/unittest/mock.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">939</span><span class="p">,</span> <span class="ow">in</span> <span class="fm">__call__</span>
<span class="k">return</span> <span class="n">_mock_self</span><span class="o">.</span><span class="n">_mock_call</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
<span class="n">File</span> <span class="s2">"/usr/lib/python3.6/unittest/mock.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">995</span><span class="p">,</span> <span class="ow">in</span> <span class="n">_mock_call</span>
<span class="k">raise</span> <span class="n">effect</span>
<span class="ne">ValueError</span><span class="p">:</span> <span class="n">A</span> <span class="n">custom</span> <span class="n">value</span> <span class="n">error</span>
</code></pre></div>
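<p>The snippet below is a minimal, self-contained sketch, and the attribute names in it are arbitrary. It shows that <code>side_effect</code> also accepts an exception class instead of an instance, in which case Python instantiates the class when the mock raises it. It also shows that the call is recorded even though it raised.</p>

```python
from unittest import mock

m = mock.Mock()

# An exception instance is raised as it is
m.some_attribute.side_effect = ValueError('A custom value error')
try:
    m.some_attribute()
except ValueError as exc:
    print('Caught:', exc)  # Caught: A custom value error

# An exception class works too: `raise` instantiates it for us
m.other_attribute.side_effect = KeyError
try:
    m.other_attribute()
except KeyError:
    print('Caught a KeyError')

# The calls are recorded even though they raised
print(m.some_attribute.call_count)   # 1
print(m.other_attribute.call_count)  # 1
```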
<p>If you pass an iterable, such as a generator, a list, a tuple, or a similar object, the mock will return the values of that iterable one at a time, one for each subsequent call of the mock.</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="o">.</span><span class="n">side_effect</span> <span class="o">=</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="mi">0</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="mi">1</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="mi">2</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span>
<span class="n">File</span> <span class="s2">"<stdin>"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span>
<span class="n">File</span> <span class="s2">"/usr/lib/python3.6/unittest/mock.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">939</span><span class="p">,</span> <span class="ow">in</span> <span class="fm">__call__</span>
<span class="k">return</span> <span class="n">_mock_self</span><span class="o">.</span><span class="n">_mock_call</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
<span class="n">File</span> <span class="s2">"/usr/lib/python3.6/unittest/mock.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">998</span><span class="p">,</span> <span class="ow">in</span> <span class="n">_mock_call</span>
<span class="n">result</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="n">effect</span><span class="p">)</span>
<span class="ne">StopIteration</span>
</code></pre></div>
<p>As promised, the mock returns the objects found in the iterable (in this case a <code>range</code> object) one at a time until the iterable is exhausted. At that point, following the iterator protocol, the next call raises the <code>StopIteration</code> exception, which signals that there are no more values to return.</p>
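<p>The iterable can also mix plain values and exceptions. When the next item is an exception, the mock raises it instead of returning it. The sketch below uses a hypothetical attribute <code>fetch</code> to simulate, for example, a service that fails once and then recovers.</p>

```python
from unittest import mock

m = mock.Mock()
# Values are returned in order; exception members are raised instead
m.fetch.side_effect = [1, ValueError('temporary failure'), 3]

results = []
for _ in range(3):
    try:
        results.append(m.fetch())
    except ValueError as exc:
        results.append(f'error: {exc}')

print(results)  # [1, 'error: temporary failure', 3]

# The iterable is now exhausted, so a further call raises StopIteration
try:
    m.fetch()
except StopIteration:
    print('exhausted')
```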
<p>Finally, if you feed <code>side_effect</code> a callable, the mock will call it with the same arguments passed when calling the attribute. Let's consider again the simple example given in the previous section:</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="k">def</span> <span class="nf">print_answer</span><span class="p">():</span>
<span class="o">...</span> <span class="nb">print</span><span class="p">(</span><span class="s2">"42"</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="o">.</span><span class="n">side_effect</span> <span class="o">=</span> <span class="n">print_answer</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="mi">42</span>
</code></pre></div>
<p>A slightly more complex example is that of a function with arguments:</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="k">def</span> <span class="nf">print_number</span><span class="p">(</span><span class="n">num</span><span class="p">):</span>
<span class="o">...</span> <span class="nb">print</span><span class="p">(</span><span class="s2">"Number:"</span><span class="p">,</span> <span class="n">num</span><span class="p">)</span>
<span class="o">...</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="o">.</span><span class="n">side_effect</span> <span class="o">=</span> <span class="n">print_number</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="n">Number</span><span class="p">:</span> <span class="mi">5</span>
</code></pre></div>
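<p>The examples above only print, but the value returned by the callable becomes the return value of the mock. As documented for <code>unittest.mock</code>, the callable can also return <code>mock.DEFAULT</code> to make the mock fall back to its <code>return_value</code>. A short sketch, with the hypothetical functions <code>double</code> and <code>maybe_double</code>:</p>

```python
from unittest import mock

def double(num):
    return num * 2

m = mock.Mock()
m.some_attribute.side_effect = double
# The return value of the callable is what the mock returns
print(m.some_attribute(21))  # 42

def maybe_double(num):
    # Returning mock.DEFAULT makes the mock use its return_value instead
    if num < 0:
        return mock.DEFAULT
    return num * 2

m.other_attribute.return_value = 'negative'
m.other_attribute.side_effect = maybe_double
print(m.other_attribute(5))   # 10
print(m.other_attribute(-1))  # negative
```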
<p>As you can see, the arguments passed to the attribute are passed directly to the stored function. This is very powerful, especially if you stop thinking about "functions" and start thinking about "callables". Given the nature of Python objects, instantiating a class is no different from calling a function, which means that <code>side_effect</code> can be given a class, and the mock will then return an instance of it:</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="k">class</span> <span class="nc">Number</span><span class="p">:</span>
<span class="o">...</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
<span class="o">...</span> <span class="bp">self</span><span class="o">.</span><span class="n">_value</span> <span class="o">=</span> <span class="n">value</span>
<span class="o">...</span> <span class="k">def</span> <span class="nf">print_value</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="o">...</span> <span class="nb">print</span><span class="p">(</span><span class="s2">"Value:"</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_value</span><span class="p">)</span>
<span class="o">...</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="o">.</span><span class="n">side_effect</span> <span class="o">=</span> <span class="n">Number</span>
<span class="o">>>></span> <span class="n">n</span> <span class="o">=</span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">(</span><span class="mi">26</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">n</span>
<span class="o"><</span><span class="n">__main__</span><span class="o">.</span><span class="n">Number</span> <span class="nb">object</span> <span class="n">at</span> <span class="mh">0x7f8df1aa4470</span><span class="o">></span>
<span class="o">>>></span> <span class="n">n</span><span class="o">.</span><span class="n">print_value</span><span class="p">()</span>
<span class="n">Value</span><span class="p">:</span> <span class="mi">26</span>
</code></pre></div>
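<p>Even when <code>side_effect</code> is a class, the mock keeps recording how it was called, which is the feature the next section relies on. Here is a minimal sketch that reuses a reduced version of the class <code>Number</code>:</p>

```python
from unittest import mock

class Number:
    def __init__(self, value):
        self._value = value

m = mock.Mock()
m.some_attribute.side_effect = Number
n = m.some_attribute(26)

print(isinstance(n, Number))  # True
print(n._value)               # 26

# The mock still records the call and its arguments
m.some_attribute.assert_called_once_with(26)
print(m.some_attribute.call_args == mock.call(26))  # True
```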
<h2 id="asserting-calls">Asserting calls<a class="headerlink" href="#asserting-calls" title="Permanent link">¶</a></h2>
<p>As I explained in the previous post, outgoing commands should be tested by checking the correctness of the message arguments. This is easy to do with mocks, as these objects record every call they receive and the arguments passed with it.</p>
<p>Let's see a practical example:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">unittest</span> <span class="kn">import</span> <span class="n">mock</span>
<span class="kn">import</span> <span class="nn">myobj</span>
<span class="k">def</span> <span class="nf">test_connect</span><span class="p">():</span>
<span class="n">external_obj</span> <span class="o">=</span> <span class="n">mock</span><span class="o">.</span><span class="n">Mock</span><span class="p">()</span>
<span class="n">myobj</span><span class="o">.</span><span class="n">MyObj</span><span class="p">(</span><span class="n">external_obj</span><span class="p">)</span>
<span class="n">external_obj</span><span class="o">.</span><span class="n">connect</span><span class="o">.</span><span class="n">assert_called_with</span><span class="p">()</span>
</code></pre></div>
<p>Here, the class <code>myobj.MyObj</code> needs to connect to an external object, for example a remote repository or a database. The only thing we need to know for testing purposes is whether the class called the method <code>connect</code> of the external object without any parameters.</p>
<p>So the first thing we do in this test is instantiate the mock object. This is a fake version of the external object, and its only purpose is to accept calls from the object <code>MyObj</code> under test and possibly return sensible values. Then we instantiate the class <code>MyObj</code>, passing the external object. We expect the class to call the method <code>connect</code>, so we express this expectation by calling <code>external_obj.connect.assert_called_with</code>.</p>
<p>What happens behind the scenes? The class <code>MyObj</code> receives the fake external object and somewhere in its initialization process calls the method <code>connect</code> of the mock object. This call creates the method itself as a mock object. This new mock records the arguments used to call it, and the subsequent call to its method <code>assert_called_with</code> checks that the method was called and that its most recent call received no arguments.</p>
<p>In this case an object like</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">MyObj</span><span class="p">():</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">repo</span><span class="p">):</span>
<span class="n">repo</span><span class="o">.</span><span class="n">connect</span><span class="p">()</span>
</code></pre></div>
<p>would pass the test, as the object passed as <code>repo</code> is a mock that does nothing but record the calls. As you can see, the method <code>__init__</code> actually calls <code>repo.connect</code>, and <code>repo</code> is expected to be a full-featured external object that provides <code>connect</code> in its API. Calling <code>repo.connect</code> when <code>repo</code> is a mock object, instead, silently creates the method (as another mock object) and records that the method has been called once without arguments.</p>
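<p>To see the recording at work, the sketch below runs the same <code>MyObj</code> against a mock and then inspects what the mock stored, using the attributes <code>call_count</code>, <code>call_args</code>, and <code>method_calls</code> of <code>unittest.mock.Mock</code>.</p>

```python
from unittest import mock

class MyObj():
    def __init__(self, repo):
        repo.connect()

external_obj = mock.Mock()
MyObj(external_obj)

# The attribute `connect` was created on the fly as a child mock
print(isinstance(external_obj.connect, mock.Mock))  # True
print(external_obj.connect.call_count)              # 1
print(external_obj.connect.call_args == mock.call())  # True

# The parent mock keeps a list of the calls made to its methods
print(external_obj.method_calls == [mock.call.connect()])  # True
```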
<p>The method <code>assert_called_with</code> also allows us to check the arguments passed in the call. To show this, let us pretend that we expect the method <code>MyObj.setup</code> to call <code>setup(cache=True, max_connections=256)</code> on the external object. Remember that this is an outgoing command, so we are interested in checking the arguments and not the result.</p>
<p>The new test can be something like:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_setup</span><span class="p">():</span>
<span class="n">external_obj</span> <span class="o">=</span> <span class="n">mock</span><span class="o">.</span><span class="n">Mock</span><span class="p">()</span>
<span class="n">obj</span> <span class="o">=</span> <span class="n">myobj</span><span class="o">.</span><span class="n">MyObj</span><span class="p">(</span><span class="n">external_obj</span><span class="p">)</span>
<span class="n">obj</span><span class="o">.</span><span class="n">setup</span><span class="p">()</span>
<span class="n">external_obj</span><span class="o">.</span><span class="n">setup</span><span class="o">.</span><span class="n">assert_called_with</span><span class="p">(</span><span class="n">cache</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">max_connections</span><span class="o">=</span><span class="mi">256</span><span class="p">)</span>
</code></pre></div>
<p>In this case, an object that passes the test can be:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">MyObj</span><span class="p">():</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">repo</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_repo</span> <span class="o">=</span> <span class="n">repo</span>
<span class="n">repo</span><span class="o">.</span><span class="n">connect</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">setup</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_repo</span><span class="o">.</span><span class="n">setup</span><span class="p">(</span><span class="n">cache</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">max_connections</span><span class="o">=</span><span class="mi">256</span><span class="p">)</span>
</code></pre></div>
<p>If we change the method <code>setup</code> to</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">setup</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_repo</span><span class="o">.</span><span class="n">setup</span><span class="p">(</span><span class="n">cache</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</code></pre></div>
<p>the test will fail with the following error:</p>
<div class="highlight"><pre><span></span><code>E AssertionError: Expected call: setup(cache=True, max_connections=256)
E Actual call: setup(cache=True)
</code></pre></div>
<p>This is, in my opinion, a very clear explanation of what went wrong during the test execution.</p>
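<p>If you want to reproduce this failure outside of pytest, a minimal sketch is to catch the <code>AssertionError</code> raised by <code>assert_called_with</code> directly. The exact wording of the message may differ between Python versions, so the output might not match the one shown above character by character.</p>

```python
from unittest import mock

class MyObj():
    def __init__(self, repo):
        self._repo = repo
        repo.connect()

    def setup(self):
        # The "wrong" version: max_connections is missing
        self._repo.setup(cache=True)

external_obj = mock.Mock()
obj = MyObj(external_obj)
obj.setup()

try:
    external_obj.setup.assert_called_with(cache=True, max_connections=256)
except AssertionError as exc:
    # The message compares the expected call with the actual one
    print(exc)
    failed = True
else:
    failed = False

# The call that actually happened passes the check
external_obj.setup.assert_called_with(cache=True)
```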
<p>As you can read in the official documentation, the class <code>Mock</code> provides other methods and attributes, like <code>assert_called_once_with</code>, <code>assert_any_call</code>, <code>assert_has_calls</code>, <code>assert_not_called</code>, <code>called</code>, <code>call_count</code>, and many others. Each of them explores a different aspect of the mock's behaviour concerning calls. Make sure to read their descriptions and go through the examples.</p>
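<p>Here is a quick tour of some of these helpers. It is a sketch that uses a hypothetical method <code>save</code> on a bare mock:</p>

```python
from unittest import mock

m = mock.Mock()
m.save.assert_not_called()  # no call yet, so this passes
print(m.save.called)        # False

m.save('a', retries=1)
m.save('b')

print(m.save.called)        # True
print(m.save.call_count)    # 2

# Passes if any past call matches these arguments
m.save.assert_any_call('a', retries=1)
# Passes if these calls appear consecutively, in this order
m.save.assert_has_calls([mock.call('a', retries=1), mock.call('b')])

try:
    m.save.assert_called_once_with('b')
except AssertionError:
    print('save was called more than once')
```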
<h2 id="a-simple-example">A simple example<a class="headerlink" href="#a-simple-example" title="Permanent link">¶</a></h2>
<p>To learn how to use mocks in a practical case, let's work together on a new module in the <code>simple_calculator</code> package. The target is to write a class that downloads a JSON file with data on meteorites and computes some statistics on the dataset using the class <code>SimpleCalculator</code>. The file is provided by NASA at <a href="https://data.nasa.gov/resource/y77d-th95.json">this URL</a>.</p>
<p>The class contains a method <code>get_data</code> that queries the remote server and returns the data, and a method <code>average_mass</code> that uses the method <code>SimpleCalculator.avg</code> to compute the average mass of the meteorites and return it. In a real-world case, for example in a scientific application, I would probably split the class in two: one class that manages the data, updating it whenever necessary, and another one that manages the statistics. For the sake of simplicity, however, I will keep the two functionalities together in this example.</p>
<p>Let's see a quick example of what is supposed to happen inside our code. An excerpt of the file provided by the server is:</p>
<div class="highlight"><pre><span></span><code><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"fall"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Fell"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"geolocation"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Point"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"coordinates"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="mf">6.08333</span><span class="p">,</span><span class="w"> </span><span class="mf">50.775</span><span class="p">]</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nt">"id"</span><span class="p">:</span><span class="s2">"1"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"mass"</span><span class="p">:</span><span class="s2">"21"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"Aachen"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"nametype"</span><span class="p">:</span><span class="s2">"Valid"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"recclass"</span><span class="p">:</span><span class="s2">"L5"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"reclat"</span><span class="p">:</span><span class="s2">"50.775000"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"reclong"</span><span class="p">:</span><span class="s2">"6.083330"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"year"</span><span class="p">:</span><span class="s2">"1880-01-01T00:00:00.000"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"fall"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Fell"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"geolocation"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Point"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"coordinates"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="mf">10.23333</span><span class="p">,</span><span class="w"> </span><span class="mf">56.18333</span><span class="p">]</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nt">"id"</span><span class="p">:</span><span class="s2">"2"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"mass"</span><span class="p">:</span><span class="s2">"720"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"Aarhus"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"nametype"</span><span class="p">:</span><span class="s2">"Valid"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"recclass"</span><span class="p">:</span><span class="s2">"H6"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"reclat"</span><span class="p">:</span><span class="s2">"56.183330"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"reclong"</span><span class="p">:</span><span class="s2">"10.233330"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"year"</span><span class="p">:</span><span class="s2">"1951-01-01T00:00:00.000"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">]</span>
</code></pre></div>
<p>So a good way to compute the average mass of the meteorites is:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">urllib.request</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">simple_calculator.main</span> <span class="kn">import</span> <span class="n">SimpleCalculator</span>
<span class="n">URL</span> <span class="o">=</span> <span class="p">(</span><span class="s2">"https://data.nasa.gov/resource/y77d-th95.json"</span><span class="p">)</span>
<span class="k">with</span> <span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">URL</span><span class="p">)</span> <span class="k">as</span> <span class="n">url</span><span class="p">:</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">url</span><span class="o">.</span><span class="n">read</span><span class="p">()</span><span class="o">.</span><span class="n">decode</span><span class="p">())</span>
<span class="n">masses</span> <span class="o">=</span> <span class="p">[</span><span class="nb">float</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s1">'mass'</span><span class="p">])</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="s1">'mass'</span> <span class="ow">in</span> <span class="n">d</span><span class="p">]</span>
<span class="nb">print</span><span class="p">(</span><span class="n">masses</span><span class="p">)</span>
<span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
<span class="n">avg_mass</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">(</span><span class="n">masses</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">avg_mass</span><span class="p">)</span>
</code></pre></div>
<p>Here, the list comprehension filters out the elements that do not have a <code>mass</code> key. When I ran it, this code printed the value 50190.19568930039, which is therefore the average mass of the meteorites contained in the file at that time.</p>
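<p>To check the filtering logic without connecting to the server, the sketch below runs the same list comprehension on a small sample shaped like the JSON excerpt above. The third record is made up to show an entry without a <code>mass</code> key, and a plain <code>sum(masses) / len(masses)</code> stands in for <code>SimpleCalculator.avg</code>.</p>

```python
# Sample shaped like the server's JSON; the third record is hypothetical
data = [
    {"id": "1", "mass": "21", "name": "Aachen"},
    {"id": "2", "mass": "720", "name": "Aarhus"},
    {"id": "3", "name": "No mass recorded"},
]

# Keep only records with a "mass" key, converting the strings to floats
masses = [float(d['mass']) for d in data if 'mass' in d]
print(masses)  # [21.0, 720.0]

# Plain arithmetic mean, standing in for SimpleCalculator.avg
avg_mass = sum(masses) / len(masses)
print(avg_mass)  # 370.5
```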
<p>Now we have a proof of concept of the algorithm, so we can start writing the tests. We might initially come up with a simple solution like:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_average_mass</span><span class="p">():</span>
<span class="n">metstats</span> <span class="o">=</span> <span class="n">MeteoriteStats</span><span class="p">()</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">metstats</span><span class="o">.</span><span class="n">get_data</span><span class="p">()</span>
<span class="k">assert</span> <span class="n">metstats</span><span class="o">.</span><span class="n">average_mass</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o">==</span> <span class="mf">50190.19568930039</span>
</code></pre></div>
<p>This little test, however, contains two big issues. First of all, the method <code>get_data</code> is supposed to use the Internet connection to get the data from the server. This is a typical example of an outgoing query, as we are not trying to change the state of the web server providing the data. You already know that you should not test the return value of an outgoing query, but here you can also see why you shouldn't use real data when testing: the data coming from the server can change over time, and this can invalidate your tests.</p>
<p>Testing such a case becomes very simple with mocks. Since the class has a public method <code>get_data</code> that interacts with the external component, it is enough to temporarily replace it with a mock that provides sensible values. Create the file <code>tests/test_meteorites.py</code> and put this code in it:</p>
<div class="highlight"><span class="filename">tests/test_meteorites.py</span><pre><span></span><code><span class="kn">from</span> <span class="nn">unittest</span> <span class="kn">import</span> <span class="n">mock</span>
<span class="kn">from</span> <span class="nn">simple_calculator.meteorites</span> <span class="kn">import</span> <span class="n">MeteoriteStats</span>
<span class="k">def</span> <span class="nf">test_average_mass</span><span class="p">():</span>
<span class="n">metstats</span> <span class="o">=</span> <span class="n">MeteoriteStats</span><span class="p">()</span>
<span class="n">metstats</span><span class="o">.</span><span class="n">get_data</span> <span class="o">=</span> <span class="n">mock</span><span class="o">.</span><span class="n">Mock</span><span class="p">()</span>
<span class="n">metstats</span><span class="o">.</span><span class="n">get_data</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s2">"fall"</span><span class="p">:</span> <span class="s2">"Fell"</span><span class="p">,</span>
<span class="s2">"geolocation"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"Point"</span><span class="p">,</span>
<span class="s2">"coordinates"</span><span class="p">:</span> <span class="p">[</span><span class="mf">6.08333</span><span class="p">,</span> <span class="mf">50.775</span><span class="p">]</span>
<span class="p">},</span>
<span class="s2">"id"</span><span class="p">:</span><span class="s2">"1"</span><span class="p">,</span>
<span class="s2">"mass"</span><span class="p">:</span><span class="s2">"21"</span><span class="p">,</span>
<span class="s2">"name"</span><span class="p">:</span><span class="s2">"Aachen"</span><span class="p">,</span>
<span class="s2">"nametype"</span><span class="p">:</span><span class="s2">"Valid"</span><span class="p">,</span>
<span class="s2">"recclass"</span><span class="p">:</span><span class="s2">"L5"</span><span class="p">,</span>
<span class="s2">"reclat"</span><span class="p">:</span><span class="s2">"50.775000"</span><span class="p">,</span>
<span class="s2">"reclong"</span><span class="p">:</span><span class="s2">"6.083330"</span><span class="p">,</span>
<span class="s2">"year"</span><span class="p">:</span><span class="s2">"1880-01-01T00:00:00.000"</span><span class="p">},</span>
<span class="p">{</span>
<span class="s2">"fall"</span><span class="p">:</span> <span class="s2">"Fell"</span><span class="p">,</span>
<span class="s2">"geolocation"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"Point"</span><span class="p">,</span>
<span class="s2">"coordinates"</span><span class="p">:</span> <span class="p">[</span><span class="mf">10.23333</span><span class="p">,</span> <span class="mf">56.18333</span><span class="p">]</span>
<span class="p">},</span>
<span class="s2">"id"</span><span class="p">:</span><span class="s2">"2"</span><span class="p">,</span>
<span class="s2">"mass"</span><span class="p">:</span><span class="s2">"720"</span><span class="p">,</span>
<span class="s2">"name"</span><span class="p">:</span><span class="s2">"Aarhus"</span><span class="p">,</span>
<span class="s2">"nametype"</span><span class="p">:</span><span class="s2">"Valid"</span><span class="p">,</span>
<span class="s2">"recclass"</span><span class="p">:</span><span class="s2">"H6"</span><span class="p">,</span>
<span class="s2">"reclat"</span><span class="p">:</span><span class="s2">"56.183330"</span><span class="p">,</span>
<span class="s2">"reclong"</span><span class="p">:</span><span class="s2">"10.233330"</span><span class="p">,</span>
<span class="s2">"year"</span><span class="p">:</span><span class="s2">"1951-01-01T00:00:00.000"</span>
<span class="p">}</span>
<span class="p">]</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">metstats</span><span class="o">.</span><span class="n">average_mass</span><span class="p">(</span><span class="n">metstats</span><span class="o">.</span><span class="n">get_data</span><span class="p">())</span>
<span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">370.5</span>
</code></pre></div>
<p>When we run this test we are not testing that the external server provides the correct data. We are testing the process implemented by <code>average_mass</code>, feeding the algorithm some known input. This is not different from the first tests that we implemented: in that case we were testing an addition, here we are testing a more complex algorithm, but the concept is the same.</p>
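<p>As a variation on the test above, <code>Mock</code> also accepts <code>return_value</code> directly in its constructor, and the replaced method can be checked like any other mock. The sketch below is self-contained, so instead of importing the real class it uses a hypothetical stand-in for <code>MeteoriteStats</code>, together with a trimmed-down version of the data.</p>

```python
from unittest import mock

class MeteoriteStats:
    """Hypothetical stand-in for simple_calculator.meteorites.MeteoriteStats."""

    def get_data(self):
        raise RuntimeError('This would hit the network')

    def average_mass(self, data):
        masses = [float(d['mass']) for d in data if 'mass' in d]
        return sum(masses) / len(masses)

metstats = MeteoriteStats()
# Passing return_value to the constructor is equivalent to setting it later
metstats.get_data = mock.Mock(return_value=[{"mass": "21"}, {"mass": "720"}])

result = metstats.average_mass(metstats.get_data())
print(result)  # 370.5

# The replaced method recorded the call like any other mock
metstats.get_data.assert_called_once_with()
```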
<p>We can now write a class that passes this test. Put the following code in <code>simple_calculator/meteorites.py</code>, alongside <code>main.py</code>:</p>
<div class="highlight"><span class="filename">simple_calculator/meteorites.py</span><pre><span></span><code><span class="kn">import</span> <span class="nn">urllib.request</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">simple_calculator.main</span> <span class="kn">import</span> <span class="n">SimpleCalculator</span>
<span class="n">URL</span> <span class="o">=</span> <span class="p">(</span><span class="s2">"https://data.nasa.gov/resource/y77d-th95.json"</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">MeteoriteStats</span><span class="p">:</span>
<span class="k">def</span> <span class="nf">get_data</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">with</span> <span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">URL</span><span class="p">)</span> <span class="k">as</span> <span class="n">url</span><span class="p">:</span>
<span class="k">return</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">url</span><span class="o">.</span><span class="n">read</span><span class="p">()</span><span class="o">.</span><span class="n">decode</span><span class="p">())</span>
<span class="k">def</span> <span class="nf">average_mass</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
<span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
<span class="n">masses</span> <span class="o">=</span> <span class="p">[</span><span class="nb">float</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s1">'mass'</span><span class="p">])</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="s1">'mass'</span> <span class="ow">in</span> <span class="n">d</span><span class="p">]</span>
<span class="k">return</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">(</span><span class="n">masses</span><span class="p">)</span>
</code></pre></div>
<p>As you can see, the class contains the code we wrote as a proof of concept, slightly reworked to match the methods we used in the test. Run the test suite now, and you will see that the latest test we wrote passes.</p>
<p>Please note that we are not testing the method <code>get_data</code>. That method uses the function <code>urllib.request.urlopen</code>, which opens an Internet connection without passing through any other public object that we can replace at run time during the test. We therefore need a tool to replace internal parts of our objects when we run them, and this is provided by patching, which will be the topic of the next post.</p>
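<p>As a preview of that technique, here is a minimal sketch that uses the standard library's <code>unittest.mock.patch</code>. It assumes a self-contained copy of <code>get_data</code>, a hypothetical <code>URL</code>, and a made-up JSON payload. Since <code>urlopen</code> is used as a context manager, the replacement has to be configured through its <code>__enter__</code> method.</p>

```python
import json
import urllib.request
from unittest import mock

# Hypothetical endpoint, standing in for the URL used in the post.
URL = "https://example.com/meteorites.json"


class MeteoriteStats:
    def get_data(self):
        with urllib.request.urlopen(URL) as url:
            return json.loads(url.read().decode())


def test_get_data_without_network():
    # Made-up payload that mimics the JSON returned by the real service.
    payload = json.dumps([{"mass": "21"}, {"mass": "720"}]).encode()

    # Replace urllib.request.urlopen only inside the with block.
    with mock.patch("urllib.request.urlopen") as mock_urlopen:
        # urlopen(URL) returns a context manager: configure what __enter__ yields.
        response = mock_urlopen.return_value.__enter__.return_value
        response.read.return_value = payload
        data = MeteoriteStats().get_data()

    # No connection was opened, and the URL was passed as expected.
    mock_urlopen.assert_called_once_with(URL)
    assert data == [{"mass": "21"}, {"mass": "720"}]
```

<p>The patch lasts only for the duration of the <code>with</code> block, so other tests still see the real <code>urlopen</code>.</p>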
<p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/meteoritestats-class">meteoritestats-class-added</a></p>
<h2 id="final-words">Final words<a class="headerlink" href="#final-words" title="Permanent link">¶</a></h2>
<p>Mocks are very important, and as a Python programmer you need to know the subtleties of their implementation. Aside from the technical details, however, I believe it is mandatory to master the different types of tests that I discussed in the previous post, and to learn when to use simple assertions and when to pull a bigger gun like a mock object.</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>TDD in Python with pytest - Part 32020-09-15T08:00:00+02:002020-09-15T08:00:00+02:00Leonardo Giordanitag:www.thedigitalcatonline.com,2020-09-15:/blog/2020/09/15/tdd-in-python-with-pytest-part-3/<p>This is the third post in the series "TDD in Python from scratch" where I develop a simple project following a strict TDD methodology. The posts come from my book <a href="https://leanpub.com/clean-architectures-in-python">Clean Architectures in Python</a> and have been reviewed to get rid of some bad naming choices of the version published in the book.</p>
<p>What I introduced in the previous two posts is commonly called "unit testing", since it focuses on testing a single, very small unit of code. As simple as it may seem, the TDD process has some caveats that are worth discussing. In this chapter I discuss some aspects of TDD and unit testing that I consider extremely important.</p>
<h2 id="tests-should-be-fast">Tests should be fast<a class="headerlink" href="#tests-should-be-fast" title="Permanent link">¶</a></h2>
<p>You will run your tests many times; ideally, you should run them every time you save your code. Your tests are the watchdogs of your code, the dashboard warning lights that signal a correct status or some malfunction. This means that your testing suite should be <em>fast</em>. If you have to wait minutes for each execution to finish, chances are that you will end up running your tests only after some long coding session, which means that you are not using them as guides.</p>
<p>It's true, however, that some tests may be intrinsically slow, or that the test suite might be so big that running it takes long enough to make continuous testing uncomfortable. In this case you should identify a subset of tests that run quickly and that can show you if something is not working properly, the so-called "smoke tests", and leave the rest of the suite for longer executions that you run less frequently. Typically, the library part of your project has tests that run very quickly, as testing functions does not require specific set-ups, while the user interface tests (be it a CLI or a GUI) are usually slower. If your tests are well-structured, you can also run just the tests that are connected with the subsystem that you are dealing with.</p>
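<p>One common way to carve out such a subset with pytest is a custom marker. In this sketch the marker name <code>smoke</code> and both test functions are arbitrary examples. pytest warns about unknown markers, so the marker should be registered, for instance under the <code>markers</code> option in <code>pytest.ini</code>. Running <code>pytest -m smoke</code> then selects only the marked tests, while <code>pytest -m "not smoke"</code> runs the rest.</p>

```python
import time

import pytest


@pytest.mark.smoke
def test_sum_is_fast():
    # Quick check that can run on every save.
    assert sum([1, 2, 3]) == 6


def test_slow_interaction():
    # Stands in for a slower test, e.g. one that drives a CLI or a GUI.
    time.sleep(0.1)
    assert "calc".upper() == "CALC"
```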
<h2 id="tests-should-be-idempotent">Tests should be idempotent<a class="headerlink" href="#tests-should-be-idempotent" title="Permanent link">¶</a></h2>
<p><em>Idempotency</em> in mathematics and computer science identifies processes that can be run multiple times without changing the status of the system. Since the state doesn't change, the tests can be run in any order without changing their results. If a test interacts with an external system and leaves it in a different state, you will have random failures depending on the execution order.</p>
<p>The typical example is when you interact with the filesystem in your tests. A test may create a file and not remove it, and this makes another test fail because the file already exists, or because the directory is not empty. Whatever you do while interacting with external systems has to be reverted after the test. If you run your tests concurrently, however, even this precaution is not enough.</p>
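<p>When a test really has to touch the filesystem, giving it a private, throw-away directory keeps it idempotent. Here is a minimal sketch with the standard library's <code>tempfile.TemporaryDirectory</code>, where <code>write_report</code> is a hypothetical function under test. pytest's built-in <code>tmp_path</code> fixture offers a similar unique directory per test.</p>

```python
import os
import tempfile


def write_report(directory, name, content):
    """Hypothetical function under test: writes a file and returns its path."""
    path = os.path.join(directory, name)
    # Mode "x" raises FileExistsError if the file is already there,
    # so leftovers from a previous run would be detected immediately.
    with open(path, "x") as f:
        f.write(content)
    return path


def test_write_report():
    # Each run gets a fresh directory that is deleted when the block exits,
    # so repeated or reordered runs always start from the same state.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = write_report(tmpdir, "report.txt", "42")
        with open(path) as f:
            assert f.read() == "42"
```

<p>Because the directory is unique and removed afterwards, running the test twice in a row gives the same result. With a fixed shared directory, the second run would fail.</p>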
<p>This poses a big problem, as interacting with external systems is definitely to be considered dangerous. Mocks, introduced in the next chapter, are a very good tool to deal with this aspect of testing.</p>
<h2 id="tests-should-be-isolated">Tests should be isolated<a class="headerlink" href="#tests-should-be-isolated" title="Permanent link">¶</a></h2>
<p>In computer science <em>isolation</em> means that a component shall not change its behaviour depending on something that happens externally. In particular it shouldn't be affected by the execution of other components in the system (spatial isolation) and by the previous execution of the component itself (temporal isolation). Each test should run as much as possible in an isolated universe.</p>
<p>While this is easy to achieve for small components, like we did with the class <code>SimpleCalculator</code>, it might be almost impossible to do in more complex cases. Whenever you write a routine that deals with time, for example, be it the current date or a time interval, you are faced with something that flows incessantly and that cannot be stopped or slowed down. This is also true in other cases, for example if you are testing a routine that accesses an external service like a website. If the website is not reachable the test will fail, but this failure comes from an external source, not from the code under test.</p>
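<p>One common way to isolate code from the flow of time is to let the caller inject the current date. This is a minimal sketch: <code>days_until</code> is a hypothetical function that defaults to the real date but accepts a fixed one in tests.</p>

```python
import datetime


def days_until(deadline, today=None):
    """Hypothetical function: days left before `deadline`.

    `today` defaults to the real current date, but a test can inject a
    fixed value and become independent of when it runs.
    """
    if today is None:
        today = datetime.date.today()
    return (deadline - today).days


def test_days_until():
    # A fixed "today" makes the result the same on every run.
    fixed_today = datetime.date(2020, 9, 15)
    assert days_until(datetime.date(2020, 9, 25), today=fixed_today) == 10
```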
<p>Mocks or fake objects are a good tool to enforce isolation in tests that need to communicate with external actors in the system.</p>
<h2 id="external-systems">External systems<a class="headerlink" href="#external-systems" title="Permanent link">¶</a></h2>
<p>It is important to understand that the above definitions (idempotency, isolation) depend on the scope of the test. You should consider <em>external</em> whatever part of the system is not directly involved in the test, even though you need to use it to run the test itself. You should also try to reduce the scope of the test as much as possible.</p>
<p>Let me give you an example. Consider a web application and imagine a test that checks that a user can log in. The login process involves many layers: the user inputs the username and the password in a GUI and submits the form, the GUI communicates with the core of the application that finds the user in the DB and checks the password hash against the one stored there, then sends back a message that grants access to the user, and the GUI stores a cookie to keep the user logged in. Suppose now that the test fails. Where is the error? Is it in the query that retrieves the user from the DB? Or in the routine that hashes the password? Or is it just an issue in the connectivity between the application and the database?</p>
<p>As you can see there are too many possible points of failure. While this is a perfectly valid <em>integration test</em>, it is definitely not a <em>unit test</em>. Unit tests try to test the smallest possible units of code in your system, usually simple routines like functions or object methods. Integration tests, instead, put together whole systems that have already been tested and test that they can work together.</p>
<p>Too many times developers confuse integration tests with unit tests. One simple example: every time a web framework makes you test your models against a real database you are mixing a unit test (the methods of the model object work) with an integration one (the model object connects with the database and can store/retrieve data). You have to learn how to properly identify what is external to your system in the scope of a given test, so your tests can be focused and small.</p>
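<p>To make the distinction concrete, here is a hypothetical <code>User</code> model whose password check can be unit-tested entirely in memory. Testing that the model can be stored in and retrieved from a real database belongs to a separate integration test. The unsalted SHA-256 hashing is a simplification for illustration only.</p>

```python
import hashlib


class User:
    """Hypothetical model: the password logic needs no database."""

    def __init__(self, username, password_hash):
        self.username = username
        self.password_hash = password_hash

    @staticmethod
    def hash_password(password):
        # Illustration only: real applications should use a salted, slow
        # algorithm (for example hashlib.scrypt) instead of plain SHA-256.
        return hashlib.sha256(password.encode()).hexdigest()

    def check_password(self, password):
        return self.hash_password(password) == self.password_hash


def test_check_password_unit():
    # The user is built in memory: no DB query, no connection, no GUI.
    user = User("alice", User.hash_password("s3cret"))
    assert user.check_password("s3cret")
    assert not user.check_password("wrong")
```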
<h2 id="focus-on-messages">Focus on messages<a class="headerlink" href="#focus-on-messages" title="Permanent link">¶</a></h2>
<p>I cannot recommend Sandi Metz's talk <a href="https://speakerdeck.com/skmetz/magic-tricks-of-testing-railsconf">"The Magic Tricks of Testing"</a> enough. In it, she considers the different messages that a software component has to deal with. She comes up with 3 different origins for messages (incoming, sent to self, and outgoing) and 2 types (query and command). The very interesting conclusion she reaches is that you should only test half of them, and I believe this is one of the most useful results you can learn as a software developer. In this section I will shamelessly start from Sandi Metz's categorisations and give a personal view of the matter. I absolutely recommend watching the original talk, as it is both short and very effective.</p>
<p>Testing is all about the behaviour of a component when it is used, i.e. when it is connected to other components that interact with it. This interaction is well represented by the word "message", which has hereafter the simple meaning of "data exchanged between two actors".</p>
<p>We can then classify the interactions happening in our system, and thus to our components, by flow and by type (Sandi Metz speaks of <em>origin</em> and <em>type</em>).</p>
<h3 id="message-flow">Message flow<a class="headerlink" href="#message-flow" title="Permanent link">¶</a></h3>
<p>The flow is defined as the tuple <code>(source, destination)</code>, that is, where the message comes from and where it is going. There are three different combinations that we are interested in: <code>(outside, self)</code>, <code>(self, self)</code>, and <code>(self, outside)</code>, where <code>self</code> is the object we are testing, and <code>outside</code> is a generic object that lives in the system. There is a fourth combination, <code>(outside, outside)</code>, that is not relevant for testing, since it doesn't involve the object under analysis.</p>
<p>So <code>(outside, self)</code> contains all the messages that other parts of the system send to our component. These messages correspond to the public API of the component, that is the set of entry points the component makes available to interact with it. Notable examples are the public methods of an object in an object-oriented programming language or the HTTP endpoints of a Web application. This flow represents the <em>incoming messages</em>.</p>
<p>At the opposite side of the spectrum there is <code>(self, outside)</code>, which is the set of messages that the component under test sends to other parts of the system. These are for example the external calls that an object makes to a library or to other objects, or the API of other applications we rely on, like databases or Web applications. This flow describes all the <em>outgoing messages</em>.</p>
<p>Between the two there is <code>(self, self)</code>, which identifies the messages that the component sends to itself, i.e. the use that the component makes of its own internal API. This can be the set of private methods of an object or the business logic inside a Web application. The important thing about this last case is that, while the component is seen as a black box by the rest of the system, it actually has an internal structure and it uses it to run. This flow contains all the <em>private messages</em>.</p>
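<p>The three flows can be mapped onto a small class. This is a minimal sketch; <code>TemperatureMonitor</code>, its <code>alarm</code> collaborator, and the threshold of 40 are all hypothetical.</p>

```python
class TemperatureMonitor:
    """Hypothetical component used to label the three message flows."""

    def __init__(self, alarm):
        self.alarm = alarm  # an external actor ("outside")
        self.readings = []

    def record(self, value):
        # (outside, self): part of the public API, an incoming message.
        self.readings.append(value)
        if self._is_too_hot(value):
            # (self, outside): an outgoing message to another object.
            self.alarm.trigger(value)

    def _is_too_hot(self, value):
        # (self, self): a private message, internal to the component.
        return value > 40
```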
<h3 id="message-type">Message type<a class="headerlink" href="#message-type" title="Permanent link">¶</a></h3>
<p>Messages can be further divided according to the kind of interaction the source requires with the target: <em>queries</em> and <em>commands</em>. Queries are messages that do not change the status of the component; they just extract information. The class <code>SimpleCalculator</code> that we developed in the previous section is a typical example of an object that exposes query methods. Adding two numbers doesn't change the status of the object, and you will receive the same answer every time you call the method <code>add</code>.</p>
<p>Commands are the opposite. They do not extract any information, but they change the status of the object. A method of an object that increases an internal counter or a method that adds values to an array are perfect examples of commands.</p>
<p>It's perfectly normal to combine a query and a command in a single message, as long as you are aware that your message is changing the status of the component. Remember that changing the status is something that can have concrete side effects.</p>
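<p>A hypothetical <code>Counter</code> shows the two types side by side. Its <code>reset</code> method deliberately combines a query and a command, much like Python's own <code>list.pop</code>, which both returns an item and removes it from the list.</p>

```python
class Counter:
    """Hypothetical class with a query, a command, and a mixed message."""

    def __init__(self):
        self._count = 0

    def value(self):
        # Query: returns information and leaves the state untouched.
        return self._count

    def increment(self):
        # Command: changes the state and returns nothing.
        self._count += 1

    def reset(self):
        # Query and command combined: returns the old value AND changes state.
        old = self._count
        self._count = 0
        return old
```

<p>Calling <code>value</code> twice always gives the same answer, while calling <code>reset</code> twice does not. That difference is what makes mixed messages worth noticing.</p>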
<h2 id="the-testing-grid">The testing grid<a class="headerlink" href="#the-testing-grid" title="Permanent link">¶</a></h2>
<p>Combining 3 flows and 2 message types we get 6 different message cases that involve the component under test. For each one of these cases we have to decide how to test the interaction represented by that flow and message type.</p>
<h3 id="incoming-queries">Incoming queries<a class="headerlink" href="#incoming-queries" title="Permanent link">¶</a></h3>
<p>An incoming query is a message that an external actor sends to get a value from your component. Testing this behaviour is straightforward, as you just need to write a test that sends the message and makes an assertion on the returned value. A concrete example of this is what we did to test the method <code>add</code> of <code>SimpleCalculator</code>.</p>
<h3 id="incoming-commands">Incoming commands<a class="headerlink" href="#incoming-commands" title="Permanent link">¶</a></h3>
<p>An incoming command comes from an external actor that wants to change the status of the system. There should be a way for an external actor to check the status, which translates into the need for either a companion incoming query message that allows extracting the status (or at least the part of the status affected by the command), or the knowledge that the change is going to affect the behaviour of another query. A simple example might be a method that sets the precision (number of digits) of the division in the object <code>SimpleCalculator</code>. Setting that value changes the result of a query, which can be used to test the effect of the incoming command.</p>
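<p>The precision example can be sketched as follows. The method <code>set_precision</code> is hypothetical and is not part of the post's <code>SimpleCalculator</code>. This stand-in class only keeps a <code>div</code> method. The test verifies the command through the query whose result it changes.</p>

```python
class SimpleCalculator:
    """Minimal stand-in for the post's class; set_precision is hypothetical."""

    def __init__(self):
        self.precision = None  # None means "do not round"

    def set_precision(self, digits):
        # Incoming command: changes the state and returns nothing.
        self.precision = digits

    def div(self, a, b):
        # Incoming query whose result depends on the state set above.
        result = a / b
        if self.precision is not None:
            result = round(result, self.precision)
        return result


def test_set_precision_affects_div():
    calculator = SimpleCalculator()
    calculator.set_precision(2)
    # The command is verified through the query it influences.
    assert calculator.div(2, 3) == 0.67
```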
<h3 id="private-queries">Private queries<a class="headerlink" href="#private-queries" title="Permanent link">¶</a></h3>
<p>A private query is a message that the component sends to self to get a value without affecting its own state, and it is basically nothing more than an explicit use of some internal logic. This happens often in object-oriented languages when you extract some common logic from one or more methods of an object and create a private method to avoid duplication.</p>
<p>Since private queries are part of the internal logic, you shouldn't test them directly. This might be surprising, as private methods are code, and code should be tested, but remember that other methods are calling them, so the effects of that code are not invisible: they are tested, although indirectly, by the tests of the public entry points. The only effect you would achieve by testing private methods is to lock the tests to the internal implementation of the component, which by definition shouldn't be used by anyone outside of the component itself. This, in turn, makes refactoring painful, because you have to keep redundant tests in sync with the changes that you make, instead of using them as a guide for the code changes like TDD wants you to do.</p>
<p>As Sandi Metz says, however, this is not an inflexible rule. Whenever testing an internal method makes the structure more robust, feel free to do it. Be aware that you are locking the implementation, so do it only where it makes a real difference businesswise.</p>
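<p>Here is a minimal sketch of testing a private query only through the public API. <code>StatsCalculator</code> and <code>_filter_outliers</code> are hypothetical names, loosely modelled on the averaging requirements of the previous post.</p>

```python
class StatsCalculator:
    """Hypothetical class: _filter_outliers is a private query used by avg."""

    def _filter_outliers(self, values, lt=None, ut=None):
        # Private query: builds a new list and does not change the object.
        if lt is not None:
            values = [v for v in values if v >= lt]
        if ut is not None:
            values = [v for v in values if v <= ut]
        return values

    def avg(self, values, lt=None, ut=None):
        # Public entry point (incoming query).
        values = self._filter_outliers(values, lt, ut)
        if not values:
            return 0
        return sum(values) / len(values)


def test_avg_with_thresholds():
    # _filter_outliers is exercised indirectly through avg, so renaming
    # or inlining it later will not break this test.
    calculator = StatsCalculator()
    assert calculator.avg([2, 5, 12, 98], ut=90) == calculator.avg([2, 5, 12])
```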
<h3 id="private-commands">Private commands<a class="headerlink" href="#private-commands" title="Permanent link">¶</a></h3>
<p>Private commands shouldn't be treated differently than private queries. They change the status of the component, but this is again part of the internal logic of the component itself, so you shouldn't test private commands either. As stated for private queries, feel free to do it if this makes a real difference.</p>
<h3 id="outgoing-queries-and-commands">Outgoing queries and commands<a class="headerlink" href="#outgoing-queries-and-commands" title="Permanent link">¶</a></h3>
<p>An outgoing query is a message that the component under test sends to an external actor asking for a value, without changing the status of the actor itself. The correctness of the returned value, given the inputs, is not part of what you want to test, because that is an incoming query for the external actor. Let me repeat this: you don't want to test that the external actor returns the correct value given some inputs.</p>
<p>This is perhaps one of the biggest mistakes that programmers make when they test their applications. It is definitely a mistake that I have made many times. We tend to introduce tests that, starting from the code of our component, end up testing different components.</p>
<p>Outgoing commands are messages sent to external actors in order to change their state. Since our component sends such messages to cause an effect in another part of the system, we have to be sure that the sent values are correct. We do not want to test that the state of the external actor changes accordingly, as this is part of the testing suite of the external actor itself (incoming command).</p>
<p>From this consideration it is evident that you shouldn't test the results of any outgoing query or command. Possibly, you should avoid running them at all, otherwise you will need the external system to be up and running when you run the test suite.</p>
<p>We want to be sure, however, that our component uses the API of the external actor in a proper way, and the standard technique to test this is to use mocks, that is, components that simulate other components. Mocks are an important tool in the TDD methodology and for this reason they are the topic of the next chapter.</p>
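<p>Here is a minimal sketch of both outgoing cases with the standard library's <code>unittest.mock</code>. <code>OrderService</code>, <code>get_price</code>, and <code>send</code> are hypothetical names. The outgoing query gets a canned answer and is not asserted on. The outgoing command is checked by inspecting the call the component made.</p>

```python
from unittest import mock


class OrderService:
    """Hypothetical component with one outgoing query and one outgoing command."""

    def __init__(self, prices, mailer):
        self.prices = prices  # external actor answering queries
        self.mailer = mailer  # external actor receiving commands

    def place_order(self, email, item):
        # Outgoing query: its correctness is the price service's concern.
        price = self.prices.get_price(item)
        # Outgoing command: what we must check is the message we send.
        self.mailer.send(email, f"Order confirmed: {item} ({price:.2f})")


def test_place_order_sends_confirmation():
    prices = mock.Mock()
    prices.get_price.return_value = 9.5  # canned answer, no real service
    mailer = mock.Mock()

    OrderService(prices, mailer).place_order("alice@example.com", "book")

    # Verify the outgoing command, not what the mailer does with it.
    mailer.send.assert_called_once_with(
        "alice@example.com", "Order confirmed: book (9.50)"
    )
```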
<div class="highlight"><pre><span></span><code>| Flow     | Type    | Test? |
|----------|---------|-------|
| Incoming | Query   | Yes   |
| Incoming | Command | Yes   |
| Private  | Query   | Maybe |
| Private  | Command | Maybe |
| Outgoing | Query   | Mock  |
| Outgoing | Command | Mock  |
</code></pre></div>
<h2 id="final-words">Final words<a class="headerlink" href="#final-words" title="Permanent link">¶</a></h2>
<p>Since I discovered TDD, few things have changed the way I write code more than these considerations on what I am supposed to test. Out of 6 different message cases we discovered that 2 shouldn't be tested, 2 of them require a very simple technique based on assertions, and the last 2 are the only ones that require an advanced technique (mocks). This should cheer you up, as for once a good methodology doesn't add new rules and further worries, but removes one third of them, even forbidding you to implement them!</p>
<p>In the next two posts I will discuss mocks and patches, two very important testing tools to have in your belt.</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>TDD in Python with pytest - Part 22020-09-11T10:30:00+02:002023-09-03T19:00:00+02:00Leonardo Giordanitag:www.thedigitalcatonline.com,2020-09-11:/blog/2020/09/11/tdd-in-python-with-pytest-part-2/<p>This is the second post in the series <strong>TDD in Python with pytest</strong> where I develop a simple project following a strict TDD methodology. The posts come from my book <a href="https://leanpub.com/clean-architectures-in-python">Clean Architectures in Python</a> and have been reviewed to get rid of some bad naming choices of the version published …</p><p>This is the second post in the series <strong>TDD in Python with pytest</strong> where I develop a simple project following a strict TDD methodology. The posts come from my book <a href="https://leanpub.com/clean-architectures-in-python">Clean Architectures in Python</a> and have been reviewed to get rid of some bad naming choices of the version published in the book.</p><p>You can find the first post <a href="https://www.thedigitalcatonline.com/blog/2020/09/10/tdd-in-python-with-pytest-part-1/">here</a>.</p><h2 id="step-7---division-0afe">Step 7 - Division<a class="headerlink" href="#step-7---division-0afe" title="Permanent link">¶</a></h2><p>The requirements state that there shall be a division function, and that it has to return a float value. This is a simple condition to test, as it is sufficient to divide two numbers that do not give an integer result</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_div_two_numbers_float</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">div</span><span class="p">(</span><span class="mi">13</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">6.5</span>
</pre></div> </div> </div><p>The test suite fails with the usual error that signals a missing method. The implementation of this function is very simple as the operator <code>/</code> in Python performs a float division</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
    <span class="p">[</span><span class="o">...</span><span class="p">]</span>
    <span class="k">def</span> <span class="nf">div</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">a</span> <span class="o">/</span> <span class="n">b</span>
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-7-float-division">step-7-float-division</a></p><p>If you run the test suite again, all the tests should pass. There is a second requirement about this operation, however, that states that division by zero shall return <code>inf</code>.</p><p>I already mentioned in the previous post that this is not a good requirement, and please don't go around telling people that I told you to create functions that return either floats or strings. This is a simple requirement that I will use to show you how to deal with exceptions.</p><p>The test that comes from the requirement is simple</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_div_by_zero_returns_inf</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">div</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="nb">float</span><span class="p">(</span><span class="s1">'inf'</span><span class="p">)</span>
</pre></div> </div> </div><p>And the test suite fails now with this message</p><div class="code"><div class="content"><div class="highlight"><pre>__________________________ test_div_by_zero_returns_inf ___________________________
    def test_div_by_zero_returns_inf():
        calculator = SimpleCalculator()
>       result = calculator.div(5, 0)
tests/test_main.py:70: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = &lt;simple_calculator.main.SimpleCalculator object at 0x7f0b0b733990&gt;, a = 5, b = 0 <span class="callout">1</span>
    def div(self, a, b):
>       return a / b
E       ZeroDivisionError: division by zero
simple_calculator/main.py:17: ZeroDivisionError
</pre></div> </div> </div><p>Note that when an exception happens in the code and not in the test, the pytest output changes slightly. The first part of the message shows where the test fails, but then there is a second part that shows the internal code that raised the exception and provides information about the value of local variables on the first line <span class="callout">1</span>.</p><p>We might implement two different solutions to satisfy this requirement and its test. The first one is to check <code>b</code> before dividing and return early if it is 0</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">div</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">b</span><span class="p">:</span>
            <span class="k">return</span> <span class="nb">float</span><span class="p">(</span><span class="s1">'inf'</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">a</span> <span class="o">/</span> <span class="n">b</span>
</pre></div> </div> </div><p>and the second one is to intercept the exception with a <code>try/except</code> block</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">div</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">a</span> <span class="o">/</span> <span class="n">b</span>
        <span class="k">except</span> <span class="ne">ZeroDivisionError</span><span class="p">:</span>
            <span class="k">return</span> <span class="nb">float</span><span class="p">(</span><span class="s1">'inf'</span><span class="p">)</span>
</pre></div> </div> </div><p>Both solutions make the test suite pass, so both are correct. I leave it to you to decide which one is better, syntactically speaking.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-7-division-by-zero">step-7-division-by-zero</a></p><h2 id="step-8---testing-exceptions-ca11">Step 8 - Testing exceptions<a class="headerlink" href="#step-8---testing-exceptions-ca11" title="Permanent link">¶</a></h2><p>A further requirement is that multiplication by zero must raise a <code>ValueError</code> exception. This means that we need a way to test if our code raises an exception, which is the opposite of what we did until now. In the previous tests, the condition to pass was that there was no exception in the code, while in this test the condition will be that an exception has been raised.</p><p>Again, this is a requirement I made up just for the sake of showing you how to deal with exceptions, so if you think this is a silly behaviour for a multiplication function you are probably right.</p><p>Pytest provides a context manager named <code>raises</code> that runs the code contained in it and passes only if the given exception is produced by that code.</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">pytest</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">test_mul_by_zero_raises_exception</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="k">with</span> <span class="n">pytest</span><span class="o">.</span><span class="n">raises</span><span class="p">(</span><span class="ne">ValueError</span><span class="p">):</span>
        <span class="n">calculator</span><span class="o">.</span><span class="n">mul</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
</pre></div> </div> </div><p>In this case, thus, pytest runs the line <code>calculator.mul(3, 0)</code>. If the method doesn't raise the exception <code>ValueError</code> the test will fail. Indeed, if you run the test suite now, you will get the following failure</p><div class="code"><div class="content"><div class="highlight"><pre>________________________ test_mul_by_zero_raises_exception ________________________
    def test_mul_by_zero_raises_exception():
        calculator = SimpleCalculator()
        with pytest.raises(ValueError):
>           calculator.mul(3, 0)
E           Failed: DID NOT RAISE &lt;class 'ValueError'&gt;
tests/test_main.py:81: Failed
</pre></div> </div> </div><p>which signals that the code didn't raise the expected exception.</p><p>The code that makes the test pass needs to check whether any of the inputs of the function <code>mul</code> is 0. This can be done with the help of the built-in function <code>all</code>, which accepts an iterable and returns <code>True</code> only if all the values contained in it are true. Since in Python the value <code>0</code> is false, we may write</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">mul</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">all</span><span class="p">(</span><span class="n">args</span><span class="p">):</span>
            <span class="k">raise</span> <span class="ne">ValueError</span>
        <span class="k">return</span> <span class="n">reduce</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="n">x</span><span class="o">*</span><span class="n">y</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span>
</pre></div> </div> </div><p>and make the test suite pass. The condition checks that there are no false values in the tuple <code>args</code>, that is, that there are no zeros.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-8-multiply-by-zero">step-8-multiply-by-zero</a></p><h2 id="step-9---a-more-complex-set-of-requirements-9dc2">Step 9 - A more complex set of requirements<a class="headerlink" href="#step-9---a-more-complex-set-of-requirements-9dc2" title="Permanent link">¶</a></h2><p>Until now the requirements were pretty simple, and it was easy to map each of them directly into tests. It's time to try to tackle a more complex problem. The remaining requirements say that the class has to provide a function to compute the average of an iterable, and that this function shall accept two optional upper and lower thresholds to remove outliers.</p><p>Let's break these two requirements into a set of simpler ones</p><ol><li>The function accepts an iterable and computes the average, i.e. <code>avg([2, 5, 12, 98]) == 29.25</code></li><li>The function accepts an optional upper threshold. It must remove all the values that are greater than the threshold before computing the average, i.e. <code>avg([2, 5, 12, 98], ut=90) == avg([2, 5, 12])</code></li><li>The function accepts an optional lower threshold. It must remove all the values that are less than the threshold before computing the average, i.e. <code>avg([2, 5, 12, 98], lt=10) == avg([12, 98])</code></li><li>The upper threshold is not included when removing data, i.e. <code>avg([2, 5, 12, 98], ut=12) == avg([2, 5, 12])</code></li><li>The lower threshold is not included when removing data, i.e. <code>avg([2, 5, 12, 98], lt=5) == avg([5, 12, 98])</code></li><li>The function works with an empty list, returning <code>0</code>, i.e. <code>avg([]) == 0</code></li><li>The function works if the list is empty after outlier removal, i.e. 
<code>avg([12, 98], lt=15, ut=90) == 0</code></li><li>Outlier removal works if the list is empty, i.e. <code>avg([], lt=15, ut=90) == 0</code></li></ol><p>As you can see, a requirement can produce multiple tests. Some of these are clearly expressed by the requirement (numbers 1, 2, 3), some are choices that we make and that can be discussed (numbers 4, 5, 6), and some are boundary cases that we have to discover by thinking about the problem (numbers 6, 7, 8). Number 6 belongs to both of the last two groups.</p><p>There is a fourth category of tests, which are the ones that come from bugs that you discover. We will discuss those later in this chapter.</p><p>Now, if you have been coding along with the posts, it is time to tackle a problem on your own. Why don't you try to implement these features? Each of the eight requirements can be mapped directly into a test, and you know how to write tests and code that passes them. The next steps show my personal solution, which is just one of many possible ones, so you can compare what you did with what I came up with.</p><h3 id="step-9.1---average-of-an-iterable-4522">Step 9.1 - Average of an iterable</h3><p>Let's start by adding a test for requirement number 1:</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_correct_average</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">98</span><span class="p">])</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">29.25</span>
</pre></div> </div> </div><p>We feed the function <code>avg</code> a list of generic numbers, whose average we calculated with an external tool. The first run of the test suite fails with the usual complaint about a missing function, and we can make the test pass with a simple combination of <code>sum</code>, which accepts any iterable, and <code>len</code>, which works on sized containers such as lists and tuples:</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
    <span class="p">[</span><span class="o">...</span><span class="p">]</span>
    <span class="k">def</span> <span class="nf">avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">it</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
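
# Illustrative sketch, not part of the project's code: sum() accepts any
# iterable, while len() only works on objects that have a length.
# The names avg_sized and avg_any_iterable are hypothetical.
def avg_sized(it):
    return sum(it) / len(it)


def avg_any_iterable(it):
    # Materialise the values first, so that generators work as well
    values = list(it)
    return sum(values) / len(values)


print(avg_sized([2, 5, 12, 98]))                    # 29.25
print(avg_any_iterable(x for x in [2, 5, 12, 98]))  # 29.25
try:
    avg_sized(x for x in [2, 5, 12, 98])
except TypeError as exc:
    # Generators have no len(), so this branch runs
    print("TypeError:", exc)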
</pre></div> </div> </div><p>Here, <code>it</code> stands for iterable. Strictly speaking, though, the function needs a sized iterable such as a list or a tuple: <code>sum</code> can consume any iterable, but <code>len</code> raises a <code>TypeError</code> on objects like generators, which have no length.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-1-average-of-an-iterable">step-9-1-average-of-an-iterable</a></p><h3 id="step-9.2---upper-threshold-e0a5">Step 9.2 - Upper threshold</h3><p>The second requirement mentions an upper threshold, but we are free to choose the API, since the requirement doesn't specify how the threshold should be passed or named. I decided to call the upper threshold parameter <code>ut</code>, so the test becomes:</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_removes_upper_outliers</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">98</span><span class="p">],</span> <span class="n">ut</span><span class="o">=</span><span class="mi">90</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="o">.</span><span class="n">approx</span><span class="p">(</span><span class="mf">6.333333</span><span class="p">)</span>
</pre></div> </div> </div><p>As you can see, the parameter <code>ut=90</code> is supposed to remove the element <code>98</code> from the list before computing the average of the remaining elements. Since the result (19/3) has an infinite decimal expansion, I used the function <code>pytest.approx</code> to compare it with a rounded value.</p><p>The test suite fails because the function <code>avg</code> doesn't accept the parameter <code>ut</code>:</p><div class="code"><div class="content"><div class="highlight"><pre>_________________________ test_avg_removes_upper_outliers _________________________
def test_avg_removes_upper_outliers():
calculator = SimpleCalculator()
> result = calculator.avg([2, 5, 12, 98], ut=90)
E TypeError: avg() got an unexpected keyword argument 'ut'
tests/test_main.py:95: TypeError
</pre></div> </div> </div><p>Now we have two problems to solve, as happened with the second test we wrote in this project. The new <code>ut</code> argument needs a default value, so we have to manage that case, and then we have to make the upper threshold work. My solution is:</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">ut</span><span class="p">:</span>
            <span class="n">ut</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
        <span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">it</span> <span class="k">if</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">ut</span><span class="p">]</span>
        <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span>
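
# Illustrative sketch, not part of the project's code: why the default
# threshold must be neutral. avg_fixed_default and avg_max_default are
# hypothetical names used only for this comparison.
def avg_fixed_default(it, ut=9999):
    # Buggy on purpose: any value above 9999 is silently dropped
    kept = [x for x in it if x <= ut]
    return sum(kept) / len(kept)


def avg_max_default(it, ut=None):
    # With no threshold given, max(it) keeps every element
    if ut is None:
        ut = max(it)
    kept = [x for x in it if x <= ut]
    return sum(kept) / len(kept)


data = [2, 5, 12, 10001]
print(avg_fixed_default(data))  # 6.333333333333333 (10001 was dropped)
print(avg_max_default(data))    # 2505.0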
</pre></div> </div> </div><p>The idea here is that <code>ut</code> is used to filter the iterable, keeping all the elements that are less than or equal to the threshold. This means that the default value for the threshold has to be neutral with regard to this filtering operation. Using the maximum value of the iterable guarantees that no element is removed when no threshold is given, while using a big fixed value like <code>9999</code> would introduce a bug, as one of the elements of the iterable might be bigger than that value.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-2-upper-threshold">step-9-2-upper-threshold</a></p><h3 id="step-9.3---lower-threshold-b88a">Step 9.3 - Lower threshold</h3><p>The lower threshold is the mirror image of the upper threshold, so it doesn't require much explanation. The test is:</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_removes_lower_outliers</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">98</span><span class="p">],</span> <span class="n">lt</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="o">.</span><span class="n">approx</span><span class="p">(</span><span class="mi">55</span><span class="p">)</span>
</pre></div> </div> </div><p>and the code of the function <code>avg</code> now becomes:</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">,</span> <span class="n">lt</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">lt</span><span class="p">:</span>
            <span class="n">lt</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">ut</span><span class="p">:</span>
            <span class="n">ut</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
        <span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">it</span> <span class="k">if</span> <span class="n">x</span> <span class="o">>=</span> <span class="n">lt</span> <span class="ow">and</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">ut</span><span class="p">]</span>
        <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span>
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-3-lower-threshold">step-9-3-lower-threshold</a></p><h3 id="step-9.4-and-9.5---boundary-inclusion-e6fe">Step 9.4 and 9.5 - Boundary inclusion</h3><p>As you can see from the code of the function <code>avg</code>, the upper and lower thresholds are included in the comparison, so we might consider the requirements already satisfied. TDD, however, pushes you to write a test for each requirement (as we saw, it's not unusual to have multiple tests per requirement), and this is what we are going to do.</p><p>The reason behind this is that you might get the expected behaviour for free, as in this case, because some code that you wrote to pass a different test provides that feature as a side effect. You don't know, however, what will happen to that code in the future, so if you don't have tests that show that all your requirements are satisfied you might lose features without knowing it.</p><p>The test for the fourth requirement is:</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_upper_threshold_is_included</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">98</span><span class="p">],</span> <span class="n">ut</span><span class="o">=</span><span class="mi">98</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">29.25</span>
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-4-upper-threshold-is-included">step-9-4-upper-threshold-is-included</a></p><p>The test for the fifth requirement is:</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_lower_threshold_is_included</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">98</span><span class="p">],</span> <span class="n">lt</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">29.25</span>
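
# Illustrative sketch, not part of the project's tests: the two boundary
# tests pin down the use of inclusive comparisons. keep_inclusive and
# keep_exclusive are hypothetical helpers.
def keep_inclusive(it, lt, ut):
    return [x for x in it if lt <= x <= ut]


def keep_exclusive(it, lt, ut):
    # Strict comparisons would drop values equal to the thresholds
    return [x for x in it if lt < x < ut]


print(keep_inclusive([2, 5, 12, 98], 2, 98))  # [2, 5, 12, 98]
print(keep_exclusive([2, 5, 12, 98], 2, 98))  # [5, 12]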
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-5-lower-threshold-is-included">step-9-5-lower-threshold-is-included</a></p><p>And, as expected, both pass without any change in the code. Do you remember rule number 5? You should ask yourself why the tests don't fail. In this case we reasoned about that before, so we can accept that the new tests don't require any code change to pass.</p><h3 id="step-9.6---empty-list-2dcd">Step 9.6 - Empty list</h3><p>Requirement number 6 is something that wasn't clearly specified in the project description, so we decided to return 0 as the average of an empty list. You are free to change the requirement and decide to raise an exception, for example.</p><p>The test that implements this requirement is:</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_empty_list</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([])</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">0</span>
</pre></div> </div> </div><p>and the test suite fails with the following error:</p><div class="code"><div class="content"><div class="highlight"><pre>_______________________________ test_avg_empty_list _______________________________
def test_avg_empty_list():
calculator = SimpleCalculator()
> result = calculator.avg([])
tests/test_main.py:127:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = &lt;simple_calculator.main.SimpleCalculator object at 0x7feeb7098a10&gt;, it = [], lt = None, ut = None
def avg(self, it, lt=None, ut=None):
if not lt:
> lt = min(it)
E ValueError: min() arg is an empty sequence
simple_calculator/main.py:26: ValueError
</pre></div> </div> </div><p>The function <code>min</code> that we used to compute the default lower threshold doesn't work with an empty list, so the code raises an exception. The simplest solution is to check the length of the iterable before computing the default thresholds:</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">,</span> <span class="n">lt</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">len</span><span class="p">(</span><span class="n">it</span><span class="p">):</span>
            <span class="k">return</span> <span class="mi">0</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">lt</span><span class="p">:</span>
            <span class="n">lt</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">ut</span><span class="p">:</span>
            <span class="n">ut</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
        <span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">it</span> <span class="k">if</span> <span class="n">x</span> <span class="o">>=</span> <span class="n">lt</span> <span class="ow">and</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">ut</span><span class="p">]</span>
        <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span>
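
# Illustrative sketch, not part of the project's code: the alternative
# mentioned in the text, raising an exception for an empty input instead
# of returning 0. avg_strict is a hypothetical name.
import statistics


def avg_strict(it):
    if not it:
        raise ValueError("cannot compute the average of an empty sequence")
    return sum(it) / len(it)


print(avg_strict([2, 5, 12, 98]))  # 29.25

# The standard library's statistics.mean() makes the same choice: it raises
# statistics.StatisticsError, a subclass of ValueError.
try:
    statistics.mean([])
except statistics.StatisticsError as exc:
    print("statistics.mean:", exc)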
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-6-empty-list">step-9-6-empty-list</a></p><p>As you can see, the function <code>avg</code> is already fairly rich, but at the same time it is well structured and understandable. This is partly because the example is trivial, but cleaner code is definitely among the benefits of TDD.</p><h3 id="step-9.7---empty-list-after-applying-the-thresholds-deed">Step 9.7 - Empty list after applying the thresholds</h3><p>The next requirement deals with the case in which the outlier removal process empties the list. The test is the following:</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_manages_empty_list_after_outlier_removal</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="mi">12</span><span class="p">,</span> <span class="mi">98</span><span class="p">],</span> <span class="n">lt</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="mi">90</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">0</span>
</pre></div> </div> </div><p>and the test suite fails with a <code>ZeroDivisionError</code>, because after filtering the list <code>_it</code> is empty, so its length is 0:</p><div class="code"><div class="content"><div class="highlight"><pre>________________ test_avg_manages_empty_list_after_outlier_removal ________________
def test_avg_manages_empty_list_after_outlier_removal():
calculator = SimpleCalculator()
> result = calculator.avg([12, 98], lt=15, ut=90)
tests/test_main.py:135:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = &lt;simple_calculator.main.SimpleCalculator object at 0x7f9e60c3ba90&gt;, it = [12, 98], lt = 15, ut = 90
def avg(self, it, lt=None, ut=None):
if not len(it):
return 0
if not lt:
lt = min(it)
if not ut:
ut = max(it)
_it = [x for x in it if x >= lt and x <= ut]
> return sum(_it)/len(_it)
E ZeroDivisionError: division by zero
simple_calculator/main.py:36: ZeroDivisionError
</pre></div> </div> </div><p>The easiest solution is to introduce a new check on the length of the filtered list:</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">,</span> <span class="n">lt</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">len</span><span class="p">(</span><span class="n">it</span><span class="p">):</span>
            <span class="k">return</span> <span class="mi">0</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">lt</span><span class="p">:</span>
            <span class="n">lt</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">ut</span><span class="p">:</span>
            <span class="n">ut</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
        <span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">it</span> <span class="k">if</span> <span class="n">x</span> <span class="o">>=</span> <span class="n">lt</span> <span class="ow">and</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">ut</span><span class="p">]</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">):</span>
            <span class="k">return</span> <span class="mi">0</span>
        <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span>
</pre></div> </div> </div><p>This code makes the test suite pass. As I stated before, code that makes the tests pass is considered correct, but you are always allowed to improve it. In this case I don't really like the repetition of the length check, so I might try to refactor the function to get a cleaner solution. Since I have tests that show that all the requirements are satisfied, I am free to change the code of the function.</p><p>After some attempts I found this solution:</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">,</span> <span class="n">lt</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="n">_it</span> <span class="o">=</span> <span class="n">it</span><span class="p">[:]</span>
        <span class="k">if</span> <span class="n">lt</span><span class="p">:</span>
            <span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">_it</span> <span class="k">if</span> <span class="n">x</span> <span class="o">>=</span> <span class="n">lt</span><span class="p">]</span>
        <span class="k">if</span> <span class="n">ut</span><span class="p">:</span>
            <span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">_it</span> <span class="k">if</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">ut</span><span class="p">]</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">):</span>
            <span class="k">return</span> <span class="mi">0</span>
        <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span>
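
# Illustrative sketch, not the version used in the project: it[:] needs a
# sequence that supports slicing, while list(it) accepts any iterable
# (generators included) and also returns a new list. The None checks follow
# the fix discussed in step 9.9. avg_with_thresholds is a hypothetical name.
def avg_with_thresholds(it, lt=None, ut=None):
    values = list(it)
    if lt is not None:
        values = [x for x in values if x >= lt]
    if ut is not None:
        values = [x for x in values if x <= ut]
    if not values:
        return 0
    return sum(values) / len(values)


print(avg_with_thresholds(x for x in [2, 5, 12, 98]))  # 29.25
print(avg_with_thresholds([12, 98], lt=15, ut=90))     # 0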
</pre></div> </div> </div><p>which looks reasonably clean and makes the whole test suite pass.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-7-empty-list-after-thresholds">step-9-7-empty-list-after-thresholds</a></p><h3 id="step-9.8---empty-list-before-applying-the-thresholds-a7ab">Step 9.8 - Empty list before applying the thresholds</h3><p>The last requirement checks another boundary case, which happens when the list is empty and we specify one or both of the thresholds. This test checks that the outlier removal code doesn't assume that the list contains elements.</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_manages_empty_list_before_outlier_removal</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([],</span> <span class="n">lt</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="mi">90</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">0</span>
</pre></div> </div> </div><p>This test doesn't fail. So, according to the TDD methodology, we should explain why this happens and decide whether we want to keep the test. The reason is that the two list comprehensions used to filter the elements work perfectly with empty lists. As for the test, it comes directly from a corner case, and it checks a behaviour which is not already covered by other tests. This makes me decide to keep the test.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-8-empty-list-before-thresholds">step-9-8-empty-list-before-thresholds</a></p><h3 id="step-9.9---zero-as-lowerupper-threshold-35dc">Step 9.9 - Zero as lower/upper threshold</h3><p>This is perhaps the most important step of the whole chapter, for two reasons.</p><p>First of all, the test in this step was contributed by two readers of my book about clean architectures (<a href="https://github.com/faustgertz">Faust Gertz</a> and <a href="https://github.com/IrishPrime">Michael O'Neill</a>), and this shows a real TDD workflow. After you publish your package (or your book, in this case), someone notices incorrect behaviour in some use case. This might be a big flaw or a tiny corner case, but in any case they can come up with a test that exposes the bug, and maybe even with a patch to the code. The most important part, though, is the test.</p><p>Whoever discovers the bug has a clear way to show it, and you, as an author/maintainer/developer, can add that test to your suite and work on the code until it passes. The rest of the test suite will block any change in the code that disrupts the behaviour you already tested. 
As I have already stressed multiple times, we could do the same without TDD, but if we need to change a substantial amount of code there is nothing like a test suite to protect us from reintroducing bugs (also called regressions).</p><p>Second, this step shows an important part of the TDD workflow: checking corner cases. In general, you should pay a lot of attention to the boundaries of a domain and test the behaviour of the code in those cases.</p><p>This test shows that the code doesn't manage zero-valued lower thresholds correctly:</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_manages_zero_value_lower_outlier</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">lt</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">0.5</span>
</pre></div> </div> </div><p>The reason is that the function <code>avg</code> contains a check like <code>if lt:</code>, which skips the filter when <code>lt</code> is 0, because 0 is a false value in Python. The check should be <code>if lt is not None:</code>, so that part of the function <code>avg</code> becomes:</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>        <span class="k">if</span> <span class="n">lt</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
            <span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">_it</span> <span class="k">if</span> <span class="n">x</span> <span class="o">>=</span> <span class="n">lt</span><span class="p">]</span>
</pre></div> </div> </div><p>It is immediately clear that the upper threshold has the same issue, so the two tests I added are:</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_manages_zero_value_lower_outlier</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">lt</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">0.5</span>


<span class="k">def</span> <span class="nf">test_avg_manages_zero_value_upper_outlier</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">ut</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="o">-</span><span class="mf">0.5</span>
</pre></div> </div> </div><p>and the final version of <code>avg</code> is</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre> <span class="k">def</span> <span class="nf">avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">,</span> <span class="n">lt</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="n">_it</span> <span class="o">=</span> <span class="n">it</span><span class="p">[:]</span>
<span class="k">if</span> <span class="n">lt</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">_it</span> <span class="k">if</span> <span class="n">x</span> <span class="o">>=</span> <span class="n">lt</span><span class="p">]</span>
<span class="k">if</span> <span class="n">ut</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">_it</span> <span class="k">if</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">ut</span><span class="p">]</span>
<span class="k">if</span> <span class="ow">not</span> <span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">):</span>
<span class="k">return</span> <span class="mi">0</span>
<span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span>
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-9-zero-as-lower-upper-threshold">step-9-9-zero-as-lower-upper-threshold</a></p><h3 id="step-9.10---refactoring-for-generators-baa5">Step 9.10 - Refactoring for generators</h3><p>One of the readers of this series, <a href="https://github.com/labdmitriy">Dmitry Labazkin</a>, noticed that the final implementation has some drawbacks, namely:</p><ul><li>According to the requirements, this method should accept any iterable, but the implementation can't process generators (which are iterators and also iterables). For example, generators support neither slicing (<code>it[:]</code>) nor the function <code>len()</code>.</li><li>The iterable is copied, which is something we try to avoid to reduce memory usage.</li><li>Overall, the input is read 4 times, which affects performance.</li></ul><p>These are interesting points, and he provides an implementation that solves them all. It's important to mention that the first point is closely related to the requirements, so it should be represented by a unit test, while the other two are connected with performance and cannot easily be tested with pytest. However, any refactoring that produces code we consider better (for example from the performance point of view) can be verified by the existing tests. In other words, we can provide an alternative implementation and still make sure it works correctly.</p><p>Dmitry adds a test to check that generators are supported</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_accepts_generators</span><span class="p">():</span>
<span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">(</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">98</span><span class="p">])</span>
<span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">29.25</span>
</pre></div> </div> </div><p>His implementation of the function <code>avg()</code> passes that test and the previous ones we wrote</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre> <span class="k">def</span> <span class="nf">avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">,</span> <span class="n">lt</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="n">count</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">total</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">number</span> <span class="ow">in</span> <span class="n">it</span><span class="p">:</span>
<span class="k">if</span> <span class="n">lt</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">number</span> <span class="o"><</span> <span class="n">lt</span><span class="p">:</span>
<span class="k">continue</span>
<span class="k">if</span> <span class="n">ut</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">number</span> <span class="o">></span> <span class="n">ut</span><span class="p">:</span>
<span class="k">continue</span>
<span class="n">count</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="n">total</span> <span class="o">+=</span> <span class="n">number</span>
<span class="k">if</span> <span class="n">count</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">return</span> <span class="mi">0</span>
<span class="k">return</span> <span class="n">total</span> <span class="o">/</span> <span class="n">count</span>
</pre></div> </div> </div><p>One might argue that this implementation is less <em>pythonic</em> as it doesn't use fancy list comprehensions, but again, that is a matter of style (and performance). The point about generators is correct, but if that wasn't included in the requirements we might accept either implementation. I personally believe this new implementation is much better than the previous one, as I like to keep a low memory footprint, but if we were sure the calculator is used only on small sequences the concern might be overkill.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-10-refactoring-for-generators">step-9-10-refactoring-for-generators</a></p><h2 id="recap-of-the-tdd-rules-92ff">Recap of the TDD rules<a class="headerlink" href="#recap-of-the-tdd-rules-92ff" title="Permanent link">¶</a></h2><p>Through this very simple example we learned 6 important rules of the TDD methodology. Let us review them, now that we have some experience that can make the words meaningful</p><ol><li>Test first, code later</li><li>Add the bare minimum amount of code you need to pass the tests</li><li>You shouldn't have more than one failing test at a time</li><li>Write code that passes the test. Then refactor it.</li><li>A test should fail the first time you run it. If it doesn't, ask yourself why you are adding it.</li><li>Never refactor without tests.</li></ol><h2 id="how-many-assertions-a828">How many assertions?<a class="headerlink" href="#how-many-assertions-a828" title="Permanent link">¶</a></h2><p>I am frequently asked "How many assertions do you put in a test?", and I consider this question important enough to discuss it in a dedicated section. To answer this question I want to briefly go back to the nature of TDD and the role of the test suite that we run.</p><p>The whole point of automated tests is to run through a set of checkpoints that can quickly reveal that there is a problem in a specific area. 
Mind the words "quickly" and "specific". When I run the test suite and an error occurs, I'd like to be able to understand as fast as possible where the problem lies. This doesn't (always) mean that the problem will have a quick resolution, but at least I can be immediately aware of which part of the system is misbehaving.</p><p>On the other hand, we don't want to have too many tests for the same condition; on the contrary, we want to avoid testing the same condition more than once, as tests have to be maintained. A test suite that is too fine-grained might result in too many tests failing because of the same problem in the code, which might be daunting and not very informative.</p><p>My advice is to group together assertions that can be executed after running the same setup, if they test the same process. For example, consider the two functions <code>add</code> and <code>sub</code> that we tested in this chapter. They require the same setup, which is to instantiate the class <code>SimpleCalculator</code> (a setup that they share with many other tests), but they are actually testing two different processes. A good sign of this is that a single test checking both would have to be called <code>test_add_or_sub</code>, and a failure in it would require further investigation of the test output to find out which method of the class is failing.</p><p>If, instead, you have to test that a method returns positive even numbers, you should run the method once and then write two assertions, one that checks that the number is positive, and one that checks it is even. This makes sense, as a failure in either of the two means a failure of the whole process.</p><p>As a rule of thumb, then, consider whether the test is a logical <code>AND</code> between conditions or a logical <code>OR</code>. 
In the former case, go for multiple assertions; in the latter, create multiple test functions.</p><h2 id="how-to-manage-bugs-or-missing-features-f2e8">How to manage bugs or missing features<a class="headerlink" href="#how-to-manage-bugs-or-missing-features-f2e8" title="Permanent link">¶</a></h2><p>In this chapter we developed the project from scratch, so the challenge was to come up with a series of small tests starting from the requirements. At a certain point in the life of your project you will have a stable version in production (this expression has many definitions, but in general it means "used by someone other than you") and you will need to maintain it. This means that people will file bug reports and feature requests, and TDD gives you a clear strategy to deal with those.</p><p>From the TDD point of view both a bug and a missing feature are cases not currently covered by a test, so I will refer to them collectively as bugs, but don't forget that I'm talking about missing features as well.</p><p>The first thing you need to do is to write one or more tests that expose the bug. This way you can easily decide when the code that you wrote is correct or good enough. For example, let's assume that a user files an issue on the project <code>SimpleCalculator</code> saying: "The function <code>add</code> doesn't work with negative numbers". You should definitely try to get a concrete example from the user that wrote the issue and some information about the execution environment (as it is always possible that the problem comes from a different source, like for example an old version of a library your package relies on), but in the meantime you can come up with at least 3 tests: one that involves two negative numbers, one with a negative number as the first argument, and one with a negative number as the second argument.</p><p>You shouldn't write down all of them at once. Write the first test that you think might expose the issue and see if it fails. 
If it doesn't, discard it and write a new one. From the TDD point of view, if you don't have a failing test there is no bug, so you have to come up with at least one test that exposes the issue you are trying to solve.</p><p>At this point you can move on and try to change the code. Remember that you shouldn't have more than one failing test at a time, so start doing this as soon as you discover a test case that shows there is a problem in the code.</p><p>Once you reach a point where the test suite passes without errors, stop and try to run the code in the environment where the bug was first discovered (for example sharing a branch with the user that created the ticket) and iterate the process.</p><h2 id="the-problem-of-types-2b1a">The problem of types<a class="headerlink" href="#the-problem-of-types-2b1a" title="Permanent link">¶</a></h2><p>Other than contributing to the TDD steps, Dmitry Labazkin asked some relevant questions about types, which I will summarise here. You can read his original questions in <a href="https://github.com/TheDigitalCatOnline/blog_source/issues/11">issue #11</a> and <a href="https://github.com/TheDigitalCatOnline/blog_source/issues/12">issue #12</a>.</p><p>The question of type checking is thorny, and since this is an introductory series I will discuss it briefly and give some pointers. Don't get me wrong, though. As I will say later, this is one of the most important topics we can discuss in computer science.</p><p>Overall, the problem Dmitry raises is that operators like addition and multiplication are valid for types other than integers (like floats) and also for non-numeric ones (like strings). In Python, it is possible to multiply a string by a number and obtain a concatenation of that number of copies of the original string. 
At the same time, however, subtraction and division are not defined for strings, so some of the questions we can ask are:</p><ul><li>can <code>SimpleCalculator</code> be used on non-integer numeric types?</li><li>can <code>SimpleCalculator</code> be used on non-numeric types?</li><li>shall we explicitly check in the code that the input values belong to a certain type?</li><li>shall we write tests to rule out other types?</li></ul><p>Such questions are deceptively simple, so let's tackle them step by step.</p><p>Let's assume it makes sense for our class to work with numeric types. In Python there is no way to prevent a program from calling <code>SimpleCalculator().add("string1", "string2")</code>, which would fail as the current implementation uses the built-in function <code>sum</code> that doesn't work on strings (even passing an empty string as the initial value doesn't help, as <code>sum</code> explicitly refuses to concatenate strings). However, calling <code>SimpleCalculator().mul("abc", 3)</code> would result in <code>"abcabcabc"</code>, as the internal implementation quietly supports strings.</p><p>Given the inconsistency, we might be tempted to rule out non-numeric types explicitly. In other words, we might want to add code to our calculator that <em>actively checks</em> if we are passing a non-numeric type. In that case we shall also add tests for those types, according to the TDD methodology, as no code can be added without tests.</p><p>This topic is thorny because Python relies heavily on <em>polymorphism</em>, which means that it is more interested in the <em>behaviour</em> of an object than in its <em>nature</em>. In other words, an object can be considered a number because <em>it is an instance</em> of <code>int</code> or <code>float</code>, for example, but it could just as well be an instance of a class we made up that <em>behaves like</em> one of those types. 
Using Abstract Base Classes like <a href="https://docs.python.org/3/library/numbers.html">numbers</a> is useful to check if an object is an instance of one of the types encompassed by the hierarchy (again, types such as <code>int</code> and <code>float</code>), but doesn't automatically include everything that behaves like a number. We can create a class that behaves like <code>int</code> without belonging to the hierarchy of <code>numbers</code>.</p><p>Ultimately, this is the reason why Python programmers have to remember that the operator <code>+</code> can be used with types like <code>int</code>, <code>str</code>, and <code>list</code>, but cannot be used with dictionaries. Conversely, <code>len</code> can be used on dictionaries and lists, but cannot be used on integers. We need to remember it, as these operators are polymorphic (there is no operator <code>int+</code> or <code>float+</code>) but don't make sense or are not implemented for some types.</p><p>Those basic operators and functions raise an exception when the wrong type is passed, so we might be tempted to do the same and explicitly raise an exception when the wrong type is passed to <code>SimpleCalculator</code>. Again, the focus is on behaviour and implementation. If our implementation doesn't work with instances of certain classes, an exception will be raised anyway, so we don't need to raise one explicitly. The aforementioned snippet <code>SimpleCalculator().add("string1", "string2")</code> would raise a <code>TypeError</code> because the underlying <code>sum</code> doesn't accept strings.</p><p>In conclusion, my answers to the questions above are:</p><p>Can <code>SimpleCalculator</code> be used on non-integer numeric types? Probably, given that the implementation is not specific to integers, but if we want to be sure we should add some tests to expose the functionality. So far, according to TDD, the class is certified to work with integers only. 
In this case, I might want to add some tests to show that it works with floats. But if someone feeds the class float-like objects that for some reason do not support the operator <code>/</code>, some parts of the calculator won't work, and there is no way to test all those conditions.</p><p>Can <code>SimpleCalculator</code> be used on non-numeric types? Yes, to a certain extent. <code>mul</code> can be used with sequences, for example. It is a calculator, though, so it doesn't make much sense to try to use it on non-numeric types. Users can feed the calculator any sort of non-numeric value and we cannot do anything to prevent it.</p><p>Shall we explicitly check in the code that the input values belong to a certain type? This goes against the nature of Python: if a certain function or method doesn't work with a specific type, an exception will be raised.</p><p>Shall we write tests to rule out other types? Since it is basically impossible to write code that narrows the set of accepted types, it is also impossible to write <em>useful</em> tests to check this. We can check that it doesn't work on strings, but what about other sequences? We can check it doesn't work with classes that inherit from <code>Sequence</code>, but what about classes that don't inherit from it and behave the same way?</p><p>In a dynamically typed language like Python, polymorphism and operator overloading are embedded in the language. I think the deeply polymorphic nature of Python is one of the most important aspects any user of this language should understand. It is an incredibly sharp double-edged sword, as it is at the same time extremely powerful and dangerous. "Everything is an object" might sound very simple at first, but it hides a degree of complexity that sooner or later has to be faced by those who want to be proficient with the language.</p><p>I wrote some posts that might help you to understand these topics. 
You can find them grouped <a href="https://www.thedigitalcatonline.com/blog/2020/04/26/object-oriented-programming-concepts-in-python/">here</a>.</p><h2 id="final-words-9803">Final words<a class="headerlink" href="#final-words-9803" title="Permanent link">¶</a></h2><p>I hope you found the project entertaining and that you can now appreciate the power of TDD. The journey doesn't end here, though. In the next post I will discuss the practice of writing unit tests in depth, and then introduce you to another powerful tool: mocks.</p><h2 id="updates-0083">Updates<a class="headerlink" href="#updates-0083" title="Permanent link">¶</a></h2><p>2021-01-03: <a href="https://github.com/4myhw">George</a> fixed a typo, thanks!</p><p>2023-09-03: <a href="https://github.com/labdmitriy">Dmitry Labazkin</a> provided a new test for the method <code>avg</code> and a better implementation. He also asked relevant questions about type checking that I addressed in a new section. Thanks Dmitry!</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>TDD in Python with pytest - Part 12020-09-10T10:30:00+02:002023-09-03T19:00:00+02:00Leonardo Giordanitag:www.thedigitalcatonline.com,2020-09-10:/blog/2020/09/10/tdd-in-python-with-pytest-part-1/<p>This series of posts comes directly from my book <a href="https://leanpub.com/clean-architectures-in-python">Clean Architectures in Python</a>. As I am reviewing the book to prepare a second edition, I realised that Harry Percival was right when he said that the initial part on TDD shouldn't be in the book. 
That's a prerequisite …</p><p>This series of posts comes directly from my book <a href="https://leanpub.com/clean-architectures-in-python">Clean Architectures in Python</a>. As I am reviewing the book to prepare a second edition, I realised that Harry Percival was right when he said that the initial part on TDD shouldn't be in the book. That's a prerequisite for following the chapters on the clean architecture, but it is something many programmers already know and they might be surprised to find it in a book that discusses architectures.</p><p>So, I decided to move it here before I start working on a new version of the book. I also followed the advice of <a href="https://github.com/valorien">valorien</a>, who pointed out that the main example had some bad naming choices, and so I reworked the code.</p><h2 id="introduction-8835">Introduction<a class="headerlink" href="#introduction-8835" title="Permanent link">¶</a></h2><p>Test-Driven Development (TDD) is fortunately one of the names that I can spot most frequently when people talk about methodologies. Unfortunately, many programmers still do not follow it, fearing that it will impose a further burden on the already difficult life of a developer.</p><p>In this chapter I will try to outline the basic concept of TDD and to show you how your job as a programmer can greatly benefit from it. I will develop a very simple project to show how to practically write software following this methodology.</p><p>TDD is a <em>methodology</em>, something that can help you to create better code. But it is not going to solve all your problems. As with all methodologies, you have to be careful not to follow it blindly. 
Try to understand the reasons why certain practices are suggested by the methodology and you will also understand when and why you can or have to be flexible.</p><p>Keep in mind also that testing is a broader concept that doesn't end with TDD, which focuses a lot on unit testing, a specific type of test that helps you to develop the API of your library/package. There are other types of tests, like integration or functional ones, that are not specifically part of the TDD methodology, strictly speaking, even though the TDD approach can be extended to any testing activity.</p><h2 id="a-real-life-example-5470">A real-life example<a class="headerlink" href="#a-real-life-example-5470" title="Permanent link">¶</a></h2><p>Let's start with a simple example taken from a programmer's everyday life.</p><p>The programmer is in the office with other colleagues, trying to nail down an issue in some part of the software. Suddenly the boss storms into the office, and addresses the programmer:</p><p><strong>Boss</strong>: I just met with the rest of the board. Our clients are not happy, we didn't fix enough bugs in the last two months.</p><p><strong>Programmer</strong>: I see. How many bugs did we fix?</p><p><strong>Boss</strong>: Well, not enough!</p><p><strong>Programmer</strong>: OK, so how many bugs do we have to fix every month?</p><p><strong>Boss</strong>: More!</p><p>I guess you feel very sorry for the poor programmer. Apart from the aggressive attitude of the boss, what is the real issue in this conversation? At the end of it there is no hint for the programmer and their colleagues about what to do next. They don't have any clue about what they have to change. They can definitely try to work harder, but the boss didn't refer to actual figures, so it will definitely be hard for the developers to understand if they improved "enough".</p><p>The classical <a href="https://en.wikipedia.org/wiki/Sorites_paradox">sorites paradox</a> may help us understand the issue. 
One of the standard formulations, taken from the Wikipedia page, is</p><div class="callout"><div class="content"><p>1,000,000 grains of sand is a heap of sand (Premise 1)</p>
<p>A heap of sand minus one grain is still a heap. (Premise 2)</p>
<p>So 999,999 grains is a heap of sand.</p>
<p>A heap of sand minus one grain is still a heap. (Premise 2)</p>
<p>So 999,998 grains is a heap of sand.</p>
<p>…</p>
<p>So one grain is a heap of sand.</p></div></div><p>Where is the issue? The concept expressed by the word "heap" is nebulous: it is not defined clearly enough to allow the process to find a stable point, or a solution.</p><p>When you write software you face that same challenge. You cannot conceive a function and just expect it "to work", because this is not clearly defined. How do you test if the function that you wrote "works"? What do you mean by "works"? TDD forces you to <strong>clearly state your goal</strong> before you write the code. Actually, the TDD mantra is "Test first, code later", which can be translated to "Goal first, solution later". We will shortly see a practical example of this.</p><p>For the time being, consider that this is a valid practice also outside the realm of software creation. Whoever runs a business knows that you need to be able to extract some numbers (KPIs) from the activity of your company, because it is by comparing those numbers with some predefined thresholds that you can easily tell if the business is healthy or not. KPIs are a form of test, and you have to define them in advance, according to the expectations or needs that you have.</p><p>Pay attention. Nothing prevents you from changing the thresholds as a reaction to external events. You may consider that, given the incredible heat wave that hit your country, the number of coats that your company sold could not reach the goal. So, because of a specific event, you can justify a change in the test (KPI). If you didn't have the test you would have just generically recorded that you earned less money.</p><p>Going back to software and TDD, following this methodology you are forced to state clear goals like</p><div class="code"><div class="content"><div class="highlight"><pre><span class="nb">sum</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span> <span class="o">==</span> <span class="mi">9</span>
</pre></div> </div> </div><p>Let me read this test for you: there will be a <code>sum</code> function available in the system that accepts two integers. If the two integers are 4 and 5, the function will return 9.</p><p>As you can see there are many things that are tested by this statement.</p><ul><li>The function exists and can be imported</li><li>The function accepts two integers</li><li>Passing 4 and 5 as inputs, the output of the function will be 9.</li></ul><p>Note that at this stage there is no code that implements the function <code>sum</code>, so the test will fail for sure.</p><p>As we will see with a practical example in the next chapter, what I explained in this section will become a set of rules of the methodology.</p><h2 id="a-simple-tdd-project-e470">A simple TDD project<a class="headerlink" href="#a-simple-tdd-project-e470" title="Permanent link">¶</a></h2><p>The project we are going to develop is available at <a href="https://github.com/lgiordani/simple_calculator">https://github.com/lgiordani/simple_calculator</a>.</p><p>This project is purposefully extremely simple. You don't need to be an experienced Python programmer to follow this chapter, but you need to know the basics of the language. The goal of this series of posts is not to make you write the best Python code, but to allow you to learn the TDD workflow, so don't be too worried if your code is not perfect.</p><p>Methodologies are like sports or arts: you cannot learn them just by reading their description in a book. You have to practice them. Thus, you should avoid as much as possible just reading the code in this chapter passively. Instead, you should try to write the code and try new solutions to the problems that I discuss. This is very important, as it actually makes you use TDD. 
This way, at the end of the chapter you will have first-hand experience of what TDD is like.</p><p>The repository is tagged, and at the end of each section you will find a link to the corresponding tag that contains <em>my</em> working solution. Please note that it is entirely possible your solution is different from mine: there are several aspects of coding, such as style, that are not related to unit testing and TDD.</p><h2 id="setup-the-project-5c88">Set up the project<a class="headerlink" href="#setup-the-project-5c88" title="Permanent link">¶</a></h2><p>Clone the project repository and move to the branch <code>develop</code>. The branch <code>master</code> contains the full solution, and I use it to maintain the repository, but if you want to code along you need to start from scratch. I recommend you fork the repository on GitHub so that you can commit your changes.</p><div class="code"><div class="content"><div class="highlight"><pre>git clone https://github.com/YOURUSERNAME/simple_calculator
cd simple_calculator
git checkout --track origin/develop
</pre></div> </div> </div><p>Create a virtual environment following your preferred process and install the requirements</p><div class="code"><div class="content"><div class="highlight"><pre>pip install -r requirements/dev.txt
</pre></div> </div> </div><p>You should at this point be able to run</p><div class="code"><div class="content"><div class="highlight"><pre>pytest -svv
</pre></div> </div> </div><p>and get an output like</p><div class="code"><div class="content"><div class="highlight"><pre>================================ test session starts ===============================
platform XXXX -- Python XXXX, pytest-XXXX, py-XXXX, pluggy-XXXX -- XXXX
cachedir: .pytest_cache
rootdir: XXXX
configfile: XXXX
plugins: XXXX
collected 0 items
=============================== no tests ran in 0.02s ==============================
</pre></div> </div> </div><p>Here you can see the platform and a short list of the versions of the main packages involved in running pytest: Python, pytest itself, and some of its components and plugins. You can also see where pytest is reading its configuration from. As this header is standard, I will omit it from the output shown in the rest of the chapter; the specific versions of the packages are not important for this series.</p><h2 id="requirements-dd57">Requirements<a class="headerlink" href="#requirements-dd57" title="Permanent link">¶</a></h2><p>The goal of the project is to write a class <code>SimpleCalculator</code> that performs calculations: addition, subtraction, multiplication, and division. Addition and multiplication shall accept multiple arguments. Division shall return a float value, and division by zero shall return the string <code>"inf"</code>. Multiplication by zero must raise a <code>ValueError</code> exception. The class will also provide a function to compute the average of an iterable like a list. This function accepts two optional thresholds, a lower and an upper one, and should exclude from the computation the values that fall outside these boundaries.</p><p>As you can see, the requirements are pretty simple, and a couple of them, like the behaviour of division and multiplication, are definitely not "good" requirements. I added them for the sake of example, to show how to deal with exceptions when developing in TDD.</p><p>An interesting topic to discuss is that of data types: should the calculator add integers or floats? What about complex numbers, strings, and other items that can be "added" together? And what about the other operations? 
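</p><p>To make the question more concrete, here is a short sketch in plain Python (it is not part of the project and uses only built-in types) showing that the same <code>+</code> operator already means different things for different types, and that some combinations are simply rejected:</p><div class="code"><div class="content"><div class="highlight"><pre># The same + operator "adds" very different things depending on the operand types
print(4 + 5)                # integers: 9
print(4.5 + 0.5)            # floats: 5.0
print((1 + 2j) + (3 - 1j))  # complex numbers: (4+1j)
print("4" + "5")            # strings are concatenated: 45
print([4] + [5])            # lists are concatenated: [4, 5]

# Some combinations are not supported at all
try:
    4 + "5"
except TypeError as exc:
    # unsupported operand type(s) for +: 'int' and 'str'
    print("TypeError:", exc)
</pre></div> </div> </div><p>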
I consider this an advanced topic, in particular in Python, so for now I will accept only integers as inputs and discuss the problem of different types later in the series.</p><h2 id="step-1---adding-two-numbers-513b">Step 1 - Adding two numbers<a class="headerlink" href="#step-1---adding-two-numbers-513b" title="Permanent link">¶</a></h2><p>The first test we are going to write checks whether the class <code>SimpleCalculator</code> can perform an addition. Add the following code to the file <code>tests/test_main.py</code></p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">simple_calculator.main</span> <span class="kn">import</span> <span class="n">SimpleCalculator</span> <span class="callout">1</span>
<span class="k">def</span> <span class="nf">test_add_two_numbers</span><span class="p">():</span> <span class="callout">2</span>
<span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">9</span>
</pre></div> </div> </div><p>As you can see, the first thing we do is import the class <code>SimpleCalculator</code> <span class="callout">1</span> that we are supposed to write. This class doesn't exist yet; don't worry, you didn't skip any step.</p><p>The test is a standard function <span class="callout">2</span> (this is how pytest works), and its name must begin with <code>test_</code> so that pytest can discover it automatically. I tend to give my tests descriptive names, so that later it is easier to come back and understand at a glance what each test is about. You are free to follow the style you prefer, but in general remember that naming components properly is one of the most difficult things in programming, so it is better to get a handle on it as soon as possible.</p><p>The body of the test function is pretty simple. The class <code>SimpleCalculator</code> is instantiated, and the method <code>add</code> of the instance is called with two numbers, 4 and 5. The result is stored in the variable <code>result</code>, which is then the subject of the test itself. The statement <code>assert result == 9</code> first evaluates <code>result == 9</code>, a boolean expression whose value is either <code>True</code> or <code>False</code>. The keyword <code>assert</code> then silently passes if the value is <code>True</code>, but raises an exception (<code>AssertionError</code>) if it is <code>False</code>.</p><p>And this is how you write tests in pytest: if your code doesn't raise any exception the test passes, otherwise it fails. The keyword <code>assert</code> is used to force an exception when the result is wrong. Remember that pytest doesn't consider the return value of the test function, so it can detect a failure only if the test raises an exception.</p><p>Save the file and go back to the terminal. 
Execute <code>pytest -svv</code> and you should receive the following error message</p><div class="code"><div class="content"><div class="highlight"><pre>====================================== ERRORS ======================================
_______________________ ERROR collecting tests/test_main.py _______________________
[...]
tests/test_main.py:4: in &lt;module&gt;
from simple_calculator.main import SimpleCalculator
E ImportError: cannot import name 'SimpleCalculator' from 'simple_calculator.main'
!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!!
============================== 1 error in 0.20 seconds =============================
</pre></div> </div> </div><p>No surprise here, actually, as we just tried to use something that doesn't exist. This is good: the test is showing us that something we assume exists actually doesn't.</p><div class="callout"><div class="content"><p><strong>TDD rule number 1:</strong> Test first, code later</p></div></div><p>This, by the way, is not yet an error in a test. The error happens very early, during the test collection phase (as shown by the message <code>Interrupted: 1 errors during collection</code> near the bottom of the output). Even so, we are still following the methodology: we wrote a test, and it fails because of an error or a missing feature in the code.</p><p>Let's fix this issue. Open the file <code>simple_calculator/main.py</code> and add this code</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
<span class="k">pass</span>
</pre></div> </div> </div><p>But, I hear you scream, this class doesn't implement any of the project requirements. Yes, and this is the hardest lesson you have to learn when you start using TDD: the development of the code is driven by the tests, not by the requirements. The requirements are used to write the tests, and the tests are used to write the code. You shouldn't worry about anything that is more than one level above the current one in this workflow.</p><div class="callout"><div class="content"><p><strong>TDD rule number 2:</strong> Add the reasonable minimum amount of code you need to pass the tests</p></div></div><p>Run the test again, and this time you should receive a different error</p><div class="code"><div class="content"><div class="highlight"><pre>tests/test_main.py::test_add_two_numbers FAILED
===================================== FAILURES =====================================
______________________________ test_add_two_numbers _______________________________
def test_add_two_numbers():
calculator = SimpleCalculator()
> result = calculator.add(4, 5)
E AttributeError: 'SimpleCalculator' object has no attribute 'add'
tests/test_main.py:10: AttributeError
============================= 1 failed in 0.04 seconds =============================
</pre></div> </div> </div><p>This is the first proper pytest failure report that we receive. The first part of the output lists each test, prefixed by the file that contains it, together with its result</p><div class="code"><div class="content"><div class="highlight"><pre>tests/test_main.py::test_add_two_numbers FAILED
</pre></div> </div> </div><p>The syntax <code>FILENAME::TESTNAME</code> shown here can also be given directly to pytest to run a single test. Right now we have only one test, but later you might want to run a single failing test by passing the name shown here on the command line. For example</p><div class="code"><div class="content"><div class="highlight"><pre>pytest -svv tests/test_main.py::test_add_two_numbers
</pre></div> </div> </div><p>The second part of the output shows details on the failing tests, if any</p><div class="code"><div class="content"><div class="highlight"><pre>______________________________ test_add_two_numbers _______________________________
def test_add_two_numbers():
calculator = SimpleCalculator()
> result = calculator.add(4, 5)
E AttributeError: 'SimpleCalculator' object has no attribute 'add'
tests/test_main.py:10: AttributeError
</pre></div> </div> </div><p>For each failing test, pytest shows a header with the name of the test, followed by the part of the code that raised the exception. At the end of each box, pytest shows the file and the line where the error happened, followed by the type of the exception.</p><p>Back to the project. The new error is no surprise, as the test uses the method <code>add</code>, which isn't defined in the class. I bet you already guessed what I'm going to do, didn't you? This is the code that you should add to the class</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
<span class="hll"> <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span><span class="hll"> <span class="k">pass</span>
</pre></div> </div> </div><p>And again, as you can see, we made the smallest possible addition to the code to get past the current error. Running pytest again, you should receive a different error message</p><div class="code"><div class="content"><div class="highlight"><pre>_______________________________ test_add_two_numbers _______________________________
def test_add_two_numbers():
calculator = SimpleCalculator()
> result = calculator.add(4, 5)
E TypeError: add() takes 1 positional argument but 3 were given
tests/test_main.py:10: TypeError
</pre></div> </div> </div><p>The method we defined doesn't accept any argument other than <code>self</code> (<code>def add(self)</code>), but the test passes three of them: the two numbers in <code>calculator.add(4, 5)</code> plus <code>self</code>. Remember that in Python <code>self</code> is passed implicitly when you call a method on an instance. Our move at this point is to change the method to accept the parameters that it is supposed to receive, namely two numbers. The code now becomes</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
<span class="hll"> <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
</span> <span class="k">pass</span>
</pre></div> </div> </div><p>Run the test again, and you will receive another error</p><div class="code"><div class="content"><div class="highlight"><pre>______________________________ test_add_two_numbers ________________________________
def test_add_two_numbers():
calculator = SimpleCalculator()
result = calculator.add(4, 5)
> assert result == 9
E assert None == 9
E -None
E +9
tests/test_main.py:12: AssertionError
</pre></div> </div> </div><p>The method returns <code>None</code>, as its body contains only <code>pass</code>, while the test expects it to return <code>9</code>. What do you think is the minimum code you can add to pass this test?</p><p>Well, the answer is</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
<span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
<span class="hll"> <span class="k">return</span> <span class="mi">9</span>
</pre></div> </div> </div><p>and this may surprise you (it should!). You might have been tempted to add some code that performs an addition between <code>a</code> and <code>b</code>, but this would violate the TDD principles, because you would have been driven by the requirements and not by the tests.</p><p>When you run pytest again, you will be rewarded with a success message</p><div class="code"><div class="content"><div class="highlight"><pre>tests/test_main.py::test_add_two_numbers PASSED
</pre></div> </div> </div><p>I know this sounds weird, but think about it for a moment: if your code works (that is, it passes the tests), you don't need to change anything, as your tests should specify everything the code should do. Maybe in the future you will discover that this solution is not good enough, and at that point you will have to change it (in this case, this will happen with the next test). But for now everything works, and you shouldn't implement more than this.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-1-adding-two-numbers">step-1-adding-two-numbers</a></p><h2 id="step-2---adding-three-numbers-c8d7">Step 2 - Adding three numbers<a class="headerlink" href="#step-2---adding-three-numbers-c8d7" title="Permanent link">¶</a></h2><p>The requirements state that "Addition and multiplication shall accept multiple arguments". This means that we should be able to execute not only <code>add(4, 5)</code> like we did, but also <code>add(4, 5, 11)</code>, <code>add(4, 5, 11, 2)</code>, and so on. We can start testing this behaviour with the following test, which you should put in <code>tests/test_main.py</code>, after the test that we wrote previously.</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_add_three_numbers</span><span class="p">():</span>
<span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">15</span>
</pre></div> </div> </div><p>This test fails when we run the test suite</p><div class="code"><div class="content"><div class="highlight"><pre>_____________________________ test_add_three_numbers _______________________________
def test_add_three_numbers():
calculator = SimpleCalculator()
> result = calculator.add(4, 5, 6)
E TypeError: SimpleCalculator.add() takes 3 positional arguments but 4 were given
tests/test_main.py:18: TypeError
</pre></div> </div> </div><p>for the obvious reason that the function we wrote in the previous section accepts only 2 arguments other than <code>self</code>. What is the minimum code that you can write to fix this test?</p><p>Well, the simplest solution is to add another argument, so my first attempt is</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
<span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">):</span>
<span class="k">return</span> <span class="mi">9</span>
</pre></div> </div> </div><p>which solves the previous error, but creates a new one. If that wasn't enough, it also makes the first test fail!</p><div class="code"><div class="content"><div class="highlight"><pre>______________________________ test_add_two_numbers ________________________________
def test_add_two_numbers():
calculator = SimpleCalculator()
> result = calculator.add(4, 5)
E TypeError: SimpleCalculator.add() missing 1 required positional argument: 'c'
tests/test_main.py:10: TypeError
_____________________________ test_add_three_numbers _______________________________
def test_add_three_numbers():
calculator = SimpleCalculator()
result = calculator.add(4, 5, 6)
> assert result == 15
E assert 9 == 15
tests/test_main.py:20: AssertionError
</pre></div> </div> </div><p>The first test now fails because the new <code>add</code> method requires three arguments and we are passing only two. The second test fails because the method <code>add</code> returns <code>9</code> and not <code>15</code> as expected by the test.</p><p>When multiple tests fail it's easy to feel overwhelmed and lost. Where are you supposed to start fixing this? Well, one possible solution is to undo the previous change and try a different one, but in general you should try to get to a situation in which only one test fails.</p><div class="callout"><div class="content"><p><strong>TDD rule number 3:</strong> You shouldn't have more than one failing test at a time</p></div></div><p>This is very important, as it allows you to focus on one single test and thus on one single problem. Clearly, we need to keep an eye on the global problem that we are trying to solve, but real test suites can contain hundreds of tests, and it is not practical to try to tackle all of them together.</p><p>Commenting out tests to make them inactive is a perfectly valid way to have only one failing test. Pytest, however, has a smarter solution: the option <code>-k</code>, which selects only the tests whose names match a given expression. That option has a lot of expressive power, but for now we can just give it the name of the test that we want to run</p><div class="code"><div class="content"><div class="highlight"><pre>pytest -svv -k test_add_two_numbers
</pre></div> </div> </div><p>Since <code>-k</code> matches substrings of the test names, this option also allows you to select, for example, multiple tests that share the same prefix. If you want to run one specific test you can also name it on the command line with the syntax we discussed previously</p><div class="code"><div class="content"><div class="highlight"><pre>pytest -svv tests/test_main.py::test_add_two_numbers
</pre></div> </div> </div><p>Either way, pytest will run only the first test and return the same result as before, since we didn't change the test itself</p><div class="code"><div class="content"><div class="highlight"><pre>______________________________ test_add_two_numbers ________________________________
def test_add_two_numbers():
calculator = SimpleCalculator()
> result = calculator.add(4, 5)
E TypeError: SimpleCalculator.add() missing 1 required positional argument: 'c'
tests/test_main.py:10: TypeError
</pre></div> </div> </div><p>To fix this error we could simply revert the addition of the third argument, but this would mean going back to the previous solution. Tests focus on a very small part of the code, but we have to keep the big picture in mind. A better solution is to give the third argument a default value. The additive identity is <code>0</code> (adding it to a number leaves the number unchanged), so the new code of the method <code>add</code> is</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
<span class="hll"> <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span>
</span> <span class="k">return</span> <span class="mi">9</span>
</pre></div> </div> </div><p>And this makes the first test pass. At this point we can run the full suite with <code>pytest -svv</code> and see what happens</p><div class="code"><div class="content"><div class="highlight"><pre>_____________________________ test_add_three_numbers ______________________________
def test_add_three_numbers():
calculator = SimpleCalculator()
result = calculator.add(4, 5, 6)
> assert result == 15
E assert 9 == 15
tests/test_main.py:20: AssertionError
</pre></div> </div> </div><p>The second test still fails, because the value that we hard-coded doesn't match the expected one. At this point the tests show that our previous solution (<code>return 9</code>) is no longer sufficient, and we have to implement something more complex.</p><p>I want to stress this. You should implement the minimal change in the code that makes the tests pass. If that solution is not enough, there will be a test that shows it. Here, as you can see, the new requirement added a new test, and the old solution is no longer sufficient.</p><p>How can we solve this? We know that writing <code>return 15</code> will make the first test fail (you may try, if you want), so here we have to be a bit smarter and try a better solution, which in this case means implementing a real sum</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
<span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span>
<span class="hll"> <span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">+</span> <span class="n">c</span>
</pre></div> </div> </div><p>This solution makes both tests pass, so the entire suite runs without errors.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-2-adding-three-numbers">step-2-adding-three-numbers</a></p><p>I can see your face: you are probably frowning at the fact that it took us 10 minutes to write a method that adds two or three numbers. On the one hand, keep in mind that I'm going at a very slow pace, this being an introduction, and for these first tests it is better to take the time to properly understand every single step. Later, when you are used to TDD, some of these steps will become implicit. On the other hand, TDD <em>is</em> slower than untested development, but the time that you invest writing tests now is usually negligible compared to the amount of time you would spend trying to identify and fix bugs later.</p><h2 id="step-3---adding-multiple-numbers-5bb3">Step 3 - Adding multiple numbers<a class="headerlink" href="#step-3---adding-multiple-numbers-5bb3" title="Permanent link">¶</a></h2><p>The requirements are not yet satisfied, however, as they mention "multiple" numbers and not just three. How can we test that we can add an arbitrary number of values? We might add a <code>test_add_four_numbers</code>, a <code>test_add_five_numbers</code>, and so on, but these would only cover specific cases, never all of them. Sadly, it is impossible to test that generic condition, or, at least in this case, so complex that it is not worth trying.</p><p>What you should do in TDD is test boundary cases. In general, you should always try to find the so-called "corner cases" of your algorithm and write tests that show that the code covers them. 
For example, if you are testing some code that accepts as input a number from 1 to 100, you need a test that runs it with a generic number like 42 (which is far from being generic, but don't panic!), but you definitely also want a specific test that runs the algorithm with the number 1 and one that runs it with the number 100. You also want tests that show the algorithm doesn't accept 0 and 101, but we will talk later about testing error conditions.</p><p>In our example there is no real limit to the number of arguments that you can pass to the function. Before Python 3.7 there was a limit of 255 arguments, which was removed in that version of the language, but limitations like this are enforced by an external system and are not real boundaries of your algorithm.</p><p>The definition of "external system" obviously depends on what you are testing. If you are implementing a programming language you want to have tests that show how many arguments you can pass to a function, or that check the amount of memory used by certain language features. In this case we accept the Python language as the environment in which we work, so we don't want to test its features.</p><p>The solution, in this case, might be to test a reasonably high number of input arguments, to check that everything works. In particular, we should keep in mind that our goal is to devise a solution that is as generic as possible. 
For example, we can easily see that we cannot settle for a function like</p><div class="code"><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">e</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">f</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">g</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">h</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span>
</pre></div> </div> </div><p>as it is not <em>generic</em>: it just covers a greater number of inputs (up to 9 in this case, but not 10 or more).</p><p>That said, a good test might be the following</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_add_many_numbers</span><span class="p">():</span>
<span class="n">numbers</span> <span class="o">=</span> <span class="nb">range</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span>
<span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="o">*</span><span class="n">numbers</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">4950</span>
</pre></div> </div> </div><p>which creates a sequence (strictly speaking a <code>range</code>, which is an iterable) of all the numbers from 0 to 99. The sum of all those numbers is 4950 (that is, 99 × 100 / 2), which is what the algorithm should return.</p><p>Please note that the assertion doesn't implement any algorithm to find the solution. I calculated the answer manually and hard-coded it in the test. You should try as much as possible to minimise the algorithmic complexity of tests, "stating the facts" instead. The reason is simple: the more complex the code of the test is, the higher the chances of introducing a bug <em>in the test</em>.</p><p>The test suite fails because we are giving the function too many arguments</p><div class="code"><div class="content"><div class="highlight"><pre>______________________________ test_add_many_numbers _______________________________
def test_add_many_numbers():
numbers = range(100)
calculator = SimpleCalculator()
> result = calculator.add(*numbers)
E TypeError: SimpleCalculator.add() takes from 3 to 4 positional arguments but 101 were given
tests/test_main.py:28: TypeError
</pre></div> </div> </div><p>The minimum amount of code that we can add this time will not be so trivial, as we have to pass all three tests. This is actually the greatest advantage of TDD: the tests that we wrote are still there and will check that the previous conditions are still satisfied. And since tests are committed with the code, they will always be there.</p><p>The Python way to support an arbitrary number of arguments (functions that do this are technically called <em>variadic functions</em>) is the syntax <code>*args</code>, which collects all the positional arguments into a tuple stored in <code>args</code>.</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
<span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">args</span><span class="p">)</span>
</pre></div> </div> </div><p>At that point we can use the built-in function <code>sum</code> to add up all the arguments. This solution makes the whole test suite pass without errors, so it is correct.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-3-adding-multiple-numbers">step-3-adding-multiple-numbers</a></p><p>Pay attention here, please. In TDD, a solution is not correct because it is beautiful, smart, or uses the latest feature of the language. All these things are good, but TDD wants your code to pass the tests. So, your code might be ugly, convoluted, and slow, but if it passes the tests it is correct. This in turn means that TDD doesn't cover all the needs of your software project. Delivering fast routines, for example, might be part of the advantage you have over your competitors, but it is not really testable with the TDD methodology (typically, performance testing is done in a completely different way).</p><p>Part of the TDD methodology, then, deals with "refactoring", which means changing the code in a way that doesn't change the outputs, which in turn means that all your tests keep passing. Once you have a proper test suite in place, you can focus on the beauty of the code, or you can introduce smart solutions according to what the language allows you to do. We will discuss refactoring further later in this post.</p><div class="callout"><div class="content"><p><strong>TDD rule number 4:</strong> Write code that passes the test. Then refactor it.</p></div></div><h2 id="step-4---subtraction-952c">Step 4 - Subtraction<a class="headerlink" href="#step-4---subtraction-952c" title="Permanent link">¶</a></h2><p>From the requirements we know that we have to implement a function to subtract numbers, but they don't mention multiple arguments (as it would be complex to define what subtracting 3 or more numbers actually means). 
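</p><p>One way to see the ambiguity is that subtraction, unlike addition, is not associative, so the result depends on how the operands are grouped. This short sketch uses only plain integer arithmetic, nothing from the project:</p><div class="code"><div class="content"><div class="highlight"><pre># Addition is associative: the grouping does not matter
assert (10 + 3) + 2 == 10 + (3 + 2) == 15

# Subtraction is not: the grouping changes the result
left_first = (10 - 3) - 2    # 5
right_first = 10 - (3 - 2)   # 9
print(left_first, right_first)  # 5 9

# Python evaluates a - b - c from left to right, i.e. as (a - b) - c
print(10 - 3 - 2 == left_first)  # True
</pre></div> </div> </div><p>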
The test that implements this requirement is</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_subtract_two_numbers</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">7</span>
</pre></div> </div> </div><p>which doesn't pass with the following error</p><div class="code"><div class="content"><div class="highlight"><pre>____________________________ test_subtract_two_numbers ____________________________
def test_subtract_two_numbers():
calculator = SimpleCalculator()
> result = calculator.sub(10, 3)
E AttributeError: 'SimpleCalculator' object has no attribute 'sub'
tests/test_main.py:36: AttributeError
</pre></div> </div> </div><p>Now that you understand the TDD process and know that you should avoid over-engineering, you can also skip some of the steps that we went through in the previous sections. A good solution for this test is</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">sub</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">a</span> <span class="o">-</span> <span class="n">b</span>
</pre></div> </div> </div><p>which makes the test suite pass.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-4-subtraction">step-4-subtraction</a></p><h2 id="step-5---multiplication-20bb">Step 5 - Multiplication<a class="headerlink" href="#step-5---multiplication-20bb" title="Permanent link">¶</a></h2><p>It's time to move to multiplication, which has many similarities to addition. The requirements state that we have to provide a function to multiply numbers and that this function shall allow us to multiply multiple arguments. In TDD you should try to tackle problems one by one, possibly dividing a bigger requirement into multiple smaller ones.</p><p>In this case the first test can be the multiplication of two numbers, as it was for addition.</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_mul_two_numbers</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">mul</span><span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">24</span>
</pre></div> </div> </div><p>And the test suite fails as expected with the following error</p><div class="code"><div class="content"><div class="highlight"><pre>______________________________ test_mul_two_numbers _______________________________
def test_mul_two_numbers():
calculator = SimpleCalculator()
> result = calculator.mul(6, 4)
E AttributeError: 'SimpleCalculator' object has no attribute 'mul'
tests/test_main.py:44: AttributeError
</pre></div> </div> </div><p>We now face a classic TDD dilemma. Shall we implement the solution to this test as a function that multiplies two numbers, knowing that the next test will invalidate it, or shall we already consider that the target is a variadic function and thus use <code>*args</code> directly?</p><p>In this case the choice is not really important, as we are dealing with very simple functions. In other cases, however, it might be worth recognising that we are facing the same issue we already solved in a similar case, and implementing a smarter solution from the very beginning. In general, though, you should not implement anything that you don't plan to test in one of the next few tests that you will write.</p><p>If we decide to follow strict TDD, that is, to implement the simplest solution first, the bare minimum code that passes the test would be</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">mul</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">a</span> <span class="o">*</span> <span class="n">b</span>
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-5-multiply-two-numbers">step-5-multiply-two-numbers</a></p><p>To show you how to deal with redundant tests I will in this case choose the second path, and implement a smarter solution for the present test. Keep in mind, however, that it is perfectly correct to implement the solution shown above and then move on and try to solve the problem of multiple arguments later.</p><p>The problem of multiplying a tuple of numbers can be solved in Python using the function <code>reduce</code>. This function implements a typical algorithm that "reduces" an array to a single number, applying a given function. The steps of the algorithm are the following</p><ol><li>Apply the function to the first two elements</li><li>Remove the first two elements from the array</li><li>Apply the function to the result of the previous step and to the first element of the array</li><li>Remove the first element</li><li>If there are still elements in the array, go back to step 3</li></ol><p>So, suppose the function is</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">mul2</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">a</span> <span class="o">*</span> <span class="n">b</span>
</pre></div> </div> </div><p>and the array is</p><div class="code"><div class="content"><div class="highlight"><pre><span class="n">a</span> <span class="o">=</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
</pre></div> </div> </div><p>The steps followed by the algorithm will be</p><ol><li>Apply the function to 2 and 6 (first two elements). The result is <code>2 * 6</code>, that is 12</li><li>Remove the first two elements; the array is now <code>a = [4, 8, 3]</code></li><li>Apply the function to 12 (result of the previous step) and 4 (first element of the array). The new result is <code>12 * 4</code>, that is 48</li><li>Remove the first element; the array is now <code>a = [8, 3]</code></li><li>Apply the function to 48 (result of the previous step) and 8 (first element of the array). The new result is <code>48 * 8</code>, that is 384</li><li>Remove the first element; the array is now <code>a = [3]</code></li><li>Apply the function to 384 (result of the previous step) and 3 (first element of the array). The new result is <code>384 * 3</code>, that is 1152</li><li>Remove the first element; the array is now empty and the procedure ends</li></ol><p>Going back to our class <code>SimpleCalculator</code>, we might import <code>reduce</code> from the module <code>functools</code> and use it on the tuple <code>args</code>. We need to provide a function that we can define in the function <code>mul</code> itself.</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">reduce</span>
<span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
    <span class="p">[</span><span class="o">...</span><span class="p">]</span>
    <span class="k">def</span> <span class="nf">mul</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
        <span class="k">def</span> <span class="nf">mul2</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
            <span class="k">return</span> <span class="n">a</span> <span class="o">*</span> <span class="n">b</span>
        <span class="k">return</span> <span class="n">reduce</span><span class="p">(</span><span class="n">mul2</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span>
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-5-multiply-two-numbers-smart">step-5-multiply-two-numbers-smart</a></p><p>More information about the algorithm <code>reduce</code> can be found on the MapReduce Wikipedia page <a href="https://en.wikipedia.org/wiki/MapReduce">https://en.wikipedia.org/wiki/MapReduce</a>. The Python function documentation can be found at <a href="https://docs.python.org/3.10/library/functools.html#functools.reduce">https://docs.python.org/3.10/library/functools.html#functools.reduce</a>.</p><p>The above code makes the test suite pass, so we can move on and address the next problem. As happened with addition, we cannot properly test that the function accepts a potentially infinite number of arguments, so we test a reasonably high number of inputs instead.</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_mul_many_numbers</span><span class="p">():</span>
    <span class="n">numbers</span> <span class="o">=</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">mul</span><span class="p">(</span><span class="o">*</span><span class="n">numbers</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">362880</span>
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-5-multiply-many-numbers">step-5-multiply-many-numbers</a></p><p>We might use 100 arguments as we did with addition, but the multiplication of all numbers from 1 to 100 gives a result with 158 digits and I don't really need to clutter the tests file with such a monstrosity. As I said, testing multiple arguments is testing a boundary, and the idea is that if the algorithm works for 2 numbers and for 9 it will work for 10 thousand arguments as well.</p><p>If we run the test suite now all tests pass, and <em>this should worry you</em>.</p><p>Yes, you shouldn't be happy. When you follow TDD, each new test that you add should fail. If it doesn't fail, you should ask yourself whether it is worth adding that test or not. This is because chances are that you are adding a useless test, and we don't want to add useless code: code has to be maintained, so the less the better.</p><p>In this case, however, we know why the test already passes. We implemented a smarter algorithm as a solution for the first test knowing that we would end up trying to solve a more generic problem. And the value of this new test is that it shows that multiple arguments can be used, while the first test doesn't.</p><p>So, after these considerations, we can be happy that the second test already passes.</p><div class="callout"><div class="content"><p><strong>TDD rule number 5:</strong> A test should fail the first time you run it. If it doesn't, ask yourself why you are adding it.</p></div></div><h2 id="step-6---refactoring-b6bd">Step 6 - Refactoring<a class="headerlink" href="#step-6---refactoring-b6bd" title="Permanent link">¶</a></h2><p>Previously, I introduced the concept of refactoring, which means changing the code without altering the results. How can you be sure you are not altering the behaviour of your code? Well, this is what the tests are for. 
If the new code keeps passing the test suite, you can be confident that you didn't remove any of the tested features.</p><p>In theory, refactoring shouldn't add any new behaviour to the code, as it should be a behaviour-preserving transformation. There is no real practical way to check this, and we will not bother with it now. You should be concerned with this if you are discussing security, as your code shouldn't add any entry point you don't want to be there. In this case you will need tests that check the absence of features instead of their presence.</p><p>This means that if you have no tests you shouldn't refactor. But, after all, if you have no tests you shouldn't have any code, either, so refactoring shouldn't be a problem you have. If you have some code without tests (I know you have it, I do), you should seriously consider writing tests for it, at least before changing it. More on this in a later section.</p><p>For the time being, let's see if we can work on the code of the class <code>SimpleCalculator</code> without altering the results. I do not really like the definition of the function <code>mul2</code> inside the function <code>mul</code>. It is obviously perfectly fine and valid, but for the sake of example I will pretend we have to get rid of it.</p><p>Python provides a useful function to multiply two objects in the module <code>operator</code> of the standard library</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">operator</span>
<span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">reduce</span>
<span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
    <span class="p">[</span><span class="o">...</span><span class="p">]</span>
    <span class="k">def</span> <span class="nf">mul</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">reduce</span><span class="p">(</span><span class="n">operator</span><span class="o">.</span><span class="n">mul</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span>
</pre></div> </div> </div><p>Running the test suite, I can see that all the tests pass, so my refactoring is correct.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-6-refactoring">step-6-refactoring</a></p><div class="callout"><div class="content"><p><strong>TDD rule number 6:</strong> Never refactor without tests.</p></div></div><h2 id="final-words-9803">Final words<a class="headerlink" href="#final-words-9803" title="Permanent link">¶</a></h2><p>Well, I think we learned a lot. We started with no knowledge of TDD and we managed to implement a fully tested class with 3 methods. We also briefly touched on the topic of refactoring, which is of paramount importance in development. In the next post I will cover the remaining requirements: division, testing exceptions, and the average function.</p><h2 id="updates-0083">Updates<a class="headerlink" href="#updates-0083" title="Permanent link">¶</a></h2><p>2021-01-03: <a href="https://github.com/4myhw">George</a> fixed a typo, thanks!</p><p>2021-08-11: <a href="https://github.com/floatingpurr">Andrea Mignone</a> fixed a link. Thank you!</p><p>2023-09-03: <a href="https://github.com/labdmitriy">Dmitry Labazkin</a> and <a href="https://github.com/blablatdinov">Ilaletdinov Almaz</a> suggested using <code>operator.mul</code> instead of a <code>lambda</code> in the final refactoring. Thanks both!</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. 
The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>Delegation: composition and inheritance in object-oriented programming2020-08-17T09:00:00+01:002022-10-03T08:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2020-08-17:/blog/2020/08/17/delegation-composition-and-inheritance-in-object-oriented-programming/<h2 id="introduction-8835">Introduction<a class="headerlink" href="#introduction-8835" title="Permanent link">¶</a></h2><p>Object-oriented programming (OOP) is a methodology that was introduced in the 60s, though as for many other concepts related to programming languages it is difficult to give a proper date. While recent years have witnessed a second youth of functional languages, object-oriented is still a widespread paradigm among successful …</p><h2 id="introduction-8835">Introduction<a class="headerlink" href="#introduction-8835" title="Permanent link">¶</a></h2><p>Object-oriented programming (OOP) is a methodology that was introduced in the 60s, though as for many other concepts related to programming languages it is difficult to give a proper date. While recent years have witnessed a second youth of functional languages, object-oriented is still a widespread paradigm among successful programming languages, and for good reasons. OOP is not the panacea for all the architectural problems in software development, but if used correctly can give a solid foundation to any system.</p><p>It might sound obvious, but if you use an object-oriented language or a language with strong OOP traits, you have to learn this paradigm well. 
Being very active in the Python community, I see how often young programmers are introduced to the language, the main features, and the most important libraries and frameworks, <em>without a proper and detailed description of OOP and how OOP is implemented in the language</em>.</p><p>The <em>implementation</em> part is particularly important, as OOP is a set of concepts and features that are expressed theoretically and then implemented in the language, with specific traits or choices. It is very important, then, to keep in mind that the concepts behind OOP are generally shared among OOP languages, but are not tenets, and are subject to interpretation.</p><p>What is the core of OOP? Many books and tutorials mention the three pillars encapsulation, delegation, and polymorphism, but I believe these are traits of a more central concept, which is the <strong>collaboration of entities</strong>. In a well-designed OO system, we can observe a set of actors that send messages to each other to keep the system alive, responsive, and consistent.</p><p>These actors have a state, the data, and give access to it through an interface: this is <strong>encapsulation</strong>. Each actor can use functionalities implemented by another actor by sending it a message (calling a method), and when the relationship between the two is stable we have <strong>delegation</strong>. As communication happens through messages, actors are not concerned with the nature of the recipients, only with their interface, and this is <strong>polymorphism</strong>.</p><p>Alan Kay, in his "The Early History of Smalltalk", says</p><div class="callout"><div class="content"><p>In computer terms, Smalltalk is a recursion on the notion of computer itself. 
Instead of dividing "computer stuff" into things each less strong than the whole — like data structures, procedures, and functions which are the usual paraphernalia of programming languages — each Smalltalk object is a recursion on the entire possibilities of the computer. Thus its semantics are a bit like having thousands and thousands of computers all hooked together by a very fast network.</p></div></div><p>I find this extremely enlightening, as it reveals the idea behind the three pillars, and the reason why we do or don't do certain things in OOP, why we consider good to provide some automatic behaviours or to forbid specific solutions.</p><p>By the way, if you replace the word "object" with "microservice" in the quote above, you might be surprised by the description of a very modern architecture for cloud-based systems. Once again, concepts in computer science are like fractals, they are self-similar and pop up in unexpected places.</p><p>In this post, I want to focus on the second of the pillars of object-oriented programming: <strong>delegation</strong>. I will discuss its nature and the main two strategies we can follow to implement it: <strong>composition</strong> and <strong>inheritance</strong>. I will provide examples in Python and show how the powerful OOP implementation of this language opens the door to interesting atypical solutions.</p><p>For the rest of this post, I will consider objects as mini computers and the system in which they live a "very fast network", using the words of Alan Kay. Data contained in an object is the state of the computer, its methods are the input/output devices, and calling methods is the same thing as sending a message to another computer through the network.</p>
<div class="advertisement">
<a href="https://www.thedigitalcat.academy/freebie-first-class-objects">
<img src="/images/first-class-objects/cover.jpg" />
</a>
<div class="body">
<h2 id="first-class-objects-in-python-fffa">First-class objects in Python<a class="headerlink" href="#first-class-objects-in-python-fffa" title="Permanent link">¶</a></h2>
<p>Higher-order functions, wrappers, and factories</p>
<p>Learn all you need to know to understand first-class citizenship in Python, the gateway to grasp how decorators work and how functional programming can supercharge your code.</p>
<div class="actions">
<a class="action" href="https://www.thedigitalcat.academy/freebie-first-class-objects">Get your FREE copy</a>
</div>
</div>
</div>
<h2 id="delegation-in-oop-1aef">Delegation in OOP<a class="headerlink" href="#delegation-in-oop-1aef" title="Permanent link">¶</a></h2><p>Delegation is the mechanism through which an actor assigns a task or part of a task to another actor. This is not new in computer science, as any program can be split into blocks and each block generally depends on the previous ones. Furthermore, code can be isolated in libraries and reused in different parts of a program, implementing this "task assignment". In an OO system the assignee is not just the code of a function, but a full-fledged object, another actor.</p><p>The main concept to retain here is that the reason behind delegation is <strong>code reuse</strong>. We want to avoid code repetition, as it is often the source of regressions; fixing a bug in one of the repetitions doesn't automatically fix it in all of them, so keeping one single version of each algorithm is paramount to ensure the consistency of a system. Delegation helps us to keep our actors small and specialised, which makes the whole architecture more flexible and easier to maintain (if properly implemented). Changing a very big subsystem to satisfy a new requirement might affect other parts system in bad ways, so the smaller the subsystems the better (up to a certain point, where we incur in the opposite problem, but this shall be discussed in another post).</p><p>There is a <strong>dichotomy</strong> in delegation, as it can be implemented following two different strategies, which are orthogonal from many points of view, and I believe that one of the main problems that object-oriented systems have lies in the use of the wrong strategy, in particular the overuse of inheritance. 
When we create a system using an object-oriented language we need to keep in mind this dichotomy at every step of the design.</p><p>There are four areas or points of view that I want to introduce to help you to visualise delegation between actors: <strong>visibility</strong>, <strong>control</strong>, <strong>relationship</strong>, and <strong>entities</strong>. As I said previously, while these concepts apply to systems at every scale, and in particular to every object-oriented language, I will provide examples in Python.</p><h3 id="visibility-state-sharing-0703">Visibility: state sharing</h3><p>The first way to look at delegation is through the lens of state sharing. As I said before, the data contained in an object can be seen as its state, and if hearing this makes you think of components in a frontend framework or state machines, you are on the right path. The state of a computer, its memory or the data on the mass storage, can usually be freely accessed by <em>internal</em> systems, while the access is mediated for <em>external</em> ones. Indeed, the level of access to the state is probably one of the best ways to define internal and external systems in a software or hardware architecture.</p><p>When using inheritance, the child class shares its whole state with the parent class. Let's have a look at a simple example</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Parent</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_value</span> <span class="o">=</span> <span class="n">value</span> <span class="callout">3</span>
    <span class="k">def</span> <span class="nf">describe</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="callout">1</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Parent: value is </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">_value</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">Child</span><span class="p">(</span><span class="n">Parent</span><span class="p">):</span>
    <span class="k">pass</span>
<span class="o">>>></span> <span class="n">cld</span> <span class="o">=</span> <span class="n">Child</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="o">>>></span> <span class="nb">print</span><span class="p">(</span><span class="n">cld</span><span class="o">.</span><span class="n">_value</span><span class="p">)</span>
<span class="mi">5</span>
<span class="o">>>></span> <span class="n">cld</span><span class="o">.</span><span class="n">describe</span><span class="p">()</span> <span class="callout">2</span>
<span class="n">Parent</span><span class="p">:</span> <span class="n">value</span> <span class="ow">is</span> <span class="mi">5</span>
</pre></div> </div> </div><p>As you can see, <code>describe</code> is defined in <code>Parent</code> <span class="callout">1</span>, so when the instance <code>cld</code> calls it <span class="callout">2</span>, its class <code>Child</code> delegates the call to the class <code>Parent</code>. This, in turn, uses <code>_value</code> as if it were defined locally <span class="callout">3</span>, while it is defined in <code>cld</code>. This works because, from the point of view of the state, <code>Parent</code> has complete access to the state of <code>Child</code>. Please note that the state is not even enclosed in a namespace, as the state of the child class <em>becomes</em> the state of the parent class.</p><p>Composition, on the other hand, keeps the state completely private and makes the delegated object see only what is explicitly shared through message passing. A simple example of this is</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Logger</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">log</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Logger: value is </span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">Process</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_value</span> <span class="o">=</span> <span class="n">value</span> <span class="callout">1</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">logger</span> <span class="o">=</span> <span class="n">Logger</span><span class="p">()</span>
    <span class="k">def</span> <span class="nf">info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">logger</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_value</span><span class="p">)</span> <span class="callout">2</span>
<span class="o">>>></span> <span class="n">prc</span> <span class="o">=</span> <span class="n">Process</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="o">>>></span> <span class="nb">print</span><span class="p">(</span><span class="n">prc</span><span class="o">.</span><span class="n">_value</span><span class="p">)</span>
<span class="mi">5</span>
<span class="o">>>></span> <span class="n">prc</span><span class="o">.</span><span class="n">info</span><span class="p">()</span>
<span class="n">Logger</span><span class="p">:</span> <span class="n">value</span> <span class="ow">is</span> <span class="mi">5</span>
</pre></div> </div> </div><p>Here, instances of <code>Process</code> have an attribute <code>_value</code> <span class="callout">1</span> that is shared with the class <code>Logger</code> only when it comes to calling <code>Logger.log</code> <span class="callout">2</span> inside their <code>info</code> method. <code>Logger</code> objects have no visibility of the state of <code>Process</code> objects unless it is explicitly shared.</p><p>Note for advanced readers: I'm clearly mixing the concepts of instance and class here, and blatantly ignoring the resulting inconsistencies. The state of an instance is not the same thing as the state of a class, and it should also be mentioned that classes are themselves instances of metaclasses, at least in Python. What I want to point out here is that access to attributes is granted automatically to inherited classes because of the way <code>__getattribute__</code> and bound methods work, while in composition such mechanisms are not present and the effect is that the state is not shared.</p><h3 id="control-implicit-and-explicit-delegation-ffc4">Control: implicit and explicit delegation</h3><p>Another way to look at the dichotomy between inheritance and composition is that of the control we have over the process. Inheritance is usually provided by the language itself and is implemented according to rules that are part of the definition of the language. This makes inheritance an implicit mechanism: when you make a class inherit from another one, there is an automatic and implicit process that rules the delegation between the two, which makes it run outside our control.</p><p>Let's see an example of this in action using inheritance</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Window</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">title</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_title</span> <span class="o">=</span> <span class="n">title</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_size_x</span> <span class="o">=</span> <span class="n">size_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_size_y</span> <span class="o">=</span> <span class="n">size_y</span>
    <span class="k">def</span> <span class="nf">resize</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">new_size_x</span><span class="p">,</span> <span class="n">new_size_y</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_size_x</span> <span class="o">=</span> <span class="n">new_size_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_size_y</span> <span class="o">=</span> <span class="n">new_size_y</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">info</span><span class="p">()</span>
    <span class="k">def</span> <span class="nf">info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="callout">2</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Window '</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">_title</span><span class="si">}</span><span class="s2">' is </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">_size_x</span><span class="si">}</span><span class="s2">x</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">_size_y</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">TransparentWindow</span><span class="p">(</span><span class="n">Window</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">title</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">,</span> <span class="n">transparency</span><span class="o">=</span><span class="mi">50</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_title</span> <span class="o">=</span> <span class="n">title</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_size_x</span> <span class="o">=</span> <span class="n">size_x</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_size_y</span> <span class="o">=</span> <span class="n">size_y</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_transparency</span> <span class="o">=</span> <span class="n">transparency</span>
<span class="k">def</span> <span class="nf">change_transparency</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">new_transparency</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_transparency</span> <span class="o">=</span> <span class="n">new_transparency</span>
<span class="k">def</span> <span class="nf">info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="callout">1</span>
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">info</span><span class="p">()</span> <span class="callout">3</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Transparency is set to </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">_transparency</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</pre></div> </div> </div><p>At this point we can instantiate and use <code>TransparentWindow</code>:</p><div class="code"><div class="content"><div class="highlight"><pre><span class="o">>>></span> <span class="n">twin</span> <span class="o">=</span> <span class="n">TransparentWindow</span><span class="p">(</span><span class="s2">"Terminal"</span><span class="p">,</span> <span class="mi">640</span><span class="p">,</span> <span class="mi">480</span><span class="p">,</span> <span class="mi">80</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">twin</span><span class="o">.</span><span class="n">info</span><span class="p">()</span>
<span class="n">Window</span> <span class="s1">'Terminal'</span> <span class="ow">is</span> <span class="mi">640</span><span class="n">x480</span>
<span class="n">Transparency</span> <span class="ow">is</span> <span class="nb">set</span> <span class="n">to</span> <span class="mi">80</span>
<span class="o">>>></span> <span class="n">twin</span><span class="o">.</span><span class="n">change_transparency</span><span class="p">(</span><span class="mi">70</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">twin</span><span class="o">.</span><span class="n">resize</span><span class="p">(</span><span class="mi">800</span><span class="p">,</span> <span class="mi">600</span><span class="p">)</span>
<span class="n">Window</span> <span class="s1">'Terminal'</span> <span class="ow">is</span> <span class="mi">800</span><span class="n">x600</span>
<span class="n">Transparency</span> <span class="ow">is</span> <span class="nb">set</span> <span class="n">to</span> <span class="mi">70</span>
</pre></div> </div> </div><p>When we call <code>twin.info</code>, Python runs <code>TransparentWindow</code>'s implementation of that method <span class="callout">1</span> and does not automatically delegate anything to <code>Window</code>, even though the latter has a method with that name <span class="callout">2</span>. Indeed, we have to call it explicitly through <code>super</code> when we want to reuse it <span class="callout">3</span>. When we use <code>resize</code>, though, the implicit delegation kicks in and we end up executing <code>Window.resize</code>. Note that this delegation doesn't carry over to subsequent calls: when <code>Window.resize</code> calls <code>self.info</code>, Python runs <code>TransparentWindow.info</code>, because <code>self</code> is still the <code>TransparentWindow</code> instance <code>twin</code> and the attribute lookup starts again from its class.</p><p>Composition is on the other end of the spectrum, as any delegation performed through composed objects has to be explicit. Let's see an example:</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Body</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_text</span> <span class="o">=</span> <span class="n">text</span>
<span class="k">def</span> <span class="nf">info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="p">{</span>
<span class="s2">"length"</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_text</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">class</span> <span class="nc">Page</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">title</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_title</span> <span class="o">=</span> <span class="n">title</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_body</span> <span class="o">=</span> <span class="n">Body</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="p">{</span>
<span class="s2">"title"</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">_title</span><span class="p">,</span>
<span class="s2">"body"</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">_body</span><span class="o">.</span><span class="n">info</span><span class="p">()</span> <span class="callout">1</span>
<span class="p">}</span>
</pre></div> </div> </div><p>When we instantiate a <code>Page</code> and call <code>info</code>, everything works</p><div class="code"><div class="content"><div class="highlight"><pre><span class="o">>>></span> <span class="n">page</span> <span class="o">=</span> <span class="n">Page</span><span class="p">(</span><span class="s2">"New post"</span><span class="p">,</span> <span class="s2">"Some text for an exciting new post"</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">page</span><span class="o">.</span><span class="n">info</span><span class="p">()</span>
<span class="p">{</span><span class="s1">'title'</span><span class="p">:</span> <span class="s1">'New post'</span><span class="p">,</span> <span class="s1">'body'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'length'</span><span class="p">:</span> <span class="mi">34</span><span class="p">}}</span>
</pre></div> </div> </div><p>but as you can see, <code>Page.info</code> has to explicitly mention <code>Body.info</code> through <code>self._body</code> <span class="callout">1</span>, as we had to do when using inheritance with <code>super</code>. In this respect, composition is no different from inheritance when a method is overridden, at least in Python.</p><h3 id="relationship-to-be-vs-to-have-6ac5">Relationship: to be vs to have</h3><p>The third point of view from which you can look at delegation is that of the nature of the relationship between actors. Inheritance gives the child class the same nature as the parent class, with specialised behaviour. We can say that a child class implements new features or changes the behaviour of existing ones, but generally speaking, we agree that it <em>is</em> like the parent class. Think about a gaming laptop: it <em>is</em> a laptop, only with specialised features that enable it to perform well in certain situations. On the other hand, composition deals with actors that are usually made of other actors of a different nature. A simple example is that of the computer itself, which <em>has</em> a CPU, <em>has</em> mass storage, <em>has</em> memory. We can't say that the computer <em>is</em> the CPU, because that would be reductive.</p><p>This difference in the nature of the relationship between actors in a delegation is directly mapped into inheritance and composition. When using inheritance, we implement the verb <em>to be</em>:</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Car</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">colour</span><span class="p">,</span> <span class="n">max_speed</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_colour</span> <span class="o">=</span> <span class="n">colour</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_speed</span> <span class="o">=</span> <span class="mi">0</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_max_speed</span> <span class="o">=</span> <span class="n">max_speed</span>
<span class="k">def</span> <span class="nf">accelerate</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">speed</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_speed</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">speed</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_max_speed</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">SportsCar</span><span class="p">(</span><span class="n">Car</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">accelerate</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">speed</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_speed</span> <span class="o">=</span> <span class="n">speed</span>
</pre></div> </div> </div><p>Here, <code>SportsCar</code> <em>is</em> a <code>Car</code>: it can be initialised in the same way and has the same methods, though it can accelerate without the limit imposed by <code>_max_speed</code> (wow, that might be a fun ride). Since the relationship between the two actors is best described by <em>to be</em>, it is natural to use inheritance.</p><p>Composition, on the other hand, implements the verb <em>to have</em> and describes an object that is "physically" made of other objects:</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Employee</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_name</span> <span class="o">=</span> <span class="n">name</span>
<span class="k">class</span> <span class="nc">Company</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">ceo_name</span><span class="p">,</span> <span class="n">cto_name</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_ceo</span> <span class="o">=</span> <span class="n">Employee</span><span class="p">(</span><span class="n">ceo_name</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_cto</span> <span class="o">=</span> <span class="n">Employee</span><span class="p">(</span><span class="n">cto_name</span><span class="p">)</span>
</pre></div> </div> </div><p>We can say that a company is the sum of its employees (plus other things), and we easily recognise that the two classes <code>Employee</code> and <code>Company</code> have a very different nature. They don't have the same interface, and if they have methods with the same name, it is just by chance and not because they serve the same purpose.</p><h3 id="entities-classes-or-instances-1c89">Entities: classes or instances</h3><p>The last point of view that I want to explore is that of the entities involved in the delegation. When we discuss a theoretical delegation, for example saying "This Boeing 747 is a plane, thus it flies", we are describing a delegation between abstract, immaterial objects, namely generic "planes" and generic "flying objects".</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">FlyingObject</span><span class="p">:</span>
<span class="k">pass</span>
<span class="k">class</span> <span class="nc">Plane</span><span class="p">(</span><span class="n">FlyingObject</span><span class="p">):</span>
<span class="k">pass</span>
<span class="o">>>></span> <span class="n">boeing747</span> <span class="o">=</span> <span class="n">Plane</span><span class="p">()</span>
</pre></div> </div> </div><p>Since <code>Plane</code> and <code>FlyingObject</code> share the same underlying nature, their relationship is valid for all objects of that type and it is thus established between classes, which are ideas that become concrete when instantiated.</p><p>When we use composition, instead, we are putting into play a delegation that is not valid for all objects of that type, but only for those that we have connected. For example, we can separate gears from the rest of a bicycle, and it is only when we put together <em>that</em> specific set of gears and <em>that</em> bicycle that the delegation happens. So, while we can think theoretically about bicycles and gears, the actual delegation happens only when dealing with concrete objects.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Gears</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">current</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">def</span> <span class="nf">up</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">current</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">current</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">8</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">down</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">current</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">current</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">Bicycle</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">gears</span> <span class="o">=</span> <span class="n">Gears</span><span class="p">()</span> <span class="callout">1</span>
<span class="k">def</span> <span class="nf">gear_up</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">gears</span><span class="o">.</span><span class="n">up</span><span class="p">()</span> <span class="callout">2</span>
<span class="k">def</span> <span class="nf">gear_down</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">gears</span><span class="o">.</span><span class="n">down</span><span class="p">()</span> <span class="callout">3</span>
<span class="o">>>></span> <span class="n">bicycle</span> <span class="o">=</span> <span class="n">Bicycle</span><span class="p">()</span>
</pre></div> </div> </div><p>As you can see here, an instance of <code>Bicycle</code> contains an instance of <code>Gears</code> <span class="callout">1</span> and this allows us to create a delegation in the methods <code>gear_up</code> <span class="callout">2</span> and <code>gear_down</code> <span class="callout">3</span>. The delegation, however, happens between <code>bicycle</code> and <code>bicycle.gears</code>, which are both instances.</p><p>It is also possible, at least in Python, to have composition using pure classes, which is useful when the class is a pure helper or a simple container of methods (I'm not going to discuss here the benefits or the disadvantages of such a solution):</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Gears</span><span class="p">:</span>
<span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">up</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">current</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">min</span><span class="p">(</span><span class="n">current</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">8</span><span class="p">)</span>
<span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">down</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">current</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">max</span><span class="p">(</span><span class="n">current</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">Bicycle</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">gears</span> <span class="o">=</span> <span class="n">Gears</span>
<span class="bp">self</span><span class="o">.</span><span class="n">current_gear</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">def</span> <span class="nf">gear_up</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">current_gear</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">gears</span><span class="o">.</span><span class="n">up</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">current_gear</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">gear_down</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">current_gear</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">gears</span><span class="o">.</span><span class="n">down</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">current_gear</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">bicycle</span> <span class="o">=</span> <span class="n">Bicycle</span><span class="p">()</span>
</pre></div> </div> </div><p>Now, when we run <code>bicycle.gear_up</code>, the delegation happens between <code>bicycle</code>, an instance, and <code>Gears</code>, a class. We could take this further and have a class whose class methods call class methods of another class, but I won't give an example of this because it sounds a bit convoluted and probably not very reasonable to do. But it can be done.</p><p>So, we might devise a pattern here and say that in composition there is no rule that states the nature of the entities involved in the delegation, but that most of the time this happens between instances.</p><p>Note for advanced readers: in Python, classes are instances of a metaclass, usually <code>type</code>, and <code>type</code> is an instance of itself, so it is correct to say that composition always happens between instances.</p><h2 id="bad-signs-0ddc">Bad signs<a class="headerlink" href="#bad-signs-0ddc" title="Permanent link">¶</a></h2><p>Now that we have looked at the two delegation strategies from different points of view, it's time to discuss what happens when you use the wrong one. You might have heard of the "composition over inheritance" mantra, which comes from the fact that inheritance is often overused. This was not, and still is not, helped by the fact that OOP is often presented as encapsulation, inheritance, and polymorphism; open a random OOP post or book and you will see this with your own eyes.</p><p>Please, bloggers, authors, mentors, teachers, and programmers in general: <strong>stop considering inheritance the only delegation system in OOP</strong>.</p><p>That said, I think we should avoid going from one extreme to the opposite, and in general learn to use the tools languages give us. So, let's learn how to recognise the "smell" of bad code!</p><p>You are incorrectly using inheritance when:</p><ul><li>There is a clash between attributes with the same name and different meanings. 
In this case, you are incorrectly sharing the state of a parent class with the child one (visibility). With composition, the state of another object is namespaced and it's always clear which attribute you are dealing with.</li><li>You feel the need to remove methods from the child class. This is typically a sign that you are polluting the class interface (relationship) with the content of the parent class. Using composition makes it easy to expose only the methods that you want to delegate.</li></ul><p>You are incorrectly using composition when:</p><ul><li>You have to map too many methods of the contained object onto the container class just to expose them. The two objects might benefit from the automatic delegation mechanism (control) provided by inheritance, with the child class overriding only the methods that should behave differently.</li><li>You are composing instances, but creating many class methods so that the container can access them. This suggests that the delegation belongs to the classes rather than to specific instances, and the object might benefit from inheritance, where the classes delegate the method calls, instead of relying on the relationship between instances.</li></ul><p>Overall, code smells for inheritance are the need to override or delete attributes and methods, changes in one class affecting too many other classes in the inheritance tree, and big classes that contain largely unrelated methods. For composition: too many methods that just wrap methods of the contained instances, the need to pass too many arguments to methods, and classes that are nearly empty and just contain one instance of another class.</p>
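<p>The two inheritance smells listed above can be sketched in a few lines. The classes below (<code>Counter</code>, <code>InheritedInventory</code>, <code>ComposedInventory</code>, <code>InheritedStack</code>, <code>ComposedStack</code>) are hypothetical and written only for illustration, so take this as a minimal sketch rather than a recipe. The first pair shows an attribute called <code>count</code> that means two different things in parent and child; the second pair shows a child class that has to disable a method it inherited.</p>

```python
# Hypothetical classes, written only to illustrate the two inheritance smells.


class Counter:
    """Counts events in its own `count` attribute."""

    def __init__(self):
        self.count = 0  # number of events recorded so far

    def increment(self):
        self.count += 1


class InheritedInventory(Counter):
    """Smell 1: `count` here means "items in stock", in Counter it means "events"."""

    def __init__(self, items):
        super().__init__()
        self.count = len(items)  # overwrites the counter set by Counter.__init__


class ComposedInventory:
    """Composition keeps the counter's state in its own namespace."""

    def __init__(self, items):
        self.count = len(items)  # items in stock
        self.events = Counter()  # events live under self.events.count

    def record_event(self):
        self.events.increment()  # explicit delegation


class InheritedStack(list):
    """Smell 2: inheriting from list exposes methods a stack should not have."""

    def push(self, item):
        self.append(item)

    def insert(self, index, item):
        # A subclass cannot cleanly delete an inherited method, so a common
        # workaround is to override it and raise; all other list methods remain.
        raise NotImplementedError("a stack does not support insert")


class ComposedStack:
    """Composition exposes only the methods we choose to delegate."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def __len__(self):
        return len(self._items)


inventory = InheritedInventory(["bolt", "nut"])
inventory.increment()  # meant to record an event...
print(inventory.count)  # 3: ...but it changed the number of items in stock

composed = ComposedInventory(["bolt", "nut"])
composed.record_event()
print(composed.count, composed.events.count)  # 2 1
```

<p><code>ComposedStack</code> simply has no <code>insert</code>, <code>sort</code>, or <code>__setitem__</code>, while <code>InheritedStack</code> still exposes every other <code>list</code> method, which is exactly the interface pollution the second smell describes.</p>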
<div class="advertisement">
<a href="https://www.thedigitalcat.academy/freebie-first-class-objects">
<img src="/images/first-class-objects/cover.jpg" />
</a>
<div class="body">
<h2 id="first-class-objects-in-python-fffa">First-class objects in Python<a class="headerlink" href="#first-class-objects-in-python-fffa" title="Permanent link">¶</a></h2>
<p>Higher-order functions, wrappers, and factories</p>
<p>Learn all you need to know to understand first-class citizenship in Python, the gateway to grasp how decorators work and how functional programming can supercharge your code.</p>
<div class="actions">
<a class="action" href="https://www.thedigitalcat.academy/freebie-first-class-objects">Get your FREE copy</a>
</div>
</div>
</div>
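<p>The first composition smell, having to wrap many methods of the contained object, can be sketched with a stripped-down variant of the <code>Window</code> classes used earlier in this post. This is a hypothetical version for illustration only: here <code>info</code> returns a string instead of printing it, and <code>move</code> is an extra method that is not part of the original example.</p>

```python
class Window:
    """Minimal variant of the Window class above (info returns a string)."""

    def __init__(self, title, size_x, size_y):
        self._title = title
        self._size_x = size_x
        self._size_y = size_y
        self._position = (0, 0)

    def resize(self, new_size_x, new_size_y):
        self._size_x = new_size_x
        self._size_y = new_size_y

    def move(self, x, y):
        # Hypothetical extra method, added only for this illustration
        self._position = (x, y)

    def info(self):
        return f"Window '{self._title}' is {self._size_x}x{self._size_y}"


class ComposedTransparentWindow:
    """Composition: most methods only forward calls to the contained Window."""

    def __init__(self, title, size_x, size_y, transparency=50):
        self._window = Window(title, size_x, size_y)
        self._transparency = transparency

    def resize(self, new_size_x, new_size_y):
        self._window.resize(new_size_x, new_size_y)  # pure forwarding

    def move(self, x, y):
        self._window.move(x, y)  # pure forwarding

    def info(self):
        return f"{self._window.info()}, transparency {self._transparency}"


class InheritedTransparentWindow(Window):
    """Inheritance: resize and move are delegated implicitly."""

    def __init__(self, title, size_x, size_y, transparency=50):
        super().__init__(title, size_x, size_y)
        self._transparency = transparency

    def info(self):
        # Only the behaviour that differs is overridden
        return f"{super().info()}, transparency {self._transparency}"


for window in (
    ComposedTransparentWindow("Terminal", 640, 480, 80),
    InheritedTransparentWindow("Terminal", 640, 480, 80),
):
    window.resize(800, 600)
    print(window.info())  # Window 'Terminal' is 800x600, transparency 80
```

<p>Both classes behave the same, but <code>ComposedTransparentWindow</code> needs one forwarding method for every <code>Window</code> method it wants to expose, while the inherited version overrides only <code>info</code>. When forwarding methods outnumber the ones with real behaviour and the relationship is genuinely <em>to be</em>, inheritance may be the better fit.</p>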
<h2 id="domain-modelling-f8e6">Domain modelling<a class="headerlink" href="#domain-modelling-f8e6" title="Permanent link">¶</a></h2><p>We all know that there are few cases (in computer science as well as in life) where we can draw a clear line between two options and that most of the time the separation is blurry. There are many grey shades between black and white.</p><p>The same applies to composition and inheritance. While the nature of the relationship often can guide us to the best solution, we are not always dealing with the representation of real objects, and even when we do we always have to keep in mind that we are <em>modelling</em> them, not implementing them perfectly.</p><p>As a colleague of mine told me once, we have to represent reality with our code, but we have to avoid representing it too faithfully, to avoid bringing reality's limitations into our programs.</p><p>I believe this is very true, so I think that when it comes to choosing between composition an inheritance we need to be guided by the nature of the relationship <em>in our system</em>. In this, object-oriented programming and database design are very similar. When you design a database you have to think about the domain and the way you extract information, not (only) about the real-world objects that you are modelling.</p><p>Let's consider a quick example, bearing in mind that I'm only scratching the surface of something about which people write entire books. Let's pretend we are designing a web application that manages companies and their owners, and we started with the consideration that and <code>Owner</code>, well, <em>owns</em> the <code>Company</code>. This is a clear composition relationship.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Company</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
<span class="k">class</span> <span class="nc">Owner</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">first_name</span><span class="p">,</span> <span class="n">last_name</span><span class="p">,</span> <span class="n">company_name</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">first_name</span> <span class="o">=</span> <span class="n">first_name</span>
<span class="bp">self</span><span class="o">.</span><span class="n">last_name</span> <span class="o">=</span> <span class="n">last_name</span>
<span class="bp">self</span><span class="o">.</span><span class="n">company</span> <span class="o">=</span> <span class="n">Company</span><span class="p">(</span><span class="n">company_name</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">owner1</span> <span class="o">=</span> <span class="n">Owner</span><span class="p">(</span><span class="s2">"John"</span><span class="p">,</span> <span class="s2">"Doe"</span><span class="p">,</span> <span class="s2">"Pear"</span><span class="p">)</span>
</pre></div> </div> </div><p>Unfortunately, this automatically limits the number of companies owned by an <code>Owner</code> to one. If we want to relax that requirement, the best way to do it is to reverse the composition, and make the <code>Company</code> contain the <code>Owner</code>.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Owner</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">first_name</span><span class="p">,</span> <span class="n">last_name</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">first_name</span> <span class="o">=</span> <span class="n">first_name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">last_name</span> <span class="o">=</span> <span class="n">last_name</span>

<span class="k">class</span> <span class="nc">Company</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">owner_first_name</span><span class="p">,</span> <span class="n">owner_last_name</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">owner</span> <span class="o">=</span> <span class="n">Owner</span><span class="p">(</span><span class="n">owner_first_name</span><span class="p">,</span> <span class="n">owner_last_name</span><span class="p">)</span>

<span class="o">>>></span> <span class="n">company1</span> <span class="o">=</span> <span class="n">Company</span><span class="p">(</span><span class="s2">"Pear"</span><span class="p">,</span> <span class="s2">"John"</span><span class="p">,</span> <span class="s2">"Doe"</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">company2</span> <span class="o">=</span> <span class="n">Company</span><span class="p">(</span><span class="s2">"Pulses"</span><span class="p">,</span> <span class="s2">"John"</span><span class="p">,</span> <span class="s2">"Doe"</span><span class="p">)</span>
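>>> company1.owner is company2.owner  # two distinct Owner objects that hold the same data
False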
</pre></div> </div> </div><p>As you can see, this is in direct contrast with the initial modelling that comes from our perception of the relationship between the two in the real world, which in turn comes from the specific word "owner" that I used. If I had used a different word like "president" or "CEO", you would immediately accept the second solution as more natural, as the "president" is one of many employees.</p><p>The code above is not satisfactory, though, as it initialises a new <code>Owner</code> every time we create a company, while we might want to use the same instance. Again, this is not mandatory: it depends on the data contained in the <code>Owner</code> objects and on the level of consistency that we need. For example, if we add to the owner an attribute <code>online</code> to mark that they are currently using the website and can be reached on the internal chat, we don't want to cycle through all the companies and set the owner's online status in each of them when the owner is the same person. So, we might want to change the way we compose them, passing an instance of <code>Owner</code> instead of the data used to initialise it.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Owner</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">first_name</span><span class="p">,</span> <span class="n">last_name</span><span class="p">,</span> <span class="n">online</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">first_name</span> <span class="o">=</span> <span class="n">first_name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">last_name</span> <span class="o">=</span> <span class="n">last_name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">online</span> <span class="o">=</span> <span class="n">online</span>

<span class="k">class</span> <span class="nc">Company</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">owner</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">owner</span> <span class="o">=</span> <span class="n">owner</span>

<span class="o">>>></span> <span class="n">owner1</span> <span class="o">=</span> <span class="n">Owner</span><span class="p">(</span><span class="s2">"John"</span><span class="p">,</span> <span class="s2">"Doe"</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">company1</span> <span class="o">=</span> <span class="n">Company</span><span class="p">(</span><span class="s2">"Pear"</span><span class="p">,</span> <span class="n">owner1</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">company2</span> <span class="o">=</span> <span class="n">Company</span><span class="p">(</span><span class="s2">"Pulses"</span><span class="p">,</span> <span class="n">owner1</span><span class="p">)</span>
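>>> company1.owner is company2.owner  # both companies now share a single Owner instance
True
>>> owner1.online = True  # a single update...
>>> company2.owner.online  # ...is visible through every company the owner is attached to
True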
</pre></div> </div> </div><p>Clearly, if the class <code>Company</code> has no other purpose than having a name, using a class is overkill, so this design might be further reduced to an <code>Owner</code> with a list of company names.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Owner</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">first_name</span><span class="p">,</span> <span class="n">last_name</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">first_name</span> <span class="o">=</span> <span class="n">first_name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">last_name</span> <span class="o">=</span> <span class="n">last_name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">companies</span> <span class="o">=</span> <span class="p">[]</span>

<span class="o">>>></span> <span class="n">owner1</span> <span class="o">=</span> <span class="n">Owner</span><span class="p">(</span><span class="s2">"John"</span><span class="p">,</span> <span class="s2">"Doe"</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">owner1</span><span class="o">.</span><span class="n">companies</span><span class="o">.</span><span class="n">extend</span><span class="p">([</span><span class="s2">"Pear"</span><span class="p">,</span> <span class="s2">"Pulses"</span><span class="p">])</span>
</pre></div> </div> </div><p>Can we use inheritance? Now I am stretching the example to its limit, but I can accept there might be a use case for something like this.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Owner</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">first_name</span><span class="p">,</span> <span class="n">last_name</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">first_name</span> <span class="o">=</span> <span class="n">first_name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">last_name</span> <span class="o">=</span> <span class="n">last_name</span>

<span class="k">class</span> <span class="nc">Company</span><span class="p">(</span><span class="n">Owner</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">owner_first_name</span><span class="p">,</span> <span class="n">owner_last_name</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">owner_first_name</span><span class="p">,</span> <span class="n">owner_last_name</span><span class="p">)</span>

<span class="o">>>></span> <span class="n">company1</span> <span class="o">=</span> <span class="n">Company</span><span class="p">(</span><span class="s2">"Pear"</span><span class="p">,</span> <span class="s2">"John"</span><span class="p">,</span> <span class="s2">"Doe"</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">company2</span> <span class="o">=</span> <span class="n">Company</span><span class="p">(</span><span class="s2">"Pulses"</span><span class="p">,</span> <span class="s2">"John"</span><span class="p">,</span> <span class="s2">"Doe"</span><span class="p">)</span>
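>>> isinstance(company1, Owner)  # with this design a company *is* an owner
True
>>> company1.first_name  # the owner's data lives directly on the company object
'John'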
</pre></div> </div> </div><p>As I showed in the previous sections, though, this code smells as soon as we start adding something like the <code>email</code> address.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Owner</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">first_name</span><span class="p">,</span> <span class="n">last_name</span><span class="p">,</span> <span class="n">email</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">first_name</span> <span class="o">=</span> <span class="n">first_name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">last_name</span> <span class="o">=</span> <span class="n">last_name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">email</span> <span class="o">=</span> <span class="n">email</span>

<span class="k">class</span> <span class="nc">Company</span><span class="p">(</span><span class="n">Owner</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">owner_first_name</span><span class="p">,</span> <span class="n">owner_last_name</span><span class="p">,</span> <span class="n">email</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">owner_first_name</span><span class="p">,</span> <span class="n">owner_last_name</span><span class="p">,</span> <span class="n">email</span><span class="p">)</span>

<span class="o">>>></span> <span class="n">company1</span> <span class="o">=</span> <span class="n">Company</span><span class="p">(</span><span class="s2">"Pear"</span><span class="p">,</span> <span class="s2">"John"</span><span class="p">,</span> <span class="s2">"Doe"</span><span class="p">,</span> <span class="s2">"john.doe@example.com"</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">company2</span> <span class="o">=</span> <span class="n">Company</span><span class="p">(</span><span class="s2">"Pulses"</span><span class="p">,</span> <span class="s2">"John"</span><span class="p">,</span> <span class="s2">"Doe"</span><span class="p">,</span> <span class="s2">"john.doe@example.com"</span><span class="p">)</span>
</pre></div> </div> </div><p>Is <code>email</code> that of the company or the personal one of its owner? There is a clash, and this is a good example of "state pollution": both attributes have the same name, but they represent different things and might need to coexist.</p><p>In conclusion, as you can see we have to be very careful to discuss relationships between objects in the context of our domain and avoid losing connection with the business logic.</p><h2 id="mixing-the-two-composed-inheritance-dc32">Mixing the two: composed inheritance<a class="headerlink" href="#mixing-the-two-composed-inheritance-dc32" title="Permanent link">¶</a></h2><p>Speaking of blurry separations, Python offers an interesting hook to its internal attribute resolution mechanism which allows us to create a hybrid between composition and inheritance that I call "composed inheritance".</p><p>Let's have a look at what happens internally when we deal with classes that are linked through inheritance.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Parent</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span>
    <span class="k">def</span> <span class="nf">info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Value: </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">Child</span><span class="p">(</span><span class="n">Parent</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">is_even</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span>

<span class="o">>>></span> <span class="n">c</span> <span class="o">=</span> <span class="n">Child</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">c</span><span class="o">.</span><span class="n">info</span><span class="p">()</span>
<span class="n">Value</span><span class="p">:</span> <span class="mi">5</span>
<span class="o">>>></span> <span class="n">c</span><span class="o">.</span><span class="n">is_even</span><span class="p">()</span>
<span class="kc">False</span>
</pre></div> </div> </div><p>This is a trivial example of an inheritance relationship between <code>Child</code> and <code>Parent</code>, where <code>Parent</code> provides the methods <code>__init__</code> and <code>info</code> and <code>Child</code> augments the interface with the method <code>is_even</code>.</p><p>Let's have a look at the internals of the two classes. <code>Parent.__dict__</code> is</p><div class="code"><div class="content"><div class="highlight"><pre><span class="n">mappingproxy</span><span class="p">({</span><span class="s1">'__module__'</span><span class="p">:</span> <span class="s1">'__main__'</span><span class="p">,</span>
              <span class="s1">'__init__'</span><span class="p">:</span> <span class="o"><</span><span class="n">function</span> <span class="n">__main__</span><span class="o">.</span><span class="n">Parent</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span><span class="o">></span><span class="p">,</span>
              <span class="s1">'info'</span><span class="p">:</span> <span class="o"><</span><span class="n">function</span> <span class="n">__main__</span><span class="o">.</span><span class="n">Parent</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span><span class="o">></span><span class="p">,</span>
              <span class="s1">'__dict__'</span><span class="p">:</span> <span class="o"><</span><span class="n">attribute</span> <span class="s1">'__dict__'</span> <span class="n">of</span> <span class="s1">'Parent'</span> <span class="n">objects</span><span class="o">></span><span class="p">,</span>
              <span class="s1">'__weakref__'</span><span class="p">:</span> <span class="o"><</span><span class="n">attribute</span> <span class="s1">'__weakref__'</span> <span class="n">of</span> <span class="s1">'Parent'</span> <span class="n">objects</span><span class="o">></span><span class="p">,</span>
              <span class="s1">'__doc__'</span><span class="p">:</span> <span class="kc">None</span><span class="p">})</span>
</pre></div> </div> </div><p>and <code>Child.__dict__</code> is</p><div class="code"><div class="content"><div class="highlight"><pre><span class="n">mappingproxy</span><span class="p">({</span><span class="s1">'__module__'</span><span class="p">:</span> <span class="s1">'__main__'</span><span class="p">,</span>
              <span class="s1">'is_even'</span><span class="p">:</span> <span class="o"><</span><span class="n">function</span> <span class="n">__main__</span><span class="o">.</span><span class="n">Child</span><span class="o">.</span><span class="n">is_even</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span><span class="o">></span><span class="p">,</span>
              <span class="s1">'__doc__'</span><span class="p">:</span> <span class="kc">None</span><span class="p">})</span>
</pre></div> </div> </div><p>Finally, the bond between the two is established through <code>Child.__bases__</code>, which has the value <code>(__main__.Parent,)</code>.</p><p>So, when we call <code>c.is_even</code>, the instance has a bound method that comes from the class <code>Child</code>, as <code>Child.__dict__</code> contains the function <code>is_even</code>. Conversely, when we call <code>c.info</code>, Python has to fetch it from <code>Parent</code>, as <code>Child</code> can't provide it. This lookup is performed by the method <code>__getattribute__</code>, which is the core of the Python inheritance system.</p><p>As I mentioned before, however, the language provides a hook into this system, namely the method <code>__getattr__</code>, which is not defined by default. When we access an attribute, Python <em>first</em> tries to find it with the standard inheritance mechanism, and only if it can't be found does it, as a last resort, call <code>__getattr__</code>, passing the attribute name.</p><p>An example will clarify the matter.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Parent</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span>
    <span class="k">def</span> <span class="nf">info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Value: </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">Child</span><span class="p">(</span><span class="n">Parent</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">is_even</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span>
    <span class="k">def</span> <span class="fm">__getattr__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">attr</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">attr</span> <span class="o">==</span> <span class="s2">"secret"</span><span class="p">:</span>
            <span class="k">return</span> <span class="s2">"a_secret_string"</span>
        <span class="k">raise</span> <span class="ne">AttributeError</span>

<span class="o">>>></span> <span class="n">c</span> <span class="o">=</span> <span class="n">Child</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
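>>> c.secret  # the normal lookup fails, so Python calls c.__getattr__("secret")
'a_secret_string'
>>> c.missing  # for any other name, __getattr__ raises AttributeError
Traceback (most recent call last):
  ...
AttributeError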
</pre></div> </div> </div><p>Now, if we try to access <code>c.secret</code>, Python would raise an <code>AttributeError</code>, as neither <code>Child</code> nor <code>Parent</code> can provide that attribute. As a last resort, though, Python runs <code>c.__getattr__("secret")</code>, and the code of that method that we implemented in the class <code>Child</code> returns the string <code>"a_secret_string"</code>. Please note that the value of the argument <code>attr</code> is the <em>name</em> of the attribute as a string.</p><p>Because of the catch-all nature of <code>__getattr__</code>, we eventually have to raise an <code>AttributeError</code> to keep the inheritance mechanism working, unless we actually need or want to implement something very special.</p><p>This opens the door to an interesting hybrid solution where we can compose objects retaining an automatic delegation mechanism.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Parent</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span>
    <span class="k">def</span> <span class="nf">info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Value: </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">Child</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">parent</span> <span class="o">=</span> <span class="n">Parent</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">is_even</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span>
    <span class="k">def</span> <span class="fm">__getattr__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">attr</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">getattr</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">parent</span><span class="p">,</span> <span class="n">attr</span><span class="p">)</span>

<span class="o">>>></span> <span class="n">c</span> <span class="o">=</span> <span class="n">Child</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">c</span><span class="o">.</span><span class="n">value</span>
<span class="mi">5</span>
<span class="o">>>></span> <span class="n">c</span><span class="o">.</span><span class="n">info</span><span class="p">()</span>
<span class="n">Value</span><span class="p">:</span> <span class="mi">5</span>
<span class="o">>>></span> <span class="n">c</span><span class="o">.</span><span class="n">is_even</span><span class="p">()</span>
<span class="kc">False</span>
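>>> getattr(c, "value")  # the builtin getattr: same as c.value, with the name given as a string
5
>>> c.nonexistent  # delegated to the Parent instance, which can't provide it either
Traceback (most recent call last):
  ...
AttributeError: 'Parent' object has no attribute 'nonexistent'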
</pre></div> </div> </div><p>As you can see, here <code>Child</code> is composing <code>Parent</code> and there is no inheritance between the two. We can nevertheless access <code>c.value</code> and call <code>c.info</code>, thanks to the fact that <code>Child.__getattr__</code> delegates everything that can't be found in <code>Child</code> to the instance of <code>Parent</code> stored in <code>self.parent</code>.</p><p>Note: don't confuse <code>getattr</code> with <code>__getattr__</code>. The former is a builtin function that retrieves an attribute given its name, a replacement for the dotted notation when the name of the attribute is only known as a string. The latter is the hook into the inheritance mechanism that I described in this section.</p><p>Now, this is very powerful, but is it also useful?</p><p>I think this is not one of the techniques that will drastically change the way you write code in Python, but it can definitely help you use composition instead of inheritance even when the number of methods that you have to wrap is high. One of the limits of composition is that it sits at the opposite end of the automation spectrum: while inheritance is completely automatic, composition doesn't do anything for you. This means that when you compose objects you need to decide which methods or attributes of the contained objects you want to wrap, in order to expose them in the container object. Without <code>__getattr__</code>, if the class <code>Child</code> of the previous example wanted to expose the attribute <code>value</code> and the method <code>info</code>, we would need something like</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Parent</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span>
    <span class="k">def</span> <span class="nf">info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Value: </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">Child</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">parent</span> <span class="o">=</span> <span class="n">Parent</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">is_even</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span>
    <span class="k">def</span> <span class="nf">info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parent</span><span class="o">.</span><span class="n">info</span><span class="p">()</span>
    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">value</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parent</span><span class="o">.</span><span class="n">value</span>
</pre></div> </div> </div><p>As you can see, the more of the <code>Parent</code> interface <code>Child</code> wants to expose, the more wrapper methods and properties you need. To be perfectly clear, in this example the code above smells, as there are too many one-liner wrappers, which tells me it would be better to use inheritance. But if the class <code>Child</code> had a dozen methods of its own, it would suddenly make sense to do something like this, and in that case <code>__getattr__</code> might come in handy.</p><h2 id="final-words-9803">Final words<a class="headerlink" href="#final-words-9803" title="Permanent link">¶</a></h2><p>Both composition and inheritance are tools, and both exist to serve the bigger purpose of code reuse, so learn their strengths and weaknesses, so that you can choose the correct one and avoid future issues in your code.</p><p>I hope this rather long discussion helped you get a better picture of the options you have when you design an object-oriented system, and maybe also introduced some new ideas or points of view if you are already comfortable with the concepts I wrote about.</p><h2 id="updates-0083">Updates<a class="headerlink" href="#updates-0083" title="Permanent link">¶</a></h2><p>2021-03-06 Following the suggestion of <a href="https://github.com/TimoMorris">Tim Morris</a> I added the console output to the source code to make the code easier to understand. Thanks Tim for the feedback!</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. 
The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>A game of tokens: write an interpreter in Python with TDD - Part 52020-08-09T18:00:00+01:002020-08-09T18:00:00+01:00Leonardo Giordanitag:www.thedigitalcatonline.com,2020-08-09:/blog/2020/08/09/a-game-of-tokens-write-an-interpreter-in-python-with-tdd-part-5/<h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2>
<p>This is part 5 of <a href="https://www.thedigitalcatonline.com/blog/2017/05/09/a-game-of-tokens-write-an-interpreter-in-python-with-tdd-part-1/">A game of tokens</a>, a series of posts where I build an interpreter in Python following a pure TDD methodology and engaging you in a sort of a game: I give you the tests and you have to write the code that passes them …</p><h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2>
<p>This is part 5 of <a href="https://www.thedigitalcatonline.com/blog/2017/05/09/a-game-of-tokens-write-an-interpreter-in-python-with-tdd-part-1/">A game of tokens</a>, a series of posts where I build an interpreter in Python following a pure TDD methodology and engaging you in a sort of a game: I give you the tests and you have to write the code that passes them. After part 4 I had a long hiatus because I focused on other projects, but now I have resurrected this series and I'm moving on.</p>
<p>First of all, I reviewed the first 4 posts, merging the posts that contained the solutions. This is definitely better for me, and I think it is better for the reader as well, as the series should now be easier to follow. Remember, however, that you learn by doing, not by reading!</p>
<p>Secondly, I was wondering in which direction to go, and I decided to shamelessly follow the steps of Ruslan Spivak, who first inspired this set of posts and who set off to build a Pascal interpreter; you can find the impressive series of posts Ruslan wrote on <a href="https://ruslanspivak.com">his website</a>. Thank you Ruslan for the great posts!</p>
<p>So, let's go Pascal!</p>
<h2 id="tools-update">Tools update<a class="headerlink" href="#tools-update" title="Permanent link">¶</a></h2>
<p>I added black to my development toolset and used it to reformat the code</p>
<div class="highlight"><pre><span></span><code>black<span class="w"> </span>smallcalc/*.py<span class="w"> </span>tests/*.py
</code></pre></div>
<p>I also added a configuration file <code>.flake8</code> to prevent Flake8 and black from clashing</p>
<div class="highlight"><pre><span></span><code><span class="k">[flake8]</span>
<span class="c1"># Recommend matching the black line length (default 88),</span>
<span class="c1"># rather than using the flake8 default of 79:</span>
<span class="na">max-line-length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">100</span>
<span class="na">ignore</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">E231 E741</span>
</code></pre></div>
<h2 id="level-17-reserved-keywords-and-new-assignment">Level 17 - Reserved keywords and new assignment<a class="headerlink" href="#level-17-reserved-keywords-and-new-assignment" title="Permanent link">¶</a></h2>
<p>Since Pascal has reserved keywords, I need tokens that are identified by their type alone: the type is the keyword itself and the token carries no value (something similar to Erlang's atoms). Such a token must still be truthy, so I changed <code>test_empty_token_has_length_zero</code> into</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_empty_token_has_the_length_of_the_type_itself</span><span class="p">():</span>
    <span class="n">t</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="s2">"sometype"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="o">==</span> <span class="nb">len</span><span class="p">(</span><span class="s2">"sometype"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">bool</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
</code></pre></div>
<p>and modified the code in the class <code>Token</code> to pass it</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="fm">__len__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="k">else</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">type</span><span class="p">)</span>
</code></pre></div>
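<p>To see the new behaviour in isolation, here is a minimal, self-contained sketch of a token class. It assumes only what the tests rely on (a <code>type</code>, an optional <code>value</code>, and equality between tokens); the actual <code>Token</code> class in smallcalc may differ in its other details.</p>

```python
class Token:
    """Minimal stand-in for smallcalc's Token (assumed interface)."""

    def __init__(self, _type, value=None):
        self.type = _type
        self.value = value

    def __len__(self):
        # Value-less tokens (such as keywords) fall back to the length of
        # their type, so a token like Token("BEGIN") is truthy.
        return len(self.value) if self.value else len(self.type)

    def __eq__(self, other):
        return (self.type, self.value) == (other.type, other.value)

    def __repr__(self):
        return f"Token({self.type!r}, {self.value!r})"


begin = Token("BEGIN")
print(len(begin), bool(begin))  # 5 True
```

<p>Without the fallback to the type, a keyword token would have length zero and Python would consider it false, so a check like <code>if name:</code> in <code>get_token</code> would skip it (assuming <code>_set_current_token_and_skip</code> returns the token it receives).</p>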
<p>The keywords I will introduce in this post are <code>BEGIN</code> and <code>END</code>, so I need a test that shows they are supported</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_begin_and_end</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN END"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">BEGIN</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">END</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">),</span>
    <span class="p">]</span>
</code></pre></div>
<p>The block <code>BEGIN ... END</code> is a generic compound block in Pascal (more on this later), and in the simplified version we are implementing a Pascal program is made of that plus a final dot. Since the dot is already used for floats, I need a test that shows it is lexed correctly.</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_final_dot</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN END."</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">BEGIN</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">END</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">DOT</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">),</span>
    <span class="p">]</span>
</code></pre></div>
<p>Last, Pascal assignments are slightly different from what we already implemented, as they use the symbol <code>:=</code> instead of just <code>=</code>. We face a choice here, as we have to decide where to put the logic of our programming language: shall the lexer identify <code>:</code> and <code>=</code> separately, and let the parser deal with the two tokens in sequence, or shall the lexer emit an <code>ASSIGNMENT</code> token directly? I went for the first option, so that the lexer can be kept simple (no lookahead in it), but you are obviously free to try something different. For me, the test that checks the assignment is</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_assignment_and_semicolon</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"a := 5;"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">NAME</span><span class="p">,</span> <span class="s2">"a"</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">":"</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">"="</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s2">"5"</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">";"</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">),</span>
    <span class="p">]</span>
</code></pre></div>
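<p>To make the trade-off concrete, the sketch below compares the two options on plain strings. Both functions are hypothetical toy tokenizers, much simpler than <code>CalcLexer</code>: <code>split_literals</code> emits <code>:</code> and <code>=</code> as separate tokens, as in the test above, while <code>with_assign</code> tries <code>:=</code> before any single character, which is exactly the lookahead the first option avoids.</p>

```python
import re

# Hypothetical toy patterns: names, integers, and single non-space characters.
SPLIT_RE = re.compile(r"\s*(?:([A-Za-z_]+)|(\d+)|(\S))")
# Same, but ':=' is tried before any single character (one-character lookahead).
ASSIGN_RE = re.compile(r"\s*(?:([A-Za-z_]+)|(\d+)|(:=)|(\S))")


def split_literals(text):
    """':' and '=' are emitted as two separate LITERAL tokens."""
    tokens = []
    for name, number, literal in SPLIT_RE.findall(text):
        if name:
            tokens.append(("NAME", name))
        elif number:
            tokens.append(("INTEGER", number))
        else:
            tokens.append(("LITERAL", literal))
    return tokens


def with_assign(text):
    """':=' is emitted as a single ASSIGN token."""
    tokens = []
    for name, number, assign, literal in ASSIGN_RE.findall(text):
        if name:
            tokens.append(("NAME", name))
        elif number:
            tokens.append(("INTEGER", number))
        elif assign:
            tokens.append(("ASSIGN", assign))
        else:
            tokens.append(("LITERAL", literal))
    return tokens


print(with_assign("a := 5;"))
# [('NAME', 'a'), ('ASSIGN', ':='), ('INTEGER', '5'), ('LITERAL', ';')]
```

<p>One side effect of emitting separate tokens: since whitespace between tokens is skipped, <code>a : = 5</code> produces the same tokens as <code>a := 5</code>, so a parser that simply discards <code>:</code> and then <code>=</code> will accept both spellings, while the lookahead version only produces <code>ASSIGN</code> for the unbroken symbol.</p>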
<p>You may have noticed that I also decided to check for the semicolon in this test. Here, too, we might debate whether it's meaningful to test two different things together. Generally speaking, I'm in favour of a high granularity in tests, but in practice this means that I avoid testing <em>unrelated</em> and <em>complicated</em> features together. In Pascal, the semicolon is used to separate statements, so it is likely to be found at the end of something like an assignment. For this reason, and considering that it's a small feature, I tested it in context here, and I will extract it into its own test if more complex requirements arise in the future.</p>
<p>The parser has to be changed to support the new assignment, and to do that we first need to change the tests. The symbol <code>=</code> has to be replaced with <code>:=</code> in the following tests: <code>test_parse_assignment</code>, <code>test_parse_assignment_with_expression</code>, <code>test_parse_assignment_expression_with_variables</code>, and <code>test_parse_line_supports_assigment</code>.</p>
<h3 id="solution">Solution<a class="headerlink" href="#solution" title="Permanent link">¶</a></h3>
<p>Supporting reserved keywords is just a matter of defining specific token types for them</p>
<div class="highlight"><pre><span></span><code><span class="n">BEGIN</span> <span class="o">=</span> <span class="s2">"BEGIN"</span>
<span class="n">END</span> <span class="o">=</span> <span class="s2">"END"</span>
<span class="n">RESERVED_KEYWORDS</span> <span class="o">=</span> <span class="p">[</span><span class="n">BEGIN</span><span class="p">,</span> <span class="n">END</span><span class="p">]</span>
</code></pre></div>
<p>and changing the method <code>_process_name</code> in order to detect them</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">_process_name</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="n">regexp</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s2">"[a-zA-Z_]+"</span><span class="p">)</span>
    <span class="n">match</span> <span class="o">=</span> <span class="n">regexp</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">tail</span><span class="p">)</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">match</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="n">token_string</span> <span class="o">=</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">()</span>
    <span class="k">if</span> <span class="n">token_string</span> <span class="ow">in</span> <span class="n">RESERVED_KEYWORDS</span><span class="p">:</span>
        <span class="n">tok</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">token_string</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">tok</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">NAME</span><span class="p">,</span> <span class="n">token_string</span><span class="p">)</span>
    <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_set_current_token_and_skip</span><span class="p">(</span><span class="n">tok</span><span class="p">)</span>
</code></pre></div>
<p>I decided to put the logic in this method because, after all, reserved keywords are just names with a specific meaning. I could have created a dedicated method <code>_process_keyword</code>, but it would basically have been a copy of <code>_process_name</code>, so this solution makes sense to me.</p>
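<p>The same logic can be tried outside the lexer. This is a standalone sketch, not smallcalc code: <code>classify_name</code> is a hypothetical helper, tokens are plain <code>(type, value)</code> tuples, and the keywords are kept in a set instead of a list.</p>

```python
import re

NAME = "NAME"
BEGIN = "BEGIN"
END = "END"
RESERVED_KEYWORDS = {BEGIN, END}

NAME_RE = re.compile(r"[a-zA-Z_]+")


def classify_name(text):
    """Return a (type, value) tuple for the name at the start of text, or None."""
    match = NAME_RE.match(text)
    if not match:
        return None
    token_string = match.group()
    if token_string in RESERVED_KEYWORDS:
        # Keywords are identified by their type alone and carry no value.
        return (token_string, None)
    return (NAME, token_string)


for text in ["BEGIN", "END.", "x := 5", "BEGINNING", "5"]:
    print(repr(text), "->", classify_name(text))
# 'BEGIN' -> ('BEGIN', None)
# 'END.' -> ('END', None)
# 'x := 5' -> ('NAME', 'x')
# 'BEGINNING' -> ('NAME', 'BEGINNING')
# '5' -> None
```

<p>Since the whole match is compared against the keywords, a name such as <code>BEGINNING</code> stays a <code>NAME</code> instead of being split into <code>BEGIN</code> followed by <code>NING</code>.</p>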
<p>To support the final dot I added a token for it</p>
<div class="highlight"><pre><span></span><code><span class="n">DOT</span> <span class="o">=</span> <span class="s2">"DOT"</span>
</code></pre></div>
<p>and a processing method</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_process_dot</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">regexp</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s2">"\.$"</span><span class="p">)</span>
        <span class="n">match</span> <span class="o">=</span> <span class="n">regexp</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">tail</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">match</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_set_current_token_and_skip</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">DOT</span><span class="p">))</span>
</code></pre></div>
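<p>The anchor <code>$</code> is what makes this the <em>final</em> dot: <code>match</code> already anchors the pattern at the beginning of the tail, and <code>$</code> requires the dot to be the last character (in Python, <code>$</code> also matches just before a trailing newline). A quick standalone check, using a hypothetical helper <code>is_final_dot</code> that applies the same regular expression to a string:</p>

```python
import re

# Same pattern as in _process_dot.
FINAL_DOT_RE = re.compile(r"\.$")


def is_final_dot(tail):
    """True if tail is a single dot, optionally followed by a newline."""
    return FINAL_DOT_RE.match(tail) is not None


for tail in [".", ".\n", ".5", ". ", "END."]:
    print(repr(tail), is_final_dot(tail))
# '.' True
# '.\n' True
# '.5' False
# '. ' False
# 'END.' False
```

<p>Note that, with this regular expression alone, a dot followed by trailing whitespace is not recognised as the final dot.</p>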
<p>which is then called early in <code>get_token</code>, right after the checks for EOF and EOL, so that it is tried before all the other processors</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">get_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">eof</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_eof</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">eof</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">eof</span>
        <span class="n">eol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_eol</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">eol</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">eol</span>
        <span class="n">dot</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_dot</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">dot</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">dot</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_process_whitespace</span><span class="p">()</span>
        <span class="n">name</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_name</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">name</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">name</span>
        <span class="n">number</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_number</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">number</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">number</span>
        <span class="n">literal</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_literal</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">literal</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">literal</span>
</code></pre></div>
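<p>Since <code>get_token</code> is a chain of "try this, otherwise try the next one" steps, the order of the calls is the priority of the processors. The sketch below makes that explicit with a hypothetical helper <code>first_token</code> and two stand-in processors working on plain strings; for the sake of the example, it assumes that the literal processor accepts any single character.</p>

```python
def first_token(processors, text):
    """Return the first truthy result of the processors, tried in order."""
    for process in processors:
        tok = process(text)
        if tok:
            return tok
    return None


# Hypothetical stand-ins for the lexer's _process_* methods.
def process_dot(text):
    return ("DOT", None) if text == "." else None


def process_literal(text):
    # Assumption: any single character is accepted as a literal.
    return ("LITERAL", text[0]) if text else None


print(first_token([process_dot, process_literal], "."))  # ('DOT', None)
print(first_token([process_literal, process_dot], "."))  # ('LITERAL', '.')
```

<p>With the dot processor first, a lone <code>.</code> becomes a <code>DOT</code> token; with the order swapped, the generic literal processor claims it. Keeping the processors in a list, as <code>first_token</code> does, is one possible refactoring of <code>get_token</code>, although <code>_process_whitespace</code>, which is called only for its side effect, would still need a special place in the sequence.</p>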
<p>To pass the parser tests I just need to change the implementation of <code>parse_assignment</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">parse_assignment</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="n">variable</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_variable</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">":"</span><span class="p">))</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">"="</span><span class="p">))</span>
    <span class="n">value</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
</code></pre></div>
<h2 id="level-18-statements-and-compound-statements">Level 18 - Statements and compound statements<a class="headerlink" href="#level-18-statements-and-compound-statements" title="Permanent link">¶</a></h2>
<p>In Pascal a compound statement is a list of statements enclosed between <code>BEGIN</code> and <code>END</code>, so the final grammar we want to have in this post is</p>
<div class="highlight"><pre><span></span><code>compound_statement : BEGIN statement_list END
statement_list : statement | statement SEMI statement_list
statement : compound_statement | assignment_statement | empty
assignment_statement : variable ASSIGN expr
</code></pre></div>
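<p>Each rule of this grammar maps naturally to one parsing function, and the recursion in the grammar becomes recursion between functions. The sketch below is a toy recursive descent parser for these four rules only, not the smallcalc implementation: it works on a list of space-separated token strings (so <code>str.split</code> acts as the lexer), it limits <code>expr</code> to a single integer, and it returns dictionaries shaped like the <code>asdict()</code> output used in the tests of this post. Since the semicolon <em>separates</em> statements, <code>BEGIN x := 5 ; END</code> ends with an empty statement, which the sketch drops.</p>

```python
class Parser:
    """Toy recursive descent parser for the compound statement grammar."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos >= len(self.tokens):
            return None
        return self.tokens[self.pos]

    def next_token(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def discard(self, expected):
        tok = self.next_token()
        if tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")

    # compound_statement : BEGIN statement_list END
    def compound_statement(self):
        self.discard("BEGIN")
        statements = self.statement_list()
        self.discard("END")
        return {"type": "compound_statement", "statements": statements}

    # statement_list : statement | statement SEMI statement_list
    def statement_list(self):
        statement = self.statement()
        statements = [statement] if statement else []  # drop empty statements
        if self.peek() == ";":
            self.discard(";")
            statements.extend(self.statement_list())
        return statements

    # statement : compound_statement | assignment_statement | empty
    def statement(self):
        if self.peek() == "BEGIN":
            return self.compound_statement()
        if self.peek() not in (None, "END", ";"):
            return self.assignment_statement()
        return None  # the empty statement

    # assignment_statement : variable ASSIGN expr (expr is an integer here)
    def assignment_statement(self):
        variable = self.next_token()
        self.discard(":=")
        value = int(self.next_token())
        return {
            "type": "assignment",
            "variable": variable,
            "value": {"type": "integer", "value": value},
        }


tokens = "BEGIN x := 5 ; BEGIN y := 6 END ; END".split()
print(Parser(tokens).compound_statement())
```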
<p>As you can see, this is a recursive definition: a <code>statement_list</code> contains one or more <code>statement</code>s, and each of them can in turn be a <code>compound_statement</code>. The following is indeed a valid Pascal program</p>
<div class="highlight"><pre><span></span><code><span class="k">BEGIN</span>
<span class="w">    </span><span class="k">BEGIN</span>
<span class="w">        </span><span class="k">BEGIN</span>
<span class="w">            </span><span class="nb">writeln</span><span class="p">(</span><span class="s1">'Valid!'</span><span class="p">)</span>
<span class="w">        </span><span class="k">END</span>
<span class="w">    </span><span class="k">END</span>
<span class="k">END</span><span class="o">.</span>
</code></pre></div>
<p>Recursive algorithms are not simple, and it takes some time to tackle them properly, so let's implement one small feature at a time. The first test checks that <code>parse_statement</code> is able to parse assignments</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_statement_assignment</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"x := 5"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
        <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
        <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
    <span class="p">}</span>
</code></pre></div>
<p>In the future, statements will be more than just assignments, so this test is the first of many that we will eventually have for <code>parse_statement</code>. The second test checks that a compound statement can contain an empty list of statements.</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_empty_compound_statement</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN END"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_compound_statement</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span> <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[]}</span>
</code></pre></div>
<p>After this is done, I want to test that a compound statement can contain a single statement</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_compound_statement_one_statement</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN x:= 5 END"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_compound_statement</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
        <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
            <span class="p">}</span>
        <span class="p">],</span>
    <span class="p">}</span>
</code></pre></div>
<p>and multiple statements separated by semicolons</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_compound_statement_multiple_statements</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN x:= 5; y:=6; z:=7 END"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_compound_statement</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
        <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"y"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">6</span><span class="p">},</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"z"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">7</span><span class="p">},</span>
            <span class="p">},</span>
        <span class="p">],</span>
    <span class="p">}</span>
</code></pre></div>
<h3 id="solution_1">Solution<a class="headerlink" href="#solution_1" title="Permanent link">¶</a></h3>
<p>To pass the first test it is sufficient to add a method <code>parse_statement</code> that calls <code>parse_assignment</code></p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_assignment</span><span class="p">()</span>
</code></pre></div>
<p>The second test requires a bit more code. I need to define a method <code>parse_compound_statement</code> that returns a new type of node. A compound statement is a list of statements that have to be executed in order, so it's time to define a class <code>CompoundStatementNode</code></p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">CompoundStatementNode</span><span class="p">(</span><span class="n">Node</span><span class="p">):</span>
    <span class="n">node_type</span> <span class="o">=</span> <span class="s2">"compound_statement"</span>

    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">statements</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">statements</span> <span class="o">=</span> <span class="n">statements</span> <span class="k">if</span> <span class="n">statements</span> <span class="k">else</span> <span class="p">[]</span>

    <span class="k">def</span> <span class="nf">asdict</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">{</span>
            <span class="s2">"type"</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">node_type</span><span class="p">,</span>
            <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span><span class="n">statement</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="k">for</span> <span class="n">statement</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">statements</span><span class="p">],</span>
        <span class="p">}</span>
</code></pre></div>
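<p>To try the class on its own, the sketch below pairs it with minimal stand-ins for <code>Node</code>, an integer node and an assignment node. The stand-ins are hypothetical: they only reproduce the <code>asdict()</code> shapes expected by the tests above, not smallcalc's actual node classes.</p>

```python
class Node:
    """Hypothetical minimal base class."""

    node_type = None


class IntegerNode(Node):
    node_type = "integer"

    def __init__(self, value):
        self.value = value

    def asdict(self):
        return {"type": self.node_type, "value": self.value}


class AssignmentNode(Node):
    node_type = "assignment"

    def __init__(self, variable, value):
        self.variable = variable
        self.value = value

    def asdict(self):
        return {
            "type": self.node_type,
            "variable": self.variable,
            "value": self.value.asdict(),
        }


class CompoundStatementNode(Node):
    node_type = "compound_statement"

    def __init__(self, statements=None):
        # A fresh list per instance: a mutable default argument would be shared.
        self.statements = statements if statements else []

    def asdict(self):
        return {
            "type": self.node_type,
            "statements": [statement.asdict() for statement in self.statements],
        }


inner = CompoundStatementNode([AssignmentNode("y", IntegerNode(6))])
outer = CompoundStatementNode([AssignmentNode("x", IntegerNode(5)), inner])
print(outer.asdict())
```

<p>Each node serialises its children by calling their own <code>asdict()</code>, so a compound statement nested inside another one is converted recursively, mirroring the recursion in the grammar.</p>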
<p>and at this point <code>parse_compound_statement</code> is trivial, at least for now</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_compound_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">BEGIN</span><span class="p">))</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">END</span><span class="p">))</span>
        <span class="k">return</span> <span class="n">CompoundStatementNode</span><span class="p">()</span>
</code></pre></div>
<p>With the third test we have to add the processing of a single statement. As the statement is optional, this is a good use case for our lexer as a context manager</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_compound_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">nodes</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">BEGIN</span><span class="p">))</span>
        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="n">statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
            <span class="k">if</span> <span class="n">statement_node</span><span class="p">:</span>
                <span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">statement_node</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">END</span><span class="p">))</span>
        <span class="k">return</span> <span class="n">CompoundStatementNode</span><span class="p">(</span><span class="n">nodes</span><span class="p">)</span>
</code></pre></div>
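<p>The reason <code>with self.lexer:</code> can express an optional part of the grammar is that a context manager can remember the lexer position when the block starts and, if the parsing inside the block fails, rewind to it and swallow the error. Here is a minimal sketch of that mechanism, assuming tokens are plain strings. It is not the smallcalc lexer: <code>BacktrackingLexer</code>, <code>ParseError</code>, <code>parse_assignment</code> and <code>parse_block</code> are hypothetical names used only for illustration.</p>

```python
class ParseError(Exception):
    """Raised when the tokens do not match what the parser expects."""


class BacktrackingLexer:
    """Hypothetical token stream; 'with lexer:' rewinds it if parsing fails."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.position = 0
        self._saved = []  # one saved position per open 'with' block

    def __enter__(self):
        # Remember where we are before trying an optional construct
        self._saved.append(self.position)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        saved = self._saved.pop()
        if exc_type is not None and issubclass(exc_type, ParseError):
            # The optional construct is not there: rewind and swallow the error
            self.position = saved
            return True
        # Success, or an unrelated error: keep the position, do not suppress
        return False

    def peek(self):
        # Next token without consuming it, or None at the end of the input
        if self.position == len(self.tokens):
            return None
        return self.tokens[self.position]

    def next_token(self):
        token = self.peek()
        if token is None:
            raise ParseError("unexpected end of input")
        self.position += 1
        return token

    def discard(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.peek()!r}")
        self.position += 1


def parse_assignment(lexer):
    """NAME ':=' INTEGER"""
    name = lexer.next_token()
    if not name.isidentifier() or name in ("BEGIN", "END"):
        raise ParseError(f"expected a variable name, found {name!r}")
    lexer.discard(":=")
    value = lexer.next_token()
    if not value.isdigit():
        raise ParseError(f"expected an integer, found {value!r}")
    return {
        "type": "assignment",
        "variable": name,
        "value": {"type": "integer", "value": int(value)},
    }


def parse_block(lexer):
    """BEGIN [assignment] END, where the assignment is optional."""
    nodes = []
    lexer.discard("BEGIN")
    with lexer:
        nodes.append(parse_assignment(lexer))
    lexer.discard("END")
    return {"type": "compound_statement", "statements": nodes}
```

<p>With <code>["BEGIN", "END"]</code> the failed assignment is rolled back and the block is empty, while <code>["BEGIN", "x", ":=", "5", "END"]</code> produces one assignment node. Note that <code>__exit__</code> suppresses only <code>ParseError</code>, so any other exception still propagates.</p>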
<p>And finally, for the fourth test, I have to process optional further statements separated by semicolons. For this, I make use of the method <code>peek_token</code> to look ahead and see if there is another statement to process</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">parse_compound_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">nodes</span> <span class="o">=</span> <span class="p">[]</span>
<span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">BEGIN</span><span class="p">))</span>
<span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
<span class="n">statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
<span class="k">if</span> <span class="n">statement_node</span><span class="p">:</span>
<span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">statement_node</span><span class="p">)</span>
<span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span> <span class="o">==</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">";"</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">";"</span><span class="p">))</span>
<span class="n">statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
<span class="k">if</span> <span class="n">statement_node</span><span class="p">:</span>
<span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">statement_node</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">END</span><span class="p">))</span>
<span class="k">return</span> <span class="n">CompoundStatementNode</span><span class="p">(</span><span class="n">nodes</span><span class="p">)</span>
</code></pre></div>
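<p>Stripped of the lexer details, the loop above is the classic pattern for parsing <code>statement (";" statement)*</code>: look at the next token without consuming it, and keep going while it is a separator. A minimal sketch, assuming each statement is a single string token (a simplification made only for this example; <code>split_statements</code> is a hypothetical name):</p>

```python
def split_statements(tokens):
    """Parse STATEMENT (';' STATEMENT)* with one token of look-ahead.

    Each statement is assumed to be a single string token. Returns the list
    of statements and the index of the first token that was not consumed.
    """

    def peek(index):
        # Look at a token without consuming it; None means end of input
        return tokens[index] if index != len(tokens) else None

    if peek(0) is None:
        raise ValueError("expected a statement")
    statements = [tokens[0]]
    index = 1
    while peek(index) == ";":
        # A separator must be followed by another statement
        if peek(index + 1) is None:
            raise ValueError("expected a statement after ';'")
        statements.append(tokens[index + 1])
        index += 2
    return statements, index
```

<p>For example, <code>split_statements(["x:=5", ";", "y:=6", ";", "z:=7", "END"])</code> returns <code>(["x:=5", "y:=6", "z:=7"], 5)</code>, leaving <code>"END"</code> for the caller to consume.</p>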
<h2 id="level-19-recursive-compound-statements">Level 19 - Recursive compound statements<a class="headerlink" href="#level-19-recursive-compound-statements" title="Permanent link">¶</a></h2>
<p>To verify that compound statements are actually recursive, we can add this test</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_compound_statement_multiple_statements_with_compound_statement</span><span class="p">():</span>
<span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
<span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN x:= 5; BEGIN y := 6 END ; z:=7 END"</span><span class="p">)</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_compound_statement</span><span class="p">()</span>
<span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
<span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
<span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
<span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
<span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
<span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"y"</span><span class="p">,</span>
<span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">6</span><span class="p">},</span>
<span class="p">}</span>
<span class="p">],</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
<span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"z"</span><span class="p">,</span>
<span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">7</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">],</span>
<span class="p">}</span>
</code></pre></div>
<p>where the second statement is itself a compound statement. Once this passes, we can test the visitor (<code>tests/test_calc_visitor.py</code>) and check that it can process a single statement</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_compound_statement_one_statement</span><span class="p">():</span>
<span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
<span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
<span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
<span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
<span class="p">}</span>
<span class="p">],</span>
<span class="p">}</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">None</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">isvariable</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">5</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="o">==</span> <span class="s2">"integer"</span>
</code></pre></div>
<p>then multiple statements</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_compound_statement_multiple_statements</span><span class="p">():</span>
<span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
<span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
<span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
<span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
<span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"y"</span><span class="p">,</span>
<span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">6</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
<span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"z"</span><span class="p">,</span>
<span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">7</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">],</span>
<span class="p">}</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">None</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">isvariable</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">5</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="o">==</span> <span class="s2">"integer"</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">isvariable</span><span class="p">(</span><span class="s2">"y"</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="s2">"y"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">6</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="s2">"y"</span><span class="p">)</span> <span class="o">==</span> <span class="s2">"integer"</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">isvariable</span><span class="p">(</span><span class="s2">"z"</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="s2">"z"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">7</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="s2">"z"</span><span class="p">)</span> <span class="o">==</span> <span class="s2">"integer"</span>
</code></pre></div>
<p>and recursive compound statements</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_compound_statement_multiple_statements_with_compound_statement</span><span class="p">():</span>
<span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
<span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
<span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
<span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
<span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
<span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"y"</span><span class="p">,</span>
<span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">6</span><span class="p">},</span>
<span class="p">}</span>
<span class="p">],</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
<span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"z"</span><span class="p">,</span>
<span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">7</span><span class="p">},</span>
<span class="p">},</span>
<span class="p">],</span>
<span class="p">}</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">None</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">isvariable</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">5</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="o">==</span> <span class="s2">"integer"</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">isvariable</span><span class="p">(</span><span class="s2">"y"</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="s2">"y"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">6</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="s2">"y"</span><span class="p">)</span> <span class="o">==</span> <span class="s2">"integer"</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">isvariable</span><span class="p">(</span><span class="s2">"z"</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="s2">"z"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">7</span>
<span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="s2">"z"</span><span class="p">)</span> <span class="o">==</span> <span class="s2">"integer"</span>
</code></pre></div>
<h3 id="solution_2">Solution<a class="headerlink" href="#solution_2" title="Permanent link">¶</a></h3>
<p>Before adding the first test, I quickly refactored the code to follow the grammar a bit more closely, introducing <code>parse_statement_list</code> and calling it from <code>parse_compound_statement</code>. This is just a matter of moving the part of the code that deals with the list of statements into its own method</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">parse_statement_list</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">nodes</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
<span class="k">if</span> <span class="n">statement_node</span><span class="p">:</span>
<span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">statement_node</span><span class="p">)</span>
<span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span> <span class="o">==</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">";"</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">";"</span><span class="p">))</span>
<span class="n">statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
<span class="k">if</span> <span class="n">statement_node</span><span class="p">:</span>
<span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">statement_node</span><span class="p">)</span>
<span class="k">return</span> <span class="n">nodes</span>
<span class="k">def</span> <span class="nf">parse_compound_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">nodes</span> <span class="o">=</span> <span class="p">[]</span>
<span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">BEGIN</span><span class="p">))</span>
<span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
<span class="n">nodes</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement_list</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">END</span><span class="p">))</span>
<span class="k">return</span> <span class="n">CompoundStatementNode</span><span class="p">(</span><span class="n">nodes</span><span class="p">)</span>
</code></pre></div>
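<p>After this I introduce the new test. To pass it, I need to change <code>parse_statement</code> so that it parses either an assignment or a compound statement: it tries an assignment first and, if that fails, the lexer context manager rolls back and the method falls back to parsing a compound statement</p>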
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">parse_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_assignment</span><span class="p">()</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_compound_statement</span><span class="p">()</span>
</code></pre></div>
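<p>To see the recursion in isolation, here is a small standalone recursive descent parser for the same nested structure, producing dictionaries shaped like the ones returned by <code>asdict</code> in the tests. It is a sketch under some assumptions: the regular-expression tokenizer, the class <code>NestedParser</code> and the exception <code>ParseError</code> are hypothetical, and instead of trying an assignment and backtracking as <code>parse_statement</code> does, it peeks at the next token and chooses a compound statement when it sees <code>BEGIN</code>.</p>

```python
import re

# Keywords and names are both matched by \w+; whitespace and unknown characters are skipped
TOKEN_RE = re.compile(r":=|;|\d+|\w+")


class ParseError(Exception):
    """Raised when the input does not follow the grammar."""


class NestedParser:
    """Hypothetical recursive descent parser for the grammar

    compound_statement : BEGIN [statement_list] END
    statement_list     : statement (';' statement)*
    statement          : compound_statement | assignment
    assignment         : NAME ':=' INTEGER
    """

    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos != len(self.tokens) else None

    def expect(self, value):
        if self.peek() != value:
            raise ParseError(f"expected {value!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_compound_statement(self):
        self.expect("BEGIN")
        statements = []
        if self.peek() != "END":  # BEGIN END is an empty block
            statements = self.parse_statement_list()
        self.expect("END")
        return {"type": "compound_statement", "statements": statements}

    def parse_statement_list(self):
        nodes = [self.parse_statement()]
        while self.peek() == ";":
            self.pos += 1
            nodes.append(self.parse_statement())
        return nodes

    def parse_statement(self):
        # The recursion: a statement can be a whole compound statement
        if self.peek() == "BEGIN":
            return self.parse_compound_statement()
        return self.parse_assignment()

    def parse_assignment(self):
        name = self.peek()
        if name is None or not name.isidentifier() or name in ("BEGIN", "END"):
            raise ParseError(f"expected a variable name, found {name!r}")
        self.pos += 1
        self.expect(":=")
        value = self.peek()
        if value is None or not value.isdigit():
            raise ParseError(f"expected an integer, found {value!r}")
        self.pos += 1
        return {
            "type": "assignment",
            "variable": name,
            "value": {"type": "integer", "value": int(value)},
        }
```

<p>Calling <code>NestedParser("BEGIN x:= 5; BEGIN y := 6 END ; z:=7 END").parse_compound_statement()</code> returns the same nested dictionary expected by the parser test for recursive compound statements. The tokenizer silently skips any character it does not recognise, which is acceptable only for a sketch.</p>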
<p>Before I move to the visitor, I want to discuss a choice that I have here. The current version of the method <code>parse_statement_list</code></p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">parse_statement_list</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">nodes</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
<span class="k">if</span> <span class="n">statement_node</span><span class="p">:</span>
<span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">statement_node</span><span class="p">)</span>
<span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span> <span class="o">==</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">";"</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">";"</span><span class="p">))</span>
<span class="n">statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
<span class="k">if</span> <span class="n">statement_node</span><span class="p">:</span>
<span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">statement_node</span><span class="p">)</span>
<span class="k">return</span> <span class="n">nodes</span>
</code></pre></div>
<p>could easily be written recursively, to match the grammar more closely, becoming</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">parse_statement_list</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">nodes</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
<span class="k">if</span> <span class="n">statement_node</span><span class="p">:</span>
<span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">statement_node</span><span class="p">)</span>
<span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">";"</span><span class="p">))</span>
<span class="n">nodes</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">parse_statement_list</span><span class="p">())</span>
<span class="k">return</span> <span class="n">nodes</span>
</code></pre></div>
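<p>As you can see if you replace the code, all the tests pass, so the solution is technically correct. Recursive algorithms are elegant and compact, but in this case I will stick to the first version. The recursive version performs one nested call of <code>parse_statement_list</code> for each statement after the first, and since Python does not optimise tail calls, a long enough list of statements would exceed the interpreter's recursion limit (1000 by default in CPython, see <code>sys.getrecursionlimit()</code>) and raise a <code>RecursionError</code>. We probably won't have this issue in this little project, but I think it is worth mentioning. Both solutions are correct, though, so feel free to choose the recursive path if you happen to like it more.</p>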
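<p>The difference between the two approaches is easy to observe with a toy version of each. In this sketch the functions are hypothetical, statements are plain strings, and the input is assumed to be well formed: the loop handles a list of any length, while the recursive version needs one extra stack frame per statement.</p>

```python
import sys


def parse_list_iterative(tokens):
    """STATEMENT (';' STATEMENT)* with a loop: the stack depth stays constant."""
    nodes = [tokens[0]]
    index = 1
    while index != len(tokens) and tokens[index] == ";":
        nodes.append(tokens[index + 1])
        index += 2
    return nodes


def parse_list_recursive(tokens, index=0):
    """Same rule, one nested call per statement (Python has no tail-call elimination)."""
    nodes = [tokens[index]]
    if index + 1 != len(tokens) and tokens[index + 1] == ";":
        nodes.extend(parse_list_recursive(tokens, index + 2))
    return nodes


def make_tokens(count):
    """Build ['s0', ';', 's1', ';', ...] containing `count` statements."""
    tokens = []
    for n in range(count):
        if n:
            tokens.append(";")
        tokens.append(f"s{n}")
    return tokens
```

<p>With <code>make_tokens(sys.getrecursionlimit() + 100)</code> the iterative function returns all the statements, while the recursive one raises <code>RecursionError</code>. Raising the limit with <code>sys.setrecursionlimit()</code> only moves the threshold.</p>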
<p>The visitor tests can be passed with a minimal change, as the visitor just needs to recognise <code>compound_statement</code> nodes and visit each of the statements they contain, in order. So I added a new condition to the method <code>visit</code></p>
<div class="highlight"><pre><span></span><code> <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s2">"type"</span><span class="p">]</span> <span class="o">==</span> <span class="s2">"compound_statement"</span><span class="p">:</span>
            <span class="k">for</span> <span class="n">statement</span> <span class="ow">in</span> <span class="n">node</span><span class="p">[</span><span class="s2">"statements"</span><span class="p">]:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">statement</span><span class="p">)</span>
</code></pre></div>
<p>which passes all three new visitor tests.</p>
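<p>For reference, the whole tree walk can be reproduced with a tiny standalone visitor. This is a hypothetical sketch, not the <code>CalcVisitor</code> of the project: <code>MiniVisitor</code> only knows about <code>integer</code>, <code>assignment</code> and <code>compound_statement</code> nodes, and it stores each variable as a <code>(type, value)</code> pair in a plain dictionary.</p>

```python
class MiniVisitor:
    """Hypothetical visitor for dict-based ASTs; variables map name -> (type, value)."""

    def __init__(self):
        self.variables = {}

    def visit(self, node):
        node_type = node["type"]
        if node_type == "integer":
            # Expressions return a (type, value) pair
            return "integer", node["value"]
        if node_type == "assignment":
            self.variables[node["variable"]] = self.visit(node["value"])
            return None
        if node_type == "compound_statement":
            # Visit each statement in order; nested blocks recurse naturally
            for statement in node["statements"]:
                self.visit(statement)
            return None
        raise ValueError(f"unknown node type: {node_type!r}")
```

<p>Visiting the AST used in the recursive visitor test leaves <code>{"x": ("integer", 5), "y": ("integer", 6), "z": ("integer", 7)}</code> in <code>variables</code>, and <code>visit</code> returns <code>None</code> for the compound statement, which mirrors what the tests check.</p>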
<h2 id="level-20-pascal-programs-and-case-insensitive-names">Level 20 - Pascal programs and case insensitive names<a class="headerlink" href="#level-20-pascal-programs-and-case-insensitive-names" title="Permanent link">¶</a></h2>
<p>A Pascal program ends with a dot, so we should introduce a new endpoint <code>parse_program</code> and test that it works. The first test verifies that we can parse an empty program</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_empty_program</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN END."</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_program</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span> <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[]}</span>
</code></pre></div>
<p>and the second tests that the final dot can't be missing</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">pytest</span>
<span class="kn">from</span> <span class="nn">smallcalc.calc_lexer</span> <span class="kn">import</span> <span class="n">TokenError</span>
<span class="k">def</span> <span class="nf">test_parse_program_requires_the_final_dot</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN END"</span><span class="p">)</span>
    <span class="k">with</span> <span class="n">pytest</span><span class="o">.</span><span class="n">raises</span><span class="p">(</span><span class="n">TokenError</span><span class="p">):</span>
        <span class="n">p</span><span class="o">.</span><span class="n">parse_program</span><span class="p">()</span>
</code></pre></div>
<p>Notice that I imported <code>pytest</code> and the <code>TokenError</code> exception to build a negative test (i.e. to test something that fails). The last test verifies that a non-empty program can be parsed</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_program_with_nested_statements</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN x:= 5; BEGIN y := 6 END ; z:=7 END."</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_program</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
        <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
                <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
                    <span class="p">{</span>
                        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                        <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"y"</span><span class="p">,</span>
                        <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">6</span><span class="p">},</span>
                    <span class="p">}</span>
                <span class="p">],</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"z"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">7</span><span class="p">},</span>
            <span class="p">},</span>
        <span class="p">],</span>
    <span class="p">}</span>
</code></pre></div>
<p>When all these tests pass we are almost done for this post: we just need to make the lexer treat names in a case-insensitive way. In Pascal, both keywords and variable names are case-insensitive, so <code>BEGIN</code> and <code>begin</code> are the same keyword (and so is <code>BeGiN</code>, though I think this might be a misinterpretation of the concept of "snake case" =) ), and the same is valid for variables: you can define <code>MYVAR</code> and use <code>myvar</code>.</p>
<p>To test this behaviour I changed the test <code>test_get_tokens_understands_uppercase_letters</code> into <code>test_get_tokens_is_case_insensitive</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_is_case_insensitive</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"SomeVar"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">NAME</span><span class="p">,</span> <span class="s2">"somevar"</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">),</span>
    <span class="p">]</span>
</code></pre></div>
<p>and added the test for the two keywords we defined so far</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_begin_and_end_case_insensitive</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"begin end"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">BEGIN</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">END</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">),</span>
    <span class="p">]</span>
</code></pre></div>
<h3 id="solution_3">Solution<a class="headerlink" href="#solution_3" title="Permanent link">¶</a></h3>
<p>To parse a program we need to introduce the aptly named endpoint <code>parse_program</code>, which just parses a compound statement (the program) and the final dot.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_program</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">compound_statement</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_compound_statement</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">DOT</span><span class="p">))</span>
        <span class="k">return</span> <span class="n">compound_statement</span>
</code></pre></div>
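<p>To see the whole path from <code>BEGIN</code> to the final dot in one place, here is a self-contained recursive-descent sketch. It is not smallcalc's implementation: tokens are plain <code>(type, value)</code> tuples, the grammar only allows integer assignments, and <code>tokenize</code>, <code>MiniParser</code>, and <code>ParseError</code> are hypothetical names. It does, however, build the same dictionaries that the tests above expect.</p>

```python
import re

# Hypothetical tokenizer: integers, ":=", names, or any other single character.
TOKEN_RE = re.compile(r"\d+|:=|[A-Za-z_]+|\S")
KEYWORDS = {"BEGIN", "END"}


class ParseError(Exception):
    pass


def tokenize(text):
    tokens = []
    for lexeme in TOKEN_RE.findall(text):
        if lexeme.upper() in KEYWORDS:
            tokens.append((lexeme.upper(), None))
        elif lexeme.isdigit():
            tokens.append(("INTEGER", int(lexeme)))
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.append(("NAME", lexeme.lower()))
        else:
            tokens.append((lexeme, None))  # ":=", ";", "." and so on
    tokens.append(("EOF", None))
    return tokens


class MiniParser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def discard(self, kind):
        # Consume the expected token and return its value, or fail.
        if self.peek() != kind:
            raise ParseError(f"expected {kind!r}, found {self.peek()!r}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def parse_program(self):
        # program := compound_statement "."
        node = self.parse_compound_statement()
        self.discard(".")
        return node

    def parse_compound_statement(self):
        # compound_statement := BEGIN statement (";" statement)* END
        self.discard("BEGIN")
        statements = []
        while True:
            statement = self.parse_statement()
            if statement is not None:
                statements.append(statement)
            if self.peek() != ";":
                break
            self.discard(";")
        self.discard("END")
        return {"type": "compound_statement", "statements": statements}

    def parse_statement(self):
        # statement := compound_statement | NAME ":=" INTEGER | (empty)
        if self.peek() == "BEGIN":
            return self.parse_compound_statement()
        if self.peek() == "NAME":
            variable = self.discard("NAME")
            self.discard(":=")
            value = self.discard("INTEGER")
            return {
                "type": "assignment",
                "variable": variable,
                "value": {"type": "integer", "value": value},
            }
        return None  # empty statement, as in "BEGIN END"


print(MiniParser("BEGIN END.").parse_program())
# {'type': 'compound_statement', 'statements': []}
```

<p>Without the final dot, <code>discard(".")</code> finds the end-of-file token instead and raises <code>ParseError</code>, which is the same behaviour the negative test checks with <code>TokenError</code>.</p>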
<p>As for the case insensitive names, it's just a matter of changing the method <code>_process_name</code></p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_process_name</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">regexp</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s2">"[a-zA-Z_]+"</span><span class="p">)</span>
        <span class="n">match</span> <span class="o">=</span> <span class="n">regexp</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">tail</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">match</span><span class="p">:</span>
            <span class="k">return</span> <span class="kc">None</span>
        <span class="n">token_string</span> <span class="o">=</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">token_string</span><span class="o">.</span><span class="n">upper</span><span class="p">()</span> <span class="ow">in</span> <span class="n">RESERVED_KEYWORDS</span><span class="p">:</span>
            <span class="n">tok</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">token_string</span><span class="o">.</span><span class="n">upper</span><span class="p">())</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">tok</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">NAME</span><span class="p">,</span> <span class="n">token_string</span><span class="o">.</span><span class="n">lower</span><span class="p">())</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_set_current_token_and_skip</span><span class="p">(</span><span class="n">tok</span><span class="p">)</span>
</code></pre></div>
<p>Note that internally I decided to store keywords with uppercase names and variables with lowercase ones. This is really just a matter of personal taste at this point of the project (and probably always will be), so feel free to follow the structure you like the most.</p>
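<p>The effect of this normalisation is that everything after the lexer can ignore case entirely. The following standalone sketch mirrors the logic of <code>_process_name</code> as a plain function; the <code>Token</code> namedtuple, the <code>NAME</code> constant, and <code>RESERVED_KEYWORDS</code> here are simplified stand-ins, not smallcalc's real definitions. Storing a value in a dictionary keyed by the token value shows that <code>MYVAR</code> and <code>myvar</code> end up as the same variable.</p>

```python
import re
from collections import namedtuple

# Simplified stand-ins for smallcalc's token class and lexer constants.
Token = namedtuple("Token", ["type", "value"], defaults=[None])
NAME = "NAME"
RESERVED_KEYWORDS = {"BEGIN", "END"}
NAME_RE = re.compile(r"[a-zA-Z_]+")


def process_name(text):
    """Return the token for the name at the start of text, or None."""
    match = NAME_RE.match(text)
    if not match:
        return None
    lexeme = match.group()
    if lexeme.upper() in RESERVED_KEYWORDS:
        # Keywords are normalised to uppercase and carry no value.
        return Token(lexeme.upper())
    # Any other name is an identifier, normalised to lowercase.
    return Token(NAME, lexeme.lower())


variables = {}
variables[process_name("MYVAR").value] = 42
print(variables[process_name("myvar").value])  # 42
print(process_name("BeGiN"))  # Token(type='BEGIN', value=None)
```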
<h2 id="final-words">Final words<a class="headerlink" href="#final-words" title="Permanent link">¶</a></h2>
<p>That was something! I was honestly impressed by how easily I could change the language and add new features, a testament to how powerful a tool the TDD methodology is. Thanks again to <a href="https://ruslanspivak.com/pages/about/">Ruslan Spivak</a> for his work and his inspiring posts!</p>
<p>The code I developed in this post is available on the GitHub repository tagged with <code>part5</code> (<a href="https://github.com/lgiordani/smallcalc/tree/part5">link</a>).</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>Flask project setup: TDD, Docker, Postgres and more - Part 32020-07-07T13:00:00+01:002021-02-23T20:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2020-07-07:/blog/2020/07/07/flask-project-setup-tdd-docker-postgres-and-more-part-3/<p>A step-by-step tutorial on how to setup a Flask project with TDD, Docker and Postgres</p><p>In this series of posts I explore the development of a Flask project with a setup that is built with efficiency and tidiness in mind, using TDD, Docker and Postgres.</p><h2 id="catch-up-7f97">Catch-up<a class="headerlink" href="#catch-up-7f97" title="Permanent link">¶</a></h2><p>In the <a href="https://www.thedigitalcatonline.com/blog/2020/07/05/flask-project-setup-tdd-docker-postgres-and-more-part-1/">first</a> and <a href="https://www.thedigitalcatonline.com/blog/2020/07/06/flask-project-setup-tdd-docker-postgres-and-more-part-2/">second</a> posts I created a Flask project with a tidy setup, using Docker to run the development environment and the tests, and mapping important commands in a management script, so that the configuration can be in a single file and drive the whole system.</p><p>In this post I will show you how to easily create scenarios, that is databases created on the fly with custom data, so that it is possible to test queries in isolation, either with the Flask application or with the command line. I will also show you how to define a configuration for production and give some hints for the deployment.</p><h2 id="step-1---creating-scenarios-59b9">Step 1 - Creating scenarios<a class="headerlink" href="#step-1---creating-scenarios-59b9" title="Permanent link">¶</a></h2><p>The idea of scenarios is simple. 
Sometimes you need to investigate specific use cases to track down bugs, or to improve the performance of some database queries, and you might need to do this on a customised database. This is a scenario: a Python file that populates the database with a specific set of data and allows you to run the application or the database shell on it.</p><p>Often the development database is a copy of the production one, possibly with sensitive data stripped to avoid leaking private information. While this gives us a realistic environment in which to test queries (e.g. how does the query perform on 1 million rows?), it might not help during the initial investigations, when you need to have all the data in front of you to properly understand what happens. Anyone who has learned how joins work in relational databases understands what I mean here.</p><p>In principle, to create a scenario we just need to spin up an empty database and run the scenario code against it. In practice, things are not much more complicated, but there are a couple of minor issues that we need to solve.</p><p>First, I am already running one database for development and one for testing. The second is ephemeral, but I decided to set up the project so that I can run the tests while the development database is up, and I did it by using port 5432 (the standard Postgres one) for development and 5433 for testing. Spinning up scenarios adds more databases to the equation. Clearly I do not expect to run 5 scenarios at the same time while running the development and the test databases, but I have a rule: make something generic as soon as I do it for the third time.</p><p>This means that I won't create a database for a scenario on port 5434 and will instead look for a more generic solution. The Docker networking model offers one: I can map a container port to the host without assigning the destination port, and Docker will choose it randomly among the unprivileged ones.
This means that I can create a Postgres container mapping port 5432 (the port in the container) and having Docker connect it to port 32838 in the host (for example). As long as the application knows which port to use this is absolutely the same as using port 5432.</p><p>Unfortunately the Docker interface is not extremely script-friendly when it comes to providing information and I have to parse the output a bit. Practically speaking, after I spin up the containers, I will run the command <code>docker-compose port db 5432</code> which will return a string like <code>0.0.0.0:32838</code>, and I will extract the port from it. Nothing major, but these are the (sometimes many) issues you face when you orchestrate different systems together.</p><p>The new management script is</p><div class="code"><div class="title"><code>manage.py</code></div><div class="content"><div class="highlight"><pre><span class="ch">#! /usr/bin/env python</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">signal</span>
<span class="kn">import</span> <span class="nn">subprocess</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">shutil</span>
<span class="kn">import</span> <span class="nn">click</span>
<span class="kn">import</span> <span class="nn">psycopg2</span>
<span class="kn">from</span> <span class="nn">psycopg2.extensions</span> <span class="kn">import</span> <span class="n">ISOLATION_LEVEL_AUTOCOMMIT</span>
<span class="c1"># Ensure an environment variable exists and has a value</span>
<span class="k">def</span> <span class="nf">setenv</span><span class="p">(</span><span class="n">variable</span><span class="p">,</span> <span class="n">default</span><span class="p">):</span>
    <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="n">variable</span><span class="p">]</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="n">variable</span><span class="p">,</span> <span class="n">default</span><span class="p">)</span>
<span class="n">setenv</span><span class="p">(</span><span class="s2">"APPLICATION_CONFIG"</span><span class="p">,</span> <span class="s2">"development"</span><span class="p">)</span>
<span class="n">APPLICATION_CONFIG_PATH</span> <span class="o">=</span> <span class="s2">"config"</span>
<span class="n">DOCKER_PATH</span> <span class="o">=</span> <span class="s2">"docker"</span>
<span class="k">def</span> <span class="nf">app_config_file</span><span class="p">(</span><span class="n">config</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">APPLICATION_CONFIG_PATH</span><span class="p">,</span> <span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">config</span><span class="si">}</span><span class="s2">.json"</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">docker_compose_file</span><span class="p">(</span><span class="n">config</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">DOCKER_PATH</span><span class="p">,</span> <span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">config</span><span class="si">}</span><span class="s2">.yml"</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">configure_app</span><span class="p">(</span><span class="n">config</span><span class="p">):</span>
    <span class="c1"># Read configuration from the corresponding JSON file</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">app_config_file</span><span class="p">(</span><span class="n">config</span><span class="p">))</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">config_data</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
    <span class="c1"># Convert the config into a usable Python dictionary</span>
    <span class="n">config_data</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">((</span><span class="n">i</span><span class="p">[</span><span class="s2">"name"</span><span class="p">],</span> <span class="n">i</span><span class="p">[</span><span class="s2">"value"</span><span class="p">])</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">config_data</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">config_data</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
        <span class="n">setenv</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
<span class="nd">@click</span><span class="o">.</span><span class="n">group</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">cli</span><span class="p">():</span>
    <span class="k">pass</span>
<span class="nd">@cli</span><span class="o">.</span><span class="n">command</span><span class="p">(</span><span class="n">context_settings</span><span class="o">=</span><span class="p">{</span><span class="s2">"ignore_unknown_options"</span><span class="p">:</span> <span class="kc">True</span><span class="p">})</span>
<span class="nd">@click</span><span class="o">.</span><span class="n">argument</span><span class="p">(</span><span class="s2">"subcommand"</span><span class="p">,</span> <span class="n">nargs</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="n">click</span><span class="o">.</span><span class="n">Path</span><span class="p">())</span>
<span class="k">def</span> <span class="nf">flask</span><span class="p">(</span><span class="n">subcommand</span><span class="p">):</span>
    <span class="n">configure_app</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">"APPLICATION_CONFIG"</span><span class="p">))</span>
    <span class="n">cmdline</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"flask"</span><span class="p">]</span> <span class="o">+</span> <span class="nb">list</span><span class="p">(</span><span class="n">subcommand</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">p</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">Popen</span><span class="p">(</span><span class="n">cmdline</span><span class="p">)</span>
        <span class="n">p</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
    <span class="k">except</span> <span class="ne">KeyboardInterrupt</span><span class="p">:</span>
        <span class="n">p</span><span class="o">.</span><span class="n">send_signal</span><span class="p">(</span><span class="n">signal</span><span class="o">.</span><span class="n">SIGINT</span><span class="p">)</span>
        <span class="n">p</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">docker_compose_cmdline</span><span class="p">(</span><span class="n">commands_string</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
    <span class="n">config</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">"APPLICATION_CONFIG"</span><span class="p">)</span>
    <span class="n">configure_app</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>
    <span class="n">compose_file</span> <span class="o">=</span> <span class="n">docker_compose_file</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isfile</span><span class="p">(</span><span class="n">compose_file</span><span class="p">):</span>
        <span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s2">"The file </span><span class="si">{</span><span class="n">compose_file</span><span class="si">}</span><span class="s2"> does not exist"</span><span class="p">)</span>
    <span class="n">command_line</span> <span class="o">=</span> <span class="p">[</span>
        <span class="s2">"docker-compose"</span><span class="p">,</span>
        <span class="s2">"-p"</span><span class="p">,</span>
        <span class="n">config</span><span class="p">,</span>
        <span class="s2">"-f"</span><span class="p">,</span>
        <span class="n">compose_file</span><span class="p">,</span>
    <span class="p">]</span>
    <span class="k">if</span> <span class="n">commands_string</span><span class="p">:</span>
        <span class="n">command_line</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">commands_string</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">" "</span><span class="p">))</span>
    <span class="k">return</span> <span class="n">command_line</span>
<span class="nd">@cli</span><span class="o">.</span><span class="n">command</span><span class="p">(</span><span class="n">context_settings</span><span class="o">=</span><span class="p">{</span><span class="s2">"ignore_unknown_options"</span><span class="p">:</span> <span class="kc">True</span><span class="p">})</span>
<span class="nd">@click</span><span class="o">.</span><span class="n">argument</span><span class="p">(</span><span class="s2">"subcommand"</span><span class="p">,</span> <span class="n">nargs</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="n">click</span><span class="o">.</span><span class="n">Path</span><span class="p">())</span>
<span class="k">def</span> <span class="nf">compose</span><span class="p">(</span><span class="n">subcommand</span><span class="p">):</span>
    <span class="n">cmdline</span> <span class="o">=</span> <span class="n">docker_compose_cmdline</span><span class="p">()</span> <span class="o">+</span> <span class="nb">list</span><span class="p">(</span><span class="n">subcommand</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">p</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">Popen</span><span class="p">(</span><span class="n">cmdline</span><span class="p">)</span>
        <span class="n">p</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
    <span class="k">except</span> <span class="ne">KeyboardInterrupt</span><span class="p">:</span>
        <span class="n">p</span><span class="o">.</span><span class="n">send_signal</span><span class="p">(</span><span class="n">signal</span><span class="o">.</span><span class="n">SIGINT</span><span class="p">)</span>
        <span class="n">p</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">run_sql</span><span class="p">(</span><span class="n">statements</span><span class="p">):</span>
    <span class="n">conn</span> <span class="o">=</span> <span class="n">psycopg2</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span>
        <span class="n">dbname</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">"POSTGRES_DB"</span><span class="p">),</span>
        <span class="n">user</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">"POSTGRES_USER"</span><span class="p">),</span>
        <span class="n">password</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">"POSTGRES_PASSWORD"</span><span class="p">),</span>
        <span class="n">host</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">"POSTGRES_HOSTNAME"</span><span class="p">),</span>
        <span class="n">port</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">"POSTGRES_PORT"</span><span class="p">),</span>
    <span class="p">)</span>
    <span class="n">conn</span><span class="o">.</span><span class="n">set_isolation_level</span><span class="p">(</span><span class="n">ISOLATION_LEVEL_AUTOCOMMIT</span><span class="p">)</span>
    <span class="n">cursor</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span>
    <span class="k">for</span> <span class="n">statement</span> <span class="ow">in</span> <span class="n">statements</span><span class="p">:</span>
        <span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="n">statement</span><span class="p">)</span>
    <span class="n">cursor</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
    <span class="n">conn</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">wait_for_logs</span><span class="p">(</span><span class="n">cmdline</span><span class="p">,</span> <span class="n">message</span><span class="p">):</span>
    <span class="n">logs</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">(</span><span class="n">cmdline</span><span class="p">)</span>
    <span class="k">while</span> <span class="n">message</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">logs</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s2">"utf-8"</span><span class="p">):</span>
        <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">0.1</span><span class="p">)</span>
        <span class="n">logs</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">(</span><span class="n">cmdline</span><span class="p">)</span>
<span class="nd">@cli</span><span class="o">.</span><span class="n">command</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">create_initial_db</span><span class="p">():</span>
<span class="n">configure_app</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">"APPLICATION_CONFIG"</span><span class="p">))</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">run_sql</span><span class="p">([</span><span class="sa">f</span><span class="s2">"CREATE DATABASE </span><span class="si">{</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'APPLICATION_DB'</span><span class="p">)</span><span class="si">}</span><span class="s2">"</span><span class="p">])</span>
<span class="k">except</span> <span class="n">psycopg2</span><span class="o">.</span><span class="n">errors</span><span class="o">.</span><span class="n">DuplicateDatabase</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span>
<span class="sa">f</span><span class="s2">"The database </span><span class="si">{</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'APPLICATION_DB'</span><span class="p">)</span><span class="si">}</span><span class="s2"> already exists and will not be recreated"</span>
<span class="p">)</span>
<span class="nd">@cli</span><span class="o">.</span><span class="n">command</span><span class="p">()</span>
<span class="nd">@click</span><span class="o">.</span><span class="n">argument</span><span class="p">(</span><span class="s2">"filenames"</span><span class="p">,</span> <span class="n">nargs</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test</span><span class="p">(</span><span class="n">filenames</span><span class="p">):</span>
<span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">"APPLICATION_CONFIG"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"testing"</span>
<span class="n">configure_app</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">"APPLICATION_CONFIG"</span><span class="p">))</span>
<span class="n">cmdline</span> <span class="o">=</span> <span class="n">docker_compose_cmdline</span><span class="p">(</span><span class="s2">"up -d"</span><span class="p">)</span>
<span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">cmdline</span><span class="p">)</span>
<span class="n">cmdline</span> <span class="o">=</span> <span class="n">docker_compose_cmdline</span><span class="p">(</span><span class="s2">"logs db"</span><span class="p">)</span>
<span class="n">wait_for_logs</span><span class="p">(</span><span class="n">cmdline</span><span class="p">,</span> <span class="s2">"ready to accept connections"</span><span class="p">)</span>
<span class="n">run_sql</span><span class="p">([</span><span class="sa">f</span><span class="s2">"CREATE DATABASE </span><span class="si">{</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'APPLICATION_DB'</span><span class="p">)</span><span class="si">}</span><span class="s2">"</span><span class="p">])</span>
<span class="n">cmdline</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"pytest"</span><span class="p">,</span> <span class="s2">"-svv"</span><span class="p">,</span> <span class="s2">"--cov=application"</span><span class="p">,</span> <span class="s2">"--cov-report=term-missing"</span><span class="p">]</span>
<span class="n">cmdline</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">filenames</span><span class="p">)</span>
<span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">cmdline</span><span class="p">)</span>
<span class="n">cmdline</span> <span class="o">=</span> <span class="n">docker_compose_cmdline</span><span class="p">(</span><span class="s2">"down"</span><span class="p">)</span>
<span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">cmdline</span><span class="p">)</span>
<span class="nd">@cli</span><span class="o">.</span><span class="n">group</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">scenario</span><span class="p">():</span>
<span class="k">pass</span>
<span class="nd">@scenario</span><span class="o">.</span><span class="n">command</span><span class="p">()</span>
<span class="nd">@click</span><span class="o">.</span><span class="n">argument</span><span class="p">(</span><span class="s2">"name"</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">up</span><span class="p">(</span><span class="n">name</span><span class="p">):</span> <span class="callout">1</span>
<span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">"APPLICATION_CONFIG"</span><span class="p">]</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">"scenario_</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s2">"</span>
<span class="n">config</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">"APPLICATION_CONFIG"</span><span class="p">)</span>
<span class="n">scenario_config_source_file</span> <span class="o">=</span> <span class="n">app_config_file</span><span class="p">(</span><span class="s2">"scenario"</span><span class="p">)</span>
<span class="n">scenario_config_file</span> <span class="o">=</span> <span class="n">app_config_file</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isfile</span><span class="p">(</span><span class="n">scenario_config_source_file</span><span class="p">):</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s2">"File </span><span class="si">{</span><span class="n">scenario_config_source_file</span><span class="si">}</span><span class="s2"> doesn't exist"</span><span class="p">)</span>
<span class="n">shutil</span><span class="o">.</span><span class="n">copy</span><span class="p">(</span><span class="n">scenario_config_source_file</span><span class="p">,</span> <span class="n">scenario_config_file</span><span class="p">)</span> <span class="callout">3</span>
<span class="n">scenario_docker_source_file</span> <span class="o">=</span> <span class="n">docker_compose_file</span><span class="p">(</span><span class="s2">"scenario"</span><span class="p">)</span>
<span class="n">scenario_docker_file</span> <span class="o">=</span> <span class="n">docker_compose_file</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isfile</span><span class="p">(</span><span class="n">scenario_docker_source_file</span><span class="p">):</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="sa">f</span><span class="s2">"File </span><span class="si">{</span><span class="n">scenario_docker_source_file</span><span class="si">}</span><span class="s2"> doesn't exist"</span><span class="p">)</span>
<span class="n">shutil</span><span class="o">.</span><span class="n">copy</span><span class="p">(</span><span class="n">docker_compose_file</span><span class="p">(</span><span class="s2">"scenario"</span><span class="p">),</span> <span class="n">scenario_docker_file</span><span class="p">)</span> <span class="callout">4</span>
<span class="n">configure_app</span><span class="p">(</span><span class="sa">f</span><span class="s2">"scenario_</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="n">cmdline</span> <span class="o">=</span> <span class="n">docker_compose_cmdline</span><span class="p">(</span><span class="s2">"up -d"</span><span class="p">)</span> <span class="callout">5</span>
<span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">cmdline</span><span class="p">)</span>
<span class="n">cmdline</span> <span class="o">=</span> <span class="n">docker_compose_cmdline</span><span class="p">(</span><span class="s2">"logs db"</span><span class="p">)</span>
<span class="n">wait_for_logs</span><span class="p">(</span><span class="n">cmdline</span><span class="p">,</span> <span class="s2">"ready to accept connections"</span><span class="p">)</span>
<span class="n">cmdline</span> <span class="o">=</span> <span class="n">docker_compose_cmdline</span><span class="p">(</span><span class="s2">"port db 5432"</span><span class="p">)</span> <span class="callout">6</span>
<span class="n">out</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">(</span><span class="n">cmdline</span><span class="p">)</span>
<span class="n">port</span> <span class="o">=</span> <span class="n">out</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s2">"utf-8"</span><span class="p">)</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">,</span> <span class="s2">""</span><span class="p">)</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">":"</span><span class="p">)[</span><span class="mi">1</span><span class="p">]</span>
<span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">"POSTGRES_PORT"</span><span class="p">]</span> <span class="o">=</span> <span class="n">port</span>
<span class="n">run_sql</span><span class="p">([</span><span class="sa">f</span><span class="s2">"CREATE DATABASE </span><span class="si">{</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s1">'APPLICATION_DB'</span><span class="p">)</span><span class="si">}</span><span class="s2">"</span><span class="p">])</span>
<span class="n">scenario_module</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">"scenarios.</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s2">"</span>
<span class="n">scenario_file</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="s2">"scenarios"</span><span class="p">,</span> <span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s2">.py"</span><span class="p">)</span>
<span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isfile</span><span class="p">(</span><span class="n">scenario_file</span><span class="p">):</span> <span class="callout">7</span>
<span class="kn">import</span> <span class="nn">importlib</span>
<span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">"APPLICATION_SCENARIO_NAME"</span><span class="p">]</span> <span class="o">=</span> <span class="n">name</span>
<span class="n">scenario</span> <span class="o">=</span> <span class="n">importlib</span><span class="o">.</span><span class="n">import_module</span><span class="p">(</span><span class="n">scenario_module</span><span class="p">)</span>
<span class="n">scenario</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
<span class="n">cmdline</span> <span class="o">=</span> <span class="s2">" "</span><span class="o">.</span><span class="n">join</span><span class="p">(</span> <span class="callout">8</span>
<span class="n">docker_compose_cmdline</span><span class="p">(</span>
<span class="s2">"exec db psql -U </span><span class="si">{}</span><span class="s2"> -d </span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
<span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">"POSTGRES_USER"</span><span class="p">),</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">"APPLICATION_DB"</span><span class="p">)</span>
<span class="p">)</span>
<span class="p">)</span>
<span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Your scenario is ready. If you want to open a SQL shell run"</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">cmdline</span><span class="p">)</span>
<span class="nd">@scenario</span><span class="o">.</span><span class="n">command</span><span class="p">()</span>
<span class="nd">@click</span><span class="o">.</span><span class="n">argument</span><span class="p">(</span><span class="s2">"name"</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">down</span><span class="p">(</span><span class="n">name</span><span class="p">):</span> <span class="callout">2</span>
<span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">"APPLICATION_CONFIG"</span><span class="p">]</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">"scenario_</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s2">"</span>
<span class="n">config</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">"APPLICATION_CONFIG"</span><span class="p">)</span>
<span class="n">cmdline</span> <span class="o">=</span> <span class="n">docker_compose_cmdline</span><span class="p">(</span><span class="s2">"down"</span><span class="p">)</span>
<span class="n">subprocess</span><span class="o">.</span><span class="n">call</span><span class="p">(</span><span class="n">cmdline</span><span class="p">)</span>
<span class="n">scenario_config_file</span> <span class="o">=</span> <span class="n">app_config_file</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>
<span class="n">os</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">scenario_config_file</span><span class="p">)</span>
<span class="n">scenario_docker_file</span> <span class="o">=</span> <span class="n">docker_compose_file</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>
<span class="n">os</span><span class="o">.</span><span class="n">remove</span><span class="p">(</span><span class="n">scenario_docker_file</span><span class="p">)</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
<span class="n">cli</span><span class="p">()</span>
</pre></div> </div> </div><p>where I added the commands <code>scenario up</code> <span class="callout">1</span> and <code>scenario down</code> <span class="callout">2</span>. As you can see, the function <code>up</code> first copies the files <code>config/scenario.json</code> <span class="callout">3</span> and <code>docker/scenario.yml</code> <span class="callout">4</span> (which I still have to create) into files named after the scenario, such as <code>config/scenario_foo.json</code> and <code>docker/scenario_foo.yml</code>.</p><p>Then I run the command <code>up -d</code> <span class="callout">5</span> and wait for the database to be ready, as I already do for tests. After that, I ask Docker Compose which host port was mapped to port 5432 of the container <code>db</code>, extract it from the output with some very simple Python string processing <span class="callout">6</span>, store it in the environment variable <code>POSTGRES_PORT</code>, and create the application database.</p><p>Then, if the directory <code>scenarios</code> contains a Python file named after the scenario, I import it and execute its function <code>run</code> <span class="callout">7</span>. Finally, I print a friendly message with the command line that runs <code>psql</code> <span class="callout">8</span> to open a Postgres shell into the newly created database.</p><p>The function <code>down</code> tears down the containers and removes the two scenario configuration files.</p><p>The two missing configuration files are pretty simple. The Docker Compose configuration is</p><div class="code"><div class="title"><code>docker/scenario.yml</code></div><div class="content"><div class="highlight"><pre><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">'3.4'</span>
<span class="nt">services</span><span class="p">:</span>
<span class="w">  </span><span class="nt">db</span><span class="p">:</span>
<span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">postgres</span>
<span class="w">    </span><span class="nt">environment</span><span class="p">:</span>
<span class="w">      </span><span class="nt">POSTGRES_DB</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_DB}</span>
<span class="w">      </span><span class="nt">POSTGRES_USER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_USER}</span>
<span class="w">      </span><span class="nt">POSTGRES_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_PASSWORD}</span>
<span class="w">    </span><span class="nt">ports</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="s">"5432"</span> <span class="callout">1</span>
<span class="w">  </span><span class="nt">web</span><span class="p">:</span>
<span class="w">    </span><span class="nt">build</span><span class="p">:</span>
<span class="w">      </span><span class="nt">context</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${PWD}</span>
<span class="w">      </span><span class="nt">dockerfile</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">docker/Dockerfile</span>
<span class="w">    </span><span class="nt">environment</span><span class="p">:</span>
<span class="w">      </span><span class="nt">FLASK_ENV</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${FLASK_ENV}</span>
<span class="w">      </span><span class="nt">FLASK_CONFIG</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${FLASK_CONFIG}</span>
<span class="w">      </span><span class="nt">APPLICATION_DB</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${APPLICATION_DB}</span>
<span class="w">      </span><span class="nt">POSTGRES_USER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_USER}</span>
<span class="w">      </span><span class="nt">POSTGRES_HOSTNAME</span><span class="p">:</span><span class="w"> </span><span class="s">"db"</span>
<span class="w">      </span><span class="nt">POSTGRES_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_PASSWORD}</span>
<span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">flask run --host 0.0.0.0</span>
<span class="w">    </span><span class="nt">volumes</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${PWD}:/opt/code</span>
<span class="w">    </span><span class="nt">ports</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="s">"5000"</span>
</pre></div> </div> </div><p>Here you can see that the database is ephemeral (no volume is declared for it), that its port on the host is assigned automatically because only the container port is listed <span class="callout">1</span>, and that I also spin up the application, mapping it to a random host port as well to avoid clashing with the development environment.</p><p>The configuration file is</p><div class="code"><div class="title"><code>config/scenario.json</code></div><div class="content"><div class="highlight"><pre><span class="p">[</span>
<span class="w">  </span><span class="p">{</span>
<span class="w">    </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"FLASK_ENV"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"development"</span>
<span class="w">  </span><span class="p">},</span>
<span class="w">  </span><span class="p">{</span>
<span class="w">    </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"FLASK_CONFIG"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"development"</span>
<span class="w">  </span><span class="p">},</span>
<span class="w">  </span><span class="p">{</span>
<span class="w">    </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"POSTGRES_DB"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"postgres"</span>
<span class="w">  </span><span class="p">},</span>
<span class="w">  </span><span class="p">{</span>
<span class="w">    </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"POSTGRES_USER"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"postgres"</span>
<span class="w">  </span><span class="p">},</span>
<span class="w">  </span><span class="p">{</span>
<span class="w">    </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"POSTGRES_HOSTNAME"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"localhost"</span>
<span class="w">  </span><span class="p">},</span>
<span class="w">  </span><span class="p">{</span>
<span class="w">    </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"POSTGRES_PASSWORD"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"postgres"</span>
<span class="w">  </span><span class="p">},</span>
<span class="w">  </span><span class="p">{</span>
<span class="w">    </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"APPLICATION_DB"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"application"</span>
<span class="w">  </span><span class="p">}</span>
<span class="p">]</span>
</pre></div> </div> </div><p>which doesn't add anything new to what I already did for development and testing.</p><h3 id="git-commit-3262">Git commit</h3><p>You can see the changes made in this step through <a href="https://github.com/lgiordani/flask_project_setup/commit/dbb54d31af17866f9336199c65f1b495d879eb70">this Git commit</a> or <a href="https://github.com/lgiordani/flask_project_setup/tree/dbb54d31af17866f9336199c65f1b495d879eb70">browse the files</a>.</p><h3 id="resources-9e89">Resources</h3><ul><li><a href="https://docs.docker.com/compose/compose-file/#ports">Expose ports in docker-compose</a></li><li><a href="https://docs.docker.com/compose/reference/port/">Docker Compose port command</a> - A command to print the port exposed by a container</li><li><a href="https://www.postgresql.org/docs/current/app-psql.html">psql</a> - PostgreSQL interactive terminal</li></ul><h3 id="scenario-example-1-25a8">Scenario example 1</h3><p>Let's have a look at a very simple scenario that doesn't touch the database, just to understand how the system works. The code for the scenario is</p><div class="code"><div class="title"><code>scenarios/foo.py</code></div><div class="content"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">os</span>
<span class="k">def</span> <span class="nf">run</span><span class="p">():</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"HEY! This is scenario"</span><span class="p">,</span> <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">"APPLICATION_SCENARIO_NAME"</span><span class="p">])</span>
</pre></div> </div> </div><p>When I run the scenario, I get the following output</p><div class="code"><div class="content"><div class="highlight"><pre>$ ./manage.py scenario up foo
Creating network "scenario_foo_default" with the default driver
Creating scenario_foo_db_1 ... done
Creating scenario_foo_web_1 ... done
HEY! This is scenario foo
Your scenario is ready. If you want to open a SQL shell run
docker-compose -p scenario_foo -f docker/scenario_foo.yml exec db psql -U postgres -d application
</pre></div> </div> </div><p>The command <code>docker ps</code> shows that my development environment is happily running alongside the scenario</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker ps
CONTAINER ID   IMAGE              COMMAND                  [...]   PORTS                     NAMES
85258892a2df   scenario_foo_web   "flask run --host 0.…"   [...]   0.0.0.0:32826->5000/tcp   scenario_foo_web_1
a031b6429e07   postgres           "docker-entrypoint.s…"   [...]   0.0.0.0:32827->5432/tcp   scenario_foo_db_1
1a449d23da01   development_web    "flask run --host 0.…"   [...]   0.0.0.0:5000->5000/tcp    development_web_1
28aa566321b5   postgres           "docker-entrypoint.s…"   [...]   0.0.0.0:5432->5432/tcp    development_db_1
</pre></div> </div> </div><p>The output of the command <code>scenario up foo</code> also contains the string <code>HEY! This is scenario foo</code>, printed by the function <code>run</code> in <code>scenarios/foo.py</code>. We can also successfully run the suggested command</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker-compose -p scenario_foo -f docker/scenario_foo.yml exec db psql -U postgres -d application
psql (12.3 (Debian 12.3-1.pgdg100+1))
Type "help" for help.
application=# \l
                                  List of databases
    Name     |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-------------+----------+----------+------------+------------+-----------------------
 application | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 postgres    | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0   | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
             |          |          |            |            | postgres=CTc/postgres
 template1   | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
             |          |          |            |            | postgres=CTc/postgres
(4 rows)
application=#
</pre></div> </div> </div><p>Among the databases we find <code>application</code>, created explicitly for the scenario (its name is specified in <code>config/scenario.json</code>). If you are not familiar with <code>psql</code>, you can exit with <code>\q</code> or <code>Ctrl-d</code>.</p><p>Before tearing down the scenario, have a look at the two files <code>config/scenario_foo.json</code> and <code>docker/scenario_foo.yml</code>. They are just copies of <code>config/scenario.json</code> and <code>docker/scenario.yml</code>, but seeing them there might help you understand how the whole thing works. When you are done, run <code>./manage.py scenario down foo</code>.</p><h3 id="git-commit-3262">Git commit</h3><p>You can see the changes made in this step through <a href="https://github.com/lgiordani/flask_project_setup/commit/9d9601508cfa7dc5d718d76cd0827396069035fd">this Git commit</a> or <a href="https://github.com/lgiordani/flask_project_setup/tree/9d9601508cfa7dc5d718d76cd0827396069035fd">browse the files</a>.</p><h3 id="scenario-example-2-66ea">Scenario example 2</h3><p>Let's do something a bit more interesting. The new scenario is contained in <code>scenarios/users.py</code></p><div class="code"><div class="title"><code>scenarios/users.py</code></div><div class="content"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">application.app</span> <span class="kn">import</span> <span class="n">create_app</span>
<span class="kn">from</span> <span class="nn">application.models</span> <span class="kn">import</span> <span class="n">db</span><span class="p">,</span> <span class="n">User</span>
<span class="n">app</span> <span class="o">=</span> <span class="n">create_app</span><span class="p">(</span><span class="s2">"development"</span><span class="p">)</span> <span class="callout">1</span>
<span class="k">def</span> <span class="nf">run</span><span class="p">():</span>
    <span class="k">with</span> <span class="n">app</span><span class="o">.</span><span class="n">app_context</span><span class="p">():</span>
        <span class="n">db</span><span class="o">.</span><span class="n">drop_all</span><span class="p">()</span>
        <span class="n">db</span><span class="o">.</span><span class="n">create_all</span><span class="p">()</span>
        <span class="c1"># Administrator</span>
        <span class="n">admin</span> <span class="o">=</span> <span class="n">User</span><span class="p">(</span><span class="n">email</span><span class="o">=</span><span class="s2">"admin@server.com"</span><span class="p">)</span>
        <span class="n">db</span><span class="o">.</span><span class="n">session</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">admin</span><span class="p">)</span> <span class="callout">2</span>
        <span class="c1"># First user</span>
        <span class="n">user1</span> <span class="o">=</span> <span class="n">User</span><span class="p">(</span><span class="n">email</span><span class="o">=</span><span class="s2">"user1@server.com"</span><span class="p">)</span>
        <span class="n">db</span><span class="o">.</span><span class="n">session</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">user1</span><span class="p">)</span>
        <span class="c1"># Second user</span>
        <span class="n">user2</span> <span class="o">=</span> <span class="n">User</span><span class="p">(</span><span class="n">email</span><span class="o">=</span><span class="s2">"user2@server.com"</span><span class="p">)</span>
        <span class="n">db</span><span class="o">.</span><span class="n">session</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">user2</span><span class="p">)</span>
        <span class="n">db</span><span class="o">.</span><span class="n">session</span><span class="o">.</span><span class="n">commit</span><span class="p">()</span>
</pre></div> </div> </div><p>I decided to be as agnostic as possible in the scenarios, to avoid creating something too specific that eventually would not give me enough flexibility to test what I need. This means that the scenario has to create the app <span class="callout">1</span> and to use the database session explicitly <span class="callout">2</span>, as I do in this example. The application is created with the configuration <code>"development"</code> <span class="callout">1</span>. Remember that this is the Flask configuration that you find in <code>application/config.py</code>, not the one that is in <code>config/development.json</code>.</p><p>I can run the scenario with</p><div class="code"><div class="content"><div class="highlight"><pre>$ ./manage.py scenario up users
</pre></div> </div> </div><p>and then connect to the database to find my users</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker-compose -p scenario_users -f docker/scenario_users.yml exec db psql -U postgres -d application
psql (12.3 (Debian 12.3-1.pgdg100+1))
Type "help" for help.
application=# \dt
List of relations
Schema | Name | Type | Owner
--------+-------+-------+----------
public | users | table | postgres
(1 row)
application=# select * from users;
id | email
----+------------------
1 | admin@server.com
2 | user1@server.com
3 | user2@server.com
(3 rows)
application=# \q
</pre></div> </div> </div><h3 id="git-commit-3262">Git commit</h3><p>You can see the changes made in this step through <a href="https://github.com/lgiordani/flask_project_setup/commit/b475c5e3d455098691fa1c736de573182d0e44ec">this Git commit</a> or <a href="https://github.com/lgiordani/flask_project_setup/tree/b475c5e3d455098691fa1c736de573182d0e44ec">browse the files</a>.</p>
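<p>The user and database names passed to <code>psql</code> above match the values that the project keeps in its JSON configuration files, which are lists of <code>name</code>/<code>value</code> pairs like the ones in <code>config/development.json</code>. The following is only a sketch, under stated assumptions, of how such a file could be turned into a dictionary and then into a SQLAlchemy connection URI. It is not the code used by the project: the helper names <code>load_config</code> and <code>database_uri</code> are hypothetical, and the <code>postgresql+psycopg2://</code> prefix assumes the <code>psycopg2</code> driver listed in the requirements.</p>

```python
import json
from urllib.parse import quote_plus


def load_config(text):
    """Turn a JSON list of {"name": ..., "value": ...} objects into a dict.

    Hypothetical helper: it mirrors the layout of the files in config/,
    not any function of the project itself.
    """
    return {item["name"]: item["value"] for item in json.loads(text)}


def database_uri(config):
    """Build a SQLAlchemy URI for PostgreSQL, assuming the psycopg2 driver."""
    # Credentials are percent-encoded so that characters such as "@" or "/"
    # in a password cannot break the structure of the URI.
    user = quote_plus(config["POSTGRES_USER"])
    password = quote_plus(config["POSTGRES_PASSWORD"])
    return (
        f"postgresql+psycopg2://{user}:{password}"
        f"@{config['POSTGRES_HOSTNAME']}:{config['POSTGRES_PORT']}"
        f"/{config['APPLICATION_DB']}"
    )
```

<p>Fed with the values of <code>config/production.json</code>, for example, <code>database_uri</code> would return <code>postgresql+psycopg2://postgres:postgres@localhost:5432/application</code>, which points to the same database that <code>psql</code> opened.</p>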
<div class="advertisement">
<a href="https://www.thedigitalcat.academy/freebie-first-class-objects">
<img src="/images/first-class-objects/cover.jpg" />
</a>
<div class="body">
<h2 id="first-class-objects-in-python-fffa">First-class objects in Python<a class="headerlink" href="#first-class-objects-in-python-fffa" title="Permanent link">¶</a></h2>
<p>Higher-order functions, wrappers, and factories</p>
<p>Learn all you need to know to understand first-class citizenship in Python, the gateway to grasp how decorators work and how functional programming can supercharge your code.</p>
<div class="actions">
<a class="action" href="https://www.thedigitalcat.academy/freebie-first-class-objects">Get your FREE copy</a>
</div>
</div>
</div>
<h2 id="step-2---simulating-the-production-environment-80ac">Step 2 - Simulating the production environment<a class="headerlink" href="#step-2---simulating-the-production-environment-80ac" title="Permanent link">¶</a></h2><p>As I stated at the very beginning of this mini-series of posts, one of my goals was to run in development the same database that I run in production, and for this reason I went through the configuration steps that allowed me to have a Postgres container running both in development and during tests. In a real production scenario Postgres would probably run on a separate instance, for example on the RDS service in AWS, but as long as you have the connection parameters nothing changes in the configuration.</p><p>Docker also makes it easy to simulate the production environment. If our notebook were connected 24/7 we might as well host production there directly. Not that I recommend this nowadays, but this is how many important companies began many years ago, before cloud computing was available. Instead of installing a LAMP stack we configure containers, but the idea doesn't change.</p><p>I will therefore create a configuration that simulates a production environment and then give some hints on how to translate it into a proper production infrastructure. If you want a clear picture of the components of a web application in production, read my post <a href="https://www.thedigitalcatonline.com/blog/2020/02/16/dissecting-a-web-stack/">Dissecting a web stack</a>, which analyses them one by one.</p><p>The first component that we have to change is the HTTP server. In development we use Flask's development server, and the first message that server prints is <code>WARNING: This is a development server. Do not use it in a production deployment.</code> Got it, Flask! 
A good choice to replace it is Gunicorn, so first of all I add it to the requirements</p><div class="code"><div class="title"><code>requirements/production.txt</code></div><div class="content"><div class="highlight"><pre>Flask
flask-sqlalchemy
psycopg2
flask-migrate
<span class="hll">gunicorn
</span></pre></div> </div> </div><p>Then I need to create a docker-compose configuration for production</p><div class="code"><div class="title"><code>docker/production.yml</code></div><div class="content"><div class="highlight"><pre><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">'3.4'</span>
<span class="nt">services</span><span class="p">:</span>
<span class="w">  </span><span class="nt">db</span><span class="p">:</span>
<span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">postgres</span>
<span class="w">    </span><span class="nt">environment</span><span class="p">:</span>
<span class="w">      </span><span class="nt">POSTGRES_DB</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_DB}</span>
<span class="w">      </span><span class="nt">POSTGRES_USER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_USER}</span>
<span class="w">      </span><span class="nt">POSTGRES_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_PASSWORD}</span>
<span class="w">    </span><span class="nt">ports</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="s">"${POSTGRES_PORT}:5432"</span>
<span class="w">    </span><span class="nt">volumes</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">pgdata:/var/lib/postgresql/data</span>
<span class="w">  </span><span class="nt">web</span><span class="p">:</span>
<span class="w">    </span><span class="nt">build</span><span class="p">:</span>
<span class="w">      </span><span class="nt">context</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${PWD}</span>
<span class="w">      </span><span class="nt">dockerfile</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">docker/Dockerfile.production</span>
<span class="w">    </span><span class="nt">environment</span><span class="p">:</span>
<span class="w">      </span><span class="nt">FLASK_ENV</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${FLASK_ENV}</span>
<span class="w">      </span><span class="nt">FLASK_CONFIG</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${FLASK_CONFIG}</span>
<span class="w">      </span><span class="nt">APPLICATION_DB</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${APPLICATION_DB}</span>
<span class="w">      </span><span class="nt">POSTGRES_USER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_USER}</span>
<span class="w">      </span><span class="nt">POSTGRES_HOSTNAME</span><span class="p">:</span><span class="w"> </span><span class="s">"db"</span>
<span class="w">      </span><span class="nt">POSTGRES_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_PASSWORD}</span>
<span class="w">      </span><span class="nt">POSTGRES_PORT</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_PORT}</span>
<span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">gunicorn -w 4 -b 0.0.0.0 wsgi:app</span> <span class="callout">1</span>
<span class="w">    </span><span class="nt">volumes</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${PWD}:/opt/code</span>
<span class="w">    </span><span class="nt">ports</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="s">"8000:8000"</span> <span class="callout">2</span>
<span class="nt">volumes</span><span class="p">:</span>
<span class="w">  </span><span class="nt">pgdata</span><span class="p">:</span>
</pre></div> </div> </div><p>As you can see, the command that runs the application is slightly different <span class="callout">1</span>. It starts 4 worker processes (<code>-w 4</code>) listening on all the container's interfaces (<code>-b 0.0.0.0</code>) and loads the object <code>app</code> from the file <code>wsgi.py</code> (<code>wsgi:app</code>). Since the bind address doesn't specify a port, Gunicorn uses its default, 8000, and I mapped that port <span class="callout">2</span> to the same port on the host.</p><p>Then I created the file <code>Dockerfile.production</code> that defines the production image of the web application</p><div class="code"><div class="title"><code>docker/Dockerfile.production</code></div><div class="content"><div class="highlight"><pre><span class="k">FROM</span><span class="w"> </span><span class="s">python:3</span>
<span class="k">ENV</span><span class="w"> </span>PYTHONUNBUFFERED<span class="w"> </span><span class="m">1</span>
<span class="k">RUN</span><span class="w"> </span>mkdir<span class="w"> </span>/opt/code
<span class="k">RUN</span><span class="w"> </span>mkdir<span class="w"> </span>/opt/requirements
<span class="k">WORKDIR</span><span class="w"> </span><span class="s">/opt/code</span>
<span class="k">ADD</span><span class="w"> </span>requirements<span class="w"> </span>/opt/requirements
<span class="k">RUN</span><span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>-r<span class="w"> </span>/opt/requirements/production.txt
</pre></div> </div> </div><p>The last thing I need is a configuration file</p><div class="code"><div class="title"><code>config/production.json</code></div><div class="content"><div class="highlight"><pre><span class="p">[</span>
<span class="w">  </span><span class="p">{</span>
<span class="w">    </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"FLASK_ENV"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"production"</span>
<span class="w">  </span><span class="p">},</span>
<span class="w">  </span><span class="p">{</span>
<span class="w">    </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"FLASK_CONFIG"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"production"</span>
<span class="w">  </span><span class="p">},</span>
<span class="w">  </span><span class="p">{</span>
<span class="w">    </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"POSTGRES_DB"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"postgres"</span>
<span class="w">  </span><span class="p">},</span>
<span class="w">  </span><span class="p">{</span>
<span class="w">    </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"POSTGRES_USER"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"postgres"</span>
<span class="w">  </span><span class="p">},</span>
<span class="w">  </span><span class="p">{</span>
<span class="w">    </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"POSTGRES_HOSTNAME"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"localhost"</span>
<span class="w">  </span><span class="p">},</span>
<span class="w">  </span><span class="p">{</span>
<span class="w">    </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"POSTGRES_PORT"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"5432"</span>
<span class="w">  </span><span class="p">},</span>
<span class="w">  </span><span class="p">{</span>
<span class="w">    </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"POSTGRES_PASSWORD"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"postgres"</span>
<span class="w">  </span><span class="p">},</span>
<span class="w">  </span><span class="p">{</span>
<span class="w">    </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"APPLICATION_DB"</span><span class="p">,</span>
<span class="w">    </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"application"</span>
<span class="w">  </span><span class="p">}</span>
<span class="p">]</span>
</pre></div> </div> </div><p>As you can see, this is not very different from the development one: I just changed the values of <code>FLASK_ENV</code> and <code>FLASK_CONFIG</code>. Clearly this file contains a secret that shouldn't be written in plain text, <code>POSTGRES_PASSWORD</code>, but after all this is only a simulation of production. In a real environment secrets should be kept in an encrypted manager such as <a href="https://aws.amazon.com/secrets-manager/">AWS Secrets Manager</a>.</p><p>Remember that <code>FLASK_ENV=production</code> changes the internal settings of Flask, most notably disabling the debugger, and that <code>FLASK_CONFIG=production</code> loads the object <code>ProductionConfig</code> from <code>application/config.py</code>. That object is empty for the moment, but it might contain public configuration for the production server.</p><p>I can now build the image with</p><div class="code"><div class="content"><div class="highlight"><pre>$ APPLICATION_CONFIG="production" ./manage.py compose build web
</pre></div> </div> </div><h3 id="git-commit-3262">Git commit</h3><p>You can see the changes made in this step through <a href="https://github.com/lgiordani/flask_project_setup/commit/1c0fcf13c54ea13bb6e1307452de00529dbd57af">this Git commit</a> or <a href="https://github.com/lgiordani/flask_project_setup/tree/1c0fcf13c54ea13bb6e1307452de00529dbd57af">browse the files</a>.</p><h3 id="resources-9e89">Resources</h3><ul><li><a href="https://gunicorn.org/">Gunicorn</a> - A Python WSGI HTTP Server</li></ul><h2 id="step-3---scale-up-ce9e">Step 3 - Scale up<a class="headerlink" href="#step-3---scale-up-ce9e" title="Permanent link">¶</a></h2><p>Mapping the container port to the host is not a great idea, though. Only one container at a time can bind port 8000 on the host, so this setup makes it impossible to scale the service up and down to serve more load, which is the main point of running containers in production. This can be solved in many ways in the cloud, for example in AWS you might run the containers on AWS Fargate and register them with an Application Load Balancer. Another way to do it on a single host is to run a web server in front of your HTTP server, and this can easily be implemented with Docker Compose.</p><p>I will add nginx and serve HTTP from there, reverse proxying the application containers through docker-compose networking. First of all, this is the new configuration for docker-compose</p><div class="code"><div class="title"><code>docker/production.yml</code></div><div class="content"><div class="highlight"><pre><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">'3.4'</span>
<span class="nt">services</span><span class="p">:</span>
<span class="w">  </span><span class="nt">db</span><span class="p">:</span>
<span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">postgres</span>
<span class="w">    </span><span class="nt">environment</span><span class="p">:</span>
<span class="w">      </span><span class="nt">POSTGRES_DB</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_DB}</span>
<span class="w">      </span><span class="nt">POSTGRES_USER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_USER}</span>
<span class="w">      </span><span class="nt">POSTGRES_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_PASSWORD}</span>
<span class="w">    </span><span class="nt">ports</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="s">"${POSTGRES_PORT}:5432"</span>
<span class="w">    </span><span class="nt">volumes</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">pgdata:/var/lib/postgresql/data</span>
<span class="w">  </span><span class="nt">web</span><span class="p">:</span>
<span class="w">    </span><span class="nt">build</span><span class="p">:</span>
<span class="w">      </span><span class="nt">context</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${PWD}</span>
<span class="w">      </span><span class="nt">dockerfile</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">docker/Dockerfile.production</span>
<span class="w">    </span><span class="nt">environment</span><span class="p">:</span>
<span class="w">      </span><span class="nt">FLASK_ENV</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${FLASK_ENV}</span>
<span class="w">      </span><span class="nt">FLASK_CONFIG</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${FLASK_CONFIG}</span>
<span class="w">      </span><span class="nt">APPLICATION_DB</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${APPLICATION_DB}</span>
<span class="w">      </span><span class="nt">POSTGRES_USER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_USER}</span>
<span class="w">      </span><span class="nt">POSTGRES_HOSTNAME</span><span class="p">:</span><span class="w"> </span><span class="s">"db"</span>
<span class="w">      </span><span class="nt">POSTGRES_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_PASSWORD}</span>
<span class="w">      </span><span class="nt">POSTGRES_PORT</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${POSTGRES_PORT}</span>
<span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">gunicorn -w 4 -b 0.0.0.0 wsgi:app</span>
<span class="w">    </span><span class="nt">volumes</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">${PWD}:/opt/code</span>
<span class="hll"><span class="w">  </span><span class="nt">nginx</span><span class="p">:</span>
</span><span class="hll"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">nginx</span>
</span><span class="hll"><span class="w">    </span><span class="nt">volumes</span><span class="p">:</span>
</span><span class="hll"><span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">./nginx/nginx.conf:/etc/nginx/nginx.conf:ro</span>
</span><span class="hll"><span class="w">    </span><span class="nt">ports</span><span class="p">:</span>
</span><span class="hll"><span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">8080:8080</span>
</span>
<span class="nt">volumes</span><span class="p">:</span>
<span class="w">  </span><span class="nt">pgdata</span><span class="p">:</span>
</pre></div> </div> </div><p>As you can see, I added a service <code>nginx</code> that runs the default Nginx image, mounting a custom configuration file that I will create in a minute. The application container doesn't need any port mapping, as I won't access it directly from the host anymore. The Nginx configuration file is</p><div class="code"><div class="title"><code>docker/nginx/nginx.conf</code></div><div class="content"><div class="highlight"><pre><span class="k">worker_processes</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="k">events</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="kn">worker_connections</span><span class="w"> </span><span class="mi">1024</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="k">http</span><span class="w"> </span><span class="p">{</span>
<span class="w">    </span><span class="kn">sendfile</span><span class="w"> </span><span class="no">on</span><span class="p">;</span>
<span class="w">    </span><span class="kn">upstream</span><span class="w"> </span><span class="s">app</span><span class="w"> </span><span class="p">{</span> <span class="callout">1</span>
<span class="w">        </span><span class="kn">server</span><span class="w"> </span><span class="n">web</span><span class="p">:</span><span class="mi">8000</span><span class="p">;</span>
<span class="w">    </span><span class="p">}</span>
<span class="w">    </span><span class="kn">server</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="kn">listen</span><span class="w"> </span><span class="mi">8080</span><span class="p">;</span> <span class="callout">2</span>
<span class="w">        </span><span class="kn">location</span><span class="w"> </span><span class="s">/</span><span class="w"> </span><span class="p">{</span>
<span class="w">            </span><span class="kn">proxy_pass</span><span class="w"> </span><span class="s">http://app</span><span class="p">;</span>
<span class="w">            </span><span class="kn">proxy_redirect</span><span class="w"> </span><span class="no">off</span><span class="p">;</span>
<span class="w">            </span><span class="kn">proxy_set_header</span><span class="w"> </span><span class="s">Host</span><span class="w"> </span><span class="nv">$host</span><span class="p">;</span>
<span class="w">            </span><span class="kn">proxy_set_header</span><span class="w"> </span><span class="s">X-Real-IP</span><span class="w"> </span><span class="nv">$remote_addr</span><span class="p">;</span>
<span class="w">            </span><span class="kn">proxy_set_header</span><span class="w"> </span><span class="s">X-Forwarded-For</span><span class="w"> </span><span class="nv">$proxy_add_x_forwarded_for</span><span class="p">;</span>
<span class="w">            </span><span class="kn">proxy_set_header</span><span class="w"> </span><span class="s">X-Forwarded-Host</span><span class="w"> </span><span class="nv">$server_name</span><span class="p">;</span>
<span class="w">        </span><span class="p">}</span>
<span class="w">    </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>This is a pretty standard configuration, and in a real production environment I would add many other configuration values (most notably serving HTTPS instead of HTTP). The section <code>upstream</code> <span class="callout">1</span> leverages docker-compose networking by referring to <code>web</code>, a name that the internal DNS maps directly to the IPs of the containers of the service with the same name. The port 8000 is the default Gunicorn port that I mentioned before. I won't run the nginx container as root on my notebook, so I expose port 8080 <span class="callout">2</span> instead of the traditional 80 for HTTP, and this is also something that might be different in a real production environment.</p><p>I can at this point run</p><div class="code"><div class="content"><div class="highlight"><pre>$ APPLICATION_CONFIG="production" ./manage.py compose up -d
Starting production_db_1 ... done
Starting production_nginx_1 ... done
Starting production_web_1 ... done
</pre></div> </div> </div><p>It's interesting to have a look at the logs of the nginx container, as Nginx by default prints all the incoming requests</p><div class="code"><div class="content"><div class="highlight"><pre>$ APPLICATION_CONFIG="production" ./manage.py compose logs -f nginx
Attaching to production_nginx_1
[...]
nginx_1 | 172.30.0.1 - - [05/Jul/2020:10:40:44 +0000] "GET / HTTP/1.1" 200 13 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"
</pre></div> </div> </div><p>The last line is what I get when I visit localhost:8080 while the production setup is up and running.</p><p>Scaling the service up and down is now a breeze</p><div class="code"><div class="content"><div class="highlight"><pre>$ APPLICATION_CONFIG="production" ./manage.py compose up -d --scale web=3
production_db_1 is up-to-date
Starting production_web_1 ...
Starting production_web_1 ... done
Creating production_web_2 ... done
Creating production_web_3 ... done
</pre></div> </div> </div><h3 id="git-commit-3262">Git commit</h3><p>You can see the changes made in this step through <a href="https://github.com/lgiordani/flask_project_setup/commit/2794ea1ca6e7c56a823ddf30201e69700f596bf2">this Git commit</a> or <a href="https://github.com/lgiordani/flask_project_setup/tree/2794ea1ca6e7c56a823ddf30201e69700f596bf2">browse the files</a>.</p><h3 id="resources-9e89">Resources</h3><ul><li><a href="https://nginx.org/en/">Nginx</a> - An HTTP and reverse proxy server (and more)</li><li><a href="https://hub.docker.com/_/nginx">Docker nginx</a> - the official nginx Docker image</li><li><a href="https://docs.docker.com/compose/reference/logs/">Docker Compose logs command</a> - A command to print container logs</li><li><a href="https://docs.docker.com/compose/reference/up/">Docker Compose up command</a> - The new way to scale containers in docker-compose</li></ul><h2 id="bonus-step---a-closer-look-at-docker-networking-f6d5">Bonus step - A closer look at Docker networking<a class="headerlink" href="#bonus-step---a-closer-look-at-docker-networking-f6d5" title="Permanent link">¶</a></h2><p>I mentioned that Docker Compose creates a connection between services, and used that in the configuration of the nginx container, but I understand that this might look like black magic to some people. While I believe that this is actually black magic, I also think that we can investigate it a bit, so let's open the grimoire and reveal (some of) the dark secrets of Docker networking.</p><p>While the production setup is running we can connect to the nginx container and see what is happening in real time, so first of all I run a bash shell on it</p><div class="code"><div class="content"><div class="highlight"><pre>$ APPLICATION_CONFIG="production" ./manage.py compose exec nginx bash
</pre></div> </div> </div><p>Once inside, I can see my configuration file at <code>/etc/nginx/nginx.conf</code>, and it has not changed. Remember that Docker networking doesn't work like a templating engine that rewrites configuration files, but through a local DNS. This means that if we try to resolve <code>web</code> from inside the container we should see multiple IPs. The command <code>dig</code> is a good tool to investigate the DNS, but it doesn't come preinstalled in the nginx container, so I need to run</p><div class="code"><div class="content"><div class="highlight"><pre>root@33cbaea369be:/# apt update && apt install dnsutils
</pre></div> </div> </div><p>and at this point I can run it</p><div class="code"><div class="content"><div class="highlight"><pre>root@33cbaea369be:/# dig web
; <<>> DiG 9.11.5-P4-5.1+deb10u1-Debian <<>> web
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30539
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;web. IN A
;; ANSWER SECTION:
web. 600 IN A 172.30.0.4
web. 600 IN A 172.30.0.6
web. 600 IN A 172.30.0.5
;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Sun Jul 05 10:58:18 UTC 2020
;; MSG SIZE rcvd: 78
root@33cbaea369be:/#
</pre></div> </div> </div><p>The command outputs 3 IPs, which correspond to the 3 containers of the service <code>web</code> that I am currently running. If I scale down (from outside the container)</p><div class="code"><div class="content"><div class="highlight"><pre>$ APPLICATION_CONFIG="production" ./manage.py compose up -d --scale web=1
</pre></div> </div> </div><p>then the output of <code>dig</code> becomes</p><div class="code"><div class="content"><div class="highlight"><pre>root@33cbaea369be:/# dig web
; <<>> DiG 9.11.5-P4-5.1+deb10u1-Debian <<>> web
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13146
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;web. IN A
;; ANSWER SECTION:
web. 600 IN A 172.30.0.4
;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Sun Jul 05 11:01:46 UTC 2020
;; MSG SIZE rcvd: 40
root@33cbaea369be:/#
</pre></div> </div> </div>
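<p>The same lookup can be reproduced in Python from any container that has it installed, for example one of the <code>web</code> containers, which are built from the <code>python:3</code> image. This is a minimal sketch, not part of the project: <code>resolve_all</code> collects the IPv4 addresses that a name resolves to through the standard library function <code>socket.getaddrinfo</code>, while <code>round_robin</code> cycles over them to mimic, in a very simplified way, the round-robin balancing that Nginx applies by default when a name in an <code>upstream</code> block resolves to several addresses. Both helper names are hypothetical.</p>

```python
import itertools
import socket


def resolve_all(hostname, port=8000):
    """Return the de-duplicated IPv4 addresses that hostname resolves to.

    Hypothetical helper. Inside the Docker Compose network the query is
    answered by Docker's embedded DNS, the same server that dig reported.
    """
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr), and for
    # AF_INET sockaddr is an (address, port) tuple. Sorted as strings.
    return sorted({sockaddr[0] for _, _, _, _, sockaddr in infos})


def round_robin(addresses):
    """Yield the addresses one after the other, wrapping around at the end.

    A toy model of round-robin balancing: real Nginx also handles weights,
    failed servers, and more.
    """
    return itertools.cycle(addresses)
```

<p>Run inside a <code>web</code> container while the service is scaled to 3, <code>resolve_all("web")</code> should list the same three addresses printed by <code>dig</code>, and each <code>next()</code> on the iterator returned by <code>round_robin</code> moves to the following address, starting again from the first one after the last.</p>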
<h2 id="how-to-create-the-production-infrastructure-d9bc">How to create the production infrastructure<a class="headerlink" href="#how-to-create-the-production-infrastructure-d9bc" title="Permanent link">¶</a></h2><p>This will be a very short section, as creating infrastructure and deploying in production are complex topics, so I just want to give some hints to stimulate your research.</p><p><a href="https://aws.amazon.com/ecs/">AWS ECS</a> is basically Docker in the cloud, and the whole structure maps almost 1 to 1 to the docker-compose setup, so it is worth learning. ECS can work on explicit EC2 instances that you manage, or on <a href="https://aws.amazon.com/fargate/">Fargate</a>, which means that the EC2 instances running the containers are transparently managed by AWS itself.</p><p><a href="https://www.terraform.io/">Terraform</a> is a good tool to create infrastructure. It has many limitations, mostly coming from its custom HCL language, but it's slowly becoming better (version 0.13 will finally allow us to run for loops on modules, for example). Despite its shortcomings, it's a great tool to create static infrastructure, so I recommend learning it.</p><p>Terraform is not the right tool to deploy your code, though, as that requires a dynamic interaction with the system, so you need to set up a good Continuous Integration system. <a href="https://www.jenkins.io/">Jenkins</a> is a very well known open source CI, but I personally ended up dropping it because it doesn't seem to be designed for large scale systems. For example, it is very complicated to automate the deployment of a Jenkins server, and dynamic large scale systems should require zero manual intervention to be created. 
Anyway, Jenkins is a good tool to start with, but you might want to have a look at other products like <a href="https://circleci.com/">CircleCI</a> or <a href="https://buildkite.com/">Buildkite</a>.</p><p>When you create your deployment pipeline you need to do much more than just creating the image and running it, at least for real applications. You need to decide when to apply database migrations, and if you have a web front-end you will also need to compile and install the JavaScript assets. Since you don't want downtime when you deploy, you will need to look into blue/green deployments and, in general, into strategies that allow you to run different versions of the application at the same time, at least for short periods. Or for longer periods, if you want to perform A/B testing or zonal deployments.</p><h2 id="final-words-9803">Final words<a class="headerlink" href="#final-words-9803" title="Permanent link">¶</a></h2><p>This is the last post of this short series. I hope you learned something useful, and that it encouraged you to properly set up your projects and to investigate technologies like Docker. As always, feel free to send me feedback or questions, and if you find my posts useful please share them with whoever you think might be interested.</p><h2 id="updates-0083">Updates<a class="headerlink" href="#updates-0083" title="Permanent link">¶</a></h2><p>2020-12-22 I reviewed the whole tutorial and corrected several typos</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>