The Digital Cat - Djangohttps://www.thedigitalcatonline.com/2022-10-08T14:00:00+01:00Adventures of a curious cat in the land of programmingMultiple inheritance and mixin classes in Python2020-03-27T12:00:00+01:002022-10-08T14:00:00+01:00Leonardo Giordanitag:www.thedigitalcatonline.com,2020-03-27:/blog/2020/03/27/mixin-classes-in-python/<p>This post describes what mixin classes are in theory, why we need them and how they can be implemented in Python. It also shows a working example from the class-based views code of the Django framework.</p><p>I recently revisited three old posts on Django class-based views that I wrote for this blog, updating them to Django 3.0 (you can find them <a href="https://www.thedigitalcatonline.com/blog/2013/10/28/digging-up-django-class-based-views-1/">here</a>) and noticed once again that the code base uses <em>mixin classes</em> to increase code reuse. I also realised that mixins are not very popular in Python, so I decided to explore them, brushing up my knowledge of the OOP theory in the meanwhile.</p><p>To fully appreciate the content of the post, be sure you grasp two pillars of the OOP approach: <strong>delegation</strong>, in particular how it is implemented through inheritance, and <strong>polymorphism</strong>. 
<a href="https://www.thedigitalcatonline.com/blog/2014/08/20/python-3-oop-part-3-delegation-composition-and-inheritance/">This post about delegation</a> and <a href="https://www.thedigitalcatonline.com/blog/2014/08/21/python-3-oop-part-4-polymorphism/">this post about polymorphism</a> contain all you need to understand how Python implements those concepts.</p><h2 id="multiple-inheritance-blessing-and-curse-1a08">Multiple inheritance: blessing and curse<a class="headerlink" href="#multiple-inheritance-blessing-and-curse-1a08" title="Permanent link">¶</a></h2><h3 id="general-concepts-2aca">General concepts</h3><p>To discuss mixins we need to start from one of the most controversial subjects in the whole OOP world: multiple inheritance. This is a natural extension of the concept of simple inheritance, where a class automatically delegates method and attribute resolution to another class (the parent class).</p><p>Let me state it again, as it is important for the rest of the discussion: <em>inheritance is just an automatic delegation mechanism</em>.</p><p>Delegation was introduced in OOP as a way to reduce code duplication. When an object needs a specific feature it just delegates it to another class (either explicitly or implicitly), so the code is written just once.</p><p>Let's consider the example of a code management website, clearly completely fictional and not inspired by any existing product. Let's assume we created the following hierarchy</p><div class="code"><div class="content"><div class="highlight"><pre>     assignable reviewable item
(assign_to_user, ask_review_to_user)
                 ^
                 |
                 |
                 |
            pull request
</pre></div> </div> </div><p>which allows us to put in <code>pull request</code> only the specific code required by that element. This is a great achievement, as it is what libraries do for code, but on live objects. Method calls and delegation are nothing more than messages between objects, so the delegation hierarchy is just a simple networked system.</p><p>Unfortunately, the use of inheritance over composition often leads to systems that, paradoxically, increase code duplication. The main problem lies in the fact that inheritance can directly delegate to only one other class (the parent class), as opposed to composition, where the object can delegate to any number of other ones. This limitation of inheritance means that we might have a class that inherits from another one because it needs some of its features, but in doing so receives features it doesn't want, or shouldn't have.</p><p>Let's continue the example of the code management portal, and consider an <code>issue</code>, which is an item that we want to store in the system, but that cannot be reviewed by a user. If we create a hierarchy like this</p><div class="code"><div class="content"><div class="highlight"><pre>     assignable reviewable item
(assign_to_user, ask_review_to_user)
                 ^
                 |
                 |
                 |
                 |
        +--------+--------+
        |                 |
        |                 |
        |                 |
      issue         pull request
(not reviewable)
</pre></div> </div> </div><p>we end up putting the features related to the review process in an object that shouldn't have them. The standard solution to this problem is to increase the depth of the inheritance hierarchy and to derive from the new, simpler ancestor.</p><div class="code"><div class="content"><div class="highlight"><pre>  assignable item
 (assign_to_user)
         ^
         |
         |
         |
         |
  +------+--------------+
  |                     |
  |                     |
  |                     |
  |        reviewable assignable item
  |           (ask_review_to_user)
  |                     ^
  |                     |
  |                     |
  |                     |
issue             pull request
</pre></div> </div> </div><p>However, this approach stops being viable as soon as an object needs to inherit from a given class but not from the parent of that class. Consider, for example, an element that has to be reviewable but not assignable, like a <code>best practice</code> that we want to add to the site. If we want to keep using inheritance, the only solution at this point is to duplicate the code that implements the reviewable nature of the item (or the code that implements the assignable feature) and create two different class hierarchies.</p><div class="code"><div class="content"><div class="highlight"><pre>  assignable item               +--------> reviewable item
 (assign_to_user)               |       (ask_review_to_user)
         ^                      |                 ^
         |                      |                 |
         |                      |                 |
         |              CODE DUPLICATION          |
         |                      |                 |
  +------+--------------+       |                 |
  |                     |       |                 |
  |                     |       |                 |
  |                     |       V                 |
  |        reviewable assignable item             |
  |           (ask_review_to_user)                |
  |                     ^                         |
  |                     |                         |
  |                     |                         |
  |                     |                         |
issue             pull request              best practice
</pre></div> </div> </div><p>Please note that this doesn't even take into account that the new <code>reviewable item</code> might need attributes from <code>assignable item</code>, which calls for another level of depth in the hierarchy, where we isolate those features in a more generic class. So, unfortunately, chances are that this is only the first of many compromises we will have to accept to keep the system in a stable state if we can't change our approach.</p><p>Multiple inheritance was then introduced in OOP, as it was clear that an object might want to delegate certain actions to a given class, and other actions to a different one, mimicking what life forms do when they inherit traits from multiple ancestors (parents, grandparents, etc.).</p><p>The above situation can then be solved by having <code>pull request</code> inherit both from the class that provides the assign feature and from the one that implements the reviewable nature.</p><div class="code"><div class="content"><div class="highlight"><pre>  assignable item                            reviewable item
 (assign_to_user)                          (ask_review_to_user)
         ^                                       ^      ^
         |                                       |      |
         |                                       |      |
         |                                       |      |
         |                                       |      |
  +------+-------------+  +----------------------+      |
  |                    |  |                             |
  |                    |  |                             |
  |                    |  |                             |
  |                    |  |                             |
  |                    |  |                             |
  |                    |  |                             |
  |                    |  |                             |
  |                    |  |                             |
  |                    |  |                             |
issue              pull request                   best practice
</pre></div> </div> </div><p>Generally speaking, then, multiple inheritance is introduced to give the programmer a way to keep using inheritance without introducing code duplication, keeping the class hierarchy simpler and cleaner. Eventually, everything we do in software design is to try and separate concerns, that is, to isolate features, and multiple inheritance can help to do this.</p><p>These are just examples and might be valid or not, depending on the concrete case, but they clearly show the issues that we can have even with a very simple hierarchy of 4 classes. Many of these problems clearly arise from the fact that we wanted to implement delegation only through inheritance, and I dare to say that 80% of the architectural errors in OOP projects come from using inheritance instead of composition and from using god objects, that is classes that have responsibilities over too many different parts of the system. Always remember that OOP was born with the idea of small objects interacting through messages, so the considerations we make for monolithic architectures are valid even here.</p><p>That said, as inheritance and composition implement two different types of delegation (<em>to be</em> and <em>to have</em>), they are both valuable, and multiple inheritance is the way to remove the single provider limitation that comes from having only one parent class.</p><h3 id="why-is-it-controversial-a9c1">Why is it controversial?</h3><p>Given what I just said, multiple inheritance seems to be a blessing. When an object can inherit from multiple parents, we can easily spread responsibilities among different classes and use only the ones we need, promoting code reuse and avoiding god objects.</p><p>Unfortunately, things are not that simple. 
First of all, we face the issue that every microservice-oriented architecture faces, that is the risk of going from god objects (the extreme monolithic architecture) to almost empty objects (the extreme distributed approach), burdening the programmer with a control that is too fine-grained and that eventually results in a system where relationships between objects are so complicated that it becomes impossible to grasp the effect of a change in the code.</p><p>There is a more immediate problem in multiple inheritance, though. As happens with natural inheritance, parents can provide the same "genetic trait" in two different flavours, but the resulting individual will have only one. Leaving aside genetics (which is incredibly more complicated than programming) and going back to OOP, we face a problem when an object inherits from two other objects that provide the same attribute.</p><p>So, if your class <code>Child</code> inherits from parents <code>Parent1</code> and <code>Parent2</code>, and both provide the <code>__init__</code> method, which one should your object use?</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Parent1</span><span class="p">():</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="p">[</span><span class="o">...</span><span class="p">]</span>

<span class="k">class</span> <span class="nc">Parent2</span><span class="p">():</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="p">[</span><span class="o">...</span><span class="p">]</span>

<span class="k">class</span> <span class="nc">Child</span><span class="p">(</span><span class="n">Parent1</span><span class="p">,</span> <span class="n">Parent2</span><span class="p">):</span>
    <span class="c1"># This inherits from both Parent1 and Parent2,</span>
    <span class="c1"># which __init__ does it use?</span>
    <span class="k">pass</span>
</pre></div> </div> </div><p>Things can get even worse, as the parents can define the common method with different signatures, for example</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Parent1</span><span class="p">:</span>
    <span class="c1"># Parent1 defines __init__ with a "status" parameter</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">status</span><span class="p">):</span>
        <span class="p">[</span><span class="o">...</span><span class="p">]</span>

<span class="k">class</span> <span class="nc">Parent2</span><span class="p">:</span>
    <span class="c1"># Parent2 defines __init__ with a "name" parameter</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="p">[</span><span class="o">...</span><span class="p">]</span>

<span class="k">class</span> <span class="nc">Child</span><span class="p">(</span><span class="n">Parent1</span><span class="p">,</span> <span class="n">Parent2</span><span class="p">):</span>
    <span class="c1"># This inherits from both Parent1 and Parent2,</span>
    <span class="c1"># which __init__ does it use?</span>
    <span class="k">pass</span>
</pre></div> </div> </div><p>The problem can be extended even further, introducing a common ancestor above <code>Parent1</code> and <code>Parent2</code>.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Ancestor</span><span class="p">:</span>
    <span class="c1"># The common ancestor, defines its own __init__ method</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="p">[</span><span class="o">...</span><span class="p">]</span>

<span class="k">class</span> <span class="nc">Parent1</span><span class="p">(</span><span class="n">Ancestor</span><span class="p">):</span>
    <span class="c1"># This inherits from Ancestor but redefines __init__</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">status</span><span class="p">):</span>
        <span class="p">[</span><span class="o">...</span><span class="p">]</span>

<span class="k">class</span> <span class="nc">Parent2</span><span class="p">(</span><span class="n">Ancestor</span><span class="p">):</span>
    <span class="c1"># This inherits from Ancestor but redefines __init__</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="p">[</span><span class="o">...</span><span class="p">]</span>

<span class="k">class</span> <span class="nc">Child</span><span class="p">(</span><span class="n">Parent1</span><span class="p">,</span> <span class="n">Parent2</span><span class="p">):</span>
    <span class="c1"># This inherits from both Parent1 and Parent2,</span>
    <span class="c1"># which __init__ does it use?</span>
    <span class="k">pass</span>
</pre></div> </div> </div><p>As you can see, we already have a problem when we introduce multiple parents, and a common ancestor just adds a new level of complexity. The ancestor class can clearly be at any point of the inheritance tree (grandparent, great-grandparent, etc.); the important part is that it is shared between <code>Parent1</code> and <code>Parent2</code>. This is the so-called diamond problem, as the inheritance graph has the shape of a diamond</p><div class="code"><div class="content"><div class="highlight"><pre>      Ancestor
       ^    ^
      /      \
     /        \
 Parent1    Parent2
    ^          ^
     \        /
      \      /
       Child
</pre></div> </div> </div><p>So, while with single-parent inheritance the rules are straightforward, with multiple inheritance we immediately have a more complex situation that doesn't have a trivial solution. Does all this prevent multiple inheritance from being implemented?</p><p>Not at all! There are solutions to this problem, as we will see shortly, but this further level of intricacy makes multiple inheritance something that doesn't fit easily in a design and has to be implemented carefully to avoid subtle bugs. Remember that inheritance is an automatic delegation mechanism, as this makes what happens in the code less evident. For these reasons, multiple inheritance is often depicted as scary and convoluted, and usually given some space only in advanced OOP courses, at least in the Python world. I believe every Python programmer, instead, should become familiar with it and learn how to take advantage of it.</p>
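<p>To make the risk of subtle bugs concrete, here is a minimal sketch of the diamond with the placeholders filled in. The attributes <code>ready</code>, <code>status</code>, and <code>name</code> are made up for illustration and are not part of the examples above; the sketch also assumes that none of the <code>__init__</code> methods calls the others.</p>

```python
class Ancestor:
    def __init__(self):
        # Hypothetical attribute, used only to see whether this code ran
        self.ready = True


class Parent1(Ancestor):
    def __init__(self, status):
        self.status = status


class Parent2(Ancestor):
    def __init__(self, name):
        self.name = name


class Child(Parent1, Parent2):
    pass


# Child defines no __init__, so Python delegates to the first
# parent listed in the class signature, which is Parent1
child = Child("open")

print(child.status)              # open
print(hasattr(child, "name"))    # False
print(hasattr(child, "ready"))   # False
```

<p>Only the initialiser of <code>Parent1</code> runs, so the instance silently lacks everything that <code>Parent2</code> and <code>Ancestor</code> would have set up. No error is raised until some other method tries to read one of the missing attributes, which is exactly the kind of bug that is hard to trace back to the class signature.</p>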
<h3 id="multiple-inheritance-the-python-way-d87f">Multiple inheritance: the Python way</h3><p>Let's see how it is possible to solve the diamond problem. Unlike genetics, we programmers can't afford any level of uncertainty or randomness in our processes, so in the presence of a possible ambiguity as the one created by multiple inheritance, we need to write down a rule that will be strictly followed in every case. In Python, this rule goes by the name of MRO (Method Resolution Order), which was introduced in Python 2.3 and is described in <a href="https://www.python.org/download/releases/2.3/mro/">this document</a> by Michele Simionato.</p><p>There is a lot to say about MRO and the underlying C3 linearisation algorithm, but for the scope of this post, it is enough to see how it solves the diamond problem. In case of multiple inheritance, Python follows the usual inheritance rules (automatic delegation to an ancestor if the attribute is not present locally), but the <em>order</em> followed to traverse the inheritance tree now includes all the classes that are specified in the class signature. In the example above, Python would look for attributes in the following order: <code>Child</code>, <code>Parent1</code>, <code>Parent2</code>, <code>Ancestor</code>.</p><p>So, as in the case of standard inheritance, this means that the first class in the list that implements a specific attribute will be the selected provider for that resolution. An example might clarify the matter</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">Ancestor</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">rewind</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">"Ancestor: rewind"</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">Parent1</span><span class="p">(</span><span class="n">Ancestor</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">open</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">"Parent1: open"</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">Parent2</span><span class="p">(</span><span class="n">Ancestor</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">open</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">"Parent2: open"</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">close</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">"Parent2: close"</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">flush</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">"Parent2: flush"</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">Child</span><span class="p">(</span><span class="n">Parent1</span><span class="p">,</span> <span class="n">Parent2</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">flush</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">"Child: flush"</span><span class="p">)</span>

<span class="nb">print</span><span class="p">(</span><span class="n">Child</span><span class="o">.</span><span class="vm">__mro__</span><span class="p">)</span>
<span class="n">c</span> <span class="o">=</span> <span class="n">Child</span><span class="p">()</span>
<span class="n">c</span><span class="o">.</span><span class="n">rewind</span><span class="p">()</span>
<span class="n">c</span><span class="o">.</span><span class="n">open</span><span class="p">()</span>
<span class="n">c</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
<span class="n">c</span><span class="o">.</span><span class="n">flush</span><span class="p">()</span>
</pre></div> </div> </div><p>As you can see, we can access the MRO of any class by reading its <code>__mro__</code> attribute, and as we expected its value is <code>(&lt;class '__main__.Child'&gt;, &lt;class '__main__.Parent1'&gt;, &lt;class '__main__.Parent2'&gt;, &lt;class '__main__.Ancestor'&gt;, &lt;class 'object'&gt;)</code>.</p><p>So, in this case an instance <code>c</code> of <code>Child</code> provides <code>rewind</code>, <code>open</code>, <code>close</code>, and <code>flush</code>. When <code>c.rewind</code> is called, the code in <code>Ancestor</code> is executed, as this is the first class in the MRO list that provides that method. The method <code>open</code> is provided by <code>Parent1</code>, while <code>close</code> is provided by <code>Parent2</code>. If the method <code>c.flush</code> is called, the code is provided by the <code>Child</code> class itself, which redefines it, overriding the one provided by <code>Parent2</code>.</p><p>As we see with the <code>flush</code> method, Python doesn't change its behaviour when it comes to method overriding with multiple parents. The first implementation of a method with that name is executed, and the parent's implementation is not automatically called. As in the case of standard inheritance, then, it's up to us to design classes with matching method signatures.</p><h3 id="under-the-bonnet-cf95">Under the bonnet</h3><p>How does multiple inheritance work internally? How does Python create the MRO list?</p><p>Python has a very simple approach to OOP (even though it ultimately ends with a mind-blowing ouroboros, see <a href="https://www.thedigitalcatonline.com/blog/2014/09/01/python-3-oop-part-5-metaclasses/">here</a>). Classes are objects themselves, so they contain data structures that are used by the language to provide features, and delegation is no exception.
When we run a method on an object, Python silently uses the <code>__getattribute__</code> method (provided by <code>object</code>), which uses <code>__class__</code> to reach the class from the instance, and <code>__bases__</code> to find the parent classes. The latter, in particular, is a tuple, so it is ordered, and it contains all the classes that the current class directly inherits from.</p><p>The MRO is created using only <code>__bases__</code>, but the underlying algorithm is not that trivial and has to do with the monotonicity of the resulting class linearisation. It is less scary than it sounds, but not something you want to read while suntanning, probably. If that's the case, the aforementioned <a href="https://www.python.org/download/releases/2.3/mro/">document</a> by Michele Simionato contains all the gory details on class linearisation that you always wanted to explore while lying on the beach.</p><h2 id="inheritance-and-interfaces-42cb">Inheritance and interfaces<a class="headerlink" href="#inheritance-and-interfaces-42cb" title="Permanent link">¶</a></h2><p>To approach mixins, we need to discuss inheritance in detail, and specifically the role of method signatures.</p><p>In Python, when you override a method provided by an ancestor class, you have to decide if and when to call its original implementation. This gives the programmer the freedom to decide whether they need to just augment a method or to replace it completely. Remember that the only thing Python does when a class inherits from another is to automatically delegate methods that are not implemented.</p><p>When a class inherits from another we are ideally creating objects that keep backward compatibility with the interface of the parent class, to allow a polymorphic use of them. This means that when we inherit from a class and override a method changing its signature we are doing something that is dangerous and, at least from the point of view of polymorphism, wrong.
Have a look at this example</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">GraphicalEntity</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos_x</span> <span class="o">=</span> <span class="n">pos_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos_y</span> <span class="o">=</span> <span class="n">pos_y</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">size_x</span> <span class="o">=</span> <span class="n">size_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">size_y</span> <span class="o">=</span> <span class="n">size_y</span>
    <span class="k">def</span> <span class="nf">move</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos_x</span> <span class="o">=</span> <span class="n">pos_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos_y</span> <span class="o">=</span> <span class="n">pos_y</span>
    <span class="k">def</span> <span class="nf">resize</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">size_x</span> <span class="o">=</span> <span class="n">size_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">size_y</span> <span class="o">=</span> <span class="n">size_y</span>

<span class="k">class</span> <span class="nc">Rectangle</span><span class="p">(</span><span class="n">GraphicalEntity</span><span class="p">):</span>
    <span class="k">pass</span>

<span class="k">class</span> <span class="nc">Square</span><span class="p">(</span><span class="n">GraphicalEntity</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">size</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">resize</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">size</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">resize</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">size</span><span class="p">)</span>
</pre></div> </div> </div><p>Please note that <code>Square</code> changes the signature of both <code>__init__</code> and <code>resize</code>. Now, when we instantiate those classes we need to keep in mind the different signature of <code>__init__</code> in <code>Square</code></p><div class="code"><div class="content"><div class="highlight"><pre><span class="n">r1</span> <span class="o">=</span> <span class="n">Rectangle</span><span class="p">(</span><span class="mi">100</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="mi">15</span><span class="p">,</span> <span class="mi">30</span><span class="p">)</span>
<span class="n">r2</span> <span class="o">=</span> <span class="n">Rectangle</span><span class="p">(</span><span class="mi">150</span><span class="p">,</span> <span class="mi">280</span><span class="p">,</span> <span class="mi">23</span><span class="p">,</span> <span class="mi">55</span><span class="p">)</span>
<span class="n">q1</span> <span class="o">=</span> <span class="n">Square</span><span class="p">(</span><span class="mi">300</span><span class="p">,</span> <span class="mi">400</span><span class="p">,</span> <span class="mi">50</span><span class="p">)</span>
</pre></div> </div> </div><p>We usually accept that an enhanced version of a class accepts different parameters when it is initialised, as we do not expect it to be polymorphic on <code>__init__</code>. Problems arise when we try to leverage polymorphism on other methods, for example resizing all <code>GraphicalEntity</code> objects in a list</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">for</span> <span class="n">shape</span> <span class="ow">in</span> <span class="p">[</span><span class="n">r1</span><span class="p">,</span> <span class="n">r2</span><span class="p">,</span> <span class="n">q1</span><span class="p">]:</span>
    <span class="n">size_x</span> <span class="o">=</span> <span class="n">shape</span><span class="o">.</span><span class="n">size_x</span>
    <span class="n">size_y</span> <span class="o">=</span> <span class="n">shape</span><span class="o">.</span><span class="n">size_y</span>
    <span class="c1"># Raises TypeError for q1: Square.resize accepts a single size</span>
    <span class="n">shape</span><span class="o">.</span><span class="n">resize</span><span class="p">(</span><span class="n">size_x</span><span class="o">*</span><span class="mi">2</span><span class="p">,</span> <span class="n">size_y</span><span class="o">*</span><span class="mi">2</span><span class="p">)</span>
</pre></div> </div> </div><p>Since <code>r1</code>, <code>r2</code>, and <code>q1</code> are all objects that inherit from <code>GraphicalEntity</code> we expect them to provide the interface provided by that class, but this fails, because <code>Square</code> changed the signature of <code>resize</code>. The same would happen if we instantiated them in a for loop from a list of classes, but as I said it is generally accepted that child classes change the signature of the <code>__init__</code> method. This is not true, for example, in a plugin-based system, where all plugins shall be initialised the same way.</p><p>This is a classic problem in OOP. While we, as humans, perceive a square just as a slightly special rectangle, from the interface point of view the two classes are different, and thus should not be in the same inheritance tree when we are dealing with dimensions. This is an important consideration: <code>Rectangle</code> and <code>Square</code> are polymorphic on the <code>move</code> method, but not on <code>__init__</code> and <code>resize</code>. So, the question is if we could somehow separate the two natures of being movable and resizeable.</p><p>Now, discussing interfaces, polymorphism, and the reasons behind them would require an entirely separate post, so in the following sections, I'm going to ignore the matter and just consider the object interface optional. You will thus find examples of objects that break the interface of the parent, and objects that keep it. Just remember: whenever you change the signature of a method you change the (implicit) interface of the object, and thus you stop polymorphism. I'll discuss another time if I consider this right or wrong.</p><h2 id="mixin-classes-cf82">Mixin classes<a class="headerlink" href="#mixin-classes-cf82" title="Permanent link">¶</a></h2><p>MRO is a good solution that prevents ambiguity, but it leaves programmers with the responsibility of creating sensible inheritance trees. 
The algorithm helps to resolve complicated situations, but this doesn't mean we should create them in the first place. So, how can we leverage multiple inheritance without creating systems that are too complicated to grasp? Moreover, is it possible to use multiple inheritance to solve the problem of managing the double (or multiple) nature of an object, as in the previous example of a movable and resizeable shape?</p><p>The solution comes from mixin classes: those are small classes that provide attributes but are not included in the standard inheritance tree, working more as "additions" to the current class than as proper ancestors. Mixins originate in the LISP programming language, and specifically in what could be considered the first version of the Common Lisp Object System, the Flavors extension. Modern OOP languages implement mixins in many different ways: Scala, for example, has a feature called <em>traits</em>, which live in their own space with a specific hierarchy that doesn't interfere with the proper class inheritance.</p><h3 id="mixin-classes-in-python-a023">Mixin classes in Python</h3><p>Python doesn't provide support for mixins with any dedicated language feature, so we use multiple inheritance to implement them. This clearly requires great discipline from the programmer, as it violates one of the main assumptions for mixins: their orthogonality to the inheritance tree. In Python, so-called mixins are classes that live in the normal inheritance tree, but they are kept small to avoid creating hierarchies that are too complicated for the programmer to grasp. In particular, mixins shouldn't have common ancestors other than <code>object</code> with the other parent classes.</p><p>Let's have a look at a simple example</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">GraphicalEntity</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos_x</span> <span class="o">=</span> <span class="n">pos_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos_y</span> <span class="o">=</span> <span class="n">pos_y</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">size_x</span> <span class="o">=</span> <span class="n">size_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">size_y</span> <span class="o">=</span> <span class="n">size_y</span>

<span class="k">class</span> <span class="nc">ResizableMixin</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">resize</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">size_x</span> <span class="o">=</span> <span class="n">size_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">size_y</span> <span class="o">=</span> <span class="n">size_y</span>

<span class="k">class</span> <span class="nc">ResizableGraphicalEntity</span><span class="p">(</span><span class="n">GraphicalEntity</span><span class="p">,</span> <span class="n">ResizableMixin</span><span class="p">):</span>
    <span class="k">pass</span>

<span class="n">rge</span> <span class="o">=</span> <span class="n">ResizableGraphicalEntity</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="mi">300</span><span class="p">)</span>
<span class="n">rge</span><span class="o">.</span><span class="n">resize</span><span class="p">(</span><span class="mi">1000</span><span class="p">,</span> <span class="mi">2000</span><span class="p">)</span>
</pre></div> </div> </div><p>Here, the class <code>ResizableMixin</code> doesn't inherit from <code>GraphicalEntity</code>, but directly from <code>object</code>, so <code>ResizableGraphicalEntity</code> gets from it just the <code>resize</code> method. As we said before, this simplifies the inheritance tree of <code>ResizableGraphicalEntity</code> and helps to reduce the risk of the diamond problem. It leaves us free to use <code>GraphicalEntity</code> as a parent for other classes without having to inherit methods that we don't want. Please remember that this works because the classes are designed to avoid the problem, and not because of language features: the MRO algorithm just ensures that there will always be an unambiguous choice in case of multiple ancestors.</p><p>Mixins cannot usually be too generic. After all, they are designed to add features to classes, but these new features often interact with other pre-existing features of the augmented class. In this case, the <code>resize</code> method interacts with the attributes <code>size_x</code> and <code>size_y</code> that have to be present in the object. There are, of course, examples of <em>pure</em> mixins, but since they would require no initialization their scope is definitely limited.</p><h3 id="using-mixins-to-hijack-inheritance-030c">Using mixins to hijack inheritance</h3><p>Thanks to the MRO, Python programmers can leverage multiple inheritance to override methods that objects inherit from their parents, allowing them to customise classes without code duplication. Let's have a look at this example</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">GraphicalEntity</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos_x</span> <span class="o">=</span> <span class="n">pos_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos_y</span> <span class="o">=</span> <span class="n">pos_y</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">size_x</span> <span class="o">=</span> <span class="n">size_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">size_y</span> <span class="o">=</span> <span class="n">size_y</span>

<span class="k">class</span> <span class="nc">Button</span><span class="p">(</span><span class="n">GraphicalEntity</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="kc">False</span>

    <span class="k">def</span> <span class="nf">toggle</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">status</span>

<span class="n">b</span> <span class="o">=</span> <span class="n">Button</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
</pre></div> </div> </div><p>As you can see, the <code>Button</code> class extends the <code>GraphicalEntity</code> one in a classic way, using <code>super</code> to call the parent's <code>__init__</code> method before adding the new <code>status</code> attribute. Now, if I wanted to create a <code>SquareButton</code> class I would have two choices.</p><p>I might just override <code>__init__</code> in the new class</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">GraphicalEntity</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos_x</span> <span class="o">=</span> <span class="n">pos_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos_y</span> <span class="o">=</span> <span class="n">pos_y</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">size_x</span> <span class="o">=</span> <span class="n">size_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">size_y</span> <span class="o">=</span> <span class="n">size_y</span>

<span class="k">class</span> <span class="nc">Button</span><span class="p">(</span><span class="n">GraphicalEntity</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="kc">False</span>

    <span class="k">def</span> <span class="nf">toggle</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">status</span>

<span class="k">class</span> <span class="nc">SquareButton</span><span class="p">(</span><span class="n">Button</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">size</span><span class="p">)</span>

<span class="n">b</span> <span class="o">=</span> <span class="n">SquareButton</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">200</span><span class="p">)</span>
</pre></div> </div> </div><p>which performs the requested job, but strongly connects the feature of having a single dimension with the <code>Button</code> nature. If we wanted to create a circular image we could not inherit from <code>SquareButton</code>, as the image has a different nature.</p><p>The second option is to isolate the features connected with having a single dimension in a mixin class, and add it as a parent of the new class</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">GraphicalEntity</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos_x</span> <span class="o">=</span> <span class="n">pos_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos_y</span> <span class="o">=</span> <span class="n">pos_y</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">size_x</span> <span class="o">=</span> <span class="n">size_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">size_y</span> <span class="o">=</span> <span class="n">size_y</span>

<span class="k">class</span> <span class="nc">Button</span><span class="p">(</span><span class="n">GraphicalEntity</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="kc">False</span>

    <span class="k">def</span> <span class="nf">toggle</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">status</span>

<span class="k">class</span> <span class="nc">SingleDimensionMixin</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">size</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">SquareButton</span><span class="p">(</span><span class="n">SingleDimensionMixin</span><span class="p">,</span> <span class="n">Button</span><span class="p">):</span>
    <span class="k">pass</span>

<span class="n">b</span> <span class="o">=</span> <span class="n">SquareButton</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">200</span><span class="p">)</span>
</pre></div> </div> </div><p>The second solution gives the same final result, but promotes code reuse, as now the <code>SingleDimensionMixin</code> class can be applied to other classes derived from <code>GraphicalEntity</code> and make them accept only one size, while in the first solution that feature was tightly connected with the <code>Button</code> ancestor class.</p><p>Please note that the position of the mixin is important as <code>super</code> follows the MRO. As it is, the MRO of <code>SquareButton</code> is <code>(SquareButton, SingleDimensionMixin, Button, GraphicalEntity, object)</code>, so, when we instantiate it the <code>__init__</code> method is provided by <code>SingleDimensionMixin</code>, which in turn calls through <code>super</code> the method <code>__init__</code> of <code>Button</code>. The call <code>super().__init__(pos_x, pos_y, size, size)</code> in <code>SingleDimensionMixin</code> and the signature <code>def __init__(self, pos_x, pos_y, size_x, size_y):</code> in <code>Button</code> match, so everything works.</p><p>If we defined <code>SquareButton</code> as</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SquareButton</span><span class="p">(</span><span class="n">Button</span><span class="p">,</span> <span class="n">SingleDimensionMixin</span><span class="p">):</span>
    <span class="k">pass</span>
</pre></div> </div> </div><p>then the <code>__init__</code> method would first be provided by <code>Button</code>, and its <code>super</code> would call the <code>__init__</code> method of <code>GraphicalEntity</code>. This would however result in an error, as we run <code>SquareButton(10, 20, 200)</code>, and <code>Button.__init__</code> expects four parameters.</p><p>Mixins are not used only when you want to change the object's interface, though. Leveraging <code>super</code> we can achieve interesting designs like</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">GraphicalEntity</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos_x</span> <span class="o">=</span> <span class="n">pos_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos_y</span> <span class="o">=</span> <span class="n">pos_y</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">size_x</span> <span class="o">=</span> <span class="n">size_x</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">size_y</span> <span class="o">=</span> <span class="n">size_y</span>

<span class="k">class</span> <span class="nc">Button</span><span class="p">(</span><span class="n">GraphicalEntity</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="kc">False</span>

    <span class="k">def</span> <span class="nf">toggle</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">status</span> <span class="o">=</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">status</span>

<span class="k">class</span> <span class="nc">LimitSizeMixin</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">):</span>
        <span class="n">size_x</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">size_x</span><span class="p">,</span> <span class="mi">500</span><span class="p">)</span>
        <span class="n">size_y</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">size_y</span><span class="p">,</span> <span class="mi">400</span><span class="p">)</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">pos_x</span><span class="p">,</span> <span class="n">pos_y</span><span class="p">,</span> <span class="n">size_x</span><span class="p">,</span> <span class="n">size_y</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">LimitSizeButton</span><span class="p">(</span><span class="n">LimitSizeMixin</span><span class="p">,</span> <span class="n">Button</span><span class="p">):</span>
    <span class="k">pass</span>

<span class="n">b</span> <span class="o">=</span> <span class="n">LimitSizeButton</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">2000</span><span class="p">,</span> <span class="mi">1000</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">b</span><span class="o">.</span><span class="n">size_x</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">b</span><span class="o">.</span><span class="n">size_y</span><span class="p">)</span>
</pre></div> </div> </div><p>Here, the MRO of <code>LimitSizeButton</code> is <code>(&lt;class '__main__.LimitSizeButton'&gt;, &lt;class '__main__.LimitSizeMixin'&gt;, &lt;class '__main__.Button'&gt;, &lt;class '__main__.GraphicalEntity'&gt;, &lt;class 'object'&gt;)</code>, which means that when we initialize it the <code>__init__</code> method is first provided by <code>LimitSizeMixin</code>, which then calls through <code>super</code> the <code>__init__</code> method of <code>Button</code>, and through the latter the <code>__init__</code> method of <code>GraphicalEntity</code>.</p><p>Remember that in Python, you are never forced to call the parent's implementation of a method, so the mixin here might also stop the dispatching mechanism if that is the requirement of the business logic of the new object.</p>
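<p>As a minimal sketch of the last two points, the following block rebuilds the classes above, inspects the MRO of <code>LimitSizeButton</code>, and then adds a mixin that deliberately doesn't call <code>super</code>. The names <code>PlaceholderMixin</code> and <code>PlaceholderButton</code> are hypothetical and are introduced here only for illustration.</p><div class="code"><div class="content"><div class="highlight"><pre>class GraphicalEntity:
    def __init__(self, pos_x, pos_y, size_x, size_y):
        self.pos_x = pos_x
        self.pos_y = pos_y
        self.size_x = size_x
        self.size_y = size_y

class Button(GraphicalEntity):
    def __init__(self, pos_x, pos_y, size_x, size_y):
        super().__init__(pos_x, pos_y, size_x, size_y)
        self.status = False

    def toggle(self):
        self.status = not self.status

class LimitSizeMixin:
    def __init__(self, pos_x, pos_y, size_x, size_y):
        size_x = min(size_x, 500)
        size_y = min(size_y, 400)
        # Continue along the MRO: the next __init__ is Button's
        super().__init__(pos_x, pos_y, size_x, size_y)

class LimitSizeButton(LimitSizeMixin, Button):
    pass

# Hypothetical mixin (an assumption, not part of the original examples):
# it never calls super(), so Button.__init__ and
# GraphicalEntity.__init__ are skipped entirely.
class PlaceholderMixin:
    def __init__(self, pos_x, pos_y, size_x, size_y):
        self.pos_x = pos_x
        self.pos_y = pos_y
        self.size_x = 0
        self.size_y = 0

class PlaceholderButton(PlaceholderMixin, Button):
    pass

# Class names in method resolution order
mro_names = [cls.__name__ for cls in LimitSizeButton.__mro__]
print(mro_names)

b = LimitSizeButton(10, 20, 2000, 1000)
print(b.size_x, b.size_y, b.status)

p = PlaceholderButton(10, 20, 2000, 1000)
print(hasattr(p, "status"))
</pre></div> </div> </div><p>Since <code>PlaceholderMixin</code> never calls <code>super().__init__</code>, the <code>__init__</code> methods of <code>Button</code> and <code>GraphicalEntity</code> are not executed, so the instance has no <code>status</code> attribute and calling <code>toggle</code> on it would fail. Whether stopping the chain like this is acceptable depends on the design of the object.</p>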
<h2 id="a-concrete-example-django-class-based-views-f83d">A concrete example: Django class-based views<a class="headerlink" href="#a-concrete-example-django-class-based-views-f83d" title="Permanent link">¶</a></h2><p>Finally, let's get to the original source of inspiration for this post: the Django codebase. I will show you here how the Django programmers used multiple inheritance and mixin classes to promote code reuse, and you will now hopefully grasp all the reasons behind them.</p><p>The example I chose can be found in the <a href="https://github.com/django/django/blob/3.0/django/views/generic/base.py#L117">code of generic views</a>, and in particular in two classes: <code>TemplateResponseMixin</code> and <code>TemplateView</code>.</p><p>As you might know, Django's <code>View</code> class is the ancestor of all class-based views and provides a <code>dispatch</code> method that converts HTTP request methods into Python function calls (<a href="https://github.com/django/django/blob/3.0/django/views/generic/base.py#L89">CODE</a>). Now, <code>TemplateView</code> is a view that answers a GET request by rendering a template with the data coming from a context passed when the view is called. Given the mechanism behind Django views, then, <code>TemplateView</code> should implement a <code>get</code> method and return the content of the HTTP response. The code of the class is</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">TemplateView</span><span class="p">(</span><span class="n">TemplateResponseMixin</span><span class="p">,</span> <span class="n">ContextMixin</span><span class="p">,</span> <span class="n">View</span><span class="p">):</span>
<span class="w">    </span><span class="sd">"""</span>
<span class="sd">    Render a template. Pass keyword arguments from the URLconf to the context.</span>
<span class="sd">    """</span>
    <span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="n">context</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_context_data</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">render_to_response</span><span class="p">(</span><span class="n">context</span><span class="p">)</span>
</pre></div> </div> </div><p>As you can see <code>TemplateView</code> is a <code>View</code>, but it uses two mixins to inject features. Let's have a look at <code>TemplateResponseMixin</code></p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">TemplateResponseMixin</span><span class="p">:</span>
    <span class="p">[</span><span class="o">...</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">render_to_response</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">context</span><span class="p">,</span> <span class="o">**</span><span class="n">response_kwargs</span><span class="p">):</span>
        <span class="p">[</span><span class="o">...</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">get_template_names</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="p">[</span><span class="o">...</span><span class="p">]</span>
</pre></div> </div> </div><p>It is clear that <code>TemplateResponseMixin</code> just adds to any class the two methods <code>get_template_names</code> and <code>render_to_response</code>. The latter is called in the <code>get</code> method of <code>TemplateView</code> to create the response. Let's have a look at a simplified schema of the calls:</p><div class="code"><div class="content"><div class="highlight"><pre>GET request --> TemplateView.dispatch --> View.dispatch --> TemplateView.get --> TemplateResponseMixin.render_to_response
</pre></div> </div> </div><p>It might look complicated, but try to follow the code a couple of times and the whole picture will start to make sense. The important thing I want to stress is that the code in <code>TemplateResponseMixin</code> is available for any class that wants to have the feature of rendering a template, for example <code>DetailView</code> (<a href="https://github.com/django/django/blob/3.0/django/views/generic/detail.py#L164">CODE</a>), which receives the feature of showing the details of a single object by <code>SingleObjectTemplateResponseMixin</code>, which inherits from <code>TemplateResponseMixin</code>, overriding its method <code>get_template_names</code> (<a href="https://github.com/django/django/blob/3.0/django/views/generic/detail.py#L111">CODE</a>).</p><p>As we discussed before, mixins cannot be too generic, and here we see a good example of a mixin designed to work on specific classes. <code>TemplateResponseMixin</code> has to be applied to classes that contain <code>self.request</code> (<a href="https://github.com/django/django/blob/3.0/django/views/generic/base.py#L133">CODE</a>), and while this doesn't mean exclusively classes derived from <code>View</code>, it is clear that it has been designed to augment that specific type.</p><h2 id="takeaway-points-72a2">Takeaway points<a class="headerlink" href="#takeaway-points-72a2" title="Permanent link">¶</a></h2><ul><li>Inheritance is designed to promote code reuse but can lead to the opposite result</li><li>Multiple inheritance allows us to keep the inheritance tree simple</li><li>Multiple inheritance leads to possible problems that are solved in Python through the MRO</li><li>Interfaces (either implicit or explicit) should be part of your design</li><li>Mixin classes are used to add simple changes to classes</li><li>Mixins are implemented in Python using multiple inheritance: they have great expressive power but require careful design.</li></ul><h2 id="final-words-9803">Final words<a 
class="headerlink" href="#final-words-9803" title="Permanent link">¶</a></h2><p>I hope this post helped you to understand a bit more how multiple inheritance works, and to be less scared by it. I also hope I managed to show you that classes have to be carefully designed and that there is a lot to consider when you create a class system. Once again, please don't forget composition, it's a powerful and too often forgotten tool.</p><h2 id="updates-0083">Updates<a class="headerlink" href="#updates-0083" title="Permanent link">¶</a></h2><p>2020-03-13: GitHub user <a href="https://github.com/sureshvv">sureshvv</a> noticed that the <code>LimitSizeMixin</code> method <code>__init__</code> had the wrong parameters <code>pos_x</code> and <code>pos_y</code>, instead of <code>size_x</code> and <code>size_y</code>. Thanks!</p><p>2021-12-20: <a href="https://github.com/akocur">Alexander</a> fixed a mistake in the part relative to <code>SquareButton</code> and the behaviour of <code>super()</code>. Thanks!</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>Dissecting a Web stack2020-02-16T15:00:00+00:002020-10-27T08:30:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2020-02-16:/blog/2020/02/16/dissecting-a-web-stack/<p>A layer-by-layer review of the components of a web stack and the reasons behind them</p><blockquote>
<p>It was gross. They wanted me to dissect a frog.</p>
<p>(Beetlejuice, 1988)</p>
</blockquote>
<h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2>
<p>Having recently worked with young web developers who were exposed for the first time to proper production infrastructure, I received many questions about the various components that one can find in the architecture of a "Web service". These questions clearly expressed the confusion (and sometimes the frustration) of developers who understand how to create endpoints in a high-level environment such as Node.js or Python, but were never introduced to the complexity of what happens between the user's browser and their framework of choice. Most of the time they don't know why the framework itself is there in the first place.</p>
<p>The challenge is clear if we just list (in random order) some of the words we use when we discuss (Python) Web development: HTTP, cookies, web server, Websockets, FTP, multi-threaded, reverse proxy, Django, nginx, static files, POST, certificates, framework, Flask, SSL, GET, WSGI, session management, TLS, load balancing, Apache.</p>
<p>In this post, I want to review all the words mentioned above (and a couple more) trying to build a production-ready web service from the ground up. I hope this might help young developers to get the whole picture and to make sense of these "obscure" names that senior developers like me tend to drop in everyday conversations (sometimes arguably out of turn).</p>
<p>As the focus of the post is the global architecture and the reasons behind the presence of specific components, the example service I will use will be a basic HTML web page. The reference language will be Python but the overall discussion applies to any language or framework.</p>
<p>My approach will be that of first stating the rationale and then implementing a possible solution. After this, I will point out missing pieces or unresolved issues and move on with the next layer. At the end of the process, the reader should have a clear picture of why each component has been added to the system.</p>
<h2 id="the-perfect-architecture">The perfect architecture<a class="headerlink" href="#the-perfect-architecture" title="Permanent link">¶</a></h2>
<p>A very important underlying concept of system architectures is that there is no <em>perfect solution</em> devised by some wiser genius, that we just need to apply. Unfortunately, often people mistake design patterns for such a "magic solution". The "Design Patterns" original book, however, states that</p>
<blockquote>
<p>Your design should be specific to the problem at hand but also general enough to address future problems and requirements. You also want to avoid redesign, or at least minimize it.</p>
</blockquote>
<p>And later</p>
<blockquote>
<p>Design patterns make it easier to reuse successful designs and architectures. [...] Design patterns help you choose design alternatives that make a system reusable and avoid alternatives that compromise reusability.</p>
</blockquote>
<p>The authors of the book are discussing Object-oriented Programming, but these sentences can be applied to any architecture. As you can see, we have a "problem at hand" and "design alternatives", which means that the most important thing to understand is the requirements, both the present and future ones. Only with clear requirements in mind can one effectively design a solution, possibly tapping into the great number of patterns that other designers already devised.</p>
<p>A very last remark. A web stack is a complex beast, made of several components and software packages developed by different programmers with different goals in mind. It is perfectly understandable, then, that such components have some degree of overlap. While the division line between theoretical layers is usually very clear, in practice the separation is often blurry. Expect this a lot, and you will never be lost in a web stack anymore.</p>
<h2 id="some-definitions">Some definitions<a class="headerlink" href="#some-definitions" title="Permanent link">¶</a></h2>
<p>Let's briefly review some of the most important concepts involved in a Web stack, the protocols.</p>
<h3 id="tcpip">TCP/IP<a class="headerlink" href="#tcpip" title="Permanent link">¶</a></h3>
<p>TCP/IP is a network protocol, that is, a <em>set of established rules</em> two computers have to follow to get connected over a physical network to exchange messages. TCP/IP is composed of two different protocols covering two different layers of the OSI stack, namely the Transport (TCP) and the Network (IP) ones. TCP/IP can be implemented on top of any physical interface (Data Link and Physical OSI layers), such as Ethernet and Wireless. Actors in a TCP/IP network are identified by a <em>socket</em>, which is a tuple made of an IP address and a port number.</p>
<p>As far as we are concerned when developing a Web service, however, we need to be aware that TCP/IP is a <em>reliable</em> protocol, which in telecommunications means that the protocol itself takes care of retransmissions when packets get lost. In other words, while the speed of the communication is not guaranteed, we can be sure that once a message is sent it will reach its destination without errors.</p>
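<p>The two ideas above (a socket as an address/port tuple, and reliable delivery) can be observed with a few lines of Python. This is only a minimal sketch: it assumes the loopback address <code>127.0.0.1</code> and lets the operating system pick a free port by binding to port 0, neither of which is part of the service we will build later.</p>

```python
import socket

# Open a listening TCP socket on the loopback interface. Port 0 asks the
# operating system to choose any free port (an assumption of this demo).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
server_address = server.getsockname()  # an (IP address, port) tuple

# Connect a client: it is identified by its own (IP address, port) tuple.
client = socket.create_connection(server_address)
conn, client_address = server.accept()

# Send some bytes and close; TCP delivers them complete and in order.
client.sendall(b"hello over TCP")
client.close()

received = b""
while True:
    chunk = conn.recv(1024)
    if not chunk:  # an empty read means the peer closed the connection
        break
    received += chunk

conn.close()
server.close()

print("server socket:", server_address)
print("client socket:", client_address)
print("received:", received)
```

<p>The exact port numbers change at every run, as they are assigned by the operating system.</p>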
<h3 id="http">HTTP<a class="headerlink" href="#http" title="Permanent link">¶</a></h3>
<p>TCP/IP can guarantee that the raw bytes one computer sends will reach their destination, but this leaves completely untouched the problem of how to send meaningful information. In particular, in 1989 the problem Tim Berners-Lee wanted to solve was how to uniquely name hypertext resources in a network and how to access them.</p>
<p>HTTP is the protocol that was devised to solve such a problem and has since greatly evolved. With the help of other protocols such as WebSocket, HTTP invaded areas of communication for which it was originally considered unsuitable such as real-time communication or gaming.</p>
<p>At its core, HTTP is a protocol that states the format of a text request and the possible text responses. The initial version 0.9 published in 1991 defined the concept of URL and allowed only the GET operation that requested a specific resource. HTTP 1.0 and 1.1 added crucial features such as headers, more methods, and important performance optimisations. At the time of writing, HTTP/2 has been adopted by around 45% of the websites in the world, and HTTP/3 is still a draft.</p>
<p>The most important feature of HTTP we need to keep in mind as developers is that it is a <em>stateless</em> protocol. This means that the protocol doesn't require the server to keep track of the state of the communication between requests, basically leaving session management to the developer of the service itself.</p>
<p>Session management is crucial nowadays because you usually want to have an authentication layer in front of a service, where a user provides credentials and accesses some private data. It is, however, useful in other contexts such as visual preferences or choices made by the user and re-used in later accesses to the same website. Typical solutions to the session management problem of HTTP involve the use of cookies or session tokens.</p>
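<p>To make the cookie-based approach a bit more concrete, here is a minimal sketch that uses <code>http.cookies</code> from the Python standard library. The cookie names <code>session_id</code> and <code>theme</code> and their values are hypothetical, and a real service would generate an unpredictable identifier and store the session data somewhere on the server.</p>

```python
from http.cookies import SimpleCookie

# Server side, first response: ask the client to remember a session ID.
# The name "session_id" and the value "abc123" are made up for this example.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
outgoing["session_id"]["path"] = "/"
outgoing["session_id"]["httponly"] = True
set_cookie_header = outgoing.output()
print(set_cookie_header)

# Client side, following requests: the browser sends the value back
# in the Cookie header, together with any other cookie for this site.
cookie_header = "session_id=abc123; theme=dark"

# Server side again: parse the header and recover the session ID, which
# links this otherwise independent request to the previous ones.
incoming = SimpleCookie()
incoming.load(cookie_header)
session_id = incoming["session_id"].value
print("Request belongs to session", session_id)
```

<p>HTTP itself remains stateless: the state lives in the data that server and client agree to exchange with each request.</p>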
<h3 id="https">HTTPS<a class="headerlink" href="#https" title="Permanent link">¶</a></h3>
<p>Security has become a very important word in recent years, and with good reason. The amount of sensitive data we exchange on the Internet or store on digital devices is increasing exponentially, but unfortunately so is the number of malicious attackers and the level of damage they can cause with their actions.</p>
<p>HTTP is inherently insecure, being a plain text communication between two computers that usually happens on a completely untrusted network such as the Internet. While security wasn't an issue when the protocol was initially conceived, it is nowadays a problem of paramount importance, as we exchange private information, often vital for people's security or for businesses. We need to be sure we are sending information to the correct server and that the data we send cannot be intercepted.</p>
<p>HTTPS solves the problems of both tampering and eavesdropping by encrypting HTTP with the Transport Layer Security (TLS) protocol, which also enforces the usage of digital certificates issued by a trusted authority. At the time of writing, approximately 80% of websites loaded by Firefox use HTTPS by default. When a server receives an HTTPS connection and transforms it into an HTTP one it is usually said that it <em>terminates TLS</em> (or SSL, the old name of TLS).</p>
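<p>From the client's point of view, the two guarantees (encryption and a verified identity of the server) are visible in the default TLS settings of the Python standard library. The following is a sketch, not a production client: the function <code>fetch_head_over_tls</code> is a hypothetical helper that is defined here but not called, since calling it requires network access and a real host name.</p>

```python
import socket
import ssl

# The default context loads the trusted certificate authorities of the
# system, requires a valid certificate, and checks the host name.
context = ssl.create_default_context()
print("certificate required:", context.verify_mode == ssl.CERT_REQUIRED)
print("host name checked:", context.check_hostname)


def fetch_head_over_tls(host: str, port: int = 443) -> bytes:
    """Send a HEAD request over TLS and return the first bytes of the reply.

    Hypothetical helper for illustration; it needs network access.
    """
    with socket.create_connection((host, port), timeout=10) as raw_socket:
        # wrap_socket performs the TLS handshake; from here on, the HTTP
        # text travels encrypted on top of the same TCP connection.
        with context.wrap_socket(raw_socket, server_hostname=host) as tls_socket:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls_socket.sendall(request.encode("ascii"))
            return tls_socket.recv(1024)
```

<p>Note that the HTTP request inside the function is the same plain text we would send without TLS: the encryption happens in a layer below HTTP.</p>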
<h3 id="websocket">WebSocket<a class="headerlink" href="#websocket" title="Permanent link">¶</a></h3>
<p>One great disadvantage of HTTP is that communication is always initiated by the client and that the server can send data only when this is explicitly requested. Polling can be implemented to provide an initial solution, but it cannot guarantee the performance of proper full-duplex communication, where a channel is kept open between server and client and both can send data without being requested. Such a channel is provided by the WebSocket protocol.</p>
<p>WebSocket is a killer technology for applications like online gaming, real-time feeds like financial tickers or sports news, or multimedia communication like conferencing or remote education.</p>
<p>It is important to understand that WebSocket is not HTTP, and can exist without it. It is also true that this new protocol was designed to be used on top of an existing HTTP connection, so a WebSocket communication is often found in parts of a Web page, which was originally retrieved using HTTP in the first place.</p>
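<p>The link between the two protocols is the opening handshake defined in RFC 6455: the client sends a normal HTTP GET request with the headers <code>Upgrade: websocket</code> and <code>Sec-WebSocket-Key</code>, and the server answers with status 101 and a <code>Sec-WebSocket-Accept</code> header computed from that key. A minimal sketch of the computation follows; the function name <code>websocket_accept</code> is my own, while the fixed GUID and the sample key come from the RFC.</p>

```python
import base64
import hashlib

# Fixed value defined by RFC 6455 and appended to the key sent by the client.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key used in the RFC 6455 handshake example.
client_key = "dGhlIHNhbXBsZSBub25jZQ=="
accept_value = websocket_accept(client_key)

# The server confirms the protocol switch with a regular HTTP response.
handshake_response = (
    "HTTP/1.1 101 Switching Protocols\r\n"
    "Upgrade: websocket\r\n"
    "Connection: Upgrade\r\n"
    f"Sec-WebSocket-Accept: {accept_value}\r\n"
    "\r\n"
)
print(handshake_response)
```

<p>After this exchange the TCP connection stays open and the two parties stop speaking HTTP, exchanging WebSocket frames instead.</p>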
<h2 id="implementing-a-service-over-http">Implementing a service over HTTP<a class="headerlink" href="#implementing-a-service-over-http" title="Permanent link">¶</a></h2>
<p>Let's finally start discussing bits and bytes. The starting point for our journey is a service over HTTP, which means there is an HTTP request-response exchange. As an example, let us consider a GET request, the simplest of the HTTP methods.</p>
<div class="highlight"><pre><span></span><code><span class="nf">GET</span> <span class="nn">/</span> <span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span>
<span class="na">Host</span><span class="o">:</span> <span class="l">localhost</span>
<span class="na">User-Agent</span><span class="o">:</span> <span class="l">curl/7.65.3</span>
<span class="na">Accept</span><span class="o">:</span> <span class="l">*/*</span>
</code></pre></div>
<p>As you can see, the client is sending a pure text message to the server, with the format specified by the HTTP protocol. The first line contains the method name (<code>GET</code>), the URL (<code>/</code>) and the protocol we are using, including its version (<code>HTTP/1.1</code>). The remaining lines are called <em>headers</em> and contain metadata that can help the server to manage the request. The complete value of the <code>Host</code> header is in this case <code>localhost:80</code>, but as the standard port for HTTP services is 80, we don't need to specify it.</p>
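<p>Since the request is just structured text, splitting it into its components takes only a few lines. The following sketch assumes a well-formed request with the standard <code>\r\n</code> line endings; the function name <code>parse_request</code> is hypothetical, and a real parser has to deal with many more cases (malformed lines, repeated headers, bodies, and so on).</p>

```python
def parse_request(raw: str):
    """Split a raw HTTP request into method, target, version, and headers."""
    # Headers and body are separated by an empty line
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # First line: method, requested URL, protocol with its version
    method, target, version = lines[0].split(" ", 2)

    # Remaining lines: "Name: value" headers (names are case-insensitive)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return method, target, version, headers


raw_request = (
    "GET / HTTP/1.1\r\n"
    "Host: localhost\r\n"
    "User-Agent: curl/7.65.3\r\n"
    "Accept: */*\r\n"
    "\r\n"
)

method, target, version, headers = parse_request(raw_request)
print(method, target, version)
print(headers)
```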
<p>If the server <code>localhost</code> is <em>serving</em> HTTP (i.e. running some software that understands HTTP) on port 80 the response we might get is something similar to</p>
<div class="highlight"><pre><span></span><code><span class="kr">HTTP</span><span class="o">/</span><span class="m">1.0</span> <span class="m">200</span> <span class="ne">OK</span>
<span class="na">Date</span><span class="o">:</span> <span class="l">Mon, 10 Feb 2020 08:41:33 GMT</span>
<span class="na">Content-type</span><span class="o">:</span> <span class="l">text/html</span>
<span class="na">Content-Length</span><span class="o">:</span> <span class="l">26889</span>
<span class="na">Last-Modified</span><span class="o">:</span> <span class="l">Mon, 10 Feb 2020 08:41:27 GMT</span>

<span class="cp"><!DOCTYPE HTML></span>
<span class="p"><</span><span class="nt">html</span><span class="p">></span>
...
<span class="p"></</span><span class="nt">html</span><span class="p">></span>
</code></pre></div>
<p>As happened for the request, the response is a text message, formatted according to the standard. The first line mentions the protocol and the status of the request (<code>200</code> in this case, which means success), while the following lines contain metadata in various headers. Finally, after an empty line, the message contains the resource the client asked for, the source code of the base URL of the website in this case. Since this HTML page probably contains references to other resources like CSS, JS, images, and so on, the browser will send several other requests to gather all the data it needs to properly show the page to the user.</p>
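<p>The same structure can be produced programmatically. The following sketch assembles a response made of status line, headers, empty line, and content; the helper <code>build_response</code> is a hypothetical name and it sets only two headers, while real servers usually add several more (such as <code>Date</code>).</p>

```python
def build_response(status: int, reason: str, body: str,
                   content_type: str = "text/html") -> bytes:
    """Assemble a minimal HTTP response as raw bytes."""
    payload = body.encode("utf-8")
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"           # protocol and status
        f"Content-Type: {content_type}\r\n"         # metadata (headers)
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"                                      # empty line
    )
    return head.encode("ascii") + payload           # requested resource


response = build_response(200, "OK", "<!DOCTYPE HTML><html>...</html>")

# A client splits the message at the first empty line
head, _, body = response.partition(b"\r\n\r\n")
print(head.decode("ascii"))
print(body.decode("utf-8"))
```

<p>The <code>Content-Length</code> header is what allows the client to know when the content is complete without waiting for the server to close the connection.</p>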
<p>So, the first problem we have is that of implementing a server that understands this protocol and sends a proper response when it receives an HTTP request. We should try to load the requested resource and return either a success (HTTP 200) if we can find it, or a failure (HTTP 404) if we can't.</p>
<h2 id="1-sockets-and-parsers">1 Sockets and parsers<a class="headerlink" href="#1-sockets-and-parsers" title="Permanent link">¶</a></h2>
<h3 id="11-rationale">1.1 Rationale<a class="headerlink" href="#11-rationale" title="Permanent link">¶</a></h3>
<p>TCP/IP is a network protocol that works with <em>sockets</em>. A socket is a tuple of an IP address (unique in the network) and a port (unique for a specific IP address) that the computer uses to communicate with others. A socket is a file-like object in an operating system, that can thus be <em>opened</em> and <em>closed</em>, and that we can <em>read</em> from or <em>write</em> to. Socket programming is a pretty low-level approach to the network, but you need to be aware that every piece of software on your computer that provides network access ultimately has to deal with sockets (most probably through some library, though).</p>
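<p>The file-like nature of sockets can be seen directly in Python. This sketch uses <code>socket.socketpair</code>, which returns two already connected sockets living in the same process; on Unix systems these are local sockets rather than TCP/IP ones, so take it as an illustration of the open/read/write/close interface only.</p>

```python
import socket

# Two connected sockets in the same process (no network involved)
left, right = socket.socketpair()

# makefile wraps a socket in a standard Python file object
writer = left.makefile("w", encoding="utf-8")
reader = right.makefile("r", encoding="utf-8")

# Write to one end and read from the other, as we would do with files
writer.write("first line\n")
writer.flush()
line = reader.readline()
print(repr(line))

# Like files, sockets have to be closed when we are done
writer.close()
reader.close()
left.close()
right.close()
```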
<p>Since we are building things from the ground up, let's implement a small Python program that opens a socket connection, receives an HTTP request, and sends an HTTP response. As port 80 is a "low port" (a number smaller than 1024), we usually don't have permissions to open sockets there, so I will use port 8080. This is not a problem for now, as HTTP can be served on any port.</p>
<h3 id="12-implementation">1.2 Implementation<a class="headerlink" href="#12-implementation" title="Permanent link">¶</a></h3>
<p>Create the file <code>server.py</code> and type this code. Yes, <strong>type it</strong>, don't just copy and paste, you will not learn anything otherwise.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">socket</span>
<span class="c1">## Create a socket instance</span>
<span class="c1">## AF_INET: use IP protocol version 4</span>
<span class="c1">## SOCK_STREAM: full-duplex byte stream</span>
<span class="n">s</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
<span class="c1">## Allow reuse of addresses</span>
<span class="n">s</span><span class="o">.</span><span class="n">setsockopt</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="c1">## Bind the socket to any address, port 8080, and listen</span>
<span class="n">s</span><span class="o">.</span><span class="n">bind</span><span class="p">((</span><span class="s1">''</span><span class="p">,</span> <span class="mi">8080</span><span class="p">))</span>
<span class="n">s</span><span class="o">.</span><span class="n">listen</span><span class="p">()</span>
<span class="c1">## Serve forever</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
    <span class="c1"># Accept the connection</span>
    <span class="n">conn</span><span class="p">,</span> <span class="n">addr</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span>
    <span class="c1"># Receive data from this socket using a buffer of 1024 bytes</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span>
    <span class="c1"># Print out the data</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">))</span>
    <span class="c1"># Close the connection</span>
    <span class="n">conn</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</code></pre></div>
<p>This little program accepts a connection on port 8080 and prints the received data on the terminal. You can test it by executing it and then running <code>curl localhost:8080</code> in another terminal. You should see something like</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>python3<span class="w"> </span>server.py<span class="w"> </span>
GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1
Host:<span class="w"> </span>localhost:8080
User-Agent:<span class="w"> </span>curl/7.65.3
Accept:<span class="w"> </span>*/*
</code></pre></div>
<p>The server keeps running the code in the <code>while</code> loop, so if you want to terminate it you have to do it with Ctrl+C. So far so good, but this is not an HTTP server yet, as it sends no response; you should actually receive an error message from curl that says <code>curl: (52) Empty reply from server</code>.</p>
<p>Sending back a standard response is very simple, we just need to call <code>conn.sendall</code> passing the raw bytes. A minimal HTTP response contains the protocol and the status, an empty line, and the actual content, for example</p>
<div class="highlight"><pre><span></span><code><span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">200</span> <span class="ne">OK</span>

Hi there!
</code></pre></div>
<p>Our server then becomes</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">socket</span>
<span class="c1">## Create a socket instance</span>
<span class="c1">## AF_INET: use IP protocol version 4</span>
<span class="c1">## SOCK_STREAM: full-duplex byte stream</span>
<span class="n">s</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
<span class="c1">## Allow reuse of addresses</span>
<span class="n">s</span><span class="o">.</span><span class="n">setsockopt</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="c1">## Bind the socket to any address, port 8080, and listen</span>
<span class="n">s</span><span class="o">.</span><span class="n">bind</span><span class="p">((</span><span class="s1">''</span><span class="p">,</span> <span class="mi">8080</span><span class="p">))</span>
<span class="n">s</span><span class="o">.</span><span class="n">listen</span><span class="p">()</span>
<span class="c1">## Serve forever</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
    <span class="c1"># Accept the connection</span>
    <span class="n">conn</span><span class="p">,</span> <span class="n">addr</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span>
    <span class="c1"># Receive data from this socket using a buffer of 1024 bytes</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span>
    <span class="c1"># Print out the data</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">))</span>
    <span class="n">conn</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="nb">bytes</span><span class="p">(</span><span class="s2">"HTTP/1.1 200 OK</span><span class="se">\n\n</span><span class="s2">Hi there!</span><span class="se">\n</span><span class="s2">"</span><span class="p">,</span> <span class="s1">'utf-8'</span><span class="p">))</span>
    <span class="c1"># Close the connection</span>
    <span class="n">conn</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</code></pre></div>
<p>At this point, we are not really responding to the user's request, however. Try different curl command lines like <code>curl localhost:8080/index.html</code> or <code>curl localhost:8080/main.css</code> and you will always receive the same response. We should try to find the resource the user is asking for and send that back in the response content.</p>
<p>This version of the HTTP server properly extracts the resource and tries to load it from the current directory, returning either a success or a failure</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">socket</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="c1">## Create a socket instance</span>
<span class="c1">## AF_INET: use IP protocol version 4</span>
<span class="c1">## SOCK_STREAM: full-duplex byte stream</span>
<span class="n">s</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
<span class="c1">## Allow reuse of addresses</span>
<span class="n">s</span><span class="o">.</span><span class="n">setsockopt</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="c1">## Bind the socket to any address, port 8080, and listen</span>
<span class="n">s</span><span class="o">.</span><span class="n">bind</span><span class="p">((</span><span class="s1">''</span><span class="p">,</span> <span class="mi">8080</span><span class="p">))</span>
<span class="n">s</span><span class="o">.</span><span class="n">listen</span><span class="p">()</span>
<span class="n">HEAD_200</span> <span class="o">=</span> <span class="s2">"HTTP/1.1 200 OK</span><span class="se">\n\n</span><span class="s2">"</span>
<span class="n">HEAD_404</span> <span class="o">=</span> <span class="s2">"HTTP/1.1 404 Not Found</span><span class="se">\n\n</span><span class="s2">"</span>
<span class="c1">## Serve forever</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
    <span class="c1"># Accept the connection</span>
    <span class="n">conn</span><span class="p">,</span> <span class="n">addr</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span>
    <span class="c1"># Receive data from this socket using a buffer of 1024 bytes</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span>
    <span class="n">request</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">)</span>
    <span class="c1"># Print out the data</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
    <span class="n">resource</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="sa">r</span><span class="s1">'GET /(.*) HTTP'</span><span class="p">,</span> <span class="n">request</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">resource</span><span class="p">,</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
            <span class="n">content</span> <span class="o">=</span> <span class="n">HEAD_200</span> <span class="o">+</span> <span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'Resource </span><span class="si">{}</span><span class="s1"> correctly served'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">resource</span><span class="p">))</span>
    <span class="k">except</span> <span class="ne">FileNotFoundError</span><span class="p">:</span>
        <span class="n">content</span> <span class="o">=</span> <span class="n">HEAD_404</span> <span class="o">+</span> <span class="s2">"Resource /</span><span class="si">{}</span><span class="s2"> cannot be found</span><span class="se">\n</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">resource</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'Resource </span><span class="si">{}</span><span class="s1"> cannot be loaded'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">resource</span><span class="p">))</span>
    <span class="nb">print</span><span class="p">(</span><span class="s1">'--------------------'</span><span class="p">)</span>
    <span class="n">conn</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="nb">bytes</span><span class="p">(</span><span class="n">content</span><span class="p">,</span> <span class="s1">'utf-8'</span><span class="p">))</span>
    <span class="c1"># Close the connection</span>
    <span class="n">conn</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</code></pre></div>
<p>As you can see this implementation is extremely simple. If you create a simple local file named <code>index.html</code> with this content</p>
<div class="highlight"><pre><span></span><code><span class="p"><</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">title</span><span class="p">></span>This is my page<span class="p"></</span><span class="nt">title</span><span class="p">></span>
<span class="p"><</span><span class="nt">link</span> <span class="na">rel</span><span class="o">=</span><span class="s">"stylesheet"</span> <span class="na">href</span><span class="o">=</span><span class="s">"main.css"</span><span class="p">></span>
<span class="p"></</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">html</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span>Some random content<span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"></</span><span class="nt">html</span><span class="p">></span>
</code></pre></div>
<p>and run <code>curl localhost:8080/index.html</code> you will see the content of the file. At this point, you can even use your browser to open <code>http://localhost:8080/index.html</code> and you will see the title of the page and the content. A Web browser is a piece of software capable of sending HTTP requests and of interpreting the content of the responses if this is HTML (and many other file types like images or videos), so it can <em>render</em> the content of the message. The browser is also responsible for retrieving the missing resources needed for the rendering, so when you provide links to style sheets or JS scripts with the <code><link></code> or the <code><script></code> tags in the HTML code of a page, you are instructing the browser to send an HTTP GET request for those files as well.</p>
<p>The output of <code>server.py</code> when I access <code>http://localhost:8080/index.html</code> is</p>
<div class="highlight"><pre><span></span><code><span class="nf">GET</span> <span class="nn">/index.html</span> <span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span>
<span class="na">Host</span><span class="o">:</span> <span class="l">localhost:8080</span>
<span class="na">User-Agent</span><span class="o">:</span> <span class="l">Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0</span>
<span class="na">Accept</span><span class="o">:</span> <span class="l">text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8</span>
<span class="na">Accept-Language</span><span class="o">:</span> <span class="l">en-GB,en;q=0.5</span>
<span class="na">Accept-Encoding</span><span class="o">:</span> <span class="l">gzip, deflate</span>
<span class="na">Connection</span><span class="o">:</span> <span class="l">keep-alive</span>
<span class="na">Upgrade-Insecure-Requests</span><span class="o">:</span> <span class="l">1</span>
<span class="na">Pragma</span><span class="o">:</span> <span class="l">no-cache</span>
<span class="na">Cache-Control</span><span class="o">:</span> <span class="l">no-cache</span>
Resource index.html correctly served
--------------------
GET /main.css HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0
Accept: text/css,*/*;q=0.1
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: http://localhost:8080/index.html
Pragma: no-cache
Cache-Control: no-cache
Resource main.css cannot be loaded
--------------------
GET /favicon.ico HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0
Accept: image/webp,*/*
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Resource favicon.ico cannot be loaded
--------------------
</code></pre></div>
<p>As you can see the browser sends rich HTTP requests, with a lot of headers, automatically requesting the CSS file mentioned in the HTML code and automatically trying to retrieve a favicon image.</p>
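<p>To get an idea of how the browser discovers those additional resources, we can scan the HTML with the parser included in the Python standard library. This is a heavily simplified sketch of what a browser does: the class name <code>ResourceFinder</code> is hypothetical, and it looks only at <code>link</code>, <code>script</code>, and <code>img</code> tags.</p>

```python
from html.parser import HTMLParser


class ResourceFinder(HTMLParser):
    """Collect the URLs of the resources referenced by an HTML page."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Style sheets are referenced with the href attribute of <link>
        if tag == "link" and attributes.get("href"):
            self.resources.append(attributes["href"])
        # Scripts and images are referenced with the src attribute
        elif tag in ("script", "img") and attributes.get("src"):
            self.resources.append(attributes["src"])


# The same page we served with server.py
page = """<head>
<title>This is my page</title>
<link rel="stylesheet" href="main.css">
</head>
<html>
<p>Some random content</p>
</html>"""

finder = ResourceFinder()
finder.feed(page)

# Each entry corresponds to an additional GET request sent by the browser
print(finder.resources)
```

<p>The request for <code>favicon.ico</code> is not triggered by anything in the page: browsers send it on their own initiative.</p>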
<h3 id="13-resources">1.3 Resources<a class="headerlink" href="#13-resources" title="Permanent link">¶</a></h3>
<p>These resources provide more detailed information on the topics discussed in this section</p>
<ul>
<li><a href="https://docs.python.org/3/howto/sockets.html">Python 3 Socket Programming HOWTO</a></li>
<li><a href="https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5">HTTP/1.1 Request format</a></li>
<li><a href="https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html#sec6">HTTP/1.1 Response format</a></li>
<li>The source code of this example is available <a href="https://github.com/lgiordani/dissecting-a-web-stack-code/tree/master/1_sockets_and_parsers">here</a></li>
</ul>
<h3 id="14-issues">1.4 Issues<a class="headerlink" href="#14-issues" title="Permanent link">¶</a></h3>
<p>It gives a certain dose of satisfaction to build something from scratch and discover that it works smoothly with full-fledged software like the browser you use every day. I also think it is very interesting to discover that technologies like HTTP, that basically run the world nowadays, are at their core very simple.</p>
<p>That said, there are many features of HTTP that we didn't cover with our simple socket programming. For starters, HTTP/1.0 introduced other methods besides GET, such as POST, which is of paramount importance for today's websites, where users keep sending information to servers through forms. To implement all 9 HTTP methods we need to properly parse the incoming request and add relevant functions to our code.</p>
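<p>To give an idea of what this parsing involves, here is a minimal sketch that splits a raw request into its components and dispatches it to a function chosen by method. The names <code>parse_request</code>, <code>dispatch</code>, and the two handlers are hypothetical and not part of the server we wrote; error handling and most of the rules of the protocol are left out on purpose.</p>

```python
def parse_request(raw: bytes):
    """Split a raw HTTP/1.x request into method, path, version, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The first line is the request line, e.g. "GET /index.html HTTP/1.1"
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers, body


# Hypothetical handlers, one for each method we decide to support
def handle_get(path, headers, body):
    return f"GET {path}"


def handle_post(path, headers, body):
    return f"POST {path} with {len(body)} bytes"


HANDLERS = {"GET": handle_get, "POST": handle_post}


def dispatch(raw: bytes):
    """Parse the request and call the handler registered for its method."""
    method, path, _version, headers, body = parse_request(raw)
    handler = HANDLERS.get(method)
    if handler is None:
        return "405 Method Not Allowed"
    return handler(path, headers, body)
```

<p>Even in this reduced form, every new method means a new entry in <code>HANDLERS</code> and a new function to write and test, which is exactly the kind of work we would like to delegate.</p>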
<p>At this point, however, you might notice that we are dealing a lot with low-level details of the protocol, which is usually not the core of our business. When we build a service over HTTP we believe that we have the knowledge to properly implement some code that can simplify a certain process, be it searching for other websites, shopping for books or sharing pictures with friends. We don't want to spend our time understanding the subtleties of the TCP/IP sockets and writing parsers for request-response protocols. It is nice to see how these technologies work, but on a daily basis, we need to focus on something at a higher level.</p>
<p>The situation of our small HTTP server is possibly worsened by the fact that HTTP is a stateless protocol. The protocol doesn't provide any way to connect two successive requests and thus keep track of the <em>state</em> of the communication, which is the cornerstone of the modern Internet. Every time we authenticate on a website and we want to visit other pages we need the server to remember who we are, and this implies keeping track of the state of the connection.</p>
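<p>The usual way to add state on top of a stateless protocol is a cookie that carries a session identifier. The following is a minimal sketch of the idea, assuming requests arrive as a plain dictionary of headers; the <code>SESSIONS</code> store, the <code>session_id</code> cookie name, and the <code>handle</code> function are all hypothetical, and a real implementation would also need expiry, signing, and persistent storage.</p>

```python
import uuid
from http.cookies import SimpleCookie

# In-memory store, session id -> per-user state (an assumption of this sketch)
SESSIONS = {}


def handle(request_headers):
    """Return (session, response_headers) for a request given as a dict of headers."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    morsel = cookie.get("session_id")
    response_headers = {}
    if morsel is None or morsel.value not in SESSIONS:
        # Unknown client: create a session and ask the browser to remember its id
        session_id = uuid.uuid4().hex
        SESSIONS[session_id] = {"visits": 0}
        response_headers["Set-Cookie"] = f"session_id={session_id}; HttpOnly"
    else:
        # Returning client: the cookie links this request to the previous ones
        session_id = morsel.value
    session = SESSIONS[session_id]
    session["visits"] += 1
    return session, response_headers
```

<p>The first request receives a <code>Set-Cookie</code> header; when the browser sends the same value back in the <code>Cookie</code> header, the server finds the existing session and the two requests are finally connected.</p>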
<p>Long story short: to work as a proper HTTP server, our code should at this point implement all HTTP methods and cookie management. We also need to support other protocols like WebSockets. These are anything but trivial tasks, so we definitely need to add some component to the whole system that lets us focus on the business logic and not on the low-level details of application protocols.</p>
<h2 id="2-web-framework">2 Web framework<a class="headerlink" href="#2-web-framework" title="Permanent link">¶</a></h2>
<h3 id="21-rationale">2.1 Rationale<a class="headerlink" href="#21-rationale" title="Permanent link">¶</a></h3>
<p>Enter the Web framework!</p>
<p>As I have discussed many times (see <a href="https://www.thedigitalcatonline.com/blog/2018/12/20/cabook/">the book on clean architectures</a> or <a href="https://www.thedigitalcatonline.com/blog/2016/11/14/clean-architectures-in-python-a-step-by-step-example/">the related post</a>) the role of the Web framework is that of <em>converting HTTP requests into function calls</em>, and function return values into HTTP responses. The framework's true nature is that of a layer that connects a working business logic to the Web, through HTTP and related protocols. The framework takes care of session management for us and maps URLs to functions, allowing us to focus on the application logic.</p>
<p>In the grand scheme of an HTTP service, this is what the framework is supposed to do. Everything the framework provides out of this scope, like layers to access DBs, template engines, and interfaces to other systems, is an addition that you, as a programmer, may find useful, but is not in principle part of the reason why we added the framework to the system. We add the framework because it acts as a layer between our business logic and HTTP.</p>
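<p>To make the idea of "converting HTTP requests into function calls" concrete, here is a toy sketch of that layer. The <code>ROUTES</code> table, the <code>route</code> decorator, and <code>handle_request</code> are hypothetical names used for illustration only; this is not how Flask or any real framework is implemented, and the HTTP method is ignored to keep the example short.</p>

```python
# Registry that maps URL paths to plain Python functions
ROUTES = {}


def route(path):
    """Decorator that registers a function as the handler of a URL path."""
    def register(func):
        ROUTES[path] = func
        return func
    return register


@route("/")
@route("/index")
def index():
    return "Hello, world!"


def handle_request(request_line):
    """Turn a request line into a function call and its result into a response."""
    _method, path, _version = request_line.split(" ", 2)
    view = ROUTES.get(path)
    if view is None:
        return "HTTP/1.1 404 Not Found\r\n\r\n"
    body = view()
    length = len(body.encode("utf-8"))
    return f"HTTP/1.1 200 OK\r\nContent-Length: {length}\r\n\r\n{body}"
```

<p>The function <code>index</code> knows nothing about HTTP: it is the layer around it that reads the request and builds the response, which is the separation we are looking for.</p>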
<h3 id="22-implementation">2.2 Implementation<a class="headerlink" href="#22-implementation" title="Permanent link">¶</a></h3>
<p>Thanks to Miguel Grinberg and his <a href="https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world">amazing Flask mega-tutorial</a> I can set up Flask in seconds. I will not run through the tutorial here, as you can follow it on Miguel's website. I will only use the content of the first article (out of 23!) to create an extremely simple "Hello, world" application.</p>
<p>To run the following example you will need a virtual environment and you will have to <code>pip install flask</code>. Follow Miguel's tutorial if you need more details on this.</p>
<p>The <code>app/__init__.py</code> file is</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">flask</span> <span class="kn">import</span> <span class="n">Flask</span>
<span class="n">application</span> <span class="o">=</span> <span class="n">Flask</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span>
<span class="kn">from</span> <span class="nn">app</span> <span class="kn">import</span> <span class="n">routes</span>
</code></pre></div>
<p>and the <code>app/routes.py</code> file is</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">app</span> <span class="kn">import</span> <span class="n">application</span>
<span class="nd">@application</span><span class="o">.</span><span class="n">route</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span>
<span class="nd">@application</span><span class="o">.</span><span class="n">route</span><span class="p">(</span><span class="s1">'/index'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">index</span><span class="p">():</span>
    <span class="k">return</span> <span class="s2">"Hello, world!"</span>
</code></pre></div>
<p>You can already see here the power of a framework in action. We defined an <code>index</code> function and connected it with two different URLs (<code>/</code> and <code>/index</code>) in 3 lines of Python. This leaves us time and energy to properly work on the business logic, which in this case is a revolutionary "Hello, world!". Nobody ever did this before.</p>
<p>Finally, the <code>service.py</code> file is</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">app</span> <span class="kn">import</span> <span class="n">application</span>
</code></pre></div>
<p>Flask comes with a so-called development web server (do these words ring any bell now?) that we can run on a terminal</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span><span class="nv">FLASK_APP</span><span class="o">=</span>service.py<span class="w"> </span>flask<span class="w"> </span>run
<span class="w"> </span>*<span class="w"> </span>Serving<span class="w"> </span>Flask<span class="w"> </span>app<span class="w"> </span><span class="s2">"service.py"</span>
<span class="w"> </span>*<span class="w"> </span>Environment:<span class="w"> </span>production
<span class="w"> </span>WARNING:<span class="w"> </span>This<span class="w"> </span>is<span class="w"> </span>a<span class="w"> </span>development<span class="w"> </span>server.<span class="w"> </span>Do<span class="w"> </span>not<span class="w"> </span>use<span class="w"> </span>it<span class="w"> </span><span class="k">in</span><span class="w"> </span>a<span class="w"> </span>production<span class="w"> </span>deployment.
<span class="w"> </span>Use<span class="w"> </span>a<span class="w"> </span>production<span class="w"> </span>WSGI<span class="w"> </span>server<span class="w"> </span>instead.
<span class="w"> </span>*<span class="w"> </span>Debug<span class="w"> </span>mode:<span class="w"> </span>off
<span class="w"> </span>*<span class="w"> </span>Running<span class="w"> </span>on<span class="w"> </span>http://127.0.0.1:5000/<span class="w"> </span><span class="o">(</span>Press<span class="w"> </span>CTRL+C<span class="w"> </span>to<span class="w"> </span>quit<span class="o">)</span>
</code></pre></div>
<p>You can now visit the given URL with your browser and see that everything works properly. Remember that 127.0.0.1 is the special IP address that refers to "this computer"; the name <code>localhost</code> is usually created by the operating system as an alias for that, so the two are interchangeable. As you can see the standard port for Flask's development server is 5000, so you have to mention it explicitly, otherwise your browser would try to access port 80 (the default HTTP one). When you connect with the browser you will see some log messages about the HTTP requests</p>
<div class="highlight"><pre><span></span><code><span class="m">127</span>.0.0.1<span class="w"> </span>-<span class="w"> </span>-<span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020<span class="w"> </span><span class="m">14</span>:54:27<span class="o">]</span><span class="w"> </span><span class="s2">"GET / HTTP/1.1"</span><span class="w"> </span><span class="m">200</span><span class="w"> </span>-
<span class="m">127</span>.0.0.1<span class="w"> </span>-<span class="w"> </span>-<span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020<span class="w"> </span><span class="m">14</span>:54:28<span class="o">]</span><span class="w"> </span><span class="s2">"GET /favicon.ico HTTP/1.1"</span><span class="w"> </span><span class="m">404</span><span class="w"> </span>-
</code></pre></div>
<p>You can recognise both now, as those are the same requests we got with our little server in the previous part of the article.</p>
<h3 id="23-resources">2.3 Resources<a class="headerlink" href="#23-resources" title="Permanent link">¶</a></h3>
<p>These resources provide more detailed information on the topics discussed in this section</p>
<ul>
<li><a href="https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world">Miguel Grinberg's amazing Flask mega-tutorial</a></li>
<li><a href="https://en.wikipedia.org/wiki/Localhost">What is localhost</a></li>
<li>The source code of this example is available <a href="https://github.com/lgiordani/dissecting-a-web-stack-code/tree/master/2_web_framework">here</a></li>
</ul>
<h3 id="24-issues">2.4 Issues<a class="headerlink" href="#24-issues" title="Permanent link">¶</a></h3>
<p>Apparently, we solved all our problems, and many programmers just stop here. They learn how to use the framework (which is a big achievement!), but as we will shortly discover, this is not enough for a production system. Let's have a closer look at the output of the Flask server. It clearly says, among other things</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span>WARNING:<span class="w"> </span>This<span class="w"> </span>is<span class="w"> </span>a<span class="w"> </span>development<span class="w"> </span>server.<span class="w"> </span>Do<span class="w"> </span>not<span class="w"> </span>use<span class="w"> </span>it<span class="w"> </span><span class="k">in</span><span class="w"> </span>a<span class="w"> </span>production<span class="w"> </span>deployment.
<span class="w"> </span>Use<span class="w"> </span>a<span class="w"> </span>production<span class="w"> </span>WSGI<span class="w"> </span>server<span class="w"> </span>instead.
</code></pre></div>
<p>The main issue we have when we deal with any production system is performance. Think about what we do with JavaScript when we minify the code: we consciously obfuscate the code in order to make the file smaller, but this is done for the sole purpose of making the file faster to retrieve.</p>
<p>For HTTP servers the story is not very different. The Web framework usually provides a development Web server, as Flask does, which properly implements HTTP, but does it in a very inefficient way. For starters, this is a <em>blocking</em> server, which means that if a request takes seconds to be served (for example because the endpoint retrieves data from a very slow database), any other request has to wait in a queue. That ultimately means that the user will see a spinner in the browser's tab and just shake their head thinking that we can't build a modern website. Other performance concerns might be connected with memory management or disk caches, but in general, we are safe to say that this web server cannot handle any production load (i.e. multiple users accessing the website at the same time and expecting good quality of service).</p>
<p>This is hardly surprising. After all, we didn't want to deal with TCP/IP connections to focus on our business, so we delegated this to other coders who maintain the framework. The framework's authors, in turn, want to focus on things like middleware, routes, proper handling of HTTP methods, and so on. They don't want to spend time trying to optimise the performance of the "multi-user" experience. This is especially true in the Python world (and somewhat less true for Node.js, for example): Python is not heavily concurrency-oriented, and neither the style of programming nor the performance favours fast, non-blocking applications. This is changing lately, with async and improvements in the interpreter, but I leave this for another post.</p>
<p>So, now that we have a full-fledged HTTP service, we need to make it so fast that users won't even notice this is not running locally on their computer.</p>
<h2 id="3-concurrency-and-facades">3 Concurrency and façades<a class="headerlink" href="#3-concurrency-and-facades" title="Permanent link">¶</a></h2>
<h3 id="31-rationale">3.1 Rationale<a class="headerlink" href="#31-rationale" title="Permanent link">¶</a></h3>
<p>Well, whenever you have performance issues, just go for concurrency. Now you have many problems!
(see <a href="https://twitter.com/davidlohr/status/288786300067270656?lang=en">here</a>)</p>
<p>Yes, concurrency solves many problems and it's the source of just as many, so we need to find a way to use it in the safest and least complicated way. We basically want to add a layer that runs the framework in some concurrent way, without requiring us to change anything in the framework itself.</p>
<p>And whenever you have to homogenise different things just create a layer of indirection. This solves any problem but one. (see <a href="https://en.wikipedia.org/wiki/Fundamental_theorem_of_software_engineering">here</a>)</p>
<p>So we need to create a layer that runs our service in a concurrent way, but we also want to keep it detached from the specific implementation of the service, that is, independent of the framework or library that we are using.</p>
<h3 id="32-implementation">3.2 Implementation<a class="headerlink" href="#32-implementation" title="Permanent link">¶</a></h3>
<p>In this case, the solution is that of giving a <em>specification</em> of the API that web frameworks have to expose, in order to be usable by independent third-party components. In the Python world, this set of rules has been named WSGI, the Web Server Gateway Interface, but such interfaces exist for other languages such as Java or Ruby. The "gateway" mentioned here is the part of the system outside the framework, which in this discussion is the part that deals with production performance. Through WSGI we are defining a way for frameworks to expose a common interface, leaving people interested in concurrency free to implement something independently.</p>
<p>If the framework is compatible with the gateway interface, we can add software that deals with concurrency and uses the framework through the compatibility layer. Such a component is a production-ready HTTP server, and two common choices in the Python world are Gunicorn and uWSGI.</p>
<p>Production-ready HTTP server means that the software understands HTTP as the development server already did, but at the same time pushes performance in order to sustain a bigger workload, and as we said before this is done through concurrency.</p>
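<p>At its core, the interface defined by WSGI is a callable that receives a dictionary describing the request (<code>environ</code>) and a function to start the response (<code>start_response</code>), and returns an iterable of bytes. The following sketch shows both sides of the contract using only the standard library: <code>application</code> plays the role of the framework, while the hypothetical helper <code>call_wsgi</code> plays the role of the server. It is an illustration of the interface, not the code of Gunicorn or Flask.</p>

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A WSGI application: a callable that receives environ and start_response."""
    body = f"Hello from {environ['PATH_INFO']}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # The return value is an iterable of byte strings
    return [body]


def call_wsgi(app, path):
    """Act as the server: build an environ, call the app, collect the response."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys required by the spec
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)
```

<p>Any server that knows how to call such a function can run any framework that exposes one, and this is what allows us to plug a different HTTP server in front of the same application.</p>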
<p>Flask is compatible with WSGI, so we can make it work with Gunicorn. To install it in our virtual environment run <code>pip install gunicorn</code> and set it up by creating a file named <code>wsgi.py</code> with the following content</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">app</span> <span class="kn">import</span> <span class="n">application</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">"__main__"</span><span class="p">:</span>
    <span class="n">application</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
</code></pre></div>
<p>To run Gunicorn specify the number of concurrent instances and the external port</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>gunicorn<span class="w"> </span>--workers<span class="w"> </span><span class="m">3</span><span class="w"> </span>--bind<span class="w"> </span><span class="m">0</span>.0.0.0:8000<span class="w"> </span>wsgi
<span class="o">[</span><span class="m">2020</span>-02-12<span class="w"> </span><span class="m">18</span>:39:07<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">13393</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Starting<span class="w"> </span>gunicorn<span class="w"> </span><span class="m">20</span>.0.4
<span class="o">[</span><span class="m">2020</span>-02-12<span class="w"> </span><span class="m">18</span>:39:07<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">13393</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Listening<span class="w"> </span>at:<span class="w"> </span>http://0.0.0.0:8000<span class="w"> </span><span class="o">(</span><span class="m">13393</span><span class="o">)</span>
<span class="o">[</span><span class="m">2020</span>-02-12<span class="w"> </span><span class="m">18</span>:39:07<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">13393</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Using<span class="w"> </span>worker:<span class="w"> </span>sync
<span class="o">[</span><span class="m">2020</span>-02-12<span class="w"> </span><span class="m">18</span>:39:07<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">13396</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Booting<span class="w"> </span>worker<span class="w"> </span>with<span class="w"> </span>pid:<span class="w"> </span><span class="m">13396</span>
<span class="o">[</span><span class="m">2020</span>-02-12<span class="w"> </span><span class="m">18</span>:39:07<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">13397</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Booting<span class="w"> </span>worker<span class="w"> </span>with<span class="w"> </span>pid:<span class="w"> </span><span class="m">13397</span>
<span class="o">[</span><span class="m">2020</span>-02-12<span class="w"> </span><span class="m">18</span>:39:07<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">13398</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Booting<span class="w"> </span>worker<span class="w"> </span>with<span class="w"> </span>pid:<span class="w"> </span><span class="m">13398</span>
</code></pre></div>
<p>As you can see, Gunicorn has the concept of <em>workers</em>, which are a generic way to express concurrency. Specifically, Gunicorn implements a pre-fork worker model, which means that it (pre)creates a different Unix process for each worker. You can check this by running <code>ps</code></p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>ps<span class="w"> </span>ax<span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span>gunicorn
<span class="m">14919</span><span class="w"> </span>pts/1<span class="w"> </span>S+<span class="w"> </span><span class="m">0</span>:00<span class="w"> </span>~/venv3/bin/python3<span class="w"> </span>~/venv3/bin/gunicorn<span class="w"> </span>--workers<span class="w"> </span><span class="m">3</span><span class="w"> </span>--bind<span class="w"> </span><span class="m">0</span>.0.0.0:8000<span class="w"> </span>wsgi
<span class="m">14922</span><span class="w"> </span>pts/1<span class="w"> </span>S+<span class="w"> </span><span class="m">0</span>:00<span class="w"> </span>~/venv3/bin/python3<span class="w"> </span>~/venv3/bin/gunicorn<span class="w"> </span>--workers<span class="w"> </span><span class="m">3</span><span class="w"> </span>--bind<span class="w"> </span><span class="m">0</span>.0.0.0:8000<span class="w"> </span>wsgi
<span class="m">14923</span><span class="w"> </span>pts/1<span class="w"> </span>S+<span class="w"> </span><span class="m">0</span>:00<span class="w"> </span>~/venv3/bin/python3<span class="w"> </span>~/venv3/bin/gunicorn<span class="w"> </span>--workers<span class="w"> </span><span class="m">3</span><span class="w"> </span>--bind<span class="w"> </span><span class="m">0</span>.0.0.0:8000<span class="w"> </span>wsgi
<span class="m">14924</span><span class="w"> </span>pts/1<span class="w"> </span>S+<span class="w"> </span><span class="m">0</span>:00<span class="w"> </span>~/venv3/bin/python3<span class="w"> </span>~/venv3/bin/gunicorn<span class="w"> </span>--workers<span class="w"> </span><span class="m">3</span><span class="w"> </span>--bind<span class="w"> </span><span class="m">0</span>.0.0.0:8000<span class="w"> </span>wsgi
</code></pre></div>
<p>Using processes is just one of the two ways to implement concurrency in a Unix system, the other being using threads. The benefits and demerits of each solution are outside the scope of this post, however. For the time being just remember that you are dealing with multiple workers that process incoming requests concurrently, so that a slow request does not block the others and the server is ready to accept multiple connections.</p>
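<p>The pre-fork idea can be sketched in a few lines: a master process opens the listening socket and then forks the workers, which all accept connections on the socket they inherited. This is a simplified illustration that runs on Unix only and is not Gunicorn's actual code; the functions <code>start_prefork_workers</code> and <code>ask_worker</code> are hypothetical, and each worker serves a single connection before exiting to keep the example short.</p>

```python
import os
import socket


def start_prefork_workers(num_workers):
    """Open a listening socket, then fork workers that all accept on it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(num_workers)
    port = listener.getsockname()[1]
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Worker process: serve exactly one connection, then exit
            try:
                conn, _addr = listener.accept()
                conn.sendall(str(os.getpid()).encode())
                conn.close()
            finally:
                os._exit(0)
        pids.append(pid)
    listener.close()  # the master does not serve requests itself
    return port, pids


def ask_worker(port):
    """Connect once and return the pid reported by the worker that accepted."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
        data = b""
        while True:
            chunk = conn.recv(64)
            if not chunk:
                break
            data += chunk
    return int(data.decode())
```

<p>Calling <code>start_prefork_workers(3)</code> and then <code>ask_worker(port)</code> three times returns three different process IDs, one for each worker, which mirrors what <code>ps</code> shows for Gunicorn. The master should then collect the children with <code>os.waitpid</code>.</p>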
<h3 id="33-resources">3.3 Resources<a class="headerlink" href="#33-resources" title="Permanent link">¶</a></h3>
<p>These resources provide more detailed information on the topics discussed in this section</p>
<ul>
<li>The <a href="https://wsgi.readthedocs.io/en/latest/index.html">WSGI official documentation</a> and the <a href="https://en.wikipedia.org/wiki/Web_Server_Gateway_Interface">Wikipedia page</a></li>
<li>The homepages of <a href="https://gunicorn.org/">Gunicorn</a> and <a href="https://uwsgi-docs.readthedocs.io/en/latest/">uWSGI</a></li>
<li>A good entry point for your journey into the crazy world of concurrency: <a href="https://en.wikipedia.org/wiki/Multithreading_(computer_architecture)">multithreading</a>.</li>
<li>The source code of this example is available <a href="https://github.com/lgiordani/dissecting-a-web-stack-code/tree/master/3_concurrency_and_facades">here</a></li>
</ul>
<h3 id="34-issues">3.4 Issues<a class="headerlink" href="#34-issues" title="Permanent link">¶</a></h3>
<p>Using Gunicorn we now have a production-ready HTTP server, and we have apparently implemented everything we need. There are still many considerations and missing pieces, though.</p>
<h4 id="performances-again">Performances (again)<a class="headerlink" href="#performances-again" title="Permanent link">¶</a></h4>
<p>Are 3 workers enough to sustain the load of our new killer mobile application? We expect thousands of visitors per minute, so maybe we should add some. But while we increase the number of workers, we have to keep in mind that the machine we are using has a finite amount of CPU power and memory. So, once again, we have to focus on performance, and in particular on scalability: how can we keep adding workers without having to stop the application, replace the machine with a more powerful one, and restart the service?</p>
<h4 id="embrace-change">Embrace change<a class="headerlink" href="#embrace-change" title="Permanent link">¶</a></h4>
<p>This is not the only problem we have to face in production. An important aspect of technology is that it changes over time, as new and (hopefully) better solutions become widespread. We usually design systems dividing them as much as possible into communicating layers exactly because we want to be free to replace a layer with something else, be it a simpler component or a more advanced one, one with better performance or maybe just a cheaper one. So, once again, we want to be able to evolve the underlying system keeping the same interface, exactly as we did in the case of web frameworks.</p>
<h4 id="https_1">HTTPS<a class="headerlink" href="#https_1" title="Permanent link">¶</a></h4>
<p>Another missing part of the system is HTTPS. Gunicorn and uWSGI can be configured to serve HTTPS directly, but TLS is usually handled by a component in front of them that deals with the "S" part of the protocol, leaving the "HTTP" part to the internal layers.</p>
<h4 id="load-balancers">Load balancers<a class="headerlink" href="#load-balancers" title="Permanent link">¶</a></h4>
<p>In general, a <em>load balancer</em> is just a component in a system that distributes work among a pool of workers. Gunicorn is already distributing load among its workers, so this is not a new concept, but we generally want to do it on a bigger level, among machines or among entire systems. Load balancing can be hierarchical and be structured on many levels. We can also assign more importance to some components of the system, flagging them as ready to accept more load (for example because their hardware is better). Load balancers are extremely important in network services, and the definition of load can be extremely different from system to system: generally speaking, in a Web service the number of connections is the standard measure of the load, as we assume that on average all connections bring the same amount of work to the system.</p>
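<p>As an illustration of these ideas, the following sketch implements a weighted "least connections" policy: each backend has a weight that represents how much load it can accept, and every new connection goes to the backend with the fewest active connections relative to its weight. The class <code>LeastConnectionsBalancer</code> and the backend names are hypothetical, and this is only one of many possible balancing strategies.</p>

```python
class LeastConnectionsBalancer:
    """Pick the backend with the fewest active connections relative to its weight."""

    def __init__(self, weights):
        # weights: backend name -> relative capacity (higher accepts more load)
        self.weights = dict(weights)
        self.active = {name: 0 for name in self.weights}

    def acquire(self):
        """Choose a backend for a new connection and count it as active."""
        name = min(self.active, key=lambda n: self.active[n] / self.weights[n])
        self.active[name] += 1
        return name

    def release(self, name):
        """Signal that a connection to the given backend has been closed."""
        self.active[name] -= 1


balancer = LeastConnectionsBalancer({"small": 1, "big": 2})
picks = [balancer.acquire() for _ in range(6)]
```

<p>With a weight of 2, the backend called <code>big</code> ends up with twice as many connections as <code>small</code>, which is the "ready to accept more load" flag mentioned above.</p>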
<h4 id="reverse-proxies">Reverse proxies<a class="headerlink" href="#reverse-proxies" title="Permanent link">¶</a></h4>
<p>Load balancers are proxies that allow a client to contact any server in a pool. In a similar way, a <em>reverse proxy</em> allows a client to retrieve data produced by several systems through the same entry point. Reverse proxies are a perfect way to route HTTP requests to sub-systems that can be implemented with different technologies. For example, you might want to have part of the system implemented with Python, using Django and Postgres, and another part served by an AWS Lambda function written in Go and connected with a non-relational database such as DynamoDB. Usually, in HTTP services this choice is made according to the URL (for example routing every URL that begins with <code>/api/</code> to a specific sub-system).</p>
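<p>The routing decision itself can be sketched as a longest-prefix match between the requested path and a table of upstream systems. The table <code>UPSTREAMS</code>, its addresses, and the function <code>choose_upstream</code> are hypothetical and only illustrate the choice; a real reverse proxy also forwards the request and relays the response.</p>

```python
# Hypothetical table: URL prefix -> internal address of the sub-system
UPSTREAMS = {
    "/api/": "http://lambda-gateway:9000",
    "/": "http://application:8000",
}


def choose_upstream(path, upstreams=UPSTREAMS):
    """Return the upstream whose prefix is the longest match for the path."""
    matches = [prefix for prefix in upstreams if path.startswith(prefix)]
    if not matches:
        return None
    # The most specific (longest) prefix wins over the generic ones
    return upstreams[max(matches, key=len)]
```

<p>A request for <code>/api/users</code> matches both prefixes, and the most specific one decides where it goes, while everything else falls back to the generic <code>/</code> entry.</p>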
<h4 id="logic">Logic<a class="headerlink" href="#logic" title="Permanent link">¶</a></h4>
<p>We also want a layer that can implement a certain amount of logic, to manage simple rules that are not related to the service we implemented. A typical example is that of HTTP redirections: what happens if a user accesses the service with an <code>http://</code> prefix instead of <code>https://</code>? The correct way to deal with this is through an HTTP 301 code, but you don't want such a request to reach your framework, wasting resources for such a simple task.</p>
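<p>The rule in this example is simple enough to be expressed as a small function, which shows why it doesn't belong in the framework. This is only a sketch of the logic, with the hypothetical name <code>redirect_to_https</code>; in practice such a rule lives in the configuration of the component that sits in front of the service.</p>

```python
def redirect_to_https(scheme, host, path):
    """Return (status, headers) for a redirect, or None if the request can proceed."""
    if scheme == "https":
        return None
    # Same host and path, only the scheme changes
    return "301 Moved Permanently", [("Location", f"https://{host}{path}")]
```

<p>A request for <code>http://example.com/index</code> is answered with a 301 that points to <code>https://example.com/index</code>, and the application never sees it.</p>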
<h2 id="4-the-web-server">4 The Web server<a class="headerlink" href="#4-the-web-server" title="Permanent link">¶</a></h2>
<h3 id="41-rationale">4.1 Rationale<a class="headerlink" href="#41-rationale" title="Permanent link">¶</a></h3>
<p>The general label of <em>Web server</em> is given to software that performs the tasks we discussed. Two very common choices for this part of the system are nginx and Apache, two open source projects that are currently leading the market. With different technical approaches, they both implement all the features we discussed in the previous section (and many more).</p>
<h3 id="42-implementation">4.2 Implementation<a class="headerlink" href="#42-implementation" title="Permanent link">¶</a></h3>
<p>To test nginx without having to fight with the OS and install too many packages we can use Docker. Docker is useful to simulate a multi-machine environment, but it might also be your technology of choice for the actual production environment (AWS ECS works with Docker containers, for example).</p>
<p>The base configuration that we will run is very simple. One container will contain the Flask code and run the framework with Gunicorn, while the other container will run nginx. Gunicorn will serve HTTP on the internal port 8000, not exposed by Docker and thus not reachable from our browser, while nginx will expose port 80, the traditional HTTP port.</p>
<p>In the same directory as the file <code>wsgi.py</code>, create a <code>Dockerfile</code></p>
<div class="highlight"><pre><span></span><code><span class="k">FROM</span><span class="w"> </span><span class="s">python:3.6</span>
<span class="k">ADD</span><span class="w"> </span>app<span class="w"> </span>/app
<span class="k">ADD</span><span class="w"> </span>wsgi.py<span class="w"> </span>/
<span class="k">WORKDIR</span><span class="w"> </span><span class="s">.</span>
<span class="k">RUN</span><span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>flask<span class="w"> </span>gunicorn
<span class="k">EXPOSE</span><span class="w"> </span><span class="s">8000</span>
</code></pre></div>
<p>This starts from a Python Docker image, adds the <code>app</code> directory and the <code>wsgi.py</code> file, and installs Flask and Gunicorn. Now create a configuration for nginx in a file called <code>nginx.conf</code> in the same directory</p>
<div class="highlight"><pre><span></span><code><span class="k">server</span><span class="w"> </span><span class="p">{</span>
<span class="w">    </span><span class="kn">listen</span><span class="w"> </span><span class="mi">80</span><span class="p">;</span>
<span class="w">    </span><span class="kn">server_name</span><span class="w"> </span><span class="s">localhost</span><span class="p">;</span>
<span class="w">    </span><span class="kn">location</span><span class="w"> </span><span class="s">/</span><span class="w"> </span><span class="p">{</span>
<span class="w">        </span><span class="kn">proxy_pass</span><span class="w"> </span><span class="s">http://application:8000/</span><span class="p">;</span>
<span class="w">    </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>This defines a server that listens on port 80 and that connects all the URLs starting with <code>/</code> with a server called <code>application</code> on port 8000, which is the container running Gunicorn.</p>
<p>Last, create a file <code>docker-compose.yml</code> that will describe the configuration of the containers.</p>
<div class="highlight"><pre><span></span><code><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">"3.7"</span>
<span class="nt">services</span><span class="p">:</span>
<span class="w">  </span><span class="nt">application</span><span class="p">:</span>
<span class="w">    </span><span class="nt">build</span><span class="p">:</span>
<span class="w">      </span><span class="nt">context</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">.</span>
<span class="w">      </span><span class="nt">dockerfile</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Dockerfile</span>
<span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">gunicorn --workers 3 --bind 0.0.0.0:8000 wsgi</span>
<span class="w">    </span><span class="nt">expose</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">8000</span>
<span class="w">  </span><span class="nt">nginx</span><span class="p">:</span>
<span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">nginx</span>
<span class="w">    </span><span class="nt">volumes</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">./nginx.conf:/etc/nginx/conf.d/default.conf</span>
<span class="w">    </span><span class="nt">ports</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">8080:80</span>
<span class="w">    </span><span class="nt">depends_on</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">application</span>
</code></pre></div>
<p>As you can see the name <code>application</code> that we mentioned in the nginx configuration file is not a magic string, but is the name we assigned to the Gunicorn container in the Docker Compose configuration. Please note that nginx listens on port 80 inside the container, but the port is published as 8080 on the host.</p>
<p>To create this infrastructure we need to install Docker Compose in our virtual environment through <code>pip install docker-compose</code>. I also created a file named <code>.env</code> with the name of the project</p>
<div class="highlight"><pre><span></span><code><span class="nv">COMPOSE_PROJECT_NAME</span><span class="o">=</span>service
</code></pre></div>
<p>At this point you can run Docker Compose with <code>docker-compose up -d</code></p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span>up<span class="w"> </span>-d
Creating<span class="w"> </span>network<span class="w"> </span><span class="s2">"service_default"</span><span class="w"> </span>with<span class="w"> </span>the<span class="w"> </span>default<span class="w"> </span>driver
Creating<span class="w"> </span>service_application_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Creating<span class="w"> </span>service_nginx_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
</code></pre></div>
<p>If everything is working correctly, opening the browser and visiting <code>localhost:8080</code> should show you the HTML page Flask is serving.</p>
<p>Through <code>docker-compose logs</code> we can check what services are doing. We can recognise the output of Gunicorn in the logs of the service named <code>application</code></p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span>logs<span class="w"> </span>application
Attaching<span class="w"> </span>to<span class="w"> </span>service_application_1
application_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2020</span>-02-14<span class="w"> </span><span class="m">08</span>:35:42<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Starting<span class="w"> </span>gunicorn<span class="w"> </span><span class="m">20</span>.0.4
application_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2020</span>-02-14<span class="w"> </span><span class="m">08</span>:35:42<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Listening<span class="w"> </span>at:<span class="w"> </span>http://0.0.0.0:8000<span class="w"> </span><span class="o">(</span><span class="m">1</span><span class="o">)</span>
application_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2020</span>-02-14<span class="w"> </span><span class="m">08</span>:35:42<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Using<span class="w"> </span>worker:<span class="w"> </span>sync
application_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2020</span>-02-14<span class="w"> </span><span class="m">08</span>:35:42<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">8</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Booting<span class="w"> </span>worker<span class="w"> </span>with<span class="w"> </span>pid:<span class="w"> </span><span class="m">8</span>
application_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2020</span>-02-14<span class="w"> </span><span class="m">08</span>:35:42<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">9</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Booting<span class="w"> </span>worker<span class="w"> </span>with<span class="w"> </span>pid:<span class="w"> </span><span class="m">9</span>
application_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2020</span>-02-14<span class="w"> </span><span class="m">08</span>:35:42<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">10</span><span class="o">]</span><span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Booting<span class="w"> </span>worker<span class="w"> </span>with<span class="w"> </span>pid:<span class="w"> </span><span class="m">10</span>
</code></pre></div>
<p>but the one we are mostly interested in now is the service named <code>nginx</code>, so let's follow the logs in real-time with <code>docker-compose logs -f nginx</code>. Refresh the <code>localhost:8080</code> page you visited with the browser, and the container should output something like</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span>logs<span class="w"> </span>-f<span class="w"> </span>nginx
Attaching<span class="w"> </span>to<span class="w"> </span>service_nginx_1
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">192</span>.168.192.1<span class="w"> </span>-<span class="w"> </span>-<span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:08:42:20<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span><span class="s2">"GET / HTTP/1.1"</span><span class="w"> </span><span class="m">200</span><span class="w"> </span><span class="m">13</span><span class="w"> </span><span class="s2">"-"</span><span class="w"> </span><span class="s2">"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0"</span><span class="w"> </span><span class="s2">"-"</span>
</code></pre></div>
<p>which is the standard log format of nginx. It shows the IP address of the client (<code>192.168.192.1</code>), the connection timestamp, the HTTP request and the response status code (200), plus other information on the client itself.</p>
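<p>If you want to process such lines programmatically, a regular expression is usually enough. The following is a minimal sketch in Python, assuming the log format shown above; the names <code>ACCESS_LOG_RE</code> and <code>parse_access_line</code> are mine and not part of nginx or Docker Compose, and the field names simply mirror the nginx variables that I believe sit behind each column.</p>

```python
import re

# Hypothetical helper: field names mirror the nginx variables of the default format.
ACCESS_LOG_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_access_line(line):
    """Parse one access log line, with or without the Docker Compose prefix."""
    # Drop the "nginx_1 | " prefix added by `docker-compose logs`, if present.
    payload = line.split("|", 1)[-1].strip()
    match = ACCESS_LOG_RE.match(payload)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["body_bytes_sent"] = int(fields["body_bytes_sent"])
    return fields


sample = (
    'nginx_1 | 192.168.192.1 - - [14/Feb/2020:08:42:20 +0000] '
    '"GET / HTTP/1.1" 200 13 "-" '
    '"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0" "-"'
)
parsed = parse_access_line(sample)
print(parsed["remote_addr"], parsed["request"], parsed["status"])
```

<p>The function returns <code>None</code> for lines that do not match, so it can be used to filter mixed output.</p>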
<p>Let's now increase the number of services, to see the load balancing mechanism in action. To do this, first we need to change the log format of nginx to show the IP address of the machine that served the request. Change the <code>nginx.conf</code> file by adding the <code>log_format</code> and <code>access_log</code> options</p>
<div class="highlight"><pre><span></span><code><span class="k">log_format</span><span class="w"> </span><span class="s">upstreamlog</span><span class="w"> </span><span class="s">'[</span><span class="nv">$time_local]</span><span class="w"> </span><span class="nv">$host</span><span class="w"> </span><span class="s">to:</span><span class="w"> </span><span class="nv">$upstream_addr:</span><span class="w"> </span><span class="nv">$request</span><span class="w"> </span><span class="nv">$status'</span><span class="p">;</span>
<span class="k">server</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kn">listen</span><span class="w"> </span><span class="mi">80</span><span class="p">;</span>
<span class="w"> </span><span class="kn">server_name</span><span class="w"> </span><span class="s">localhost</span><span class="p">;</span>
<span class="w"> </span><span class="kn">location</span><span class="w"> </span><span class="s">/</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kn">proxy_pass</span><span class="w"> </span><span class="s">http://application:8000</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kn">access_log</span><span class="w"> </span><span class="s">/var/log/nginx/access.log</span><span class="w"> </span><span class="s">upstreamlog</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>The <code>$upstream_addr</code> variable is the one that contains the IP address of the server proxied by nginx. Now run <code>docker-compose down</code> to stop all containers and then <code>docker-compose up -d --scale application=3</code> to start them again</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span>down
Stopping<span class="w"> </span>service_nginx_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Stopping<span class="w"> </span>service_application_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Removing<span class="w"> </span>service_nginx_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Removing<span class="w"> </span>service_application_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Removing<span class="w"> </span>network<span class="w"> </span>service_default
$<span class="w"> </span>docker-compose<span class="w"> </span>up<span class="w"> </span>-d<span class="w"> </span>--scale<span class="w"> </span><span class="nv">application</span><span class="o">=</span><span class="m">3</span>
Creating<span class="w"> </span>network<span class="w"> </span><span class="s2">"service_default"</span><span class="w"> </span>with<span class="w"> </span>the<span class="w"> </span>default<span class="w"> </span>driver
Creating<span class="w"> </span>service_application_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Creating<span class="w"> </span>service_application_2<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Creating<span class="w"> </span>service_application_3<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Creating<span class="w"> </span>service_nginx_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
</code></pre></div>
<p>As you can see, Docker Compose now runs 3 containers for the <code>application</code> service. If you open the logs stream and visit the page in the browser you will now see a slightly different output</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span>logs<span class="w"> </span>-f<span class="w"> </span>nginx
Attaching<span class="w"> </span>to<span class="w"> </span>service_nginx_1
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:09:00:16<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">192</span>.168.240.4:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
</code></pre></div>
<p>where you can spot <code>to: 192.168.240.4:8000</code> which is the IP address of one of the application containers. Please note that the IP address you see might be different, as it depends on the Docker network settings. If you now visit the page again multiple times you should notice a change in the upstream address, something like</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span>logs<span class="w"> </span>-f<span class="w"> </span>nginx
Attaching<span class="w"> </span>to<span class="w"> </span>service_nginx_1
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:09:00:16<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">192</span>.168.240.4:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:09:00:17<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">192</span>.168.240.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:09:00:17<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">192</span>.168.240.3:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:09:00:17<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">192</span>.168.240.4:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:09:00:17<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">192</span>.168.240.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
</code></pre></div>
<p>This shows that nginx is performing load balancing, but to tell the truth this is happening through Docker's DNS, and not by an explicit action performed by the web server. We can verify this by accessing the nginx container and running <code>dig application</code> (you need to run <code>apt update</code> and <code>apt install dnsutils</code> to install <code>dig</code>)</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>nginx<span class="w"> </span>/bin/bash
root@99c2f348140e:/#<span class="w"> </span>apt<span class="w"> </span>update
root@99c2f348140e:/#<span class="w"> </span>apt<span class="w"> </span>install<span class="w"> </span>-y<span class="w"> </span>dnsutils
root@99c2f348140e:/#<span class="w"> </span>dig<span class="w"> </span>application
<span class="p">;</span><span class="w"> </span>&lt;&lt;&gt;&gt;<span class="w"> </span>DiG<span class="w"> </span><span class="m">9</span>.11.5-P4-5.1-Debian<span class="w"> </span>&lt;&lt;&gt;&gt;<span class="w"> </span>application
<span class="p">;;</span><span class="w"> </span>global<span class="w"> </span>options:<span class="w"> </span>+cmd
<span class="p">;;</span><span class="w"> </span>Got<span class="w"> </span>answer:
<span class="p">;;</span><span class="w"> </span>-&gt;&gt;HEADER<span class="s">&lt;&lt;- opco</span>de:<span class="w"> </span>QUERY,<span class="w"> </span>status:<span class="w"> </span>NOERROR,<span class="w"> </span>id:<span class="w"> </span><span class="m">7221</span>
<span class="p">;;</span><span class="w"> </span>flags:<span class="w"> </span>qr<span class="w"> </span>rd<span class="w"> </span>ra<span class="p">;</span><span class="w"> </span>QUERY:<span class="w"> </span><span class="m">1</span>,<span class="w"> </span>ANSWER:<span class="w"> </span><span class="m">3</span>,<span class="w"> </span>AUTHORITY:<span class="w"> </span><span class="m">0</span>,<span class="w"> </span>ADDITIONAL:<span class="w"> </span><span class="m">0</span>
<span class="p">;;</span><span class="w"> </span>QUESTION<span class="w"> </span>SECTION:
<span class="p">;</span>application.<span class="w"> </span>IN<span class="w"> </span>A
<span class="p">;;</span><span class="w"> </span>ANSWER<span class="w"> </span>SECTION:
application.<span class="w"> </span><span class="m">600</span><span class="w"> </span>IN<span class="w"> </span>A<span class="w"> </span><span class="m">192</span>.168.240.2
application.<span class="w"> </span><span class="m">600</span><span class="w"> </span>IN<span class="w"> </span>A<span class="w"> </span><span class="m">192</span>.168.240.4
application.<span class="w"> </span><span class="m">600</span><span class="w"> </span>IN<span class="w"> </span>A<span class="w"> </span><span class="m">192</span>.168.240.3
<span class="p">;;</span><span class="w"> </span>Query<span class="w"> </span>time:<span class="w"> </span><span class="m">1</span><span class="w"> </span>msec
<span class="p">;;</span><span class="w"> </span>SERVER:<span class="w"> </span><span class="m">127</span>.0.0.11#53<span class="o">(</span><span class="m">127</span>.0.0.11<span class="o">)</span>
<span class="p">;;</span><span class="w"> </span>WHEN:<span class="w"> </span>Fri<span class="w"> </span>Feb<span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="m">09</span>:57:24<span class="w"> </span>UTC<span class="w"> </span><span class="m">2020</span>
<span class="p">;;</span><span class="w"> </span>MSG<span class="w"> </span>SIZE<span class="w"> </span>rcvd:<span class="w"> </span><span class="m">110</span>
</code></pre></div>
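<p>The same lookup can be done without installing <code>dig</code>, from any container on the same Compose network that has Python available (for example one of the application containers, which is an assumption on my side about how you would run it). This is only a sketch: <code>resolve_all</code> is a name I made up, and it uses <code>localhost</code> so that it runs anywhere, while inside the Compose network you would pass <code>application</code>.</p>

```python
import socket


def resolve_all(hostname):
    """Return the de-duplicated IPv4 addresses that `hostname` resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Inside a container of the Compose project you would pass "application";
# "localhost" is used here only so that the snippet runs on any machine.
addresses = resolve_all("localhost")
print(addresses)
```

<p>With three <code>application</code> containers running, the returned list should contain three addresses, matching the three <code>A</code> records in the answer section above.</p>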
<p>To see load balancing performed by nginx we can explicitly define two services and assign them different weights. Run <code>docker-compose down</code> and change the nginx configuration to</p>
<div class="highlight"><pre><span></span><code><span class="k">upstream</span><span class="w"> </span><span class="s">app</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kn">server</span><span class="w"> </span><span class="n">application1</span><span class="p">:</span><span class="mi">8000</span><span class="w"> </span><span class="s">weight=3</span><span class="p">;</span>
<span class="w"> </span><span class="kn">server</span><span class="w"> </span><span class="n">application2</span><span class="p">:</span><span class="mi">8000</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">log_format</span><span class="w"> </span><span class="s">upstreamlog</span><span class="w"> </span><span class="s">'[</span><span class="nv">$time_local]</span><span class="w"> </span><span class="nv">$host</span><span class="w"> </span><span class="s">to:</span><span class="w"> </span><span class="nv">$upstream_addr:</span><span class="w"> </span><span class="nv">$request</span><span class="w"> </span><span class="nv">$status'</span><span class="p">;</span>
<span class="k">server</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kn">listen</span><span class="w"> </span><span class="mi">80</span><span class="p">;</span>
<span class="w"> </span><span class="kn">server_name</span><span class="w"> </span><span class="s">localhost</span><span class="p">;</span>
<span class="w"> </span><span class="kn">location</span><span class="w"> </span><span class="s">/</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kn">proxy_pass</span><span class="w"> </span><span class="s">http://app</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kn">access_log</span><span class="w"> </span><span class="s">/var/log/nginx/access.log</span><span class="w"> </span><span class="s">upstreamlog</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>Here we defined an <code>upstream</code> structure that lists two different services, <code>application1</code> and <code>application2</code>, giving the first one a weight of 3. The second one keeps the default weight of 1, which means that out of every 4 requests, 3 will be routed to the first service and one to the second. Now nginx is not just relying on the DNS, but consciously choosing between two different services.</p>
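<p>To get a feeling for how weights translate into a sequence of choices, here is a small Python simulation of a "smooth" weighted round-robin, which to my understanding is the strategy nginx uses by default. It is a simplified model written for this post, not the code of nginx: the function name is hypothetical and it ignores failures, timeouts, and every other upstream parameter.</p>

```python
def smooth_weighted_round_robin(weights, requests):
    """Return the list of servers chosen for `requests` consecutive requests.

    `weights` maps a server name to its weight (1 is the nginx default).
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(requests):
        # Every server gains its own weight at each round...
        for name, weight in weights.items():
            current[name] += weight
        # ...the server with the highest score is chosen (first one wins ties)...
        chosen = max(current, key=current.get)
        # ...and it pays back the total weight, so the others can catch up.
        current[chosen] -= total
        picks.append(chosen)
    return picks


picks = smooth_weighted_round_robin({"application1": 3, "application2": 1}, 8)
print(picks)
```

<p>In this model every group of 4 requests contains 3 picks of <code>application1</code> and 1 of <code>application2</code>, and the second service is interleaved with the first instead of being served only at the end of the cycle.</p>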
<p>Let's define the services accordingly in the Docker Compose configuration file</p>
<div class="highlight"><pre><span></span><code>version: "3"
services:
  application1:
    build:
      context: .
      dockerfile: Dockerfile
    command: gunicorn --workers 6 --bind 0.0.0.0:8000 wsgi
    expose:
      <span class="k">-</span> 8000
  application2:
    build:
      context: .
      dockerfile: Dockerfile
    command: gunicorn --workers 3 --bind 0.0.0.0:8000 wsgi
    expose:
      <span class="k">-</span> 8000
  nginx:
    image: nginx
    volumes:
      <span class="k">-</span> ./nginx.conf:/etc/nginx/conf.d/default.conf
    ports:
      <span class="k">-</span> 80:80
    depends_on:
      <span class="k">-</span> application1
      <span class="k">-</span> application2
</code></pre></div>
<p>I basically duplicated the definition of <code>application</code>, but the first service is now running 6 workers, just for the sake of showing a possible difference between the two. Now run <code>docker-compose up -d</code> and <code>docker-compose logs -f nginx</code>. If you refresh the page in the browser multiple times you will see something like</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>docker-compose<span class="w"> </span>logs<span class="w"> </span>-f<span class="w"> </span>nginx
Attaching<span class="w"> </span>to<span class="w"> </span>service_nginx_1
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:25<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:25<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.2:8000:<span class="w"> </span>GET<span class="w"> </span>/favicon.ico<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">404</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:30<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.3:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:31<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:32<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:33<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:33<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.3:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:34<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:34<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:35<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.2:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
nginx_1<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">14</span>/Feb/2020:11:03:35<span class="w"> </span>+0000<span class="o">]</span><span class="w"> </span>localhost<span class="w"> </span>to:<span class="w"> </span><span class="m">172</span>.18.0.3:8000:<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1<span class="w"> </span><span class="m">200</span>
</code></pre></div>
<p>where you can clearly notice the load balancing between <code>172.18.0.2</code> (<code>application1</code>) and <code>172.18.0.3</code> (<code>application2</code>) in action.</p>
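<p>Counting lines by eye gets tedious quickly, so here is a minimal Python sketch that tallies requests per upstream address. It assumes the <code>upstreamlog</code> format defined earlier and uses the log lines shown above, stripped of the <code>nginx_1 |</code> prefix; <code>UPSTREAM_RE</code> and <code>count_upstreams</code> are names I chose for the example.</p>

```python
import re
from collections import Counter

# Matches '[$time_local] $host to: $upstream_addr: $request $status'.
UPSTREAM_RE = re.compile(
    r'\[(?P<time_local>[^\]]+)\] (?P<host>\S+) to: '
    r'(?P<upstream_addr>\S+?): (?P<request>.+) (?P<status>\d{3})$'
)


def count_upstreams(lines):
    """Count how many requests each upstream address served."""
    counter = Counter()
    for line in lines:
        match = UPSTREAM_RE.search(line.strip())
        if match:
            counter[match.group("upstream_addr")] += 1
    return counter


log = """\
[14/Feb/2020:11:03:25 +0000] localhost to: 172.18.0.2:8000: GET / HTTP/1.1 200
[14/Feb/2020:11:03:25 +0000] localhost to: 172.18.0.2:8000: GET /favicon.ico HTTP/1.1 404
[14/Feb/2020:11:03:30 +0000] localhost to: 172.18.0.3:8000: GET / HTTP/1.1 200
[14/Feb/2020:11:03:31 +0000] localhost to: 172.18.0.2:8000: GET / HTTP/1.1 200
[14/Feb/2020:11:03:32 +0000] localhost to: 172.18.0.2:8000: GET / HTTP/1.1 200
[14/Feb/2020:11:03:33 +0000] localhost to: 172.18.0.2:8000: GET / HTTP/1.1 200
[14/Feb/2020:11:03:33 +0000] localhost to: 172.18.0.3:8000: GET / HTTP/1.1 200
[14/Feb/2020:11:03:34 +0000] localhost to: 172.18.0.2:8000: GET / HTTP/1.1 200
[14/Feb/2020:11:03:34 +0000] localhost to: 172.18.0.2:8000: GET / HTTP/1.1 200
[14/Feb/2020:11:03:35 +0000] localhost to: 172.18.0.2:8000: GET / HTTP/1.1 200
[14/Feb/2020:11:03:35 +0000] localhost to: 172.18.0.3:8000: GET / HTTP/1.1 200
"""

counts = count_upstreams(log.splitlines())
for address, hits in counts.most_common():
    print(address, hits)
```

<p>On this sample the first address serves 8 requests and the second 3, which is close to the 3:1 ratio requested with <code>weight=3</code> (the request for <code>favicon.ico</code> is counted as well).</p>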
<p>I will not show an example of reverse proxy or HTTPS here, to keep this post from becoming too long. You can find resources on those topics in the next section.</p>
<h3 id="43-resources">4.3 Resources<a class="headerlink" href="#43-resources" title="Permanent link">¶</a></h3>
<p>These resources provide more detailed information on the topics discussed in this section</p>
<ul>
<li>Docker Compose <a href="https://docs.docker.com/compose/">official documentation</a></li>
<li>nginx <a href="http://nginx.org/en/docs/">documentation</a>: in particular the sections about <a href="http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format">log_format</a> and <a href="http://nginx.org/en/docs/http/ngx_http_upstream_module.html#upstream">upstream</a> directives</li>
<li>How to <a href="https://docs.nginx.com/nginx/admin-guide/monitoring/logging/">configure logging</a> in nginx</li>
<li>How to <a href="https://docs.nginx.com/nginx/admin-guide/load-balancer/http-load-balancer/">configure load balancing</a> in nginx</li>
<li><a href="https://docs.nginx.com/nginx/admin-guide/security-controls/terminating-ssl-http/">Setting up an HTTPS Server</a> with nginx and <a href="https://www.humankode.com/ssl/create-a-selfsigned-certificate-for-nginx-in-5-minutes">how to create self-signed certificates</a></li>
<li>How to <a href="https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/">create a reverse proxy</a> with nginx, the documentation of the <a href="http://nginx.org/en/docs/http/ngx_http_core_module.html#location"><code>location</code></a> directive and <a href="https://www.digitalocean.com/community/tutorials/understanding-nginx-server-and-location-block-selection-algorithms">some insights</a> on the location choosing algorithms (one of the most complex parts of nginx)</li>
<li>The source code of this example is available <a href="https://github.com/lgiordani/dissecting-a-web-stack-code/tree/master/4_the_web_server">here</a></li>
</ul>
<h3 id="44-issues">4.4 Issues<a class="headerlink" href="#44-issues" title="Permanent link">¶</a></h3>
<p>Well, finally we can say that the job is done. Now we have a production-ready web server in front of our multi-threaded web framework and we can focus on writing Python code instead of dealing with HTTP headers.</p>
<p>Using a web server allows us to scale the infrastructure just by adding new instances behind it, without interrupting the service. The HTTP concurrent server runs multiple instances of our framework, and the framework itself abstracts HTTP, mapping it to our high-level language.</p>
<h2 id="bonus-cloud-infrastructures">Bonus: cloud infrastructures<a class="headerlink" href="#bonus-cloud-infrastructures" title="Permanent link">¶</a></h2>
<p>Back in the early years of the Internet, companies used to have their own servers on-premise, and system administrators used to run the whole stack directly on the bare operating system. Needless to say, this was complicated, expensive, and failure-prone.</p>
<p>Nowadays "the cloud" is the way to go, so I want to briefly mention some components that can help you run such a web stack on AWS, which is the platform I know best and the most widespread cloud provider in the world at the time of writing.</p>
<h3 id="elastic-beanstalk">Elastic Beanstalk<a class="headerlink" href="#elastic-beanstalk" title="Permanent link">¶</a></h3>
<p>This is the entry-level solution for simple applications, being a managed infrastructure that provides load balancing, auto-scaling, and monitoring. You can use several programming languages (including Python and Node.js) and choose between different web servers such as Apache or nginx. The components of an EB service are not hidden, but you don't have direct access to them, and you have to rely on configuration files to change the way they work. It's a good solution for simple services, but you will probably soon need more control.</p>
<p><a href="https://aws.amazon.com/elasticbeanstalk">Go to Elastic Beanstalk</a></p>
<h3 id="elastic-container-service-ecs">Elastic Container Service (ECS)<a class="headerlink" href="#elastic-container-service-ecs" title="Permanent link">¶</a></h3>
<p>With ECS you can run Docker containers grouping them in clusters and setting up auto-scale policies connected with metrics coming from CloudWatch. You have the choice of running them on EC2 instances (virtual machines) managed by you or on a serverless infrastructure called Fargate. ECS will run your Docker containers, but you still have to create DNS entries and load balancers on your own. You also have the choice of running your containers on Kubernetes using EKS (Elastic Kubernetes Service).</p>
<p><a href="https://aws.amazon.com/ecs/">Go to Elastic Container Service</a></p>
<h3 id="elastic-compute-cloud-ec2">Elastic Compute Cloud (EC2)<a class="headerlink" href="#elastic-compute-cloud-ec2" title="Permanent link">¶</a></h3>
<p>This is the bare metal of AWS, where you spin up stand-alone virtual machines or auto-scaling groups of them. You can SSH into these instances and provide scripts to install and configure software. You can install here your application, web servers, databases, whatever you want. While this used to be the way to go at the very beginning of the cloud computing age, I don't think you should go for it. There is so much a cloud provider can give you in terms of associated services like logs or monitoring, and in terms of performance, that it doesn't make sense to avoid using them. EC2 is still there, anyway, and if you run ECS on top of it you need to know what you can and what you can't do.</p>
<p><a href="https://aws.amazon.com/ec2/">Go to Elastic Compute Cloud</a></p>
<h3 id="elastic-load-balancing">Elastic Load Balancing<a class="headerlink" href="#elastic-load-balancing" title="Permanent link">¶</a></h3>
<p>While Network Load Balancers (NLB) manage pure TCP/IP connections, Application Load Balancers are dedicated to HTTP, and they can perform many of the services we need. They can reverse proxy through rules (that were recently improved) and they can terminate TLS, using certificates created in ACM (AWS Certificate Manager). As you can see, ALBs are a good replacement for a web server, even though they clearly lack the extreme configurability of dedicated software. You can, however, use them as the first layer of load balancing, still using nginx or Apache behind them if you need some of the features they provide.</p>
<p><a href="https://aws.amazon.com/elasticloadbalancing/">Go to Elastic Load Balancing</a></p>
<h3 id="cloudfront">CloudFront<a class="headerlink" href="#cloudfront" title="Permanent link">¶</a></h3>
<p>CloudFront is a Content Delivery Network, that is, a geographically distributed cache that provides faster access to your content. While CDNs are not part of the stack that I discussed in this post, I think it is worth mentioning CF as it can speed up any static content, and also terminate TLS in connection with AWS Certificate Manager.</p>
<p><a href="https://aws.amazon.com/cloudfront/">Go to CloudFront</a></p>
<h2 id="conclusion">Conclusion<a class="headerlink" href="#conclusion" title="Permanent link">¶</a></h2>
<p>As you can see, a web stack is a pretty rich set of components, and the reason behind them is often related to performance. There are a lot of technologies that we take for granted, and that fortunately have become easier to deploy, but I still believe a full-stack engineer should be aware not only of the existence of such layers, but also of their purpose and at least their basic configuration.</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>Digging up Django class-based views - 32014-02-14T16:13:41+01:002020-03-16T07:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2014-02-14:/blog/2014/02/14/digging-up-django-class-based-views-3/<p>A detailed explanation of the nature of class-based views in Django - detail views, base views, and date-based views</p><p>In the first two issues of this short series we discussed the basic concepts of class-based views in Django, and started understanding and using two of the basic generic views Django makes available to you: <code>ListView</code> and <code>DetailView</code>. Both are views that read some data from the database and show them on a rendered template. We also briefly reviewed the base views that allow us to build heavily customised views, and date-based views.</p>
<p>This third issue will introduce the reader to the class-based version of Django forms. This post is not meant to be a full introduction to the Django form library; rather, I want to show how class-based generic views implement the CUD part of the CRUD operations (Create, Read, Update, Delete), the Read one being implemented by "standard" generic views.</p>
<h2 id="a-very-basic-example">A very basic example<a class="headerlink" href="#a-very-basic-example" title="Permanent link">¶</a></h2>
<p>To start working with CBFs (class-based forms) let's consider a simple example. We have a <code>StickyNote</code> class which represents a simple text note with a date:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">StickyNote</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">timestamp</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">()</span>
    <span class="n">text</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">TextField</span><span class="p">(</span><span class="n">blank</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">null</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</code></pre></div>
<p>One of the first things we usually want to do is to build a form that allows the user to create a new entry in the database, in this case a new sticky note. We can create a page that allows us to input data for a new <code>StickyNote</code> simply by creating the following view:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">NoteAdd</span><span class="p">(</span><span class="n">CreateView</span><span class="p">):</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">StickyNote</span>
</code></pre></div>
<p>It is no surprise that the class is mostly empty. Thanks to inheritance, as happened in the first two posts with standard views, the class contains a bunch of code that lives somewhere in the class hierarchy and works behind the scenes. Our mission is now to uncover that code to figure out how exactly CBFs work and how we can change them to perform what we need.</p>
<p>To make the post easier to follow, please always remember that "class-based form" is a short name for "class-based form view". That is, CBFs are views, so their job is to process incoming HTTP requests and return an HTTP response. Form views do this in a slightly different way than the standard ones, mostly due to the different nature of POST requests compared with GET ones. Let us take a look at this concept before moving on.</p>
<h2 id="http-requests-get-and-post">HTTP requests: GET and POST<a class="headerlink" href="#http-requests-get-and-post" title="Permanent link">¶</a></h2>
<p><em>Please note that this is a broad subject and that this section is only meant to be a very quick review of the main concepts that are related to Django CBFs.</em></p>
<p>HTTP requests come in different forms, depending on the <strong>method</strong> they carry. Those methods are called <strong>HTTP verbs</strong> and the two most used ones are <strong>GET</strong> and <strong>POST</strong>. The GET method tells the server that the client wants to retrieve a resource (the one connected with the relative URL) and shall have no side effects (such as changing the resource). The POST method is used to send some data to the server, the given URL being the resource that shall handle the data.</p>
<p>As you can see, the definition of POST is very broad: the server accepts the incoming data and is allowed to perform any type of action with it, such as creating a new entity, editing or deleting one or more of them, and so on.</p>
<p>Keep in mind that forms are not the same thing as POST requests. As a matter of fact, they are connected just incidentally: a form is a way to collect data from a user browsing an HTML page, while POST requests are the way that data is transmitted to the server. You do not need to have a form to make a POST request, you just need some data to send. HTML forms are just a useful way to send POST requests, but not the only one.</p>
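<p>To make the last point concrete, here is a minimal sketch that builds a POST request without any HTML form, using only the Python standard library. The URL and the payload are hypothetical, and the request is only constructed, never sent.</p>

```python
import json
import urllib.request

# Hypothetical endpoint, used only for illustration; nothing is sent over the network.
URL = "http://localhost:8000/notes/add/"

# A request without a body defaults to the GET method.
get_request = urllib.request.Request(URL)

# Attaching a body makes urllib treat the request as a POST; no HTML form is involved.
payload = json.dumps({"text": "Buy milk"}).encode("utf-8")
post_request = urllib.request.Request(
    URL,
    data=payload,
    headers={"Content-Type": "application/json"},
)

print(get_request.get_method())   # GET
print(post_request.get_method())  # POST
```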
<h2 id="form-views">Form views<a class="headerlink" href="#form-views" title="Permanent link">¶</a></h2>
<p>Why are form views different from standard views? The answer can be found by looking at the flow of a typical data submission on a Web site:</p>
<ol>
<li>The user browses a web page (GET)</li>
<li>The server answers the GET request with a page containing a form</li>
<li>The user fills in the form and submits it (POST)</li>
<li>The server receives and processes data</li>
</ol>
<p>As you can see the procedure involves a double interaction with the server: the first request GETs the page, the second POSTs the data. So you need to build a view that answers the GET request and a view that answers the POST one.</p>
<p>Since most of the time the URL we use to POST data is the same URL we used to GET the page, we need to build a view that accepts both methods. It is time to dig into the class-based forms that Django provides to understand how they deal with this double interaction.</p>
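<p>The idea of a single view that answers both methods can be sketched in plain Python. The class <code>TinyView</code> below is a toy stand-in that I made up for illustration; it only imitates the way a class-based view routes a request to a method named after the HTTP verb, and it is not Django's implementation.</p>

```python
class TinyView:
    """Toy stand-in for a class-based view; not Django's actual View class."""

    http_method_names = ["get", "post"]

    def dispatch(self, method):
        # Pick the handler whose name matches the lowercased HTTP verb.
        name = method.lower()
        if name not in self.http_method_names:
            return "405 Method Not Allowed"
        handler = getattr(self, name)
        return handler()

    def get(self):
        # First interaction: the client asks for the page.
        return "200 OK: page with an empty form"

    def post(self):
        # Second interaction: the client sends the data.
        return "200 OK: form data processed"


view = TinyView()
print(view.dispatch("GET"))
print(view.dispatch("POST"))
print(view.dispatch("DELETE"))
```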
<p>Let us start with the <code>CreateView</code> class we used in our simple example (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/edit.py#L175">CODE</a>). It is an almost empty class that inherits from <code>SingleObjectTemplateResponseMixin</code> and <code>BaseCreateView</code>. The first class deals with the template selected to render the response and we can leave it aside for the moment. The second class (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/edit.py#L160">CODE</a>), on the other hand, is the one we are interested in now, as it implements two methods whose names are self-explanatory, <code>get</code> and <code>post</code>.</p>
<h2 id="processing-get-and-post-requests">Processing GET and POST requests<a class="headerlink" href="#processing-get-and-post-requests" title="Permanent link">¶</a></h2>
<p>We already met the <code>get</code> method in the <a href="https://www.thedigitalcatonline.com/blog/2013/12/11/digging-up-django-class-based-views-2/">previous article</a> when we talked about the <code>dispatch</code> method of the <code>View</code> class. A quick recap of its purpose: this method is used to process an incoming HTTP request, and is called when the HTTP method is GET. Unsurprisingly, the <code>post</code> method is called when the incoming request is a POST one. The two methods are already defined by an ancestor of the <code>BaseCreateView</code> class, namely <code>ProcessFormView</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/edit.py#L129">CODE</a>), so it is useful to have a look at the source code of this last class:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">ProcessFormView</span><span class="p">(</span><span class="n">View</span><span class="p">):</span>
<span class="w">    </span><span class="sd">"""Render a form on GET and processes it on POST."""</span>
    <span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="w">        </span><span class="sd">"""Handle GET requests: instantiate a blank version of the form."""</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">render_to_response</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">get_context_data</span><span class="p">())</span>

    <span class="k">def</span> <span class="nf">post</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="w">        </span><span class="sd">"""</span>
<span class="sd">        Handle POST requests: instantiate a form instance with the passed</span>
<span class="sd">        POST variables and then check if it's valid.</span>
<span class="sd">        """</span>
        <span class="n">form</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_form</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">form</span><span class="o">.</span><span class="n">is_valid</span><span class="p">():</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">form_valid</span><span class="p">(</span><span class="n">form</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">form_invalid</span><span class="p">(</span><span class="n">form</span><span class="p">)</span>
</code></pre></div>
<p>As you can see the two methods are pretty straightforward, but it's clear that a lot is going on under the hood.</p>
<h2 id="the-form-workflow">The form workflow<a class="headerlink" href="#the-form-workflow" title="Permanent link">¶</a></h2>
<p>Let's start with <code>get</code>, which apparently doesn't do much. It just calls <code>render_to_response</code> passing the result of <code>get_context_data</code>, so we need to track the latter to see what the template will get. <code>ProcessFormView</code> or its ancestors don't provide any method called <code>get_context_data</code>; instead, the <code>BaseCreateView</code> class receives it from <code>ModelFormMixin</code>, which in turn receives it from <code>FormMixin</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/edit.py#L63">CODE</a>).</p>
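<p>If you want to see how a class receives a method from a mixin, the following sketch rebuilds a simplified version of the hierarchy in plain Python. The class names mirror the Django ones mentioned above, but the bodies are toy stand-ins; <code>ContextMixin</code> as the base of <code>FormMixin</code> and the helper <code>defining_class</code> are assumptions I introduced for the example.</p>

```python
class ContextMixin:
    def get_context_data(self, **kwargs):
        return dict(kwargs)


class FormMixin(ContextMixin):
    def get_context_data(self, **kwargs):
        # Inject a 'form' value into the context (a placeholder string here).
        kwargs.setdefault("form", "form placeholder")
        return super().get_context_data(**kwargs)


class ModelFormMixin(FormMixin):
    pass


class View:
    pass


class ProcessFormView(View):
    def get(self):
        # The method is not defined here: it is found through the MRO of the final class.
        return self.get_context_data()


class BaseCreateView(ModelFormMixin, ProcessFormView):
    pass


def defining_class(cls, name):
    """Hypothetical helper: return the name of the first class in the MRO defining `name`."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass.__name__
    return None


print(defining_class(BaseCreateView, "get_context_data"))  # FormMixin
print(defining_class(BaseCreateView, "get"))               # ProcessFormView
print(BaseCreateView().get())                              # {'form': 'form placeholder'}
```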
<p>The class hierarchy is pretty complex, but don't be scared, the important part is that the method <code>get_context_data</code> provided by <code>FormMixin</code> injects a <code>'form'</code> value into the context (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/edit.py#L65">CODE</a>), and the form is provided by the <code>get_form</code> method defined in the same class (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/edit.py#L29">CODE</a>), and this eventually uses the <code>form_class</code> attribute to instantiate the form (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/edit.py#L27">CODE</a>). As you can see there are plenty of steps, which means plenty of chances to customise the behaviour, if we should need to provide a personalised solution.</p>
<p>It is interesting to have an even more in-depth look at the form creation mechanism, though, as this is the crucial point of the whole GET/POST difference. Once the method <code>get_form</code> has retrieved the form class, it instantiates it to create the form itself, and the parameters passed to the class are provided by <code>get_form_kwargs</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/edit.py#L35">CODE</a>). When the HTTP method is GET, <code>get_form_kwargs</code> returns a dictionary with the <code>initial</code> and <code>prefix</code> keys, which are taken from the attributes with the same names. I don't want to dig too much into forms now, as they are out of the scope of the post, but if you read the definition of <code>BaseForm</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/forms/forms.py#L57">CODE</a>) you will notice that its <code>__init__</code> method accepts the same two arguments, <code>initial</code> and <code>prefix</code>. Keep in mind that this is a simplification of the whole process, as the <code>ModelFormMixin</code> class injects a slightly more complicated version of both <code>get_form_class</code> and <code>get_form_kwargs</code> to provide naming conventions related to the Django model in use.</p>
<p>Back to <code>ProcessFormView</code>, the <code>post</code> method does not directly render the template since it has to process incoming data before doing that last step. The method, thus, calls <code>get_form</code> directly and then runs the validation process on it, then calling either <code>form_valid</code> or <code>form_invalid</code>, depending on the result of the test. See the <a href="https://docs.djangoproject.com/en/3.0/ref/forms/validation/">official documentation</a> for more information about form validation.</p>
<p>This time, <code>get_form_kwargs</code> adds two keys to the keyword arguments passed to the form when it is instantiated, namely <code>data</code> and <code>files</code>. These come directly from the <code>POST</code> and <code>FILES</code> attributes of the request, and contain the data the user is sending to the server.</p>
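<p>The difference between the two cases can be sketched with a plain function. This is a simplified imitation, not Django's code: the request objects are fake namespaces that I built for the example, and the function only handles the POST case described above.</p>

```python
from types import SimpleNamespace


def get_form_kwargs(request, initial=None, prefix=None):
    """Simplified imitation of the keyword arguments built for the form."""
    kwargs = {"initial": dict(initial or {}), "prefix": prefix}
    if request.method == "POST":
        # Only when data is being submitted do we bind it to the form.
        kwargs.update({"data": request.POST, "files": request.FILES})
    return kwargs


# Fake request objects, created just for this sketch.
get_request = SimpleNamespace(method="GET", POST={}, FILES={})
post_request = SimpleNamespace(method="POST", POST={"text": "Buy milk"}, FILES={})

print(sorted(get_form_kwargs(get_request)))   # ['initial', 'prefix']
print(sorted(get_form_kwargs(post_request)))  # ['data', 'files', 'initial', 'prefix']
```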
<p>Last, let's have a look at <code>form_valid</code> and <code>form_invalid</code>. Both methods are provided by <code>FormMixin</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/edit.py#L55">CODE</a>), but the former is augmented by <code>ModelFormMixin</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/edit.py#L123">CODE</a>). The base version of <code>form_invalid</code> calls <code>render_to_response</code> passing the context data initialised with the form itself. This way it is possible to fill the template with the form values and error messages for the wrong ones, while <code>form_valid</code>, in its base form, just returns an <code>HttpResponseRedirect</code> to the <code>success_url</code>. As I said, <code>form_valid</code> is overridden by <code>ModelFormMixin</code>, which first saves the form, and then calls the base version of the method.</p>
<p>Let's recap the process until here.</p>
<ol>
<li>The URL dispatcher requests a page containing a form with GET.</li>
<li>The <code>get</code> method of <code>ProcessFormView</code> finds the form class of choice through <code>get_form_class</code>.</li>
<li>The form class is instantiated by <code>get_form</code> with the values contained in the <code>self.initial</code> dictionary.</li>
<li>At this point a template is rendered with a context returned by <code>get_context_data</code> as usual. The context contains the form.</li>
<li>When the user submits the form the URL dispatcher requests the page with a POST that contains the data.</li>
<li>The <code>post</code> method of <code>ProcessFormView</code> validates the form and acts accordingly, rendering the page again if the data is invalid, or processing it and redirecting to the success URL if it is valid.</li>
</ol>
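<p>The steps above can be sketched with a couple of toy classes. Nothing here is Django code: <code>NoteForm</code> and <code>NoteAddView</code> are hypothetical names, the "rendering" is just a tuple, and the validation rule (the text must not be empty) is an assumption made to have both outcomes.</p>

```python
class NoteForm:
    """Toy form: valid when the 'text' field is not empty."""

    def __init__(self, initial=None, data=None):
        self.initial = initial or {}
        self.data = data
        self.errors = {}

    def is_valid(self):
        if not (self.data or {}).get("text"):
            self.errors["text"] = "This field is required."
        return not self.errors


class NoteAddView:
    form_class = NoteForm
    initial = {"text": ""}
    success_url = "/notes/"

    def __init__(self):
        self.saved = []

    def get_form(self, data=None):
        return self.form_class(initial=self.initial, data=data)

    def get(self):
        # GET: build a form without submitted data and render it.
        return ("render", self.get_form())

    def post(self, data):
        # POST: build the form with the submitted data and validate it.
        form = self.get_form(data=data)
        if form.is_valid():
            return self.form_valid(form)
        return self.form_invalid(form)

    def form_valid(self, form):
        # Store the data, then redirect to the success URL.
        self.saved.append(form.data)
        return ("redirect", self.success_url)

    def form_invalid(self, form):
        # Render the page again, this time with the errors attached to the form.
        return ("render", form)


view = NoteAddView()
print(view.get()[0])                    # render
print(view.post({"text": ""})[0])       # render
print(view.post({"text": "Buy milk"}))  # ('redirect', '/notes/')
```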
<h2 id="update-and-delete-operations">Update and Delete operations<a class="headerlink" href="#update-and-delete-operations" title="Permanent link">¶</a></h2>
<p>This rather rich code tour unveiled the inner mechanism of the <code>CreateView</code> class, which can be used to create a new object in the database. The <code>UpdateView</code> and <code>DeleteView</code> classes follow a similar path, with minor changes to perform the different action they are implementing.</p>
<p><code>UpdateView</code> wants to show the form already filled with values, so it instantiates an object before processing the request (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/edit.py#L189">CODE</a>). This makes the object available in the keywords dictionary under the <code>instance</code> key (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/edit.py#L107">CODE</a>), which is used by model forms to initialize the data (<a href="https://github.com/django/django/blob/stable/3.0.x/django/forms/models.py#L293">CODE</a>). The <code>save</code> method of <code>BaseModelForm</code> is smart enough to understand if the object has been created or just changed (<a href="https://github.com/django/django/blob/stable/3.0.x/django/forms/models.py#L454">CODE</a>), so the <code>post</code> method of <code>UpdateView</code> works just like the one of <code>CreateView</code>.</p>
<p><code>DeleteView</code> is a bit different from <code>CreateView</code> and <code>UpdateView</code>. As <a href="https://docs.djangoproject.com/en/3.0/ref/class-based-views/generic-editing/#deleteview">the official documentation</a> states, if called with a GET method it shows a confirmation page that POSTs to the same URL. So, as for the GET requests, <code>DeleteView</code> just uses the <code>get</code> method defined by its ancestor <code>BaseDetailView</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/detail.py#L105">CODE</a>), which renders the template putting the object in the context. When called with a POST request, the view uses the <code>post</code> method defined by <code>DeletionMixin</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/edit.py#L217">CODE</a>), which just calls the <code>delete</code> method of the same class (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/edit.py#L206">CODE</a>). This performs the deletion on the database and redirects to the success URL.</p>
<h2 id="final-words">Final words<a class="headerlink" href="#final-words" title="Permanent link">¶</a></h2>
<p>As you can see, the structure behind the current implementation of Django class-based form views is rather complex. This allows the user to achieve complex behaviours like the CUD operations just by defining a couple of classes as I did in the simple example at the beginning of the post. Most of the time, however, such a simplification makes it difficult for the programmer to understand how to achieve the desired changes to the class behaviour. So, the purpose of this big tour I made inside the Django source code was to give an insight into which methods are called in the life cycle of your HTTP request so that you can better identify which methods you need to override.</p>
<p>When performing special actions that fall outside the standard CUD operations you had better inherit from <code>FormView</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/edit.py#L156">CODE</a>). The first thing to do is to check if and how you need to customize the <code>get</code> and <code>post</code> methods; remember that you either need to implement the full behaviour of those methods or make your changes and call the parent implementation. If this is not enough for your application consider overriding one of the more dedicated methods, such as <code>get_form_kwargs</code> or <code>form_valid</code>.</p>
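<p>The pattern "make your changes and call the parent implementation" looks like this in plain Python. Both classes are hypothetical stand-ins written for this sketch (they do not import Django), and the "email" is just a string appended to a list.</p>

```python
class BaseFormView:
    """Toy parent class standing in for a generic form view."""

    success_url = "/thanks/"

    def form_valid(self, form_data):
        # Base behaviour: just redirect to the success URL.
        return ("redirect", self.success_url)


class ContactView(BaseFormView):
    def __init__(self):
        self.outbox = []

    def form_valid(self, form_data):
        # Custom step first: record a message built from the submitted data...
        self.outbox.append(f"Message from {form_data['name']}")
        # ...then delegate to the parent implementation for the response.
        return super().form_valid(form_data)


view = ContactView()
print(view.form_valid({"name": "Leonardo"}))  # ('redirect', '/thanks/')
print(view.outbox)                            # ['Message from Leonardo']
```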
<p>This post ends the series "Digging Up Django Class-based Views". Stay tuned for other <a href="/categories/django/">articles on Django</a>!</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>Digging up Django class-based views - 22013-12-11T09:00:00+02:002020-03-16T07:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2013-12-11:/blog/2013/12/11/digging-up-django-class-based-views-2/<p>A detailed explanation of the nature of class-based views in Django - detail views, base views, and date-based views</p><p>In the first instalment of this short series, I introduced the theory behind Django class-based views and the reason why in this context classes are more powerful than pure functions. I also introduced one of the generic views Django provides out of the box, which is <code>ListView</code>.</p>
<p>In this second post I want to talk about the second most used generic view, <code>DetailView</code>, and about custom querysets and arguments. Last, I'm going to introduce unspecialised class-based views that allow you to build more complex Web pages. To fully understand <code>DetailView</code>, however, you need to grasp two essential concepts, namely <strong>querysets</strong> and <strong>view parameters</strong>. So I'm sorry for the learn-by-doing readers, but this time too I'm going to start with some pure programming topics.</p>
<h2 id="querysets-or-the-art-of-extracting-information">QuerySets or the art of extracting information<a class="headerlink" href="#querysets-or-the-art-of-extracting-information" title="Permanent link">¶</a></h2>
<p>One of the most important parts of Django is the ORM (Object Relational Mapper), which allows you to access the underlying database just like a collection of Python objects. As you know, Django provides tools to simplify the construction of DB queries; they are <strong>managers</strong> (the <code>.objects</code> attribute of any model, for example) and <strong>query methods</strong> (<code>get</code>, <code>filter</code>, and so on). Pay attention because things here are slightly more complicated than you might think at first glance.</p>
<p>When you use one of the methods of a manager you get as a result a <code>QuerySet</code>, which most of the time is used as a list, but is more than this. You can find the documentation about queries <a href="https://docs.djangoproject.com/en/3.0/topics/db/queries/">here</a> and the documentation about <code>QuerySet</code> <a href="https://docs.djangoproject.com/en/3.0/ref/models/querysets/">here</a>. Both are highly recommended reading.</p>
<p>What I want to stress here is that querysets are not evaluated until you perform an action that accesses their content, such as iterating over them or converting them to a list. This means that we can build querysets, pass them to functions, store them, and even build them programmatically or through metaprogramming without the DB being hit. If you think of querysets as recipes you are not far from the truth: they are objects that store how you want to retrieve the data of your interest. Actually retrieving them is another part of the game. This separation between the definition of something and its execution is called <strong>lazy evaluation</strong>.</p>
<p>Let me give you a very trivial example to show why the lazy evaluation of querysets is important.</p>
<div class="highlight"><pre><span></span><code><span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">get_oldest_three</span><span class="p">(</span><span class="n">queryset</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">queryset</span><span class="o">.</span><span class="n">order_by</span><span class="p">(</span><span class="s1">'id'</span><span class="p">)[</span><span class="mi">0</span><span class="p">:</span><span class="mi">3</span><span class="p">]</span>

<span class="n">old_books</span> <span class="o">=</span> <span class="n">get_oldest_three</span><span class="p">(</span><span class="n">Book</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">())</span>
<span class="n">old_hardcover_books</span> <span class="o">=</span> \
    <span class="n">get_oldest_three</span><span class="p">(</span><span class="n">Book</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="n">Book</span><span class="o">.</span><span class="n">HARDCOVER</span><span class="p">))</span>
</code></pre></div>
<p>As you can see, the <code>get_oldest_three</code> function is just refining an incoming <code>QuerySet</code> (which can be of any type); it simply orders the objects and gets the first three inserted in the DB. The important thing here is that we are using querysets like pure algorithms, or descriptions of a procedure. When creating the <code>old_books</code> variable we are just telling the <code>get_oldest_three</code> function "Hey, this is the way I extract the data I'm interested in. Could you please refine it and return the actual data?"</p>
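<p>To see lazy evaluation at work without a database, here is a toy class that records the requested operations and runs them only when iterated. <code>LazyQuery</code> and <code>fetch_books</code> are names I invented for this sketch; the class imitates the idea behind querysets, not Django's <code>QuerySet</code> API.</p>

```python
class LazyQuery:
    """Toy lazy container; it imitates the idea, not Django's QuerySet API."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def filter(self, predicate):
        # Return a new recipe with one more step; nothing is fetched yet.
        return LazyQuery(self._source, self._steps + (("filter", predicate),))

    def order_by(self, key):
        return LazyQuery(self._source, self._steps + (("order", key),))

    def __iter__(self):
        # Only now the data source is hit and the recorded steps are applied.
        rows = list(self._source())
        for kind, func in self._steps:
            if kind == "filter":
                rows = [row for row in rows if func(row)]
            else:
                rows = sorted(rows, key=func)
        return iter(rows)


hits = []


def fetch_books():
    """Pretend database access that records every time it is called."""
    hits.append("hit")
    return [
        {"id": 3, "hardcover": True},
        {"id": 1, "hardcover": False},
        {"id": 2, "hardcover": True},
    ]


query = LazyQuery(fetch_books).filter(lambda b: b["hardcover"]).order_by(lambda b: b["id"])
print(len(hits))  # 0, the "database" has not been touched yet

ids = [book["id"] for book in query]
print(ids, len(hits))  # [2, 3] 1
```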
<p>Being such flexible objects, querysets are an important part of generic views, so keep them warm for the upcoming banquet.</p>
<h2 id="being-flexible-parametric-views">Being flexible: parametric views<a class="headerlink" href="#being-flexible-parametric-views" title="Permanent link">¶</a></h2>
<p>URLs are the API of our Web site or service. This can be more or less evident for the user that browses through the pages, but from the programmer's point of view, URLs are the entry points of a Web-based service. As such, they are not very different from the API of a library: here, static pages are just like constants, or functions that always return that same value (such as a configuration parameter), while dynamic pages are like functions that process incoming data (parameters) and return a result.</p>
<p>So URLs can accept parameters, and our underlying view shall do the same. You basically have two methods to convey parameters from the browser to your server using HTTP. The first method is named <a href="https://en.wikipedia.org/wiki/Query_string">query string</a> and lists parameters directly in the URL through a universal syntax. The second method is storing parameters in the HTTP request body, which is what POST requests do. We will discuss this method in a later post about forms.</p>
<p>The first method has one big drawback: most of the time URLs are long (and sometimes <em>too</em> long), and difficult to use as a real API. To soften this effect the concept of <a href="https://en.wikipedia.org/wiki/Clean_URL">clean URL</a> arose, and this is the way Django follows natively (though, if you want, you can also stick to the query string method).</p>
<p>Now, the Django official documentation on <a href="https://docs.djangoproject.com/en/3.0/topics/http/urls/">URL dispatcher</a> tells you how you can collect parameters contained in the URL by parsing it with a regular expression; what we need to discover is how class-based views receive and process them.</p>
<p>In the previous post we already discussed the <code>as_view</code> method, which returns a function that instantiates the class and returns the result of <code>dispatch</code> (<a href="https://github.com/django/django/blob/stable/1.5.x/django/views/generic/base.py#L46">CODE</a>).</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">as_view</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="o">**</span><span class="n">initkwargs</span><span class="p">):</span>
<span class="w">    </span><span class="sd">"""</span>
<span class="sd">    Main entry point for a request-response process.</span>
<span class="sd">    """</span>
    <span class="c1"># sanitize keyword arguments</span>
    <span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">initkwargs</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">key</span> <span class="ow">in</span> <span class="bp">cls</span><span class="o">.</span><span class="n">http_method_names</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">TypeError</span><span class="p">(</span><span class="s2">"You tried to pass in the </span><span class="si">%s</span><span class="s2"> method name as a "</span>
                            <span class="s2">"keyword argument to </span><span class="si">%s</span><span class="s2">(). Don't do that."</span>
                            <span class="o">%</span> <span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="bp">cls</span><span class="o">.</span><span class="vm">__name__</span><span class="p">))</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">hasattr</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">key</span><span class="p">):</span>
            <span class="k">raise</span> <span class="ne">TypeError</span><span class="p">(</span><span class="s2">"</span><span class="si">%s</span><span class="s2">() received an invalid keyword </span><span class="si">%r</span><span class="s2">. as_view "</span>
                            <span class="s2">"only accepts arguments that are already "</span>
                            <span class="s2">"attributes of the class."</span> <span class="o">%</span> <span class="p">(</span><span class="bp">cls</span><span class="o">.</span><span class="vm">__name__</span><span class="p">,</span> <span class="n">key</span><span class="p">))</span>

    <span class="k">def</span> <span class="nf">view</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="bp">self</span> <span class="o">=</span> <span class="bp">cls</span><span class="p">(</span><span class="o">**</span><span class="n">initkwargs</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="s1">'get'</span><span class="p">)</span> <span class="ow">and</span> <span class="ow">not</span> <span class="nb">hasattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="s1">'head'</span><span class="p">):</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">head</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">request</span> <span class="o">=</span> <span class="n">request</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">args</span> <span class="o">=</span> <span class="n">args</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">kwargs</span> <span class="o">=</span> <span class="n">kwargs</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">dispatch</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>

    <span class="c1"># take name and docstring from class</span>
    <span class="n">update_wrapper</span><span class="p">(</span><span class="n">view</span><span class="p">,</span> <span class="bp">cls</span><span class="p">,</span> <span class="n">updated</span><span class="o">=</span><span class="p">())</span>

    <span class="c1"># and possible attributes set by decorators</span>
    <span class="c1"># like csrf_exempt from dispatch</span>
    <span class="n">update_wrapper</span><span class="p">(</span><span class="n">view</span><span class="p">,</span> <span class="bp">cls</span><span class="o">.</span><span class="n">dispatch</span><span class="p">,</span> <span class="n">assigned</span><span class="o">=</span><span class="p">())</span>
    <span class="k">return</span> <span class="n">view</span>
</code></pre></div>
<p>Now look at what the <code>view</code> wrapper function actually does with the instantiated class (<a href="https://github.com/django/django/blob/stable/1.5.x/django/views/generic/base.py#L46">CODE</a>); not surprisingly it takes the <code>request</code>, <code>args</code> and <code>kwargs</code> passed by the URLconf and stores them as instance attributes with the same names. Remember that the URLconf is given the <code>view</code> function itself, not the result of calling it, which is the result of <code>dispatch</code>.</p>
<p>This means that <em>anywhere in our CBVs</em> we can access the original call parameters simply by reading <code>self.request</code>, <code>self.args</code> and <code>self.kwargs</code>, where <code>args</code> and <code>kwargs</code> hold the unnamed and named values extracted by the URLconf regular expression.</p>
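<p>The mechanism can be reproduced without Django in a few lines. The following is only a sketch: <code>MiniView</code>, its <code>greeting</code> attribute and the string used as a request are hypothetical, and a plain <code>classmethod</code> stands in for the decorator that Django uses.</p>

```python
class MiniView:
    """Django-free sketch of the closure built by as_view (hypothetical names)."""

    greeting = "Hello"

    def __init__(self, **initkwargs):
        # Store the keyword arguments given to as_view on the instance
        for key, value in initkwargs.items():
            setattr(self, key, value)

    @classmethod
    def as_view(cls, **initkwargs):
        def view(request, *args, **kwargs):
            # A fresh instance is created for every incoming request
            self = cls(**initkwargs)
            self.request = request
            self.args = args
            self.kwargs = kwargs
            return self.dispatch(request, *args, **kwargs)

        return view

    def dispatch(self, request, *args, **kwargs):
        # Any method can read the call parameters from the instance
        return f"{self.greeting} {self.kwargs['name']} ({self.request})"


# The URLconf would store this function, not the result of calling it
view = MiniView.as_view(greeting="Ciao")
response = view("GET /people/leo/", name="leo")
print(response)
```

<p>Since the instance is created inside <code>view</code>, every request gets its own object, so the attributes set for one request are not shared with the next one.</p>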
<h2 id="getting-details">Getting details<a class="headerlink" href="#getting-details" title="Permanent link">¶</a></h2>
<p>After listing things, one of the most useful things a Web site does is to give details about objects. Obviously any e-commerce site is made up for the most part of pages that list products and show product details, but a blog is also made of one or more pages with a list of posts and a page for each of them. So building a detailed view of the content of our database is worth learning.</p>
<p>To help us in this task Django provides <code>DetailView</code>, which indeed deals, as the name suggests, with the details of what we get from the DB. While <code>ListView</code>'s basic behaviour is to extract the list of all objects with a given model, <code>DetailView</code> extracts a single object. How does it know what object shall be extracted?</p>
<p>When <code>dispatch</code> is called on an incoming HTTP request the only thing it does is to look at the <code>method</code> attribute, which for <code>HttpRequest</code> objects contains the name of the HTTP verb used (e.g. <code>'GET'</code>); then <code>dispatch</code> looks for a method of the class with the lowercase name of the verb (e.g. <code>'GET'</code> becomes <code>get</code>) (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/base.py#L94">CODE</a>). This handler is then called with the same parameters of <code>dispatch</code>, namely the <code>request</code> itself, <code>*args</code> and <code>**kwargs</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/base.py#L97">CODE</a>).</p>
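<p>The lookup described above can be sketched without Django as follows. <code>FakeRequest</code> and <code>MiniDispatcher</code> are hypothetical stand-ins written for this example; only the idea of mapping the HTTP verb to a method with the same lowercase name comes from the framework.</p>

```python
class FakeRequest:
    """Hypothetical stand-in for HttpRequest: only `method` matters here."""

    def __init__(self, method):
        self.method = method


class MiniDispatcher:
    http_method_names = ["get", "post"]

    def dispatch(self, request, *args, **kwargs):
        # Look up a handler named after the lowercase HTTP verb
        name = request.method.lower()
        if name in self.http_method_names:
            handler = getattr(self, name, self.http_method_not_allowed)
        else:
            handler = self.http_method_not_allowed
        # The handler receives the same parameters as dispatch
        return handler(request, *args, **kwargs)

    def get(self, request, *args, **kwargs):
        return f"200 GET {kwargs}"

    def http_method_not_allowed(self, request, *args, **kwargs):
        return f"405 {request.method}"


view = MiniDispatcher()
ok = view.dispatch(FakeRequest("GET"), pk="123")
missing = view.dispatch(FakeRequest("POST"))  # allowed verb, no handler defined
unknown = view.dispatch(FakeRequest("BREW"))  # verb not in http_method_names
```

<p>A class that wants to answer a new verb only has to define a method with that name; the dispatching code does not change.</p>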
<p><code>DetailView</code> has no body and inherits everything from two classes, just as happened for <code>ListView</code>; the first parent class is the template mixin, while the second one, <code>BaseDetailView</code>, implements the <code>get</code> method (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/detail.py#L105">CODE</a>).</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">object</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_object</span><span class="p">()</span>
    <span class="n">context</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_context_data</span><span class="p">(</span><span class="nb">object</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">object</span><span class="p">)</span>
    <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">render_to_response</span><span class="p">(</span><span class="n">context</span><span class="p">)</span>
</code></pre></div>
<p>As you can see, this method extracts the single object that it shall represent calling <code>get_object</code>, then calls <code>get_context_data</code> (that we already met in the previous post) and last the familiar <code>render_to_response</code>. The method <code>get_object</code> is provided by <code>BaseDetailView</code>'s ancestor <code>SingleObjectMixin</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/detail.py#L8">CODE</a>): the most important parts of its code, for the sake of our present topic are</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">get_object</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">queryset</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
    <span class="p">[</span><span class="o">...</span><span class="p">]</span>
    <span class="k">if</span> <span class="n">queryset</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
        <span class="n">queryset</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_queryset</span><span class="p">()</span>
    <span class="n">pk</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">kwargs</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">pk_url_kwarg</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
    <span class="p">[</span><span class="o">...</span><span class="p">]</span>
    <span class="k">if</span> <span class="n">pk</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
        <span class="n">queryset</span> <span class="o">=</span> <span class="n">queryset</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">pk</span><span class="o">=</span><span class="n">pk</span><span class="p">)</span>
    <span class="p">[</span><span class="o">...</span><span class="p">]</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">obj</span> <span class="o">=</span> <span class="n">queryset</span><span class="o">.</span><span class="n">get</span><span class="p">()</span>
    <span class="p">[</span><span class="o">...</span><span class="p">]</span>
    <span class="k">return</span> <span class="n">obj</span>
</code></pre></div>
<p><strong>Warning</strong>: I removed many lines from the previous function to improve readability; please check the original source code for the complete implementation.</p>
<p>The code shows where <code>DetailView</code> gets the queryset from; the <code>get_queryset</code> method is provided by <code>SingleObjectMixin</code> itself and basically returns <code>queryset</code> if present, otherwise returns all objects of the given model (acting just like <code>ListView</code> does). This <code>queryset</code> is then refined by a <code>filter</code> and last by a <code>get</code>. Here <code>get</code> is not used directly (I think) to manage the different error cases and raise the correct exceptions.</p>
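<p>The two-step refinement (first <code>filter</code>, then <code>get</code>) can be illustrated with a small in-memory stand-in. <code>MiniQuerySet</code> is a hypothetical class written for this sketch and is not part of Django; real querysets are lazy and talk to the database, and Django raises its own exceptions where this sketch uses <code>LookupError</code>.</p>

```python
class MiniQuerySet:
    """Hypothetical in-memory stand-in for a queryset, for illustration only."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, **conditions):
        # Return a new, narrower queryset without touching the original one
        matching = [
            row for row in self.rows
            if all(row.get(field) == value for field, value in conditions.items())
        ]
        return MiniQuerySet(matching)

    def get(self):
        # Exactly one row must be left, otherwise the lookup fails
        if len(self.rows) != 1:
            raise LookupError(f"expected 1 object, found {len(self.rows)}")
        return self.rows[0]


def get_object(queryset, pk=None):
    # Same two steps as in the text: refine with filter, then extract with get
    if pk is not None:
        queryset = queryset.filter(pk=pk)
    return queryset.get()


books = MiniQuerySet([
    {"pk": 1, "title": "Dune"},
    {"pk": 2, "title": "Emma"},
])
book = get_object(books, pk=2)
```

<p>Keeping the two steps separate leaves a single place, the final <code>get</code>, where a missing object can be detected and reported.</p>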
<p>The parameter <code>pk</code> used in <code>filter</code> comes directly from <code>kwargs</code>, so it is taken directly from the URL. Since this is a core concept of views in general I want to look at this part with some extra care.</p>
<p>The <code>DetailView</code> class is called by a URLconf that provides a regular expression to parse the URL, for example <code>url(r'^(?P&lt;pk&gt;\d+)/$',</code>. This regex extracts a parameter and gives it the name <code>pk</code>, so <code>kwargs</code> of the view will contain <code>pk</code> as key and the digits found in the URL as value. For example the URL <code>123/</code> will result in <code>{'pk': '123'}</code> (values captured by a regular expression are always strings). The default behaviour of <code>DetailView</code> is to look for a <code>pk</code> key and use it to perform the filtering of the queryset, since <code>pk_url_kwarg</code> is <code>'pk'</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/detail.py#L17">CODE</a>).</p>
<p>So if we want to change the name of the parameter we can simply define the <code>pk_url_kwarg</code> of our class and provide a regex that extracts the primary key with the new name. For example <code>url(r'^(?P&lt;key&gt;\d+)/$',</code> extracts it with the name <code>key</code>, so we should define <code>pk_url_kwarg = 'key'</code> in our class.</p>
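<p>The link between the name of the regex group and <code>pk_url_kwarg</code> can be checked with the standard <code>re</code> module alone. In this sketch <code>MiniDetail</code>, <code>KeyDetail</code> and <code>get_pk</code> are hypothetical names; only the <code>pk_url_kwarg</code> lookup mirrors what the view does.</p>

```python
import re


class MiniDetail:
    """Django-free sketch; only the pk_url_kwarg lookup mirrors the real view."""

    pk_url_kwarg = "pk"

    def __init__(self, **kwargs):
        # In Django these values come from the named groups of the URL pattern
        self.kwargs = kwargs

    def get_pk(self):
        return self.kwargs.get(self.pk_url_kwarg)


class KeyDetail(MiniDetail):
    # The pattern below names its group "key", so the view must match it
    pk_url_kwarg = "key"


match = re.match(r"^(?P<key>\d+)/$", "123/")
url_kwargs = match.groupdict()

default_pk = MiniDetail(**url_kwargs).get_pk()  # looks for "pk": not found
custom_pk = KeyDetail(**url_kwargs).get_pk()    # looks for "key": found
```

<p>Note that the extracted value is the string <code>'123'</code>; the conversion to the type of the primary key happens later, when the queryset is filtered.</p>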
<p>From this quick exploration we learned that a class inheriting from <code>DetailView</code>:</p>
<ul>
<li>provides a context with the <code>object</code> key initialized to a single object</li>
<li><strong>must</strong> be configured with a <code>model</code> class attribute, to know what objects to extract</li>
<li><strong>can</strong> be configured with a <code>queryset</code> class attribute, to refine the set of objects where the single object is extracted from</li>
<li><strong>must</strong> be called from a URL that includes a regexp that extracts the primary key of the searched object as <code>pk</code></li>
<li><strong>can</strong> be configured to use a different name for the primary key through the <code>pk_url_kwarg</code> class attribute</li>
</ul>
<p>The basic use of <code>DetailView</code> is exemplified by the following code.</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.conf.urls</span> <span class="kn">import</span> <span class="n">url</span>
<span class="kn">from</span> <span class="nn">django.views.generic</span> <span class="kn">import</span> <span class="n">DetailView</span>

<span class="k">class</span> <span class="nc">BookDetail</span><span class="p">(</span><span class="n">DetailView</span><span class="p">):</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">Book</span>

<span class="n">urlpatterns</span> <span class="o">=</span> <span class="p">[</span>
    <span class="n">url</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^(?P&lt;pk&gt;\d+)/$'</span><span class="p">,</span>
        <span class="n">BookDetail</span><span class="o">.</span><span class="n">as_view</span><span class="p">(),</span>
        <span class="n">name</span><span class="o">=</span><span class="s1">'detail'</span><span class="p">),</span>
<span class="p">]</span>
</code></pre></div>
<p>The view extracts a single object with the <code>Book</code> model; the regex is configured with the standard <code>pk</code> name.</p>
<p>As shown for <code>ListView</code> in the previous post, any CBV uses <code>get_context_data</code> to return the context dictionary to the rendering engine. So views that inherit from <code>DetailView</code> can add data to the context following the same pattern. Suppose we have a function <code>get_similar_books</code> that, given a book, returns similar ones according to some criteria.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">BookDetail</span><span class="p">(</span><span class="n">DetailView</span><span class="p">):</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">Book</span>

    <span class="k">def</span> <span class="nf">get_context_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="n">context</span> <span class="o">=</span> <span class="nb">super</span><span class="p">(</span><span class="n">BookDetail</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">get_context_data</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="n">context</span><span class="p">[</span><span class="s1">'similar'</span><span class="p">]</span> <span class="o">=</span> <span class="n">get_similar_books</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">object</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">context</span>

<span class="n">urlpatterns</span> <span class="o">=</span> <span class="p">[</span>
    <span class="n">url</span><span class="p">(</span><span class="sa">r</span><span class="s1">'^(?P&lt;pk&gt;\d+)/$'</span><span class="p">,</span>
        <span class="n">BookDetail</span><span class="o">.</span><span class="n">as_view</span><span class="p">(),</span>
        <span class="n">name</span><span class="o">=</span><span class="s1">'detail'</span><span class="p">),</span>
<span class="p">]</span>
</code></pre></div>
<p>As explained before, you can access the object being shown through <code>object</code>, which in the above example is passed to a service function we implemented somewhere in our code.</p>
<h2 id="using-the-base-views">Using the base views<a class="headerlink" href="#using-the-base-views" title="Permanent link">¶</a></h2>
<p>Sometimes, when dealing with complex pages, the generic display CBVs that Django provides are not the right choice. This usually becomes evident when you start overriding methods to prevent the view from performing its standard behaviour. For instance, say that you want to show detailed information about more than one object: <code>DetailView</code> will probably soon show its limits, having been built to show only one object.</p>
<p>In all those cases that cannot be easily solved by one of the generic display CBVs, you have to build your own starting from one of the base views: <code>RedirectView</code>, <code>TemplateView</code>, or <code>View</code> (<a href="https://docs.djangoproject.com/en/3.0/ref/class-based-views/base/">DOCS</a>, <a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/base.py">CODE</a>).</p>
<p>I'm not going to fully describe those views; I want however to briefly point out some peculiarities.</p>
<p><code>View</code> is by now an old friend of ours; we met it when we discussed the <code>as_view</code> and <code>dispatch</code> methods. It is the most generic view class and can be leveraged to perform very specialized tasks such as rendering pages without templates (for example when returning JSON data).</p>
<p><code>TemplateView</code> is the best choice to render pages from a template, maintaining a great level of freedom when it comes to the content of the context dictionary. Chances are that this is going to be the view you will use the most after <code>ListView</code> and <code>DetailView</code>. Basically you just need to inherit from it and define the <code>get_context_data</code> method. As you can see from the <a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/base.py#L157">CODE</a> <code>TemplateView</code> answers to GET requests only.</p>
<p><code>RedirectView</code>, as the name implies, is used to redirect a request. The redirection mechanism is very simple: its <code>get</code> method returns a <code>HttpResponseRedirect</code> to the URL defined by the <code>url</code> class attribute. The class exhibits a very interesting behaviour (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/base.py#L201">CODE</a>) when called with HTTP methods other than GET (namely HEAD, POST, OPTIONS, DELETE, PUT, or PATCH): it "converts" the method to GET simply calling <code>get</code> from the respective method (<code>head</code>, <code>post</code>, and so on). In the next post I'll show how to leverage this simple technique to show the user a pre-filled form.</p>
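<p>The forwarding technique is easy to reproduce in plain Python. The following sketch is not Django's code: <code>MiniRedirect</code> is a hypothetical class, and a tuple with the status code and the target URL stands in for the real response object.</p>

```python
class MiniRedirect:
    """Hypothetical sketch of the forwarding technique used by RedirectView."""

    url = "/new-location/"

    def get(self, request, *args, **kwargs):
        # Stand-in for a redirect response: status code and target URL
        return (302, self.url)

    def post(self, request, *args, **kwargs):
        # Other verbs simply forward to get with the same parameters
        return self.get(request, *args, **kwargs)

    def head(self, request, *args, **kwargs):
        return self.get(request, *args, **kwargs)


redirect = MiniRedirect()
get_response = redirect.get("request")
post_response = redirect.post("request")
```

<p>Because each method only calls <code>get</code>, a subclass that changes how the redirect is built needs to override a single method.</p>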
<h2 id="date-based-views">Date-based views<a class="headerlink" href="#date-based-views" title="Permanent link">¶</a></h2>
<p>Django provides other class-based views that simplify dealing with objects extracted or ordered by date. As a programmer, you know that sometimes dealing with dates is awkward, to say the least; views such as <code>YearArchiveView</code> or <code>DayArchiveView</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/dates.py">CODE</a>) aim to help you to tame your date-based objects; any object that contains a date (e.g. post date for articles, birth date for people, log date for messages, etc.) can be processed by these views. You can find the official documentation <a href="https://docs.djangoproject.com/en/3.0/ref/class-based-views/generic-date-based/">here</a>.</p>
<p>Remember that date-based views are CBVs, so they are based on <code>View</code>, just like <code>ListView</code> or <code>TemplateView</code>. So, apart from their specialization on date processing, they behave the same (using <code>get_context_data</code>, <code>get</code>, <code>dispatch</code>, and so on).</p>
<h2 id="final-words">Final words<a class="headerlink" href="#final-words" title="Permanent link">¶</a></h2>
<p>In this post we covered <code>DetailView</code> in depth and, more superficially, all the remaining base and date-based views. I showed you how <code>DetailView</code> uses the given model and the parameters extracted from the URL to find the requested object, and how you can change its default behaviour. In the next post we will step into the rich (and strange) world of forms.</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>
<h1>Digging up Django class-based views - 1</h1>
<p>Leonardo Giordani. Published 2013-10-28T08:43:00+02:00, updated 2020-03-16T07:00:00+00:00.</p>
<p>A detailed explanation of the nature of class-based views in Django - list views and GET requests</p>
<p>Django 3 was released at the end of 2019, so I think it is high time I revisited my successful series of posts about class-based views in Django. Those posts date back to 2013 and were written with Django 1.5 in mind, and with examples from that code base. Now Django is already two major versions ahead, but class-based views are still a big part of the framework, so I believe it makes sense to refresh the content of those posts. Moreover, I didn't have a chance to study Django 3 yet, so as per tradition of this blog, I will make my personal investigation available to everyone.</p>
<p>If you are a novice Python programmer, and just approached Django to start your career in web development, chances are that you were puzzled by many things, and class-based views (CBVs) are definitely among those. CBVs are apparently very easy to use, in the simple cases, but it might not be clear how to extend them to match more complicated use cases, as the development of a project proceeds. The official documentation is very good, but to master CBVs you need to understand object-oriented concepts like classes (well, obviously), delegation, and method overriding.</p>
<p>If you need to brush up on these concepts you might find it useful to read the following posts here on the blog:</p>
<ul>
<li><a href="https://www.thedigitalcatonline.com/blog/2014/08/20/python-3-oop-part-1-objects-and-types/">Object-Oriented Programming in Python 3</a></li>
<li><a href="https://www.thedigitalcatonline.com/blog/2014/05/19/method-overriding-in-python/">Method overriding in Python</a></li>
</ul>
<h2 id="what-are-cbvs">What are CBVs?<a class="headerlink" href="#what-are-cbvs" title="Permanent link">¶</a></h2>
<p>Class-based views are, as the name suggests, Django views based on Python classes. This means that, to master them, you need to understand both Django views and Python classes, so let's give a quick definition of them.</p>
<p>A Django view is a piece of code that processes an incoming HTTP request and returns an HTTP response, nothing more, nothing less. A Python class is the implementation of the Object-Oriented concept of class in the Python language.</p>
<p>So, a view needs to be a <a href="https://docs.python.org/3/library/functions.html#callable">callable</a>, and this includes functions and classes. Thus, to understand the advantages of class-based views over function-based views we shall discuss the merits of classes over functions. That last sentence could be the title of a 10-volume book on programming (followed by another 10-volume book titled "Merits of functions over classes"), so I am just going to scratch the surface of the matter here. If you want to dig more into the subject, please read the series on Python 3 OOP that I linked above, where you will find all the gory details that you are craving.</p>
<h2 id="starting-off-with-python-classes">Starting off with Python classes<a class="headerlink" href="#starting-off-with-python-classes" title="Permanent link">¶</a></h2>
<p>The main point of classes is to implement encapsulation: they represent a way of coupling data and functions. Doing this, a class loses the dynamic essence of a procedure, which exists only while it is running, and becomes a living entity, something that sits there, caring for its data, and reacts when we call its functions (methods).</p>
<p>A good analogy for a class is a finite-state machine: once the class has been initialized, methods are what we use to make the machine move between states. If we do not call methods, the class simply waits there without complaining.</p>
<p>As an example, let's look at a very simple procedure that extracts the even numbers from an iterable like a list:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">extract_even_numbers</span><span class="p">(</span><span class="n">alist</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">alist</span> <span class="k">if</span> <span class="n">i</span><span class="o">%</span><span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">]</span>
</code></pre></div>
<p>The example is very trivial, but, as code naturally tends to become more complicated, it's better to start with simple examples. A class version of this function could be written as</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">EvenExtractor</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">alist</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">l</span> <span class="o">=</span> <span class="n">alist</span>

    <span class="k">def</span> <span class="nf">extract</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">l</span> <span class="k">if</span> <span class="n">i</span><span class="o">%</span><span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">]</span>
</code></pre></div>
<p>The two are very similar, and it might look like we haven't changed anything. Indeed, the difference is subtle but remarkable. Now the <code>EvenExtractor</code> class has two parts, the first being the initialization and the second being the actual extraction, and we can have the class in one of three states: before initialization (<code>EvenExtractor</code>), after initialization (<code>e = EvenExtractor([1,4,5,7,12])</code>), and after extraction (<code>l = e.extract()</code>).</p>
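<p>The three states can be walked through in a short session. The class is the one shown above, repeated here so that the snippet runs on its own; the name <code>extractor_class</code> is introduced only for this example.</p>

```python
class EvenExtractor:
    def __init__(self, alist):
        self.l = alist

    def extract(self):
        return [i for i in self.l if i % 2 == 0]


# State 1: the class exists but has not been initialized
extractor_class = EvenExtractor

# State 2: the instance holds its data, no work has been done yet
e = EvenExtractor([1, 4, 5, 7, 12])

# State 3: the extraction has run
l = e.extract()
print(l)
```

<p>Between the second and the third state the object just keeps its data, so the extraction can happen at any later time, or more than once.</p>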
<p>By converting the procedure to a class, then, we obtained a rich tool that can execute its job step by step and, in general, can work in a non-linear way, as we might add further methods, and thus more states.</p>
<h2 id="delegation-is-the-key">Delegation is the key<a class="headerlink" href="#delegation-is-the-key" title="Permanent link">¶</a></h2>
<p>The real power of classes used as finite-state machines lies in the concept of delegation. This is a mechanism through which a class can delegate some work to another class, avoiding code duplication, and thus favouring code reuse and generalisation.</p>
<p>(You might notice that I don't mention inheritance, but delegation, which is implemented by both composition and inheritance. I am a strong supporter of an OO design principle that states "Favour composition over inheritance". I keep reading too many introductions to object-oriented that stress too much the inheritance mechanism and leave composition aside, raising a generation of OOP programmers that, instead of building systems populated by many small collaborating objects, create nightmares infested by giant all-purpose things that sometimes resemble more an operating system than a system component.)</p>
<p>Let's continue the above example, improving the <code>__init__</code> method of the <code>EvenExtractor</code> class:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">EvenExtractor</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">alist</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">l</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">elem</span><span class="p">)</span> <span class="k">for</span> <span class="n">elem</span> <span class="ow">in</span> <span class="n">alist</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">extract</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">l</span> <span class="k">if</span> <span class="n">i</span><span class="o">%</span><span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="p">]</span>
</code></pre></div>
<p>Now the class performs an important action in its initialization phase, converting all elements of the input to integers. Some days after this change, however, we might realise that we could also profitably use a class that extracts odd elements from a list. Being responsible object oriented programmers we write</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">OddExtractor</span><span class="p">(</span><span class="n">EvenExtractor</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">extract</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">l</span> <span class="k">if</span> <span class="n">i</span><span class="o">%</span><span class="mi">2</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">]</span>
</code></pre></div>
<p>and call it a day. Through the inheritance mechanism expressed by that <code>(EvenExtractor)</code> signature of the new class, we first defined something that is exactly the same thing as <code>EvenExtractor</code>, with the same methods and attributes, but with a different name. Then we changed the behaviour of the new class but only for the extraction part by overriding the method.</p>
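<p>To see the delegation at work, here is a minimal, self-contained run of the two classes. The input list <code>values</code> is just an illustrative assumption; note that <code>OddExtractor</code> never defines <code>__init__</code>, yet the strings are still converted to integers because the call is delegated to <code>EvenExtractor</code>.</p>

```python
class EvenExtractor:
    def __init__(self, alist):
        # Convert every element to int once, at initialization time
        self.l = [int(elem) for elem in alist]

    def extract(self):
        return [i for i in self.l if i % 2 == 0]


class OddExtractor(EvenExtractor):
    # No __init__ here: initialization is delegated to EvenExtractor
    def extract(self):
        return [i for i in self.l if i % 2 != 0]


# Hypothetical input: strings that can be converted to integers
values = ["1", "2", "3", "4", "5"]

evens = EvenExtractor(values).extract()
odds = OddExtractor(values).extract()

print(evens)  # [2, 4]
print(odds)   # [1, 3, 5]
```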
<p>To summarise the lesson: using classes and delegation you can build finite-state machines that are easily customizable to suit your exact needs. This obviously is just one of the many points of view from which you can consider classes, but it is the one we need to understand Django CBVs.</p>
<h2 id="back-to-django">Back to Django<a class="headerlink" href="#back-to-django" title="Permanent link">¶</a></h2>
<p>Let's start discussing a practical use of what we learned so far, reviewing how Django uses Python classes and delegation to provide views.</p>
<p>A Django view is a perfect example of a finite-state machine. It takes an incoming request and makes it flow through different processing steps until a final response is produced, which is then sent back to the user. CBVs are a way for the programmer to write their views leveraging the object-oriented paradigm. In this context Class-based Generic Views are the "batteries included" of Django views, the building blocks that the framework provides out of the box.</p>
<p>Let's dig into one of the examples of the official Django docs; <a href="https://docs.djangoproject.com/en/3.0/ref/class-based-views/generic-display/#django.views.generic.list.ListView">here</a> you find the API of the beloved <code>ListView</code>, a generic view to deal with a list of things (extracted from the database). I slightly simplified the example provided by the documentation to avoid having too much on our plate.</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.views.generic.list</span> <span class="kn">import</span> <span class="n">ListView</span>
<span class="kn">from</span> <span class="nn">articles.models</span> <span class="kn">import</span> <span class="n">Article</span>
<span class="k">class</span> <span class="nc">ArticleListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">Article</span>
</code></pre></div>
<p>This example assumes that <code>articles</code> is your application and <code>Article</code> is one of its models.</p>
<p>You can see here the full power of inheritance. We just derived <code>ArticleListView</code> from <code>ListView</code>, and changed the <code>model</code> class attribute. How can this work? How can this class process incoming requests and what are the outputs? The official documentation states "While this view is executing, <code>object_list</code> will contain the list of objects (usually, but not necessarily a queryset) that the view is operating upon."; this leaves many dark corners, however, and if you are a novice, chances are that you are already lost.</p>
<p>Since <code>ArticleListView</code> derives from <code>ListView</code>, the latter is the class we have to analyse to understand how incoming data is processed. To do this you need to look at the <a href="https://docs.djangoproject.com/en/3.0/ref/class-based-views/generic-display/#listview">documentation</a>, and if something is still unclear you can freely look at the <a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/list.py">source code</a>. In the following paragraphs I will summarise what happens when Django calls the sample <code>ArticleListView</code> class shown above, and you will find links called "DOCS" for the official documentation, and "CODE" for the relevant source code, if you want to read it by yourself.</p>
<h2 id="url-dispatchers-and-views">URL dispatchers and views<a class="headerlink" href="#url-dispatchers-and-views" title="Permanent link">¶</a></h2>
<p>A CBV cannot directly be used in your URL dispatcher; instead you have to give the result of the <code>as_view</code> method (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/base.py#L49">CODE</a>), which defines a function that instantiates the class (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/base.py#L61">CODE</a>) and calls the <code>dispatch</code> method (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/base.py#L71">CODE</a>); then the function is returned (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/base.py#L81">CODE</a>) to be used in the URL dispatcher. As users, we are interested only in the fact that the <em>entry point</em> of the class (the method called when a request hits the URL linked with it) is <code>dispatch</code>.</p>
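<p>The mechanism can be sketched in plain Python. This is a simplified stand-in, not Django's actual code: the class name <code>SketchView</code>, the string used as a request, and the returned string are all assumptions made for illustration. What it shows is only the shape of the trick: <code>as_view</code> returns a plain function, and that function creates an instance and calls <code>dispatch</code> on it.</p>

```python
class SketchView:
    """Simplified stand-in for a class-based view (not Django's code)."""

    def __init__(self, **kwargs):
        # Store any configuration passed to as_view() on the instance
        for key, value in kwargs.items():
            setattr(self, key, value)

    @classmethod
    def as_view(cls, **initkwargs):
        def view(request, *args, **kwargs):
            # A fresh instance is created for every incoming request
            self = cls(**initkwargs)
            # dispatch is the entry point of the instance
            return self.dispatch(request, *args, **kwargs)

        # The plain function is what the URL dispatcher receives
        return view

    def dispatch(self, request, *args, **kwargs):
        return f"handled {request}"


view_function = SketchView.as_view()
result = view_function("fake-request")
print(result)  # handled fake-request
```

<p>The design choice worth noticing is that a new instance is created for each call, so attributes stored on <code>self</code> while processing one request do not leak into the next one.</p>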
<p>Let's use this knowledge to print out a string on the console each time a request is served by our CBV. I will run through this simple task step by step, since it shows exactly how you have to deal with CBVs when solving real problems.</p>
<p>If we define the <code>ArticleListView</code> class this way</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.views.generic.list</span> <span class="kn">import</span> <span class="n">ListView</span>
<span class="kn">from</span> <span class="nn">articles.models</span> <span class="kn">import</span> <span class="n">Article</span>
<span class="k">class</span> <span class="nc">ArticleListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">Article</span>

    <span class="k">def</span> <span class="nf">dispatch</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">dispatch</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
</code></pre></div>
<p>the class does not change its behaviour. What we did was to override the <code>dispatch</code> method with a call to the parent's method, i.e. we explicitly wrote what Python does by default. You can find detailed information about <code>super</code> in the <a href="https://docs.python.org/3/library/functions.html#super">official documentation</a> and in <a href="https://www.thedigitalcatonline.com/blog/2014/08/20/python-3-oop-part-3-delegation-composition-and-inheritance/">this post</a> on the blog. Please be sure you understand the star and double star notation to define a variable number of arguments; the official documentation is <a href="https://docs.python.org/3.8/tutorial/controlflow.html#positional-or-keyword-arguments">here</a>.</p>
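<p>If the star notation is new to you, this tiny example might help. The two functions and the values passed to them are hypothetical; the point is that <code>*args</code> collects the extra positional arguments in a tuple, <code>**kwargs</code> collects the keyword arguments in a dictionary, and the same notation in a call unpacks them again, so everything is forwarded untouched.</p>

```python
def original(request, *args, **kwargs):
    # args is a tuple of extra positional arguments,
    # kwargs is a dictionary of keyword arguments
    return request, args, kwargs


def wrapper(request, *args, **kwargs):
    # Forward everything to the original function without changes
    return original(request, *args, **kwargs)


forwarded = wrapper("req", 1, 2, pk=7)
print(forwarded)  # ('req', (1, 2), {'pk': 7})
```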
<p>Since views are automatically called by the framework, the latter expects them to comply with a very specific API, so when overriding a method you have to provide the same signature as the original one. The signature of <code>dispatch</code> can be found <a href="https://docs.djangoproject.com/en/3.0/ref/class-based-views/base/#django.views.generic.base.View.dispatch">here</a>.</p>
<p>The <code>dispatch</code> method receives a <code>request</code> argument, whose type is <code>HttpRequest</code> (<a href="https://docs.djangoproject.com/en/3.0/ref/request-response/#httprequest-objects">documentation</a>), so we can print it on the console with the standard <code>print</code> function</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.views.generic.list</span> <span class="kn">import</span> <span class="n">ListView</span>
<span class="kn">from</span> <span class="nn">articles.models</span> <span class="kn">import</span> <span class="n">Article</span>
<span class="k">class</span> <span class="nc">ArticleListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">Article</span>

    <span class="k">def</span> <span class="nf">dispatch</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
        <span class="k">return</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">dispatch</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
</code></pre></div>
<p>This prints the content of the <code>request</code> object on the standard output of the server that is running the Django project. If you are running the Django development server, you will find the output on the text console where you issued the command <code>python manage.py runserver</code>.</p>
<p>This, in a nutshell, is the standard way of dealing with Django CBGVs: inherit from a predefined class, identify which methods you need to change, override them complying with their signature and calling the parent's code somewhere in the new code.</p>
<p>The methods that <code>ListView</code> uses when processing incoming requests are listed on its <a href="https://docs.djangoproject.com/en/3.0/ref/class-based-views/generic-display/#listview">official documentation page</a> in the "Method Flowchart" section; in the "Ancestors (MRO)" section you can see that <code>ListView</code> inherits from a good number of other classes. MRO stands for Method Resolution Order and has to do with multiple inheritance: if you are eager to deal with one of the most intricate Python topics feel free to read <a href="https://docs.python.org/3.8/tutorial/classes.html#multiple-inheritance">this</a>.</p>
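<p>You can inspect the MRO of any class through its <code>__mro__</code> attribute. The sketch below uses empty stand-in classes that borrow the names of some <code>ListView</code> ancestors; the inheritance layout is my simplified assumption (some ancestors are left out), so check it against the documentation rather than taking it as Django's actual code.</p>

```python
# Empty stand-ins: only the inheritance relationships matter here
class View:
    pass


class TemplateResponseMixin:
    pass


class MultipleObjectMixin:
    pass


class BaseListView(MultipleObjectMixin, View):
    pass


class MultipleObjectTemplateResponseMixin(TemplateResponseMixin):
    pass


class ListView(MultipleObjectTemplateResponseMixin, BaseListView):
    pass


# __mro__ is the ordered tuple of classes Python searches for attributes
mro_names = [cls.__name__ for cls in ListView.__mro__]
print(mro_names)
# ['ListView', 'MultipleObjectTemplateResponseMixin', 'TemplateResponseMixin',
#  'BaseListView', 'MultipleObjectMixin', 'View', 'object']
```

<p>When a method is looked up on an instance, Python walks this list from left to right and uses the first class that defines it.</p>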
<h2 id="incoming-get-requests">Incoming GET requests<a class="headerlink" href="#incoming-get-requests" title="Permanent link">¶</a></h2>
<p>Back to our <code>ArticleListView</code>. The <code>dispatch</code> method of the parent reads the <code>method</code> attribute of the <code>request</code> object and selects a handler to process the request itself (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/base.py#L93">CODE</a>): this means that if <code>request.method</code> is <code>'GET'</code>, which is the HTTP way to say that we are <em>reading</em> a resource, <code>dispatch</code> will call the <code>get</code> method of the class.</p>
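<p>The selection of the handler can be sketched in a few lines of plain Python. Again, this is a simplified stand-in and not Django's code: the class names, the fake request built with <code>SimpleNamespace</code>, and the returned strings are assumptions. The idea is that the lowercased HTTP method name is used to look up a method with the same name on the instance.</p>

```python
from types import SimpleNamespace


class SketchView:
    """Simplified stand-in for the dispatching logic (not Django's code)."""

    http_method_names = ["get", "post"]

    def dispatch(self, request, *args, **kwargs):
        name = request.method.lower()
        if name in self.http_method_names:
            # Look for a method named like the HTTP verb, e.g. self.get
            handler = getattr(self, name, self.http_method_not_allowed)
        else:
            handler = self.http_method_not_allowed
        return handler(request, *args, **kwargs)

    def http_method_not_allowed(self, request, *args, **kwargs):
        return "405 Method Not Allowed"


class SketchListView(SketchView):
    # Only GET is implemented, so only GET requests are served
    def get(self, request, *args, **kwargs):
        return "list of objects"


get_result = SketchListView().dispatch(SimpleNamespace(method="GET"))
post_result = SketchListView().dispatch(SimpleNamespace(method="POST"))

print(get_result)   # list of objects
print(post_result)  # 405 Method Not Allowed
```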
<p>The <code>get</code> method of <code>ListView</code> comes from its <code>BaseListView</code> ancestor (<a href="https://docs.djangoproject.com/en/3.0/ref/class-based-views/generic-display/#django.views.generic.list.BaseListView">DOCS</a>, <a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/list.py#L141">CODE</a>). As you can see, the function basically initializes the attribute <code>object_list</code> with the result of the call <code>get_queryset()</code>, creates a context calling the method <code>get_context_data</code> and calls <code>render_to_response</code>.</p>
<p>Are you still with me? Don't give up, we are almost done, at least with ListView. The method <code>get_queryset</code> comes from the <code>MultipleObjectMixin</code> ancestor of <code>ListView</code> (<a href="https://docs.djangoproject.com/en/3.0/ref/class-based-views/mixins-multiple-object/#multipleobjectmixin">DOCS</a>, <a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/list.py#L9">CODE</a>) and simply gets all objects of a given model (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/list.py#L33">CODE</a>) running <code>queryset = self.model._default_manager.all()</code>. The value of <code>model</code> is what we configured in our class when we wrote <code>model = Article</code>. I hope at this point something starts to make sense in your head.</p>
<p>That's all, actually. Our <code>ArticleListView</code> class extracts all <code>Article</code> objects from the database, and calls a template passing a context that contains a single variable, <code>object_list</code>, populated with the list of extracted objects.</p>
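<p>The whole flow of a GET request can be condensed in a plain Python sketch. Everything here is a stand-in built for illustration: <code>FakeManager</code>, the fake <code>Article</code> class with two hard-coded rows, and the string returned by <code>render_to_response</code> are assumptions that replace the database and the template engine. Only the sequence of calls mirrors what was described above.</p>

```python
class FakeManager:
    """Hypothetical replacement for a model manager: no database involved."""

    def __init__(self, rows):
        self._rows = rows

    def all(self):
        return list(self._rows)


class Article:
    # Stand-in model with two hard-coded "records"
    _default_manager = FakeManager(["first post", "second post"])


class SketchListView:
    model = None

    def get_queryset(self):
        # Get all the objects of the configured model
        return self.model._default_manager.all()

    def get_context_data(self, **kwargs):
        context = {"object_list": self.object_list}
        context.update(kwargs)
        return context

    def render_to_response(self, context):
        # A real view would render a template here
        return f"rendered with {context}"

    def get(self, request, *args, **kwargs):
        self.object_list = self.get_queryset()
        context = self.get_context_data()
        return self.render_to_response(context)


class ArticleListView(SketchListView):
    model = Article


view = ArticleListView()
response = view.get(request=None)
print(response)
```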
<h2 id="templates-and-contexts">Templates and contexts<a class="headerlink" href="#templates-and-contexts" title="Permanent link">¶</a></h2>
<p>Are you satisfied? I'm actually still curious about the template and the context. Let's see what we can find about these topics. First of all, when the class calls <code>render_to_response</code> it uses the code that comes from its <code>TemplateResponseMixin</code> ancestor (<a href="https://docs.djangoproject.com/en/3.0/ref/class-based-views/mixins-simple/#templateresponsemixin">DOCS</a>, <a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/base.py#L117">CODE</a>); the method initialises the class <code>TemplateResponse</code> passing a template and a context. The template, through a series of calls which you can follow by yourself, comes from <code>template_name</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/base.py#L150">CODE</a>); while <code>TemplateResponseMixin</code> initializes it as <code>None</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/base.py#L119">CODE</a>), <code>ListView</code> performs some magic tricks through ancestors (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/list.py#L165">CODE</a>) to return a template whose name derives from the given model. In short, our <code>ArticleListView</code>, defining an <code>Article</code> model in the <code>articles</code> application, automatically uses a template that is called <code>articles/article_list.html</code>.</p>
<p>May we change this behaviour? Of course! This is, after all, the point of using classes instead of functions: easily customisable behaviour. We can change the definition of our class to be</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.views.generic.list</span> <span class="kn">import</span> <span class="n">ListView</span>
<span class="kn">from</span> <span class="nn">articles.models</span> <span class="kn">import</span> <span class="n">Article</span>
<span class="k">class</span> <span class="nc">ArticleListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">Article</span>
    <span class="n">template_name</span> <span class="o">=</span> <span class="s1">'sometemplate.html'</span>
</code></pre></div>
<p>Let's review what this does step by step. When the response is created, Django runs the code of <code>render_to_response</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/base.py#L124">CODE</a>), which in turn calls <code>get_template_names</code>. Pay attention that this method returns a list of names, as Django will use the first available among them, scanning them in order. <code>ListView</code> inherits an overridden version of this method from its superclass <code>MultipleObjectTemplateResponseMixin</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/list.py#L165">CODE</a>). This calls the same method of its own superclass <code>TemplateResponseMixin</code> (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/base.py#L140">CODE</a>), which returns the attribute we set in the <code>ArticleListView</code> class (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/base.py#L150">CODE</a>). The mixin then goes on and appends to the list the template file name derived from the model (<a href="https://github.com/django/django/blob/stable/3.0.x/django/views/generic/list.py#L181">CODE</a>) and finally returns the list, which at this point is <code>['sometemplate.html', 'articles/article_list.html']</code>.</p>
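<p>The chain of calls can be reproduced with a plain Python sketch. The two mixins below are simplified stand-ins that only borrow the names of the Django classes: the attributes <code>app_label</code> and <code>model_name</code> are hypothetical shortcuts (the real mixin reads them from the model metadata), and <code>ValueError</code> replaces the Django exception raised when no template name is configured. The derived name includes the application label as a directory prefix, following Django's convention.</p>

```python
class TemplateResponseMixin:
    """Simplified stand-in, not Django's actual implementation."""

    template_name = None

    def get_template_names(self):
        if self.template_name is None:
            # Stand-in for the configuration error raised by Django
            raise ValueError("template_name is required")
        return [self.template_name]


class MultipleObjectTemplateResponseMixin(TemplateResponseMixin):
    template_name_suffix = "_list"
    # Hypothetical attributes: the real mixin gets these from the model
    app_label = None
    model_name = None

    def get_template_names(self):
        try:
            # First ask the parent for the explicitly configured name
            names = super().get_template_names()
        except ValueError:
            names = []
        if self.model_name is not None:
            # Then append the name derived from the model
            names.append(
                f"{self.app_label}/{self.model_name}{self.template_name_suffix}.html"
            )
        return names


class ArticleListView(MultipleObjectTemplateResponseMixin):
    app_label = "articles"
    model_name = "article"
    template_name = "sometemplate.html"


template_names = ArticleListView().get_template_names()
print(template_names)  # ['sometemplate.html', 'articles/article_list.html']
```

<p>If you remove the <code>template_name</code> attribute from the sketch, the list contains only the name derived from the model, which is the default behaviour described earlier.</p>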
<p>As for the context, remember that it is only a dictionary of values you want to be able to access when compiling the template. Variable names inside the context, data format, and data content are completely up to you. When using CBGVs, however, you will find in your context some variables that have been created by the ancestors of your view, as happens for <code>object_list</code>. What if you want to show a page with the list of all articles, but you want to add a value to the context?</p>
<p>Easy task: you just need to override the function that produces the context and change its behaviour. Say, for example, that we want to show the number of total readers of our site, along with the list of articles. Assuming that a <code>Reader</code> model is available we can write</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.views.generic.list</span> <span class="kn">import</span> <span class="n">ListView</span>
<span class="kn">from</span> <span class="nn">articles.models</span> <span class="kn">import</span> <span class="n">Article</span><span class="p">,</span> <span class="n">Reader</span>
<span class="k">class</span> <span class="nc">ArticleListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">Article</span>

    <span class="k">def</span> <span class="nf">get_context_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="n">context</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">get_context_data</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="n">context</span><span class="p">[</span><span class="s1">'readers'</span><span class="p">]</span> <span class="o">=</span> <span class="n">Reader</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">context</span>
</code></pre></div>
<p>As always, when overriding a method we need to ask ourselves if we need to call the original method. In this case, we merely want to augment the content of the context and not replace it, so we call <code>super().get_context_data(**kwargs)</code> first, and then add the value that we need. Pay attention that this might not always be the case, as it depends on the logic of your override.</p>
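<p>The difference between augmenting and replacing can be shown with a small plain Python sketch. The classes, the hard-coded <code>object_list</code> and the number of readers are all made up for illustration and have nothing to do with Django's implementation; they only show what happens to the parent's keys in the two cases.</p>

```python
class SketchListView:
    # Hypothetical data that the parent class puts in the context
    object_list = ["first post", "second post"]

    def get_context_data(self, **kwargs):
        context = {"object_list": self.object_list}
        context.update(kwargs)
        return context


class AugmentingView(SketchListView):
    def get_context_data(self, **kwargs):
        # Start from the parent's context and add one value
        context = super().get_context_data(**kwargs)
        context["readers"] = 42
        return context


class ReplacingView(SketchListView):
    def get_context_data(self, **kwargs):
        # The parent is never called: object_list is lost
        return {"readers": 42}


augmented = AugmentingView().get_context_data()
replaced = ReplacingView().get_context_data()

print(sorted(augmented))  # ['object_list', 'readers']
print(sorted(replaced))   # ['readers']
```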
<h2 id="final-words">Final words<a class="headerlink" href="#final-words" title="Permanent link">¶</a></h2>
<p>In this first post I tried to uncover some of the mysteries behind CBVs and CBGVs in Django, showing exactly what happens to a GET request that hits a class-based view. Hopefully the matter has now been demystified a little! In the next posts I will discuss <code>DetailView</code>, the generic view to show detail about an object, how to create custom CBVs, and how to use CBVs to process forms, i.e. accept POST requests.</p>
<h2 id="updates">Updates<a class="headerlink" href="#updates" title="Permanent link">¶</a></h2>
<p>2020-03-12: A global review of the post, which has been updated with the latest Django code (3.0)</p>
<p>2013-10-29: As pointed out by <a href="https://www.reddit.com/user/mbrochh">mbrochh</a> on Reddit, there is a very useful resource for Django programmers: <a href="http://ccbv.co.uk/">Classy Class-Based Views</a>. It is a comprehensive index of all CBGVs with ancestors and method signatures. Make sure to have it in your Django bookmarks!</p>
<p>2013-10-29: I fixed a couple of typos when overriding <code>dispatch()</code>. Thanks to Tom Evans for spotting them.</p>
<p>2013-10-30: Fixed the <code>__init__()</code> method of <code>EvenExtractor</code>, that was missing the <code>self</code> parameter. Thanks <a href="https://www.reddit.com/user/meatypocket">meatypocket</a>.</p>
<p>2015-06-10: <a href="https://www.reddit.com/user/meatypocket">meatypocket</a> spotted a missing <code>return</code> in the <code>dispatch()</code> override. Thank you!</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>