<p>The Digital Cat - Programming (https://www.thedigitalcatonline.com/): Adventures of a curious cat in the land of programming</p><p>A Rust to-do list CLI app - Part 1, Leonardo Giordani, 2024-02-14</p><p>This blog was born as a place where I could share what I discovered while I was learning new technologies and concepts. Remaining faithful to this manifesto, I decided to start writing some posts about Rust, as I recently joined the vibrant community behind this language. My roots are in C and Assembly, so I feel at home with Rust, which looks to me like a proper modern version of C. As such, I'm supremely interested in the ideas behind it, and in particular I want to focus on low-level data representation, structures, and memory usage.</p><p>After reading the manual and implementing several snippets, I decided to try a more complete application and to document the journey here. While I was looking for a tutorial I found <a href="https://www.freecodecamp.org/news/how-to-build-a-to-do-app-with-rust/">this useful post</a> by Claudio Restifo, where he develops a simple to-do list management application.</p><p>So, I decided to implement the same application, using Claudio's solution when I was stuck or to compare his strategy with mine, as it is always extremely useful to see how another coder tackles certain challenges. Thanks Claudio! However, as I'm a big fan of TDD, I'd like to follow that approach, which is something that Claudio doesn't do in his post.</p><p>Please keep in mind that these are my first steps with the language, so consider what you read here as the work of a beginner. I'm more than happy to receive advice or corrections, so feel free to get in touch if you see anything that can be done in a better way. 
In the post, you will find annotations that highlight the major topics I think a Rust programmer should be familiar with.</p><h2 id="requirements-dd57">Requirements<a class="headerlink" href="#requirements-dd57" title="Permanent link">¶</a></h2><p>The requirements I set for the application are:</p><ul><li>Manage a list of entries. Each entry can be in state "to be done" or "done".</li><li>Provide commands to view, add, and delete items, and to mark them as "done" or "to be done".</li><li>Save and retrieve data from a file. The file has a default name that can be changed with an option.</li></ul><p>Since this is an extremely basic application, the command line is simply <code>todo [OPTIONS] COMMAND [KEY]</code>.</p><p>I expect the interaction with the tool to look something like this:</p><div class="code"><div class="content"><div class="highlight"><pre>$ todo list
# TO DO
* Write post
* Buy milk
* Have fun
# DONE
* Feed the cat
$ todo add "Update CV"
$ todo mark-done "Buy milk"
$ todo list
# TO DO
* Write post
* Have fun
* Update CV
# DONE
* Feed the cat
* Buy milk
</pre></div> </div> </div><h2 id="initial-setup-7d57">Initial setup<a class="headerlink" href="#initial-setup-7d57" title="Permanent link">¶</a></h2><p>Starting a new Rust project is extremely simple with Cargo:</p><div class="code"><div class="content"><div class="highlight"><pre><span class="cp">$</span><span class="w"> </span><span class="n">cargo</span><span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="n">todo</span><span class="o">-</span><span class="n">cli</span>
</pre></div> </div> </div><p>This creates the required structure in a new directory, including the two files <code>Cargo.toml</code> and <code>src/main.rs</code>. The latter contains some placeholder code that we can use to check our setup:</p><div class="code"><div class="title"><code>main.rs</code></div><div class="content"><div class="highlight"><pre><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">"Hello, world!"</span><span class="p">);</span>
<span class="p">}</span>
</pre></div> </div> </div><p>We can run the code with <code>cargo run</code> or build it into a stand-alone executable with <code>cargo build</code>. By default, this compiles the code in debug mode and puts the executable in the directory <code>target/debug</code>. Cargo can also run tests (<code>cargo test</code>).</p><div class="infobox"><i class="fa fa-"></i><div class="title">Cargo</div><div><p>Cargo [<a href="https://doc.rust-lang.org/cargo">docs</a>] is the Rust package manager and the default tool to manage dependencies, compile packages and, in general, manage your code. It's highly recommended to learn at least the basics of this powerful tool.</p></div></div><div class="admonition"><i class="fa fa-github"></i><div class="content"><div><a href="https://github.com/lgiordani/rust-todo-cli/tree/a7c4616179fdcce0b5bc6010402c1d914ac1587b">See the source code</a></div></div></div><h2 id="cli-management-3bb6">CLI management<a class="headerlink" href="#cli-management-3bb6" title="Permanent link">¶</a></h2><p>Command line interfaces are typically not part of the classic TDD cycle, as they are usually covered by integration tests. Now, the definition that the Rust community uses for <em>integration tests</em> is</p><blockquote><p>Integration tests are external to your crate and use only its public interface in the same way any other code would. Their purpose is to test that many parts of your library work correctly together.</p><cite></cite></blockquote><p>So, the <em>integration</em> they consider here is the one between multiple parts of a library. What I am referring to is more properly called <em>system integration testing</em>, where we test the public interface of a whole tool. 
Long story short, I will not write tests for the CLI commands.</p><p>In the aforementioned post, Claudio Restifo suggests we can read command line arguments using <code>std::env::args()</code> directly with something like</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">action</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span>::<span class="n">env</span>::<span class="n">args</span><span class="p">().</span><span class="n">nth</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">expect</span><span class="p">(</span><span class="s">"Please specify an action"</span><span class="p">);</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">item</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span>::<span class="n">env</span>::<span class="n">args</span><span class="p">().</span><span class="n">nth</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">expect</span><span class="p">(</span><span class="s">"Please specify an item"</span><span class="p">);</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">"{:?}, {:?}"</span><span class="p">,</span><span class="w"> </span><span class="n">action</span><span class="p">,</span><span class="w"> </span><span class="n">item</span><span class="p">);</span>
<span class="p">}</span>
</pre></div> </div> </div><div class="infobox"><i class="fa fa-"></i><div class="title">Modules</div><div><p>In Rust, items can be used through their full path without importing them first. The standard library is available by default, while external crates have to be declared as dependencies in the file <code>Cargo.toml</code>. It is then perfectly acceptable to write <code>let action = std::env::args()...</code>.</p>
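<p>As a small sketch of this, the same standard library type can be named through its full path with no import at all, or through its short name after a <code>use</code> declaration. The two functions below are hypothetical helpers written for this example, not part of the application.</p>

```rust
// Bring `HashMap` into scope so that it can be used by its short name.
use std::collections::HashMap;

// Hypothetical helper: no import needed, the full path works on its own.
fn empty_map_full_path() -> std::collections::HashMap<String, bool> {
    std::collections::HashMap::new()
}

// Hypothetical helper: the same code using the name imported above.
fn empty_map_with_use() -> HashMap<String, bool> {
    HashMap::new()
}

fn main() {
    // Both functions return the very same type, so the values can be compared.
    assert_eq!(empty_map_full_path(), empty_map_with_use());
    println!("Both maps are empty: {}", empty_map_full_path().is_empty());
}
```

<p>The <code>use</code> declaration changes nothing at runtime. It only shortens the names we have to type.</p>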
<p><a href="https://doc.rust-lang.org/reference/items/use-declarations.html">Use declarations</a>, however, can bring items into the current scope to make the code more readable.</p></div></div><p>The method <code>nth</code> [<a href="https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.nth">docs</a>] returns (not too surprisingly) the nth element of an iterator. Counting starts from zero, and since the first element returned by <code>std::env::args()</code> is traditionally the name of the program, <code>nth(1)</code> returns the first actual argument.</p><div class="infobox"><i class="fa fa-"></i><div class="title">Iterators</div><div><p>The Rust documentation contains a very useful section on iterators [<a href="https://doc.rust-lang.org//std/iter/index.html">docs</a>].</p>
<p>The function <code>std::env::args</code> [<a href="https://doc.rust-lang.org//std/env/fn.args.html">docs</a>] (used to access the command line arguments in the traditional Unix fashion) returns <code>Args</code> [<a href="https://doc.rust-lang.org//std/env/struct.Args.html">docs</a>], which implements the trait <code>Iterator</code>.</p>
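<p>Since <code>Args</code> is just an iterator, the behaviour of <code>nth</code> can be sketched with any other iterator. In this example, a vector of made-up strings stands in for the command line, and its first element plays the role of the program name. Note that <code>nth</code> consumes the elements it walks over. This is why the code above calls <code>std::env::args()</code> twice: each call gets a fresh iterator.</p>

```rust
fn main() {
    // A stand-in for `std::env::args()`: the first element plays the role
    // of the program name, as it traditionally does in `Args`.
    let mut args = vec!["todo", "add", "Update CV"].into_iter();

    // `nth` is zero-based, so `nth(1)` skips "todo" and returns "add".
    assert_eq!(args.nth(1), Some("add"));

    // The elements walked over are consumed, so `nth(0)` now returns
    // the element that follows "add".
    assert_eq!(args.nth(0), Some("Update CV"));

    // Past the end of the iterator, `nth` returns `None`.
    assert_eq!(args.nth(0), None);
}
```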
<p>As happens in object-oriented programming languages (which Rust is not), the expression "implements an interface" is often simplified to "is". So, colloquially speaking, we can say that <code>std::env::Args</code> is an <code>Iterator</code> [<a href="https://doc.rust-lang.org//std/iter/trait.Iterator.html">docs</a>].</p></div></div><p>The signature of <code>nth</code> is</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">fn</span> <span class="nf">nth</span><span class="p">(</span><span class="o">&</span><span class="k">mut</span><span class="w"> </span><span class="bp">self</span><span class="p">,</span><span class="w"> </span><span class="n">n</span>: <span class="kt">usize</span><span class="p">)</span><span class="w"> </span>-> <span class="nb">Option</span><span class="o"><</span><span class="bp">Self</span>::<span class="n">Item</span><span class="o">></span>
</pre></div> </div> </div><p>and its return type is <code>Option<Self::Item></code>. The type <code>Option</code> provides a method <code>expect</code> [<a href="https://doc.rust-lang.org//std/option/enum.Option.html#method.expect">docs</a>] that returns the content of the <code>Some</code> value or, if the value is <code>None</code>, panics with the given message.</p><div class="infobox"><i class="fa fa-"></i><div class="title">Option</div><div><p><code>Option</code> [<a href="https://doc.rust-lang.org/std/option/index.html">docs</a>] and <code>Result</code> [<a href="https://doc.rust-lang.org/std/result/index.html">docs</a>] are a versatile way to manage optional values (either something or nothing) and results (either something good or an error), and are among the most important types to learn in Rust.</p></div></div><p>Running the code above with <code>cargo run</code> and no arguments produces the following output, where we can see the message set by the first call to <code>expect</code>.</p><div class="code"><div class="content"><div class="highlight"><pre> Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/todo-cli`
thread 'main' panicked at src/main.rs:80:42:
<span class="hll">Please specify an action
</span>note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
</pre></div> </div> </div><h2 id="better-cli-management-with-clap-74e2">Better CLI management with Clap<a class="headerlink" href="#better-cli-management-with-clap-74e2" title="Permanent link">¶</a></h2><p><a href="https://docs.rs/clap/latest/clap/">Clap</a> stands for <strong>C</strong>ommand <strong>L</strong>ine <strong>A</strong>rgument <strong>P</strong>arser and is a nice crate that simplifies the creation of advanced command line interfaces. I installed it using</p><div class="code"><div class="content"><div class="highlight"><pre>$ cargo add clap --features derive
</pre></div> </div> </div><p>as detailed in the documentation, and my code is now</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">use</span><span class="w"> </span><span class="n">clap</span>::<span class="n">Parser</span><span class="p">;</span>
<span class="cp">#[derive(Parser)]</span>
<span class="k">struct</span> <span class="nc">Cli</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">command</span>: <span class="nb">String</span><span class="p">,</span>
<span class="w"> </span><span class="n">key</span>: <span class="nb">String</span><span class="p">,</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Cli</span>::<span class="n">parse</span><span class="p">();</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">"Command line: {} {}"</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">command</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">key</span><span class="p">);</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Clap also allows me to add long and short options, so later I will use it to specify the database file name. For now, however, this is enough.</p><div class="infobox"><i class="fa fa-"></i><div class="title">derive</div><div><p>The attribute <code>derive</code> [<a href="https://doc.rust-lang.org/reference/attributes/derive.html">docs</a>] is another cornerstone of the language and is used everywhere. The machinery behind it is not trivial, but I recommend getting used to the syntax and the standard use cases.</p></div></div><div class="admonition"><i class="fa fa-github"></i><div class="content"><div><a href="https://github.com/lgiordani/rust-todo-cli/tree/b85162745b90e96a58ee58a5a7306e599aa56fa1">See the source code</a></div></div></div><h2 id="a-simple-list-of-elements-2dfb">A simple list of elements<a class="headerlink" href="#a-simple-list-of-elements-2dfb" title="Permanent link">¶</a></h2><p>From Claudio's post I got the idea of using a hash map for the list of items. That's a simple and effective solution, especially since the Rust standard library provides the collection type out of the box.</p><p>As I want to use TDD, I begin with a test. In Rust, unit tests usually live in the same file as the code they test (integration tests are the exception, as they live in the directory <code>tests</code>), so I can write a simple test at the bottom of the file to check that a <code>TodoList</code> type exists and can be initialised.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="cp">#[cfg(test)]</span>
<span class="k">mod</span> <span class="nn">tests</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">use</span><span class="w"> </span><span class="k">super</span>::<span class="o">*</span><span class="p">;</span>
<span class="w"> </span><span class="cp">#[test]</span>
<span class="w"> </span><span class="k">fn</span> <span class="nf">init_todo</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">todo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">TodoList</span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><div class="infobox"><i class="fa fa-"></i><div class="title">TDD</div><div><p>TDD is one of my favourite methodologies and I'm happy to see that Rust allows me to follow it. I can't recommend TDD enough! The Rust book contains <a href="https://doc.rust-lang.org/book/ch11-01-writing-tests.html">a pretty detailed chapter</a> on how to write tests.</p></div></div><p>Clearly, when I run <code>cargo test</code> I get a compile error, as the type doesn't exist yet. Let's implement it, then:</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">use</span><span class="w"> </span><span class="n">std</span>::<span class="n">collections</span>::<span class="n">HashMap</span><span class="p">;</span>
<span class="p">[</span><span class="o">..</span><span class="p">.]</span>
<span class="k">struct</span> <span class="nc">TodoList</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// true = to do, false = done</span>
<span class="w"> </span><span class="n">items</span>: <span class="nc">HashMap</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="o">></span><span class="p">,</span>
<span class="p">}</span>
</pre></div> </div> </div><p>As you can see, I had to write a comment as a reminder of the meaning of the boolean values. I also suspect that I will need to use the type <code>HashMap<String, bool></code> multiple times, so I will probably end up creating a type alias of some sort.</p><p>To initialise such a structure, I have to implement the function <code>new</code>:</p><div class="code"><div class="title">Version 1</div><div class="content"><div class="highlight"><pre><span class="k">impl</span><span class="w"> </span><span class="n">TodoList</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">fn</span> <span class="nf">new</span><span class="p">()</span><span class="w"> </span>-> <span class="nc">TodoList</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">items</span>: <span class="nc">HashMap</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="o">></span><span class="w"> </span><span class="o">=</span>
<span class="w"> </span><span class="n">HashMap</span>::<span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="o">></span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="n">TodoList</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">items</span>: <span class="nc">items</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><div class="infobox"><i class="fa fa-"></i><div class="title">struct and impl</div><div><p>Rust is not an object-oriented programming language, so it uses plain structs to encapsulate data. The Rust book has <a href="https://doc.rust-lang.org/book/ch05-01-defining-structs.html">a full chapter</a> on <code>struct</code> and <code>impl</code>.</p></div></div><p>Thanks to type inference, I don't need to specify the types twice, so I can write</p><div class="code"><div class="title">Version 2</div><div class="content"><div class="highlight"><pre><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">items</span>: <span class="nc">HashMap</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">HashMap</span>::<span class="n">new</span><span class="p">();</span>
</pre></div> </div> </div><p>or</p><div class="code"><div class="title">Version 3</div><div class="content"><div class="highlight"><pre><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">items</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">HashMap</span>::<span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="o">></span>::<span class="n">new</span><span class="p">();</span>
</pre></div> </div> </div><p>For such a simple initialisation, I might also write directly</p><div class="code"><div class="title">Version 4</div><div class="content"><div class="highlight"><pre><span class="w"> </span><span class="n">TodoList</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">items</span>: <span class="nc">HashMap</span>::<span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="o">></span>::<span class="n">new</span><span class="p">(),</span>
<span class="w"> </span><span class="p">}</span>
</pre></div> </div> </div><p>However, I will soon replace <code>::new()</code> with something more complicated that reads a file, so I decided to keep version 2. This code passes the test I wrote, so it's good enough for now.</p><p>At this point I can also initialise the list in the main function:</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">struct</span> <span class="nc">TodoList</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// true = to do, false = done</span>
<span class="w"> </span><span class="n">items</span>: <span class="nc">HashMap</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="o">></span><span class="p">,</span>
<span class="p">}</span>
<span class="k">impl</span><span class="w"> </span><span class="n">TodoList</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">fn</span> <span class="nf">new</span><span class="p">()</span><span class="w"> </span>-> <span class="nc">TodoList</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">items</span>: <span class="nc">HashMap</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">HashMap</span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="n">TodoList</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">items</span>: <span class="nc">items</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Cli</span>::<span class="n">parse</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">todo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">TodoList</span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">"Command line: {} {}"</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">command</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">key</span><span class="p">);</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Please note that I'm not being too strict with dead code here, and the compiler will warn about unused variables and fields. I like this, and I won't add underscores to silence the warnings, since they are a good reminder of what I still have to implement.</p><div class="admonition"><i class="fa fa-github"></i><div class="content"><div><a href="https://github.com/lgiordani/rust-todo-cli/tree/0e49a12d38b1c3a5de9332507608b2c0c3f17ad8">See the source code</a></div></div></div><h2 id="adding-items-ea14">Adding items<a class="headerlink" href="#adding-items-ea14" title="Permanent link">¶</a></h2><p>A good improvement at this point would be to create a method to add items to the list. First, the mandatory test:</p><div class="code"><div class="content"><div class="highlight"><pre><span class="w"> </span><span class="cp">#[test]</span>
<span class="w"> </span><span class="k">fn</span> <span class="nf">add_item</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">todo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">TodoList</span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">));</span>
<span class="w"> </span><span class="fm">assert_eq!</span><span class="p">(</span><span class="n">todo</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">),</span><span class="w"> </span><span class="nb">Some</span><span class="p">(</span><span class="o">&</span><span class="kc">true</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
</pre></div> </div> </div><p>The type <code>HashMap</code> provides a method called <code>insert</code> [<a href="https://doc.rust-lang.org/std/collections/struct.HashMap.html#method.insert">docs</a>], which is exactly what I need:</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">impl</span><span class="w"> </span><span class="n">TodoList</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">..</span><span class="p">.</span>
<span class="w"> </span><span class="k">fn</span> <span class="nf">add</span><span class="p">(</span><span class="o">&</span><span class="k">mut</span><span class="w"> </span><span class="bp">self</span><span class="p">,</span><span class="w"> </span><span class="n">key</span>: <span class="nb">String</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="bp">self</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>And once again this code passes the test, so I consider it good enough.</p><div class="infobox"><i class="fa fa-"></i><div class="title">self and Self</div><div><p>In Rust, <code>self</code> is a keyword [<a href="https://doc.rust-lang.org/std/keyword.self.html">docs</a>] and not just a conventional name as it is in Python. Inside a <code>trait</code> or <code>impl</code> block, <code>Self</code> [<a href="https://doc.rust-lang.org/std/keyword.SelfTy.html">docs</a>] is the type we are implementing, and <code>self</code> is a value of that type or a reference to it, depending on how the parameter is declared.</p>
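<p>A stripped-down sketch shows that the explicitly typed forms of <code>self</code> compile and behave like the shorthand ones. The <code>TodoList</code> below is a simplified stand-in, and its method <code>count</code> is hypothetical, added only for this example.</p>

```rust
use std::collections::HashMap;

// Hypothetical, stripped-down TodoList: the typed receivers are the point
// of the example, not the application logic.
struct TodoList {
    items: HashMap<String, bool>,
}

impl TodoList {
    // Inside the impl block, `Self` is another name for `TodoList`.
    fn new() -> Self {
        Self { items: HashMap::new() }
    }

    // Same as `fn add(&mut self, key: String)`: `self` has type `&mut Self`.
    fn add(self: &mut Self, key: String) {
        self.items.insert(key, true);
    }

    // Same as `fn count(&self) -> usize`: `self` has type `&Self`.
    // `count` is a hypothetical method used only in this example.
    fn count(self: &Self) -> usize {
        self.items.len()
    }
}

fn main() {
    let mut todo = TodoList::new();
    // The method-call syntax is the same with the explicit forms.
    todo.add(String::from("Buy milk"));
    todo.add(String::from("Write post"));
    println!("{} items", todo.count());
}
```

<p>The explicit forms are legal but unusual. Idiomatic code uses the shorthand <code>&self</code> and <code>&mut self</code>.</p>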
<p>The code <code>fn add(&mut self, key: String) {</code> above is equivalent to <code>fn add(self: &mut Self, key: String) {</code>. However, <code>self</code> cannot be renamed to something like <code>foo</code>, because Rust expects the receiver parameter to have that specific name.</p></div></div><div class="infobox"><i class="fa fa-"></i><div class="title">References and mutability</div><div><p>At first, I found it confusing that in Rust <code>&mut</code> is usually called a <em>mutable reference</em>. In my head, I always translate it into a <em>reference to mutable data</em>, as this helps me remember what I am doing.</p>
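<p>The difference between a mutable binding that holds a reference and a reference to mutable data can be seen in a short sketch with made-up variables:</p>

```rust
fn main() {
    let a = 1;
    let b = 2;

    // A mutable binding holding a shared reference: `r` itself can be
    // re-pointed, but the data cannot be changed through it.
    let mut r: &i32 = &a;
    assert_eq!(*r, 1);
    r = &b;
    assert_eq!(*r, 2);
    // *r = 3; // would not compile: `r` is a `&` reference

    // A reference to mutable data: `m` always points to `counter`,
    // but the value of `counter` can be changed through it.
    let mut counter = 10;
    let m: &mut i32 = &mut counter;
    *m += 1;
    assert_eq!(counter, 11);
}
```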
<p>In short, in Rust we must use the keyword <code>mut</code> [<a href="https://doc.rust-lang.org/std/keyword.mut.html">docs</a>] to declare explicitly that a value can be modified, and this applies to function arguments as well. If we want to borrow data instead of moving it, we can use <em>references</em>, which in C terms are pointers whose validity is checked by the compiler. We can also pass a reference to data that we intend to mutate, which is where <code>&mut</code> comes into play.</p>
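<p>The three ways of passing an argument (moving, borrowing, and borrowing mutably) can be sketched with a few hypothetical helpers working on a <code>Vec<String></code>:</p>

```rust
// Hypothetical helpers showing the three ways of passing a value.

// Takes ownership: the vector is moved into the function.
fn consume(list: Vec<String>) -> usize {
    list.len()
}

// Borrows immutably: the caller keeps ownership and the data is read-only.
fn count(list: &[String]) -> usize {
    list.len()
}

// Borrows mutably: the function can modify the caller's data.
fn add_item(list: &mut Vec<String>, item: &str) {
    list.push(item.to_string());
}

fn main() {
    // The binding must be declared `mut` to lend it out as `&mut`.
    let mut list: Vec<String> = Vec::new();
    add_item(&mut list, "Buy milk");
    add_item(&mut list, "Write post");
    assert_eq!(count(&list), 2);

    // After this call `list` has been moved and can no longer be used here.
    assert_eq!(consume(list), 2);
    // println!("{}", list.len()); // would not compile: use of moved value
}
```

<p>Declaring <code>list</code> with <code>mut</code> and passing it as <code>&mut list</code> are two separate requirements. The first allows the data to change, and the second lends that permission to the function for the duration of the call.</p>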
<p>However, as I mentioned, I think it's important to understand that the reference (the pointer) does not change. What changes is the data it refers to.</p></div></div><div class="admonition"><i class="fa fa-github"></i><div class="content"><div><a href="https://github.com/lgiordani/rust-todo-cli/tree/320c31afba90c63e0482e4442d580d7de1fcfbd4">See the source code</a></div></div></div><h2 id="multiple-additions-and-updates-a15c">Multiple additions and updates<a class="headerlink" href="#multiple-additions-and-updates-a15c" title="Permanent link">¶</a></h2><p>If I add the same key multiple times, I want the list to contain only one occurrence, so I test this:</p><div class="code"><div class="content"><div class="highlight"><pre><span class="w"> </span><span class="cp">#[test]</span>
<span class="w"> </span><span class="k">fn</span> <span class="nf">add_item_already_exist</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">todo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">TodoList</span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">));</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">));</span>
<span class="w"> </span><span class="fm">assert_eq!</span><span class="p">(</span><span class="n">todo</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">),</span><span class="w"> </span><span class="nb">Some</span><span class="p">(</span><span class="o">&</span><span class="kc">true</span><span class="p">));</span>
<span class="w"> </span><span class="fm">assert_eq!</span><span class="p">(</span><span class="n">todo</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="p">(),</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div> </div> </div><p>The test already passes, since a hash map stores each key only once.</p><p>I also want a second insertion not to overwrite the value of the existing element. In this case the test is</p><div class="code"><div class="content"><div class="highlight"><pre><span class="w"> </span><span class="cp">#[test]</span>
<span class="w"> </span><span class="k">fn</span> <span class="nf">add_item_does_not_change_value</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">todo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">TodoList</span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">));</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nb">Some</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">get_mut</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">*</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">));</span>
<span class="w"> </span><span class="fm">assert_eq!</span><span class="p">(</span><span class="n">todo</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">),</span><span class="w"> </span><span class="nb">Some</span><span class="p">(</span><span class="o">&</span><span class="kc">false</span><span class="p">));</span>
<span class="w"> </span><span class="fm">assert_eq!</span><span class="p">(</span><span class="n">todo</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="p">(),</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div> </div> </div><p>In the test, I change the value inside the map by hand using <code>get_mut</code> [<a href="https://doc.rust-lang.org/std/collections/struct.HashMap.html#method.get_mut">docs</a>], which returns an <code>Option</code> wrapping a mutable reference to the value. This test doesn't pass, as <code>insert</code> overwrites the value of an existing key.</p><p>At the time of writing, the method <code>try_insert</code> of <code>HashMap</code> is experimental (nightly-only), so I implemented a custom solution based on the <code>Entry</code> API</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">use</span><span class="w"> </span><span class="n">std</span>::<span class="n">collections</span>::<span class="n">hash_map</span>::<span class="n">Entry</span><span class="p">;</span>
<span class="p">[</span><span class="o">..</span><span class="p">.]</span>
<span class="w"> </span><span class="k">fn</span> <span class="nf">add</span><span class="p">(</span><span class="o">&</span><span class="k">mut</span><span class="w"> </span><span class="bp">self</span><span class="p">,</span><span class="w"> </span><span class="n">key</span>: <span class="nb">String</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">Entry</span>::<span class="n">Vacant</span><span class="p">(</span><span class="n">entry</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="bp">self</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">entry</span><span class="p">(</span><span class="n">key</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">entry</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="kc">true</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div> </div> </div><p>In <code>add</code>, I check whether the entry for <code>key</code> is vacant (the key does not exist yet) and create it only in that case. This code passes all tests.</p><div class="infobox"><i class="fa fa-"></i><div class="title">if let</div><div><p>I consider <code>if let</code> [<a href="https://doc.rust-lang.org/book/ch06-03-if-let.html">docs</a>] a very powerful piece of syntax. When I care about only one of the possible outcomes, it saves me from spelling out a full-fledged <code>match</code>.</p></div></div><div class="admonition"><i class="fa fa-github"></i><div class="content"><div><a href="https://github.com/lgiordani/rust-todo-cli/tree/e0ee0b9fdb14e2d7f4d0a5022718d5c7923249e6">See the source code</a></div></div></div><h2 id="marking-items-9ca3">Marking items<a class="headerlink" href="#marking-items-9ca3" title="Permanent link">¶</a></h2><p>The second method I want to add is <code>mark</code>, which sets the value of the boolean associated with a given key. I will use it to flag an item as "done" or "to be done". The test is</p><div class="code"><div class="content"><div class="highlight"><pre><span class="w"> </span><span class="cp">#[test]</span>
<span class="w"> </span><span class="k">fn</span> <span class="nf">mark_item</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">todo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">TodoList</span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">));</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">mark</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">),</span><span class="w"> </span><span class="kc">false</span><span class="p">);</span>
<span class="w"> </span><span class="fm">assert_eq!</span><span class="p">(</span><span class="n">todo</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">),</span><span class="w"> </span><span class="nb">Some</span><span class="p">(</span><span class="o">&</span><span class="kc">false</span><span class="p">))</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">mark</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">),</span><span class="w"> </span><span class="kc">true</span><span class="p">);</span>
<span class="w"> </span><span class="fm">assert_eq!</span><span class="p">(</span><span class="n">todo</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">),</span><span class="w"> </span><span class="nb">Some</span><span class="p">(</span><span class="o">&</span><span class="kc">true</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
</pre></div> </div> </div><p>To implement it, I can follow the same strategy I used in the test <code>add_item_does_not_change_value</code>, writing the new value through the mutable reference returned by <code>get_mut</code></p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">impl</span><span class="w"> </span><span class="n">TodoList</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">..</span><span class="p">.</span>
<span class="w"> </span><span class="k">fn</span> <span class="nf">mark</span><span class="p">(</span><span class="o">&</span><span class="k">mut</span><span class="w"> </span><span class="bp">self</span><span class="p">,</span><span class="w"> </span><span class="n">key</span>: <span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="n">value</span>: <span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nb">Some</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="bp">self</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">get_mut</span><span class="p">(</span><span class="o">&</span><span class="n">key</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">*</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>What if the key is not in the list, though? In that case <code>get_mut</code> returns <code>None</code> and <code>mark</code> silently does nothing, while I'd rather have <code>mark</code> return a <code>Result</code> that signals that something didn't work. I can test this with</p><div class="code"><div class="content"><div class="highlight"><pre><span class="w"> </span><span class="cp">#[test]</span>
<span class="w"> </span><span class="k">fn</span> <span class="nf">mark_item_does_not_exist</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">todo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">TodoList</span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="fm">assert_eq!</span><span class="p">(</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">mark</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">),</span><span class="w"> </span><span class="kc">false</span><span class="p">),</span>
<span class="w"> </span><span class="nb">Err</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">))</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div> </div> </div><p>The new version of the function is then</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">impl</span><span class="w"> </span><span class="n">TodoList</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">..</span><span class="p">.</span>
<span class="w"> </span><span class="k">fn</span> <span class="nf">mark</span><span class="p">(</span><span class="o">&</span><span class="k">mut</span><span class="w"> </span><span class="bp">self</span><span class="p">,</span><span class="w"> </span><span class="n">key</span>: <span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="n">value</span>: <span class="kt">bool</span><span class="p">)</span><span class="w"> </span>-> <span class="nb">Result</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="nb">String</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="bp">self</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">get_mut</span><span class="p">(</span><span class="o">&</span><span class="n">key</span><span class="p">).</span><span class="n">ok_or</span><span class="p">(</span><span class="o">&</span><span class="n">key</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
<span class="w"> </span><span class="o">*</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">;</span>
<span class="w"> </span><span class="nb">Ok</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>The method <code>ok_or</code> [<a href="https://doc.rust-lang.org/std/option/enum.Option.html#method.ok_or">docs</a>] converts an <code>Option</code> into a <code>Result</code>, using <code>&key</code> as the error value, so I just call <code>?</code> to propagate the error. Note that <code>&key</code> is a <code>&String</code>, while the error type of <code>mark</code> is <code>String</code>: <code>?</code> passes the error through <code>From::from</code>, and the standard library implements <code>From</code> to turn a <code>&String</code> into a <code>String</code> by cloning it.</p><div class="infobox"><i class="fa fa-"></i><div class="title">The question mark operator</div><div><p>The operator <code>?</code> is one of the best features of Rust, and it's explained in <a href="https://doc.rust-lang.org/book/ch09-02-recoverable-errors-with-result.html">this chapter</a> of the Rust Book. I find it a simple yet extremely powerful way to deal with error propagation.</p></div></div><div class="admonition"><i class="fa fa-github"></i><div class="content"><div><a href="https://github.com/lgiordani/rust-todo-cli/tree/15179a8821f73ea5479d7bde1060126158e5eb7e">See the source code</a></div></div></div><h2 id="listing-items-449d">Listing items<a class="headerlink" href="#listing-items-449d" title="Permanent link">¶</a></h2><p>At this point I want to add the method <code>list</code>, which lets me see the items contained in <code>TodoList</code>. I'd like to separate the logic from the presentation, so the method will return two lists of items, one for each value of the associated boolean.</p><p>In my opinion, this means that the method should return a tuple of iterators, one over the items in state "to be done" and one over those in state "done".</p><div class="infobox"><i class="fa fa-"></i><div class="title">Iterators</div><div><p>Iterators are a big thing in Rust, and I can understand why: they are lazy, so a chain of iterator adapters processes one element at a time without building intermediate collections, which is good for both performance and memory usage. 
The Rust Book has <a href="https://doc.rust-lang.org/book/ch13-02-iterators.html">a chapter</a> on them, and there is plenty of documentation for the <a href="https://doc.rust-lang.org/std/iter/trait.Iterator.html">corresponding trait</a>.</p></div></div><p>As usual, I start with the tests</p><div class="code"><div class="content"><div class="highlight"><pre><span class="w"> </span><span class="cp">#[test]</span>
<span class="w"> </span><span class="k">fn</span> <span class="nf">list_items</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">todo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">TodoList</span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">));</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something else to do"</span><span class="p">));</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something done"</span><span class="p">));</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">mark</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something done"</span><span class="p">),</span><span class="w"> </span><span class="kc">false</span><span class="p">);</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="p">(</span><span class="n">todo_items</span><span class="p">,</span><span class="w"> </span><span class="n">done_items</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">list</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">todo_items</span>: <span class="nb">Vec</span><span class="o"><</span><span class="nb">String</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">todo_items</span><span class="p">.</span><span class="n">map</span><span class="p">(</span><span class="o">|</span><span class="n">x</span><span class="o">|</span><span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">clone</span><span class="p">()).</span><span class="n">collect</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">done_items</span>: <span class="nb">Vec</span><span class="o"><</span><span class="nb">String</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">done_items</span><span class="p">.</span><span class="n">map</span><span class="p">(</span><span class="o">|</span><span class="n">x</span><span class="o">|</span><span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">clone</span><span class="p">()).</span><span class="n">collect</span><span class="p">();</span>
<span class="w"> </span><span class="fm">assert!</span><span class="p">(</span><span class="n">todo_items</span><span class="p">.</span><span class="n">iter</span><span class="p">().</span><span class="n">any</span><span class="p">(</span><span class="o">|</span><span class="n">e</span><span class="o">|</span><span class="w"> </span><span class="n">e</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"Something to do"</span><span class="p">));</span>
<span class="w"> </span><span class="fm">assert!</span><span class="p">(</span><span class="n">todo_items</span><span class="p">.</span><span class="n">contains</span><span class="p">(</span><span class="o">&</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something else to do"</span><span class="p">)));</span>
<span class="w"> </span><span class="fm">assert_eq!</span><span class="p">(</span><span class="n">todo_items</span><span class="p">.</span><span class="n">len</span><span class="p">(),</span><span class="w"> </span><span class="mi">2</span><span class="p">);</span>
<span class="w"> </span><span class="fm">assert!</span><span class="p">(</span><span class="n">done_items</span><span class="p">.</span><span class="n">contains</span><span class="p">(</span><span class="o">&</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something done"</span><span class="p">)));</span>
<span class="w"> </span><span class="fm">assert_eq!</span><span class="p">(</span><span class="n">done_items</span><span class="p">.</span><span class="n">len</span><span class="p">(),</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div> </div> </div><p>There is a lot to say here, so please remember my caveat: I'm not sure that what I'm doing is the best approach.</p><p>I add some elements to the list, mark one as done, then call the method <code>list</code> to get two iterators and test them. However, an iterator is consumed as it is traversed, so it can be walked only once; to run several checks on the results I prefer to convert them into vectors using the method <code>collect</code> [<a href="https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.collect">docs</a>].</p><p>To generate the two iterators I will probably use <code>HashMap::iter</code> [<a href="https://doc.rust-lang.org/std/collections/struct.HashMap.html#method.iter">docs</a>], which yields <code>(&String, &bool)</code> pairs; since I'm only interested in the keys, the two iterators will have element type <code>&String</code>.</p><p>As far as I can tell, there are several strategies I can use here.</p><p>I can collect the elements of the iterators directly into vectors of <code>&String</code> and then use the method <code>Vec::contains</code> [<a href="https://doc.rust-lang.org/std/vec/struct.Vec.html#method.contains">docs</a>]. However, <code>contains</code> takes a reference to the element type, and since the elements are already <code>&String</code>, the searched value has to be a <code>&&String</code>, so I would end up with</p><div class="code"><div class="content"><div class="highlight"><pre><span class="kd">let</span><span class="w"> </span><span class="n">todo_items</span>: <span class="nb">Vec</span><span class="o"><&</span><span class="nb">String</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">todo_items</span><span class="p">.</span><span class="n">collect</span><span class="p">();</span>
<span class="fm">assert!</span><span class="p">(</span><span class="n">todo_items</span><span class="p">.</span><span class="n">contains</span><span class="p">(</span><span class="o">&&</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something else to do"</span><span class="p">)));</span>
</pre></div> </div> </div><p>While this is perfectly reasonable in terms of memory consumption and performance, the double <code>&&</code> is a bit ugly. So, considering that I'm writing a test, where performance is not the major concern, I'd prefer to simplify the syntax. I can create a vector of <code>String</code> values and check those instead</p><div class="code"><div class="content"><div class="highlight"><pre><span class="kd">let</span><span class="w"> </span><span class="n">todo_items</span>: <span class="nb">Vec</span><span class="o"><</span><span class="nb">String</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">todo_items</span><span class="p">.</span><span class="n">cloned</span><span class="p">().</span><span class="n">collect</span><span class="p">();</span>
<span class="fm">assert!</span><span class="p">(</span><span class="n">todo_items</span><span class="p">.</span><span class="n">contains</span><span class="p">(</span><span class="o">&</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something else to do"</span><span class="p">)));</span>
</pre></div> </div> </div><p>The call <code>todo_items.cloned()</code> is equivalent to <code>todo_items.map(|x| x.clone())</code>: since <code>clone</code> takes <code>&self</code>, calling it on <code>x</code>, which is a <code>&String</code>, resolves to <code>String::clone</code> and produces an owned <code>String</code>. Here, <code>copied()</code> cannot be used, as <code>String</code> doesn't implement the trait <code>Copy</code>.</p><p>A good alternative to <code>contains</code> is <code>any</code>, which however is a method of iterators, so I have to call <code>iter()</code> on the vector first. The final version of the code is then</p><div class="code"><div class="content"><div class="highlight"><pre><span class="kd">let</span><span class="w"> </span><span class="n">todo_items</span>: <span class="nb">Vec</span><span class="o"><</span><span class="nb">String</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">todo_items</span><span class="p">.</span><span class="n">cloned</span><span class="p">().</span><span class="n">collect</span><span class="p">();</span>
<span class="fm">assert!</span><span class="p">(</span><span class="n">todo_items</span><span class="p">.</span><span class="n">iter</span><span class="p">().</span><span class="n">any</span><span class="p">(</span><span class="o">|</span><span class="n">e</span><span class="o">|</span><span class="w"> </span><span class="n">e</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"Something to do"</span><span class="p">));</span>
</pre></div> </div> </div><p>The version with <code>any</code> is also more elegant, since the closure compares each item directly with the string literal: <code>todo_items.iter()</code> yields <code>&String</code> values, the literal is a <code>&str</code>, and the standard library supports comparing the two. At this point my test is</p><div class="code"><div class="content"><div class="highlight"><pre><span class="cp">#[test]</span>
<span class="k">fn</span> <span class="nf">list_items</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">todo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">TodoList</span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">));</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something else to do"</span><span class="p">));</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something done"</span><span class="p">));</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">mark</span><span class="p">(</span><span class="nb">String</span>::<span class="n">from</span><span class="p">(</span><span class="s">"Something done"</span><span class="p">),</span><span class="w"> </span><span class="kc">false</span><span class="p">);</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="p">(</span><span class="n">todo_items</span><span class="p">,</span><span class="w"> </span><span class="n">done_items</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">list</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">todo_items</span>: <span class="nb">Vec</span><span class="o"><</span><span class="nb">String</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">todo_items</span><span class="p">.</span><span class="n">cloned</span><span class="p">().</span><span class="n">collect</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">done_items</span>: <span class="nb">Vec</span><span class="o"><</span><span class="nb">String</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">done_items</span><span class="p">.</span><span class="n">cloned</span><span class="p">().</span><span class="n">collect</span><span class="p">();</span>
<span class="w"> </span><span class="fm">assert!</span><span class="p">(</span><span class="n">todo_items</span><span class="p">.</span><span class="n">iter</span><span class="p">().</span><span class="n">any</span><span class="p">(</span><span class="o">|</span><span class="n">e</span><span class="o">|</span><span class="w"> </span><span class="n">e</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"Something to do"</span><span class="p">));</span>
<span class="w"> </span><span class="fm">assert!</span><span class="p">(</span><span class="n">todo_items</span><span class="p">.</span><span class="n">iter</span><span class="p">().</span><span class="n">any</span><span class="p">(</span><span class="o">|</span><span class="n">e</span><span class="o">|</span><span class="w"> </span><span class="n">e</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"Something else to do"</span><span class="p">));</span>
<span class="w"> </span><span class="fm">assert_eq!</span><span class="p">(</span><span class="n">todo_items</span><span class="p">.</span><span class="n">len</span><span class="p">(),</span><span class="w"> </span><span class="mi">2</span><span class="p">);</span>
<span class="w"> </span><span class="fm">assert!</span><span class="p">(</span><span class="n">done_items</span><span class="p">.</span><span class="n">iter</span><span class="p">().</span><span class="n">any</span><span class="p">(</span><span class="o">|</span><span class="n">e</span><span class="o">|</span><span class="w"> </span><span class="n">e</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"Something done"</span><span class="p">));</span>
<span class="w"> </span><span class="fm">assert_eq!</span><span class="p">(</span><span class="n">done_items</span><span class="p">.</span><span class="n">len</span><span class="p">(),</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
</pre></div> </div> </div><p>An implementation of the method <code>list</code> that passes the test <code>list_items</code> is</p><div class="code"><div class="content"><div class="highlight"><pre><span class="w"> </span><span class="k">fn</span> <span class="nf">list</span><span class="p">(</span><span class="o">&</span><span class="bp">self</span><span class="p">)</span><span class="w"> </span>->
<span class="p">(</span><span class="k">impl</span><span class="w"> </span><span class="nb">Iterator</span><span class="o"><</span><span class="n">Item</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&</span><span class="nb">String</span><span class="o">></span><span class="p">,</span><span class="w"> </span><span class="k">impl</span><span class="w"> </span><span class="nb">Iterator</span><span class="o"><</span><span class="n">Item</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&</span><span class="nb">String</span><span class="o">></span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="bp">self</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">iter</span><span class="p">().</span><span class="n">filter</span><span class="p">(</span><span class="o">|</span><span class="n">x</span><span class="o">|</span><span class="w"> </span><span class="o">*</span><span class="n">x</span><span class="p">.</span><span class="mi">1</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">true</span><span class="p">).</span><span class="n">map</span><span class="p">(</span><span class="o">|</span><span class="n">x</span><span class="o">|</span><span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="mi">0</span><span class="p">),</span>
<span class="w"> </span><span class="bp">self</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">iter</span><span class="p">().</span><span class="n">filter</span><span class="p">(</span><span class="o">|</span><span class="n">x</span><span class="o">|</span><span class="w"> </span><span class="o">*</span><span class="n">x</span><span class="p">.</span><span class="mi">1</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">false</span><span class="p">).</span><span class="n">map</span><span class="p">(</span><span class="o">|</span><span class="n">x</span><span class="o">|</span><span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="mi">0</span><span class="p">),</span>
<span class="w"> </span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
</pre></div> </div> </div><p>Here, the powerful keyword <code>impl</code> declares that each element of the returned tuple is some type that implements the <code>Iterator</code> trait with item type <code>&String</code>. The code uses <code>iter</code> [<a href="https://doc.rust-lang.org/std/collections/struct.HashMap.html#method.iter">docs</a>] to create an iterator on the elements of the hash map (item type <code>(&String, &bool)</code>). It then uses <code>filter</code> [<a href="https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter">docs</a>] to keep only the entries with the requested status, and <code>map</code> [<a href="https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.map">docs</a>] to extract the first element of each tuple. All in all, the function returns a tuple of two <code>Map</code> [<a href="https://doc.rust-lang.org/std/iter/struct.Map.html">docs</a>] values, and <code>Map</code> is a type that implements <code>Iterator</code>.</p><div class="admonition"><i class="fa fa-github"></i><div class="content"><div><a href="https://github.com/lgiordani/rust-todo-cli/tree/40293e0aab286505e67f3932c16c1367bf92560c">See the source code</a></div></div></div><h2 id="exposing-commands-on-the-cli-4eb5">Exposing commands on the CLI<a class="headerlink" href="#exposing-commands-on-the-cli-4eb5" title="Permanent link">¶</a></h2><p>It's time to expose on the CLI the methods I implemented. I realised that some commands like <code>add</code> and <code>mark-done</code> require a second argument (the key), while others like <code>list</code> don't.</p><p>So, the first change is to make the key argument optional.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="cp">#[derive(Parser)]</span>
<span class="k">struct</span> <span class="nc">Cli</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">command</span>: <span class="nb">String</span><span class="p">,</span>
<span class="hll"><span class="w"> </span><span class="n">key</span>: <span class="nb">Option</span><span class="o"><</span><span class="nb">String</span><span class="o">></span><span class="p">,</span>
</span><span class="p">}</span>
</pre></div> </div> </div><p>Purely to have something to play with, I will also add some values to the list in <code>main</code>. This is temporary, until I implement a file storage mechanism.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Cli</span>::<span class="n">parse</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">todo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">TodoList</span>::<span class="n">new</span><span class="p">();</span>
<span class="hll"><span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="s">"Something to do"</span><span class="p">.</span><span class="n">to_string</span><span class="p">());</span>
</span><span class="hll"><span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="s">"Something else to do"</span><span class="p">.</span><span class="n">to_string</span><span class="p">());</span>
</span><span class="hll"><span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="s">"Something done"</span><span class="p">.</span><span class="n">to_string</span><span class="p">());</span>
</span><span class="hll"><span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">mark</span><span class="p">(</span><span class="s">"Something done"</span><span class="p">.</span><span class="n">to_string</span><span class="p">(),</span><span class="w"> </span><span class="kc">false</span><span class="p">).</span><span class="n">unwrap</span><span class="p">();</span>
</span><span class="p">}</span>
</pre></div> </div> </div><p>Last comes the binding between commands and methods. A <code>match</code> construct is a natural fit for this, something like</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">match</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">command</span><span class="p">.</span><span class="n">as_str</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"add"</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">..</span><span class="p">.,</span>
<span class="w"> </span><span class="s">"mark-done"</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">..</span><span class="p">.,</span>
<span class="w"> </span><span class="s">"list"</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">..</span><span class="p">.</span>
<span class="p">}</span>
</pre></div> </div> </div><div class="infobox"><i class="fa fa-"></i><div class="title">match</div><div><p>The <code>match</code> control flow construct is a blessing that comes directly from functional programming, where pattern matching is an important tool. The Rust book has <a href="https://doc.rust-lang.org/book/ch06-02-match.html#the-match-control-flow-construct">a chapter dedicated to it</a> and <a href="https://doc.rust-lang.org/book/ch18-03-pattern-syntax.html">a chapter on the pattern syntax</a>.</p></div></div><p>However, since each method has a different return type, I need the whole construct to return a uniform <code>Result</code> that can be used to print a meaningful state message at the end of the execution.</p><p>The code I wrote is the following</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">..</span><span class="p">.</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">command</span><span class="p">.</span><span class="n">as_str</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"add"</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">key</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">Some</span><span class="p">(</span><span class="n">key</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">key</span><span class="p">);</span>
<span class="w"> </span><span class="nb">Ok</span><span class="p">(())</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">None</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="nb">Err</span><span class="p">(</span><span class="s">"Key cannot be empty!"</span><span class="p">.</span><span class="n">to_string</span><span class="p">()),</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s">"mark-done"</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">key</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">Some</span><span class="p">(</span><span class="n">key</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">todo</span>
<span class="w"> </span><span class="p">.</span><span class="n">mark</span><span class="p">(</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="n">map_err</span><span class="p">(</span><span class="o">|</span><span class="n">e</span><span class="o">|</span><span class="w"> </span><span class="fm">format!</span><span class="p">(</span><span class="s">"Invalid key {}"</span><span class="p">,</span><span class="w"> </span><span class="n">e</span><span class="p">))</span>
<span class="w"> </span><span class="p">.</span><span class="n">and</span><span class="p">(</span><span class="nb">Ok</span><span class="p">(())),</span>
<span class="w"> </span><span class="nb">None</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="nb">Err</span><span class="p">(</span><span class="s">"Key cannot be empty!"</span><span class="p">.</span><span class="n">to_string</span><span class="p">()),</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s">"list"</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="p">(</span><span class="n">todo_items</span><span class="p">,</span><span class="w"> </span><span class="n">done_items</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">todo</span><span class="p">.</span><span class="n">list</span><span class="p">();</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">"# TO DO"</span><span class="p">);</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">();</span>
<span class="w"> </span><span class="n">todo_items</span><span class="p">.</span><span class="n">for_each</span><span class="p">(</span><span class="o">|</span><span class="n">x</span><span class="o">|</span><span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">" * {}"</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">));</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">();</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">"# DONE"</span><span class="p">);</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">();</span>
<span class="w"> </span><span class="n">done_items</span><span class="p">.</span><span class="n">for_each</span><span class="p">(</span><span class="o">|</span><span class="n">x</span><span class="o">|</span><span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">" * {}"</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">));</span>
<span class="w"> </span><span class="nb">Ok</span><span class="p">(())</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">cmd</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="nb">Err</span><span class="p">(</span><span class="fm">format!</span><span class="p">(</span><span class="s">"Command {} not recognised"</span><span class="p">,</span><span class="w"> </span><span class="n">cmd</span><span class="p">)),</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">Err</span><span class="p">(</span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">"ERROR: {}"</span><span class="p">,</span><span class="w"> </span><span class="n">e</span><span class="p">),</span>
<span class="w"> </span><span class="nb">Ok</span><span class="p">(</span><span class="n">_</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">"SUCCESS"</span><span class="p">),</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><div class="infobox"><i class="fa fa-"></i><div class="title">Option and Result</div><div><p>It's paramount to learn how to convert <code>Option</code> [<a href="https://doc.rust-lang.org/std/option/enum.Option.html">docs</a>] into <code>Result</code> [<a href="https://doc.rust-lang.org/std/result/enum.Result.html">docs</a>] and vice versa, as well as how to convert a <code>Result</code> type into a different one. Being familiar with functions like <code>map_err</code> [<a href="https://doc.rust-lang.org/std/result/enum.Result.html#method.map_err">docs</a>] or <code>and</code> [<a href="https://doc.rust-lang.org/std/result/enum.Result.html#method.and">docs</a>] will drastically change the quality of your Rust code.</p></div></div><div class="admonition"><i class="fa fa-github"></i><div class="content"><div><a href="https://github.com/lgiordani/rust-todo-cli/tree/89cccf4d64c7c998b6f254d3af2f68557e58759b">See the source code</a></div></div></div><h2 id="tidy-up-99b9">Tidy up<a class="headerlink" href="#tidy-up-99b9" title="Permanent link">¶</a></h2><p>At this point I went through the code and fixed some of the warning the compiler was still giving me. These all come from the tests, where I created the <code>todo</code> variable but never used it, and where I ignored the results returned by calls of <code>todo.mark</code>. There, I used <code>unwrap</code> [<a href="https://doc.rust-lang.org/std/result/enum.Result.html#method.unwrap">docs</a>] as I'm happy for that to panic if something goes wrong.</p><div class="admonition"><i class="fa fa-github"></i><div class="content"><div><a href="https://github.com/lgiordani/rust-todo-cli/tree/39d406d483bc79fff262211beff473027cd02794">See the source code</a></div></div></div><h2 id="final-words-9803">Final words<a class="headerlink" href="#final-words-9803" title="Permanent link">¶</a></h2><p>What a journey so far! 
It's really true that you can't consider a language learned until you start from scratch and try to use it to implement a real application. Well, it's not over yet: I'm still missing an important part, which is file storage.</p><p>If you have comments, suggestions, or corrections, please let me know! I am more than happy to learn something new from other coders and to publish updates to the post.</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>Design and implement a flexible VPC on AWS2023-10-16T11:00:00+01:002023-10-16T11:00:00+01:00Leonardo Giordanitag:www.thedigitalcatonline.com,2023-10-16:/blog/2023/10/16/design-and-implement-a-flexible-vpc-on-aws/<p>An example of network design with an implementation using AWS VPC</p><p>Designing networks is not an easy task. Cloud computing removes the hassle (and also a bit of the fun) of moving switches and cables around, but it leaves untouched the complexity of planning a good structure. But what does "good structure" mean?</p><p>I think this is a crucial question in engineering. A well-designed (or well-architected) system cannot be defined once and for all, because its nature, structure, and components depend on the requirements. Hence the usual answer: "it depends".</p><p>In this post I want to give an example of some business requirements and of the structure of a network that might satisfy them.</p><p>For the implementation I will work with <a href="https://aws.amazon.com/vpc/">AWS VPC</a>, which has several advantages. First of all, AWS is one of the major cloud providers, and this might help beginners to better understand how it works. 
Second, most of the components of a VPC are free of charge, which means that anyone can apply the structure I will show without having to pay. The only component in this design that AWS will charge you for is a NAT gateway. Its hourly price is around 0.045 $ (the exact figure depends on the region, and the traffic it processes is billed separately). This means that with a single dollar you can enjoy a trip on a well-architected VPC for approximately 22 hours. After that time you can always remove the NAT gateway and keep working on the free part of your VPC.</p><h2 id="a-quick-recap-of-ip-and-cidrs-4f94">A quick recap of IP and CIDRs<a class="headerlink" href="#a-quick-recap-of-ip-and-cidrs-4f94" title="Permanent link">¶</a></h2><p>You should be familiar with the IP protocol to use VPC effectively, but if your knowledge is rusty, here is a quick recap.</p><p>IPv4 addresses are made of 32 bits, thus spanning 2<sup>32</sup> values, between 0 and 4,294,967,295 (2<sup>32</sup>-1). To simplify their usage, we split IPv4 addresses into 4 chunks of 8 bits (octets) and convert each into a decimal number between 0 and 255 (2<sup>8</sup>-1). The classic form of an IPv4 address is thus <code>A.B.C.D</code>, e.g. <code>1.2.3.4</code>, <code>134.32.175.52</code>, <code>255.255.0.0</code>.</p><p>When considering ranges of addresses, giving the first and last address can be tedious and difficult to read. The CIDR notation was introduced to simplify this. A CIDR (Classless Inter-Domain Routing) block is expressed in the form <code>A.B.C.D/N</code>, where <code>N</code> is a number of bits between 0 and 32 that represents how many leading bits of the address remain fixed. A CIDR like <code>134.73.28.196/32</code> represents only the address <code>134.73.28.196</code>, as 32 bits out of 32 are fixed. 
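</p><p>To make the definition concrete, here is a minimal Rust sketch of the computation. It uses only the standard library type <code>std::net::Ipv4Addr</code>, and the function name <code>cidr_range</code> is a hypothetical helper, not part of AWS or of any IP tool. The idea is to build a mask with <code>N</code> leading ones. AND-ing the address with the mask clears the variable bits and yields the first address, while OR-ing the result with the inverted mask sets them and yields the last one.</p>

```rust
use std::net::Ipv4Addr;

/// Returns the first address, the last address and the number of addresses
/// covered by the CIDR `addr/prefix_len` (hypothetical helper).
fn cidr_range(addr: Ipv4Addr, prefix_len: u32) -> (Ipv4Addr, Ipv4Addr, u64) {
    assert!(prefix_len <= 32, "prefix length must be between 0 and 32");
    // Mask with `prefix_len` leading ones. checked_shl returns None for a
    // shift by 32 (prefix_len == 0), in which case no bit is fixed.
    let mask = u32::MAX.checked_shl(32 - prefix_len).unwrap_or(0);
    let bits = u32::from(addr);
    let first = bits & mask; // variable bits set to 0
    let last = first | !mask; // variable bits set to 1
    let count = 1u64 << (32 - prefix_len);
    (Ipv4Addr::from(first), Ipv4Addr::from(last), count)
}

fn main() {
    let examples = [
        (Ipv4Addr::new(134, 73, 28, 196), 32),
        (Ipv4Addr::new(153, 23, 95, 34), 24),
        (Ipv4Addr::new(172, 16, 0, 0), 12),
        (Ipv4Addr::new(0, 0, 0, 0), 0),
    ];
    for (addr, prefix_len) in examples {
        let (first, last, count) = cidr_range(addr, prefix_len);
        println!("{addr}/{prefix_len}: {first} - {last} ({count} addresses)");
    }
}
```

<p>For instance, <code>153.23.95.34/24</code> comes out as <code>153.23.95.0 - 153.23.95.255</code> (256 addresses). The final <code>34</code> has no effect on the range.</p><p>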
Conversely, the CIDR <code>0.0.0.0/0</code> represents all IPv4 addresses, as 0 bits out of 32 are fixed.</p><p>The range of addresses corresponding to a CIDR is in general not easy to compute manually, but the ones whose prefix length is a multiple of 8 (that is, aligned with the octets) are trivial:</p><ul><li>The CIDR <code>A.B.C.D/32</code> corresponds to the address <code>A.B.C.D</code>.</li><li>The CIDR <code>A.B.C.0/24</code> corresponds to the addresses between <code>A.B.C.0</code> and <code>A.B.C.255</code> (256 addresses, 2<sup>32-24</sup> or 2<sup>8</sup>). Here, the first 24 bits (the 3 octets <code>A</code>, <code>B</code>, and <code>C</code>) are fixed.</li><li>The CIDR <code>A.B.0.0/16</code> corresponds to the addresses between <code>A.B.0.0</code> and <code>A.B.255.255</code> (65,536 addresses, 2<sup>32-16</sup> or 2<sup>16</sup>). Here, the first 16 bits (the 2 octets <code>A</code> and <code>B</code>) are fixed.</li><li>The CIDR <code>A.0.0.0/8</code> corresponds to the addresses between <code>A.0.0.0</code> and <code>A.255.255.255</code> (16,777,216 addresses, 2<sup>32-8</sup> or 2<sup>24</sup>). Here, the first 8 bits (the octet <code>A</code>) are fixed.</li></ul><p>Please note that by convention we set the variable octets to 0. The CIDR <code>A.B.C.0/24</code> is exactly the same as the CIDR <code>A.B.C.D/24</code>, as the octet <code>D</code> is not fixed. For this reason, setting it is useless and potentially confusing. For example, I would never write <code>153.23.95.34/24</code>, as this means all addresses between <code>153.23.95.0</code> and <code>153.23.95.255</code>, so the final <code>34</code> is just misleading. <code>153.23.95.0/24</code> is much better in this case.</p><p>You can use the <a href="https://jodies.de/ipcalc">IP Calculator</a> by Krischan Jodies to explore CIDRs.</p><p>As the number of IPv4 addresses quickly proved to be insufficient, we developed IPv6, but in the meantime we also created private network spaces. 
In IPv4 there are 3 different ranges of addresses that are considered "private", which means that different networks can reuse them and that they are not reachable from the Internet. The difference between public and private addresses is the same as the one between "London, UK" (there is only one in the world) and "kitchen" (every house has one).</p><p>The three private ranges in IPv4 are:</p><ul><li><code>192.168.0.0/16</code> - 65,536 addresses between <code>192.168.0.0</code> and <code>192.168.255.255</code></li><li><code>172.16.0.0/12</code> - 1,048,576 addresses between <code>172.16.0.0</code> and <code>172.31.255.255</code> (this is not easily computed manually because 12 is not a multiple of 8)</li><li><code>10.0.0.0/8</code> - 16,777,216 addresses between <code>10.0.0.0</code> and <code>10.255.255.255</code></li></ul><p>This means that IP addresses like <code>192.168.6.1</code>, <code>172.17.123.45</code>, and <code>10.34.168.20</code> are all private. Pay attention to the second range, as it goes from <code>172.16</code> to <code>172.31</code>, so an address like <code>172.32.123.45</code> is definitely public.</p><p>Now that your knowledge of IP addresses has been fully restored, we can dive into network design.</p><h2 id="requirements-dd57">Requirements<a class="headerlink" href="#requirements-dd57" title="Permanent link">¶</a></h2><p>As I mentioned in the introduction, the most important part of a system design is its requirements, both present and future.</p><p>If you design a road, it is crucial to understand how many vehicles will travel on it per minute (or per hour, day, month) and the type of vehicle. I'm not an expert in highway engineering, but I'm sure a road for mining trucks has to be different from a cycle lane, and the same is true for a computer system. 
Surely you want to store information in a database, but its size and type depend on the amount of data you have, the usage pattern, the required reliability, and so on.</p><p>We are designing a network that will host cloud computing resources such as computing instances, databases, load balancers, and so on. I will collectively refer to them as <em>resources</em> or <em>instances</em>, without paying too much attention to the concrete nature of each of them. From the networking point of view they are all just a bunch of network cards.</p><p>As an example, we have the following business requirements for a company called ZooSoft:</p><ul><li>There are currently <strong>three main products</strong>: Alligator Accounting, Barracuda Blogging, Coyote CAD.</li><li>There might be <strong>more products</strong> in the future: we are in the first design stages of Dragonfly Draw and Echidna Email.</li><li>We need <strong>four environments</strong> for each product: Live, Staging, Demo, UAT.<ul><li>Live is the application accessed by clients</li><li>Staging is a clone of Live that is used to run extensive pre-release tests and to perform initial support debugging</li><li>Demo runs the application with the same configuration as Live but with fake data, used to showcase the application to new customers</li><li>UAT contains on-demand instances used by developers and QA to test new features</li></ul></li><li>Some data or services are <strong>shared among the products</strong>, and the infrastructure team needs a space in which to deploy their tools.</li></ul><h2 id="initial-analysis-c228">Initial analysis<a class="headerlink" href="#initial-analysis-c228" title="Permanent link">¶</a></h2><p>As you see, I highlighted some of the most important points we need to keep in mind.</p><ul><li><strong>There are currently 3 products</strong>. Not a single one, not a hundred. 
It is important to understand this number because we probably want to have separate spaces for each product, with different teams working on each one. If the company had a single product, we might expect it to create a new one in the future, but it might not be that urgent to have space to grow. On the other hand, if the company already had 100 products, we might want to design things with a completely different approach.</li><li><strong>There might be more products in the future</strong>. Again, it is important to have a good idea of the future requirements, as most of the problems of a system arise when the usage patterns change. It's generally a good idea to leave space for growth, but overdoing it might lead to a waste of resources and ultimately money. Understanding the growth expectation is paramount to finding a good balance between inflexibility and waste of resources.</li><li><strong>There are 4 different usage patterns for each application</strong>, each one with its own requirements. The Live environment clearly needs a lot of power and redundancy to provide a stable service for users, while environments like UAT and Demo will certainly have more relaxed parameters in terms of availability or reliability.</li><li><strong>We need space to deploy internal tools</strong> used to monitor the application and to test new solutions. The architecture of the application might change in the future, so we need space to try out different structures and products.</li></ul><p>In general, it's a good idea to <strong>isolate anything that doesn't need to be shared</strong> across teams or products, as it reduces the risk of errors and exposes fewer resources to attacks. In AWS, accounts allow us to completely separate environments at the infrastructure level. 
Resources in separate accounts can still communicate, but this requires a certain amount of work to set up the connection, which ultimately promotes isolation.</p><p>So, the initial idea might be to give each product a different account. However, we also have 4 different environments for each product. Given how simple it is to create an AWS account, it sounds like a good idea to have one for each combination of product and environment. AWS provides a tool called <a href="https://aws.amazon.com/controltower/">Control Tower</a> that can greatly simplify the creation and management of accounts, which makes this choice even more reasonable.</p><p>A VPC (Virtual Private Cloud) is, as the name suggests, a private network that allows different products to use the same IP address pool without clashing, which is nothing new to anyone who is familiar with private IP address spaces. This means that we could easily create in each account a VPC with a CIDR <code>10.0.0.0/8</code> that grants 2<sup>24</sup> (more than 16 million) different IP addresses, plenty to host instances and databases for any type of application.</p><p>However, it might be useful in the future to connect different VPCs, for example to perform data migrations, and this is done in AWS through <a href="https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html">VPC peering</a>. In simple words, peering connects two VPCs so that their resources can reach each other using private addresses, as if they were on the same network. However, it can't be set up if the two VPCs have overlapping CIDR blocks. 
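</p><p>Whether two blocks overlap can be checked mechanically. The following is a minimal Rust sketch that assumes each VPC has a single IPv4 CIDR. The <code>Cidr</code> type and its methods are hypothetical names, and only <code>std::net::Ipv4Addr</code> comes from the standard library. A CIDR block is a contiguous range of addresses, so two blocks overlap exactly when each one starts before the other one ends.</p>

```rust
use std::net::Ipv4Addr;

/// A CIDR block such as 10.10.0.0/16 (hypothetical helper type).
#[derive(Debug, Clone, Copy)]
struct Cidr {
    addr: Ipv4Addr,
    prefix_len: u32,
}

impl Cidr {
    fn new(addr: Ipv4Addr, prefix_len: u32) -> Self {
        assert!(prefix_len <= 32, "prefix length must be between 0 and 32");
        Cidr { addr, prefix_len }
    }

    /// Mask with `prefix_len` leading ones (0 for a /0 block).
    fn mask(&self) -> u32 {
        u32::MAX.checked_shl(32 - self.prefix_len).unwrap_or(0)
    }

    /// First and last address of the block, as integers.
    fn bounds(&self) -> (u32, u32) {
        let first = u32::from(self.addr) & self.mask();
        (first, first | !self.mask())
    }

    /// True if the two blocks share at least one address.
    fn overlaps(&self, other: &Cidr) -> bool {
        let (a_first, a_last) = self.bounds();
        let (b_first, b_last) = other.bounds();
        a_first <= b_last && b_first <= a_last
    }
}

fn main() {
    let live = Cidr::new(Ipv4Addr::new(10, 10, 0, 0), 16);
    let staging = Cidr::new(Ipv4Addr::new(10, 30, 0, 0), 16);
    let whole = Cidr::new(Ipv4Addr::new(10, 0, 0, 0), 8);

    // Disjoint /16 blocks: addressing does not prevent peering.
    println!("{}", live.overlaps(&staging)); // false
    // A /8 contains every 10.N.0.0/16, so it overlaps all of them.
    println!("{}", live.overlaps(&whole)); // true
}
```

<p>The comparison works on plain integers because <code>Ipv4Addr</code> converts to and from <code>u32</code> without loss. A non-overlapping result only removes one obstacle to peering; it does not configure anything in AWS.</p><p>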
This means that while we keep VPCs separate in different accounts, we might also want to assign different CIDRs to each one.</p><p>Avoiding overlap clearly reduces the size of a VPC, so let's look at some figures to get an idea of what we can create.</p><p>If we assign to each account a CIDR <code>10.X.0.0/16</code>, with X being a number assigned to the specific account, we can create up to 256 accounts with non-overlapping CIDRs (from <code>10.0.0.0/16</code> to <code>10.255.0.0/16</code>). Out of an abundance of caution, we might reserve the first 10 CIDRs for future use and internal needs, which leaves us with 246 non-overlapping CIDRs (from <code>10.10.0.0/16</code> to <code>10.255.0.0/16</code>). This means that we have space for several combinations of product and environment. For example, we might have up to 41 products with 6 environments each (41 × 6 = 246) or 30 products with 8 environments each (240 CIDRs, with 6 left over).</p><p>Since at the moment we have 3 products with 4 environments each, this choice looks reasonable. At the same time, a <code>/16</code> CIDR grants us space for 2<sup>16</sup> (65,536) addresses, which again looks more than enough to host a standard web application.</p><h2 id="assignment-plan-3693">Assignment plan<a class="headerlink" href="#assignment-plan-3693" title="Permanent link">¶</a></h2><p>To simplify the schema, let's grant space for 20 products and group CIDRs by environment. This means we will have 20 CIDRs for Live environments, 20 for Staging, and so on. The assignment plan is then</p><div class="code"><div class="content"><div class="highlight"><pre>10.0.0.0/16 reserved
...
10.9.0.0/16 reserved
10.10.0.0/16 alligator-accounting-live
10.11.0.0/16 barracuda-blogging-live
10.12.0.0/16 coyote-cad-live
...
10.30.0.0/16 alligator-accounting-staging
10.31.0.0/16 barracuda-blogging-staging
10.32.0.0/16 coyote-cad-staging
...
10.50.0.0/16 alligator-accounting-demo
10.51.0.0/16 barracuda-blogging-demo
10.52.0.0/16 coyote-cad-demo
...
10.70.0.0/16 alligator-accounting-uat
10.71.0.0/16 barracuda-blogging-uat
10.72.0.0/16 coyote-cad-uat
...
10.250.0.0/16 infrastructure-team
...
10.255.0.0/16 infrastructure-team
</pre></div> </div> </div><p>As I mentioned, the initial CIDRs are reserved for future use, and we also kept the final 6 CIDRs for the needs of the infrastructure team. Keep in mind that this is only an example and that we are clearly free to change any of these figures to match our needs more closely. Each one of these CIDRs will be assigned to a specific account.</p><p>Should we create new products, we will continue with the same pattern, e.g.</p><div class="code"><div class="content"><div class="highlight"><pre>10.0.0.0/16 reserved
...
10.10.0.0/16 alligator-accounting-live
10.11.0.0/16 barracuda-blogging-live
10.12.0.0/16 coyote-cad-live
<span class="hll">10.13.0.0/16 dragonfly-draw-live
</span>...
10.30.0.0/16 alligator-accounting-staging
10.31.0.0/16 barracuda-blogging-staging
10.32.0.0/16 coyote-cad-staging
<span class="hll">10.33.0.0/16 dragonfly-draw-staging
</span>...
10.50.0.0/16 alligator-accounting-demo
10.51.0.0/16 barracuda-blogging-demo
10.52.0.0/16 coyote-cad-demo
<span class="hll">10.53.0.0/16 dragonfly-draw-demo
</span>...
10.70.0.0/16 alligator-accounting-uat
10.71.0.0/16 barracuda-blogging-uat
10.72.0.0/16 coyote-cad-uat
<span class="hll">10.73.0.0/16 dragonfly-draw-uat
</span>...
10.250.0.0/16 infrastructure-team
...
</pre></div> </div> </div><p>We are planning to use IaC tools to implement this, but it's nevertheless interesting to spot the patterns in this schema that make it easier to debug network connections.</p><p>All environments of a certain type belong to a specific range, so an address like <code>10.15.123.45</code> is definitely in a Live environment (Live accounts use <code>10.10.0.0/16</code> to <code>10.29.0.0/16</code>). At the same time, all the addresses of a given product share the same final digit in the second octet, so if <code>10.12.45.78</code> is a Live instance, the corresponding Staging instance will have an address like <code>10.32.X.Y</code>.</p><p>While this is not crucial, I wouldn't underestimate the value of a regular structure that gives precious information at a glance. When debugging during an emergency, details like this might be a blessing.</p><p>The last thing to note is that in this schema the 160 CIDRs between <code>10.90.0.0/16</code> and <code>10.249.0.0/16</code> are not allocated. This might give you a better idea of how wide a <code>10.0.0.0/8</code> network space is! These CIDRs can host up to 8 more environments for each product (160 / 20 = 8).</p><h2 id="address-space-bad2">Address space<a class="headerlink" href="#address-space-bad2" title="Permanent link">¶</a></h2><p>Let's focus on a single CIDR in the form <code>10.N.0.0/16</code>. As we know, this provides 65536 addresses (2<sup>16</sup>) that we need to split into subnets. In AWS, each subnet lives in a single Availability Zone, and Availability Zones are "distinct locations within an AWS Region that are engineered to be isolated from failures in other Availability Zones" (from the docs). In other words, they are separate data centres built so that if one blows up the others should be unaffected. 
I guess this depends on the size of the explosion, but within reason this is the idea.</p><p>So, each account gets 65536 addresses (<code>/16</code>), split into:</p><ul><li>1 public subnet for the NAT gateway to live in (<code>nat</code>).</li><li>3 private subnets for the computing resources (<code>private_a</code>, <code>private_b</code>, <code>private_c</code>).</li><li>3 public subnets for the load balancer (<code>public_a</code>, <code>public_b</code>, <code>public_c</code>).</li><li>1 public subnet for the bastion instance (<code>bastion</code>).</li></ul><p>If you are not familiar with subnets, they are one of the simplest concepts around. You take the address space of a network (for example <code>10.10.0.0/16</code>, that is the addresses from <code>10.10.0.0</code> to <code>10.10.255.255</code>) and split it into chunks. The fact that each chunk is assigned to a different data centre is an AWS addition and is not, in principle, part of the definition of a subnet. However, the reason behind subnetting is precisely to create small <em>physical</em> networks that are more efficient: if two computers are on the same subnet, the routing of the IP packets they exchange is simpler and thus faster. For similar reasons, and to increase security, it's a good idea to keep your subnets as small as reasonably possible.</p><p>In this case, we might create subnets in a <code>/23</code> space (512 addresses each), which looks wide enough to host ZooSoft's web applications. Before we look at the actual figures, let me clarify what this means. I assume each application (Alligator Accounting, Barracuda Blogging, and so on) has been containerised, for example using ECS or EKS, which means that there are EC2 instances behind the scenes running the containers. 
If we use Fargate, we do not provision EC2 instances, and in that case we might set up our network differently.</p><p>EC2 instances are computers, and each of them has at least one network interface with an IP address. So, when I say that a subnet contains 512 addresses I mean that in a single subnet I can run up to 507 EC2 instances (AWS reserves 5 addresses in each subnet, see <a href="https://docs.aws.amazon.com/vpc/latest/userguide/subnet-sizing.html">https://docs.aws.amazon.com/vpc/latest/userguide/subnet-sizing.html</a>). Assuming instances with 8 GiB of memory each (e.g. <code>m7g.large</code>) and containers that require 1 GiB of memory, we can comfortably host 3042 containers (507 × 6), leaving 2 GiB on each instance for newly created containers (for example to run blue-green deployments). These are just examples and you have to adapt them to the requirements of your specific application, but I hope you get an idea of how to roughly estimate this sort of quantity.</p><p>Remember that in AWS the difference between public and private subnets lies only in the gateway they are routed to. Public subnets are routed to an Internet Gateway and are thus reachable from the Internet, while private subnets are either disconnected from the Internet or routed through a NAT, which allows them to reach the Internet but not to be reached from outside.</p><p>The <code>bastion</code> subnet might or might not be useful. In general, <code>bastion</code> hosts are hardened instances that can be accessed using SSH and from which you can reach the rest of the instances. Since from the point of view of security they are a weak point of the whole infrastructure, you might prefer not to have them at all or to replace them with more ephemeral solutions. In any case, I left the subnet there as an example of a space that hosts tools not directly connected with the application.</p><p>Let's have a deeper look at the figures. 
A <code>/16</code> space can be split into 128 <code>/23</code> spaces (2<sup>23-16</sup>), but given the list of subnets I showed before we need only 8 of them. This leaves a lot of space for further expansion, and there are two types of expansion we might consider: increasing the number of subnets, and increasing the size of the subnets themselves. The current size of the network gives us plenty of options to cover both cases. A good balance between the size and the number of subnets is to reserve for each subnet a potential size of <code>/21</code> (2048 addresses), which still gives us space for 32 subnets (2<sup>21-16</sup>). Each subnet starts as the first <code>/23</code> of its <code>/21</code> slot and can later grow to fill the whole slot without overlapping its neighbours.</p><p>Here is a possible schema for the account <code>alligator-accounting-live</code>, which is granted the space <code>10.10.0.0/16</code>.</p><div class="code"><div class="content"><div class="highlight"><pre>NAME CIDR ADDRESSES NUM ADDRESSES
reserved 10.10.0.0/21 (10.10.0.0 - 10.10.7.255) {2048}
nat 10.10.8.0/23 (10.10.8.0 - 10.10.9.255) {512}
expandable to
10.10.8.0/21 (10.10.8.0 - 10.10.15.255) {2048}
PUBLIC
reserved 10.10.16.0/21 (10.10.16.0 - 10.10.23.255) {2048}
reserved 10.10.24.0/21 (10.10.24.0 - 10.10.31.255) {2048}
private-a 10.10.32.0/23 (10.10.32.0 - 10.10.33.255) {512}
expandable to
10.10.32.0/21 (10.10.32.0 - 10.10.39.255) {2048}
private-b 10.10.40.0/23 (10.10.40.0 - 10.10.41.255) {512}
expandable to
10.10.40.0/21 (10.10.40.0 - 10.10.47.255) {2048}
private-c 10.10.48.0/23 (10.10.48.0 - 10.10.49.255) {512}
expandable to
10.10.48.0/21 (10.10.48.0 - 10.10.55.255) {2048}
reserved 10.10.56.0/21 (10.10.56.0 - 10.10.63.255) {2048}
RESERVED PRIVATE 4
reserved 10.10.64.0/21 (10.10.64.0 - 10.10.71.255) {2048}
RESERVED PRIVATE 5
...
reserved 10.10.128.0/21 (10.10.128.0 - 10.10.135.255) {2048}
RESERVED PRIVATE 13
public-a 10.10.136.0/23 (10.10.136.0 - 10.10.137.255) {512}
expandable to
10.10.136.0/21 (10.10.136.0 - 10.10.143.255) {2048}
PUBLIC
public-b 10.10.144.0/23 (10.10.144.0 - 10.10.145.255) {512}
expandable to
10.10.144.0/21 (10.10.144.0 - 10.10.151.255) {2048}
PUBLIC
public-c 10.10.152.0/23 (10.10.152.0 - 10.10.153.255) {512}
expandable to
10.10.152.0/21 (10.10.152.0 - 10.10.159.255) {2048}
PUBLIC
reserved 10.10.160.0/21 (10.10.160.0 - 10.10.167.255) {2048}
RESERVED PUBLIC 4
reserved 10.10.168.0/21 (10.10.168.0 - 10.10.175.255) {2048}
RESERVED PUBLIC 5
...
reserved 10.10.232.0/21 (10.10.232.0 - 10.10.239.255) {2048}
RESERVED PUBLIC 13
bastion 10.10.240.0/23 (10.10.240.0 - 10.10.241.255) {512}
expandable to
10.10.240.0/21 (10.10.240.0 - 10.10.247.255) {2048}
PUBLIC
reserved 10.10.248.0/21 (10.10.248.0 - 10.10.255.255) {2048}
</pre></div> </div> </div><h2 id="routing-4daa">Routing<a class="headerlink" href="#routing-4daa" title="Permanent link">¶</a></h2><p>Routing of each VPC is very simple:</p><ul><li>All resources in the private subnets will be routed into the NAT, which grants them Internet access while isolating them from outside.</li><li>All resources in the public subnets will be routed into the default Internet Gateway.</li><li>The NAT subnet has to be public, so it is routed into the default Internet Gateway.</li><li>The bastion subnet is public, so it is routed into the default Internet Gateway.</li></ul><p>The NAT is a device that translates network addresses: it replaces the private source address of outgoing packets with its own public one and maps the replies back, so the internal addresses are never exposed. This means that it has to live in a public subnet so that it can reach the Internet.</p><p>The bastion (if present) is a machine that can be accessed from the Internet, so it has to be in a public subnet. It's customary to restrict access to the bastion to a specific set of IPs (e.g. the personal IPs of some developers), but this is done through Security Groups.</p><h2 id="relevant-figures-92bd">Relevant figures<a class="headerlink" href="#relevant-figures-92bd" title="Permanent link">¶</a></h2><p>In summary, the current design grants us the following:</p><ul><li>246 non-overlapping accounts</li><li>20 different products, each with up to 12 environments</li><li>32 subnets for each account</li><li>512 addresses per subnet, upgradable to 2048 without overlapping</li></ul><h2 id="a-simple-terraform-module-d757">A simple Terraform module<a class="headerlink" href="#a-simple-terraform-module-d757" title="Permanent link">¶</a></h2><p>The following code is a simple Terraform module intended to showcase how to create a well-designed VPC with that tool. I decided to avoid complex loops or other clever hacks to keep it simple and accessible to anyone who might be taking their first steps with AWS, VPCs, network design, and Terraform. 
You are of course free to build on top of it and to come up with a different or cleverer implementation.</p><p>Remember that the NAT gateway (together with its Elastic IP) is the only resource in this module that is not free of charge, so don't leave it up and running if you don't use it. Don't be afraid of creating one and having a look at it in the AWS console, though.</p><p>I assume the following files are all created in the same directory, which I will conventionally call <code>modules/vpc</code>.</p><h3 id="vpc-2303">VPC</h3><div class="code"><div class="title"><code>modules/vpc/vpc.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_vpc"</span><span class="w"> </span><span class="nv">"main"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.0.0/16"</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.name</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><div class="code"><div class="title"><code>modules/vpc/variables.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">variable</span><span class="w"> </span><span class="nv">"cidr_prefix"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">description</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"The first two octets of the CIDR, e.g. 10.10 (will become 10.10.0.0/16)"</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kt">string</span>
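
  # Optional addition, not part of the original module: a sketch of a
  # validation block that rejects values outside the 10.0.0.0/8 plan,
  # e.g. "10.10.0" or "192.168". It only checks the shape "10.N" with
  # 1 to 3 digits; it does not check that N is at most 255.
  validation {
    condition     = can(regex("^10\\.[0-9]{1,3}$", var.cidr_prefix))
    error_message = "The cidr_prefix must have the form 10.N, for example 10.10."
  }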
<span class="p">}</span>
<span class="kr">variable</span><span class="w"> </span><span class="nv">"name"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">description</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"The name of this VPC and the prefix/tag for its related resources"</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
</pre></div> </div> </div><p>When you call the module you will have to pass these two variables, e.g.</p><div class="code"><div class="title"><code>alligator-accounting-live/vpc/main.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">module</span><span class="w"> </span><span class="nv">"vpc"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"../../modules/vpc"</span>
<span class="w"> </span><span class="na">cidr_prefix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"10.10"</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"alligator-accounting-live"</span>
<span class="p">}</span>
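
# The other accounts follow the assignment plan and only change these two
# values. As an illustration, the staging account of the same product
# (a hypothetical alligator-accounting-staging/vpc/main.tf, mirroring the
# path above) would contain:
#
# module "vpc" {
#   source      = "../../modules/vpc"
#   cidr_prefix = "10.30"
#   name        = "alligator-accounting-staging"
# }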
</pre></div> </div> </div><h3 id="internet-gateway-e563">Internet Gateway </h3><div class="code"><div class="title"><code>modules/vpc/gateway.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_internet_gateway"</span><span class="w"> </span><span class="nv">"main"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.name</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><h3 id="subnets-03e5">Subnets</h3><div class="code"><div class="title"><code>modules/vpc/subnets.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_subnet"</span><span class="w"> </span><span class="nv">"nat"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.8.0/23"</span>
<span class="w"> </span><span class="na">availability_zone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"eu-west-1a"</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name}-nat"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_subnet"</span><span class="w"> </span><span class="nv">"private_a"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.32.0/23"</span>
<span class="w"> </span><span class="na">availability_zone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"eu-west-1a"</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name}-private-a"</span>
<span class="w"> </span><span class="na">Tier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"private"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_subnet"</span><span class="w"> </span><span class="nv">"private_b"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.40.0/23"</span>
<span class="w"> </span><span class="na">availability_zone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"eu-west-1b"</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name}-private-b"</span>
<span class="w"> </span><span class="na">Tier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"private"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_subnet"</span><span class="w"> </span><span class="nv">"private_c"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.48.0/23"</span>
<span class="w"> </span><span class="na">availability_zone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"eu-west-1c"</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name}-private-c"</span>
<span class="w"> </span><span class="na">Tier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"private"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_subnet"</span><span class="w"> </span><span class="nv">"public_a"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.136.0/23"</span>
<span class="w"> </span><span class="na">availability_zone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"eu-west-1a"</span>
<span class="w"> </span><span class="na">map_public_ip_on_launch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="no">true</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name}-public-a"</span>
<span class="w"> </span><span class="na">Tier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"public"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_subnet"</span><span class="w"> </span><span class="nv">"public_b"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.144.0/23"</span>
<span class="w"> </span><span class="na">availability_zone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"eu-west-1b"</span>
<span class="w"> </span><span class="na">map_public_ip_on_launch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="no">true</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name}-public-b"</span>
<span class="w"> </span><span class="na">Tier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"public"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_subnet"</span><span class="w"> </span><span class="nv">"public_c"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.152.0/23"</span>
<span class="w"> </span><span class="na">availability_zone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"eu-west-1c"</span>
<span class="w"> </span><span class="na">map_public_ip_on_launch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="no">true</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name}-public-c"</span>
<span class="w"> </span><span class="na">Tier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"public"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_subnet"</span><span class="w"> </span><span class="nv">"bastion"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.cidr_prefix}.240.0/23"</span>
<span class="w"> </span><span class="na">availability_zone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"eu-west-1a"</span>
<span class="w"> </span><span class="na">map_public_ip_on_launch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="no">true</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name}-bastion"</span>
<span class="w"> </span><span class="na">Tier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"bastion"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
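
# Optional sketch, not part of the original module: the same /23 blocks can
# be derived with Terraform's built-in cidrsubnet(prefix, newbits, netnum).
# The /16 is cut into 32 /21 slots (5 new bits), and each subnet takes the
# first /23 of its slot (2 more bits), so it can later grow to the whole /21
# without overlapping its neighbours. The local names are hypothetical; the
# slot numbers follow the schema (nat = 1, private = 4-6, public = 17-19,
# bastion = 30).
locals {
  vpc_cidr = "${var.cidr_prefix}.0.0/16"

  # With cidr_prefix = "10.10" these evaluate to 10.10.8.0/23,
  # 10.10.32.0/23, ..., 10.10.240.0/23, the same values used above
  subnet_cidrs = {
    nat       = cidrsubnet(cidrsubnet(local.vpc_cidr, 5, 1), 2, 0)
    private_a = cidrsubnet(cidrsubnet(local.vpc_cidr, 5, 4), 2, 0)
    private_b = cidrsubnet(cidrsubnet(local.vpc_cidr, 5, 5), 2, 0)
    private_c = cidrsubnet(cidrsubnet(local.vpc_cidr, 5, 6), 2, 0)
    public_a  = cidrsubnet(cidrsubnet(local.vpc_cidr, 5, 17), 2, 0)
    public_b  = cidrsubnet(cidrsubnet(local.vpc_cidr, 5, 18), 2, 0)
    public_c  = cidrsubnet(cidrsubnet(local.vpc_cidr, 5, 19), 2, 0)
    bastion   = cidrsubnet(cidrsubnet(local.vpc_cidr, 5, 30), 2, 0)
  }
}
# Widening a subnet later means using the whole slot instead, e.g.
# cidrsubnet(local.vpc_cidr, 5, 4) gives 10.10.32.0/21 for private_a.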
</pre></div> </div> </div><p>We need to associate the public subnets with the Internet Gateway.</p><div class="code"><div class="title"><code>modules/vpc/gateway.tf</code></div><div class="content"><div class="highlight"><pre><span class="p">[...]</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table"</span><span class="w"> </span><span class="nv">"main"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="nb">route</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"0.0.0.0/0"</span>
<span class="w"> </span><span class="na">gateway_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_internet_gateway.main.id</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name} Internet Gateway"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table_association"</span><span class="w"> </span><span class="nv">"public_a_to_igw"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.public_a.id</span>
<span class="w"> </span><span class="na">route_table_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_route_table.main.id</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table_association"</span><span class="w"> </span><span class="nv">"public_b_to_igw"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.public_b.id</span>
<span class="w"> </span><span class="na">route_table_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_route_table.main.id</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table_association"</span><span class="w"> </span><span class="nv">"public_c_to_igw"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.public_c.id</span>
<span class="w"> </span><span class="na">route_table_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_route_table.main.id</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table_association"</span><span class="w"> </span><span class="nv">"bastion_to_igw"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.bastion.id</span>
<span class="w"> </span><span class="na">route_table_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_route_table.main.id</span>
<span class="p">}</span>
</pre></div> </div> </div><h3 id="nat-gateway-3743">NAT Gateway</h3><p>We need a NAT to grant private networks access to the Internet.</p><div class="code"><div class="title"><code>modules/vpc/nat.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_eip"</span><span class="w"> </span><span class="nv">"nat_gateway"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="no">true</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.name</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
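
# Note: version 5 of the AWS provider deprecates the "vpc" argument of
# aws_eip in favour of "domain". With that provider version, the resource
# above could be written as follows (commented out here to avoid declaring
# the same resource name twice):
#
# resource "aws_eip" "nat_gateway" {
#   domain = "vpc"
#   tags = {
#     Name = var.name
#   }
# }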
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table_association"</span><span class="w"> </span><span class="nv">"nat_to_igw"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.nat.id</span>
<span class="w"> </span><span class="na">route_table_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_route_table.main.id</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_nat_gateway"</span><span class="w"> </span><span class="nv">"nat_gateway"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">allocation_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_eip.nat_gateway.id</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.nat.id</span>
<span class="w"> </span><span class="na">depends_on</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="nv">aws_internet_gateway.main</span><span class="p">]</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.name</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table"</span><span class="w"> </span><span class="nv">"nat_gateway"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_vpc.main.id</span>
<span class="w"> </span><span class="nb">route</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">cidr_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"0.0.0.0/0"</span>
<span class="w"> </span><span class="na">nat_gateway_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_nat_gateway.nat_gateway.id</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name} NAT Gateway"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table_association"</span><span class="w"> </span><span class="nv">"private_a_to_nat"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.private_a.id</span>
<span class="w"> </span><span class="na">route_table_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_route_table.nat_gateway.id</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table_association"</span><span class="w"> </span><span class="nv">"private_b_to_nat"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.private_b.id</span>
<span class="w"> </span><span class="na">route_table_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_route_table.nat_gateway.id</span>
<span class="p">}</span>
<span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_route_table_association"</span><span class="w"> </span><span class="nv">"private_c_to_nat"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_subnet.private_c.id</span>
<span class="w"> </span><span class="na">route_table_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_route_table.nat_gateway.id</span>
<span class="p">}</span>
</pre></div> </div> </div><h2 id="subnet-groups-bcd9">Subnet groups<a class="headerlink" href="#subnet-groups-bcd9" title="Permanent link">¶</a></h2><p>As an optional step, you might want to create <em>subnet groups</em> for RDS. In AWS, you can create RDS instances in public networks out of the box, but if you want to put them in a private network (and <em>you want</em> to put them there) you need to build a subnet group. See <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.WorkingWithRDSInstanceinaVPC.html">the documentation</a>.</p><div class="code"><div class="title"><code>modules/vpc/subnets.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_db_subnet_group"</span><span class="w"> </span><span class="nv">"rds_group"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"rds_private"</span>
<span class="w"> </span><span class="na">subnet_ids</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="nv">aws_subnet.private_a.id</span><span class="p">,</span>
<span class="w"> </span><span class="nv">aws_subnet.private_b.id</span><span class="p">,</span>
<span class="w"> </span><span class="nv">aws_subnet.private_c.id</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="nb">tags</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">Name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.name} RDS subnet group"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><h2 id="final-words-9803">Final words<a class="headerlink" href="#final-words-9803" title="Permanent link">¶</a></h2><p>I hope this was a useful and interesting trip into network design. This example might look trivial compared to what certain contexts require, but it is a solid setup that you can build on. I think VPC is often overlooked because developers are assumed to be familiar with networks. Since networking is a crucial part of any system and will pop up in other technologies like Docker or Kubernetes, I recommend that every mid-level or senior developer make sure they are familiar with the main concepts of IP. Happy learning!</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>Data Partitioning and Consistent Hashing2022-08-23T12:00:00+00:002022-08-23T12:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2022-08-23:/blog/2022/08/23/data-partitioning-and-consistent-hashing/<p>This post is an introduction to partitioning, a technique for distributed storage systems, and to consistent hashing, a specific partitioning algorithm that promotes an even distribution of data while allowing the number of servers to change dynamically and minimising the cost of the transition.</p><p>My interest in partitioning dates back to 2015, when I was following courses at MongoDB University and learned about <em>sharding</em>, the name MongoDB uses for partitioning. I was fascinated by the topic and discovered the technique known as <em>consistent hashing</em>. I enjoyed it so much that I wrote a little demo in Python to understand it better. 
Later, I focused on other things and forgot the project completely until recently, when <a href="https://github.com/drocpdp">David Eynon</a> sent me a PR on GitHub to replace a deprecated testing library. So, I decided to brush up on my knowledge of consistent hashing and, as I often do on this blog, dump my thoughts in a post.</p><p>The topic of distributed storage and data processing is rich and complicated, so while I will try to give a broader context to the concepts I introduce, I do not intend to write a comprehensive guide to the subject. This post is aimed at developers who do not know what partitioning and consistent hashing are and want to take their first steps into those topics.</p><div class="infobox"><i class="fa fa-"></i><div class="title">Code syntax</div><div><p>The post contains some code examples written in Python. If you are not familiar with the language, these are the main rules:</p>
<ul><li><code>x**y</code> means x<sup>y</sup>, e.g. <code>2**3 => 8</code>.</li><li><code>x//y</code> means the integer division between <code>x</code> and <code>y</code>, e.g. <code>11//4 => 2</code>.</li><li><code>x%y</code> means the modulo operation (remainder of integer division), e.g. <code>11%4 => 3</code>.</li></ul></div></div><h2 id="rationale-2d0e">Rationale<a class="headerlink" href="#rationale-2d0e" title="Permanent link">¶</a></h2><p>When we design a system, we might want to scatter data among multiple sources to allow truly concurrent access and more targeted optimisation.</p><p>For example, we might observe that in a given social media application there are two types of queries. Some are very infrequent and involve tables related to personal data and the user profile. Others are extremely frequent and intensive, and involve the content shared by the user. In this case, we might decide to store the tables related to the profile and the tables related to content in two different systems, A and B (here, the word <em>system</em> might be freely replaced by <em>computer</em>, <em>database</em>, <em>storage system</em>, or other similar components).</p><p>This means that the infrequent queries that fetch personal data will be served by system A, while the more frequent and intensive queries related to content will be served by system B.</p><p>Suddenly, we have the chance to deploy system B on more powerful and expensive hardware, or on an architecture with better performance. We can do this without increasing the cost of the tables that wouldn't benefit from such an improvement, like the ones stored in system A.</p><div class="imageblock"><img src="/images/data-partitioning-and-consistent-hashing/partitioning_rationale.jpg"></div><p>This is a standard approach in system design, and it requires an additional layer of control that routes requests to the right source. 
This layer might be implemented in several places, for example:</p><ul><li>in the code of our application, with conditional structures that query different data sources</li><li>in the framework that we are using for the application, for example in a middleware that automatically routes requests according to the nature of the query</li><li>in a wrapper around the storage that hides the fact that data exists in two different systems</li></ul><p>In the last case, this technique is usually called <em>partitioning</em>.</p><p>In this post, I will try to show the challenges we face when we partition data. I will also focus on some of the algorithms that can be used to implement it, in particular consistent hashing. Please note that, while some of these techniques are used by databases to provide internal partitioning, they have a wider range of applications and might come in handy in different contexts.</p><h2 id="design-choices-d096">Design choices<a class="headerlink" href="#design-choices-d096" title="Permanent link">¶</a></h2><p>Every design choice in a system depends on the requirements. When it comes to data storage, the most important factors are the <em>nature of the data</em>, its <em>distribution</em>, and the <em>access patterns</em>. Consider for example databases and caches such as Content Delivery Networks (CDNs). Both are meant to store data, and the storage size of both can vary substantially. However, there are important differences between the two that greatly affect the design choices. Let's see some simple examples:</p><ul><li>databases are meant to store data long-term, while caches are short-lived by definition. This means that an important requirement for databases is data preservation, and we should do everything in our power to avoid losing parts of the database. A cache, conversely, holds data for a short time. That time is either predetermined by the system or forced by a change in the data source. 
As you can see, in this case we not only accept data loss as part of the equation, but we even trigger it on purpose.</li><li>applications often make use of range queries, which retrieve sets of results spanning a range of values of one of the keys. For example, you might want to see all employees within a certain range of salaries, or all users that have more than a certain number of followers. In such cases, it makes little sense to scatter data among different physical sources, as that would make retrieval more complicated and ultimately hurt performance. Databases very often see this type of access pattern, while caches, being usually implemented as key/value stores, do not need to take it into account.</li></ul><h2 id="a-practical-example-of-partitioning-1b06">A practical example of partitioning<a class="headerlink" href="#a-practical-example-of-partitioning-1b06" title="Permanent link">¶</a></h2><p>Let us consider a simple key/value store, for example a common address book where the key is the name of the contact and the value a rich document with their personal details. If multiple users access the store, chances are that at a certain point the system will struggle to serve all the requests, so we might want to partition the data to allow concurrent access. We can, for example, sort the keys alphabetically and split them in two. All values whose key begins with the letters A-M go in one server and the rest (keys N-Z) go in the second one.</p><div class="imageblock"><img src="/images/data-partitioning-and-consistent-hashing/simple_partitioning_1.jpg"></div><p>This might seem a good idea, but we will soon discover that performance is not great. 
Unfortunately, our address book doesn't contain the same number of people for each letter. For example, we usually know more people whose names start with A or C than with X or Z.</p><p>That poses a problem, as our partitioning doesn't achieve the desired outcome of splitting requests evenly between the two servers. If we increase the number of partitions, each serving a smaller group of letters, we will just worsen the problem. A partition might even end up completely empty and thus receive no traffic. Since the problem comes from the data distribution, we need to find a way to change that property.</p><div class="imageblock"><img src="/images/data-partitioning-and-consistent-hashing/simple_partitioning_2.jpg"></div><p>One way to deal with the problem is to change the boundaries of the partitions so that we get an almost even distribution of values among them. For example, we might store keys starting with A-B in the first partition, keys starting with C-D in the second, and all the rest in the third one.</p><p>The problem with such a strategy is that it is highly dependent on the actual data that we are storing. Not only does this mean the solution has to be customised for each use case (the partitions in the example might be good for one address book and completely wrong for another), but adding data to the storage might also change the distribution and invalidate the solution.</p><div class="imageblock"><img src="/images/data-partitioning-and-consistent-hashing/simple_partitioning_3.jpg"></div><h2 id="hash-functions-to-the-rescue-2cc2">Hash functions to the rescue<a class="headerlink" href="#hash-functions-to-the-rescue-2cc2" title="Permanent link">¶</a></h2><p>Hash functions offer an interesting solution to the problem of distributing data evenly. 
As I explained in my post <a href="https://www.thedigitalcatonline.com/blog/2018/04/06/introduction-to-hashing/">Introduction to hashing</a>, good hash functions produce a highly uniform distribution, which makes them ideal in this case. Please note that hash functions can help with <em>routing queries</em> and not with <em>storing data</em>. Hashed values cannot replace the content because hash functions are not injective: two different inputs might produce the same output (a collision). So, they can only be used to decide <em>where</em> to store a piece of information.</p><p>We can at this point devise a storage strategy based on hash functions. We can divide the output space of the hash function (codomain) into a certain number of partitions and expect each of them to contain a similar number of elements. For example, the hash function might output a 32-bit number. In that case each hashed value is between 0 and 2<sup>32</sup>-1 (4,294,967,295), and from here it's pretty straightforward to find partition boundaries. We can, for instance, create 16 partitions numbered 0 to 15, each one containing 2<sup>28</sup> hash values (268,435,456).</p><div class="imageblock"><img src="/images/data-partitioning-and-consistent-hashing/hash_functions.jpg"></div><p>Routing is at this point very simple, as we can mathematically find the partition number given the hash. There are many ways to do this, but two simple approaches are:</p><ul><li>using the integer division <code>hash(k) // partition_size</code>, e.g. <code>hash(k) // 2**28</code>. All hash values from <code>0</code> to <code>268435455</code> end up in partition 0 (<code>268435455 // 2**28</code>), values from <code>268435456</code> to <code>536870911</code> end up in partition 1, and so on.</li><li>using the modulo operator <code>hash(k) % number_of_partitions</code>, e.g. <code>hash(k) % 16</code>. 
This assigns values to partitions in a round-robin fashion. Hash value <code>0</code> goes to partition 0 (<code>0%16</code>), value <code>1</code> to partition 1 (<code>1%16</code>), value <code>15</code> to partition 15 (<code>15%16</code>). The cycle then starts again with value <code>16</code>, which goes to partition 0 (<code>16%16</code>), and so on.</li></ul><p>This architecture has a clear advantage: thanks to the properties of hash functions, data is scattered evenly among the partitions. This means that, as long as no small group of keys attracts most of the traffic, requests to the system will also be divided evenly, giving us a good distribution of the load.</p><p>As we will see later, however, this is not a good approach for dynamic systems.</p><h2 id="partitioning-use-cases-d7c3">Partitioning use cases<a class="headerlink" href="#partitioning-use-cases-d7c3" title="Permanent link">¶</a></h2><p>Hash functions are definitely interesting, but they are not the perfect solution in every case. Let's have a brief look at three different types of systems that might benefit from partitioning and discuss their specific requirements.</p><h3 id="load-balancers-b5f9">Load balancers</h3><p>Pure load balancers solve a simple problem: spreading requests evenly across multiple <em>identical</em> servers. The key word here is "identical". Since any server can answer any request, no routing decision can result in an error. However, spreading the load unevenly can result in performance loss, and possibly even service failure. For example, if a server gets overloaded, queries might hit a timeout while waiting to be served.</p><p>For this reason, when load balancing is not content-aware, for example in a simple HTTP server scenario, round-robin partitioning is a good choice. The system just assigns new requests to servers on a rotation basis, which spreads the number of requests evenly (though not necessarily the work they involve). 
For example, this algorithm is the default choice for AWS Application Load Balancers.</p><p>Clearly, load balancers can be more complicated and feature-rich even without becoming content-aware. The aforementioned AWS ALBs, for example, also support the "least outstanding requests" algorithm. In simple terms, it routes each request to the server with the fewest requests in progress.</p><h3 id="caches-27ec">Caches</h3><p>Caches are systems that temporarily store data whose retrieval is expensive, either for the user or for the provider. For example, if a system runs a long query on a database, caching the result benefits both the system and the database. The former gets the result of a repeated run much faster, and the latter receives no load from the repeated query.</p><p>Caches can be found everywhere and vary dramatically in size, and they are one of the best examples of systems that benefit from partitioning. As I mentioned before, their standard usage patterns don't include range queries, and data loss (flushing) is part of their normal workflow.</p><p>A Content Delivery Network (CDN) is a specific type of cache that is distributed geographically. The purpose of the CDN nodes is to store content in a location that is physically near the users, thus increasing the performance of the system. This means that two geographically distinct nodes of a CDN may contain the same values (replication), and the routing policy is based mainly on the physical position of the user with respect to the node. Internally, though, each CDN node can be implemented using partitioning, which might improve the performance of that specific node.</p><h3 id="databases-14fd">Databases</h3><p>As for databases, I already mentioned that the most important problem is range queries or, if you prefer, content-aware partitioning. In general, you can't partition a database without taking its content into account, or you will incur severe performance losses. 
So, when it comes to databases, partitioning has to be the result of a specific design and can't be applied regardless of the database schema.</p><p>To better understand the challenge, let's consider a simple database whose elements are employees with a name and a salary. Now, if we want to partition this database, we have to choose a key for the partitioning itself. It might be the primary key, the name, or the salary, as these are the only values available in each record.</p><p>Say we use a hash function to partition the database and use the employee salary as the key. Because of the properties of hash functions, employees with exactly the same salary will end up being stored in the same partition, but employees with similar salaries might end up in different ones. This depends on the number of partitions, clearly, but the main point is that records that are "near" (according to the selected key) are now potentially very far apart.</p><div class="imageblock"><img src="/images/data-partitioning-and-consistent-hashing/hash_functions_and_range_queries.jpg"></div><p>In the example above, I used MD5 as the routing hash function, and you can reproduce the calculations using the following Python code:</p><div class="code"><div class="content"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">hashlib</span>
<span class="k">def</span> <span class="nf">hash_value</span><span class="p">(</span><span class="n">value</span><span class="p">):</span>
    <span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="n">hashlib</span><span class="o">.</span><span class="n">md5</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">value</span><span class="p">)</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s2">"utf-8"</span><span class="p">))</span><span class="o">.</span><span class="n">hexdigest</span><span class="p">(),</span> <span class="mi">16</span><span class="p">)</span>
<span class="c1"># 57500283691658467528082923406379043196</span>
<span class="n">hash_value</span><span class="p">(</span><span class="mi">60000</span><span class="p">)</span>
<span class="c1"># 209589555716047624083879134729984902154</span>
<span class="n">hash_value</span><span class="p">(</span><span class="mi">60100</span><span class="p">)</span>
<span class="c1"># 12</span>
<span class="n">hash_value</span><span class="p">(</span><span class="mi">60000</span><span class="p">)</span> <span class="o">%</span> <span class="mi">16</span>
<span class="c1"># 10</span>
<span class="n">hash_value</span><span class="p">(</span><span class="mi">60100</span><span class="p">)</span> <span class="o">%</span> <span class="mi">16</span>
</pre></div> </div> </div><p>Things do not go much better if we use integer division. MD5 produces 128-bit hashes, so if we have 16 partitions, each one of them contains 2<sup>124</sup> values:</p><div class="code"><div class="content"><div class="highlight"><pre><span class="c1"># 2</span>
<span class="n">hash_value</span><span class="p">(</span><span class="mi">60000</span><span class="p">)</span> <span class="o">//</span> <span class="mi">2</span><span class="o">**</span><span class="mi">124</span>
<span class="c1"># 9</span>
<span class="n">hash_value</span><span class="p">(</span><span class="mi">60100</span><span class="p">)</span> <span class="o">//</span> <span class="mi">2</span><span class="o">**</span><span class="mi">124</span>
</pre></div> </div> </div><p>Now, let's consider a query that selects all employees within a certain range of salaries. If the database is not partitioned, all records are kept on the same server. If we optimised the system for such a query, the records will also be physically adjacent (e.g. stored in nearby memory addresses). This makes the query blazing fast. If the database is partitioned, however, the query has to collect values from multiple partitions, which greatly penalises performance.</p><p>We can see a real example of this design challenge in the documentation of MongoDB, a non-relational database that supports partitioning (called <em>sharding</em>). MongoDB supports <a href="https://www.mongodb.com/docs/manual/core/hashed-sharding/">hashed sharding</a> and <a href="https://www.mongodb.com/docs/manual/core/ranged-sharding/">ranged sharding</a>. In their words:</p><p><em>Hashed sharding uses either a single field hashed index or a compound hashed index as the shard key to partition data across your sharded cluster.</em></p><p><em>Range-based sharding involves dividing data into contiguous ranges determined by the shard key values. In this model, documents with "close" shard key values are likely to be in the same chunk or shard. This allows for efficient queries where reads target documents within a contiguous range. 
However, both read and write performance may decrease with poor shard key selection.</em></p><p>I highly recommend reading the two pages I linked above. They give a good idea of how a real system uses the concepts I introduced and of the design challenges involved in partitioning.</p><h2 id="caching-and-scaling-strategies-90f4">Caching and scaling strategies<a class="headerlink" href="#caching-and-scaling-strategies-90f4" title="Permanent link">¶</a></h2><p>When we design distributed caches, an interesting problem we might face is that of scaling the system in and out to match the current load without wasting resources.</p><p>When the cache is under a light load, we might want to run a small number of servers. As soon as the number of requests increases, however, we need to proportionally increase the number of cache nodes if we want to avoid a performance drop. This is usually not a big problem for partitioned databases, since in that case we change the number of partitions only occasionally, to adjust performance or to increase the storage size. Caches like CDNs, on the other hand, might need continuous adjustments during a single day.</p><p>Increasing or decreasing the number of nodes in a distributed cache, however, might be a pretty destructive action. Depending on the routing algorithm, if we add nodes (scale out) we might need to move data from existing ones to the newly added ones. If we remove nodes (scale in), we will lose the data they contain unless we migrate it first. Both scenarios result in a (potentially massive) cache invalidation, which can't be taken lightly.</p><p>The hash-based routing method presented earlier performs terribly when it comes to scaling, because any change in the number of servers shifts the key boundaries of the existing ones. 
Let's see a practical example of that and calculate the actual figures.</p><h3 id="scaling-out-with-hash-partitioning-d6de">Scaling out with hash partitioning</h3><p>Every time you consider a process or an algorithm, you should look at how it behaves in the worst possible conditions, to get a glimpse of what you might run into when you use it. For this reason, the following example considers a scale-out scenario in which all cache nodes are full. The best case is obviously when all nodes are empty, but in that case we don't need to scale out at all.</p><p>Let's consider a 32-bit hash function and 16 partitions numbered 0 to 15. Since the hash function space is 2<sup>32</sup> (4,294,967,296), each partition will contain 2<sup>28</sup> hash values (268,435,456). Each node is full, which means that all the possible 2<sup>28</sup> slots are assigned to a cached item, that is some data stored in the server that corresponds to that partition. The system uses integer division routing.</p><p>If we scale out to 17 partitions, increasing the pool by just 1 node, each node will now contain a smaller part of the global data space, as we now split it among more nodes. In particular, each node used to contain 1/16 of the global data (268,435,456 hash values), and will now contain 1/17 of it (approx. 252,645,135). Our biggest problem is now managing the transition between the initial setup and the new one.</p><p>The first node hosted 1/16 of the data space, the keys from <code>0</code> to <code>268435455</code>. It will now contain 1/17 of the data space, the keys from <code>0</code> to <code>252645134</code>. To simplify the example, it is useful to express everything with a common denominator: the node used to contain 17/272 of the space (1/16) and now contains 16/272 (1/17) of it.</p><p>This means that 1/272 of the whole data space has to be moved to the second node, corresponding to the keys from <code>252645135</code> to <code>268435455</code>. 
It is important to note that these keys cannot be moved to the newly added node. They have to be moved to the second node, because the algorithm we use maps contiguous ranges of keys to nodes in order.</p><p>This means that the second node will receive 1/272 of the whole data space. Since it already contained 17/272 of the whole space, it would now contain 18/272 of it. However, as happened for the first node, we want to balance the content and reduce it to 16/272, so we now have 2/272 of the whole space that we want to move to the third node.</p><div class="imageblock"><img src="/images/data-partitioning-and-consistent-hashing/ripple_effect.jpg"></div><p>So, we move 1/272 from node 1 to node 2, 2/272 from node 2 to node 3, and 3/272 from node 3 to node 4. Continuing in the same way, we end up moving 16/272 (1/17) from the 16th node to the 17th, which fills it with the correct number of keys. However, in doing so we moved 136/272 (1/272 + 2/272 + 3/272 + ... + 16/272) of the data space between nodes, which is exactly 50% of it.</p><p>So, for any initial number of nodes and a scale-out of a single node, we have to move 50% of the data stored in our cache. Going from N to N+1 nodes moves (1 + 2 + ... + N)/(N(N+1)) = (N(N+1)/2)/(N(N+1)) = 1/2 of the space. Things only get worse when we add more nodes at once, until, in extreme cases, we end up having to move almost 100% of the data. 
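</p><p>The 50% figure can be double-checked by brute force. The following Python script is a minimal sketch under simplifying assumptions. Instead of a real 32-bit hash space, it uses a toy space of 272,000 hash values. This size is divisible by both 16 and 17, so partition boundaries are exact. The script scans every value and compares the partition it is routed to before and after scaling out from 16 to 17 partitions. The function names <code>route_by_division</code>, <code>route_by_modulo</code>, and <code>moved_fraction</code> are hypothetical helpers, not part of any library.</p>

```python
def route_by_division(h, partitions, space):
    # Contiguous ranges of equal size: partition i holds the hash values
    # from i * space / partitions up to (i + 1) * space / partitions
    return h * partitions // space


def route_by_modulo(h, partitions):
    # Round-robin assignment of consecutive hash values
    return h % partitions


def moved_fraction(route_before, route_after, space):
    # Fraction of the hash space routed to a different partition after scaling
    moved = sum(1 for h in range(space) if route_before(h) != route_after(h))
    return moved / space


# Toy hash space, much smaller than 2**32, so that it can be scanned value by
# value; 272_000 is divisible by both 16 and 17, which keeps boundaries exact
SPACE = 272_000

division = moved_fraction(
    lambda h: route_by_division(h, 16, SPACE),
    lambda h: route_by_division(h, 17, SPACE),
    SPACE,
)
modulo = moved_fraction(
    lambda h: route_by_modulo(h, 16),
    lambda h: route_by_modulo(h, 17),
    SPACE,
)

print(f"integer division: {division:.1%} of the space moved")  # 50.0%
print(f"modulo:           {modulo:.1%} of the space moved")  # 94.1%
```

<p>Integer-division routing moves exactly half of the space, matching the 136/272 computed above. Under the same assumptions, modulo routing is even more disruptive. A hash value keeps its partition only when <code>h % 16 == h % 17</code>, which happens for just 16 values out of every 272, so 16/17 of the space (about 94%) is reassigned.</p><p>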
A similar effect plagues the scale-in action, where one or more nodes are removed from the pool. The keys they contain have to be migrated to the remaining nodes, creating a ripple effect to redistribute the keys according to the algorithm.</p><p>Using a modulo routing strategy doesn't help. As I mentioned before, the core issue is that adding new nodes changes the routing of the whole data space, requiring a massive migration of keys across the entire system.</p><h2 id="a-different-approach-be6e">A different approach<a class="headerlink" href="#a-different-approach-be6e" title="Permanent link">¶</a></h2><p>While the idea of using hash functions looked very promising, we quickly found that the trivial implementation performs very poorly in a dynamic setting. As we saw in the previous section, the problem is that upon scaling at least half of the keys have to be moved across nodes. If we could find a way to avoid this, we could still use hash functions to scatter data uniformly across the nodes.</p><p>As you might have already figured out, the issue comes from the attempt to keep all nodes perfectly balanced. The modulo and integer division algorithms distribute keys evenly (as long as the hash function has good diffusion), but this is a double-edged sword. The balance is extremely beneficial in a static environment, but it is also the Achilles heel of this architecture when we change the number of nodes.</p><p>When we design a system, requirements are paramount. Everything we add to the final product should be there to satisfy one or more requirements. However, requirements often clash with each other, and trying to implement all of them at once might lead to situations where there is no apparent solution. 
In such cases, it is useful to temporarily drop one or more requirements and investigate the options we have. This is exactly what we can do here: maintaining balance is an important feature, but let's see what would happen without that requirement.</p><p>If we don't care about balancing nodes, we can solve the problem with a different approach. Instead of using integer division to find the slot, we can keep a table of the minimum hash served by each slot and route requests accordingly: each row of the table contains a minimum hash and the node that serves the hashes from that value up to the minimum hash of the next row.</p><div class="imageblock"><img src="/images/data-partitioning-and-consistent-hashing/hash_table.jpg"></div><p>When we increase the number of slots, then, we can just drop a new slot anywhere and assign to it all the keys that fall under its domain. As before, the new node becomes the owner of keys that belonged to another node, but with an important difference: now all keys come from one single node, and the number of keys moved is only a fraction of those contained in it (which is much less than half of the keys). In the worst case we need to move all keys contained in one node, which, once again, is much less than half of the keys.</p><div class="imageblock"><img src="/images/data-partitioning-and-consistent-hashing/hash_table_add_node.jpg"></div><p>As you can see, this relieves the load of one single node. As we said before, we are not trying to balance the load of the whole cluster here. If we could use this technique to cover multiple ranges with a single added node, though, we could relieve the load of more than one node. 
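</p><p>The lookup described above can be sketched with Python's <code>bisect</code> module. This is an illustrative sketch, not the code used later in this post: the 400-value hash space, the node names and the helpers <code>route</code>, <code>add_rows</code> and <code>moves</code> are all hypothetical. Each request is routed to the row with the largest minimum hash not greater than the key's hash, and comparing the owner of every hash before and after adding rows shows where the migrated keys come from.</p>

```python
import bisect

HASH_SPACE = 400  # hypothetical toy hash space: hashes 0 to 399


def route(h, table):
    # `table` is a list of (min_hash, node) rows sorted by min_hash, starting at 0.
    # A hash belongs to the row with the largest min_hash not greater than it.
    min_hashes = [min_hash for min_hash, _ in table]
    index = bisect.bisect_right(min_hashes, h) - 1
    return table[index][1]


def add_rows(table, new_rows):
    # Dropping new rows anywhere only splits the ranges they land in
    return sorted(table + new_rows)


def moves(old_table, new_table):
    # Count moved hashes and collect the distinct (old owner, new owner) pairs
    changed = [
        (route(h, old_table), route(h, new_table))
        for h in range(HASH_SPACE)
        if route(h, old_table) != route(h, new_table)
    ]
    return len(changed), sorted(set(changed))


table = [(0, "node-1"), (100, "node-2"), (200, "node-3"), (300, "node-4")]

# One new row: all moved keys come from a single existing node
print(moves(table, add_rows(table, [(250, "node-5")])))
# (50, [('node-3', 'node-5')])

# Several rows for the same new node: keys come from several nodes
print(moves(table, add_rows(table, [(150, "node-5"), (250, "node-5")])))
# (100, [('node-2', 'node-5'), ('node-3', 'node-5')])
```

<p>With one new row, all 50 migrated hashes come from <code>node-3</code> and no other boundary moves. Giving the new node two rows splits the transfer between <code>node-2</code> and <code>node-3</code>.</p><p>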
In principle this is simple: we just need to add multiple rows with the same node to the table.</p><div class="imageblock"><img src="/images/data-partitioning-and-consistent-hashing/hash_table_add_node_multiple.jpg"></div><p>Note that we added multiple rows, that is, multiple partitions, but they are all served by the same physical node. This has several advantages:</p><ul><li>It fills the new node with keys coming from several different nodes, without ripple effects.</li><li>The key transfer load is spread among different nodes, and only the new node is noticeably affected.</li></ul><p>There is also an interesting turn of events: since keys for the new node are fetched from several different existing nodes, the process tends to keep the cluster balanced! This is a remarkable outcome: we temporarily dropped a requirement and found a solution that satisfies that very requirement in a different way.</p><p>The key part of this new process is the idea that multiple partitions can be served by the same node. The only missing piece at this point is a deterministic way to identify the new partitions (the sets of hashes) served by the new node.</p><h2 id="consistent-hashing-1397">Consistent hashing<a class="headerlink" href="#consistent-hashing-1397" title="Permanent link">¶</a></h2><p>Finally, let me introduce consistent hashing as a technique to implement the process described above.</p><p>As we discussed in the previous section, the only missing piece is an algorithm that produces a deterministic set of hash ranges for a single new node. These hash ranges represent the partitions served by that node and should be scattered across the whole hash space. It is important for them to be spread out because this way each of them will receive some keys from existing nodes, instead of a bulk of keys migrating from a single one. 
The more evenly spread they are, the better the distribution of the load and the more balanced the resulting cluster.</p><p>As we saw previously, any time we need to scatter data across a given space in a deterministic way, hash functions are a good choice, and they can be used in this case as well. The idea is simple: <em>each partition of a node is assigned a name and this name is hashed with the same function used to hash the keys stored in the system</em>. This produces a deterministic value in the hash space, and <em>that value will be the minimum value served by that partition</em>. Thanks to diffusion, the names of different partitions generate hash values that are unlikely to clash, and this is how we generate the routing table.</p><p>Let's see an example, bearing in mind that the specific function can change between implementations.</p><p>For simplicity's sake, I used a custom hash function that outputs 28-bit hashes (7 hexadecimal digits). This makes it possible to compare hashes visually and simplifies the example. To do this, I took the first 7 hexadecimal digits of the SHA1 hash with the following Python code:</p><div class="code"><div class="content"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">hashlib</span>

<span class="k">def</span> <span class="nf">hash_name</span><span class="p">(</span><span class="n">name</span><span class="p">):</span>
    <span class="c1"># Keep the first 7 hexadecimal digits (28 bits) of the SHA1 digest</span>
    <span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="n">hashlib</span><span class="o">.</span><span class="n">sha1</span><span class="p">(</span><span class="n">name</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s2">"utf-8"</span><span class="p">))</span><span class="o">.</span><span class="n">hexdigest</span><span class="p">()[:</span><span class="mi">7</span><span class="p">],</span> <span class="mi">16</span><span class="p">)</span>
</pre></div> </div> </div><p>thus creating a hash function whose values go from <code>0x0000000</code> to <code>0xfffffff</code>. At the end of the post you will find the Python code that I used to generate the following routing tables, and you are free to experiment with different settings.</p><p>WARNING: this is not a good hash function! SHA1 produces 160-bit hashes, so taking only the first 28 bits shrinks the hash space to a microscopic fraction of the original, as we go from 2<sup>160</sup> possible hashes to 2<sup>28</sup>. Please keep in mind that this is done only to simplify the visualisation of the example.</p><p>All our nodes are called <code>server-X</code>, with <code>X</code> being a letter of the English alphabet, thus giving us <code>server-a</code>, <code>server-b</code>, and so on. I decided to create 5 partitions per server, numbered from 0 to 4, whose names are generated by appending <code>-Y</code> to the node name, where <code>Y</code> is the number of the partition. For example:</p><div class="code"><div class="content"><div class="highlight"><pre>server-a-0 -- hash --> 148456820
server-a-1 -- hash --> 57674441
server-a-2 -- hash --> 216250418
server-a-3 -- hash --> 30595746
server-a-4 -- hash --> 23746828
</pre></div> </div> </div><p>If we do this for two nodes (<code>server-a</code> and <code>server-b</code>) and then sort the results by hash, we get a full routing table:</p><div class="code"><div class="content"><div class="highlight"><pre> 23746828 --> server-a-4 ( 6848918 hashes)
 30595746 --> server-a-3 (27078695 hashes)
 57674441 --> server-a-1 ( 3228787 hashes)
 60903228 --> server-b-2 (17957108 hashes)
 78860336 --> server-b-0 ( 7773725 hashes)
 86634061 --> server-b-4 (61822759 hashes)
148456820 --> server-a-0 (67793598 hashes)
216250418 --> server-a-2 (17304439 hashes)
233554857 --> server-b-3 (29289666 hashes)
262844523 --> server-b-1 ( 5590932 hashes)
</pre></div> </div> </div><p>Remember that the hashes in the routing table are the minimum hash served by each partition. For example, the first line tells us that hashes starting from <code>23746828</code> are served by the partition <code>server-a-4</code>, while hashes starting from <code>30595746</code> are served by the partition <code>server-a-3</code>. This means that the partition <code>server-a-4</code> serves 30595746 - 23746828 = 6848918 hashes (as you can read in the table). A key whose hash is <code>79249022</code> is served by <code>server-b-0</code>, as <code>78860336</code> is the largest minimum hash that does not exceed it:</p><div class="code"><div class="content"><div class="highlight"><pre> 60903228 --> server-b-2 (17957108 hashes)
 78860336 --> server-b-0 ( 7773725 hashes)
                     ^
                     |
 79249022 -----------+
 86634061 --> server-b-4 (61822759 hashes)
148456820 --> server-a-0 (67793598 hashes)
</pre></div> </div> </div><p>Since partitions are not physically separated, but are just virtual entities belonging to a node, the routing table can be simplified by merging consecutive rows served by the same node:</p><div class="code"><div class="content"><div class="highlight"><pre> 23746828 -- > server-a (37156400 hashes)
 60903228 -- > server-b (87553592 hashes)
148456820 -- > server-a (85098037 hashes)
233554857 -- > server-b (34880598 hashes)
</pre></div> </div> </div><hr><p>What we achieved is remarkable, but there are still two problems. Let's have a look at the simplified routing table for three nodes with 5 partitions each:</p><div class="code"><div class="title">3 nodes with 5 partitions each</div><div class="content"><div class="highlight"><pre> 23746828 --> server-a (23267855 hashes)
 47014683 --> server-c (10659758 hashes)
 57674441 --> server-a ( 3228787 hashes)
 60903228 --> server-b (63557309 hashes)
124460537 --> server-c (23996283 hashes)
148456820 --> server-a (31382512 hashes)
179839332 --> server-c (36411086 hashes)
216250418 --> server-a (17304439 hashes)
233554857 --> server-b (15386579 hashes)
248941436 --> server-c (13903087 hashes)
262844523 --> server-b ( 5590932 hashes)
</pre></div> </div> </div><p>First, the lowest value is not 0, which means that some hashes (23,746,828 in this case) are not served by any partition. Second, in general the distribution doesn't cover the space evenly, as some nodes receive many more keys than others. This second problem isn't very visible in the setups I showed so far, but it becomes noticeable as the number of servers increases. For example, with two nodes we have this situation (counting the initial hashes as served by the last partition, as explained below)</p><div class="code"><div class="title">2 nodes with 5 partitions each</div><div class="content"><div class="highlight"><pre>server-a: 122254437 hashes
server-b: 146181018 hashes
</pre></div> </div> </div><p>while with 5 nodes it becomes</p><div class="code"><div class="title">5 nodes with 5 partitions each</div><div class="content"><div class="highlight"><pre>server-a: 64211359 hashes
server-b: 66179053 hashes
server-c: 57545779 hashes
server-d: 43217324 hashes
server-e: 37281940 hashes
</pre></div> </div> </div><p>As you can see, in the second case the load of <code>server-e</code> is only about 56% of that of <code>server-b</code>.</p><hr><p>The first problem is easily solved by assigning the initial hashes to the last partition, that is, by considering the hash space as a circle, where the range that starts at the highest minimum hash wraps around and continues from 0. This means that for 2 nodes with 5 partitions each we have</p><div class="code"><div class="title">Routing table of 2 nodes with 5 partitions each</div><div class="content"><div class="highlight"><pre>Full routing table
<span class="hll">        0 --> server-b-1 (23746828 hashes)
</span> 23746828 --> server-a-4 (6848918 hashes)
 30595746 --> server-a-3 (27078695 hashes)
 57674441 --> server-a-1 (3228787 hashes)
 60903228 --> server-b-2 (17957108 hashes)
 78860336 --> server-b-0 (7773725 hashes)
 86634061 --> server-b-4 (61822759 hashes)
148456820 --> server-a-0 (67793598 hashes)
216250418 --> server-a-2 (17304439 hashes)
233554857 --> server-b-3 (29289666 hashes)
262844523 --> server-b-1 (5590932 hashes)
Simplified routing table
<span class="hll">        0 -- > server-b (23746828 hashes)
</span> 23746828 -- > server-a (37156400 hashes)
 60903228 -- > server-b (87553592 hashes)
148456820 -- > server-a (85098037 hashes)
233554857 -- > server-b (34880598 hashes)
</pre></div> </div> </div><p>where the partition <code>server-b-1</code> now also serves the orphaned initial hashes.</p><p>The second problem can be tackled statistically. We cannot control where the hash function maps each partition name, since its diffusion property is designed precisely to avoid regularly spaced values. However, if we increase the number of partitions, we expect the hash function to spread them across the whole space. At that point, each partition is assigned just a tiny slice of the key space, and since the share of each node is the sum of many such slices, the differences between nodes become less noticeable. In other words, by increasing the number of partitions dramatically we should achieve a better distribution. Let's compare the results of 5 nodes with 2 partitions each</p><div class="code"><div class="title">5 nodes with 2 partitions each</div><div class="content"><div class="highlight"><pre>server-a 36500586
server-b 76678431
server-c 31738329
server-d 56183426
server-e 67334683
</pre></div> </div> </div><p>with the results of 5 nodes with 3000 partitions each</p><div class="code"><div class="title">5 nodes with 3000 partitions each</div><div class="content"><div class="highlight"><pre>server-a 53385222
server-b 53855877
server-c 53755762
server-d 53597662
server-e 53840932
</pre></div> </div> </div><p>There is clearly an upper limit to the number of partitions that we can create. If we create more partitions than there are possible hashes, some of them will clash and end up empty, causing routing errors, but this is a purely theoretical case: a standard hash function such as SHA1 produces 160-bit hashes, which means a codomain of 2<sup>160</sup> possible values (more than 10<sup>48</sup>). With 10,000 nodes (which is a considerable number of servers in 2022) the threshold would be greater than 10<sup>44</sup> partitions per server.</p><p>So far we have achieved good results, but we had already managed to partition the space properly with simpler techniques. The real power of consistent hashing lies in the way it behaves in a dynamic setting.</p><h2 id="consistent-hashing-and-scaling-649a">Consistent hashing and scaling<a class="headerlink" href="#consistent-hashing-and-scaling-649a" title="Permanent link">¶</a></h2><p>The interesting thing about consistent hashing is its remarkable behaviour in a dynamic environment. As you might remember, the problem with hash partitioning was that a change in the number of nodes had ripple effects that resulted in a massive migration of at least half the keys.</p><p>With consistent hashing, when we add a new node we need to generate the hash values of its partitions and put them in the routing table, and then migrate the keys that fall under the domain of the newly created partitions. Let's see an example before we discuss performance.</p><p>The initial setup is 2 nodes with 5 partitions each</p><div class="code"><div class="title">2 nodes with 5 partitions</div><div class="content"><div class="highlight"><pre>Full routing table
        0 --> server-b-1 (23746828 hashes)
 23746828 --> server-a-4 (6848918 hashes)
 30595746 --> server-a-3 (27078695 hashes)
 57674441 --> server-a-1 (3228787 hashes)
 60903228 --> server-b-2 (17957108 hashes)
 78860336 --> server-b-0 (7773725 hashes)
 86634061 --> server-b-4 (61822759 hashes)
148456820 --> server-a-0 (67793598 hashes)
216250418 --> server-a-2 (17304439 hashes)
233554857 --> server-b-3 (29289666 hashes)
262844523 --> server-b-1 (5590932 hashes)
Simplified routing table
        0 -- > server-b (23746828 hashes)
 23746828 -- > server-a (37156400 hashes)
 60903228 -- > server-b (87553592 hashes)
148456820 -- > server-a (85098037 hashes)
233554857 -- > server-b (34880598 hashes)
Stats
server-a 122254437
server-b 146181018
TOTAL HASHES: 268435455/268435455
</pre></div> </div> </div><p>If we add a third node, we move to this new setup</p><div class="code"><div class="title">3 nodes with 5 partitions</div><div class="content"><div class="highlight"><pre>Full routing table
        0 --> server-b-1 (23746828 hashes)
 23746828 --> server-a-4 (6848918 hashes)
 30595746 --> server-a-3 (16418937 hashes)
 47014683 --> server-c-3 (10659758 hashes)
 57674441 --> server-a-1 (3228787 hashes)
 60903228 --> server-b-2 (17957108 hashes)
 78860336 --> server-b-0 (7773725 hashes)
 86634061 --> server-b-4 (37826476 hashes)
124460537 --> server-c-2 (23996283 hashes)
148456820 --> server-a-0 (31382512 hashes)
179839332 --> server-c-1 (25303093 hashes)
205142425 --> server-c-4 (11107993 hashes)
216250418 --> server-a-2 (17304439 hashes)
233554857 --> server-b-3 (15386579 hashes)
248941436 --> server-c-0 (13903087 hashes)
262844523 --> server-b-1 (5590932 hashes)
Simplified routing table
        0 -- > server-b (23746828 hashes)
 23746828 -- > server-a (23267855 hashes)
 47014683 -- > server-c (10659758 hashes)
 57674441 -- > server-a ( 3228787 hashes)
 60903228 -- > server-b (63557309 hashes)
124460537 -- > server-c (23996283 hashes)
148456820 -- > server-a (31382512 hashes)
179839332 -- > server-c (36411086 hashes)
216250418 -- > server-a (17304439 hashes)
233554857 -- > server-b (15386579 hashes)
248941436 -- > server-c (13903087 hashes)
262844523 -- > server-b ( 5590932 hashes)
Stats
server-a 75183593
server-b 108281648
server-c 84970214
TOTAL HASHES: 268435455/268435455
</pre></div> </div> </div><p>Let's have a closer look at what happens with <code>server-c</code>:</p><div class="code"><div class="content"><div class="highlight"><pre>Simplified routing table
        0 -- > server-b (23746828 hashes)
 23746828 -- > server-a (23267855 hashes) ----+ 10659758 hashes
                                              | from server-a
 47014683 -- > server-c (10659758 hashes) <---+
 57674441 -- > server-a ( 3228787 hashes)
 60903228 -- > server-b (63557309 hashes) ----+ 23996283 hashes
                                              | from server-b
124460537 -- > server-c (23996283 hashes) <---+
148456820 -- > server-a (31382512 hashes) ----+ 36411086 hashes
                                              | from server-a
179839332 -- > server-c (36411086 hashes) <---+
216250418 -- > server-a (17304439 hashes)
233554857 -- > server-b (15386579 hashes) ----+ 13903087 hashes
                                              | from server-b
248941436 -- > server-c (13903087 hashes) <---+
262844523 -- > server-b ( 5590932 hashes)
</pre></div> </div> </div><p>Overall, <code>server-c</code> receives 47,070,844 hashes from <code>server-a</code> and 37,899,370 hashes from <code>server-b</code>, for a total of 84,970,214 migrated hashes, approximately 32% of the hash space. As you can see there is no ripple effect here, as the boundaries of the existing partitions do not change.</p><p>Let's consider the performance in the worst case, when we add one single node. If we are terribly unlucky (and we use a hash function with clear issues), each partition of the new node will completely cover a partition of an existing node. Assuming that the initial setup with N nodes created a balanced cluster, each node contains 1/Nth of the total keys, and in the worst case we need to move all of them from an existing node to the newly added one.</p><p>So, adding one node to a cluster of N nodes using consistent hashing results, in the worst case, in the migration of 1/Nth of the keys. In the previous example, then, we expected to migrate <em>at most</em> 50% of the keys (1/2), and we ended up migrating about 32% of them.</p><p>This is a terrific result. Not only is it much better than the previous one (<em>at least</em> 50% of the keys), but it also improves as the cluster grows. In a cluster with 100 nodes, adding a node will result (in the worst case!) in the migration of 1/100 of the keys.</p><h2 id="source-code-b277">Source code<a class="headerlink" href="#source-code-b277" title="Permanent link">¶</a></h2><p>All routing tables shown in the post have been created with the following Python script. Please bear in mind that this is just demo code, so it hasn't been particularly optimised or carefully designed. 
Feel free to change the hash function and the parameters of the script to experiment and see what consistent hashing can do.</p><div class="code"><div class="title"><code>consistent_hashing_demo.py</code></div><div class="content"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">hashlib</span>
<span class="kn">import</span> <span class="nn">itertools</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">string</span>
<span class="kn">from</span> <span class="nn">operator</span> <span class="kn">import</span> <span class="n">itemgetter</span>
<span class="n">NUM_NODES</span> <span class="o">=</span> <span class="mi">3</span>
<span class="n">NUM_PARTITIONS</span> <span class="o">=</span> <span class="mi">5</span>
<span class="k">def</span> <span class="nf">hash_name</span><span class="p">(</span><span class="n">name</span><span class="p">):</span>
    <span class="n">encoded_name</span> <span class="o">=</span> <span class="n">name</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s2">"utf-8"</span><span class="p">)</span>
    <span class="n">hash_encoded_name</span> <span class="o">=</span> <span class="n">hashlib</span><span class="o">.</span><span class="n">sha1</span><span class="p">(</span><span class="n">encoded_name</span><span class="p">)</span><span class="o">.</span><span class="n">hexdigest</span><span class="p">()</span>
    <span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="n">hash_encoded_name</span><span class="p">[:</span><span class="mi">7</span><span class="p">],</span> <span class="mi">16</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">create_partitions</span><span class="p">(</span><span class="n">node_name</span><span class="p">,</span> <span class="n">partitions</span><span class="p">):</span>
    <span class="n">partition_hashes</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">partition_number</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">partitions</span><span class="p">):</span>
        <span class="n">partition_name</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">node_name</span><span class="si">}</span><span class="s2">-</span><span class="si">{</span><span class="n">partition_number</span><span class="si">}</span><span class="s2">"</span>
        <span class="n">partition_hash</span> <span class="o">=</span> <span class="n">hash_name</span><span class="p">(</span><span class="n">partition_name</span><span class="p">)</span>
        <span class="n">partition_hashes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span>
            <span class="p">{</span>
                <span class="s2">"min_hash"</span><span class="p">:</span> <span class="n">partition_hash</span><span class="p">,</span>
                <span class="s2">"partition_name"</span><span class="p">:</span> <span class="n">partition_name</span><span class="p">,</span>
                <span class="s2">"node_name"</span><span class="p">:</span> <span class="n">node_name</span><span class="p">,</span>
            <span class="p">}</span>
        <span class="p">)</span>
    <span class="k">return</span> <span class="n">partition_hashes</span>
<span class="k">def</span> <span class="nf">create_routing_table</span><span class="p">(</span><span class="n">node_names</span><span class="p">,</span> <span class="n">partitions</span><span class="p">):</span>
    <span class="n">table</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">node_name</span> <span class="ow">in</span> <span class="n">node_names</span><span class="p">:</span>
        <span class="n">table</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">create_partitions</span><span class="p">(</span><span class="n">node_name</span><span class="p">,</span> <span class="n">partitions</span><span class="p">))</span>
    <span class="n">table</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">table</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="n">itemgetter</span><span class="p">(</span><span class="s2">"min_hash"</span><span class="p">))</span>
    <span class="k">return</span> <span class="n">table</span>
<span class="k">if</span> <span class="n">NUM_NODES</span> <span class="o">></span> <span class="nb">len</span><span class="p">(</span><span class="n">string</span><span class="o">.</span><span class="n">ascii_lowercase</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Too many servers"</span><span class="p">)</span>
    <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="n">nodes</span> <span class="o">=</span> <span class="p">[</span><span class="sa">f</span><span class="s2">"server-</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s2">"</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">string</span><span class="o">.</span><span class="n">ascii_lowercase</span><span class="p">[:</span><span class="n">NUM_NODES</span><span class="p">]]</span>
<span class="n">routing_table</span> <span class="o">=</span> <span class="n">create_routing_table</span><span class="p">(</span><span class="n">nodes</span><span class="p">,</span> <span class="n">NUM_PARTITIONS</span><span class="p">)</span>
<span class="n">routing_table</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">{</span>
        <span class="s2">"min_hash"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
        <span class="s2">"partition_name"</span><span class="p">:</span> <span class="n">routing_table</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="s2">"partition_name"</span><span class="p">],</span>
        <span class="s2">"node_name"</span><span class="p">:</span> <span class="n">routing_table</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="s2">"node_name"</span><span class="p">],</span>
    <span class="p">}</span>
<span class="p">]</span> <span class="o">+</span> <span class="n">routing_table</span>
<span class="n">routing_table_shift</span> <span class="o">=</span> <span class="n">routing_table</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="o">+</span> <span class="p">[</span>
    <span class="p">{</span><span class="s2">"min_hash"</span><span class="p">:</span> <span class="mh">0xFFFFFFF</span><span class="p">,</span> <span class="s2">"partition_name"</span><span class="p">:</span> <span class="s2">"END"</span><span class="p">}</span>
<span class="p">]</span>
<span class="n">full_routing_table</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">routing_table</span><span class="p">,</span> <span class="n">routing_table_shift</span><span class="p">):</span>
    <span class="n">full_routing_table</span><span class="o">.</span><span class="n">append</span><span class="p">(</span>
        <span class="p">{</span>
            <span class="s2">"min_hash"</span><span class="p">:</span> <span class="n">i</span><span class="p">[</span><span class="s2">"min_hash"</span><span class="p">],</span>
            <span class="s2">"partition_name"</span><span class="p">:</span> <span class="n">i</span><span class="p">[</span><span class="s2">"partition_name"</span><span class="p">],</span>
            <span class="s2">"node_name"</span><span class="p">:</span> <span class="n">i</span><span class="p">[</span><span class="s2">"node_name"</span><span class="p">],</span>
            <span class="s2">"served_hashes"</span><span class="p">:</span> <span class="n">j</span><span class="p">[</span><span class="s2">"min_hash"</span><span class="p">]</span> <span class="o">-</span> <span class="n">i</span><span class="p">[</span><span class="s2">"min_hash"</span><span class="p">],</span>
        <span class="p">}</span>
    <span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Full routing table"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">full_routing_table</span><span class="p">:</span>
    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">r</span><span class="p">[</span><span class="s2">"min_hash"</span><span class="p">]</span><span class="si">:</span><span class="s1">9</span><span class="si">}</span><span class="s1"> --> </span><span class="si">{</span><span class="n">r</span><span class="p">[</span><span class="s2">"partition_name"</span><span class="p">]</span><span class="si">}</span><span class="s1"> (</span><span class="si">{</span><span class="n">r</span><span class="p">[</span><span class="s2">"served_hashes"</span><span class="p">]</span><span class="si">}</span><span class="s1"> hashes)'</span><span class="p">)</span>
<span class="n">grouped_routing_table</span> <span class="o">=</span> <span class="n">itertools</span><span class="o">.</span><span class="n">groupby</span><span class="p">(</span>
    <span class="n">full_routing_table</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="n">itemgetter</span><span class="p">(</span><span class="s2">"node_name"</span><span class="p">)</span>
<span class="p">)</span>
<span class="n">simplified_routing_table</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">grouped_routing_table</span><span class="p">:</span>
    <span class="n">consecutive_partitions</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">r</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
    <span class="n">simplified_routing_table</span><span class="o">.</span><span class="n">append</span><span class="p">(</span>
        <span class="p">{</span>
            <span class="s2">"node_name"</span><span class="p">:</span> <span class="n">r</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span>
            <span class="s2">"min_hash"</span><span class="p">:</span> <span class="n">consecutive_partitions</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="s2">"min_hash"</span><span class="p">],</span>
            <span class="s2">"served_hashes"</span><span class="p">:</span> <span class="nb">sum</span><span class="p">([</span><span class="n">i</span><span class="p">[</span><span class="s2">"served_hashes"</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">consecutive_partitions</span><span class="p">]),</span>
        <span class="p">}</span>
    <span class="p">)</span>
<span class="nb">print</span><span class="p">()</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Simplified routing table"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">simplified_routing_table</span><span class="p">:</span>
    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">r</span><span class="p">[</span><span class="s2">"min_hash"</span><span class="p">]</span><span class="si">:</span><span class="s1">9</span><span class="si">}</span><span class="s1"> -- > </span><span class="si">{</span><span class="n">r</span><span class="p">[</span><span class="s2">"node_name"</span><span class="p">]</span><span class="si">}</span><span class="s1"> (</span><span class="si">{</span><span class="n">r</span><span class="p">[</span><span class="s2">"served_hashes"</span><span class="p">]</span><span class="si">:</span><span class="s1">8</span><span class="si">}</span><span class="s1"> hashes)'</span><span class="p">)</span>
<span class="nb">print</span><span class="p">()</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Stats"</span><span class="p">)</span>
<span class="n">stats</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">nodes</span><span class="p">:</span>
<span class="n">slots</span> <span class="o">=</span> <span class="nb">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="s2">"node_name"</span><span class="p">]</span> <span class="o">==</span> <span class="n">node</span><span class="p">,</span> <span class="n">simplified_routing_table</span><span class="p">)</span>
<span class="n">total_hashes</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">([</span><span class="n">i</span><span class="p">[</span><span class="s2">"served_hashes"</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">slots</span><span class="p">])</span>
<span class="n">stats</span><span class="o">.</span><span class="n">append</span><span class="p">({</span><span class="s2">"node_name"</span><span class="p">:</span> <span class="n">node</span><span class="p">,</span> <span class="s2">"served_hashes"</span><span class="p">:</span> <span class="n">total_hashes</span><span class="p">})</span>
<span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">stats</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="n">r</span><span class="p">[</span><span class="s2">"node_name"</span><span class="p">],</span> <span class="n">r</span><span class="p">[</span><span class="s2">"served_hashes"</span><span class="p">])</span>
<span class="n">total_hashes</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">([</span><span class="n">i</span><span class="p">[</span><span class="s2">"served_hashes"</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">stats</span><span class="p">])</span>
<span class="nb">print</span><span class="p">()</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"TOTAL HASHES: </span><span class="si">{</span><span class="n">total_hashes</span><span class="si">}</span><span class="s2">/</span><span class="si">{</span><span class="mi">2</span><span class="o">**</span><span class="mi">28</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</pre></div> </div> </div><h2 id="final-words-9803">Final words<a class="headerlink" href="#final-words-9803" title="Permanent link">¶</a></h2><p>I hope this long post was a useful introduction to the topic of partitioning and to system design in general. As I mentioned, such concepts are currently used by well-known systems and are still being discussed, as none of the solutions is perfect, so it is worth understanding the fundamental issues before adopting a specific one.</p><h2 id="resources-edc5">Resources<a class="headerlink" href="#resources-edc5" title="Permanent link">¶</a></h2><ul><li>Martin Kleppmann, <em>Designing Data-Intensive Applications</em>, Chapter 6 "Partitioning", O’Reilly 2017 <a href="https://www.oreilly.com/library/view/designing-data-intensive-applications/9781491903063/">official site</a>.</li><li>The <a href="https://en.wikipedia.org/wiki/Consistent_hashing">Wikipedia article</a> about consistent hashing.</li><li><a href="https://www.toptal.com/big-data/consistent-hashing">A Guide to Consistent Hashing</a> by Juan Pablo Carzolio.</li><li>The <a href="https://www.cs.princeton.edu/courses/archive/fall09/cos518/papers/chash.pdf">original article</a> by David Karger et al.: "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web".</li><li>An <a href="https://arxiv.org/pdf/1406.2294.pdf">alternative algorithm</a> by John Lamping and Eric Veach: "A Fast, Minimal Memory, Consistent Hash Algorithm".</li></ul><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. 
The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>From Docker CLI to Docker Compose2022-02-19T15:00:00+01:002022-03-17T10:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2022-02-19:/blog/2022/02/19/from-docker-cli-to-docker-compose/<p> A hands-on post that shows how to build a system with Docker and which problems Docker Compose solves</p><p>In this post I will show you how and why Docker Compose is useful, building a simple application written in Python that uses PostgreSQL. I think it is worth going through such an exercise to see how technologies we might already be familiar with simplify workflows that would otherwise be much more complicated.</p><p>The name of the demo application I will develop is a very unimaginative <code>whale</code>, which shouldn't clash with any other name introduced by the tools I will use. Every time you see something with <code>whale</code> in it, you know that I am referring to a value that you can change according to your setup.</p><p>Before we start, please create a directory to host all the files we will create. I will refer to this directory as the "project directory". 
</p><h2 id="postgresql-090e">PostgreSQL<a class="headerlink" href="#postgresql-090e" title="Permanent link">¶</a></h2><p>Since the application will connect to a PostgreSQL database, the first thing we can explore is how to run the database in a Docker container.</p><p>The official Postgres image can be found <a href="https://hub.docker.com/_/postgres">here</a>, and I highly recommend taking the time to properly read the documentation, as it contains a myriad of details that you should be familiar with.</p><p>For the time being, let's focus on the environment variables that configure the image.</p><h3 id="password-cd2a">Password</h3><p>The first variable is <code>POSTGRES_PASSWORD</code>, which is the only mandatory configuration value (unless you disable authentication, which is not recommended). Indeed, if you run the image without setting this value, you get this message</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker run postgres
Error: Database is uninitialized and superuser password is not specified.
You must specify POSTGRES_PASSWORD to a non-empty value for the
superuser. For example, "-e POSTGRES_PASSWORD=password" on "docker run".
You may also use "POSTGRES_HOST_AUTH_METHOD=trust" to allow all
connections without a password. This is *not* recommended.
See PostgreSQL documentation about "trust":
https://www.postgresql.org/docs/current/auth-trust.html
</pre></div> </div> </div><p>This value is particularly interesting because it is a secret. So, while I will treat it as a simple configuration value in the first stages of the setup, later we will need to discuss how to manage it properly.</p><h3 id="superuser-93cc">Superuser</h3><p>Being a production-grade database, Postgres allows you to specify users, groups, and permissions in a fine-grained fashion. I won't go into that as it's usually more a matter of database administration and application development, but we need to define at least the superuser. The default superuser name in this image is <code>postgres</code>, but you can change it by setting <code>POSTGRES_USER</code>.</p><h3 id="database-name-796b">Database name</h3><p>If you do not specify the value of <code>POSTGRES_DB</code>, this image will create a default database with the name of the superuser.</p><hr><p>A note of warning here. If you omit both the database name and the user, you will end up with the superuser <code>postgres</code> and the database <code>postgres</code>. The <a href="https://www.postgresql.org/docs/current/creating-cluster.html">official documentation</a> states that</p><div class="code"><div class="content"><div class="highlight"><pre>After initialization, a database cluster will contain a database named
postgres, which is meant as a default database for use by utilities,
users and third party applications. The database server itself does not
require the postgres database to exist, but many external utility programs
assume it exists.
</pre></div> </div> </div><p>This means that it is not ideal to use it as the database for our application. So, unless you are just trying out a quick piece of code, my recommendation is to always configure all three values: <code>POSTGRES_PASSWORD</code>, <code>POSTGRES_USER</code>, and <code>POSTGRES_DB</code>.</p><p>We can run the image with</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker run -d \
-e POSTGRES_PASSWORD=whale_password \
-e POSTGRES_DB=whale_db \
-e POSTGRES_USER=whale_user \
postgres:13
</pre></div> </div> </div><p>As you can see, I run the image in <a href="https://docs.docker.com/engine/reference/run/#detached--d">detached mode</a>. This image is not meant to be interactive, as Postgres is by its very nature a daemon. To connect interactively we need to use the tool <code>psql</code>, which is provided by this image. Please note that I'm pinning <code>postgres:13</code> only to keep the post consistent with what you will see if you read it in the future; you are free to use any version of the engine.</p><p>The ID of the container is returned by <code>docker run</code>, but we can retrieve it at any time by running <code>docker ps</code>. Using IDs is however impractical, and looking at the command history it is not immediately clear what you were doing at a certain point in time. For this reason, it's a good idea to name the containers.</p><p>Stop the previous container and run it again with</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker run -d \
--name whale-postgres \
-e POSTGRES_PASSWORD=whale_password \
-e POSTGRES_DB=whale_db \
-e POSTGRES_USER=whale_user \
postgres:13
</pre></div> </div> </div><div class="infobox"><i class="fa fa-info-circle"></i><div class="title">Stopping containers</div><div><p>You can stop containers using <code>docker stop ID</code>. This <a href="https://docs.docker.com/engine/reference/commandline/stop/#extended-description">gives the container's main process a grace period</a> to react to the <code>SIGTERM</code> signal, for example to properly close files and terminate connections, and then terminates it with <code>SIGKILL</code> if it is still running. You can also force a container to stop unconditionally using <code>docker kill ID</code>, which sends <code>SIGKILL</code> immediately.</p>
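<p>As a hedged illustration of what "reacting to <code>SIGTERM</code>" can mean for the process running inside a container, here is a minimal Python sketch (not part of this post's application): the handler only records the request, and the main loop notices it, runs its cleanup, and exits, ideally before the grace period expires. The names <code>serve</code>, <code>do_work</code>, and <code>cleanup</code> are hypothetical.</p>

```python
import signal
import threading

# Set by the signal handler and polled by the main loop.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Only record the request; the actual cleanup runs in the main loop."""
    shutdown_requested.set()


def install_handler():
    # Replace the default SIGTERM disposition, which terminates the process
    # without giving it a chance to clean up.
    signal.signal(signal.SIGTERM, handle_sigterm)


def serve(do_work, cleanup, poll_interval=0.5):
    """Call do_work() until SIGTERM arrives, then call cleanup() once."""
    while not shutdown_requested.is_set():
        do_work()
        # Pause between iterations, waking up early if the event is set.
        shutdown_requested.wait(poll_interval)
    cleanup()
```

<p>A real entry point would call <code>install_handler()</code> and then <code>serve()</code> with its own work and cleanup functions. If the cleanup takes longer than the grace period, Docker still sends <code>SIGKILL</code>.</p>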
<p>In either case, however, you might want to remove the container, which Docker would otherwise keep indefinitely. This can become a problem when containers are named, as you can't reuse a name that is currently assigned to a container.</p>
<p>To remove a container you have to run <code>docker rm ID</code>, but you can leverage the fact that both <code>docker stop</code> and <code>docker kill</code> print the identifier of the container they act on, and pipe the termination into the removal</p>
<div class="code"><div class="content"><div class="highlight"><pre>$ docker stop ID | xargs docker rm
</pre></div> </div> </div>
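<p>The same stop-then-remove sequence can also be scripted, for example from Python, relying on the fact that <code>docker stop</code> prints the identifier it was given. This is a minimal sketch that nothing else in the post depends on: <code>stop_and_remove</code> is a hypothetical helper, and the <code>run</code> parameter exists only so that the Docker CLI can be replaced by a fake when testing.</p>

```python
import subprocess


def stop_and_remove(container, run=subprocess.run):
    """Stop a container, then remove it, like `docker stop ID | xargs docker rm`.

    `container` can be an ID or a name. `run` defaults to subprocess.run
    and can be replaced by a fake in tests. check=True makes a failing
    docker command raise subprocess.CalledProcessError.
    """
    stopped = run(
        ["docker", "stop", container],
        capture_output=True, text=True, check=True,
    )
    # docker stop prints the identifier it stopped: feed it to docker rm
    identifier = stopped.stdout.strip()
    run(["docker", "rm", identifier], capture_output=True, text=True, check=True)
    return identifier
```

<p>Calling <code>stop_and_remove("whale-postgres")</code> would then be equivalent to the pipeline above.</p>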
<p>Alternatively, you can use <code>docker rm -f ID</code>, which corresponds to <code>docker kill</code> followed by <code>docker rm</code>. If you named a container, you can use its name instead of the ID in all these commands.</p></div></div><hr><p>Now we can connect to the database using the executable <code>psql</code> provided in the image itself. To execute a command inside a container we use <code>docker exec</code>, and this time we will specify <code>-it</code> to open an interactive session. By default, <code>psql</code> uses the name of the current operating system user, which inside the container is <code>root</code>, and a database with the same name as the user, so we need to specify both. The header informs me that the image is running PostgreSQL 13.5 on Debian.</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker exec -it whale-postgres psql -U whale_user whale_db
psql (13.5 (Debian 13.5-1.pgdg110+1))
Type "help" for help.
whale_db=#
</pre></div> </div> </div><div class="infobox"><i class="fa fa-info-circle"></i><div class="title">Postgres trust</div><div><p>You might be surprised by the fact that <code>psql</code> didn't ask for the password that we set when we ran the container. This happens because the server trusts local connections, and <code>psql</code> run inside the container without a host option connects to the server through a local Unix domain socket.</p>
<p>If you are curious about trust in Postgres you can see the configuration file with</p>
<div class="code"><div class="content"><div class="highlight"><pre>$ docker exec -it whale-postgres \
cat /var/lib/postgresql/data/pg_hba.conf
</pre></div> </div> </div>
<p>where you can spot the lines</p>
<div class="code"><div class="content"><div class="highlight"><pre># TYPE DATABASE USER ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all trust
</pre></div> </div> </div>
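<p>To make the format of these lines more concrete, the following minimal Python sketch parses <code>pg_hba.conf</code>-style text and lists the rules that use the <code>trust</code> method. It is a simplification under stated assumptions (CIDR addresses only, no separate netmask column, no quoted fields or include directives), not a full implementation of the file format, and the names <code>HbaRule</code>, <code>parse_hba</code>, and <code>trusted_rules</code> are hypothetical. Keep in mind that PostgreSQL uses the first rule that matches a connection, so the order of the rules matters.</p>

```python
from typing import NamedTuple, Optional


class HbaRule(NamedTuple):
    conn_type: str          # local, host, hostssl, ...
    database: str
    user: str
    address: Optional[str]  # None for "local" (Unix socket) rules
    method: str             # trust, scram-sha-256, md5, ...


def parse_hba(text):
    """Parse pg_hba.conf-style text into a list of HbaRule, in file order.

    Simplifying assumptions: one rule per line, CIDR addresses only
    (no separate netmask column), no quoted fields, no include directives.
    """
    rules = []
    for raw in text.splitlines():
        # Drop comments, then skip lines that end up empty.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if fields[0] == "local":
            # local DATABASE USER METHOD [OPTIONS]
            _, database, user, method = fields[:4]
            address = None
        else:
            # host* DATABASE USER ADDRESS METHOD [OPTIONS]
            _, database, user, address, method = fields[:5]
        rules.append(HbaRule(fields[0], database, user, address, method))
    return rules


def trusted_rules(rules):
    """Rules that let clients in without a password."""
    return [r for r in rules if r.method == "trust"]
```

<p>Fed with the three lines shown above, <code>trusted_rules(parse_hba(...))</code> would return the single <code>local</code> rule, which matches the behaviour we observed with <code>psql</code> run inside the container.</p>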
<p>You can find more information about Postgres trust in <a href="https://www.postgresql.org/docs/current/auth-trust.html">the official documentation</a>.</p></div></div><p>Here, I can list all the databases with <code>\l</code>. You can see all <code>psql</code> commands and the rest of the documentation at <a href="https://www.postgresql.org/docs/current/app-psql.html">https://www.postgresql.org/docs/current/app-psql.html</a>.</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker exec -it whale-postgres psql -U whale_user whale_db
psql (13.5 (Debian 13.5-1.pgdg110+1))
Type "help" for help.
whale_db=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+------------+----------+------------+------------+---------------------------
postgres | whale_user | UTF8 | en_US.utf8 | en_US.utf8 |
template0 | whale_user | UTF8 | en_US.utf8 | en_US.utf8 | =c/whale_user +
| | | | | whale_user=CTc/whale_user
template1 | whale_user | UTF8 | en_US.utf8 | en_US.utf8 | =c/whale_user +
| | | | | whale_user=CTc/whale_user
whale_db | whale_user | UTF8 | en_US.utf8 | en_US.utf8 |
(4 rows)
whale_db=#
</pre></div> </div> </div><p>As you can see, the database called <code>postgres</code> has been created as part of the initialisation, as clarified previously. You can exit <code>psql</code> with <code>Ctrl-D</code> or <code>\q</code>.</p><hr><p>If we want the database to be accessible from outside the container, we need to publish a port. The image <strong>exposes</strong> port 5432 (see the <a href="https://github.com/docker-library/postgres/blob/master/13/alpine/Dockerfile#L190">source code</a>), which tells us where the server is listening. To <strong>publish</strong> the port to the host system we can add <code>-p 5432:5432</code> (host port first, container port second). Please remember that exposing a port in Docker basically means adding some metadata that informs the users of the image, but doesn't affect the way it runs.</p><p>Stop and remove the container (you can use its name now) and run it again with</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker run -d \
--name whale-postgres \
-e POSTGRES_PASSWORD=whale_password \
-e POSTGRES_DB=whale_db \
-e POSTGRES_USER=whale_user \
-p 5432:5432 postgres:13
</pre></div> </div> </div><p>Running <code>docker ps</code>, we can see that the container now publishes the port (<code>0.0.0.0:5432->5432/tcp</code>). We can double-check it with <code>ss</code> ("socket statistics")</p><div class="code"><div class="content"><div class="highlight"><pre>$ ss -nulpt | grep 5432
tcp LISTEN 0 4096 0.0.0.0:5432 0.0.0.0:*
tcp LISTEN 0 4096 [::]:5432 [::]:*
</pre></div> </div> </div><p>Please note that usually <code>ss</code> won't tell you the name of the process using that port because the process is run by <code>root</code>. If you run <code>ss</code> with <code>sudo</code> you will see it</p><div class="code"><div class="content"><div class="highlight"><pre>$ sudo ss -nulpt | grep 5432
tcp LISTEN 0 4096 0.0.0.0:5432 0.0.0.0:* users:(("docker-proxy",pid=1262717,fd=4))
tcp LISTEN 0 4096 [::]:5432 [::]:* users:(("docker-proxy",pid=1262724,fd=4))
</pre></div> </div> </div><p>Unfortunately, <code>ss</code> is not available on macOS. On that platform (and on Linux as well) you can use <code>lsof</code> with <code>grep</code>, where <code>-P</code> and <code>-n</code> prevent the conversion of port numbers and addresses into names</p><div class="code"><div class="content"><div class="highlight"><pre>$ sudo lsof -i -P -n | grep 5432
docker-pr 219643 root 4u IPv4 2945982 0t0 TCP *:5432 (LISTEN)
docker-pr 219650 root 4u IPv6 2952986 0t0 TCP *:5432 (LISTEN)
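</pre></div> </div> </div><p>or filter by port directly, passing it to the option <code>-i</code> (without <code>-P</code>, <code>lsof</code> shows the service name <code>postgresql</code> instead of the number 5432)</p><div class="code"><div class="content"><div class="highlight"><pre>$ sudo lsof -i :5432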
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
docker-pr 219643 root 4u IPv4 2945982 0t0 TCP *:postgresql (LISTEN)
docker-pr 219650 root 4u IPv6 2952986 0t0 TCP *:postgresql (LISTEN)
</pre></div> </div> </div><p>Please note that <code>docker-pr</code> in the output above is just <code>docker-proxy</code> truncated, matching what we saw with <code>ss</code> previously.</p><p>If you want to publish the container's port 5432 to a different port on the host you can just use <code>-p ANY_NUMBER:5432</code>. Remember, however, that port numbers under 1024 are the <em>well-known</em> ports, conventionally assigned to specific services (<a href="https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers#Well-known_ports">listed here</a>), and on most Unix-like systems binding to them requires elevated privileges, which is why they are also called <em>privileged</em>.</p><p>In theory, nothing stops you from using <code>-p 80:5432</code> for your database container, publishing it on port 80 of your host. In practice, this will result in a lot of headaches and a bunch of developers chasing you with spikes and shovels.</p><hr><p>Now that we published a port, we can connect to the database running <code>psql</code> in an ephemeral container. "Ephemeral" means that a resource (in this case a Docker container) is run just for the time necessary to serve a specific purpose, as opposed to "permanent". This way we can simulate someone who tries to connect to the Docker container from a different computer on the network.</p><p>Since <code>psql</code> is provided by the image <code>postgres</code>, in theory we can run it passing the hostname with <code>-h localhost</code>, but if you try it you will be disappointed.</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker run -it postgres:13 psql -h localhost -U whale_user whale_db
psql: error: connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (::1), port 5432 failed: Cannot assign requested address
Is the server running on that host and accepting TCP/IP connections?
</pre></div> </div> </div><p>This is expected, as that container runs in a bridge network where <code>localhost</code> is the container itself. To make it work we need to run the container in the host network (that is, sharing the network of our computer). This can be done with <code>--network=host</code></p><div class="code"><div class="content"><div class="highlight"><pre>$ docker run -it \
--network=host postgres:13 \
psql -h localhost -U whale_user whale_db
Password for user whale_user:
psql (13.5 (Debian 13.5-1.pgdg110+1))
Type "help" for help.
whale_db=#
</pre></div> </div> </div><p>Please note that now <code>psql</code> asks for a password (which you know, because you set it when we ran the container <code>whale-postgres</code>). This happens because <code>psql</code> now reaches the server over TCP from outside its container instead of through the local Unix socket, so PostgreSQL doesn't trust the connection.</p><h2 id="volumes-0cfc">Volumes<a class="headerlink" href="#volumes-0cfc" title="Permanent link">¶</a></h2><p>If we used a structured framework in Python, we could leverage an ORM like SQLAlchemy to map classes to database tables. The model definitions (or changes) can be captured into little scripts called migrations that are applied to the database, and those can also be used to insert some initial data. For this example I will take a simpler route, that is, initialising the database using SQL directly.</p><p>I do not recommend this approach for a real project, but it should be good enough in this case. In particular, it will allow me to demonstrate how to use volumes in Docker.</p><p>Make sure the container <code>whale-postgres</code> is running (with or without a published port, it doesn't matter at the moment). Connect to the container using <code>psql</code> and run the following two SQL commands (make sure you are connected to the database <code>whale_db</code>)</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">recipes</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">recipe_id</span><span class="w"> </span><span class="nb">INT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span>
<span class="w"> </span><span class="n">recipe_name</span><span class="w"> </span><span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">30</span><span class="p">)</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span>
<span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="w"> </span><span class="p">(</span><span class="n">recipe_id</span><span class="p">),</span>
<span class="w"> </span><span class="k">UNIQUE</span><span class="w"> </span><span class="p">(</span><span class="n">recipe_name</span><span class="p">)</span>
<span class="p">);</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">recipes</span><span class="w"> </span>
<span class="w"> </span><span class="p">(</span><span class="n">recipe_id</span><span class="p">,</span><span class="w"> </span><span class="n">recipe_name</span><span class="p">)</span><span class="w"> </span>
<span class="k">VALUES</span><span class="w"> </span>
<span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="s1">'Tacos'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="s1">'Tomato Soup'</span><span class="p">),</span>
<span class="w"> </span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="s1">'Grilled Cheese'</span><span class="p">);</span>
</pre></div> </div> </div><p>This code creates a table called <code>recipes</code> and inserts 3 rows, each with a <code>recipe_id</code> and a <code>recipe_name</code>. The output of the above commands should be</p><div class="code"><div class="content"><div class="highlight"><pre>CREATE TABLE
INSERT 0 3
</pre></div> </div> </div><p>You can double check that the database contains the table with <code>\dt</code></p><div class="code"><div class="content"><div class="highlight"><pre>whale_db=# \dt
List of relations
Schema | Name | Type | Owner
--------+---------+-------+------------
public | recipes | table | whale_user
(1 row)
</pre></div> </div> </div><p>and that the table contains three rows with a <code>select</code>.</p><div class="code"><div class="content"><div class="highlight"><pre>whale_db=# select * from recipes;
recipe_id | recipe_name
-----------+----------------
1 | Tacos
2 | Tomato Soup
3 | Grilled Cheese
(3 rows)
</pre></div> </div> </div><p>Now, the problem with containers is that the data they write lives only as long as the container itself. While the container is running there are no issues: as a matter of fact, you can terminate <code>psql</code>, connect again, and run the <code>select</code>, and you will see the same data.</p><p>If we stop and remove the container and run it again, though, we will quickly realise that the values stored in the database are gone.</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker stop whale-postgres | xargs docker rm
whale-postgres
$ docker run -d \
--name whale-postgres \
-e POSTGRES_PASSWORD=whale_password \
-e POSTGRES_DB=whale_db \
-e POSTGRES_USER=whale_user \
-p 5432:5432 postgres:13
4a647ebef78e32bb4733484a6e435780e17a69b643e872613ca50115d60d54ce
$ docker exec -it whale-postgres \
psql -U whale_user whale_db -c "select * from recipes"
ERROR: relation "recipes" does not exist
LINE 1: select * from recipes
^
</pre></div> </div> </div><hr><p>Containers are designed with isolation in mind, which is why by default nothing that happens inside a container is connected with the host, and nothing is preserved once the container is removed.</p><p>As happened with ports, however, we need to establish some communication between containers and the host system, and we also want to keep data after the container has been removed. The solution in Docker is to use volumes.</p><p>There are three types of volumes in Docker: <em>host</em>, <em>anonymous</em>, and <em>named</em>. Host volumes mount a path of the host's filesystem inside the container, and while they are useful to exchange data between the host and the container, they often cause permission issues. Generally speaking, containers run processes as users that do not exist on the host, which means that the files written by the container might end up belonging to non-existent users.</p><p>Anonymous and named volumes are filesystems created and managed by Docker independently from containers. They can be attached to a container when it is run, so the container can use the data they contain and store data that will survive its removal. The only difference between named and anonymous volumes is the name, which allows you to manage them easily. For this reason, I don't think it's really useful to consider anonymous volumes, and I will focus on named ones.</p><p>You can manage volumes using <code>docker volume</code>, which provides several subcommands such as <code>create</code> and <code>rm</code>. You can then <a href="https://docs.docker.com/engine/reference/run/#volume-shared-filesystems">attach a named volume to a container</a> when you run it, using the option <code>-v</code> of <code>docker run</code>. 
This creates the volume if it doesn't already exist, so in practice this is the way many of us create volumes.</p><p>Stop and remove the running Postgres container and run it again with a named volume</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker stop whale-postgres | xargs docker rm
$ docker run -d \
--name whale-postgres \
-e POSTGRES_PASSWORD=whale_password \
-e POSTGRES_DB=whale_db \
-e POSTGRES_USER=whale_user \
-p 5432:5432 \
-v whale_dbdata:/var/lib/postgresql/data \
postgres:13
</pre></div> </div> </div><p>This will create the volume named <code>whale_dbdata</code> and connect it to the path <code>/var/lib/postgresql/data</code> in the container that we are running. That path happens to be the data directory where this image's Postgres stores the actual database files (their layout is described in <a href="https://www.postgresql.org/docs/current/storage-file-layout.html">the official documentation</a>). There is a specific reason why I used the prefix <code>whale_</code> for the name of the volume, which will become clear later when we introduce Docker Compose.</p><p><code>docker ps</code> doesn't show volumes in its output, so to see what is connected to your container you need to use <code>docker inspect</code></p><div class="code"><div class="content"><div class="highlight"><pre>$ docker inspect whale-postgres
[...]
"Mounts": [
{
"Type": "volume",
"Name": "whale_dbdata",
"Source": "/var/lib/docker/volumes/whale_dbdata/_data",
"Destination": "/var/lib/postgresql/data",
"Driver": "local",
"Mode": "z",
"RW": true,
"Propagation": ""
}
],
[...]
</pre></div> </div> </div><p>The value for <code>"Source"</code> is where the volume is stored on the host, that is, on your computer, but generally speaking you can ignore that detail. You can see all volumes using <code>docker volume ls</code> (filtering with <code>grep</code> if the list is long, as it is in my case)</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker volume ls | grep whale
local whale_dbdata
</pre></div> </div> </div><p>Now that the container is running and is connected to a volume, we can initialise the database again. Connect with <code>psql</code> using the command line we used before and run the SQL commands that create the table <code>recipes</code> and insert three rows.</p><p>The whole point of using a volume is to make information persistent, so now terminate and remove the Postgres container, and run it again using the same volume. You can check that the database still contains data using the query shown previously.</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker rm -f whale-postgres
whale-postgres
$ docker run -d \
--name whale-postgres \
-e POSTGRES_PASSWORD=whale_password \
-e POSTGRES_DB=whale_db \
-e POSTGRES_USER=whale_user \
-p 5432:5432 \
-v whale_dbdata:/var/lib/postgresql/data \
postgres:13
893378f044204e5c1a87473a038b615a08ad08e5da9225002a470caeac8674a8
$ docker exec -it whale-postgres \
psql -U whale_user whale_db \
-c "select * from recipes"
recipe_id | recipe_name
-----------+----------------
1 | Tacos
2 | Tomato Soup
3 | Grilled Cheese
(3 rows)
</pre></div> </div> </div><h2 id="python-application-4d3a">Python application<a class="headerlink" href="#python-application-4d3a" title="Permanent link">¶</a></h2><p>Great! Now that we have a database that can be restarted without losing data, we can create a Python application that interacts with it. Again, please remember that the goal of this post is to show what container orchestration is and how Docker Compose can simplify it, so the application developed in this section is absolutely minimal.</p><p>I will first create an application and run it on the host, leveraging the port published by the container to connect to the database. Later, I will move the application into its own container.</p><p>To create the application, first create a Python virtual environment using your preferred method. I currently use <code>pyenv</code> (<a href="https://github.com/pyenv/pyenv">https://github.com/pyenv/pyenv</a>) with the pyenv-virtualenv plugin, which provides the commands below.</p><div class="code"><div class="content"><div class="highlight"><pre>pyenv virtualenv whale_docker
pyenv activate whale_docker
</pre></div> </div> </div><p>Now we need to put our requirements in a file and install them. I prefer to keep things tidy from day zero, so create the directory <code>whaleapp</code> in the project directory and inside it the file <code>requirements.txt</code>.</p><div class="code"><div class="content"><div class="highlight"><pre>mkdir whaleapp
touch whaleapp/requirements.txt
</pre></div> </div> </div><p>The only requirement we have for this simple application is <code>psycopg2</code>, so I add it to the file and then install it. Since we are installing requirements, it is useful to update <code>pip</code> as well.</p><div class="code"><div class="content"><div class="highlight"><pre>echo "psycopg2" >> whaleapp/requirements.txt
pip install -U pip
pip install -r whaleapp/requirements.txt
</pre></div> </div> </div><hr><p>Now create the file <code>whaleapp/whaleapp.py</code> and put this code in it</p><div class="code"><div class="title">whaleapp/whaleapp.py</div><div class="content"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">psycopg2</span>
<span class="n">connection_data</span> <span class="o">=</span> <span class="p">{</span> <span class="callout">1</span>
    <span class="s2">"host"</span><span class="p">:</span> <span class="s2">"localhost"</span><span class="p">,</span>
    <span class="s2">"database"</span><span class="p">:</span> <span class="s2">"whale_db"</span><span class="p">,</span>
    <span class="s2">"user"</span><span class="p">:</span> <span class="s2">"whale_user"</span><span class="p">,</span>
    <span class="s2">"password"</span><span class="p">:</span> <span class="s2">"whale_password"</span><span class="p">,</span>
<span class="p">}</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">conn</span> <span class="o">=</span> <span class="kc">None</span>
        <span class="c1"># Connect to the PostgreSQL server</span>
        <span class="nb">print</span><span class="p">(</span><span class="s2">"Connecting to the PostgreSQL database..."</span><span class="p">)</span>
        <span class="n">conn</span> <span class="o">=</span> <span class="n">psycopg2</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="o">**</span><span class="n">connection_data</span><span class="p">)</span> <span class="callout">2</span>
        <span class="c1"># Create a cursor</span>
        <span class="n">cur</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span>
        <span class="c1"># Execute the query</span>
        <span class="n">cur</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s2">"select * from recipes"</span><span class="p">)</span> <span class="callout">3</span>
        <span class="c1"># Fetch all results</span>
        <span class="n">results</span> <span class="o">=</span> <span class="n">cur</span><span class="o">.</span><span class="n">fetchall</span><span class="p">()</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">results</span><span class="p">)</span> <span class="callout">4</span>
        <span class="c1"># Close the cursor</span>
        <span class="n">cur</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
    <span class="k">except</span> <span class="p">(</span><span class="ne">Exception</span><span class="p">,</span> <span class="n">psycopg2</span><span class="o">.</span><span class="n">DatabaseError</span><span class="p">)</span> <span class="k">as</span> <span class="n">error</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">error</span><span class="p">)</span>
    <span class="k">finally</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">conn</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
            <span class="n">conn</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="callout">5</span>
            <span class="nb">print</span><span class="p">(</span><span class="s2">"Database connection closed."</span><span class="p">)</span>
    <span class="c1"># Wait three seconds</span>
    <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
</pre></div> </div> </div><p>As you can see, the code is not complicated. The application is an endless <code>while</code> loop that every 3 seconds establishes a connection with the DB <span class="callout">2</span> using the configuration in <span class="callout">1</span>. It then runs the query <code>select * from recipes</code> <span class="callout">3</span>, prints all the results on the standard output <span class="callout">4</span>, and closes the connection <span class="callout">5</span>. Any error is printed and the loop carries on, so the application keeps retrying if the database is temporarily unreachable.</p><p>If the Postgres container is running and publishing port 5432, this application can be run directly on the host from the project directory</p><div class="code"><div class="content"><div class="highlight"><pre>$ python whaleapp/whaleapp.py
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
</pre></div> </div> </div><p>and will go on indefinitely until we press <code>Ctrl-C</code> to stop it.</p><hr><p>For the same reasons of isolation and security that we discussed previously, we want to run the application in a Docker container. This can be done pretty easily, but we will run into the same issues that we had when we were trying to run <code>psql</code> in a separate container. At the moment, the application tries to connect to the database on <code>localhost</code>, which is fine while the application is running directly on the host, but won't work any more once the application runs inside a Docker container, where <code>localhost</code> refers to the container itself.</p><p>To face one problem at a time, let's first containerise the application and run it using the <code>host</code> network. Once this works, we can see how to solve the communication problem between containers.</p><p>The easiest way to containerise a Python application is to create a new image starting from the image <code>python:3</code>. The following <code>Dockerfile</code> goes into the application directory</p><div class="code"><div class="title"><code>whaleapp/Dockerfile</code></div><div class="content"><div class="highlight"><pre><span class="k">FROM</span><span class="w"> </span><span class="s">python:3</span> <span class="callout">1</span>
<span class="k">WORKDIR</span><span class="w"> </span><span class="s">/usr/src/app</span> <span class="callout">2</span>
<span class="k">COPY</span><span class="w"> </span>requirements.txt<span class="w"> </span>. <span class="callout">3</span>
<span class="k">RUN</span><span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>--no-cache-dir<span class="w"> </span>-r<span class="w"> </span>requirements.txt <span class="callout">4</span>
<span class="k">COPY</span><span class="w"> </span>.<span class="w"> </span>. <span class="callout">5</span>
<span class="k">CMD</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="s2">"python"</span><span class="p">,</span><span class="w"> </span><span class="s2">"-u"</span><span class="p">,</span><span class="w"> </span><span class="s2">"./whaleapp.py"</span><span class="w"> </span><span class="p">]</span> <span class="callout">6</span>
</pre></div> </div> </div><p>A Docker file contains the description of the layers that build an image. Here, we start from the official Python 3 image <span class="callout">1</span> (<a href="https://hub.docker.com/_/python">https://hub.docker.com/_/python</a>), set a working directory <span class="callout">2</span>, copy the requirements file <span class="callout">3</span> and install the requirements <span class="callout">4</span>, then copy the rest of the application <span class="callout">5</span>, and run the application <span class="callout">6</span>. The Python option <code>-u</code> avoids output buffering, see <a href="https://docs.python.org/3/using/cmdline.html#cmdoption-u">https://docs.python.org/3/using/cmdline.html#cmdoption-u</a>.</p><p>It is important to keep in mind the layered nature of Docker images, as this can lead to simple optimisation tricks. In this case, loading the requirements file and installing them creates a layer out of a file that doesn't change very often, while the layer created at <span class="callout">5</span> is probably changing very quickly while we develop the application. If we run something like</p><div class="code"><div class="content"><div class="highlight"><pre><span class="o">[</span>...<span class="o">]</span>
<span class="k">COPY</span><span class="w"> </span>.<span class="w"> </span>.
<span class="k">RUN</span><span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>--no-cache-dir<span class="w"> </span>-r<span class="w"> </span>requirements.txt
<span class="k">CMD</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="s2">"python"</span><span class="p">,</span><span class="w"> </span><span class="s2">"-u"</span><span class="p">,</span><span class="w"> </span><span class="s2">"./whaleapp.py"</span><span class="w"> </span><span class="p">]</span>
</pre></div> </div> </div><p>we would have to install the requirements every time we change the application code, as any change would invalidate the cache of the <code>COPY</code> layer and thus of every layer that follows it, including the one created by the <code>RUN</code> command.</p><p>Once the <code>Dockerfile</code> is in place we can build the image</p><div class="code"><div class="content"><div class="highlight"><pre>$ cd whaleapp
$ docker build -t whaleapp .
Sending build context to Docker daemon 6.144kB
Step 1/6 : FROM python:3
---> 768307cdb962
Step 2/6 : WORKDIR /usr/src/app
---> Using cache
---> b00189756ddb
Step 3/6 : COPY requirements.txt .
---> a7aef12f562c
Step 4/6 : RUN pip install --no-cache-dir -r requirements.txt
---> Running in 153a3ca6a1b2
Collecting psycopg2
Downloading psycopg2-2.9.3.tar.gz (380 kB)
Building wheels for collected packages: psycopg2
Building wheel for psycopg2 (setup.py): started
Building wheel for psycopg2 (setup.py): finished with status 'done'
Created wheel for psycopg2: filename=psycopg2-2.9.3-cp39-cp39-linux_x86_64.whl size=523502 sha256=1a3aac3cf72cc86b63a3e0f42b9b788c5237c3e5d23df649ca967b29bf89ecf5
Stored in directory: /tmp/pip-ephem-wheel-cache-ow3d1yop/wheels/b3/a1/6e/5a0e26314b15eb96a36263b80529ce0d64382540ac7b9544a9
Successfully built psycopg2
Installing collected packages: psycopg2
Successfully installed psycopg2-2.9.3
WARNING: You are using pip version 20.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
Removing intermediate container 153a3ca6a1b2
---> b18aead1ef15
Step 5/6 : COPY . .
---> be7c3c11e608
Step 6/6 : CMD [ "python", "-u", "./app.py" ]
---> Running in 9e2f4f30b59e
Removing intermediate container 9e2f4f30b59e
---> b735eece4f86
Successfully built b735eece4f86
Successfully tagged whaleapp:latest
</pre></div> </div> </div><p>You can see the layers being built one by one (marked as <code>Step x/6</code> here). Once the image has been built, you should be able to see it in the list of images present in your system</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker image ls | grep whale
whaleapp latest 969b15466905 9 minutes ago 894MB
</pre></div> </div> </div><div class="infobox"><i class="fa fa-info-circle"></i><div class="title">Size of images</div><div><p>You might want to observe a minute of silence, meditating on the fact that we used almost 900 megabytes of space to run 40 lines of Python. As you can see, benefits come with a cost, and you should not underestimate it. 900 megabytes might not seem like a lot nowadays, but if you keep building images you will soon use up the space on your hard drive, or end up paying a lot for the space on your remote registry.</p>
<p>By the way, this is the reason why Docker splits images into layers and reuses them. For now we can ignore this part of the game, but remember that keeping the system clean and removing past artefacts (for example with <code>docker image prune</code>) is important.</p></div></div><p>As I mentioned before, we can run this image, but we need to use the <code>host</code> network configuration.</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker run -it --rm --network=host --name whale-app whaleapp
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
</pre></div> </div> </div><p>Please note that I used <code>--rm</code> to make Docker remove the container automatically when it exits. This way I can run it again with the same name without having to explicitly remove the previous container with <code>docker rm</code>.</p>
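<p>To make the layer-caching argument from the Dockerfile discussion above more concrete, here is a toy model of the build cache in Python. It is a simplification and an assumption on my part, not Docker's actual implementation: each layer gets a key computed from the key of its parent, the text of the instruction and, for <code>COPY</code>, the content of the copied files. Docker's documentation describes a similar rule: for <code>COPY</code> and <code>ADD</code> the files are checksummed, while for other instructions such as <code>RUN</code> only the command string is compared, and once a layer is invalidated every following layer is rebuilt. The names <code>layer_keys</code> and <code>rebuilt_layers</code> are made up for this example.</p>
<div class="code"><div class="content"><div class="highlight"><pre>import hashlib

# Toy model of the Docker build cache (a simplification, not Docker's code).
# A layer key depends on the parent key, the instruction text and, for COPY,
# the names and contents of the copied files.


def layer_keys(instructions, files):
    keys = []
    parent = ""
    for instruction in instructions:
        digest = hashlib.sha256()
        digest.update(parent.encode())
        digest.update(instruction.encode())
        if instruction.startswith("COPY"):
            source = instruction.split()[1]
            # "." copies the whole build context, anything else a single file
            names = sorted(files) if source == "." else [source]
            for name in names:
                digest.update(name.encode())
                digest.update(files[name].encode())
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_layers(instructions, old_files, new_files):
    # A layer is rebuilt when its key changes between two builds
    old_keys = layer_keys(instructions, old_files)
    new_keys = layer_keys(instructions, new_files)
    return [
        instruction
        for instruction, old, new in zip(instructions, old_keys, new_keys)
        if old != new
    ]


# The Dockerfile used in this post and the reordered variant
GOOD_ORDER = [
    "FROM python:3",
    "WORKDIR /usr/src/app",
    "COPY requirements.txt .",
    "RUN pip install --no-cache-dir -r requirements.txt",
    "COPY . .",
    'CMD [ "python", "-u", "./whaleapp.py" ]',
]
BAD_ORDER = [
    "FROM python:3",
    "WORKDIR /usr/src/app",
    "COPY . .",
    "RUN pip install --no-cache-dir -r requirements.txt",
    'CMD [ "python", "-u", "./whaleapp.py" ]',
]

# Hypothetical build contexts: only the application code changes
before = {"requirements.txt": "psycopg2\n", "whaleapp.py": "print('v1')\n"}
after = {**before, "whaleapp.py": "print('v2')\n"}

print(rebuilt_layers(GOOD_ORDER, before, after))
print(rebuilt_layers(BAD_ORDER, before, after))
</pre></div> </div> </div>
<p>In this model, with the order used in this post only the final <code>COPY</code> and the <code>CMD</code> layers change when the application code changes, while with the reordered variant the <code>RUN pip install</code> layer is invalidated as well, which is why the requirements are copied and installed first.</p>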
<h2 id="run-containers-in-the-same-network-deb7">Run containers in the same network<a class="headerlink" href="#run-containers-in-the-same-network-deb7" title="Permanent link">¶</a></h2><p>Docker containers are isolated from the host and from other containers by default. This, however, doesn't mean that they can't communicate with each other: they can, if we run them in a specific configuration, and bridge networks play an important role in this.</p><p>Whenever containers run in the same custom (user-defined) bridge network, Docker lets them resolve each other's names through its embedded DNS server. This doesn't happen on the default bridge network, which is why we need to create one. As a result, we can make the application communicate with the database without having to run the application in the host network.</p><p>A custom network can be created with <code>docker network create</code></p><div class="code"><div class="content"><div class="highlight"><pre>$ docker network create whale
</pre></div> </div> </div><p>As always, Docker will return the ID of the object it just created, but we can ignore it for now, as we can refer to the network by name.</p><p>Stop and remove the Postgres container, and run it again using the network <code>whale</code></p><div class="code"><div class="content"><div class="highlight"><pre>$ docker rm -f whale-postgres
whale-postgres
$ docker run -d \
--name whale-postgres \
-e POSTGRES_PASSWORD=whale_password \
-e POSTGRES_DB=whale_db \
-e POSTGRES_USER=whale_user \
--network=whale \
-v whale_dbdata:/var/lib/postgresql/data \
postgres:13
</pre></div> </div> </div><p>Please note that there is no need to publish port 5432 in this setup, as the host doesn't need to access the container. Should this be a requirement, add the option <code>-p 5432:5432</code> again.</p><p>As happened with volumes, <code>docker ps</code> doesn't give information about the network that containers are using, so you have to use <code>docker inspect</code> again</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker inspect whale-postgres
[...]
"NetworkSettings": {
"Networks": {
"whale": {
[...]
</pre></div> </div> </div><div class="infobox"><i class="fa fa-info-circle"></i><div class="title">Docker network management</div><div><p>The command <code>docker network</code> can be used to change the network configuration of <em>running</em> containers.</p>
<p>You can disconnect a running container from a network with</p>
<div class="code"><div class="content"><div class="highlight"><pre>$ docker network disconnect NETWORK_ID CONTAINER_ID
</pre></div> </div> </div>
<p>and connect it with</p>
<div class="code"><div class="content"><div class="highlight"><pre>$ docker network connect NETWORK_ID CONTAINER_ID
</pre></div> </div> </div>
<p>You can see which containers are using a given network by inspecting it</p>
<div class="code"><div class="content"><div class="highlight"><pre>$ docker network inspect NETWORK_ID
</pre></div> </div> </div>
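<p>The output of <code>docker network inspect</code> is JSON, so it is easy to post-process. As a sketch, assuming the usual shape of that output (a list of networks, each with a <code>Containers</code> object whose values have <code>Name</code> and <code>IPv4Address</code> fields), the following Python helpers list the containers attached to a network together with their IP addresses. The function names are hypothetical, and <code>inspect_network</code> needs the <code>docker</code> CLI available on the machine.</p>
<div class="code"><div class="content"><div class="highlight"><pre>import json
import subprocess


def inspect_network(name):
    # Run the docker CLI and return its raw JSON output
    completed = subprocess.run(
        ["docker", "network", "inspect", name],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


def containers_on_network(inspect_output):
    # Map each container name to its IPv4 address (without the netmask)
    result = {}
    for network in json.loads(inspect_output):
        # Be defensive in case the "Containers" key is missing or empty
        for container in (network.get("Containers") or {}).values():
            address = container.get("IPv4Address", "")
            # Addresses are in CIDR notation, e.g. "172.19.0.2/16"
            result[container["Name"]] = address.split("/")[0]
    return result
</pre></div> </div> </div>
<p>For example, <code>containers_on_network(inspect_network("whale"))</code> should return something like <code>{"whale-postgres": "172.19.0.2"}</code> while the Postgres container is attached to the network (the IP might be different on your system).</p>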
<p>Remember that disconnecting a container from a network makes it unreachable through that network, so while it is good that we can do this on running containers, maintenance should always be carefully planned to avoid unexpected downtime.</p></div></div><p>As I mentioned before, custom bridge networks provide DNS resolution using the container's name. We can double check this by running a container and using <code>ping</code>.</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker run -it --rm --network=whale whaleapp ping whale-postgres
PING whale-postgres (172.19.0.2) 56(84) bytes of data.
64 bytes from whale-postgres.whale (172.19.0.2): icmp_seq=1 ttl=64 time=0.064 ms
64 bytes from whale-postgres.whale (172.19.0.2): icmp_seq=2 ttl=64 time=0.100 ms
64 bytes from whale-postgres.whale (172.19.0.2): icmp_seq=3 ttl=64 time=0.115 ms
64 bytes from whale-postgres.whale (172.19.0.2): icmp_seq=4 ttl=64 time=0.101 ms
^C
--- whale-postgres ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 80ms
rtt min/avg/max/mdev = 0.064/0.095/0.115/0.018 ms
</pre></div> </div> </div><p>What I did here was run the image <code>whaleapp</code> that we built previously, overriding the default command with <code>ping whale-postgres</code>. This is a good way to check whether a container can resolve a name on the network (<code>dig</code> is another useful tool, but it is not installed by default in that image).</p><p>As you can see, the Postgres container is reachable, and we also know that it currently has the IP <code>172.19.0.2</code>. This value might be different on your system, but it will match the information you get if you run <code>docker network inspect whale</code>.</p><p>The point of all this talk about DNS is that we can now change the code of the Python application so that it connects to <code>whale-postgres</code> instead of <code>localhost</code></p><div class="code"><div class="content"><div class="highlight"><pre><span class="n">connection_data</span> <span class="o">=</span> <span class="p">{</span>
<span class="hll"> <span class="s2">"host"</span><span class="p">:</span> <span class="s2">"whale-postgres"</span><span class="p">,</span>
</span> <span class="s2">"database"</span><span class="p">:</span> <span class="s2">"whale_db"</span><span class="p">,</span>
<span class="s2">"user"</span><span class="p">:</span> <span class="s2">"whale_user"</span><span class="p">,</span>
<span class="s2">"password"</span><span class="p">:</span> <span class="s2">"whale_password"</span><span class="p">,</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Once this is done, rebuild the image and run it in the <code>whale</code> network</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker build -t whaleapp .
[...]
$ docker run -it --rm --network=whale --name whale-app whaleapp
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
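</pre></div> </div> </div><p>You can also make the application container join the network stack of another container with <code>--network=container:NAME</code>, which is a useful shortcut. In this mode the two containers share the same network namespace, and thus the same IP address and the same <code>localhost</code>.</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker build -t whaleapp .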
[...]
$ docker run -it --rm \
--network=container:whale-postgres \
--name whale-app whaleapp
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
</pre></div> </div> </div><h2 id="run-time-configuration-7c07">Run time configuration<a class="headerlink" href="#run-time-configuration-7c07" title="Permanent link">¶</a></h2><p>Hardcoding configuration values into the application is never a great idea, and while this is a very simple example, it is worth pushing the setup a bit further to make it tidy.</p><p>In particular, we can replace the connection data <code>host</code>, <code>database</code>, and <code>user</code> with environment variables, which allow us to reuse the application by configuring it at run time. For simplicity's sake I will store the password in an environment variable as well, and pass it in clear text when we run the container. See the box below for more information about how to manage secret values.</p><p>Reading values from environment variables is easy in Python</p><div class="code"><div class="content"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">psycopg2</span>
<span class="n">DB_HOST</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"WHALEAPP__DB_HOST"</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
<span class="n">DB_NAME</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"WHALEAPP__DB_NAME"</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
<span class="n">DB_USER</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"WHALEAPP__DB_USER"</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
<span class="n">DB_PASSWORD</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"WHALEAPP__DB_PASSWORD"</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
<span class="n">connection_data</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"host"</span><span class="p">:</span> <span class="n">DB_HOST</span><span class="p">,</span>
<span class="s2">"database"</span><span class="p">:</span> <span class="n">DB_NAME</span><span class="p">,</span>
<span class="s2">"user"</span><span class="p">:</span> <span class="n">DB_USER</span><span class="p">,</span>
<span class="s2">"password"</span><span class="p">:</span> <span class="n">DB_PASSWORD</span><span class="p">,</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Please note that I prefixed all environment variables with <code>WHALEAPP__</code>. This is not mandatory, and has no special meaning for the operating system. In my experience, complicated systems can have many environment variables, and using prefixes is a simple and effective way to keep track of which part of the system needs a particular value.</p><p>We already know how to pass environment variables to Docker containers, as we did when we ran the Postgres container. Build the image again, and then run it, passing the correct variables</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker build -t whaleapp .
[...]
$ docker run -it --rm --network=whale \
-e WHALEAPP__DB_HOST=whale-postgres \
-e WHALEAPP__DB_NAME=whale_db \
-e WHALEAPP__DB_USER=whale_user \
-e WHALEAPP__DB_PASSWORD=whale_password \
--name whale-app whaleapp
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
Connecting to the PostgreSQL database...
[(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
Database connection closed.
</pre></div> </div> </div><div class="infobox"><i class="fa fa-info-circle"></i><div class="title">Managing secrets</div><div><p>A "secret" is a value that should never be shown in plain text, as it is used to grant access to a system. This can be a password or a private key such as the ones you use with SSH, and as happens with everything related to security, managing secrets is complicated. Please keep in mind that security is hard and that the best attitude to have is: <em>every time you think something in security is straightforward, it means you got it wrong</em>.</p>
<p>Generally speaking, you want secrets to be encrypted and stored in a safe place where access is granted to a narrow set of people. These secrets should be accessible to your application in a secure way, and it shouldn't be possible to access the secrets hosted in the memory of the application.</p>
<p>For example, many posts online show how you can use AWS Secrets Manager to store your secrets and then use <a href="https://stedolan.github.io/jq/">jq</a> to extract them at run time. While this works, if the JSON secret contains a syntax error, <code>jq</code> dumps the whole value in the standard output of the application, which means that the logs end up containing the secret in plain text.</p>
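<p>The same class of problem can happen in application code. Python's <code>json.JSONDecodeError</code>, for instance, stores the whole input in its <code>doc</code> attribute, so an error reporter that serialises exceptions might capture the secret. A minimal sketch of a defensive parser, assuming the secret is a JSON document already fetched as a string (the names <code>parse_secret</code> and <code>SecretFormatError</code> are hypothetical), reports only the position of the error:</p>
<div class="code"><div class="content"><div class="highlight"><pre>import json


class SecretFormatError(ValueError):
    """Raised when a secret is not valid JSON, without carrying its value."""


def parse_secret(raw):
    try:
        return json.loads(raw)
    except json.JSONDecodeError as error:
        # Keep only the position: error.doc contains the whole secret
        line, column = error.lineno, error.colno
    # Raising outside the except block means the new exception has no
    # __context__ pointing back at the JSONDecodeError
    raise SecretFormatError(
        f"secret is not valid JSON (line {line}, column {column})"
    )
</pre></div> </div> </div>
<p>Raising the new exception after the <code>except</code> block, rather than inside it, is a deliberate choice: it keeps the original exception, and the secret it holds, out of the exception chain.</p>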
<p><a href="https://hub.docker.com/_/vault">Vault</a> is a tool created by HashiCorp that many teams use to store secrets needed by containers. It is interesting to read in the description of the image that, with a specific configuration, the container prevents memory from being swapped to disk, which would leak the unencrypted values. As you see, security is hard.</p>
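<p>A common convention when passing secrets to containers is to provide the path of a file containing the value instead of the value itself. The official Postgres image, for example, accepts <code>POSTGRES_PASSWORD_FILE</code> as an alternative to <code>POSTGRES_PASSWORD</code>, and Docker Swarm mounts secrets as files under <code>/run/secrets/</code>. As a sketch, the application could support the same idea with a helper like the following, where the function <code>read_setting</code> and a variable such as <code>WHALEAPP__DB_PASSWORD_FILE</code> are hypothetical names of my own:</p>
<div class="code"><div class="content"><div class="highlight"><pre>import os


def read_setting(name, environ=None):
    # Prefer NAME_FILE (the path of a file holding the value) over NAME
    environ = os.environ if environ is None else environ
    path = environ.get(name + "_FILE")
    if path:
        with open(path, encoding="utf-8") as secret_file:
            # Files created with echo or editors usually end with a newline
            return secret_file.read().rstrip("\n")
    # Fall back to the plain variable, or None if it is not set
    return environ.get(name)
</pre></div> </div> </div>
<p>With this helper, <code>DB_PASSWORD = read_setting("WHALEAPP__DB_PASSWORD")</code> works both with <code>-e WHALEAPP__DB_PASSWORD=whale_password</code> and with <code>-e WHALEAPP__DB_PASSWORD_FILE=/run/secrets/db_password</code>, where the file path is just an example.</p>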
<p>Orchestration tools usually provide a way to manage secrets and to pass them to containers. For example, see <a href="https://docs.docker.com/engine/swarm/secrets/">Docker Swarm secrets</a>, <a href="https://kubernetes.io/docs/concepts/configuration/secret/">Kubernetes secrets</a>, and <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/specifying-sensitive-data-secrets.html">secrets for AWS Elastic Container Service</a>.</p></div></div><h2 id="enter-docker-compose-58a7">Enter Docker Compose<a class="headerlink" href="#enter-docker-compose-58a7" title="Permanent link">¶</a></h2><p>The setup we created in the past sections works, but it is far from optimal. We had to create a custom bridge network and then start the Postgres and the application containers connected to it. To stop the system we need to terminate the containers manually and remember to remove them, otherwise we can't reuse their names. We also have to remove the network manually if we want to keep the system clean.</p><p>The natural next step would be to write a bash script, and then to evolve it into a Makefile or a similar solution. Fortunately, Docker provides a better option with Docker Compose.</p><p>Docker Compose can be described as a single-host orchestration tool. Orchestration tools are pieces of software that allow us to deal with the problems described previously, such as starting and terminating multiple containers, creating networks and volumes, managing secrets, and so on.
Docker Compose works in a single-host mode, so it's a great solution for development environments, while for production multi-host environments it's better to move to more advanced tools such as AWS ECS or Kubernetes.</p><p>Docker Compose reads the configuration of a system from the file <code>docker-compose.yml</code> (this is the default name, but it can be changed), which captures all we did manually in the previous sections in a compact and readable way.</p><p>To install Docker Compose, follow the instructions you find at <a href="https://docs.docker.com/compose/install/">https://docs.docker.com/compose/install/</a>. Before we start using Docker Compose, make sure you remove the Postgres container if it is still running, and remove the network we created</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker rm -f whale-postgres
whale-postgres
$ docker network remove whale
whale
</pre></div> </div> </div><p>Then create the file <code>docker-compose.yml</code> in the project directory (not the app directory) and put the following code in it</p><div class="code"><div class="title"><code>docker-compose.yml</code></div><div class="content"><div class="highlight"><pre><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">'3.8'</span>
<span class="nt">services</span><span class="p">:</span>
</pre></div> </div> </div><p>This is not a valid Docker Compose file yet, but you can see that there is a value that specifies the syntax version and one that lists services. You can find the Compose file reference at <a href="https://docs.docker.com/compose/compose-file/">https://docs.docker.com/compose/compose-file/</a>, together with a detailed description of the various versions.</p><p>The first service we want to run is Postgres, and a basic configuration for that is</p><div class="code"><div class="title"><code>docker-compose.yml</code></div><div class="content"><div class="highlight"><pre><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">'3.8'</span>
<span class="nt">services</span><span class="p">:</span>
<span class="w">  </span><span class="nt">db</span><span class="p">:</span>
<span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">postgres:13</span>
<span class="w">    </span><span class="nt">environment</span><span class="p">:</span>
<span class="w">      </span><span class="nt">POSTGRES_DB</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_db</span>
<span class="w">      </span><span class="nt">POSTGRES_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_password</span>
<span class="w">      </span><span class="nt">POSTGRES_USER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_user</span>
<span class="w">    </span><span class="nt">volumes</span><span class="p">:</span> <span class="callout">2</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">dbdata:/var/lib/postgresql/data</span>
<span class="nt">volumes</span><span class="p">:</span> <span class="callout">1</span>
<span class="w">  </span><span class="nt">dbdata</span><span class="p">:</span>
</pre></div> </div> </div><p>As you can see, this file contains the environment variables that we passed to the Postgres container and the volume configuration. The top-level <code>volumes</code> <span class="callout">1</span> declares which named volumes have to be present (Compose creates them if they don't exist), while <code>volumes</code> <span class="callout">2</span> inside the service <code>db</code> mounts the volume into the container, just like the option <code>-v</code> did previously.</p><p>Now, from the project directory, you can run Docker Compose with</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker-compose -p whale up -d
Creating network "whale_default" with the default driver
Creating whale_db_1 ... done
</pre></div> </div> </div><p>The option <code>-p</code> sets the name of the project, which otherwise defaults to the name of the current directory (which might or might not be meaningful), while the command <code>up -d</code> starts all the containers in detached mode.</p><p>As you can see from the output, Docker Compose creates a (bridge) network called <code>whale_default</code>. Normally, you would see a message like <code>Creating volume "whale_dbdata" with default driver</code> as well, but in this case the volume is already present because we created it previously. Both the network and the volume are prefixed with <code>PROJECTNAME_</code>, and this is the reason why I named the volume <code>whale_dbdata</code> when we first created it. Keep in mind, however, that all these default behaviours can be customised in the Compose file.</p><p>If you run <code>docker ps</code> you will see that the container is named <code>whale_db_1</code>. This comes from the project name (<code>whale_</code>), the service name in the Compose file (<code>db_</code>), and the container number, which is 1 because at the moment we are running only one container for that service.</p><p>To stop the services, run</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker-compose -p whale down
Stopping whale_db_1 ... done
Removing whale_db_1 ... done
Removing network whale_default
</pre></div> </div> </div><p>As you can see from the output, Docker Compose stops and removes the container, then removes the network. This is very convenient, as it saves us a lot of the work we had to do manually earlier.</p><hr><p>We can now add the application container to the Compose file</p><div class="code"><div class="content"><div class="highlight"><pre><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">'3.8'</span>
<span class="nt">services</span><span class="p">:</span>
<span class="w">  </span><span class="nt">db</span><span class="p">:</span>
<span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">postgres:13</span>
<span class="w">    </span><span class="nt">environment</span><span class="p">:</span>
<span class="w">      </span><span class="nt">POSTGRES_DB</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_db</span>
<span class="w">      </span><span class="nt">POSTGRES_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_password</span>
<span class="w">      </span><span class="nt">POSTGRES_USER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_user</span>
<span class="w">    </span><span class="nt">volumes</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">dbdata:/var/lib/postgresql/data</span>
<span class="hll"><span class="w">  </span><span class="nt">app</span><span class="p">:</span>
</span><span class="hll"><span class="w">    </span><span class="nt">build</span><span class="p">:</span>
</span><span class="hll"><span class="w">      </span><span class="nt">context</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whaleapp</span>
</span><span class="hll"><span class="w">      </span><span class="nt">dockerfile</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Dockerfile</span>
</span><span class="hll"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span>
</span><span class="hll"><span class="w">      </span><span class="nt">WHALEAPP__DB_HOST</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">db</span>
</span><span class="hll"><span class="w">      </span><span class="nt">WHALEAPP__DB_NAME</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_db</span>
</span><span class="hll"><span class="w">      </span><span class="nt">WHALEAPP__DB_USER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_user</span>
</span><span class="hll"><span class="w">      </span><span class="nt">WHALEAPP__DB_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_password</span>
</span>
<span class="nt">volumes</span><span class="p">:</span>
<span class="w">  </span><span class="nt">dbdata</span><span class="p">:</span>
</pre></div> </div> </div><p>This definition is slightly different, as the application container has to be built using the Dockerfile we created. Docker Compose allows us to store the build configuration here, so that we don't need to pass all the options to <code>docker build</code> manually, but please note that configuring the build here doesn't mean that Docker Compose will rebuild the image for you every time. You still need to run <code>docker-compose -p whale build</code> whenever you want to rebuild it.</p><p>Please note that the variable <code>WHALEAPP__DB_HOST</code> is set to the service name, and not to the container name: Docker Compose makes each service reachable on the project network under its service name. Now, when we run Docker Compose we get</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker-compose -p whale up -d
Creating network "whale_default" with the default driver
Creating whale_db_1 ... done
Creating whale_app_1 ... done
</pre></div> </div> </div><p>and the output tells us that this time the container <code>whale_app_1</code> has been created as well. We can see the logs of a container with <code>docker logs</code>, but <code>docker-compose logs</code> allows us to refer to services by their name in the Compose file instead of using the container name or ID</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker-compose -p whale logs -f app
Attaching to whale_app_1
app_1 | Connecting to the PostgreSQL database...
app_1 | [(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
app_1 | Database connection closed.
app_1 | Connecting to the PostgreSQL database...
app_1 | [(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
app_1 | Database connection closed.
</pre></div> </div> </div><h2 id="health-checks-and-dependencies-bc9b">Health checks and dependencies<a class="headerlink" href="#health-checks-and-dependencies-bc9b" title="Permanent link">¶</a></h2><p>You might have noticed that at the very beginning of the application logs there are some connection errors, and that after a while the application manages to connect to the database</p><div class="code"><div class="content"><div class="highlight"><pre>$ docker-compose -p whale logs -f app
Attaching to whale_app_1
app_1 | Connecting to the PostgreSQL database...
app_1 | could not translate host name "db" to address: Name or service not known
app_1 |
app_1 | Connecting to the PostgreSQL database...
app_1 | could not translate host name "db" to address: Name or service not known
app_1 |
app_1 | Connecting to the PostgreSQL database...
app_1 | Connecting to the PostgreSQL database...
app_1 | could not connect to server: Connection refused
app_1 | Is the server running on host "db" (172.31.0.3) and accepting
app_1 | TCP/IP connections on port 5432?
app_1 |
app_1 | Connecting to the PostgreSQL database...
app_1 | [(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
app_1 | Database connection closed.
app_1 | Connecting to the PostgreSQL database...
app_1 | [(1, 'Tacos'), (2, 'Tomato Soup'), (3, 'Grilled Cheese')]
app_1 | Database connection closed.
</pre></div> </div> </div><p>These errors come from the fact that the application container is up and running before the database is ready to serve connections. In a production setup this usually doesn't happen, because the database is up and running long before the application is deployed for the first time, and then runs (hopefully) without interruption. In a development environment, instead, such a situation is normal.</p><p>Please note that this might not happen in your setup, as it depends on how quickly Docker Compose starts the containers and how quickly the database becomes ready. Time-sensitive bugs are among the worst types to deal with, and they are one of the reasons why managing distributed systems is hard. It is important to realise that even though this might work on your system now, the problem is there and we need to find a solution.</p><p>The standard solution when part of a system depends on another is to create a <em>health check</em> that periodically tests the service being depended upon, and to start the dependent service only when the check is successful. We can do this in the Compose file using <code>healthcheck</code> and <code>depends_on</code></p><div class="code"><div class="content"><div class="highlight"><pre><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s">'3.8'</span>
<span class="nt">services</span><span class="p">:</span>
<span class="w">  </span><span class="nt">db</span><span class="p">:</span>
<span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">postgres:13</span>
<span class="w">    </span><span class="nt">environment</span><span class="p">:</span>
<span class="w">      </span><span class="nt">POSTGRES_DB</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_db</span>
<span class="w">      </span><span class="nt">POSTGRES_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_password</span>
<span class="w">      </span><span class="nt">POSTGRES_USER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_user</span>
<span class="w">    </span><span class="nt">volumes</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">dbdata:/var/lib/postgresql/data</span>
<span class="hll"><span class="w">    </span><span class="nt">healthcheck</span><span class="p">:</span>
</span><span class="hll"><span class="w">      </span><span class="nt">test</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="s">"CMD-SHELL"</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="s">"pg_isready"</span><span class="p p-Indicator">]</span>
</span><span class="hll"><span class="w">      </span><span class="nt">interval</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">10s</span>
</span><span class="hll"><span class="w">      </span><span class="nt">timeout</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">5s</span>
</span><span class="hll"><span class="w">      </span><span class="nt">retries</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">5</span>
</span><span class="w">  </span><span class="nt">app</span><span class="p">:</span>
<span class="w">    </span><span class="nt">build</span><span class="p">:</span>
<span class="w">      </span><span class="nt">context</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whaleapp</span>
<span class="w">      </span><span class="nt">dockerfile</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Dockerfile</span>
<span class="w">    </span><span class="nt">environment</span><span class="p">:</span>
<span class="w">      </span><span class="nt">WHALEAPP__DB_HOST</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">db</span>
<span class="w">      </span><span class="nt">WHALEAPP__DB_NAME</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_db</span>
<span class="w">      </span><span class="nt">WHALEAPP__DB_USER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_user</span>
<span class="w">      </span><span class="nt">WHALEAPP__DB_PASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">whale_password</span>
<span class="hll"><span class="w">    </span><span class="nt">depends_on</span><span class="p">:</span>
</span><span class="hll"><span class="w">      </span><span class="nt">db</span><span class="p">:</span>
</span><span class="hll"><span class="w">        </span><span class="nt">condition</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">service_healthy</span>
</span>
<span class="nt">volumes</span><span class="p">:</span>
<span class="w">  </span><span class="nt">dbdata</span><span class="p">:</span>
</pre></div> </div> </div><p>The health check for the Postgres container leverages the command line tool <code>pg_isready</code>, which succeeds only when the database is ready to accept connections. Docker runs the check every 10 seconds, considers it failed if it doesn't complete within 5 seconds, and marks the container as unhealthy after 5 consecutive failures. Now, when you run <code>up -d</code> you should notice a clear delay before the application starts, but the logs won't contain any connection errors.</p><h2 id="final-words-9803">Final words<a class="headerlink" href="#final-words-9803" title="Permanent link">¶</a></h2><p>Well, this was a long one, but I hope you enjoyed the trip and that you ended up with a better picture of the problems Docker Compose solves, along with a feeling of how complicated it might be to design an architecture. Everything we did was for a "simple" development environment with a couple of containers, so you can imagine what is involved when we get to live environments.</p><h2 id="updates-0083">Updates<a class="headerlink" href="#updates-0083" title="Permanent link">¶</a></h2><p>2022-03-17: Thanks to my colleague Joanna Stadnik for a thorough review, for spotting typos, and for giving me several suggestions based on her experience. Thank you!</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>Public key cryptography: OpenSSH private keys2021-06-03T14:00:00+01:002021-06-03T14:00:00+01:00Leonardo Giordanitag:www.thedigitalcatonline.com,2021-06-03:/blog/2021/06/03/public-key-cryptography-openssh-private-keys/<p>An in-depth discussion of the format of OpenSSH keys</p><p>When you create RSA keys with versions of <code>ssh-keygen</code> older than OpenSSH 7.8, or with the option <code>-m PEM</code>, you end up with a private key in PEM format, and a public key in OpenSSH format. 
Both have been described in detail in my post <a href="https://www.thedigitalcatonline.com/blog/2018/04/25/rsa-keys/">Public key cryptography: RSA keys</a>. In 2014, OpenSSH introduced a custom format for private keys that looks similar to PEM but is internally completely different. This format was initially used by default only for ed25519 keys, but since OpenSSH 7.8 <code>ssh-keygen</code> uses it by default for all key types, so it is worth having a look.</p><p>While investigating this topic I found many misconceptions and wrong or partially wrong statements on Stack Overflow, so I hope this post can give a comprehensive view of what this format is, its relationship with PEM, and the tools that you can use to manipulate it.</p><p>I'm not the first programmer to look into this, clearly, and I have to mention two posts that I read before writing this one: <a href="https://peterlyons.com/problog/2017/12/openssh-ed25519-private-key-file-format/">OpenSSH ed25519 private key file format</a>, written in December 2017 by Peter Lyons, and <a href="http://dnaeon.github.io/openssh-private-key-binary-format/">The OpenSSH private key binary format</a>, written in August 2020 by Marin Atanasov Nikolov. I'm sure many others have done this research, but these are the resources that I found, and I want to say a big thanks to both authors for sharing their findings. I will shamelessly use their results in the following explanation, as I hope others will do with what I'm writing here. Sharing knowledge is one of the best ways to help others.</p><p>Please note that all the private keys shown in this post were discarded after I published it.</p><p>Note: as the word "key" can identify several different components of the systems I will describe, I will use the words "private key" and "encryption key" as much as possible. 
The first is the key that we generate to be used in SSH, while the second is a parameter of a (symmetric) encryption algorithm.</p><h2 id="kdfs-and-protection-at-rest-523a">KDFs and protection at rest<a class="headerlink" href="#kdfs-and-protection-at-rest-523a" title="Permanent link">¶</a></h2><p>Describing the introduction of the new format, the <a href="https://www.openssh.com/txt/release-6.5">OpenSSH changelog</a> says</p><div class="code"><div class="content"><div class="highlight"><pre>Add a new private key format that uses a bcrypt KDF to better
protect keys at rest. This format is used unconditionally for
Ed25519 keys, but may be requested when generating or saving
existing keys of other types via the -o ssh-keygen(1) option.
We intend to make the new format the default in the near future.
Details of the new format are in the PROTOCOL.key file.
</pre></div> </div> </div><p>Before we start dissecting the format, then, it is worth briefly discussing what a KDF is, what bcrypt is, and what it means to protect keys at rest.</p><h3 id="key-derivation-functions-decf">Key Derivation Functions</h3><p>Whenever a system is protected by a password, you need to store that password somewhere, in some form. This is clearly necessary to check the validity of the passwords that users input and decide if you should grant access, but you shouldn't store the password in clear text, as a breach in the storage might compromise the whole system. The idea behind storing passwords securely is to run them through a hash function and store the hash: whenever someone inputs a password we can run the hash function again and compare the two hashes. However, we also want to prevent attackers from reconstructing the password from the hash, so we need a <em><a href="https://en.wikipedia.org/wiki/Cryptographic_hash_function">cryptographic hash function</a></em>, which is a hash function with added requirements that prevent an easy inversion of the process.</p><p>The same strategy can be applied when it comes to encryption. An encryption system needs a key (a sequence of bits used to encrypt the message), and we need to derive it from the password given by the user. Encryption keys are required to have a specific length dictated by the encryption algorithm that we use, so hashing looks like a good solution, as all hashes generated by a given algorithm are by definition of the same size. <a href="https://en.wikipedia.org/wiki/Advanced_Encryption_Standard">AES</a>, for example, one of the most widespread symmetric block ciphers, uses a key of exactly 128, 192, or 256 bits. Converting the password into a key of predetermined size is called <em>stretching</em>.</p><p>Any cryptographic system can be broken using a brute-force attack, as you can always test all possible inputs. 
In the case of login, we can just input all possible passwords until we get access to the system, while in the case of encryption we can try to decrypt using all possible keys until we obtain a meaningful result. This means that the most important thing we can do to protect such systems is to make brute-force attacks infeasible. This can be done by increasing the key size (using more bits), but also by using a slow stretching algorithm.</p><p>While hash functions created for things like digital signatures should be fast, then, hash functions that we use to obfuscate the password (for storage) or to create the key (for encryption/decryption) have to be very slow. Slow processing can frustrate brute-force attacks and make them less effective, if not infeasible. An example: with current technology, an attacker with dedicated hardware can compute around a trillion fast hashes per second, but if each one of those hashes takes 1 second, testing all of them one after the other takes a trillion seconds, that is, more than 31,000 years.</p><p>The process that converts a password into a key is called <em><a href="https://en.wikipedia.org/wiki/Key_derivation_function">Key Derivation Function</a></em> (KDF), and despite the name it is usually a complex algorithm and not a single mathematical function. <a href="https://en.wikipedia.org/wiki/PBKDF2">PBKDF2</a> is an important KDF, defined as part of the specification <a href="https://datatracker.ietf.org/doc/html/rfc2898">PKCS #5</a>, and it can use any pseudorandom function as part of the key stretching. An important feature of PBKDF2 is that it accepts an iteration count as input, which allows us to slow down the process. 
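</p><p>The following minimal Python sketch makes the role of the iteration count concrete. It relies only on the standard library (<code>hashlib.pbkdf2_hmac</code>) and assumes HMAC-SHA256 as the pseudorandom function, which is just one of the options PBKDF2 allows. The helper names are hypothetical. The sketch stretches the same password into keys of the three AES sizes, times a few iteration counts on your machine, and reproduces the 31,000-year arithmetic mentioned above.</p>

```python
import hashlib
import os
import time


def derive_key(password: str, salt: bytes, iterations: int, key_len: int = 32) -> bytes:
    """Stretch a password into a key of key_len bytes with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=key_len)


def time_derivation(iterations: int) -> float:
    """Seconds spent on a single derivation with the given iteration count."""
    salt = os.urandom(16)
    start = time.perf_counter()
    derive_key("foobar", salt, iterations)
    return time.perf_counter() - start


def years_to_test(guesses: int, seconds_per_guess: float) -> float:
    """Years needed to test all the guesses one after the other."""
    seconds_per_year = 365.25 * 24 * 3600
    return guesses * seconds_per_guess / seconds_per_year


if __name__ == "__main__":
    salt = os.urandom(16)
    # The key size depends only on key_len, not on the password length:
    # 16, 24 and 32 bytes are the key sizes of AES-128, AES-192 and AES-256.
    for key_len in (16, 24, 32):
        key = derive_key("foobar", salt, 100_000, key_len)
        print(f"AES-{key_len * 8}: {key.hex()}")

    # Each iteration is one more HMAC computation, so time grows linearly.
    for iterations in (1_000, 10_000, 100_000):
        print(f"{iterations:>7} iterations: {time_derivation(iterations):.4f}s")

    # A trillion guesses at one second each.
    print(f"{years_to_test(10**12, 1.0):,.0f} years")  # → 31,688 years
```

<p>The timings depend entirely on your hardware. Still, every iteration is one more HMAC computation, so multiplying the iteration count by ten should make each guess roughly ten times more expensive, for the legitimate user and the attacker alike.</p><p>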
As we just saw, this is the key to making the algorithm slower in order to adapt to the increasing computing power available to attackers.</p><h3 id="bcrypt-46df">bcrypt</h3><p>The password-hashing function known as <a href="https://en.wikipedia.org/wiki/Bcrypt">bcrypt</a> was created in 1999 and is based on the <a href="https://en.wikipedia.org/wiki/Blowfish_(cipher)">Blowfish</a> cipher created in 1993. Bcrypt is well known to be an extremely good choice, mainly because its slowness can be increased by tuning one of the parameters of the algorithm, called "cost factor". This is the binary logarithm of the number of iterations done in the setup of the underlying cipher, and its logarithmic nature makes it easy to adapt the whole process to the increasing computational power available to attackers. <a href="https://auth0.com/blog/hashing-in-action-understanding-bcrypt/">This post</a> attempts to estimate the time needed to hash a password of 15 characters with a cost of 30 (the maximum is actually 31) on a decent 2017 laptop (2.8 GHz Intel Core i7, 16 GB RAM). The result turns out to be around 500 days, which makes you understand that bcrypt won't die easily. It is important to note here that bcrypt is not a KDF, but a hash function. As such, it might be part of a KDF, but it cannot replace the whole process.</p><h3 id="protection-at-rest-9c52">Protection at rest</h3><p>Protection <a href="https://en.wikipedia.org/wiki/Data_at_rest">at rest</a> refers to the scheme that ensures data is secure when it is stored. Practically speaking, when it comes to SSH keys, we refer to the fact that an attacker who can physically access a key, for example by stealing a laptop, only obtains an encrypted version of the key, which can't be used without first decrypting it. As the attacker presumably doesn't know the password used to encrypt the key, the only strategy they can use is to brute-force it, and here is where the concept of protection at rest comes into play. 
Actually, the <a href="https://xkcd.com/538/">other strategy</a> they can employ is to kidnap you and force you to reveal the password, but this somehow falls outside the sphere of cryptographic security.</p><h2 id="pem-format-and-protection-at-rest-aafc">PEM format and protection at rest<a class="headerlink" href="#pem-format-and-protection-at-rest-aafc" title="Permanent link">¶</a></h2><p>Now that I have clarified some terminology, let's have a look at how the standard PEM format stores encrypted private keys. As I explained in my post <a href="https://www.thedigitalcatonline.com/blog/2018/04/25/rsa-keys/">Public key cryptography: RSA keys</a>, a PEM file contains a text header, a text footer, and some content. The content is always an ASN.1 structure created using DER and encoded using base64.</p><p>For encrypted private keys, the ASN.1 structure is created following a standard called <a href="https://datatracker.ietf.org/doc/html/rfc5208">PKCS #8</a>. The keys we will examine use an encryption scheme called PBES2, described in the specification PKCS #5, which uses a symmetric cipher and a password, previously converted into an encryption key using the KDF called PBKDF2. I hope at this point some, if not all, of these names ring a bell.</p><p>We can roughly sketch the process with the following steps:</p><ul><li>Create the private key using the requested asymmetric algorithm (e.g. RSA or ED25519)</li><li>Encrypt the private key following PBES2<ul><li>Stretch the password into an encryption key using PBKDF2 with one of the possible hash functions and a random salt value</li><li>Encrypt the private key using the newly created encryption key</li></ul></li><li>Represent the encrypted key, the parameters used for PBKDF2, and those of the cipher (such as the initialisation vector) using ASN.1/DER</li><li>Encode the result with base64</li><li>Add a header and a footer that specify the nature of the content</li></ul><p>Let's create an encrypted key with OpenSSL and analyse it. 
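</p><p>OpenSSL performs all these steps with a single command. A minimal Python sketch using only the standard library can show the key-stretching and encoding steps in isolation. It assumes HMAC-SHA256 as the PBKDF2 hash function, a random 8-byte salt, and 2048 iterations, the same parameters that appear in the ASN.1 dump of the key generated below. The function names are hypothetical. The encryption and ASN.1/DER steps are left out, as the standard library provides neither AES nor an ASN.1 encoder.</p>

```python
import base64
import hashlib
import os


def pbes2_derive_key(password: bytes, salt: bytes, iterations: int = 2048) -> bytes:
    """Stretch the password into a 256-bit key, the size AES-256 needs."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)


def pem_armor(der: bytes, label: str = "ENCRYPTED PRIVATE KEY") -> str:
    """Encode DER bytes with base64 and wrap them in a PEM header and footer."""
    body = base64.b64encode(der).decode("ascii")
    # PEM splits the base64 text into lines of 64 characters.
    lines = [body[i:i + 64] for i in range(0, len(body), 64)]
    return "\n".join([f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]) + "\n"


def pem_unarmor(pem: str) -> bytes:
    """Drop the header and footer lines and decode the base64 content."""
    body = "".join(line for line in pem.splitlines() if not line.startswith("-----"))
    return base64.b64decode(body)


if __name__ == "__main__":
    salt = os.urandom(8)  # random salt, stored in clear in the final file
    key = pbes2_derive_key(b"foobar", salt)
    print(f"salt={salt.hex()} key={key.hex()}")

    # Placeholder bytes stand in for the DER structure that would contain
    # the encrypted key and the PBES2 parameters (this is not a real key).
    placeholder_der = bytes(range(100))
    pem = pem_armor(placeholder_der)
    print(pem)
    assert pem_unarmor(pem) == placeholder_der
```

<p>Anyone who knows the password can derive the same encryption key again from the salt and the iteration count. This is why these parameters can be stored in clear next to the encrypted data.</p><p>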
The command I used is</p><div class="code"><div class="content"><div class="highlight"><pre>openssl genpkey -aes-256-cbc -algorithm RSA \
    -pkeyopt rsa_keygen_bits:4096 -pass pass:foobar \
    -out key_rsa_4096_openssl_pw
</pre></div> </div> </div><p>which creates a 4096-bit RSA key and encrypts it with AES-256 in CBC mode, using <code>foobar</code> as the password. What I get is a file in the aforementioned PEM format</p><div class="code"><div class="content"><div class="highlight"><pre>-----BEGIN ENCRYPTED PRIVATE KEY-----
MIIJrTBXBgkqhkiG9w0BBQ0wSjApBgkqhkiG9w0BBQwwHAQIW+BK6UQtCPACAggA
MAwGCCqGSIb3DQIJBQAwHQYJYIZIAWUDBAEqBBCIvU4FD31mkYR76ugTEhuwBIIJ
UJPHGeObOC1lHMrTTKhdyiekEcJhCO3rzP/gqVpqXkjhUASTWEsE9LEcuGKdrzAN
Dsy/WL9revg9UAQtGAk8WTSqWhv5JaCC4FqLGirqLMzhU51Jf4GbmCOWAWGP7TZu
[...]
QEfBUexTcFVf13cVX7LFGOAZ3yIvFc3sfl5nyYY9Nerk8MxUOW+9Ck5loTEzMj9j
xJf5RsNvcoGVg33Rf7vl2xFIAD+PFdehd8n2CveQ48LJ9Zfn0gsRPQrPL+02Nlhu
7f44uW/Vq2YqG3PN1n8GUTexvF/qCKkd2T2QmHYnK9cryRn0xHvzSjSsQls170sA
Svu0sdTwh1tIs/sxRGuSta+iXPfHJnW4sZzh/2lAMvkgML6h9JAeIYV6e/qUqYSq
GxSfj7s0Qs0K5e3Xv1lCQUhSz82fBysznjeAhWa45YEV
-----END ENCRYPTED PRIVATE KEY-----
</pre></div> </div> </div><p>We can dump the ASN.1 content directly from the PEM format using <code>openssl asn1parse</code></p><div class="code"><div class="content"><div class="highlight"><pre>$ openssl asn1parse -inform pem -in key_rsa_4096_openssl_pw
0:d=0 hl=4 l=2477 cons: SEQUENCE
4:d=1 hl=2 l= 87 cons: SEQUENCE
6:d=2 hl=2 l= 9 prim: OBJECT :PBES2 <span class="callout">1</span>
17:d=2 hl=2 l= 74 cons: SEQUENCE
19:d=3 hl=2 l= 41 cons: SEQUENCE
21:d=4 hl=2 l= 9 prim: OBJECT :PBKDF2 <span class="callout">2</span>
32:d=4 hl=2 l= 28 cons: SEQUENCE
34:d=5 hl=2 l= 8 prim: OCTET STRING [HEX DUMP]:5BE04AE9442D08F0 <span class="callout">4</span>
44:d=5 hl=2 l= 2 prim: INTEGER :0800 <span class="callout">5</span>
48:d=5 hl=2 l= 12 cons: SEQUENCE
50:d=6 hl=2 l= 8 prim: OBJECT :hmacWithSHA256 <span class="callout">6</span>
60:d=6 hl=2 l= 0 prim: NULL
62:d=3 hl=2 l= 29 cons: SEQUENCE
64:d=4 hl=2 l= 9 prim: OBJECT :aes-256-cbc <span class="callout">3</span>
75:d=4 hl=2 l= 16 prim: OCTET STRING [HEX DUMP]:88BD4E050F7D6691847BEAE813121BB0
93:d=1 hl=4 l=2384 prim: OCTET STRING [HEX DUMP]:93C719E39B382D[...]
</pre></div> </div> </div><p>Please note that I truncated the final <code>OCTET STRING</code> that contains the encrypted key, as it is pretty long.</p><p>You can clearly see that this key is encrypted using PBES2 <span class="callout">1</span>, with PBKDF2 <span class="callout">2</span> as the key derivation function. The algorithm used to encrypt the key is <code>aes-256-cbc</code> <span class="callout">3</span>, as I asked. Specifically, this is AES with a key of 256 bits in <a href="https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Cipher_block_chaining_(CBC)">CBC mode</a>, and the 16-byte <code>OCTET STRING</code> that follows it is the initialisation vector of the cipher.</p><p>According to the <a href="https://datatracker.ietf.org/doc/html/rfc2898#appendix-A.4">PKCS #5 specification</a>, the <code>PBES2</code> block contains</p><div class="code"><div class="content"><div class="highlight"><pre>PBES2-params ::= SEQUENCE {
    keyDerivationFunc AlgorithmIdentifier {{PBES2-KDFs}},
    encryptionScheme AlgorithmIdentifier {{PBES2-Encs}} }
</pre></div> </div> </div><p>and indeed we have <code>PBKDF2</code> <span class="callout">2</span> for <code>keyDerivationFunc</code>, and <code>aes-256-cbc</code> <span class="callout">3</span> for <code>encryptionScheme</code>. The parameters of <code>PBKDF2</code> are specified in the <a href="https://datatracker.ietf.org/doc/html/rfc2898#appendix-A.2">same document</a> as</p><div class="code"><div class="content"><div class="highlight"><pre>PBKDF2-params ::= SEQUENCE {
    salt CHOICE {
        specified OCTET STRING,
        otherSource AlgorithmIdentifier {{PBKDF2-SaltSources}}
    },
    iterationCount INTEGER (1..MAX),
    keyLength INTEGER (1..MAX) OPTIONAL,
    prf AlgorithmIdentifier {{PBKDF2-PRFs}} DEFAULT
        algid-hmacWithSHA1 }
</pre></div> </div> </div><p>As you can see in the ASN.1 dump, the salt is <code>5BE04AE9442D08F0</code> <span class="callout">4</span>, the iteration count is 2048 (<code>0x800</code>) <span class="callout">5</span>, and the pseudorandom function (<code>prf</code>) is <code>hmacWithSHA256</code> <span class="callout">6</span>, without any additional parameters. The optional <code>keyLength</code> is absent, as the length of the key is implied by the cipher (256 bits for <code>aes-256-cbc</code>). The value 2048 for the iterations is the default value in OpenSSL (see the definition of <a href="https://github.com/openssl/openssl/blob/5bcbdee621fbf05df7431b8fbb0ea7de7054e1f0/include/openssl/evp.h#L41">PKCS5_DEFAULT_ITER</a>).</p><h2 id="opensshs-private-key-format-413c">OpenSSH's private key format<a class="headerlink" href="#opensshs-private-key-format-413c" title="Permanent link">¶</a></h2><p>As we saw at the beginning of the post, the OpenSSH team came up with a custom format to store private keys, so now that we are familiar with the nomenclature and with the way PEM stores encrypted keys, let's see what this new format can do.</p><p>The best starting point for our investigation is the tool <code>ssh-keygen</code>, which we can use to create private keys. The source can be found in the OpenSSH repository in the file <a href="https://github.com/openssh/openssh-portable/blob/2dc328023f60212cd29504fc05d849133ae47355/ssh-keygen.c">ssh-keygen.c</a>. This file uses two different functions, <code>sshkey_private_to_blob2</code> (<a href="https://github.com/openssh/openssh-portable/blob/2dc328023f60212cd29504fc05d849133ae47355/sshkey.c#L3883">source code</a>) for the new format and <code>sshkey_private_to_blob_pem_pkcs8</code> (<a href="https://github.com/openssh/openssh-portable/blob/2dc328023f60212cd29504fc05d849133ae47355/sshkey.c#L4371">source code</a>) for keys in PKCS #8 format. 
The former calls <code>bcrypt_pbkdf</code> which comes from OpenBSD (<a href="https://github.com/openbsd/src/blob/2207c4325726fdc5c4bcd0011af0fdf7d3dab137/sys/lib/libsa/bcrypt_pbkdf.c#L96">source code</a>).</p><p>This function contains a modified implementation of PBKDF2 that uses bcrypt as the core hash function. The comment that you can find at the top of the file <a href="https://github.com/openbsd/src/blob/master/sys/lib/libsa/bcrypt_pbkdf.c#L28">bcrypt_pbkdf.c</a> says</p><div class="code"><div class="content"><div class="highlight"><pre>/*
* pkcs #5 pbkdf2 implementation using the "bcrypt" hash
*
* The bcrypt hash function is derived from the bcrypt password hashing
* function with the following modifications:
* 1. The input password and salt are preprocessed with SHA512.
* 2. The output length is expanded to 256 bits.
* 3. Subsequently the magic string to be encrypted is lengthened and modified
* to "OxychromaticBlowfishSwatDynamite"
* 4. The hash function is defined to perform 64 rounds of initial state
* expansion. (More rounds are performed by iterating the hash.)
*
* Note that this implementation pulls the SHA512 operations into the caller
* as a performance optimization.
*
* One modification from official pbkdf2. Instead of outputting key material
* linearly, we mix it. pbkdf2 has a known weakness where if one uses it to
* generate (e.g.) 512 bits of key material for use as two 256 bit keys, an
* attacker can merely run once through the outer loop, but the user
* always runs it twice. Shuffling output bytes requires computing the
* entirety of the key material to assemble any subkey. This is something a
* wise caller could do; we just do it for you.
*/
</pre></div> </div> </div><p>As you can see, this is intended to be a <code>pkcs #5 pbkdf2 implementation</code> that uses <code>bcrypt</code> as its underlying hash function. It also mentions some modifications, and it's worth noting that when you modify a standard you are not following the standard any more. I won't run through all the details of the implementation, though, as it's beyond the scope of the post.</p><p>So, the OpenSSH private key format ultimately contains a private key encrypted with a symmetric cipher, whose encryption key is derived from the password by a non-standard version of PBKDF2 that uses bcrypt as its core hash function. The structure that contains the key is not ASN.1, even though it's base64 encoded and wrapped between a header and a footer similar to the PEM ones. A description of the structure can be found in the file <a href="https://github.com/openssh/openssh-portable/blob/2dc328023f60212cd29504fc05d849133ae47355/PROTOCOL.key">PROTOCOL.key</a> in the OpenSSH repository.</p><h3 id="cost-factor-and-rounds-1a31">Cost factor and rounds</h3><p>PBKDF2 uses the concept of <em>rounds</em> to make the key stretching slower. This is the number of times the pseudorandom function is applied internally, each time to the output of the previous iteration, so in PBKDF2 the time taken by the stretching operation is directly proportional to the number of rounds (or iterations).</p><p>Bcrypt implements a similar mechanism with its <em>cost factor</em>. The cost factor in the standard bcrypt implementation is defined as the binary logarithm of the number of iterations of a specific part of the process (the repeated expansion of the password and the salt). 
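</p><p>Since the cost factor is a binary logarithm, the number of iterations is 2 raised to the cost factor. The following Python sketch makes this arithmetic explicit. The function names are hypothetical, and the valid range is assumed to be 4 to 31, as in standard bcrypt. The sketch also shows that the fixed cost of 6 used by OpenBSD's <code>bcrypt_pbkdf</code> corresponds to the 64 rounds mentioned in the comment quoted earlier.</p>

```python
def bcrypt_iterations(cost: int) -> int:
    """Number of expansion iterations performed for a bcrypt cost factor."""
    if not 4 <= cost <= 31:
        raise ValueError("standard bcrypt accepts cost factors from 4 to 31")
    return 2 ** cost


def min_cost_for(iterations: int) -> int:
    """Smallest valid cost factor that performs at least `iterations` iterations."""
    cost = max(4, (iterations - 1).bit_length())
    if cost > 31:
        raise ValueError("more iterations than bcrypt allows")
    return cost


if __name__ == "__main__":
    # 6 is the fixed cost used inside OpenBSD's bcrypt_pbkdf.
    for cost in (4, 6, 10, 31):
        print(f"cost {cost:>2}: {bcrypt_iterations(cost):,} iterations")
    # Raising the cost by one doubles the work.
    assert bcrypt_iterations(11) == 2 * bcrypt_iterations(10)
    print(min_cost_for(1000))  # → 10
```

<p>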
Using the binary logarithm means that a cost factor of 4 (the minimum) corresponds to 16 iterations, while 31 (the maximum) corresponds to 2,147,483,648 (more than 2 billion) iterations.</p><p>In the OpenSSH/OpenBSD implementation things are a bit different.</p><p>OpenBSD's version of bcrypt runs with a fixed cost of 6, which results in 64 iterations of the key expansion (<a href="https://github.com/openbsd/src/blob/2207c4325726fdc5c4bcd0011af0fdf7d3dab137/sys/lib/libsa/bcrypt_pbkdf.c#L68">source code</a>), but, being an implementation of PBKDF2, it can still be hardened by increasing the number of rounds (<a href="https://github.com/openbsd/src/blob/2207c4325726fdc5c4bcd0011af0fdf7d3dab137/sys/lib/libsa/bcrypt_pbkdf.c#L139">source code</a>). Those rounds correspond to the value given to the <code>-a</code> option of the <code>ssh-keygen</code> command.</p><h3 id="how-many-rounds-12df">How many rounds?</h3><p>When it comes to KDFs, the advice is always to run as many iterations as possible while keeping the specific application usable, so you need to tune your SSH keys by testing different values on your system. To give you some rough estimates, Wikipedia mentions that for PBKDF2 the number of iterations used by Apple and LastPass is between 2k and 100k. It is worth reiterating, though, that you shouldn't simply reuse other people's figures in this case. Instead, run tests on your own software and hardware.</p><p>On my laptop, an i7-8565U with 32GiB of RAM running Kubuntu 20.04, I get the following results, which grow almost linearly with the number of rounds:</p><div class="code"><div class="content"><div class="highlight"><pre>ssh-keygen -a 100 -t ed25519 0.667s
ssh-keygen -a 500 -t ed25519 3.148s
ssh-keygen -a 1000 -t ed25519 6.331s
ssh-keygen -a 5000 -t ed25519 31.624s
</pre></div> </div> </div><p>A sensible value for me might then be between 100 and 500, so that I don't have to wait too long every time I push and pull my branches from GitHub.</p><h2 id="can-we-convert-private-openssh-keys-into-pem-2c12">Can we convert private OpenSSH keys into PEM?<a class="headerlink" href="#can-we-convert-private-openssh-keys-into-pem-2c12" title="Permanent link">¶</a></h2><p>As OpenSSL doesn't understand the OpenSSH private key format, a common question among programmers and DevOps engineers is whether it can be converted into PEM format. As you might have guessed from the previous sections, the answer is no. The PEM format for private keys uses PKCS #5, so it supports only the standard implementation of PBKDF2.</p><p>It's interesting to note that the OpenSSL team also specifically decided not to support this new format, as it is not standard (see <a href="https://github.com/openssl/openssl/issues/5323">https://github.com/openssl/openssl/issues/5323</a>).</p><h2 id="a-poorly-documented-format-2ea8">A poorly documented format<a class="headerlink" href="#a-poorly-documented-format-2ea8" title="Permanent link">¶</a></h2><p>PEM, PKCS #8, ASN.1, and all the other formats that we use every day, including the OpenSSH public key format, are well documented and standardised in RFCs or similar documents. The OpenSSH private key format is documented in a tiny file that you can find in the source code, but that file offers no more than a quick overview. To understand what is going on I had to read the source code, not only of OpenSSH but also of OpenBSD.</p><p>I think poor documentation like this might be acceptable in personal projects or in new tools, but SSH is used by the whole world, and when the team decides to come up with a completely new format for one of its most important elements I would expect them to document every single bit of it, or at least to be more open about the reasons and the implementation. 
I also personally believe that standards can only benefit intercommunication between systems and, in cryptography, improve security, since they are reviewed and discussed by a wider audience.</p><p>The claim is that the new SSH private key format offers better protection of keys at rest. I'd be very interested to see a cryptanalysis by an expert (which I'm not). Cryptography is a tricky field, and ideas that seem smart often turn out to be tragically wrong.</p><h2 id="resources-edc5">Resources<a class="headerlink" href="#resources-edc5" title="Permanent link">¶</a></h2><ul><li>OpenSSL documentation: <a href="https://www.openssl.org/docs/man1.1.0/apps/asn1parse.html">asn1parse</a>, <a href="https://www.openssl.org/docs/man1.1.0/apps/genpkey.html">genpkey</a></li><li>The <a href="https://en.wikipedia.org/wiki/Base64">Base64</a> encoding</li><li>The Abstract Syntax Notation One <a href="https://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One">ASN.1</a> interface description language</li><li><a href="https://tools.ietf.org/html/rfc4251">RFC 4251 - The Secure Shell (SSH) Protocol Architecture</a></li><li><a href="https://tools.ietf.org/html/rfc4253">RFC 4253 - The Secure Shell (SSH) Transport Layer Protocol</a></li><li><a href="https://tools.ietf.org/html/rfc4716">RFC 4716 - The Secure Shell (SSH) Public Key File Format</a></li><li><a href="https://datatracker.ietf.org/doc/html/rfc2898">RFC 2898 - PKCS #5: Password-Based Cryptography Specification Version 2.0</a></li><li><a href="https://tools.ietf.org/html/rfc5208">RFC 5208 - Public-Key Cryptography Standards (PKCS) #8: Private-Key Information Syntax Specification Version 1.2</a></li><li><a href="https://tools.ietf.org/html/rfc5958">RFC 5958 - Asymmetric Key Packages</a></li><li><a href="https://tools.ietf.org/html/rfc7468">RFC 7468 - Textual Encodings of PKIX, PKCS, and CMS Structures</a></li></ul><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>Stop using tools as if they were solutions2021-05-25T09:00:00+01:002021-08-22T09:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2021-05-25:/blog/2021/05/25/stop-using-tools-as-if-they-were-solutions/<p><em>"I will write a class"</em>.</p><p>I can't tell you how many times I have heard this sentence from candidates during coding interviews.</p><p>What's wrong with this sentence? Nothing, out of context, but let me add this little detail: this is usually the first sentence I hear when the candidate tries to tackle a problem.</p><p>I know that coding interviews can be very stressful, and I also think that conducting such interviews requires a lot of effort to avoid turning them into nitpicking sessions in which the candidate feels that every single keystroke is scrutinised and analysed. As if the destiny of the whole company depended on how fast you can code a function that reverses a string!</p><p>But even taking interview anxiety into account, I think such an approach reveals something deeper that is wrong with the way we approach problems as programmers. This is the result of a culture that mistakes tools for solutions, and if I can detect it in senior programmers it means it has already propagated into our teams and our companies.</p><h2 id="the-problem-solving-challenge-f589">The problem-solving challenge<a class="headerlink" href="#the-problem-solving-challenge-f589" title="Permanent link">¶</a></h2><p>When you face a problem (<em>any</em> problem) you need to devise a strategy to solve it. 
You need to have an idea of what to do before doing it; otherwise you are reacting, not acting.</p><p>When you practice any combat sport you train your body to react to specific inputs (attacks) with automatic reactions (defences, counterattacks), and you do it because in a real fight you don't have the time to make a conscious decision. Such "perfect" reactions, though, are the result of a constant and very focused effort to transform consciously selected actions into involuntary ones. Without training, a pure reaction is usually an average response at best.</p><p>In problem-solving we face the same challenge. Either we devise a strategy, or our approach will be clumsy and ultimately inefficient.</p><p>Imagine you were tasked with building a bridge between the two banks of a river. Would your first concern be the specific type of hammer that the workers should use? After all, there are ball-peen hammers, sledgehammers, brick hammers, and many other types. Choosing the wrong one might severely affect the performance of your workers.</p><p>That's hardly the first thing you should ask yourself. I'm pretty sure you agree that knowing the distance that the bridge should cover is much more urgent. Also, the type and amount of traffic that it has to carry (pedestrians, cars, trucks, trains) are important factors, and you should be concerned about the budget you have been allocated.</p><p>Why are these questions more important than the one about hammers? Because the answers to these questions can heavily influence the whole project. They are pillars of your architecture and not details<sup>[<a id="fr-d8d83501-1" href="#">1</a>]</sup>. My colleague Ken Pemberton always reminds me that most of the time we don't ask ourselves an even more important question: "What problem are you trying to solve?" 
In the example above, a bridge might not be the best solution in the first place.</p><p>I think the process, at least when it comes to software projects, can be divided into three connected phases: decomposition, communication, implementation.</p><h3 id="decomposition-a9dc">Decomposition</h3><p>Macroscopically, a <em>processing system</em> is made of an initial state, some transformations or intermediate states, and a final state.</p><p>Usually, it's simple to identify the initial and final states, while it's harder to describe what happens between the two. So, we need to proceed iteratively, describing the system using black boxes, and then opening each one of them, zooming in to describe what happens inside.</p><p>At any zoom level, from the 10,000-foot overview down to the description of a single function, you need to identify 4 things: the <strong>input</strong>, the <strong>output</strong>, the <strong>actors</strong> and the <strong>data flow</strong>.</p><p>The input is what enters the black box. It has usually been decided at a higher level of zoom or while discussing a component that provides it as output. So, it is given, and if it turns out to be inadequate we should take a step back in the design and question how we can provide proper input. The same applies to the output.</p><p>The actors must be black boxes that accept data and transform it, and the data flow is how information is exchanged between the actors. This is clearly where it can take a long time to find a good solution, and we might need to go back and forth several times.</p><p>Let's look at an example. A search engine is a complicated piece of software, and implementing it is not a matter of one hour of work. But we can decompose it pretty easily, starting from the fact that the input of the system is a query, and that the output is an ordered set of results. 
So, my overview of this component is the following: the user inputs a query, the query is processed and the system returns a list of results, ordered by quality.</p><p>I didn't describe what "quality" is, nor did I discuss the specific implementation of the system that stores all possible results. Those details are buried somewhere at a deeper level of zoom and are utterly useless at this level.</p><h3 id="communication-1fc1">Communication</h3><p>Any level of zoom in the decomposition can be described, and the amount of specific technical knowledge needed to understand the explanation should be directly proportional to the zoom level. You might have heard the quote "You do not really understand something unless you can explain it to your grandmother." I believe this might be very offensive to grandmothers, but, paraphrasing it, I would say that "There should be a zoom level at which the project is understandable by anyone who doesn't have specific knowledge of the field".</p><p>Indeed, the problem of technical communication is that tech-savvy gurus are usually not able to decompose what they are working on into black boxes that are abstract enough to be understandable by any human being. Please note this can happen to anyone, not only to programmers. I have listened often enough to people working in banking, insurance, or project management (just to name a few different fields) to know that they can be unable to describe their job, or specific aspects of it, without using 4 obscure words out of every 5, the fifth one probably being a conjunction.</p><p>As a blogger and an author, I want to add a consideration about communication. Explaining things is the best way to see if everything is clear in your mind, which is another way to read the previous quote (without involving grandmothers). The very post that you are reading started as an intuition, a small list of ideas, and so far has been rewritten 6 times. 
In the process, I understood the topics I am discussing much better than I did when I first felt the need to write them down.</p><h3 id="implementation-ef77">Implementation</h3><p>Professor Sidney Morris, in a <a href="https://www.youtube.com/watch?v=T1snRQEQuEk">very interesting video</a> about how to write proofs in mathematics, describes the process with these words:</p><div class="excerpt"><div class="content"><ul><li>Step 1: write down what we are given</li><li>Step 2: write down the definition of each technical term in what we are given</li><li>Step 3: write down what we are required to prove</li><li>Step 4: write down the definition of each technical term in what we are required to prove</li></ul>
<p>So these 4 steps are quite easy, quite straightforward.</p>
<p>The next step is not as easy</p>
<ul><li>Step 5: THINK!</li></ul></div></div><p>While we don't need to aim for the same level of formality required of mathematicians who prove theorems, we can surely keep the spirit of the process: write down and define what you have, write down and define what you want to achieve. Then, think.</p><p>We tend to take for granted that we can think; after all, we do it all day long. But focusing our attention on a specific topic, giving it time, exploring it, considering questions about it, evaluating possible answers: all these things are increasingly unpopular. This is not the place for a critique of our noisy society, where ideas, products, and works of art are watched for mere seconds before getting a like and passing into oblivion. But it is worth noting that thinking is <em>not</em> easy.</p><p>The implementation of a black box might require a lot of thinking, and we have to accept this. It might require many rewrites, prove unsuccessful only after a certain amount of time, or even require a separate project to be properly managed. There are no shortcuts here.</p><h2 id="the-coding-interview-problem-9b56">The coding interview problem<a class="headerlink" href="#the-coding-interview-problem-9b56" title="Permanent link">¶</a></h2><p>What do we do during a coding interview? What are we trying to understand with this excruciating exercise that puts people in the pillory for one hour?</p><p>What we should do, in my opinion, is to <em>help the candidate show how they solve problems</em>. We should facilitate a discussion along the lines of the three points that I mentioned: decomposition, communication, implementation. 
As you can see, implementation is not avoided: it's a coding interview, so part of it should be spent writing code, but this should happen only after we have established a decomposition of the system.</p><p>I also believe that the assignment should be purposefully too complex to implement in a single one-hour session, and this should be explicitly communicated. This forces the candidate to design instead of rushing headlong into implementing the first requirement of the exercise without reading the rest. At any point, if the candidate is unable to implement a specific step, we can also move on to other steps and fake the input. This way we get many benefits:</p><ol><li>The candidate won't feel stressed by the need to show how good they are at coding. The design part is a friendly chat, where suggestions can be made and specific technologies/solutions might be discarded if not known to the candidate.</li><li>They won't perceive the interview as a failure because they couldn't implement a single step or because they didn't complete the assignment in time.</li><li>We can explore the way the candidate communicates, the way they decompose complex processes, how well they understand problems and, eventually, how they write code.</li><li>We can adjust the level of difficulty of the interview or explore specific topics in detail simply by asking the candidate to focus on a specific detail.</li></ol><p>As an interviewer, I value the decomposition phase much more than the part in which you show me how well you remember all the functions of the Python standard library in a stressful situation. The truth is that I look them up very often and I don't look down on people because they don't remember the name of a method. I have one hour to decide if you are a good addition to the company, if you can be a good teammate for my next project, and if (possibly with some training) you can be given the responsibility for part of the system. 
In that hour I need to capture the main traits of your approach.</p><p>Don't get me wrong: I am a terrible nitpicker and probably on the brink of being OCD about some things, such as naming or code tidiness. But I try to take my own advice. What is the most important thing about you that I can understand in that hour? I think it would be extremely disappointing to discover that I hired someone who knows the standard library by heart but can't pick the right technology to complete a project before the deadline.</p><p>I understand that when you are interviewed you feel like you are in a position of weakness, sitting there at the mercy of an evil interviewer whose only purpose is to uncover what you don't know. I'm sorry if you had to face such interviewers. I had to, and I understand the frustration. My advice is: always remember that working for a company is a matter of giving your time and your energy in exchange for personal growth. You might be interviewing for your dream job, but if the interviewer is not interested in you and your growth it's probably not worth working with them.</p><p>So, as a candidate, you have a responsibility to show the interviewer how you solve problems. If you just show how good you are at coding, you will impress only interviewers who are interested in your coding skills, and this is, in my opinion, a very limited part of what you can do as a programmer. 
You need to show that you can design, and this is independent of the level you are at.</p><p>You need to show that you understand problems, that you can compare solutions, that you can take the risk of picking one specific strategy, and that, if needed, you can stop at a certain point and say "This is the wrong approach".</p><h2 id="patterns-e366">Patterns<a class="headerlink" href="#patterns-e366" title="Permanent link">¶</a></h2><p>Design patterns are defined by Erich Gamma and his co-authors in their seminal book<sup>[<a id="fr-f70c86c9-2" href="#">2</a>]</sup> with these words: "[...] patterns solve specific design problems and make object-oriented designs more flexible, elegant, and ultimately reusable. [...] A designer who is familiar with such patterns can apply them immediately to design problems without having to rediscover them."</p><p>I want to focus on the words "solve specific design problems" because I notice that many people apply patterns without having understood the problem they are trying to solve. Even worse, they look at the world through the lens of the patterns they know, twisting the nature of problems to fit the solutions they already have.</p><p>Back to the original sentence. "I will write a class" is considered the go-to solution in OOP languages. The common belief is that, in an OOP language, whatever the problem, the solution is to write a class. So, our first move on the chessboard of the interview is to write a class. This is a dangerous misuse of a pattern such as data encapsulation, and an expert interviewer will checkmate us in one move. I have seen candidates facing problems that could be solved in 10 minutes with two functions and a dictionary spend more than 50 minutes swamped in a multitude of classes, trying to figure out which object contained the data they needed at a certain point of the process.</p><p>Clearly, classes might be the best solution for some problems, but this should come at the end of your analysis. 
You write a class because you have data and functions that belong together, and the same reasoning applies to any other technology. Always ask yourself: why am I using this? What is the problem that I'm trying to solve?</p><h2 id="a-dangerous-culture-3fdd">A dangerous culture<a class="headerlink" href="#a-dangerous-culture-3fdd" title="Permanent link">¶</a></h2><p>We all make the same mistake here: we push (or at least accept) a culture in which we teach and learn tools as go-to solutions without teaching how to identify and face problems.</p><p>Programming languages, architectural patterns, algorithms: these are all tools to implement solutions, not the solutions themselves. You should learn them, down to the most minute details if you can, but never put them on the table before you have understood the problem.</p><p>Alexis Carrel said, "A few observations and much reasoning lead to error; many observations and a little reasoning to truth."<sup>[<a id="fr-241c28c1-3" href="#">3</a>]</sup> The advice that I take from the French Nobel Prize winner is: what is in front of you has to be observed deeply to find out its real nature. What things are is much more important than what we think they are and how we think we should treat them ("reasoning"). And what things are, if observed properly, will also reveal ways to interact with them, to manipulate them, to solve them.</p><p>If you want a clear example of the opposite, observe a programmer (maybe yourself) looking for help with an error that the web framework or the compiler threw at them. Copy and paste the error message into Google, pick the first result (Stack Overflow), scroll down until you find some code, apply. I dare you to call this "engineering". 
Often we don't even read the Stack Overflow question and jump straight to the answer, not to mention that many times we don't even read the error message!</p><p>I recommend reading a very interesting article by Joseph Gefroh, <a href="https://medium.com/swlh/why-your-technical-interview-is-broken-and-how-to-fix-it-7004da002aa8">Why Your Technical Interview Is Broken, and How to Fix It</a>, where he discusses the various types of skills that you can explore during an interview, and which ones you should be interested in. In particular, I couldn't agree more with his point about algorithmic interviews, as I believe they are deeply flawed.</p><p>I also recommend having a look at the <a href="https://github.com/guardian/coding-exercises">Guardian Coding Exercises</a> and reading the description of the repository. I think they are a good example of tests that allow the candidate and the interviewer to work together, to actually meet and discuss a solution. There is no "right" way to solve them, and many of them cannot be solved in 45 minutes, which is usually the time given to a candidate after an initial introductory chat.</p><h2 id="conclusion-506c">Conclusion<a class="headerlink" href="#conclusion-506c" title="Permanent link">¶</a></h2><p>I hope these short considerations helped you see my point. We should all shift our gaze from the tools we have to the nature of problems and to their solutions. We are missing an important step here, which is ultimately what defines a good engineer and which is the most important thing that you can learn in your career. Observe problems, stop and think, devise a strategy, zoom out and zoom in. Learn to use tools; don't be used by them.</p><p>We need to push for this approach in our interviews, but also try to promote this culture in our teams and companies.</p><hr><div id="_footnotes"><div id=""><a href="#fr-d8d83501-1">1</a> <p>See "What is a software architecture?" 
in <a href="https://www.thedigitalcatbooks.com">Clean Architectures in Python</a>.</p></div><div id=""><a href="#fr-f70c86c9-2">2</a> <p><em>Design Patterns: Elements of Reusable Object-Oriented Software</em> by Gamma, Vlissides, Johnson, and Helm</p></div><div id=""><a href="#fr-241c28c1-3">3</a> <p><em>Réflexions sur la vie</em>, Paris, 1952</p></div></div><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>How to write a Pelican theme for your static website2021-03-25T14:00:00+01:002021-03-25T14:00:00+01:00Leonardo Giordanitag:www.thedigitalcatonline.com,2021-03-25:/blog/2021/03/25/how-to-write-a-pelican-theme-for-your-static-website/<p>A full-fledged tutorial that shows how to convert an HTML template into a Pelican theme</p><p>I run The Digital Cat using a static site generator called <a href="https://blog.getpelican.com/">Pelican</a>, created by my friend <a href="https://justinmayer.com/">Justin Mayer</a> and actively maintained by him and other developers. I also gave some minor contributions to the project.</p><p>Since I started working on the blog in 2013 I run a great length to customise the theme that I use. I initially went for a pre-made Pelican theme, but I soon started to change small things, and eventually ended up creating a whole new theme that suits my needs.</p><p>Front-end development if not my forte, though, so I didn't want to start from scratch with HTML, CSS and JS. My knowledge of those tools is limited, and I have other interests, so I started from a free (CC BY 3.0) pre-made HTML5 template created by <a href="https://html5up.net/">HTML5 UP</a>. 
You can see <a href="https://html5up.net/editorial">a demo of the original template</a> and compare it with what you see on this very page.</p><p>Encouraged by Justin, I decided to write down this initial guide on how to port a Pelican theme from a static template. I will show you how to start a blog from scratch, how to get a static template and how to make it usable by Pelican. Everything done step by step without skipping any passage. At the end of the post you will <strong>have a running blog with some demo articles</strong>, you will <strong>have learned how to use the Jinja language and the Pelican variables</strong>, and you will <strong>have an idea of what to do next to further customise your static website</strong>.</p><p>Let's start!</p><h2 id="initial-setup-7d57">Initial setup<a class="headerlink" href="#initial-setup-7d57" title="Permanent link">¶</a></h2><p>Let's create a blog called The Analog Fox, following <a href="https://docs.getpelican.com/en/latest/quickstart.html">Pelican's quickstart guide</a>.</p><p>I created a virtual environment and installed Pelican as suggested, then run</p><div class="code"><div class="content"><div class="highlight"><pre>mkdir theanalogfox
cd theanalogfox
pelican-quickstart
</pre></div> </div> </div><p>For this project I will only run the blog locally, so I didn't configure any publishing method, nor did I set up a URL prefix. If you are about to create a real website, please read Pelican's documentation about those settings.</p><div class="code"><div class="content"><div class="highlight"><pre>> Where do you want to create your new web site? [.]
<span class="hll">> What will be the title of this web site? The Analog Fox
</span><span class="hll">> Who will be the author of this web site? Leonardo Giordani
</span>> What will be the default language of this web site? [en]
<span class="hll">> Do you want to specify a URL prefix? e.g., https://example.com (Y/n) n
</span>> Do you want to enable article pagination? (Y/n)
<span class="hll">> How many articles per page do you want? [10] 3
</span>> What is your time zone? [Europe/Paris]
> Do you want to generate a tasks.py/Makefile to automate generation and publishing? (Y/n)
> Do you want to upload your website using FTP? (y/N)
> Do you want to upload your website using SSH? (y/N)
> Do you want to upload your website using Dropbox? (y/N)
> Do you want to upload your website using S3? (y/N)
> Do you want to upload your website using Rackspace Cloud Files? (y/N)
> Do you want to upload your website using GitHub Pages? (y/N)
Done. Your new project is available at /home/leo/devel/theanalogfox
</pre></div> </div> </div><p>If you run <code>pelican -lr</code> now (<code>-l</code> serves the generated site over HTTP and <code>-r</code> regenerates it whenever the content changes) and visit <a href="http://localhost:8000">http://localhost:8000</a> with your browser, you will see the first page of the blog rendered with the default theme.</p><h2 id="demo-content-a90a">Demo content<a class="headerlink" href="#demo-content-a90a" title="Permanent link">¶</a></h2><p>Before we venture into the jungle of Jinja templates, it's worth creating some content. As this is a very boring activity, I prepared a little script that you can run in the terminal.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="ch">#!/bin/bash</span>
<span class="nv">NUM_POSTS</span><span class="o">=</span><span class="m">20</span>
<span class="nv">CONTENT_DIR</span><span class="o">=</span>content
<span class="nv">LOREM_API</span><span class="o">=</span>https://jaspervdj.be/lorem-markdownum/markdown.txt
<span class="nv">IMAGES_API</span><span class="o">=</span>https://placeimg.com/1000/341/animals
rm<span class="w"> </span>-fR<span class="w"> </span>content
mkdir<span class="w"> </span>-p<span class="w"> </span>content/images
<span class="k">for</span><span class="w"> </span>i<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="k">$(</span>seq<span class="w"> </span>-w<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="si">${</span><span class="nv">NUM_POSTS</span><span class="si">}</span><span class="k">)</span>
<span class="k">do</span>
<span class="w"> </span><span class="nv">post_file</span><span class="o">=</span><span class="si">${</span><span class="nv">CONTENT_DIR</span><span class="si">}</span>/post<span class="si">${</span><span class="nv">i</span><span class="si">}</span>.markdown
<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">"Creating post </span><span class="si">${</span><span class="nv">i</span><span class="si">}</span><span class="s2">"</span>
<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">"Title: A sample article </span><span class="si">${</span><span class="nv">i</span><span class="si">}</span><span class="s2">"</span><span class="w"> </span>>><span class="w"> </span><span class="si">${</span><span class="nv">post_file</span><span class="si">}</span>
<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">"Date: 2021-03-</span><span class="si">${</span><span class="nv">i</span><span class="si">}</span><span class="s2">"</span><span class="w"> </span>>><span class="w"> </span><span class="si">${</span><span class="nv">post_file</span><span class="si">}</span>
<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">"Category: News"</span><span class="w"> </span>>><span class="w"> </span><span class="si">${</span><span class="nv">post_file</span><span class="si">}</span>
<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">"Tags: </span><span class="k">$(</span>seq<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">20</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>shuf<span class="w"> </span><span class="p">|</span><span class="w"> </span>head<span class="w"> </span>-n3<span class="w"> </span><span class="p">|</span><span class="w"> </span>sed<span class="w"> </span>-r<span class="w"> </span>s,<span class="s2">"^"</span>,<span class="s2">"tag"</span>,<span class="w"> </span><span class="p">|</span><span class="w"> </span>paste<span class="w"> </span>-sd<span class="w"> </span><span class="s2">","</span><span class="w"> </span>-<span class="k">)</span><span class="s2">"</span><span class="w"> </span>>><span class="w"> </span><span class="si">${</span><span class="nv">post_file</span><span class="si">}</span>
<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">"Image: post</span><span class="si">${</span><span class="nv">i</span><span class="si">}</span><span class="s2">.jpg"</span><span class="w"> </span>>><span class="w"> </span><span class="si">${</span><span class="nv">post_file</span><span class="si">}</span>
<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">"Summary: Summary of post </span><span class="si">${</span><span class="nv">i</span><span class="si">}</span><span class="s2">"</span><span class="w"> </span>>><span class="w"> </span><span class="si">${</span><span class="nv">post_file</span><span class="si">}</span>
<span class="w"> </span><span class="nb">echo</span><span class="w"> </span>>><span class="w"> </span><span class="si">${</span><span class="nv">post_file</span><span class="si">}</span>
<span class="w"> </span>curl<span class="w"> </span>-s<span class="w"> </span><span class="si">${</span><span class="nv">LOREM_API</span><span class="si">}</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>sed<span class="w"> </span>-r<span class="w"> </span>s,<span class="s2">"^#"</span>,<span class="s2">"##"</span>,<span class="w"> </span>>><span class="w"> </span><span class="si">${</span><span class="nv">post_file</span><span class="si">}</span>
<span class="w"> </span>curl<span class="w"> </span>-s<span class="w"> </span><span class="si">${</span><span class="nv">IMAGES_API</span><span class="si">}</span><span class="w"> </span>><span class="w"> </span><span class="si">${</span><span class="nv">CONTENT_DIR</span><span class="si">}</span>/images/post<span class="si">${</span><span class="nv">i</span><span class="si">}</span>.jpg
<span class="k">done</span>
</pre></div> </div> </div><p>Save it as <code>create_content.sh</code> and give it execution permissions with <code>chmod 775 create_content.sh</code>. At this point you can run it with <code>./create_content.sh</code>, and it will create the directory <code>content</code> with 20 posts and an image for each of them. You can safely run it multiple times, as it deletes the previous output first.</p><p>If you know Bash, feel free to hack the script to do something more complicated, but this very simple program does everything we need to work on Pelican themes.</p><p>Running <code>pelican -lr</code> and visiting <a href="http://localhost:8000">http://localhost:8000</a> will now show a richer website.</p>
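<p>If you would rather stay in Python, or you need sample content on a machine without network access, the same idea can be sketched with the standard library alone. This is not the script above translated line by line. It assumes that <code>NUM_POSTS</code> is 20 and <code>CONTENT_DIR</code> is <code>content</code>, and it writes the same metadata lines (<code>Title</code>, <code>Date</code>, <code>Category</code>, <code>Tags</code>, <code>Image</code>, <code>Summary</code>). However, it uses a hypothetical fixed placeholder body instead of calling <code>LOREM_API</code>, and it doesn't download any images, so <code>content/images</code> stays empty. The names <code>post_metadata</code> and <code>create_content</code> are made up for this example.</p>

```python
"""Generate sample Pelican posts without network access (a sketch)."""
import random
import shutil
from pathlib import Path

NUM_POSTS = 20                 # assumption: same value as NUM_POSTS in the Bash script
CONTENT_DIR = Path("content")  # assumption: same value as CONTENT_DIR in the Bash script

# Hypothetical placeholder body, used instead of fetching text from LOREM_API
PLACEHOLDER_BODY = "## Placeholder heading\n\nLorem ipsum dolor sit amet.\n"


def post_metadata(num, rng):
    """Return the Pelican metadata block for post `num` (a zero-padded string)."""
    # Pick 3 distinct tags among tag1..tag20, like `seq 1 20 | shuf | head -n3`
    tags = ",".join(f"tag{n}" for n in rng.sample(range(1, 21), 3))
    return (
        f"Title: A sample article {num}\n"
        f"Date: 2021-03-{num}\n"
        "Category: News\n"
        f"Tags: {tags}\n"
        f"Image: post{num}.jpg\n"
        f"Summary: Summary of post {num}\n"
        "\n"  # an empty line separates the metadata from the body
    )


def create_content(content_dir=CONTENT_DIR, num_posts=NUM_POSTS, seed=None):
    """Recreate `content_dir` with `num_posts` Markdown posts and return their paths."""
    content_dir = Path(content_dir)
    rng = random.Random(seed)
    shutil.rmtree(content_dir, ignore_errors=True)               # like `rm -fR`
    (content_dir / "images").mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    width = len(str(num_posts))  # zero-padding width, like `seq -w`
    paths = []
    for i in range(1, num_posts + 1):
        num = str(i).zfill(width)
        post_file = content_dir / f"post{num}.markdown"
        post_file.write_text(post_metadata(num, rng) + PLACEHOLDER_BODY)
        paths.append(post_file)
    return paths


if __name__ == "__main__":
    for path in create_content():
        print(f"Created {path}")
```

<p>As in the Bash version, dates are built as <code>2021-03-NN</code>, so the sketch only produces valid dates for up to 31 posts. The optional <code>seed</code> argument makes the random tags reproducible, which can be handy when comparing the output of two runs.</p>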
<h2 id="the-template-f1ec">The template<a class="headerlink" href="#the-template-f1ec" title="Permanent link">¶</a></h2><p>From now on I will make extensive use of the documentation at <a href="https://docs.getpelican.com/en/latest/themes.html#creating-themes">https://docs.getpelican.com/en/latest/themes.html#creating-themes</a>, so please keep that page open in your browser.</p><p>For this tutorial I will use the template "Future Imperfect" by <a href="https://html5up.net/">HTML5 UP</a>. The template can be seen in action <a href="https://html5up.net/future-imperfect">at this page</a>, and you can download it using the button in the top right corner of the page itself.</p><p>Please consider supporting HTML5 UP, even if only with a tweet. Being a content creator myself, I know how important it is to receive any kind of feedback from readers and users.</p><p>Let's have a quick look at the template before we dive into the core of the post. We have a navbar at the top of the screen, with a link to the homepage, several links to specific pages, a search button, and a menu. In the body of the page there is a sidebar on the left and a preview of the articles on the right.</p><p>The sidebar contains the title and the subtitle of the blog, two lists of posts, the about section, and some social buttons. The first list of posts features image, title, date, and the avatar of the author, while the second list has just a small thumbnail, title, and date. Each post in the main list shows the full image, title, subtitle, name and avatar of the author, publication date, a preview of the content of the article, and a button that links to the full version of the article. Finally, tags are listed at the bottom right, just next to the number of likes and comments.</p><p>Just to be clear from the start, I won't implement everything we see here in my Pelican theme. 
I won't touch the navbar, and I won't discuss likes and comments, which require external systems when it comes to static sites. I will also simplify the sidebar, using only one list of posts. Moreover, the main page will show the full content of the articles rather than a preview.</p><p>Unzip the template archive into a subdirectory of the blog directory called <code>future-imperfect</code>. The archive doesn't contain a root folder, so you need to create it explicitly.</p><p>Enter the theme directory and change the layout of the files to follow <a href="https://docs.getpelican.com/en/latest/themes.html#structure">Pelican's requirements</a>:</p><div class="code"><div class="content"><div class="highlight"><pre>mv assets/ static
mv images/ static/
mkdir templates
mv *.html templates/
</pre></div> </div> </div><p>At this point, edit the file <code>pelicanconf.py</code> in the main directory of the blog and add the variable <code>THEME</code>:</p><div class="code"><div class="title"><code>pelicanconf.py</code></div><div class="content"><div class="highlight"><pre><span class="n">PATH</span> <span class="o">=</span> <span class="s1">'content'</span>
<span class="hll"><span class="n">THEME</span> <span class="o">=</span> <span class="s2">"future-imperfect"</span>
</span>
<span class="n">TIMEZONE</span> <span class="o">=</span> <span class="s1">'Europe/Paris'</span>
<span class="n">DEFAULT_LANG</span> <span class="o">=</span> <span class="s1">'en'</span>
</pre></div> </div> </div><p>If you refresh the blog page now, you will see that the output doesn't even have a working style sheet, but don't worry: Pelican is still working correctly. We are overriding Pelican's output with the file <code>future-imperfect/templates/index.html</code>, which is supposed to be a Jinja template but, coming straight from the HTML5 UP archive, contains only static HTML. In particular, the CSS/JS assets are not loaded, because the page still points to <code>assets/</code>, a directory that we renamed and that Pelican publishes under <code>theme/</code>.</p><p>Let's learn the first piece of syntax, then, by adjusting the CSS and JS links, so that we at least have a decent output to look at. We need to replace the path <code>assets/</code> with <code>{{ SITEURL }}/theme/</code>:</p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre><span class="cp"><!DOCTYPE HTML></span>
<span class="cm"><!--</span>
<span class="cm"> Future Imperfect by HTML5 UP</span>
<span class="cm"> html5up.net | @ajlkn</span>
<span class="cm"> Free for personal and commercial use under the CCA 3.0 license (html5up.net/license)</span>
<span class="cm"> --></span>
<span class="p"><</span><span class="nt">html</span><span class="p">></span>
<span class="p"><</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">title</span><span class="p">></span>Future Imperfect by HTML5 UP<span class="p"></</span><span class="nt">title</span><span class="p">></span>
<span class="p"><</span><span class="nt">meta</span> <span class="na">charset</span><span class="o">=</span><span class="s">"utf-8"</span> <span class="p">/></span>
<span class="p"><</span><span class="nt">meta</span> <span class="na">name</span><span class="o">=</span><span class="s">"viewport"</span> <span class="na">content</span><span class="o">=</span><span class="s">"width=device-width, initial-scale=1, user-scalable=no"</span> <span class="p">/></span>
<span class="hll"> <span class="p"><</span><span class="nt">link</span> <span class="na">rel</span><span class="o">=</span><span class="s">"stylesheet"</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/css/main.css"</span> <span class="p">/></span>
</span> <span class="p"></</span><span class="nt">head</span><span class="p">></span>
[...]
<span class="hll"> <span class="p"><</span><span class="nt">script</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/js/jquery.min.js"</span><span class="p">></</span><span class="nt">script</span><span class="p">></span>
</span><span class="hll"> <span class="p"><</span><span class="nt">script</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/js/browser.min.js"</span><span class="p">></</span><span class="nt">script</span><span class="p">></span>
</span><span class="hll"> <span class="p"><</span><span class="nt">script</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/js/breakpoints.min.js"</span><span class="p">></</span><span class="nt">script</span><span class="p">></span>
</span><span class="hll"> <span class="p"><</span><span class="nt">script</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/js/util.js"</span><span class="p">></</span><span class="nt">script</span><span class="p">></span>
</span><span class="hll"> <span class="p"><</span><span class="nt">script</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/js/main.js"</span><span class="p">></</span><span class="nt">script</span><span class="p">></span>
</span>
<span class="p"></</span><span class="nt">body</span><span class="p">></span>
<span class="p"></</span><span class="nt">html</span><span class="p">></span>
</pre></div> </div> </div><p>We also need to link images correctly. Change every occurrence of <code>images/</code> to <code>{{ SITEURL }}/theme/images/</code>, e.g.</p><div class="code"><div class="content"><div class="highlight"><pre> <span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"meta"</span><span class="p">></span>
<span class="p"><</span><span class="nt">time</span> <span class="na">class</span><span class="o">=</span><span class="s">"published"</span> <span class="na">datetime</span><span class="o">=</span><span class="s">"2015-11-01"</span><span class="p">></span>November 1, 2015<span class="p"></</span><span class="nt">time</span><span class="p">></span>
<span class="hll"> <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"author"</span><span class="p">><</span><span class="nt">span</span> <span class="na">class</span><span class="o">=</span><span class="s">"name"</span><span class="p">></span>Jane Doe<span class="p"></</span><span class="nt">span</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/avatar.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
</span> <span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="hll"> <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span> <span class="na">class</span><span class="o">=</span><span class="s">"image featured"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/pic01.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
</pre></div> </div> </div><p>If you refresh the page after these changes, you will see the template fully rendered (minus the images you saw in the demo, which are replaced by placeholders in the downloaded version).</p><p>A little trick: if you remove the comments at lines 177 and 487, you will get a nice recap of the graphical components of the template. I will not use them, so I removed lines 176-487, but remember that those cheat sheets can be very useful when trying to understand how a template works.</p><p>As I mentioned earlier, I also removed the third list of posts, as it doesn't add anything to what we will learn. You are of course free to keep it and experiment with it.</p><h3 id="deep-dive-e364">Deep dive</h3><p>How does <code>{{ SITEURL }}/theme</code> work?</p><p>Pelican's <a href="https://docs.getpelican.com/en/latest/settings.html#themes">documentation on themes</a> says:</p><div class="callout"><div class="content"><p><code>THEME_STATIC_DIR = 'theme'</code></p>
<p>Destination directory in the output path where Pelican will place the files collected from <code>THEME_STATIC_PATHS</code>. Default is <code>theme</code>.</p></div></div><p>The variable <code>THEME_STATIC_PATHS</code> defaults to <code>['static']</code>, which is why we created that directory inside the theme.</p><p>As you can see, all these paths are configurable, should you prefer different names.</p><h2 id="pelican-variables-db40">Pelican variables<a class="headerlink" href="#pelican-variables-db40" title="Permanent link">¶</a></h2><p>As I mentioned, we are currently overriding Pelican's output with a static template. What we want to do is to inject values known to Pelican into the template itself, be those static variables or more dynamic items like articles, tags, and images.</p><p>To do this, Pelican uses <a href="https://jinja.palletsprojects.com/en/2.11.x/">Jinja</a>, a widely adopted template engine written in Python. If you want to fully understand how to create Pelican themes, then, you need to learn Jinja. Don't worry, it's not complicated, and since its expressions look a lot like Python you will catch up very quickly. I won't go into the details of the Jinja syntax that I use; please check out the <a href="https://jinja.palletsprojects.com/en/2.11.x/">Jinja documentation</a> if you have any doubts.</p><p>We have actually already used Pelican's variables and Jinja templates when we prefixed links with <code>{{ SITEURL }}</code>. Beyond that, the first and simplest values to inject into our template are the title and the subtitle.</p><h3 id="title-dfde">Title</h3><p>The Pelican variable we are interested in is <code>SITENAME</code>, which was initialised by the quickstart script, as you can see in the configuration file:</p><div class="code"><div class="title"><code>pelicanconf.py</code></div><div class="content"><div class="highlight"><pre><span class="n">SITENAME</span> <span class="o">=</span> <span class="s2">"The Analog Fox"</span>
</pre></div> </div> </div><p>We need to replace the static text with this variable three times: in the <code>&lt;title&gt;</code> tag, in the navigation bar, and in the header at the top of the sidebar.</p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre><span class="p"><</span><span class="nt">html</span><span class="p">></span>
<span class="p"><</span><span class="nt">head</span><span class="p">></span>
<span class="hll"> <span class="p"><</span><span class="nt">title</span><span class="p">></span><span class="cp">{{</span> <span class="nv">SITENAME</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">title</span><span class="p">></span>
</span> <span class="p"><</span><span class="nt">meta</span> <span class="na">charset</span><span class="o">=</span><span class="s">"utf-8"</span> <span class="p">/></span>
<span class="p"><</span><span class="nt">meta</span> <span class="na">name</span><span class="o">=</span><span class="s">"viewport"</span> <span class="na">content</span><span class="o">=</span><span class="s">"width=device-width, initial-scale=1, user-scalable=no"</span> <span class="p">/></span>
<span class="p"><</span><span class="nt">link</span> <span class="na">rel</span><span class="o">=</span><span class="s">"stylesheet"</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/css/main.css"</span> <span class="p">/></span>
<span class="p"></</span><span class="nt">head</span><span class="p">></span>
<span class="p"><</span><span class="nt">body</span> <span class="na">class</span><span class="o">=</span><span class="s">"is-preload"</span><span class="p">></span>
<span class="cm"><!-- Wrapper --></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">id</span><span class="o">=</span><span class="s">"wrapper"</span><span class="p">></span>
<span class="cm"><!-- Header --></span>
<span class="p"><</span><span class="nt">header</span> <span class="na">id</span><span class="o">=</span><span class="s">"header"</span><span class="p">></span>
<span class="hll"> <span class="p"><</span><span class="nt">h1</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"index.html"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">SITENAME</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">h1</span><span class="p">></span>
</span> <span class="p"><</span><span class="nt">nav</span> <span class="na">class</span><span class="o">=</span><span class="s">"links"</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span><span class="p">></span>Lorem<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
[...]
<span class="cm"><!-- Sidebar --></span>
<span class="p"><</span><span class="nt">section</span> <span class="na">id</span><span class="o">=</span><span class="s">"sidebar"</span><span class="p">></span>
<span class="cm"><!-- Intro --></span>
<span class="p"><</span><span class="nt">section</span> <span class="na">id</span><span class="o">=</span><span class="s">"intro"</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"logo"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/logo.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="hll"> <span class="p"><</span><span class="nt">h2</span><span class="p">></span><span class="cp">{{</span> <span class="nv">SITENAME</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">h2</span><span class="p">></span>
</span> <span class="p"><</span><span class="nt">p</span><span class="p">></span>Another fine responsive site template by <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"http://html5up.net"</span><span class="p">></span>HTML5 UP<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">p</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="p"></</span><span class="nt">section</span><span class="p">></span>
</pre></div> </div> </div><h3 id="subtitle-1f67">Subtitle</h3><p>Pelican also supports a subtitle, but the quickstart script didn't fill it in for us, so we need to add the variable <code>SITESUBTITLE</code> to the configuration file:</p><div class="code"><div class="title"><code>pelicanconf.py</code></div><div class="content"><div class="highlight"><pre><span class="n">SITENAME</span> <span class="o">=</span> <span class="s2">"The Analog Fox"</span>
<span class="n">SITESUBTITLE</span> <span class="o">=</span> <span class="s2">"A great blog about old stuff"</span>
</pre></div> </div> </div><p>Once this is done, we can insert the variable at the top of the sidebar, just under the title:</p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre> <span class="p"><</span><span class="nt">section</span> <span class="na">id</span><span class="o">=</span><span class="s">"intro"</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"logo"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/logo.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">h2</span><span class="p">></span><span class="cp">{{</span> <span class="nv">SITENAME</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">h2</span><span class="p">></span>
<span class="hll"> <span class="p"><</span><span class="nt">p</span><span class="p">></span><span class="cp">{{</span> <span class="nv">SITESUBTITLE</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">p</span><span class="p">></span>
</span> <span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="p"></</span><span class="nt">section</span><span class="p">></span>
</pre></div> </div> </div><p>Marvellous! Now the page should show the title of the blog in the window header, announcing to the world that The Analog Fox is ready to take over the world of vintage!</p><p>OK, I might be a bit overexcited, but I love when plans come together ;)</p><h3 id="deep-dive-e364">Deep dive</h3><p>Pelican passes all the settings from the configuration file to the template, together with the parsed content of the site itself, so you are free to use any of those variables or to introduce new ones (which we will do in the next section), as long as their names are uppercase, since Pelican only picks up uppercase names as settings.</p><p>For now, just to get familiar with the concept, you might try adding <code>TIMEZONE</code> under the subtitle:</p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre> <span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">h2</span><span class="p">></span><span class="cp">{{</span> <span class="nv">SITENAME</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">h2</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span><span class="cp">{{</span> <span class="nv">SITESUBTITLE</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span><span class="cp">{{</span> <span class="nv">TIMEZONE</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
</pre></div> </div> </div><p>I don't think this specific change is really useful, but it's good to remember that all those variables are available.</p><h2 id="social-buttons-d456">Social buttons<a class="headerlink" href="#social-buttons-d456" title="Permanent link">¶</a></h2><p>The template has a section for social buttons at the bottom of the sidebar, and this is a great use case for a slightly more advanced use of the configuration file.</p><p>Pelican has native support for social links, as you can see from the <code>SOCIAL</code> variable in <code>pelicanconf.py</code>. To show you that you are free to define and use custom variables in that file, however, I will ignore it and go with something richer. The template uses nice icons to represent the links, so I'd like to store the name of the icon together with each link.</p><p>The section of the template that renders those buttons is:</p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre> <span class="cm"><!-- Footer --></span>
<span class="p"><</span><span class="nt">section</span> <span class="na">id</span><span class="o">=</span><span class="s">"footer"</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"icons"</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"icon brands fa-twitter"</span><span class="p">><</span><span class="nt">span</span> <span class="na">class</span><span class="o">=</span><span class="s">"label"</span><span class="p">></span>Twitter<span class="p"></</span><span class="nt">span</span><span class="p">></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"icon brands fa-facebook-f"</span><span class="p">><</span><span class="nt">span</span> <span class="na">class</span><span class="o">=</span><span class="s">"label"</span><span class="p">></span>Facebook<span class="p"></</span><span class="nt">span</span><span class="p">></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"icon brands fa-instagram"</span><span class="p">><</span><span class="nt">span</span> <span class="na">class</span><span class="o">=</span><span class="s">"label"</span><span class="p">></span>Instagram<span class="p"></</span><span class="nt">span</span><span class="p">></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"icon solid fa-rss"</span><span class="p">><</span><span class="nt">span</span> <span class="na">class</span><span class="o">=</span><span class="s">"label"</span><span class="p">></span>RSS<span class="p"></</span><span class="nt">span</span><span class="p">></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"icon solid fa-envelope"</span><span class="p">><</span><span class="nt">span</span> <span class="na">class</span><span class="o">=</span><span class="s">"label"</span><span class="p">></span>Email<span class="p"></</span><span class="nt">span</span><span class="p">></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span> <span class="na">class</span><span class="o">=</span><span class="s">"copyright"</span><span class="p">></span><span class="ni">&copy;</span> Untitled. Design: <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"http://html5up.net"</span><span class="p">></span>HTML5 UP<span class="p"></</span><span class="nt">a</span><span class="p">></span>. Images: <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"http://unsplash.com"</span><span class="p">></span>Unsplash<span class="p"></</span><span class="nt">a</span><span class="p">></span>.<span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"></</span><span class="nt">section</span><span class="p">></span>
</pre></div> </div> </div><p>Lists are one of the most common patterns in templates, as they usually boil down to a simple <code>for</code> loop. Let's start by writing down the data; then we will learn how to render the buttons.</p><p>As I mentioned, I created a new variable <code>CONTACTS</code></p><div class="code"><div class="title"><code>pelicanconf.py</code></div><div class="content"><div class="highlight"><pre><span class="c1"># Social widget</span>
<span class="n">SOCIAL</span> <span class="o">=</span> <span class="p">(</span>
<span class="p">(</span><span class="s2">"You can add links in your config file"</span><span class="p">,</span> <span class="s2">"#"</span><span class="p">),</span>
<span class="p">(</span><span class="s2">"Another social link"</span><span class="p">,</span> <span class="s2">"#"</span><span class="p">),</span>
<span class="p">)</span>
<span class="hll"><span class="n">CONTACTS</span> <span class="o">=</span> <span class="p">[</span>
</span><span class="hll"> <span class="p">(</span><span class="s2">"Twitter"</span><span class="p">,</span> <span class="s2">"twitter"</span><span class="p">,</span> <span class="s2">"https://twitter.com/theanalogfox"</span><span class="p">),</span>
</span><span class="hll"> <span class="p">(</span><span class="s2">"Facebook"</span><span class="p">,</span> <span class="s2">"facebook-f"</span><span class="p">,</span> <span class="s2">"https://facebook.com/theanalogfox"</span><span class="p">),</span>
</span><span class="hll"> <span class="p">(</span><span class="s2">"Instagram"</span><span class="p">,</span> <span class="s2">"instagram"</span><span class="p">,</span> <span class="s2">"https://www.instagram.com/theanalogfox/"</span><span class="p">),</span>
</span><span class="hll"> <span class="p">(</span><span class="s2">"Email"</span><span class="p">,</span> <span class="s2">"envelope"</span><span class="p">,</span> <span class="s2">"mailto:info@theanalogfox.com"</span><span class="p">),</span>
</span><span class="hll"><span class="p">]</span>
</span>
<span class="n">DEFAULT_PAGINATION</span> <span class="o">=</span> <span class="mi">3</span>
</pre></div> </div> </div><p>that is a list of tuples, each one including the title of the button, the name of the icon (Font Awesome), and the link itself. Now we can replace the snippet of code above with this</p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre> <span class="cm"><!-- Footer --></span>
<span class="p"><</span><span class="nt">section</span> <span class="na">id</span><span class="o">=</span><span class="s">"footer"</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"icons"</span><span class="p">></span>
<span class="hll"> <span class="cp">{%</span> <span class="k">for</span> <span class="nv">name</span><span class="o">,</span> <span class="nv">icon</span><span class="o">,</span> <span class="nv">link</span> <span class="k">in</span> <span class="nv">CONTACTS</span> <span class="cp">%}</span>
</span><span class="hll"> <span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">link</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">class</span><span class="o">=</span><span class="s">"icon brands fa-</span><span class="cp">{{</span> <span class="nv">icon</span> <span class="cp">}}</span><span class="s">"</span><span class="p">><</span><span class="nt">span</span> <span class="na">class</span><span class="o">=</span><span class="s">"label"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">name</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">span</span><span class="p">></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
</span><span class="hll"> <span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
</span> <span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span> <span class="na">class</span><span class="o">=</span><span class="s">"copyright"</span><span class="p">></span><span class="ni">&copy;</span> Untitled. Design: <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"http://html5up.net"</span><span class="p">></span>HTML5 UP<span class="p"></</span><span class="nt">a</span><span class="p">></span>. Images: <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"http://unsplash.com"</span><span class="p">></span>Unsplash<span class="p"></</span><span class="nt">a</span><span class="p">></span>.<span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"></</span><span class="nt">section</span><span class="p">></span>
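</pre></div> </div> </div><p>If the Jinja syntax is new to you, it may help to see that the loop above behaves much like an ordinary Python loop over the same tuples. The following is only a sketch to illustrate the unpacking, not something Pelican runs: the helper <code>render_contact_classes</code> is hypothetical, and I used just two of the entries of <code>CONTACTS</code>. Instead of producing HTML it returns the link, the CSS classes, and the label of each button, which are the three pieces the template fills in.</p><div class="code"><div class="content"><div class="highlight"><pre># Two of the (title, Font Awesome icon, link) tuples from pelicanconf.py
CONTACTS = [
    ("Twitter", "twitter", "https://twitter.com/theanalogfox"),
    ("Facebook", "facebook-f", "https://facebook.com/theanalogfox"),
]


def render_contact_classes(contacts):
    """Hypothetical helper: return (link, CSS classes, label) for each contact."""
    rendered = []
    # Same unpacking as {% for name, icon, link in CONTACTS %}
    for name, icon, link in contacts:
        # "fa-{{ icon }}" in the template is plain text substitution, like this f-string
        rendered.append((link, f"icon brands fa-{icon}", name))
    return rendered


for link, classes, label in render_contact_classes(CONTACTS):
    print(label, classes, link)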
</pre></div> </div> </div><p>As you can see, the core of the template snippet is a <code>for</code> loop that uses Python-style tuple unpacking to assign <code>name</code>, <code>icon</code>, and <code>link</code>. The variables are then used directly in the HTML as we did before. Remember that Jinja doesn't understand HTML: it just blindly replaces the variables in a text file, which allows us to perform nice tricks like <code>fa-{{ icon }}</code> to use the right icon for each social button.</p><h2 id="articles-16fa">Articles<a class="headerlink" href="#articles-16fa" title="Permanent link">¶</a></h2><p>Now that we have introduced loops and variables, we have all the tools we need to work on the two lists of articles. Let's first change the list in the main body of the page; the one in the sidebar will then receive the very same treatment.</p><p>First of all, I reduced the static list of posts to a single one</p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre> <span class="cm"><!-- Main --></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">id</span><span class="o">=</span><span class="s">"main"</span><span class="p">></span>
<span class="cm"><!-- Post --></span>
<span class="p"><</span><span class="nt">article</span> <span class="na">class</span><span class="o">=</span><span class="s">"post"</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"title"</span><span class="p">></span>
<span class="p"><</span><span class="nt">h2</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span><span class="p">></span>Magna sed adipiscing<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">h2</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span>Lorem ipsum dolor amet nullam consequat etiam feugiat<span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"meta"</span><span class="p">></span>
<span class="p"><</span><span class="nt">time</span> <span class="na">class</span><span class="o">=</span><span class="s">"published"</span> <span class="na">datetime</span><span class="o">=</span><span class="s">"2015-11-01"</span><span class="p">></span>November 1, 2015<span class="p"></</span><span class="nt">time</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"author"</span><span class="p">><</span><span class="nt">span</span> <span class="na">class</span><span class="o">=</span><span class="s">"name"</span><span class="p">></span>Jane Doe<span class="p"></</span><span class="nt">span</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/avatar.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span> <span class="na">class</span><span class="o">=</span><span class="s">"image featured"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/pic01.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span>Mauris neque quam, fermentum ut nisl vitae, convallis maximus nisl. Sed mattis nunc id lorem euismod placerat. Vivamus porttitor magna enim, ac accumsan tortor cursus at. Phasellus sed ultricies mi non congue ullam corper. Praesent tincidunt sed tellus ut rutrum. Sed vitae justo condimentum, porta lectus vitae, ultricies congue gravida diam non fringilla.<span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"><</span><span class="nt">footer</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"actions"</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span> <span class="na">class</span><span class="o">=</span><span class="s">"button large"</span><span class="p">></span>Continue Reading<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"stats"</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span><span class="p">></span>General<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"icon solid fa-heart"</span><span class="p">></span>28<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"icon solid fa-comment"</span><span class="p">></span>128<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"></</span><span class="nt">footer</span><span class="p">></span>
<span class="p"></</span><span class="nt">article</span><span class="p">></span>
<span class="cm"><!-- Pagination --></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"actions pagination"</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">""</span> <span class="na">class</span><span class="o">=</span><span class="s">"disabled button large previous"</span><span class="p">></span>Previous Page<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"button large next"</span><span class="p">></span>Next Page<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
</pre></div> </div> </div><p>Please note that your content will be slightly different, as it has been randomly generated.</p><p>The list of Pelican variables we can access is available at <a href="https://docs.getpelican.com/en/latest/themes.html#index-html">https://docs.getpelican.com/en/latest/themes.html#index-html</a> (variables for the page <code>index.html</code>) and <a href="https://docs.getpelican.com/en/latest/themes.html#article">https://docs.getpelican.com/en/latest/themes.html#article</a> (attributes of <code>Article</code> objects). Using them, we can wrap the post in a loop over <code>articles</code> and replace the static text with the attributes of each article</p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre><span class="hll"> <span class="cp">{%</span> <span class="k">for</span> <span class="nv">article</span> <span class="k">in</span> <span class="nv">articles</span> <span class="cp">%}</span>
</span> <span class="cm"><!-- Post --></span>
<span class="p"><</span><span class="nt">article</span> <span class="na">class</span><span class="o">=</span><span class="s">"post"</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"title"</span><span class="p">></span>
<span class="hll"> <span class="p"><</span><span class="nt">h2</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.title</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">h2</span><span class="p">></span>
</span><span class="hll"> <span class="p"><</span><span class="nt">p</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.summary</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">p</span><span class="p">></span>
</span> <span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"meta"</span><span class="p">></span>
<span class="hll"> <span class="p"><</span><span class="nt">time</span> <span class="na">class</span><span class="o">=</span><span class="s">"published"</span> <span class="na">datetime</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%Y-%m-%d'</span><span class="o">)</span> <span class="cp">}}</span><span class="s">"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%b %-d, %Y'</span><span class="o">)</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">time</span><span class="p">></span>
</span><span class="hll"> <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"author"</span><span class="p">><</span><span class="nt">span</span> <span class="na">class</span><span class="o">=</span><span class="s">"name"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.author</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">span</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/avatar.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
</span> <span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span> <span class="na">class</span><span class="o">=</span><span class="s">"image featured"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/pic01.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="hll"> <span class="cp">{{</span> <span class="nv">article.content</span> <span class="cp">}}</span>
</span> <span class="p"><</span><span class="nt">footer</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"actions"</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span> <span class="na">class</span><span class="o">=</span><span class="s">"button large"</span><span class="p">></span>Continue Reading<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"stats"</span><span class="p">></span>
<span class="hll"> <span class="cp">{%</span> <span class="k">for</span> <span class="nv">tag</span> <span class="k">in</span> <span class="nv">article.tags</span> <span class="cp">%}</span>
</span><span class="hll"> <span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">tag.name</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
</span><span class="hll"> <span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
</span> <span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"></</span><span class="nt">footer</span><span class="p">></span>
<span class="p"></</span><span class="nt">article</span><span class="p">></span>
<span class="hll"> <span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
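</pre></div> </div> </div><p>The two <code>strftime</code> filters in the loop above use the usual C/Python date format codes, so a quick way to check what a format produces is to try it on a <code>datetime</code> in plain Python. This is just an experiment, not Pelican code, and the date is the one of the placeholder post in the original template. Keep in mind that <code>%-d</code> (day without leading zero) is a platform extension: Python's own <code>datetime.strftime</code> hands it to the C library, so it works on Linux but not on Windows, while, as far as I know, Pelican's filter handles the <code>%-</code> flag by itself.</p><div class="code"><div class="content"><div class="highlight"><pre>from datetime import datetime

# The date of the placeholder post in the original template
published = datetime(2015, 11, 1)

# Machine-readable value, used for the datetime="..." attribute
iso_date = published.strftime("%Y-%m-%d")

# Human-readable value; "%-d" strips the leading zero from the day,
# but it relies on the platform C library (works on Linux/glibc)
pretty_date = published.strftime("%b %-d, %Y")

print(iso_date)     # 2015-11-01
print(pretty_date)  # Nov 1, 2015 (on Linux, default C locale)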
</pre></div> </div> </div><p>As you can see, we can just loop over the page variable <code>articles</code> and read the attributes of each object just like we usually do in Python, for example with <code>{{ article.title }}</code> or <code>{{ article.summary }}</code>.</p><p>Jinja filters are very powerful, and as you can see I used them twice with dates to give them different formats: one for the machine-readable representation of the date (<code>{{ article.date | strftime('%Y-%m-%d') }}</code>) and one for the more visually pleasant version (<code>{{ article.date | strftime('%b %-d, %Y') }}</code>). I don't normally use the American date format, but since the template did, I kept it as a good example of what you can do with Jinja filters.</p><p>Last, you can nest for loops in other for loops, as I did with tags. You can see the internal structure of tags <a href="https://docs.getpelican.com/en/latest/themes.html#author-category-tag">in the documentation</a>. Please note that the string representation of tags (and other objects in Pelican) is the attribute <code>name</code>, so we could also write</p><div class="code"><div class="content"><div class="highlight"><pre> <span class="cp">{%</span> <span class="k">for</span> <span class="nv">tag</span> <span class="k">in</span> <span class="nv">article.tags</span> <span class="cp">%}</span>
<span class="hll"> <span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">tag</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
</span> <span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
</pre></div> </div> </div><p>and it would return the same output.</p><p>A note about the content of the article. As you can see, <code>{{ article.content }}</code> prints the full article, so if you want to print a preview you need to truncate it. Unfortunately, you can't just write something like <code>{{ article.content[:600] }}</code> to print the first 600 characters. Remember that Jinja works on the output of the readers, which means that <code>article.content</code> is already HTML: if you truncate it arbitrarily, you might cut an element in half and leave some tags open, which disrupts the rendering of the page.</p><p>At the end of the post I will show you how you can easily solve this in Pelican, once we have a dedicated page for each article.</p><h2 id="pagination-1e5d">Pagination<a class="headerlink" href="#pagination-1e5d" title="Permanent link">¶</a></h2><p>What we did in the previous section prints all the articles in the blog on the same page. This is clearly not acceptable once the number of articles grows, and the standard solution for this is <em>pagination</em>.</p><p>Pelican fully supports it, and if you remember, we set it up when we ran <code>pelican-quickstart</code>. You can see the current number of articles per page in <code>pelicanconf.py</code></p><div class="code"><div class="content"><div class="highlight"><pre><span class="n">DEFAULT_PAGINATION</span> <span class="o">=</span> <span class="mi">3</span>
</pre></div> </div> </div><p>To leverage pagination we first need to replace the variable <code>articles</code> with <code>articles_page.object_list</code></p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre><span class="hll"> <span class="cp">{%</span> <span class="k">for</span> <span class="nv">article</span> <span class="k">in</span> <span class="nv">articles_page.object_list</span> <span class="cp">%}</span>
</span> <span class="cm"><!-- Post --></span>
<span class="p"><</span><span class="nt">article</span> <span class="na">class</span><span class="o">=</span><span class="s">"post"</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"title"</span><span class="p">></span>
<span class="p"><</span><span class="nt">h2</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.title</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">h2</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.summary</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">p</span><span class="p">></span>
</pre></div> </div> </div><p>If you render the page in the browser now, you'll see that it shows only the latest 3 posts, which corresponds to the value of <code>DEFAULT_PAGINATION</code>. If you want, you can change that value and see how it affects the page.</p><p>To use pagination, we also need to configure the navigation buttons. The template already includes them after all the articles in the main body.</p><div class="code"><div class="content"><div class="highlight"><pre> <span class="cm"><!-- Pagination --></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"actions pagination"</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">""</span> <span class="na">class</span><span class="o">=</span><span class="s">"disabled button large previous"</span><span class="p">></span>Previous Page<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"button large next"</span><span class="p">></span>Next Page<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
</pre></div> </div> </div><p>We have three different cases when it comes to pagination. The first page should grey out or remove the "Previous page" button, the last page should do the same with the "Next page" button, and any page between the two should show both.</p><p>We can achieve it with the following code</p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre> <span class="cm"><!-- Pagination --></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"actions pagination"</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">if</span> <span class="nv">articles_page.has_previous</span><span class="o">()</span> <span class="cp">%}</span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/</span><span class="cp">{{</span> <span class="nv">articles_previous_page.url</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">class</span><span class="o">=</span><span class="s">"button large previous"</span><span class="p">></span>Previous Page<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">else</span> <span class="cp">%}</span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">""</span> <span class="na">class</span><span class="o">=</span><span class="s">"disabled button large previous"</span><span class="p">></span>Previous Page<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endif</span> <span class="cp">%}</span>
<span class="cp">{%</span> <span class="k">if</span> <span class="nv">articles_page.has_next</span><span class="o">()</span> <span class="cp">%}</span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/</span><span class="cp">{{</span> <span class="nv">articles_next_page.url</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">class</span><span class="o">=</span><span class="s">"button large next"</span><span class="p">></span>Next Page<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">else</span> <span class="cp">%}</span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"disabled button large next"</span><span class="p">></span>Next Page<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endif</span> <span class="cp">%}</span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
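</pre></div> </div> </div><p>To get a feeling of which articles end up on each page, here is a rough pure-Python approximation of what happens with 20 articles and <code>DEFAULT_PAGINATION = 3</code>. The <code>Page</code> class and the <code>paginate</code> function are hypothetical and much simpler than Pelican's own paginator (which, for example, also honours the <code>DEFAULT_ORPHANS</code> setting); I only kept the names <code>object_list</code>, <code>has_previous()</code>, and <code>has_next()</code> that the template uses. Slicing a newest-first list in chunks of 3 gives 7 pages, and the last one holds only the two oldest articles.</p><div class="code"><div class="content"><div class="highlight"><pre>import math

PER_PAGE = 3  # plays the role of DEFAULT_PAGINATION


class Page:
    """Hypothetical, simplified stand-in for the articles_page template variable."""

    def __init__(self, number, object_list, num_pages):
        self.number = number            # 1-based page number
        self.object_list = object_list  # the articles shown on this page
        self.num_pages = num_pages      # total number of pages

    def has_previous(self):
        # Every page except the first one has a previous page
        return self.number != 1

    def has_next(self):
        # Every page except the last one has a next page
        return self.number != self.num_pages


def paginate(articles, per_page=PER_PAGE):
    """Split a newest-first list of articles into consecutive pages."""
    # Always produce at least one (possibly empty) page
    num_pages = max(1, math.ceil(len(articles) / per_page))
    pages = []
    for number in range(1, num_pages + 1):
        start = (number - 1) * per_page
        pages.append(Page(number, articles[start:start + per_page], num_pages))
    return pages


# 20 generated articles, newest first, as in the blog built so far
articles = [f"Article {n}" for n in range(20, 0, -1)]
pages = paginate(articles)

print(len(pages))                                    # 7
print(pages[1].object_list)                          # ['Article 17', 'Article 16', 'Article 15']
print(pages[-1].object_list)                         # ['Article 2', 'Article 1']
print(pages[0].has_previous(), pages[0].has_next())  # False True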
</pre></div> </div> </div><p>The two functions <code>articles_page.has_previous()</code> and <code>articles_page.has_next()</code> tell us whether there are pages before or after the current one; if there aren't, we can use the class <code>disabled</code> to grey out the button. The links to the previous and next pages are provided by <code>articles_previous_page.url</code> and <code>articles_next_page.url</code> respectively. Again, remember that you can learn everything about these variables by reading <a href="https://docs.getpelican.com/en/latest/themes.html">Pelican's documentation on themes</a>.</p><p>Render the page in the browser and you will see that the "Next Page" button leads to <a href="http://localhost:8000/index2.html">http://localhost:8000/index2.html</a>, which is the second page of articles. If you click it, you will see articles 17 to 15, and so on until the last page of articles, <a href="http://localhost:8000/index7.html">http://localhost:8000/index7.html</a>, which contains the first and the second articles of the blog (if you generated 20 articles at the beginning as I did).</p><h2 id="slicing-6582">Slicing<a class="headerlink" href="#slicing-6582" title="Permanent link">¶</a></h2><p>The sidebar is usually the best place to show a fixed set of posts, such as the latest ones. We can easily achieve this by slicing the list of articles. To do this, let's first reduce the number of mini posts to one, so that we can introduce a loop</p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre> <span class="cm"><!-- Mini Posts --></span>
<span class="p"><</span><span class="nt">section</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"mini-posts"</span><span class="p">></span>
<span class="cm"><!-- Mini Post --></span>
<span class="p"><</span><span class="nt">article</span> <span class="na">class</span><span class="o">=</span><span class="s">"mini-post"</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">h3</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span><span class="p">></span>Vitae sed condimentum<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">h3</span><span class="p">></span>
<span class="p"><</span><span class="nt">time</span> <span class="na">class</span><span class="o">=</span><span class="s">"published"</span> <span class="na">datetime</span><span class="o">=</span><span class="s">"2015-10-20"</span><span class="p">></span>October 20, 2015<span class="p"></</span><span class="nt">time</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"author"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/avatar.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span> <span class="na">class</span><span class="o">=</span><span class="s">"image"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/pic04.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"></</span><span class="nt">article</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"></</span><span class="nt">section</span><span class="p">></span>
</pre></div> </div> </div><p>We can show the latest 4 articles with this simple syntax</p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre> <span class="cm"><!-- Mini Posts --></span>
<span class="p"><</span><span class="nt">section</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"mini-posts"</span><span class="p">></span>
<span class="hll"> <span class="cp">{%</span> <span class="k">for</span> <span class="nv">article</span> <span class="k">in</span> <span class="nv">articles</span><span class="o">[:</span><span class="m">4</span><span class="o">]</span> <span class="cp">%}</span>
</span> <span class="cm"><!-- Mini Post --></span>
<span class="p"><</span><span class="nt">article</span> <span class="na">class</span><span class="o">=</span><span class="s">"mini-post"</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">h3</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span><span class="p">></span>Vitae sed condimentum<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">h3</span><span class="p">></span>
<span class="p"><</span><span class="nt">time</span> <span class="na">class</span><span class="o">=</span><span class="s">"published"</span> <span class="na">datetime</span><span class="o">=</span><span class="s">"2015-10-20"</span><span class="p">></span>October 20, 2015<span class="p"></</span><span class="nt">time</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"author"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/avatar.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span> <span class="na">class</span><span class="o">=</span><span class="s">"image"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/pic04.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"></</span><span class="nt">article</span><span class="p">></span>
<span class="hll"> <span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
</span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"></</span><span class="nt">section</span><span class="p">></span>
</pre></div> </div> </div><p>which will print the same static article 4 times, because we are not using Pelican's variables yet. Applying the same changes we introduced for the main list of articles, we finally get</p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre> <span class="cm"><!-- Mini Posts --></span>
<span class="p"><</span><span class="nt">section</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"mini-posts"</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">for</span> <span class="nv">article</span> <span class="k">in</span> <span class="nv">articles</span><span class="o">[:</span><span class="m">4</span><span class="o">]</span> <span class="cp">%}</span>
<span class="cm"><!-- Mini Post --></span>
<span class="p"><</span><span class="nt">article</span> <span class="na">class</span><span class="o">=</span><span class="s">"mini-post"</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="hll"> <span class="p"><</span><span class="nt">h3</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.title</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">h3</span><span class="p">></span>
</span><span class="hll"> <span class="p"><</span><span class="nt">time</span> <span class="na">class</span><span class="o">=</span><span class="s">"published"</span> <span class="na">datetime</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%Y-%m-%d'</span><span class="o">)</span> <span class="cp">}}</span><span class="s">"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%b %-d, %Y'</span><span class="o">)</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">time</span><span class="p">></span>
</span> <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"author"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/avatar.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span> <span class="na">class</span><span class="o">=</span><span class="s">"image"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/pic04.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"></</span><span class="nt">article</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"></</span><span class="nt">section</span><span class="p">></span>
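</pre></div> </div> </div><p>The sidebar loop slices <code>articles</code>, while the main loop uses <code>articles_page.object_list</code>. Since Jinja's slicing follows Python's, a short Python sketch can show how the two lists behave as you move from page to page. It assumes 20 articles numbered 1 to 20, sorted newest first (Pelican's default order), and 3 articles per page; <code>main_list()</code> and <code>sidebar()</code> are hypothetical helpers, not Pelican variables.</p><div class="code"><div class="content"><div class="highlight"><pre># Illustration only: which articles the two loops in index.html pick.
articles = list(range(20, 0, -1))  # 20 article numbers, newest first
PER_PAGE = 3                       # assumption: 3 articles per page


def main_list(page_number):
    # Plays the role of articles_page.object_list: changes with the page
    start = (page_number - 1) * PER_PAGE
    return articles[start:start + PER_PAGE]


def sidebar():
    # Plays the role of articles[:4]: the four most recent articles, on any page
    return articles[:4]


for n in (1, 2, 7):
    print(f"page {n}: main={main_list(n)} sidebar={sidebar()}")

# With fewer than 4 articles the slice simply returns what exists, no error
few_articles = [2, 1]
print(few_articles[:4])
</pre></div> </div> </div><p>Running it prints</p><div class="code"><div class="content"><div class="highlight"><pre>page 1: main=[20, 19, 18] sidebar=[20, 19, 18, 17]
page 2: main=[17, 16, 15] sidebar=[20, 19, 18, 17]
page 7: main=[2, 1] sidebar=[20, 19, 18, 17]
[2, 1]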
</pre></div> </div> </div><p>With these changes the sidebar shows the proper titles and dates. Please note that this loop depends on <code>articles</code>, which doesn't change from page to page, so the sidebar will list the same articles on every page of the blog.</p><h2 id="pictures-436b">Pictures<a class="headerlink" href="#pictures-436b" title="Permanent link">¶</a></h2><p>The script that created the demo articles also downloaded an image for each of them, which can be used as a featured image. There are many ways to manage images in a website and I won't dive into that particular subject here. I will only show you a basic way to serve images that are part of the content rather than static files of the theme.</p><p>The images are stored in <code>content/images</code>, and this is also an arbitrary decision. The only thing you need to keep in mind is that Pelican's root is usually set to the directory <code>content</code>, so if you place files elsewhere it might be complicated to reach them. This particular directory works out of the box because Pelican's default <code>STATIC_PATHS</code> setting is <code>['images']</code>, so the files in <code>content/images</code> are copied to the output as they are.</p><p>The automated script that we used to generate the content also wrote a specific metadata field in each article, with the name of the corresponding image. For example</p><div class="code"><div class="title"><code>content/post01.markdown</code></div><div class="content"><div class="highlight"><pre>Title: A sample article 01
Date: 2021-03-01
Category: News
Tags: tag6,tag11,tag8
<span class="hll">Image: post01.jpg
</span>Summary: Summary of post 01
[...]
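</pre></div> </div> </div><p>Pelican reads a header like this one and exposes each field as an attribute of <code>article</code>. A simplified Python sketch of that mapping follows. It is not Pelican's reader: <code>read_metadata()</code> is a hypothetical helper and every value stays a string, whereas Pelican converts fields such as <code>Date</code>, <code>Tags</code>, and <code>Category</code> into richer objects. In this sketch the header ends at the first blank line and keys are lowercased, which is why <code>Image:</code> can be read as <code>article.image</code>.</p><div class="code"><div class="content"><div class="highlight"><pre># Simplified sketch of turning a metadata header into attributes.
# Not Pelican's reader: here every value stays a plain string.
from types import SimpleNamespace

SOURCE = """Title: A sample article 01
Date: 2021-03-01
Category: News
Tags: tag6,tag11,tag8
Image: post01.jpg
Summary: Summary of post 01

The body of the article starts after the first blank line.
"""


def read_metadata(text):
    metadata = {}
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the header
        key, _, value = line.partition(":")
        metadata[key.strip().lower()] = value.strip()  # "Image" -> "image"
    return SimpleNamespace(**metadata)


article = read_metadata(SOURCE)
print(article.title)
print(article.image)
print("images/" + article.image)  # what the template builds for the img src
</pre></div> </div> </div><p>Running it prints</p><div class="code"><div class="content"><div class="highlight"><pre>A sample article 01
post01.jpg
images/post01.jpg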
</pre></div> </div> </div><p>The <code>Image</code> field above shows that you can add any metadata you want to your posts and have it loaded into the object <code>article</code>: Pelican exposes each field as an attribute with a lowercase name, so <code>Image</code> becomes <code>article.image</code>. To show the image we just need to load the correct file instead of the placeholder, both in the main list</p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre> <span class="cp">{%</span> <span class="k">for</span> <span class="nv">article</span> <span class="k">in</span> <span class="nv">articles_page.object_list</span> <span class="cp">%}</span>
<span class="cm"><!-- Post --></span>
<span class="p"><</span><span class="nt">article</span> <span class="na">class</span><span class="o">=</span><span class="s">"post"</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"title"</span><span class="p">></span>
<span class="p"><</span><span class="nt">h2</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.title</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">h2</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.summary</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"meta"</span><span class="p">></span>
<span class="p"><</span><span class="nt">time</span> <span class="na">class</span><span class="o">=</span><span class="s">"published"</span> <span class="na">datetime</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%Y-%m-%d'</span><span class="o">)</span> <span class="cp">}}</span><span class="s">"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%b %-d, %Y'</span><span class="o">)</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">time</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"author"</span><span class="p">><</span><span class="nt">span</span> <span class="na">class</span><span class="o">=</span><span class="s">"name"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.author</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">span</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/avatar.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="hll"> <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span> <span class="na">class</span><span class="o">=</span><span class="s">"image featured"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"images/</span><span class="cp">{{</span> <span class="nv">article.image</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
</span> <span class="cp">{{</span> <span class="nv">article.content</span> <span class="cp">}}</span>
<span class="p"><</span><span class="nt">footer</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"actions"</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span> <span class="na">class</span><span class="o">=</span><span class="s">"button large"</span><span class="p">></span>Continue Reading<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"stats"</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">for</span> <span class="nv">tag</span> <span class="k">in</span> <span class="nv">article.tags</span> <span class="cp">%}</span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">tag.name</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"></</span><span class="nt">footer</span><span class="p">></span>
<span class="p"></</span><span class="nt">article</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
</pre></div> </div> </div><p>and in the sidebar</p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre> <span class="cm"><!-- Mini Posts --></span>
<span class="p"><</span><span class="nt">section</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"mini-posts"</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">for</span> <span class="nv">article</span> <span class="k">in</span> <span class="nv">articles</span><span class="o">[:</span><span class="m">4</span><span class="o">]</span> <span class="cp">%}</span>
<span class="cm"><!-- Mini Post --></span>
<span class="p"><</span><span class="nt">article</span> <span class="na">class</span><span class="o">=</span><span class="s">"mini-post"</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">h3</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.title</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">h3</span><span class="p">></span>
<span class="p"><</span><span class="nt">time</span> <span class="na">class</span><span class="o">=</span><span class="s">"published"</span> <span class="na">datetime</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%Y-%m-%d'</span><span class="o">)</span> <span class="cp">}}</span><span class="s">"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%b %-d, %Y'</span><span class="o">)</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">time</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"author"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/avatar.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="hll"> <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span> <span class="na">class</span><span class="o">=</span><span class="s">"image"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"images/</span><span class="cp">{{</span> <span class="nv">article.image</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
</span> <span class="p"></</span><span class="nt">article</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"></</span><span class="nt">section</span><span class="p">></span>
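</pre></div> </div> </div><p>Both templates build the image address with the relative path <code>images/{{ article.image }}</code>, which the browser resolves against the URL of the page being viewed. Python's <code>urllib.parse.urljoin</code> applies the same resolution rules, so it can show the consequence: the path works on the index pages, which sit at the root of the site, but would point to the wrong place from a page living in a subdirectory. The nested article URL below is a hypothetical example, and <code>SITEURL</code> is assumed to be <code>http://localhost:8000</code>. Prefixing the path with <code>{{ SITEURL }}</code>, as the theme already does for <code>avatar.jpg</code>, gives the same address everywhere.</p><div class="code"><div class="content"><div class="highlight"><pre># How the relative src "images/post01.jpg" resolves from different pages.
from urllib.parse import urljoin

SITEURL = "http://localhost:8000"  # assumption: local development server
pages = [
    SITEURL + "/index.html",
    SITEURL + "/index2.html",
    SITEURL + "/posts/a-sample-article-01.html",  # hypothetical nested page
]

for page in pages:
    print(urljoin(page, "images/post01.jpg"))

# Prefixing with SITEURL yields the same URL on every page
print(SITEURL + "/images/post01.jpg")
</pre></div> </div> </div><p>Running it prints</p><div class="code"><div class="content"><div class="highlight"><pre>http://localhost:8000/images/post01.jpg
http://localhost:8000/images/post01.jpg
http://localhost:8000/posts/images/post01.jpg
http://localhost:8000/images/post01.jpg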
</pre></div> </div> </div><p>With the images in place, the page is definitely more appealing!</p><h2 id="advanced-techniques-extend-and-include-3d5d">Advanced techniques: extends and include<a class="headerlink" href="#advanced-techniques-extend-and-include-3d5d" title="Permanent link">¶</a></h2><p>As with everything related to computer programming (and beyond), you will soon discover that you are repeating yourself, and one of the most important pieces of design advice you should follow is: don't.</p><p>Jinja templates provide two tags to help you reduce duplicated code, namely <code>extends</code> and <code>include</code>. To see how they work, let's focus our attention on the page of a specific article. When we click on an article we would like to go to a dedicated page, while at the moment all links load the same static page <code>single.html</code>.</p><p>The article page, however, won't be that different from the front page we just created. You are free to give it a completely different style, but in general some elements will be shared, for example the navigation, the sidebar, and the footer.</p><p>In the following sections I will show you how to use the Jinja tags <code>extends</code> and <code>include</code> to reuse parts of your theme.</p><h3 id="extends-16a0">Extends</h3><p>Since we want to reuse some parts of the page, we need to move them to a "common space". Create a new file called <code>future-imperfect/templates/base.html</code> and move the whole content of <code>index.html</code> into it. Then write this single statement in the now empty <code>index.html</code></p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre><span class="cp">{%</span> <span class="k">extends</span> <span class="s2">"base.html"</span> <span class="cp">%}</span>
</pre></div> </div> </div><p>What we did is tell Jinja that everything we do in <code>index.html</code> should happen on top of <code>base.html</code> (we'll soon discover the meaning of this "on top"). For now you might think about it as <code>index.html</code> copying the content of <code>base.html</code>.</p><p>The problem with this setup is that <code>base.html</code> is not used only by <code>index.html</code>. The pages that our theme doesn't provide a template for yet (articles, tags, categories, and so on) are rendered with the templates of Pelican's built-in <code>simple</code> theme, and those templates extend <code>base.html</code> too, which now resolves to our file. Variables like <code>articles_page</code> exist only when <code>index.html</code> is rendered, so rendering those other pages fails. Indeed the command <code>pelican -lr</code> that you are running in a terminal should give you this error</p><div class="code"><div class="content"><div class="highlight"><pre>WARNING: Caught exception:
| "'articles_page' is undefined".
</pre></div> </div> </div><p>Since <code>articles_page</code> is something only <code>index.html</code> can provide, we need to do more than just copy the content of <code>base.html</code>: <code>index.html</code> must also fill in some parts of it, which is exactly what you can do with <code>block</code> (see <a href="https://jinja.palletsprojects.com/en/2.11.x/templates/#template-inheritance">Jinja's documentation on template inheritance</a>).</p><p>Grab the whole content of the <code>&lt;div id="main"&gt;</code> and move it into <code>index.html</code>, wrapped in <code>{% block content %}</code></p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre><span class="cp">{%</span> <span class="k">extends</span> <span class="s2">"base.html"</span> <span class="cp">%}</span>
<span class="cp">{%</span> <span class="k">block</span> <span class="nv">content</span> <span class="cp">%}</span>
<span class="cp">{%</span> <span class="k">for</span> <span class="nv">article</span> <span class="k">in</span> <span class="nv">articles_page.object_list</span> <span class="cp">%}</span>
<span class="cm"><!-- Post --></span>
<span class="p"><</span><span class="nt">article</span> <span class="na">class</span><span class="o">=</span><span class="s">"post"</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"title"</span><span class="p">></span>
<span class="p"><</span><span class="nt">h2</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.title</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">h2</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.summary</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"meta"</span><span class="p">></span>
<span class="p"><</span><span class="nt">time</span> <span class="na">class</span><span class="o">=</span><span class="s">"published"</span> <span class="na">datetime</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%Y-%m-%d'</span><span class="o">)</span> <span class="cp">}}</span><span class="s">"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%b %-d, %Y'</span><span class="o">)</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">time</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"author"</span><span class="p">><</span><span class="nt">span</span> <span class="na">class</span><span class="o">=</span><span class="s">"name"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.author</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">span</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/avatar.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span> <span class="na">class</span><span class="o">=</span><span class="s">"image featured"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"images/</span><span class="cp">{{</span> <span class="nv">article.image</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="cp">{{</span> <span class="nv">article.content</span> <span class="cp">}}</span>
<span class="p"><</span><span class="nt">footer</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"actions"</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span> <span class="na">class</span><span class="o">=</span><span class="s">"button large"</span><span class="p">></span>Continue Reading<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"stats"</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">for</span> <span class="nv">tag</span> <span class="k">in</span> <span class="nv">article.tags</span> <span class="cp">%}</span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">tag.name</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"></</span><span class="nt">footer</span><span class="p">></span>
<span class="p"></</span><span class="nt">article</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
<span class="cm"><!-- Pagination --></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"actions pagination"</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">if</span> <span class="nv">articles_page.has_previous</span><span class="o">()</span> <span class="cp">%}</span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"/</span><span class="cp">{{</span> <span class="nv">articles_previous_page.url</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">class</span><span class="o">=</span><span class="s">"button large previous"</span><span class="p">></span>Previous Page<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">else</span> <span class="cp">%}</span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">""</span> <span class="na">class</span><span class="o">=</span><span class="s">"disabled button large previous"</span><span class="p">></span>Previous Page<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endif</span> <span class="cp">%}</span>
<span class="cp">{%</span> <span class="k">if</span> <span class="nv">articles_page.has_next</span><span class="o">()</span> <span class="cp">%}</span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"/</span><span class="cp">{{</span> <span class="nv">articles_next_page.url</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">class</span><span class="o">=</span><span class="s">"button large next"</span><span class="p">></span>Next Page<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">else</span> <span class="cp">%}</span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"disabled button large next"</span><span class="p">></span>Next Page<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endif</span> <span class="cp">%}</span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endblock</span> <span class="cp">%}</span>
</pre></div> </div> </div><p>At the same time, fill the empty space you left in <code>base.html</code> with an empty <code>content</code> block that child templates can fill:</p><div class="code"><div class="title"><code>future-imperfect/templates/base.html</code></div><div class="content"><div class="highlight"><pre> <span class="cm"><!-- Main --></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">id</span><span class="o">=</span><span class="s">"main"</span><span class="p">></span>
<span class="hll"> <span class="cp">{%</span> <span class="k">block</span> <span class="nv">content</span> <span class="cp">%}{%</span> <span class="k">endblock</span> <span class="cp">%}</span>
</span> <span class="p"></</span><span class="nt">div</span><span class="p">></span>
</pre></div> </div> </div><p>When this is done, reload the page and... nothing should have changed. Sorry, but this is one of those changes that happen behind the scenes and have no immediate visible benefit. However, Pelican shouldn't report any error on the command line, which is comforting.</p><p>To see the benefit of what we did, let's move on and create the page for a single article, <code>article.html</code>:</p><div class="code"><div class="title"><code>future-imperfect/templates/article.html</code></div><div class="content"><div class="highlight"><pre><span class="cp">{%</span> <span class="k">extends</span> <span class="s2">"base.html"</span> <span class="cp">%}</span>
</pre></div> </div> </div><p>As we said before, <code>base.html</code> is meant to contain the parts common to all pages, so <code>article.html</code> should start from it as well. To check what that page looks like now, let's update the links of the articles in the main column. Open <code>index.html</code> and replace the URLs of each article with <code>article.url</code>:</p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre><span class="p"><</span><span class="nt">article</span> <span class="na">class</span><span class="o">=</span><span class="s">"post"</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"title"</span><span class="p">></span>
<span class="hll"> <span class="p"><</span><span class="nt">h2</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.url</span> <span class="cp">}}</span><span class="s">"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.title</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">h2</span><span class="p">></span>
</span> <span class="p"><</span><span class="nt">p</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.summary</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"meta"</span><span class="p">></span>
<span class="p"><</span><span class="nt">time</span> <span class="na">class</span><span class="o">=</span><span class="s">"published"</span> <span class="na">datetime</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%Y-%m-%d'</span><span class="o">)</span> <span class="cp">}}</span><span class="s">"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%b %-d, %Y'</span><span class="o">)</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">time</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"author"</span><span class="p">><</span><span class="nt">span</span> <span class="na">class</span><span class="o">=</span><span class="s">"name"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.author</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">span</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/avatar.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="hll"> <span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.url</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">class</span><span class="o">=</span><span class="s">"image featured"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"images/</span><span class="cp">{{</span> <span class="nv">article.image</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
</span> <span class="cp">{{</span> <span class="nv">article.content</span> <span class="cp">}}</span>
<span class="p"><</span><span class="nt">footer</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"actions"</span><span class="p">></span>
<span class="hll"> <span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.url</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">class</span><span class="o">=</span><span class="s">"button large"</span><span class="p">></span>Continue Reading<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
</span> <span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"stats"</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">for</span> <span class="nv">tag</span> <span class="k">in</span> <span class="nv">article.tags</span> <span class="cp">%}</span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">tag.name</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"></</span><span class="nt">footer</span><span class="p">></span>
<span class="p"></</span><span class="nt">article</span><span class="p">></span>
</pre></div> </div> </div><p>Now if you render the site and click on the title of a post in the main column, you will end up on the page dedicated to it, which has a worryingly empty central column! After all, <code>article.html</code> extends <code>base.html</code> but doesn't provide anything for the block <code>content</code>, so let's fix this:</p><div class="code"><div class="title"><code>future-imperfect/templates/article.html</code></div><div class="content"><div class="highlight"><pre><span class="cp">{%</span> <span class="k">extends</span> <span class="s2">"base.html"</span> <span class="cp">%}</span>
<span class="cp">{%</span> <span class="k">block</span> <span class="nv">content</span> <span class="cp">%}</span>
<span class="cm"><!-- Post --></span>
<span class="p"><</span><span class="nt">article</span> <span class="na">class</span><span class="o">=</span><span class="s">"post"</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"title"</span><span class="p">></span>
<span class="p"><</span><span class="nt">h2</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.url</span> <span class="cp">}}</span><span class="s">"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.title</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">h2</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.summary</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"meta"</span><span class="p">></span>
<span class="p"><</span><span class="nt">time</span> <span class="na">class</span><span class="o">=</span><span class="s">"published"</span> <span class="na">datetime</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%Y-%m-%d'</span><span class="o">)</span> <span class="cp">}}</span><span class="s">"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%b %-d, %Y'</span><span class="o">)</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">time</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"author"</span><span class="p">><</span><span class="nt">span</span> <span class="na">class</span><span class="o">=</span><span class="s">"name"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.author</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">span</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/avatar.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.url</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">class</span><span class="o">=</span><span class="s">"image featured"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"images/</span><span class="cp">{{</span> <span class="nv">article.image</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="cp">{{</span> <span class="nv">article.content</span> <span class="cp">}}</span>
<span class="p"><</span><span class="nt">footer</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"actions"</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.url</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">class</span><span class="o">=</span><span class="s">"button large"</span><span class="p">></span>Continue Reading<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"stats"</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">for</span> <span class="nv">tag</span> <span class="k">in</span> <span class="nv">article.tags</span> <span class="cp">%}</span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">tag.name</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"></</span><span class="nt">footer</span><span class="p">></span>
<span class="p"></</span><span class="nt">article</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endblock</span> <span class="cp">%}</span>
</pre></div> </div> </div><p>I copied the content of the block from the loop in <code>index.html</code>. In this simple example the two have the same code, but in a real production environment you might want to differentiate them. As you can see, Pelican provides the variable <code>article</code> to <code>article.html</code> out of the box. Please note that the page of each article doesn't show the pagination buttons, as they are part of the <code>content</code> block of <code>index.html</code> and not of <code>article.html</code>.</p><h3 id="include-fe56">Include</h3><p>When you write a theme there are often snippets of code that you need to repeat in different parts of the site. The sidebar is typically a good example of a collection of such snippets, like the "About" section or the list of latest posts, which you might want to reuse somewhere else.</p><p>Jinja provides the tag <code>include</code>, which injects the content of another template file into the current one. Let's see how it works by moving the list of latest posts to a separate file.</p><p>Move the content of the mini posts section from <code>base.html</code> to a file called <code>templates/includes/latest_posts.html</code>:</p><div class="code"><div class="title"><code>future-imperfect/templates/includes/latest_posts.html</code></div><div class="content"><div class="highlight"><pre><span class="cm"><!-- Mini Posts --></span>
<span class="p"><</span><span class="nt">section</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"mini-posts"</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">for</span> <span class="nv">article</span> <span class="k">in</span> <span class="nv">articles</span><span class="o">[:</span><span class="m">4</span><span class="o">]</span> <span class="cp">%}</span>
<span class="cm"><!-- Mini Post --></span>
<span class="p"><</span><span class="nt">article</span> <span class="na">class</span><span class="o">=</span><span class="s">"mini-post"</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">h3</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.title</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">h3</span><span class="p">></span>
<span class="p"><</span><span class="nt">time</span> <span class="na">class</span><span class="o">=</span><span class="s">"published"</span> <span class="na">datetime</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%Y-%m-%d'</span><span class="o">)</span> <span class="cp">}}</span><span class="s">"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%b %-d, %Y'</span><span class="o">)</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">time</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"author"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/avatar.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"single.html"</span> <span class="na">class</span><span class="o">=</span><span class="s">"image"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"images/</span><span class="cp">{{</span> <span class="nv">article.image</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"></</span><span class="nt">article</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"></</span><span class="nt">section</span><span class="p">></span>
</pre></div> </div> </div><p>Then replace it with a call to <code>include</code> in <code>base.html</code>:</p><div class="code"><div class="title"><code>future-imperfect/templates/base.html</code></div><div class="content"><div class="highlight"><pre> <span class="cm"><!-- Intro --></span>
<span class="p"><</span><span class="nt">section</span> <span class="na">id</span><span class="o">=</span><span class="s">"intro"</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"logo"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/logo.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">h2</span><span class="p">></span><span class="cp">{{</span> <span class="nv">SITENAME</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">h2</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span><span class="cp">{{</span> <span class="nv">SITESUBTITLE</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="p"></</span><span class="nt">section</span><span class="p">></span>
<span class="hll"> <span class="cp">{%</span> <span class="k">include</span> <span class="s1">'includes/latest_posts.html'</span> <span class="cp">%}</span>
</span>
<span class="cm"><!-- About --></span>
<span class="p"><</span><span class="nt">section</span> <span class="na">class</span><span class="o">=</span><span class="s">"blurb"</span><span class="p">></span>
<span class="p"><</span><span class="nt">h2</span><span class="p">></span>About<span class="p"></</span><span class="nt">h2</span><span class="p">></span>
<span class="p"><</span><span class="nt">p</span><span class="p">></span>Mauris neque quam, fermentum ut nisl vitae, convallis maximus nisl. Sed mattis nunc id lorem euismod amet placerat. Vivamus porttitor magna enim, ac accumsan tortor cursus at phasellus sed ultricies.<span class="p"></</span><span class="nt">p</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"actions"</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"button"</span><span class="p">></span>Learn More<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"></</span><span class="nt">section</span><span class="p">></span>
</pre></div> </div> </div><p>Again, reloading the page won't show anything different, but now we have isolated the piece of code that shows the latest posts, and we can reuse it in parts of the site other than the sidebar. You can find the documentation of <code>include</code> at <a href="https://jinja.palletsprojects.com/en/2.11.x/templates/#include">https://jinja.palletsprojects.com/en/2.11.x/templates/#include</a>.</p><h3 id="deep-dive-e364">Deep dive</h3><p>What is the difference between <code>extends</code> and <code>include</code>? Which one should you use?</p><p>The difference between extending and including is the same difference that lies between a <em>framework</em> and a <em>library</em> in software development. A framework provides the bulk of the system and we provide the code for specific operations, while a library provides specific code that we fit into a bigger picture of our own. Think about a Web framework like Django or Flask: it already "works" in the background, but until you provide the endpoints it doesn't do anything specific. On the other hand, when you need to hash a password you import a library and call its functions.</p><p>In the same way, <code>extends</code> means that another template provides the main structure of the page, while we provide the content of the blocks. When we use <code>include</code> the opposite happens: we take a specific snippet of code and insert it into a wider context.</p><p>As there is no limit to the number of lines contained in a block or in an included snippet, there is a certain amount of overlap between the two. 
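</p><p>To make the distinction concrete, here is a minimal sketch that drives Jinja directly from Python, using the <code>Environment</code> and <code>DictLoader</code> classes of the <code>jinja2</code> package (the template engine Pelican uses). The template names mirror the ones in this post, but their contents are hypothetical plain-text stand-ins for the real HTML, and the variables <code>titles</code> and <code>article_title</code> are made up for the example.</p>

```python
from jinja2 import DictLoader, Environment

# Hypothetical in-memory templates: the names mirror this post,
# the contents are plain-text stand-ins for the real HTML.
templates = {
    # The "framework": base.html owns the layout and leaves a named hole.
    "base.html": (
        "[sidebar: {% include 'includes/latest_posts.html' %}]\n"
        "[main: {% block content %}{% endblock %}]"
    ),
    # The "library": a reusable snippet that knows nothing about its caller.
    # By default an included template sees the variables of the includer.
    "includes/latest_posts.html": "{% for title in titles %}{{ title }};{% endfor %}",
    # A child template that fills the hole left by base.html.
    "article.html": (
        "{% extends 'base.html' %}"
        "{% block content %}{{ article_title }}{% endblock %}"
    ),
    # A child template that extends base.html but provides no block content.
    "empty.html": "{% extends 'base.html' %}",
}

env = Environment(loader=DictLoader(templates))
context = {"titles": ["Post 19", "Post 20"], "article_title": "Post 20"}

page = env.get_template("article.html").render(**context)
empty = env.get_template("empty.html").render(**context)

print(page)   # [sidebar: Post 19;Post 20;] then [main: Post 20] on the next line
print(empty)  # same sidebar, but nothing after "main:"
```

<p>Neither child template mentions the sidebar, yet it appears in both outputs, because <code>base.html</code> drives the rendering; the snippet, instead, is pulled in explicitly wherever <code>include</code> is called. <code>empty.html</code> reproduces, in miniature, the empty central column we saw before filling the <code>content</code> block.</p><p>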
Sometimes it might not be immediately clear which one to use, but in the vast majority of cases the choice should be straightforward.</p><h2 id="preview-articles-5d7d">Preview articles<a class="headerlink" href="#preview-articles-5d7d" title="Permanent link">¶</a></h2><p>Earlier in the post I mentioned a way to create the preview of articles, and now that we have two different pages it makes sense to implement it.</p><p>As I said before, the preview of an article has to be created before the content is converted into HTML, or we might leave tags open. Pelican supports this mechanism out of the box through the <code>summary</code> metadata. If you specify a summary (as I did through the random generation script), Pelican uses it as it is, but if the summary is not present Pelican initialises it with a preview of the article.</p><p>The two settings involved in this process are <code>SUMMARY_MAX_LENGTH</code> and <code>SUMMARY_END_SUFFIX</code> (see <a href="https://docs.getpelican.com/en/latest/settings.html">the documentation</a>), but we can accept their default values for our little project.</p><p>To see the summary in action, let's open the last post (<code>post20.markdown</code>) and remove the line <code>Summary: Summary of post 20</code>. Then we need to adjust <code>index.html</code> so that the summary is no longer shown under the title and is displayed in place of the full content.</p><div class="code"><div class="title"><code>future-imperfect/templates/index.html</code></div><div class="content"><div class="highlight"><pre><span class="cp">{%</span> <span class="k">for</span> <span class="nv">article</span> <span class="k">in</span> <span class="nv">articles_page.object_list</span> <span class="cp">%}</span>
<span class="cm"><!-- Post --></span>
<span class="p"><</span><span class="nt">article</span> <span class="na">class</span><span class="o">=</span><span class="s">"post"</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"title"</span><span class="p">></span>
<span class="p"><</span><span class="nt">h2</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.url</span> <span class="cp">}}</span><span class="s">"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.title</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">h2</span><span class="p">></span>
<span class="hll">
</span> <span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"meta"</span><span class="p">></span>
<span class="p"><</span><span class="nt">time</span> <span class="na">class</span><span class="o">=</span><span class="s">"published"</span> <span class="na">datetime</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%Y-%m-%d'</span><span class="o">)</span> <span class="cp">}}</span><span class="s">"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%b %-d, %Y'</span><span class="o">)</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">time</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"author"</span><span class="p">><</span><span class="nt">span</span> <span class="na">class</span><span class="o">=</span><span class="s">"name"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.author</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">span</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span><span class="cp">}}</span><span class="s">/theme/images/avatar.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.url</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">class</span><span class="o">=</span><span class="s">"image featured"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"images/</span><span class="cp">{{</span> <span class="nv">article.image</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="hll"> <span class="cp">{{</span> <span class="nv">article.summary</span> <span class="cp">}}</span>
</span> <span class="p"><</span><span class="nt">footer</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"actions"</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.url</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">class</span><span class="o">=</span><span class="s">"button large"</span><span class="p">></span>Continue Reading<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"stats"</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">for</span> <span class="nv">tag</span> <span class="k">in</span> <span class="nv">article.tags</span> <span class="cp">%}</span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">tag.name</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"></</span><span class="nt">footer</span><span class="p">></span>
<span class="p"></</span><span class="nt">article</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
</pre></div> </div> </div><p>In the article page, instead, we want the full content to be displayed, so we just need to remove the summary (which at this point is not a single line that can be easily displayed under the title).</p><div class="code"><div class="title"><code>templates/includes/article.html</code></div><div class="content"><div class="highlight"><pre><span class="cp">{%</span> <span class="k">block</span> <span class="nv">content</span> <span class="cp">%}</span>
<span class="cm"><!-- Post --></span>
<span class="p"><</span><span class="nt">article</span> <span class="na">class</span><span class="o">=</span><span class="s">"post"</span><span class="p">></span>
<span class="p"><</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"title"</span><span class="p">></span>
<span class="p"><</span><span class="nt">h2</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.url</span> <span class="cp">}}</span><span class="s">"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.title</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">h2</span><span class="p">></span>
<span class="hll">
</span> <span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">"meta"</span><span class="p">></span>
<span class="p"><</span><span class="nt">time</span> <span class="na">class</span><span class="o">=</span><span class="s">"published"</span> <span class="na">datetime</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%Y-%m-%d'</span><span class="o">)</span> <span class="cp">}}</span><span class="s">"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.date</span> <span class="o">|</span> <span class="nf">strftime</span><span class="o">(</span><span class="s1">'%b %-d, %Y'</span><span class="o">)</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">time</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span> <span class="na">class</span><span class="o">=</span><span class="s">"author"</span><span class="p">><</span><span class="nt">span</span> <span class="na">class</span><span class="o">=</span><span class="s">"name"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">article.author</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">span</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">SITEURL</span> <span class="cp">}}</span><span class="s">/theme/images/avatar.jpg"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p"></</span><span class="nt">header</span><span class="p">></span>
<span class="p"><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.url</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">class</span><span class="o">=</span><span class="s">"image featured"</span><span class="p">><</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"images/</span><span class="cp">{{</span> <span class="nv">article.image</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">alt</span><span class="o">=</span><span class="s">""</span> <span class="p">/></</span><span class="nt">a</span><span class="p">></span>
<span class="cp">{{</span> <span class="nv">article.content</span> <span class="cp">}}</span>
<span class="p"><</span><span class="nt">footer</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"actions"</span><span class="p">></span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"</span><span class="cp">{{</span> <span class="nv">article.url</span> <span class="cp">}}</span><span class="s">"</span> <span class="na">class</span><span class="o">=</span><span class="s">"button large"</span><span class="p">></span>Continue Reading<span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"><</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">"stats"</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">for</span> <span class="nv">tag</span> <span class="k">in</span> <span class="nv">article.tags</span> <span class="cp">%}</span>
<span class="p"><</span><span class="nt">li</span><span class="p">><</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">"#"</span><span class="p">></span><span class="cp">{{</span> <span class="nv">tag.name</span> <span class="cp">}}</span><span class="p"></</span><span class="nt">a</span><span class="p">></</span><span class="nt">li</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endfor</span> <span class="cp">%}</span>
<span class="p"></</span><span class="nt">ul</span><span class="p">></span>
<span class="p"></</span><span class="nt">footer</span><span class="p">></span>
<span class="p"></</span><span class="nt">article</span><span class="p">></span>
<span class="cp">{%</span> <span class="k">endblock</span> <span class="cp">%}</span>
</pre></div> </div> </div>
<div class="advertisement">
<a href="https://www.thedigitalcat.academy/freebie-first-class-objects">
<img src="/images/first-class-objects/cover.jpg" />
</a>
<div class="body">
<h2 id="first-class-objects-in-python-fffa">First-class objects in Python<a class="headerlink" href="#first-class-objects-in-python-fffa" title="Permanent link">¶</a></h2>
<p>Higher-order functions, wrappers, and factories</p>
<p>Learn all you need to know to understand first-class citizenship in Python, the gateway to grasp how decorators work and how functional programming can supercharge your code.</p>
<div class="actions">
<a class="action" href="https://www.thedigitalcat.academy/freebie-first-class-objects">Get your FREE copy</a>
</div>
</div>
</div>
<h2 id="next-steps-5961">Next steps<a class="headerlink" href="#next-steps-5961" title="Permanent link">¶</a></h2><p>As I promised, I showed you how to use Jinja tags and Pelican variables, and we went from a purely static template to a rich dynamic one. There are several things that you might want to investigate and implement now:</p><ul><li>Both <code>index.html</code> and <code>article.html</code> contain a great amount of common code (the part that displays the article). Try to work on that to isolate the shared code and remove the duplication. Remember that they are not exactly the same, so you might want to investigate the tag <code>with</code> provided by Jinja.</li><li>Tags and categories. Pelican creates pages with all the articles that belong to a specific category or tag, and they work exactly like the index page. The documentation of the variables available in those pages can be found at <a href="https://docs.getpelican.com/en/latest/themes.html#category-html">https://docs.getpelican.com/en/latest/themes.html#category-html</a> and <a href="https://docs.getpelican.com/en/latest/themes.html#tag-html">https://docs.getpelican.com/en/latest/themes.html#tag-html</a>. You should try to create a page that lists all the articles with a certain tag and connect it to the tag button at the end of each article.</li><li>Pelican can create pages of content (as opposed to articles), and you might want to show those in the navigation bar. You can find the documentation of the <code>Page</code> objects at <a href="https://docs.getpelican.com/en/latest/themes.html#page">https://docs.getpelican.com/en/latest/themes.html#page</a>. Create an "About me" page and link it there.</li><li>If you want to publish the website you probably want to minify assets to save bandwidth and reduce the loading time. 
Have a look at <code>webassets</code> (<a href="https://github.com/pelican-plugins/webassets">https://github.com/pelican-plugins/webassets</a>), a plugin that makes asset management as easy as pie.</li><li>Speaking of plugins, Pelican has a huge number of them. The developers are in the process of transitioning them to a new, better format, so you will find the main plugins at <a href="https://github.com/pelican-plugins">https://github.com/pelican-plugins</a> and the rest at <a href="https://github.com/getpelican/pelican-plugins">https://github.com/getpelican/pelican-plugins</a>. I recommend having a look at <code>sitemap</code>, which is very easy to set up and provides an important tool for search engines to discover your site.</li><li>When it comes to deploying the static website, you should definitely read <a href="https://docs.getpelican.com/en/latest/publish.html">Pelican's documentation</a>, as the management scripts support several backends.</li></ul><p>Last, but hopefully not least, the code of this blog is public and can be found at <a href="https://github.com/TheDigitalCatOnline/blog_source">https://github.com/TheDigitalCatOnline/blog_source</a>. While the overall setup contains a bit of legacy code (I started a while ago and many things have changed in the meantime), the theme <code>editorial</code> is pretty new, so feel free to have a look at it and to get inspired (<a href="https://github.com/TheDigitalCatOnline/blog_source/tree/master/editorial">https://github.com/TheDigitalCatOnline/blog_source/tree/master/editorial</a>).</p><p>I hope this was useful. Remember that Pelican is open source, and you can always get in touch with the maintainers to ask questions or to submit enhancements. Also remember to drop a line on Twitter to the author of any template you will use (thanks are always welcome) and to link to the source on the website. 
Happy blogging!</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>AWS Log Insights as CloudWatch metrics with Python and Terraform2021-03-22T17:00:00+01:002021-03-22T17:00:00+01:00Leonardo Giordanitag:www.thedigitalcatonline.com,2021-03-22:/blog/2021/03/22/aws-log-insights-as-cloudwatch-metrics-with-python-and-terraform/<p> A step-by-step report on how to build a Lambda function with Terraform and Python to convert Log Insights queries into CloudWatch metrics</p><p>Recently I started using <a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html">AWS CloudWatch Log Insights</a> and I find the tool really useful to extract data about the systems I'm running without having to set up dedicated monitoring tools, which come with their own set of permissions, rules, configuration language, and so forth.</p><p>Log Insights allows you to query log outputs with a language based on regular expressions with hints of SQL and to produce tables or graphs of quantities that you need to monitor. For example, the system I am monitoring runs Celery in ECS containers that log received tasks with a line like the following</p><div class="code"><div class="content"><div class="highlight"><pre>16:39:11,156 INFO [celery.worker.strategy] Received task: lib.tasks.lists.trigger_list_log_notification[9b33b464-d4f9-4909-8d4e-1a3134fead97]
</pre></div> </div> </div><p>In this case the function that was triggered is <code>lib.tasks.lists.trigger_list_log_notification</code>. I'm interested in knowing which functions are called the most, so I can easily count them with</p><div class="code"><div class="content"><div class="highlight"><pre>parse @message /\[celery\.(?<source>[a-z.]+)\].*Received task: (?<task>[a-z._]+)\[/
| filter not isblank(source)
| stats count(*) as number by task
| sort number desc
| limit 9
</pre></div> </div> </div><p>This gives me a nice table of the 9 most frequent <code>task</code> values, each with the number of times it was received (<code>number</code>), and the time frame can be adjusted with the usual CloudWatch controls</p><div class="code"><div class="content"><div class="highlight"><pre>1 lib.tasks.lists.trigger_list_log_notification 4559
2 lib.tasks.notify.notify_recipient 397
3 lib.message._send_mobile_push_notification 353
4 lib.tasks.jobs.check_job_cutoffs 178
5 lib.tasks.notify.check_message_cutoffs 177
6 lib.tasks.notify.check_notification_retry 177
7 lib.tasks.notify.async_list_response 81
8 lib.tasks.hmrc_poll.govtalk_periodic_poll 59
9 lib.tasks.lists.recalculate_list_entry 56
</pre></div> </div> </div><p>Using time bins, quantities can also be easily plotted. For example, I can process and visualise the number of received tasks with</p><div class="code"><div class="content"><div class="highlight"><pre>parse @message /\[celery\.(?<source>[a-z.]+)\].*Received task: (?<task>[a-z._]+)\[/
| filter not isblank(source)
| stats count(*) by bin(30s)
</pre></div> </div> </div><p>Unfortunately, I quickly discovered an important limitation of Log Insights: <strong>queries are not metrics</strong>, which immediately implies that I can't set up alarms on them. As fun as it is to look at nice plots, I need something automatic that sends me messages or scales up systems in reaction to specific events such as "too many submitted tasks".</p><p>The standard solution to this problem suggested by AWS is to write a Lambda that runs the query and stores the value in a custom CloudWatch metric, which I can then use to satisfy my automation needs. I did it, and in this post I will show you exactly how, using Terraform, Python and Zappa, CloudWatch, and DynamoDB. At the end of the post I will also briefly discuss the cost of the solution.</p><h2 id="the-big-picture-f6bc">The big picture<a class="headerlink" href="#the-big-picture-f6bc" title="Permanent link">¶</a></h2><p>Before I get into the details of the specific tools or solutions that I decided to implement, let's have a look at the bigger picture. The initial idea is very simple: a Lambda function can run a specific Log Insights query and store the results in a custom metric, which can in turn be used to trigger alarms and other actions.</p><p>For a single system I already have 4 or 5 of these queries that I'd like to run, and I have multiple systems, so I'd prefer a solution that doesn't require me to deploy and maintain a different Lambda for each query. The maintenance can clearly be automated as well, but such a solution smells of duplicated code from miles away, and if there is no specific reason to go down that road I prefer to avoid it.</p><p>Since Log Insights queries are just strings of code, however, we can store them somewhere and then simply loop over all of them within the same Lambda function. 
To implement this, I created a DynamoDB table in which every item contains all the data I need to run a query, such as the log group that I want to investigate and the name of the target metric.</p><h2 id="terraform-a3cb">Terraform<a class="headerlink" href="#terraform-a3cb" title="Permanent link">¶</a></h2><p>In the following sections I will discuss the main components of the solution from the infrastructural point of view, showing how I created them with Terraform. The four main AWS services that I will use are <a href="https://aws.amazon.com/dynamodb/">DynamoDB</a>, <a href="https://aws.amazon.com/lambda/">Lambda</a>, <a href="https://aws.amazon.com/iam/">IAM</a>, and <a href="https://aws.amazon.com/cloudwatch/">CloudWatch</a>.</p><p>I put the bulk of the code in a module so that I can easily create the same structure for multiple AWS accounts. While my current setup is a bit more complicated than that, the structure of the code can be simplified as</p><div class="code"><div class="content"><div class="highlight"><pre>+ common
  + lambda-loginsights2metrics
    + cloudwatch.tf
    + dynamodb.tf
    + iam.tf
    + lambda.tf
    + variables.tf
+ account1
  + lambda-loginsights2metrics
    + main.tf
    + variables.tf
</pre></div> </div> </div><h3 id="variables-7edf">Variables</h3><p>Since I will refer to them in the following sections, let me show you the four variables I defined for this module.</p><p>First I need to receive the items that I need to store in the DynamoDB table</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/variables.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">variable</span><span class="w"> </span><span class="nv">"items"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"list"</span>
<span class="w"> </span><span class="na">default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[]</span>
<span class="p">}</span>
</pre></div> </div> </div><p>I prefer to have a prefix in front of my components that allows me to duplicate them without clashes</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/variables.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">variable</span><span class="w"> </span><span class="nv">"prefix"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"string"</span>
<span class="w"> </span><span class="na">default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"loginsights2metrics"</span>
<span class="p">}</span>
</pre></div> </div> </div><p>The Lambda function will require a list of security groups that grant access to specific network components</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/variables.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">variable</span><span class="w"> </span><span class="nv">"security_groups"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"list"</span>
<span class="w"> </span><span class="na">default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[]</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Finally, Lambda functions need to be told which VPC subnets they can use to run</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/variables.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">variable</span><span class="w"> </span><span class="nv">"vpc_subnets"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"list"</span>
<span class="w"> </span><span class="na">default</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[]</span>
<span class="p">}</span>
</pre></div> </div> </div><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://www.terraform.io/docs/configuration-0-11/variables.html">Terraform variables</a>.</li><li>An <a href="https://spacelift.io/blog/how-to-use-terraform-variables">in-depth post</a> that explains how to use variables in Terraform, by Sumeet Ninawe</li></ul><h3 id="dynamodb-55e8">DynamoDB</h3><p>Let's start with the corner stone, which is the DynamoDB table that contains data for the queries. As DynamoDB is not a SQL database we don't need to define columns in advance. This clearly might get us into trouble later, so we need to be careful and be consistent when we write items, adding everything is needed by the Lambda code.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/dynamodb.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_dynamodb_table"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.prefix}-items"</span>
<span class="w"> </span><span class="na">billing_mode</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"PAY_PER_REQUEST"</span>
<span class="w"> </span><span class="na">hash_key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"SlotName"</span>
<span class="w"> </span><span class="nb">attribute</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"SlotName"</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"S"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Speaking of items, I assume I will pass them when I call the module, so here I just need to loop on the input variable <code>items</code></p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/dynamodb.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_dynamodb_table_item"</span><span class="w"> </span><span class="nv">"item"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">count</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">length</span><span class="p">(</span><span class="nv">var.items</span><span class="p">)</span>
<span class="w"> </span><span class="na">table_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_dynamodb_table.loginsights2metrics.name</span>
<span class="w"> </span><span class="na">hash_key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_dynamodb_table.loginsights2metrics.hash_key</span>
<span class="w"> </span><span class="na">item</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">jsonencode</span><span class="p">(</span><span class="nf">element</span><span class="p">(</span><span class="nv">var.items</span><span class="p">,</span><span class="w"> </span><span class="nv">count.index</span><span class="p">))</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Since the query is written as a Terraform string, there are two small caveats here, both due to Terraform's string syntax, where the backslash is the escape character: double quotes in the query must be escaped as <code>\"</code>, and backslashes must be escaped as <code>\\</code>. So, for example, a valid query like</p><div class="code"><div class="content"><div class="highlight"><pre>parse @message /\[celery\.(?<source>[a-z.]+)\].*Received task: (?<task>[a-z._]+)\[/
| filter not isblank(source)
| stats count(*) as Value by bin(1m)
</pre></div> </div> </div><p>will be stored as</p><div class="code"><div class="content"><div class="highlight"><pre>"parse @message /\\[celery\\.(?<source>[a-z.]+)\\].*Received task: (?<task>[a-z._]+)\\[/ | filter not isblank(source) | stats count(*) as Value by bin(1m)"
</pre></div> </div> </div><p>Another remark is that the Lambda I will write in Python will read the values named <code>Value</code>, aggregated in bins of 1 minute, so the query should end with <code>stats X as Value by bin(1m)</code>, where <code>X</code> is a specific statistic, for example <code>stats count(*) as Value by bin(1m)</code>.</p><p>The reason for choosing 1 minute is that standard-resolution CloudWatch metrics have a granularity of 1 minute. Should you need a finer granularity, have a look at <a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html#high-resolution-metrics">CloudWatch High-Resolution Metrics</a>.</p><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://aws.amazon.com/dynamodb/">Amazon DynamoDB</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dynamodb_table">aws_dynamodb_table documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dynamodb_table_item">aws_dynamodb_table_item documentation</a></li></ul><h3 id="iam-part-1-cde2">IAM part 1</h3><p>IAM roles are central in AWS. In this specific case we have the so-called <a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-intro-execution-role.html">Lambda execution role</a>, which is the IAM role that the Lambda assumes when you run it. In AWS, users or services (that is, humans or AWS components) <em>assume</em> a role, receiving the permissions connected with it. 
To assume a role, however, they must be allowed to do so by the role itself, through a so-called <em>trust policy</em> that lists the principals permitted to assume it.</p><p>Let's define a trust policy that allows the Lambda service to assume the role that we will define</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">data</span><span class="w"> </span><span class="nc">"aws_iam_policy_document"</span><span class="w"> </span><span class="nv">"trust"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">statement</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">actions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s2">"sts:AssumeRole"</span><span class="p">]</span>
<span class="w"> </span><span class="nb">principals</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Service"</span>
<span class="w"> </span><span class="na">identifiers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="s2">"lambda.amazonaws.com"</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>and after that the role in question</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_iam_role"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.prefix</span>
<span class="w"> </span><span class="na">assume_role_policy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">data.aws_iam_policy_document.trust.json</span>
<span class="p">}</span>
</pre></div> </div> </div><p>To run inside a VPC, Lambdas need an initial set of permissions, which can be found in the canned (AWS managed) policy <code>AWSLambdaVPCAccessExecutionRole</code>. You can see the content of the policy in the IAM console or by dumping it with <code>aws iam get-policy</code> and <code>aws iam get-policy-version</code></p><div class="code"><div class="content"><div class="highlight"><pre>$ aws iam get-policy --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole
{
"Policy": {
"PolicyName": "AWSLambdaVPCAccessExecutionRole",
"PolicyId": "ANPAJVTME3YLVNL72YR2K",
"Arn": "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole",
"Path": "/service-role/",
"DefaultVersionId": "v2",
"AttachmentCount": 0,
"PermissionsBoundaryUsageCount": 0,
"IsAttachable": true,
"Description": "Provides minimum permissions for a Lambda function to execute while accessing a resource within a VPC - create, describe, delete network interfaces and write permissions to CloudWatch Logs. ",
"CreateDate": "2016-02-11T23:15:26Z",
"UpdateDate": "2020-10-15T22:53:03Z"
}
}
$ aws iam get-policy-version --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole --version-id v2
{
"PolicyVersion": {
"Document": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents",
"ec2:CreateNetworkInterface",
"ec2:DescribeNetworkInterfaces",
"ec2:DeleteNetworkInterface",
"ec2:AssignPrivateIpAddresses",
"ec2:UnassignPrivateIpAddresses"
],
"Resource": "*"
}
]
},
"VersionId": "v2",
"IsDefaultVersion": true,
"CreateDate": "2020-10-15T22:53:03Z"
}
}
</pre></div> </div> </div><p>Attaching a managed policy to the role is just a matter of creating an <code>aws_iam_role_policy_attachment</code> resource:</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_iam_role_policy_attachment"</span><span class="w"> </span><span class="nv">"loginsights2metrics-"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">role</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_iam_role.loginsights2metrics.name</span>
<span class="w"> </span><span class="na">policy_arn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Now that we have the IAM role and the basic policy, we can assign custom permissions to it. The Lambda needs permissions on other AWS services: CloudWatch Logs, to run Logs Insights queries; CloudWatch, to store metrics; and DynamoDB, to retrieve all the items from the queries table.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">data</span><span class="w"> </span><span class="nc">"aws_iam_policy_document"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">statement</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">actions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="s2">"cloudwatch:PutMetricData"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"cloudwatch:PutMetricAlarm"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"logs:StartQuery"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"logs:GetQueryResults"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"logs:GetLogEvents"</span><span class="p">,</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="na">resources</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s2">"*"</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">statement</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">actions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="s2">"dynamodb:Scan"</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="na">resources</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="nv">aws_dynamodb_table.loginsights2metrics.arn</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Through <code>aws_iam_role_policy</code> we can render the policy out of the <code>data</code> source and attach it to the role as an inline policy:</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_iam_role_policy"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.prefix</span>
<span class="w"> </span><span class="na">role</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_iam_role.loginsights2metrics.name</span>
<span class="w"> </span><span class="na">policy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">data.aws_iam_policy_document.loginsights2metrics.json</span>
<span class="p">}</span>
</pre></div> </div> </div><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document">aws_iam_policy_document documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role">aws_iam_role documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment">aws_iam_role_policy_attachment documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy">aws_iam_role_policy documentation</a></li><li><a href="https://docs.aws.amazon.com/cli/latest/reference/iam/get-policy.html">AWS CLI iam get-policy documentation</a></li><li><a href="https://docs.aws.amazon.com/cli/latest/reference/iam/get-policy-version.html">AWS CLI iam get-policy-version documentation</a></li></ul><h3 id="lambda-0ea2">Lambda</h3><p>We can now create the Lambda function container. I do not use Terraform as a deployer, as I think it should be used to define static infrastructure only, so I will use a dummy function here and later deploy the real code using the AWS CLI.</p><p>The dummy function can be easily created with</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/lambda.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">data</span><span class="w"> </span><span class="nc">"archive_file"</span><span class="w"> </span><span class="nv">"dummy"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"zip"</span>
<span class="w"> </span><span class="na">output_path</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${path.module}/lambda.zip"</span>
<span class="w"> </span><span class="nb">source</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">content</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"dummy"</span>
<span class="w"> </span><span class="na">filename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"dummy.txt"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>The Lambda function is a bit more complicated. As I mentioned, I'll use Zappa to package the function, so the <code>handler</code> has to be <code>"zappa.handler.lambda_handler"</code>. The IAM role given to the function is the one we defined previously, while <code>memory_size</code> and <code>timeout</code> clearly depend on the specific function. Lambdas should run in private networks, whose subnets and security groups are passed through <code>vpc_config</code>, and I won't cover the steps to create them here. The AWS docs contain a lot of details on this topic, e.g. <a href="https://aws.amazon.com/premiumsupport/knowledge-center/internet-access-lambda-function/">https://aws.amazon.com/premiumsupport/knowledge-center/internet-access-lambda-function/</a>.</p><p>The environment variables allow me to inject the name of the DynamoDB table so that I don't need to hardcode it. I also pass another variable, the <a href="https://sentry.io/welcome/">Sentry DSN</a> that I use for error reporting. This is not essential for the problem at hand, but I left it there to show how to pass such values.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/lambda.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_lambda_function"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">function_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"loginsights2metrics"</span>
<span class="w"> </span><span class="na">handler</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"zappa.handler.lambda_handler"</span>
<span class="w"> </span><span class="na">runtime</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"python3.8"</span>
<span class="w"> </span><span class="na">filename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">data.archive_file.dummy.output_path</span>
<span class="w"> </span><span class="na">role</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_iam_role.loginsights2metrics.arn</span>
<span class="w"> </span><span class="na">memory_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">128</span>
<span class="w"> </span><span class="na">timeout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">300</span>
<span class="w"> </span><span class="nb">vpc_config</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">subnet_ids</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.vpc_subnets</span>
<span class="w"> </span><span class="na">security_group_ids</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.security_groups</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">environment</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">variables</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"SENTRY_DSN"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"https://XXXXXX:@sentry.io/YYYYYY"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"DYNAMODB_TABLE"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_dynamodb_table.loginsights2metrics.name</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">lifecycle</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">ignore_changes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="nb">last_modified, filename</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Please note that I instructed Terraform to ignore changes to the two attributes <code>last_modified</code> and <code>filename</code>, and that I haven't set any <code>source_code_hash</code>. This way Terraform doesn't try to put the dummy package back, and I can safely apply it to change parameters like <code>memory_size</code> or <code>timeout</code> without overwriting the code deployed by the CI.</p><p>Since I want to trigger the function from AWS CloudWatch Events, I need to grant the service <code>events.amazonaws.com</code> the <code>lambda:InvokeFunction</code> permission, restricted to the specific rule through <code>source_arn</code>.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/lambda.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_lambda_permission"</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">statement_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"AllowExecutionFromCloudWatch"</span>
<span class="w"> </span><span class="na">action</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lambda:InvokeFunction"</span>
<span class="w"> </span><span class="na">function_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_lambda_function.loginsights2metrics.function_name</span>
<span class="w"> </span><span class="na">principal</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"events.amazonaws.com"</span>
<span class="w"> </span><span class="na">source_arn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_cloudwatch_event_rule.rate.arn</span>
<span class="p">}</span>
</pre></div> </div> </div><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://registry.terraform.io/providers/hashicorp/archive/latest/docs/data-sources/archive_file">archive_file documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_function">aws_lambda_function documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_permission">aws_lambda_permission documentation</a></li></ul><h3 id="iam-part-2-7f1e">IAM part 2</h3><p>Since 2018 Lambdas have a maximum execution time of 15 minutes (900 seconds), which is more than enough for many services, but to be conservative I preferred to leverage Zappa's asynchronous calls and to make the main Lambda call itself for each query. The Lambda doesn't clearly call the same Python function (it's not recursive), but from AWS's point of view we have a Lambda that calls itself, so we need to give it a specific permission to do this.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">data</span><span class="w"> </span><span class="nc">"aws_iam_policy_document"</span><span class="w"> </span><span class="nv">"loginsights2metrics_exec"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">statement</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">actions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="s2">"lambda:InvokeAsync"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"lambda:InvokeFunction"</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="na">resources</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="nv">aws_lambda_function.loginsights2metrics.arn</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>I could not define this together with the rest of the IAM components because it references the Lambda function, which didn't exist yet at that point, but the resource lives in the same file. Terraform doesn't care about the order in which resources are defined or about the file that contains them, as long as there are no cycles in the dependencies.</p><p>We can now assign the newly created policy document to the IAM role we created previously:</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/iam.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_iam_role_policy"</span><span class="w"> </span><span class="nv">"loginsights2metrics_exec"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.prefix}-exec"</span>
<span class="w"> </span><span class="na">role</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_iam_role.loginsights2metrics.name</span>
<span class="w"> </span><span class="na">policy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">data.aws_iam_policy_document.loginsights2metrics_exec.json</span>
<span class="p">}</span>
</pre></div> </div> </div><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document">aws_iam_policy_document documentation</a> documentation")</li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy">aws_iam_role_policy documentation</a></li></ul><h3 id="cloudwatch-518e">CloudWatch</h3><p>Whenever you need to run Lambdas (or other things) periodically, the standard AWS solution is to use CloudWatch Events, which work as the AWS cron system. CloudWatch Events are made of rules and targets, so first of all I defined a rule that gets triggered every 2 minutes</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/cloudwatch.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_cloudwatch_event_rule"</span><span class="w"> </span><span class="nv">"rate"</span><span class="w"> </span><span class="p">{</span>
<span class="c1"> # Zappa requires the name to match the processing function</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"main.loginsights2metrics"</span>
<span class="w"> </span><span class="na">description</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Trigger Lambda ${var.prefix}"</span>
<span class="w"> </span><span class="na">schedule_expression</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"rate(2 minutes)"</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Please note that Zappa requires the rule name to match the processing function (here <code>main.loginsights2metrics</code>), so I left a comment to clarify this to my future self. The second part of the configuration is the target, which is the Lambda function that we defined in the previous section.</p><div class="code"><div class="title"><code>common/lambda-loginsights2metrics/cloudwatch.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">resource</span><span class="w"> </span><span class="nc">"aws_cloudwatch_event_target"</span><span class="w"> </span><span class="nv">"lambda"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">rule</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_cloudwatch_event_rule.rate.name</span>
<span class="w"> </span><span class="na">target_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"${var.prefix}-target"</span>
<span class="w"> </span><span class="na">arn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">aws_lambda_function.loginsights2metrics.arn</span>
<span class="p">}</span>
</pre></div> </div> </div><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_rule">aws_cloudwatch_event_rule documentation</a></li><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_target">aws_cloudwatch_event_target documentation</a></li></ul><h3 id="using-the-module-5d88">Using the module</h3><p>Now the module is finished, so I just need to create some items for the DynamoDB table and to call the module itself</p><div class="code"><div class="title"><code>account1/lambda-loginsights2metrics/main.tf</code></div><div class="content"><div class="highlight"><pre><span class="nb">locals</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">items</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"SlotName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Celery Logs submitted tasks"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"LogGroup"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster/celery"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"ClusterName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"Query"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"parse @message /\\[celery\\.(?<source>[a-z.]+)\\].*Received task: (?<task>[a-z._]+)\\[/ | filter not isblank(source) | stats count(*) as Value by bin(1m)"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"Namespace"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Custom"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"MetricName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Submitted tasks"</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"SlotName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Celery Logs succeeded tasks"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"LogGroup"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster/celery"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"ClusterName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"Query"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"parse @message /\\[celery.(?<source>[a-z\\._]+)].*Task (?<task>[a-z\\._]+)\\[.*\\] (?<event>[a-z]+)/ | filter source = \"app.trace\" | filter event = \"succeeded\" | stats count(*) as Value by bin(1m)"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"Namespace"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Custom"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"MetricName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Succeeded tasks"</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"SlotName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Celery Logs retried tasks"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"LogGroup"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster/celery"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"ClusterName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"mycluster"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"Query"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"parse @message /\\[celery.(?<source>[a-z\\._]+)].*Task (?<task>[a-z\\._]+)\\[.*\\] (?<event>[a-z]+)/ | filter source = \"app.trace\" | filter event = \"retry\" | stats count(*) as Value by bin(1m)"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"Namespace"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Custom"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s2">"MetricName"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s2">"S"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"Retried tasks"</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="p">}</span>
</pre></div> </div> </div><p>I need to provide a security group for the Lambda, and in this case I can safely use the default one provided by the VPC</p><div class="code"><div class="title"><code>account1/lambda-loginsights2metrics/main.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">data</span><span class="w"> </span><span class="nc">"aws_security_group"</span><span class="w"> </span><span class="nv">"default"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"default"</span>
<span class="w"> </span><span class="na">vpc_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.vpc_id</span>
<span class="p">}</span>
</pre></div> </div> </div><p>And I can finally call the module</p><div class="code"><div class="title"><code>account1/lambda-loginsights2metrics/main.tf</code></div><div class="content"><div class="highlight"><pre><span class="kr">module</span><span class="w"> </span><span class="nv">"loginsights2metrics"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="na">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"../../common/lambda-loginsights2metrics"</span>
<span class="w"> </span><span class="na">items</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">local.items</span>
<span class="w"> </span><span class="na">security_groups</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="nv">data.aws_security_group.default.id</span><span class="p">]</span>
<span class="w"> </span><span class="na">vpc_subnets</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">var.vpc_private_subnets</span>
<span class="p">}</span>
</pre></div> </div> </div><p>Please note that the variable <code>vpc_private_subnets</code> is a list of IDs of private subnets that I created in another module, as <code>subnet_ids</code> expects subnet IDs rather than names.</p><h4 id="resources-8ec3">Resources</h4><ul><li><a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/security_group">aws_security_group documentation</a></li><li><a href="https://www.terraform.io/docs/language/modules/develop/index.html">Creating Terraform modules</a></li></ul><h2 id="python-43d2">Python<a class="headerlink" href="#python-43d2" title="Permanent link">¶</a></h2><p>As I mentioned before, the Python code of the Lambda function is contained in a different repository and deployed with the CI using <a href="https://github.com/zappa/Zappa">Zappa</a>. Since we are interacting with AWS, I use Boto3, the <a href="https://boto3.amazonaws.com/v1/documentation/api/latest/index.html">AWS SDK for Python</a>. The code was first developed locally without Zappa, to try out the Boto3 functions I wanted to use, and then quickly adjusted to run in a Lambda.</p><p>I think the code is pretty straightforward, but I left my original comments to make sure everything is clear.</p><div class="code"><div class="content"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timedelta</span>
<span class="kn">import</span> <span class="nn">boto3</span>
<span class="kn">from</span> <span class="nn">zappa.asynchronous</span> <span class="kn">import</span> <span class="n">task</span>
<span class="c1"># CONFIG</span>
<span class="n">logs</span> <span class="o">=</span> <span class="n">boto3</span><span class="o">.</span><span class="n">client</span><span class="p">(</span><span class="s2">"logs"</span><span class="p">,</span> <span class="n">region_name</span><span class="o">=</span><span class="s2">"eu-west-1"</span><span class="p">)</span>
<span class="n">cw</span> <span class="o">=</span> <span class="n">boto3</span><span class="o">.</span><span class="n">client</span><span class="p">(</span><span class="s2">"cloudwatch"</span><span class="p">,</span> <span class="n">region_name</span><span class="o">=</span><span class="s2">"eu-west-1"</span><span class="p">)</span>
<span class="n">dynamodb</span> <span class="o">=</span> <span class="n">boto3</span><span class="o">.</span><span class="n">resource</span><span class="p">(</span><span class="s2">"dynamodb"</span><span class="p">,</span> <span class="n">region_name</span><span class="o">=</span><span class="s2">"eu-west-1"</span><span class="p">)</span>
<span class="nd">@task</span>
<span class="k">def</span> <span class="nf">put_metric_data</span><span class="p">(</span><span class="n">item</span><span class="p">):</span> <span class="callout">3</span>
<span class="n">slot_name</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"SlotName"</span><span class="p">]</span>
<span class="n">log_group</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"LogGroup"</span><span class="p">]</span>
<span class="n">cluster_name</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"ClusterName"</span><span class="p">]</span>
<span class="n">query</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"Query"</span><span class="p">]</span>
<span class="n">namespace</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"Namespace"</span><span class="p">]</span>
<span class="n">metric_name</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s2">"MetricName"</span><span class="p">]</span>
<span class="c1"># This runs the Log Insights query fetching data</span>
<span class="c1"># for the last 15 minutes.</span>
<span class="c1"># As we deal with logs processing it's entirely possible</span>
<span class="c1"># for the metric to be updated, for example because</span>
<span class="c1"># a log was received a bit later.</span>
<span class="c1"># When we put multiple values for the same timestamp</span>
<span class="c1"># in the metric CW can show max, min, avg, and percentiles.</span>
<span class="c1"># Since this is an update of a count we should then always</span>
<span class="c1"># use "max".</span>
<span class="n">start_query_response</span> <span class="o">=</span> <span class="n">logs</span><span class="o">.</span><span class="n">start_query</span><span class="p">(</span> <span class="callout">4</span>
<span class="n">logGroupName</span><span class="o">=</span><span class="n">log_group</span><span class="p">,</span>
<span class="n">startTime</span><span class="o">=</span><span class="nb">int</span><span class="p">((</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span> <span class="o">-</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">minutes</span><span class="o">=</span><span class="mi">15</span><span class="p">))</span><span class="o">.</span><span class="n">timestamp</span><span class="p">()),</span>
<span class="n">endTime</span><span class="o">=</span><span class="nb">int</span><span class="p">(</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span><span class="o">.</span><span class="n">timestamp</span><span class="p">()),</span>
<span class="n">queryString</span><span class="o">=</span><span class="n">query</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">query_id</span> <span class="o">=</span> <span class="n">start_query_response</span><span class="p">[</span><span class="s2">"queryId"</span><span class="p">]</span>
<span class="c1"># Just polling the API. 5 seconds seems to be a good</span>
<span class="c1"># compromise between not pestering the API and not paying</span>
<span class="c1"># too much for the Lambda.</span>
<span class="n">response</span> <span class="o">=</span> <span class="kc">None</span>
<span class="k">while</span> <span class="n">response</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">response</span><span class="p">[</span><span class="s2">"status"</span><span class="p">]</span> <span class="o">==</span> <span class="s2">"Running"</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">slot_name</span><span class="si">}</span><span class="s2">: waiting for query to complete ..."</span><span class="p">)</span>
<span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">logs</span><span class="o">.</span><span class="n">get_query_results</span><span class="p">(</span><span class="n">queryId</span><span class="o">=</span><span class="n">query_id</span><span class="p">)</span>
<span class="c1"># Data comes in a strange format, a dictionary of</span>
<span class="c1"># {"field":name,"value":actual_value}, so this converts</span>
<span class="c1"># it into something that can be accessed through keys</span>
<span class="n">data</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">response</span><span class="p">[</span><span class="s2">"results"</span><span class="p">]:</span> <span class="callout">5</span>
<span class="n">sample</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">d</span><span class="p">:</span>
<span class="n">field</span> <span class="o">=</span> <span class="n">i</span><span class="p">[</span><span class="s2">"field"</span><span class="p">]</span>
<span class="n">value</span> <span class="o">=</span> <span class="n">i</span><span class="p">[</span><span class="s2">"value"</span><span class="p">]</span>
<span class="n">sample</span><span class="p">[</span><span class="n">field</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
<span class="n">data</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>
<span class="c1"># Now that we have the data, let's put them into a metric.</span>
<span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
<span class="n">timestamp</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">strptime</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s2">"bin(1m)"</span><span class="p">],</span> <span class="s2">"%Y-%m-</span><span class="si">%d</span><span class="s2"> %H:%M:%S.000"</span><span class="p">)</span>
<span class="n">value</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s2">"Value"</span><span class="p">])</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">slot_name</span><span class="si">}</span><span class="s2">: putting </span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="s2"> on </span><span class="si">{</span><span class="n">timestamp</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="n">cw</span><span class="o">.</span><span class="n">put_metric_data</span><span class="p">(</span> <span class="callout">6</span>
<span class="n">Namespace</span><span class="o">=</span><span class="n">namespace</span><span class="p">,</span>
<span class="n">MetricData</span><span class="o">=</span><span class="p">[</span>
<span class="p">{</span>
<span class="s2">"MetricName"</span><span class="p">:</span> <span class="n">metric_name</span><span class="p">,</span>
<span class="s2">"Dimensions"</span><span class="p">:</span> <span class="p">[{</span><span class="s2">"Name"</span><span class="p">:</span> <span class="s2">"Cluster"</span><span class="p">,</span> <span class="s2">"Value"</span><span class="p">:</span> <span class="n">cluster_name</span><span class="p">}],</span>
<span class="s2">"Timestamp"</span><span class="p">:</span> <span class="n">timestamp</span><span class="p">,</span>
<span class="s2">"Value"</span><span class="p">:</span> <span class="n">value</span><span class="p">,</span>
<span class="s2">"Unit"</span><span class="p">:</span> <span class="s2">"None"</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">],</span>
<span class="p">)</span>
<span class="k">def</span> <span class="nf">loginsights2metrics</span><span class="p">(</span><span class="n">event</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span> <span class="callout">1</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"package_info.json"</span><span class="p">,</span> <span class="s2">"r"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">package_info</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="n">build_timestamp</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">package_info</span><span class="p">[</span><span class="s2">"build_time"</span><span class="p">])</span>
<span class="n">build_datetime</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">fromtimestamp</span><span class="p">(</span><span class="n">build_timestamp</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"###################################"</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span>
<span class="s2">"LogInsights2Metrics - Build date: "</span>
<span class="sa">f</span><span class="s1">'</span><span class="si">{</span><span class="n">build_datetime</span><span class="o">.</span><span class="n">strftime</span><span class="p">(</span><span class="s2">"%Y/%m/</span><span class="si">%d</span><span class="s2"> %H:%M:%S"</span><span class="p">)</span><span class="si">}</span><span class="s1">'</span>
<span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"###################################"</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Reading task from DynamoDB table </span><span class="si">{</span><span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">"DYNAMODB_TABLE"</span><span class="p">]</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
<span class="n">table</span> <span class="o">=</span> <span class="n">dynamodb</span><span class="o">.</span><span class="n">Table</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">"DYNAMODB_TABLE"</span><span class="p">])</span>
<span class="c1"># This is the simplest way to get all entries in the table</span>
<span class="c1"># The next loop will asynchronously call `put_metric_data`</span>
<span class="c1"># on each entry.</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">table</span><span class="o">.</span><span class="n">scan</span><span class="p">(</span><span class="n">Select</span><span class="o">=</span><span class="s2">"ALL_ATTRIBUTES"</span><span class="p">)</span> <span class="callout">2</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">response</span><span class="p">[</span><span class="s2">"Items"</span><span class="p">]:</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"* Processing item </span><span class="si">{</span><span class="n">i</span><span class="p">[</span><span class="s1">'SlotName'</span><span class="p">]</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="n">put_metric_data</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
</pre></div> </div> </div><p>So, when the Lambda is executed, the entry point is the function <code>loginsights2metrics</code> <span class="callout">1</span>, which scans the DynamoDB table <span class="callout">2</span> and loops over all the items it contains. For each item, the loop calls the function <code>put_metric_data</code> <span class="callout">3</span>, which, being a Zappa <code>task</code>, runs in a new Lambda invocation. This function runs the Log Insights query <span class="callout">4</span>, reshapes Boto3's output <span class="callout">5</span>, and finally puts the values in the custom metric <span class="callout">6</span>.</p><p>The problem I mention in the comment just before the call to <code>logs.start_query</code> is interesting. Log Insights queries extract data from logs, and since logs can arrive late, the result can change between two runs of the same query. Since consecutive runs overlap (every 2 minutes we query the last 15 minutes), the function will put multiple values in the same bin of the metric. This is perfectly normal, and it's the reason why CloudWatch allows you to show the maximum, minimum, average, or various percentiles of the same metric. When counting events, the count for a given bin can only increase or stay constant over time, never decrease, so it's sensible to look at the maximum. This is not true if you are looking at execution times, for example, so pay attention to the nature of the underlying query when you graph the metric.</p><p>The Zappa settings I use for the function are:</p><div class="code"><div class="title"><code>zappa_settings.json</code></div><div class="content"><div class="highlight"><pre><span class="p">{</span>
<span class="w"> </span><span class="nt">"main"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"app_module"</span><span class="p">:</span><span class="w"> </span><span class="s2">"main"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"app_function"</span><span class="p">:</span><span class="w"> </span><span class="s2">"main.loginsights2metrics"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"runtime"</span><span class="p">:</span><span class="w"> </span><span class="s2">"python3.8"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"log_level"</span><span class="p">:</span><span class="w"> </span><span class="s2">"WARNING"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"xray_tracing"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"exception_handler"</span><span class="p">:</span><span class="w"> </span><span class="s2">"zappa_sentry.unhandled_exceptions"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div> </div> </div><p>And the requirements are</p><div class="code"><div class="title"><code>requirements.txt</code></div><div class="content"><div class="highlight"><pre>zappa
zappa-sentry
</pre></div> </div> </div><p>Please note that, as I mentioned before, <code>zappa-sentry</code> is not a strict requirement for this solution.</p><p>The code can be packaged and deployed with a simple Bash script like this one:</p><div class="code"><div class="content"><div class="highlight"><pre><span class="ch">#!/bin/bash</span>
<span class="nv">VENV_DIRECTORY</span><span class="o">=</span>venv
<span class="nv">LAMBDA_PACKAGE</span><span class="o">=</span>lambda.zip
<span class="nv">REGION</span><span class="o">=</span>eu-west-1
<span class="nv">FUNCTION_NAME</span><span class="o">=</span>loginsights2metrics
<span class="k">if</span><span class="w"> </span><span class="o">[[</span><span class="w"> </span>-d<span class="w"> </span><span class="si">${</span><span class="nv">VENV_DIRECTORY</span><span class="si">}</span><span class="w"> </span><span class="o">]]</span><span class="p">;</span><span class="w"> </span><span class="k">then</span><span class="w"> </span>rm<span class="w"> </span>-fR<span class="w"> </span><span class="si">${</span><span class="nv">VENV_DIRECTORY</span><span class="si">}</span><span class="p">;</span><span class="w"> </span><span class="k">fi</span>
<span class="k">if</span><span class="w"> </span><span class="o">[[</span><span class="w"> </span>-f<span class="w"> </span><span class="si">${</span><span class="nv">LAMBDA_PACKAGE</span><span class="si">}</span><span class="w"> </span><span class="o">]]</span><span class="p">;</span><span class="w"> </span><span class="k">then</span><span class="w"> </span>rm<span class="w"> </span>-fR<span class="w"> </span><span class="si">${</span><span class="nv">LAMBDA_PACKAGE</span><span class="si">}</span><span class="p">;</span><span class="w"> </span><span class="k">fi</span>
python<span class="w"> </span>-m<span class="w"> </span>venv<span class="w"> </span><span class="si">${</span><span class="nv">VENV_DIRECTORY</span><span class="si">}</span>
<span class="nb">source</span><span class="w"> </span><span class="si">${</span><span class="nv">VENV_DIRECTORY</span><span class="si">}</span>/bin/activate
pip<span class="w"> </span>install<span class="w"> </span>-r<span class="w"> </span>requirements.txt
zappa<span class="w"> </span>package<span class="w"> </span>main<span class="w"> </span>-o<span class="w"> </span><span class="si">${</span><span class="nv">LAMBDA_PACKAGE</span><span class="si">}</span>
rm<span class="w"> </span>-fR<span class="w"> </span><span class="si">${</span><span class="nv">VENV_DIRECTORY</span><span class="si">}</span>
aws<span class="w"> </span>--region<span class="o">=</span><span class="si">${</span><span class="nv">REGION</span><span class="si">}</span><span class="w"> </span>lambda<span class="w"> </span>update-function-code<span class="w"> </span>--function-name<span class="w"> </span><span class="si">${</span><span class="nv">FUNCTION_NAME</span><span class="si">}</span><span class="w"> </span>--zip-file<span class="w"> </span><span class="s2">"fileb://</span><span class="si">${</span><span class="nv">LAMBDA_PACKAGE</span><span class="si">}</span><span class="s2">"</span>
</pre></div> </div> </div>
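<p>Since I developed the code locally first, it can be useful to keep the parts that don't talk to AWS testable without an AWS account. The following is a minimal sketch, not the code that runs in my Lambda. The helper names <code>wait_for_query</code>, <code>reshape_results</code>, and <code>max_per_bin</code> are hypothetical. The sketch assumes the response shape of Boto3's <code>get_query_results</code>: a <code>status</code> string and a <code>results</code> list whose rows are lists of <code>{"field": ..., "value": ...}</code> dictionaries. The polling helper treats <code>"Scheduled"</code> as a non-final status too. The last function only mimics, locally, what graphing the metric with the maximum statistic does with overlapping runs.</p><div class="code"><div class="content"><div class="highlight"><pre>import time
from collections import defaultdict
from datetime import datetime

# Statuses meaning "not finished yet" (assumption based on the documented
# values of the "status" field returned by get_query_results).
PENDING_STATUSES = ("Scheduled", "Running")


def wait_for_query(get_query_results, query_id, interval=5, sleep=time.sleep):
    """Poll a Logs Insights query until it is no longer pending.

    get_query_results is any callable with the same keyword interface as
    boto3's logs.get_query_results, so a fake can be passed in tests.
    """
    response = get_query_results(queryId=query_id)
    while response["status"] in PENDING_STATUSES:
        sleep(interval)
        response = get_query_results(queryId=query_id)
    if response["status"] != "Complete":
        raise RuntimeError(f"Query {query_id} ended as {response['status']}")
    return response["results"]


def reshape_results(results):
    """Turn each row (a list of field/value dictionaries) into a plain dict."""
    return [{cell["field"]: cell["value"] for cell in row} for row in results]


def max_per_bin(runs):
    """Merge overlapping runs, keeping the highest count seen for each bin.

    Each run is a list of reshaped rows with the "bin(1m)" and "Value"
    fields used by the query in put_metric_data.
    """
    best = defaultdict(int)
    for rows in runs:
        for row in rows:
            timestamp = datetime.strptime(row["bin(1m)"], "%Y-%m-%d %H:%M:%S.000")
            best[timestamp] = max(best[timestamp], int(row["Value"]))
    return dict(best)


# Usage with a fake get_query_results: the query is scheduled, then
# running, then complete with a single one-minute bin (made-up data).
responses = iter([
    {"status": "Scheduled"},
    {"status": "Running"},
    {"status": "Complete", "results": [
        [{"field": "bin(1m)", "value": "2021-04-01 10:00:00.000"},
         {"field": "Value", "value": "5"}],
    ]},
])
later_run = reshape_results(
    wait_for_query(lambda queryId: next(responses), "fake-id", sleep=lambda s: None)
)
earlier_run = [{"bin(1m)": "2021-04-01 10:00:00.000", "Value": "3"}]
print(max_per_bin([earlier_run, later_run]))
</pre></div> </div> </div><p>The example maps the 10:00 bin to 5, the later and higher count. To use the real API, pass <code>logs.get_query_results</code> as the first argument. Injecting <code>sleep</code> keeps tests instant. The Lambda itself doesn't need anything like <code>max_per_bin</code>, since CloudWatch applies the chosen statistic when the metric is graphed.</p>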
<div class="advertisement">
<a href="https://www.thedigitalcat.academy/freebie-first-class-objects">
<img src="/images/first-class-objects/cover.jpg" />
</a>
<div class="body">
<h2 id="first-class-objects-in-python-fffa">First-class objects in Python<a class="headerlink" href="#first-class-objects-in-python-fffa" title="Permanent link">¶</a></h2>
<p>Higher-order functions, wrappers, and factories</p>
<p>Learn all you need to know to understand first-class citizenship in Python, the gateway to grasp how decorators work and how functional programming can supercharge your code.</p>
<div class="actions">
<a class="action" href="https://www.thedigitalcat.academy/freebie-first-class-objects">Get your FREE copy</a>
</div>
</div>
</div>
<h2 id="costs-dbe1">Costs<a class="headerlink" href="#costs-dbe1" title="Permanent link">¶</a></h2><p>I will follow here the <a href="https://aws.amazon.com/lambda/pricing/">AWS guide on Lambda pricing</a> and the calculations published in 2018 by my colleague João Neves on <a href="https://silvaneves.org/how-much-does-a-lambda-cost.html">his blog</a>.</p><p>I assume the following:</p><ul><li>The Lambda runs 4 queries, so we have 5 invocations (1 for the main Lambda and 4 asynchronous tasks)</li><li>Each invocation runs for 5 seconds. The current average time of each invocation in my AWS accounts is 4.6 seconds</li><li>I run the Lambda every 2 minutes</li></ul><p>Requests: <code>5 invocations/event * 30 events/hour * 24 hours/day * 31 days/month = 111600 requests</code></p><p>Duration: <code>0.128 GB/request * 111600 requests * 5 seconds = 71424 GB-second</code></p><p>Total: <code>$0.20 * 111600 / 10^6 + $0.0000166667 * 71424 ~= $1.22/month</code></p><p>As you can see, for applications like this it's extremely convenient to use a serverless solution like Lambda functions.</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>First-class objects in Python - Higher-order functions, wrappers, and factories2021-03-09T16:00:00+00:002022-09-18T23:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2021-03-09:/blog/2021/03/09/first-class-objects-in-python/<p>My new book "First-class objects in Python" is out! 
Grab your <strong>FREE</strong> copy <a href="https://www.thedigitalcat.academy/freebie-first-class-objects">here</a>!</p><div class="imageblock"><img src="/images/first-class-objects-in-python.jpg"></div>Mau: a lightweight markup language2021-02-22T10:00:00+00:002021-02-25T18:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2021-02-22:/blog/2021/02/22/mau-a-lightweight-markup-language/<p>Mau is a lightweight markup language heavily inspired by AsciiDoc that makes it very easy to write blog posts or books.</p><h2 id="what-is-mau-fb3c">What is Mau?<a class="headerlink" href="#what-is-mau-fb3c" title="Permanent link">¶</a></h2><p>Mau is a lightweight markup language heavily inspired by AsciiDoc that makes it very easy to write blog posts or books.</p><p>The main goal of Mau is to provide a customisable markup language, reusing the good parts of AsciiDoc and providing a pure Python 3 implementation.</p><p>You can find Mau's source code on <a href="https://github.com/Project-Mau/mau">GitHub</a>.</p><h2 id="why-not-markdown-or-asciidoc-b535">Why not Markdown or AsciiDoc?<a class="headerlink" href="#why-not-markdown-or-asciidoc-b535" title="Permanent link">¶</a></h2><p>Markdown is a very good format, and I have used it for all the posts on this blog so far. I grew increasingly dissatisfied with it, though, because of some missing features and the limited customisation it provides. When I wrote the second version of my book "Clean Architectures in Python" I considered using Markdown (through Pelican), but I couldn't find a good way to create tips and warnings. Recently, Python Markdown added a feature that allows you to specify the file name for the source code, but the resulting HTML cannot easily be changed, making it difficult to achieve the graphical output I wanted.</p><p>AsciiDoc started as a Python project, but was then abandoned and eventually resurrected by Dan Allen with Asciidoctor. 
AsciiDoc has a lot of features and I consider it superior to Markdown, but Asciidoctor is a Ruby program, and this made it difficult for me to use it. In addition, the standard output of Asciidoctor is a nice single HTML page, but again, customising it is a pain. I had to struggle to add my Google Analytics code and a <code>sitemap.xml</code> to the book site.</p><p>So I thought I could try to write my own tool, in a language that I know well (Python). It works, and I learned a lot writing it, so I'm definitely happy. I'd be delighted to know that it can be useful to other people, though.</p><h2 id="pelican-f581">Pelican<a class="headerlink" href="#pelican-f581" title="Permanent link">¶</a></h2><p>A reader for Mau source files is available for Pelican; you can find the code at <a href="https://github.com/getpelican/pelican-plugins/pull/1327">https://github.com/getpelican/pelican-plugins/pull/1327</a>. Simply add the code to your Pelican plugins directory and activate it by adding <code>"mau_reader"</code> to <code>PLUGINS</code> in your <code>pelicanconf.py</code> file. The Mau reader processes only files with the <code>.mau</code> extension, so you can use Markdown/reStructuredText and Mau at the same time.</p><h2 id="development-f3c5">Development<a class="headerlink" href="#development-f3c5" title="Permanent link">¶</a></h2><p>If you are interested, you can star the project on its <a href="https://github.com/Project-Mau/mau">GitHub page</a>, start using it, or contribute ideas, code, or bug fixes.</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>