Overture

Whoever uses a computer has to deal every day with software version numbers, and even occasional users will sooner or later introduce the word version into their speech. Alas, this has made it one of those concepts so pervasive that it is now taken for granted, and so widespread that even professionals often forget, or do not even know, the complexity hidden behind it. This article is an attempt to shine a light on this topic.

This post started as a collection of some thoughts that aimed to summarise my experience in more than 20 years of software development. As usual, short sentences proved to be too short and became long sentences, then paragraphs, and the short list was already a long article. However, despite the length, my analysis does not claim to be either complete or exact. It simply represents an attempt to introduce the reader to the complexity of a matter which is always relevant in the software world.

A known concept?

Isn't versioning just a simple concept that everybody knows?

Almost all computer users around the world can roughly deal with different versions of the most common programs. Windows 10 is more recent than Windows 8, Photoshop 2020 has been replaced by Photoshop 2021, and so on.

Certain software products, however, have more complicated version numbers. For example, the latest Chrome version at the time of writing is 93.0.4577.63, and the Linux kernel that I am running on my notebook is 5.4.0-81-generic. Whoever approaches software development for the first time might be surprised by such complicated numbers, and fail to grasp the reason behind them.

I think that as software developers we need to be aware of what is in a version number, and the reasons behind the choice of one scheme or another.

An example of product versioning

To show that things are a bit more complicated than they appear at first glance, let's have a look at a simple example of versioning outside the pure software world. In 1994, Sony created the Playstation and has since released 5 versions of it, named Playstation, Playstation 2, Playstation 3, Playstation 4, and Playstation 5. A simple and effective versioning scheme.

In 2000, Sony released the PS One, which was a redesigned version of the original Playstation, but as you can see the name doesn't fit the initial simple list. Even different configurations of the following models fall outside the main naming scheme: the Playstation 2 Slim, the Playstation 3 Super Slim, the Playstation 4 Pro, and the Playstation 5 Digital Edition are examples of such configurations.

Even though such names do not follow strict rules, it's still pretty easy to identify the so-called major versions and to decide which is newer, as any configuration of the Playstation 3 (standard, Slim, and Super Slim) is more modern than any configuration of the Playstation 2. It is not so simple to know where to put the Playstation One, though, without knowing the history of the product.

In 2001 Microsoft entered the market with the Xbox, one year after Sony introduced the Playstation 2. About four years later, Sony was planning to release the third version of Playstation, and Microsoft answered with Xbox 360. The third version of Xbox was initially nicknamed Xbox 720, but it hit the market with the official name of Xbox One, followed 7 years later by the Xbox Series S and Series X.

In this case, it's not so trivial to identify which console came first. While the Playstation One was a new version of the original console, the Xbox One belongs to the third generation of that product line, and without being into the console world, it might be tough to guess whether the Xbox 360 came before or after the Series S.

A formal definition of versioning

So far, I have leveraged an intuitive understanding of the concepts of versioning and versioning scheme, but it's time to try to give a formal definition to them.

Versioning means labelling the state of an object at a given moment in time according to a set of rules. This set of rules is called a versioning scheme or versioning system. Versioning schemes can be strong, that is formal and strict, such as those used in software products, or weak, such as those used in marketing. A weak system can just use arbitrary labels, as we saw in the example of consoles, while strong versioning schemes have to clearly establish the rules that version numbers are meant to follow.

A versioning scheme has two main characteristics that we want to take into account, which I call expressivity and coverage. The first is the amount of information that you can obtain just by looking at a certain version number or by comparing two of them, while the second measures how well the versioning scheme covers all the possible states in which the versioned object can be.

Back to the example drawn from the console world, the Playstation versioning scheme has a certain amount of expressivity, as it is at least easy to decide which console came first, while the Xbox version codes can't truly be compared. From the point of view of coverage, both schemes are pretty weak, as they don't establish a strict way to identify minor configuration changes. Please also note that neither scheme allows us to know which components changed between major versions just by looking at the version number itself.

As mentioned before, such shortcomings are perfectly acceptable in the context of marketing but must be avoided in a more formal engineering context such as that of software development.

A simple example

Let us consider an example of software versioning so that we can better visualise the problems we face. The simplest versioning scheme we can conceive is made of a single integer number, such as that used by Sony with its Playstation. The first version of the software will be labelled version 1, the next version will be version 2, then version 3 and so on.

Such a scheme seems to work well, and as we saw it has been successfully used for commercial products. When it comes to software products, however, we need to be aware that both the expressivity and the coverage of this system can be problematic.

With such a scheme, it is impossible to grasp the type and amount of changes that occurred between two versions just by comparing the two version numbers. For example, both fixing a minor bug and completely rewriting the graphical interface will increase the version number by one. Thus, the impact of the changes between version X and version Y is unknown until we check the release notes, which will hopefully clarify the matter.

Things are not much better when it comes to the coverage. Let's consider the case of a piece of software that has two major versions available on the market at the same time, labelled 1 and 2 according to the scheme. If both versions are actively maintained, bugs have to be fixed and a new version has to be released. This is often the case with software: as an example take the Python programming language, whose versions 2 and 3 have been actively maintained together for 12 years while large codebases were migrated to the newest major version.

In such a scenario, developers might find a bug that affects both versions, thus requiring a new version to be created. The new release of version 2 will be called 3 according to the versioning scheme, but what would happen to version 1? It cannot be called version 2 since that label has already been assigned to another state of the code.
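The collision described above can be made concrete with a minimal Python sketch. The `releases` dictionary and the `release` helper are hypothetical names that I introduce only for illustration; they are not part of any real tool.

```python
# Hypothetical registry: each integer label identifies one state of the code.
releases = {1: "first major line", 2: "second major line"}


def release(label: int, description: str) -> None:
    """Register a new release, refusing to reuse a label that is already taken."""
    if label in releases:
        raise ValueError(f"version {label} already identifies: {releases[label]}")
    releases[label] = description


# Fixing the bug in the newest line simply takes the next free integer.
release(3, "second major line with the bug fixed")

# Fixing the same bug in the older line has no natural label:
# the successor of 1 is 2, and 2 is already assigned.
try:
    release(2, "first major line with the bug fixed")
except ValueError as error:
    print(error)
```

With a single integer there is no label left for the fixed version 1 that keeps it visibly connected to the line it belongs to.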

Now, the fact that both the expressivity and the coverage of this versioning scheme proved low doesn't mean it cannot be used. Always remember that shortcomings are such in relation to some requirements. For example, if you never have two different versions available at the same time, the above scheme might be a good solution.

An improved example

The first thing that comes to mind to improve the previous example is to add a secondary component to the version number that can be changed without affecting the primary one. A very broadly accepted and used method to add a new component to a version number is to add digits separating them from other components with a single dot (e.g. version 1.2).

As soon as we decide to introduce a new component we also need to decide the role of each of the two parts of the version. For now, let's simplify the matter by calling them "major" and "minor" without giving a formal definition, which will come later.

So, the first version of the software will be 1.0 (major version 1, minor version 0). A minor change will increase the rightmost number, producing versions 1.1, 1.2, and so on, while a major one will increase the leftmost number and reset the rightmost, producing versions 2.0, 3.0, and so on. A typical version trail might be for example 1.0, 1.1, 1.2, 2.0, 2.1, 3.0, and so on.

From the point of view of expressivity, comparing version numbers in this scheme tells us more about the changes that occurred. Version 1.3 that follows version 1.2 will have minor changes, while a jump from 1.3 to 2.0 will contain code with a greater impact. Please note, however, that we still have to define what "major" and "minor" mean in this context, and that "greater impact" is thus still a vague concept. When we consider coverage we also see improvements in this scheme when compared to the previous one. If both versions 1.0 and 2.0 are actively maintained, a minor change that affects both will simply generate versions 1.1 and 2.1.
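A minimal Python sketch of this two-component scheme might look like the following. The `Version` class and its method names are my own assumptions, chosen just to illustrate the rules described in this section.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A hypothetical major.minor version, compared component by component."""

    major: int
    minor: int

    def bump_minor(self) -> "Version":
        # A minor change increases the rightmost number only.
        return Version(self.major, self.minor + 1)

    def bump_major(self) -> "Version":
        # A major change increases the leftmost number and resets the rightmost.
        return Version(self.major + 1, 0)

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


line_1 = Version(1, 0)
line_2 = line_1.bump_major()

# The same fix applied to both maintained lines yields two distinct labels.
print(line_1.bump_minor(), line_2.bump_minor())  # 1.1 2.1
```

Storing the components as integers rather than as a string matters when comparing: `Version(1, 10) > Version(1, 9)` is true, while the string `"1.10"` sorts before `"1.9"`.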

About numbering

I think it is important at this point to discuss numbering systems a bit more in-depth. If you are not very familiar with software versioning you might have been surprised by the version trail shown above: why is version 1.0 the first one and not 1.1?

Generally, we start counting from 1, and if you ask someone to tell you the first ten numbers, he or she will likely answer with the sequence 1, 2, 3, ..., 10. Base-10 numerals, however, go from 0 to 9, so we might argue that zero should be considered the first number. You might be surprised to discover that this debate is anything but simple and that there are many different conventions and opinions on the matter.

In particular, in computer programming, there are languages that index arrays from 0 (e.g. Lisp, C, Python, JavaScript), and languages that opted for 1-indexed arrays (e.g. Fortran, COBOL, MATLAB). While it might be argued that indexing from zero is exactly what we do when we consider all numbers with a given amount of digits, there is an ambiguity in everyday language since we name positions in a list starting from 1, thus mixing the cardinal and ordinal systems. For example, in English, the word "third" comes from three and "fourth" comes from four ("first" comes from "foremost" and "second" comes from the Latin secundus), so if we used 0-indexed sequences we would end up with the third element having index 2, the fourth having index 3, and so on.

This ambiguity comes from the rather late introduction of the number zero in western culture, officially around 1200, and if you think this doesn't affect your everyday life please consider how you read the clock. The day starts at 00:00, not at 01:01. Indeed "the first hour" is hour number 0.
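The mismatch between ordinal names and zero-based indexes can be seen directly in Python, one of the 0-indexed languages mentioned above. The list below is just illustrative data.

```python
consoles = ["Playstation", "Playstation 2", "Playstation 3", "Playstation 4"]

# The element we call "the third" in everyday language lives at index 2.
third = consoles[2]
print(third)  # Playstation 3

# enumerate() accepts a start value, so we can show ordinal positions
# while the underlying list stays zero-indexed.
for position, name in enumerate(consoles, start=1):
    print(position, name)
```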

When it comes to versioning, it's commonly accepted that the first version of a piece of software is 1.0, and as you can see, if that's true there is an inconsistency between the way we treat the first digit and the second one. Versions starting with 0 exist, but they are usually considered to belong to the early stages of development and thus to be "unstable". I will come back to this point later.

Incompatibility and versioning

Previously I talked about changes that introduce an incompatibility, but I did not formally define this concept. Incompatibility, in the computer world, is the complete or partial impossibility of using a version of a piece of software (or hardware) in place of another.

Let’s consider a simple example taken from the everyday life of a common computer user. I can write a document using Microsoft Word 2007 and save it in its native format, namely DOCX. Then I send the file to another person who can only use Word 2003, and he or she cannot open it. Why? Simply because the changes introduced in Office 2007 to the file format are incompatible, that is, they change the file format to such an extent that a previous version of the software cannot show even a part of the content.

As you can imagine, the fields where incompatibility problems can arise are many. Generally speaking, every part of a piece of software (or hardware) that is exposed to external systems can be affected by incompatibility issues. Such parts are collectively known as interfaces, and according to the field of application, you can have network interfaces, programming interfaces, hardware interfaces, etc.

An example borrowed from the hardware world will make this clear. If you design a new version of a webcam, replacing the internal components to provide a much better picture quality but keeping the same connector (e.g. USB-A) it used previously, every user will be able to instantly switch to the new version. If you change the connector (e.g. from USB-A to USB-C) users might have to buy an adapter, if their notebook doesn't have ports with the new interface.

It should be noted at this point that not every change to an interface introduces an incompatibility. A backward compatible interface can act both as the new version and the old one (or ones), either automatically or manually driven. For example, the previously mentioned Office 2007 is backwards compatible with Office 2003 since it can automatically read the old file format and can manually be driven to save a file in that format.
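To make the idea of a backward compatible interface more tangible, here is a small Python sketch of a reader and writer for a made-up JSON document format. Everything in it is an assumption of mine (the field names, the `format` marker, the function names) and it has nothing to do with the actual Office file formats. Loading detects the old format automatically, while saving in the old format has to be requested explicitly.

```python
import json


def load_document(raw: str) -> dict:
    """Read a document saved in either the old or the new hypothetical format."""
    data = json.loads(raw)
    # Assumption: old files carry no "format" field, new files declare format 2.
    file_format = data.get("format", 1)
    if file_format == 1:
        return {"title": data["name"], "body": data["text"]}
    if file_format == 2:
        return {"title": data["title"], "body": data["body"]}
    raise ValueError(f"unsupported format: {file_format}")


def save_document(document: dict, file_format: int = 2) -> str:
    """Save in the new format by default, or in the old one on request."""
    if file_format == 1:
        return json.dumps({"name": document["title"], "text": document["body"]})
    if file_format == 2:
        return json.dumps(
            {"format": 2, "title": document["title"], "body": document["body"]}
        )
    raise ValueError(f"unsupported format: {file_format}")


# A file written by the old version is still readable by the new code.
old_file = '{"name": "Notes", "text": "Hello"}'
document = load_document(old_file)
print(document["title"])  # Notes
```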

Users will always take for granted that your software is backwards compatible with its previous versions. So it would be highly desirable for a versioning scheme to be able to clearly show possible incompatibilities in its interfaces when moving from one version to the next. The details of the incompatibility would still need to be clarified in the documentation, but the version numbers would act as a first warning.

Example: semantic versioning

A well-known and widespread versioning scheme is Semantic Versioning (also known as SemVer), whose full documentation can be found at https://semver.org/. The central idea of this scheme is to use three numbers separated by dots and to assign to each of them a meaning that is strictly connected with the changes to the interfaces.

As you can read in the official documentation, a SemVer version is expressed as MAJOR.MINOR.PATCH where a change of the MAJOR number signals an incompatible change, a change of the MINOR a compatible change, and a change of PATCH a bugfix (again, backward compatible).

As you can see, it's very easy to understand what we should expect from an upgrade just by looking at the version numbers. An upgrade from version 2.3.0 to version 2.3.1 is supposed to just fix issues and thus can be done safely. An upgrade from version 2.3.0 to version 2.4.0 will introduce new features or mark some as deprecated (but still keeping them active), which means that we can upgrade safely, but we should have a good look at the documentation to know what is going on. Last, a change from version 2.3.0 to version 3.0.0 is supposed to be destructive, so it has to be planned carefully.
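The reasoning above can be sketched in a few lines of Python. The function names are hypothetical, and the sketch deliberately handles only plain MAJOR.MINOR.PATCH strings: it ignores pre-release labels, build metadata, and the special status of versions starting with 0.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_upgrade(current: str, target: str) -> str:
    """Tell which kind of upgrade we face just by comparing version numbers."""
    old, new = parse(current), parse(target)
    if new <= old:
        return "none"
    if new[0] != old[0]:
        return "major"  # incompatible changes: plan carefully
    if new[1] != old[1]:
        return "minor"  # compatible changes: read the documentation
    return "patch"  # bug fixes only: safe to upgrade


print(classify_upgrade("2.3.0", "2.3.1"))  # patch
print(classify_upgrade("2.3.0", "2.4.0"))  # minor
print(classify_upgrade("2.3.0", "3.0.0"))  # major
```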

SemVer also allows you to label pre-releases and to append build metadata. The documentation of the versioning system has itself been versioned with SemVer, and reached version 2.0.0 in 2013.
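In SemVer a pre-release label is appended after a hyphen and build metadata after a plus sign, as in 1.0.0-alpha.1+build.5. The Python sketch below splits such a string into its parts. Please note that the regular expression is a simplified one that I wrote for illustration, not the one suggested by the official documentation, so it accepts some strings that the specification forbids (for example numbers with leading zeros).

```python
import re

# Simplified pattern (an assumption, looser than the official specification).
SEMVER_PATTERN = re.compile(
    r"^(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(?:-(?P<prerelease>[0-9A-Za-z.-]+))?"
    r"(?:\+(?P<build>[0-9A-Za-z.-]+))?$"
)


def split_version(version: str) -> dict:
    """Return the named parts of a version; missing optional parts are None."""
    match = SEMVER_PATTERN.match(version)
    if match is None:
        raise ValueError(f"not a valid version: {version!r}")
    return match.groupdict()


parts = split_version("1.0.0-alpha.1+build.5")
print(parts["prerelease"], parts["build"])  # alpha.1 build.5
```

According to the specification a pre-release comes before the corresponding normal version (1.0.0-alpha precedes 1.0.0), while build metadata is ignored when deciding which version is newer.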

Please note, however, that the documentation states that "Software using Semantic Versioning MUST declare a public API", which makes it clear that the scheme is well suited for libraries and may be less useful or appropriate for user-facing tools like frameworks and programming languages.

Example: calendar versioning

While semantic versioning proposes a pretty strict approach and set of rules, Calendar versioning (CalVer) is intended more as a set of guidelines to design your own versioning scheme. Despite the name, calendar versioning is not strictly connected with objective time but rather tries to make the release number more connected with the release calendar of the piece of software you maintain.

As such, CalVer versions usually incorporate a part that is time-related, and other numbers that are more similar to the ones used by SemVer in that they try to capture and communicate the extent of changes made in a version.
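As an example, consider a scheme made of the release year, the release month, and a counter that restarts every month. This is just one of the many schemes you could design following the CalVer guidelines, and both the scheme and the `next_calver` function below are assumptions of mine.

```python
from datetime import date


def next_calver(previous: str, today: date) -> str:
    """Return the next version in a hypothetical YEAR.MONTH.COUNTER scheme."""
    year, month, counter = (int(part) for part in previous.split("."))
    if (year, month) == (today.year, today.month):
        # Another release in the same month: only the counter grows.
        return f"{year}.{month}.{counter + 1}"
    # First release of a new month: the date part changes, the counter restarts.
    return f"{today.year}.{today.month}.0"


print(next_calver("2021.9.0", date(2021, 9, 15)))  # 2021.9.1
print(next_calver("2021.9.3", date(2021, 10, 1)))  # 2021.10.0
```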

You can read CalVer documentation at https://calver.org/.

Versioning is a discipline

Versioning is not a set of fixed rules, but a methodology. When a development group follows a new methodology it both adopts and adapts it. Adopting means shifting the work procedures to follow those proposed by the methodology while adapting means changing the methodology to fit the environment where it is supposed to be used.

Versioning is a discipline, a way of thinking, where rules are the product of general concepts modelled after the scope and the environment where they are used. So the versioning scheme of a product can vary over time since product or user requirements change.

Software is one of the environments where design is faster, always changing, and distributed among several people, so many things need to be versioned, not just the final product. The latter, indeed, could have a life cycle of several years, but in the meantime, bug fixes could be released, experimental code produced, tools and documentation enhanced. All those things can, and many times they should, be versioned.

Do not assume that one single versioning scheme can fit all your needs. The method for dealing with an object is imposed by the object itself, so the versioning scheme is imposed by the versioned object. Your documentation probably needs a different scheme from a software tool.

Thus, the following guidelines must be shaped by the specific needs of the development group, and these are forced by the product, the size of the team, the market and several other surrounding conditions of which each team manager should take account.

1. Versioning rules and procedures should take into account not only current needs but also future ones, to avoid as far as possible being forced to change them. When you design something it is advisable to consider the matter from several points of view and to explore different “what if?” scenarios.

2. Versioning rules must be clearly explained and written in an easily accessible document (e.g. on an intranet in HTML format). Rules must be short, and you should provide use cases and answers to frequently asked questions.

3. Versioning rules must be quickly changed if they are no longer suitable for the use cases that the developers have to deal with. One of the Extreme Programming mantras, “embrace change”, must be the spirit of any rule system which aims to be useful.

4. A change of versioning rules should be fully backwards compatible if possible. Every time the system evolves, it should thus become a superset of the previous rules. This way the technical shock caused by the change can be easily absorbed. Backward compatibility is not always possible to achieve, and in such cases, the two systems should possibly be marked so that a version number is unequivocally connected to one of them (using a prefix, for example). The date when the new system takes over from the old one must be recorded.
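The prefix mentioned in the last guideline can be sketched as follows. The prefixes, the colon separator, and the function name are all hypothetical choices that I made for this example. Labels without a known prefix are assumed to belong to the rules that were in force before prefixes were introduced.

```python
# Hypothetical prefixes, each one tied to a documented set of versioning rules.
KNOWN_PREFIXES = {"sem": "semantic rules", "cal": "calendar rules"}


def identify_scheme(label: str) -> tuple[str, str]:
    """Split a label such as 'cal:2021.9.0' into (scheme, version)."""
    prefix, separator, version = label.partition(":")
    if separator and prefix in KNOWN_PREFIXES:
        return prefix, version
    # No recognised prefix: the label was issued under the old rules.
    return "legacy", label


print(identify_scheme("cal:2021.9.0"))  # ('cal', '2021.9.0')
print(identify_scheme("3.2"))  # ('legacy', '3.2')
```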

Finale

As you can see there is much to say about this subject, and this article only scratches the surface of the matter. However, I hope it helped you to get the big picture of the different aspects that hide behind those simple labels or numbers called versions.

If you are managing a project and are looking for a tool to manage the versions you might want to have a look at punch, a highly configurable tool written in Python that can automatically increase the version number in any file according to the versioning rules you defined.

Feedback

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.