When you create standard RSA keys with ssh-keygen you end up with a private key in PEM format, and a public key in OpenSSH format. Both have been described in detail in my post Public key cryptography: RSA keys. In 2014, OpenSSH introduced a custom format for private keys that is apparently similar to PEM but is internally completely different. This format is used by default when you create ed25519 keys and it is expected to be the default format for all keys in the future, so it is worth having a look.

While investigating this topic I found a lot of misconceptions and wrong or partially wrong statements on Stack Overflow, so I hope this might be a comprehensive view of what this format is, its relationship with PEM, and the tools that you can use to manipulate it.

I'm not the first programmer to look into this, clearly, and I have to mention two posts that I read before writing this one: OpenSSH ed25519 private key file format written in December 2017 by Peter Lyons and The OpenSSH private key binary format, written in August 2020 by Marin Atanasov Nikolov. I'm sure many others have done this research but these are the resources that I found and I want to say a big thanks to both authors for sharing their findings. I will shamelessly use their results in the following explanation, as I hope others will do with what I'm writing here. Sharing knowledge is one of the best ways to help others.

Please note that all the private keys shown in this post have been trashed after I published it.

Note: as the word "key" can identify several different component of the systems I will describe, I will as much as possible use the words "private key" and "encryption key". The first is the key that we generate to be used in SSH, while the second is a parameter of a (symmetric) encryption algorithm.

KDFs and protection at rest

Describing the introduction of the new format, the OpenSSH changelog says

Add a new private key format that uses a bcrypt KDF to better
protect keys at rest. This format is used unconditionally for
Ed25519 keys, but may be requested when generating or saving
existing keys of other types via the -o ssh-keygen(1) option.
We intend to make the new format the default in the near future.
Details of the new format are in the PROTOCOL.key file.

Before we start dissecting the format, then, it is worth briefly discussing what a KDF is, what bcrypt is, and what it means to protect keys at rest.

Key Derivation Functions

Whenever a system is protected by a password you want to store the latter somewhere. This is clearly necessary to check the validity of the passwords that the user inputs and decide if you should grant access, but you shouldn't store the password in clear text, as a breach in the storage might compromise the whole system. The idea behind storing password securely is to run them through a hash function and store the hash: whenever someone inputs a password we can run the hash function again and compare the two hashes. However, we also want to prevent the attacker to be able to reconstruct the password from the hash, so we need a cryptographic hash function, which is a hash function with added requirements to prevent an easy inversion of the process.

The same strategy can be applied when it comes to encryption. An encryption system needs a key (a sequence of bits used to encrypt the message) and we need to derive it from the password given by the user. Encryption keys are required to have a specific length dictated by the encryption algorithm that we use, so hashing looks like a good solution, as all hashes generated by a given algorithm are by definition of the same size. AES, for example, one of the most widespread symmetric block ciphers, uses a key of exactly 128, 192, or 256 bits. Converting the password into a key of predetermined size is called stretching.

Any cryptographic system can be broken using a brute-force attack, as you can always test all possible inputs. In the case of login, we can just input all possible passwords until we get access to the system, while in the case of encryption we can try to decrypt using all possible keys until we obtain a meaningful result. This means that the most important thing we can do to protect such systems is to make brute-force attacks infeasible. This can be done increasing the key size (using more bits) but also using a slow stretching algorithm.

While hash functions created for things like digital signatures should be fast, then, hash functions that we use to obfuscate the password (for storage) or to create the key (for encryption/decryption) have to be very slow. The slowness of the processing can frustrate brute-force attacks and make them less effective is not infeasible. An example: at the current state of technology, you can easily hash 1 trillion passwords a second with a trivial expense, but if each one of those hashes takes 1 second you end up having to wait more than 31,000 years before you test all of them.

The process that converts a password into a key is called Key Derivation Function (KDF) and despite the name it is usually a complex algorithm and not a single mathematical function. PBKDF2 is an important KDF, defined as part of the specification PKCS #5, and it can use any pseudorandom function as part of the key stretching. An important feature of PBKDF2 is that it accepts an iteration count as input, that allows to slow down the process. As we just saw, this is the key to making the algorithm slower in order to adapt to the increasing computing power available to attackers.

bcrypt

The password-hashing function known as bcrypt was created in 1999 and is based on the Blowfish cipher created in 1993. Bcrypt is well know to be an extremely good choice thanks to the simple fact that its slowness can be increased tuning one of the parameters of the algorithm called "cost factor". This represents the number of iterations done in the setup of the underlying cipher, and its logarithmic nature makes easy to adapt the whole process to the increasing computational power available to attackers. This post attempts to estimate the time to hash a password of 15 characters with a cost of 30 (the maximum is actually 31) with a decent 2017 laptop (2.8 GHz Intel Core i7 16 GB RAM). The result turns out to be around 500 days which makes you understand that bcrypt won't die easily. It is important to note here that bcrypt is not a KDF, but a hash function. As such, it might be part of a KDF, but not replace the whole process.

Protection at rest

Protection at rest refers to the scheme that ensures data is secure when it is stored. Practically speaking, when it comes to SSH keys, we refer to the fact that an attacker that can physically access a key, for example stealing a laptop, actually owns an encrypted version of the key, which can't be used without first decrypting it. As the attacker is supposed to ignore the password used to encrypt the key, the only strategy they can use is to brute-force the key, and here is where the concept of protection at rest comes into play. Actually, the other strategy they can employ is to kidnap you and to force you to reveal the password, but this somehow falls outside the sphere of cryptographic security.

PEM format and protection at rest

Now that I clarified some terminology, let's have a look at what the standard PEM format does to store encrypted passwords. As I explained in my post Public key cryptography: RSA keys a PEM file contains a text header, a text footer, and some content. The content is always an ASN.1 structure created using DER and encoded using base64.

For encrypted private keys, the ASN.1 structure is created following a standard called PKCS #8. This standard uses an encryption scheme called PBES2 described in the specification PKCS #5, which uses a symmetric cipher and a password, previously converted into an encryption key using the KDF called PBKDF2. I hope at this point some if not all of these names ring a bell.

We can roughly sketch the process with the following steps:

  • Create the private key using the requested asymmetric algorithm (e.g. RSA or ED25519)
  • Encrypt the private key following PBES2
    • Stretch the password into an encryption key using PBKDF2 with one of the possible hash functions and a random salt value
    • Encrypt the private key using the newly created encryption key
  • Represent the encrypted key and the parameters used for PBKDF2 using ASN.1/DER
  • Encode the result with base64
  • Add a header and a footer that specify the nature of the content

Let's create an encrypted key with OpenSSL and analyse it. The command I used is

openssl genpkey -aes-256-cbc -algorithm RSA\
    -pkeyopt rsa_keygen_bits:4096 -pass pass:foobar\
    -out key_rsa_4096_openssl_pw

which creates a 4096 bits RSA key and encrypts it with AES using foobar as password. What I get is a file in the aforementioned PEM format

-----BEGIN ENCRYPTED PRIVATE KEY-----
MIIJrTBXBgkqhkiG9w0BBQ0wSjApBgkqhkiG9w0BBQwwHAQIW+BK6UQtCPACAggA
MAwGCCqGSIb3DQIJBQAwHQYJYIZIAWUDBAEqBBCIvU4FD31mkYR76ugTEhuwBIIJ
UJPHGeObOC1lHMrTTKhdyiekEcJhCO3rzP/gqVpqXkjhUASTWEsE9LEcuGKdrzAN
Dsy/WL9revg9UAQtGAk8WTSqWhv5JaCC4FqLGirqLMzhU51Jf4GbmCOWAWGP7TZu
[...]
QEfBUexTcFVf13cVX7LFGOAZ3yIvFc3sfl5nyYY9Nerk8MxUOW+9Ck5loTEzMj9j
xJf5RsNvcoGVg33Rf7vl2xFIAD+PFdehd8n2CveQ48LJ9Zfn0gsRPQrPL+02Nlhu
7f44uW/Vq2YqG3PN1n8GUTexvF/qCKkd2T2QmHYnK9cryRn0xHvzSjSsQls170sA
Svu0sdTwh1tIs/sxRGuSta+iXPfHJnW4sZzh/2lAMvkgML6h9JAeIYV6e/qUqYSq
GxSfj7s0Qs0K5e3Xv1lCQUhSz82fBysznjeAhWa45YEV
-----END ENCRYPTED PRIVATE KEY-----

We can dump the ASN.1 content directly from the PEM format using openssl asn1parse

$ openssl asn1parse -inform pem -in key_rsa_4096_openssl_pw
    0:d=0  hl=4 l=2477 cons: SEQUENCE          
    4:d=1  hl=2 l=  87 cons: SEQUENCE          
    6:d=2  hl=2 l=   9 prim: OBJECT            :PBES2  1
   17:d=2  hl=2 l=  74 cons: SEQUENCE          
   19:d=3  hl=2 l=  41 cons: SEQUENCE          
   21:d=4  hl=2 l=   9 prim: OBJECT            :PBKDF2  2
   32:d=4  hl=2 l=  28 cons: SEQUENCE          
   34:d=5  hl=2 l=   8 prim: OCTET STRING      [HEX DUMP]:5BE04AE9442D08F0  4
   44:d=5  hl=2 l=   2 prim: INTEGER           :0800  5
   48:d=5  hl=2 l=  12 cons: SEQUENCE          
   50:d=6  hl=2 l=   8 prim: OBJECT            :hmacWithSHA256  6
   60:d=6  hl=2 l=   0 prim: NULL              
   62:d=3  hl=2 l=  29 cons: SEQUENCE          
   64:d=4  hl=2 l=   9 prim: OBJECT            :aes-256-cbc  3
   75:d=4  hl=2 l=  16 prim: OCTET STRING      [HEX DUMP]:88BD4E050F7D6691847BEAE813121BB0
   93:d=1  hl=4 l=2384 prim: OCTET STRING      [HEX DUMP]:93C719E39B382D[...]

Please note that I truncated the final OCTET STRING that contains the encrypted key as it is pretty long.

You can clearly see that this key is encrypted using PBES2 1 and PBKDF2 2. The algorithm used to encrypt the key is aes-256-cbc 3, as I asked. Specifically, this is AES with a key of 256 bits in CBC mode).

According to the PKCS #5 specification, the PBES2 block contains

PBES2-params ::= SEQUENCE {
       keyDerivationFunc AlgorithmIdentifier {{PBES2-KDFs}},
       encryptionScheme AlgorithmIdentifier {{PBES2-Encs}} }

and indeed we have PBKDF2 1 for keyDerivationFunc, and aes-256-cbc 3 for encryptionScheme. The sequence PBKDF2 is specified in the same document as

PBKDF2-params ::= SEQUENCE {
       salt CHOICE {
           specified OCTET STRING,
           otherSource AlgorithmIdentifier {{PBKDF2-SaltSources}}
       },
       iterationCount INTEGER (1..MAX),
       keyLength INTEGER (1..MAX) OPTIONAL,
       prf AlgorithmIdentifier {{PBKDF2-PRFs}} DEFAULT
       algid-hmacWithSHA1 }

As you can see in the ASN.1 dump the salt is 5BE04AE9442D08F0 4, the iteration count is 2048 (0x800) 5, and the hash function (prf, pseudorandom function) is hmacWithSHA256 6 without any additional parameters. The value 2048 for the iterations is a default value in OpenSSL (see the definition of PKCS5_DEFAULT_ITER).

OpenSSH's private key format

As we saw at the beginning of the post, the OpenSSH team came up with a custom format to store the private keys, so now that we are familiar with the nomenclature and with the way PEM stores encrypted keys, lets see what this new format can do.

The best starting point for our investigation is the tool ssh-keygen which we can use to create private keys. The source can be found in the OpenSSH repository in the file ssh-keygen.c. This file uses two different functions, sshkey_private_to_blob2 (source code) for the new format and sshkey_private_to_blob_pem_pkcs8 (source code) for keys in PKCS #8 format. The former calls bcrypt_pbkdf which comes from OpenBSD (source code).

This function contains a modified implementation of PBKDF2 that uses bcrypt as the core hash function. The comment that you can find at the top of the file bcrypt_pbkdf.c says

/*
 * pkcs #5 pbkdf2 implementation using the "bcrypt" hash
 *
 * The bcrypt hash function is derived from the bcrypt password hashing
 * function with the following modifications:
 * 1. The input password and salt are preprocessed with SHA512.
 * 2. The output length is expanded to 256 bits.
 * 3. Subsequently the magic string to be encrypted is lengthened and modified
 *    to "OxychromaticBlowfishSwatDynamite"
 * 4. The hash function is defined to perform 64 rounds of initial state
 *    expansion. (More rounds are performed by iterating the hash.)
 *
 * Note that this implementation pulls the SHA512 operations into the caller
 * as a performance optimization.
 *
 * One modification from official pbkdf2. Instead of outputting key material
 * linearly, we mix it. pbkdf2 has a known weakness where if one uses it to
 * generate (e.g.) 512 bits of key material for use as two 256 bit keys, an
 * attacker can merely run once through the outer loop, but the user
 * always runs it twice. Shuffling output bytes requires computing the
 * entirety of the key material to assemble any subkey. This is something a
 * wise caller could do; we just do it for you.
 */

As you can see, this is intended to be a pkcs #5 pbkdf2 implementation that uses bcrypt as its underlying hash function. It also mentions some modifications, and it's worth noting that when you modify a standard you are not following the standard any more. I won't run through all the details of the implementation, though, as it's beyond the scope of the post.

So, the OpenSSH private key format ultimately contains a private key encrypted with a non-standard version of PBKDF2 that uses bcrypt as its core hash function. The structure that contains the key is not ASN.1, even though it's base64 encoded and wrapped between header and footer that are similar to the PEM ones. A description of the structure can be found in https://github.com/openssh/openssh-portable/blob/2dc328023f60212cd29504fc05d849133ae47355/PROTOCOL.key.

Cost factor and rounds

PBKDF2 uses the concept of rounds to make the key stretching slower. This is the number of times the hash function is called internally (using as salt the output of the previous iteration), so in PBKDF2 the number of rounds or iterations is directly proportional to the slowness of the stretching operation.

Bcrypt implements a similar mechanism with its cost factor. The cost factor in the standard bcrypt implementation is defined as the binary logarithm of the number of iterations of a specific part of the process (the repeated expansion of the password and the salt). Using the binary logarithm means that a cost factor of 4 (the minimum) corresponds to 16 iterations, while 31 (the maximum) corresponds to 2,147,483,648 (more than 2 billion) iterations.

In the OpenSSH/OpenBSD implementation things are a bit different.

OpenBSD's version of bcrypt runs with a fixed cost of 6, that creates 64 iterations of the key expansion (source code), but being an implementation of PBKDF2 it can still be hardened increasing the number of rounds (source code). Those rounds correspond to the value given to the parameter -a of the ssh-keygen command line.

How many rounds?

When it comes to KDFs, the advice is always to run as much iterations as possible while keeping the specific application usable, so you need to tune your SSH keys testing different values in your system. To give you some rough estimations, Wikipedia mentions that for PBKDF2 the number of iterations used by Apple and Lastpass is between 2k and 100k. It is worth reiterating though that you shouldn't aim to use other people's figures, in this case. Instead, run tests of your software and hardware.

On my laptop, an i7-8565U with 32GiB of RAM running Kubuntu 20.04 I get the following results, which are pretty linear:

ssh-keygen -a 100 -t ed25519	0.667s
ssh-keygen -a 500 -t ed25519	3.148s
ssh-keygen -a 1000 -t ed25519	6.331s
ssh-keygen -a 5000 -t ed25519	31.624s

A sensible value for me might be between 100 and 500, then, so that I don't have to wait too long every time I push and pull my branches from GitHub.

Can we convert private OpenSSH keys into PEM?

As OpenSSL doesn't understand the OpenSSH private keys format, a common question among programmers and devops is if it is possible to convert it into a PEM format. As you might have guessed reading the previous sections, the answer is no. The PEM format for private keys uses PKCS#5, so it supports only the standard implementation of PBKDF2.

It's interesting to note that the OpenSSL team also specifically decided not to support this new format as it is not standard (see https://github.com/openssl/openssl/issues/5323).

A poorly documented format

PEM, PKCS #8, ASN.1, and all other formats that we use every day, included the OpenSSH public key format, are well documented and standardised in RFCs or similar documents. The OpenSSH private key format is documented in a tiny file that you can find in the source code, but doesn't offer more than a quick overview. To have a good understanding of what is going on I had to read the source code, not only of OpenSSH, but also of OpenBSD.

I think poor documentation like this might be acceptable in personal projects or in new tools, but SSH is used by the whole world, and when the team decides to come up with a completely new format for one of its most important elements I would expect them to detail every single bit of it, or at least try to be more open about the reasons and the implementation. I also personally believe that standards can't but benefit intercommunication between systems and, in cryptography, improve security, since they are reviewed and discussed by a wider audience.

The claim is that the new SSH private key format offers a better protection of keys at rest. I'd be very interested to see a cryptanalysis made by some expert (which I'm not). Cryptography is a tricky field, and often things that are apparently smart end up being tragically wrong.

Resources

Feedback

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.