[PATCH 0/9] Enable hibernation when Lockdown is enabled

From: Matthew Garrett
Date: Fri Feb 19 2021 - 20:34:07 EST


Lockdown in integrity mode aims to ensure that arbitrary code cannot end
up in ring 0 without owner approval. The combination of module signing
and secure boot gets most of the way there, but various other features
are disabled by lockdown in order to avoid more convoluted approaches
that would enable unwanted code to end up in kernel space. One of these
is hibernation, since at present the kernel performs no verification of
the code it's resuming. If hibernation were permitted, an attacker with
root (but not kernel) privileges could disable swap, write a valid
hibernation image containing code of their own choosing to the swap
partition, and then reboot. On reboot, the kernel would happily resume
the provided code.

This patchset aims to provide a secure implementation of hibernation. It
is based on the presumption that simply storing a key in-kernel is
insufficient, since if Lockdown is merely in integrity (rather than
confidentiality) mode we assume that root is able to read kernel memory
and so would be able to obtain these secrets. It also aims to be
unspoofable - an attacker should not be able to write out a hibernation
image using cryptographic material under their control.

TPMs can be used to generate key material that is encrypted with a key
that does not leave the TPM. This means that we can generate an AES key,
encrypt the image hash with it, encrypt the AES key itself with a
TPM-backed key, and store the encrypted AES key in the hibernation
image. On resume, we pass the encrypted key back to the TPM, receive the
unencrypted AES key, and use that to
validate the image.
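A minimal Python sketch of this wrap/unwrap step, under stated assumptions. `ToyTPM`, `protect_image` and `verify_image` are hypothetical names. The TPM's internal storage key is modelled as a private attribute that is never returned, and the TPM's real cipher is replaced by a toy SHA-256 keystream. Because the Python standard library has no AES, the "encrypt the image hash with an AES key" step is swapped for an HMAC-SHA256 tag keyed with the random image key. None of this is the patchset's code.

```python
import hashlib
import hmac
import os


class ToyTPM:
    """Stand-in for a TPM: the storage key exists only inside this object."""

    def __init__(self) -> None:
        self._storage_key = os.urandom(32)  # never handed to callers

    def _keystream(self, nonce: bytes) -> bytes:
        # Toy 32-byte keystream; a real TPM uses a proper cipher.
        return hashlib.sha256(self._storage_key + nonce).digest()

    def wrap(self, secret: bytes) -> bytes:
        """Encrypt a 32-byte secret under the internal key; returns nonce || ciphertext."""
        if len(secret) != 32:
            raise ValueError("toy wrap only handles 32-byte secrets")
        nonce = os.urandom(16)
        ct = bytes(a ^ b for a, b in zip(secret, self._keystream(nonce)))
        return nonce + ct

    def unwrap(self, blob: bytes) -> bytes:
        """Decrypt a blob produced by wrap()."""
        nonce, ct = blob[:16], blob[16:]
        return bytes(a ^ b for a, b in zip(ct, self._keystream(nonce)))


def protect_image(tpm: ToyTPM, image: bytes) -> dict:
    """Hibernate side: key the image hash, store only the TPM-wrapped key."""
    image_key = os.urandom(32)  # plays the role of the AES key
    image_hash = hashlib.sha256(image).digest()
    tag = hmac.new(image_key, image_hash, hashlib.sha256).digest()  # HMAC substitutes for AES
    return {"wrapped_key": tpm.wrap(image_key), "tag": tag}


def verify_image(tpm: ToyTPM, image: bytes, metadata: dict) -> bool:
    """Resume side: have the TPM unwrap the key, then check the image against the tag."""
    image_key = tpm.unwrap(metadata["wrapped_key"])
    image_hash = hashlib.sha256(image).digest()
    expected = hmac.new(image_key, image_hash, hashlib.sha256).digest()
    return hmac.compare_digest(expected, metadata["tag"])
```

Note that `verify_image` succeeds for any caller able to reach `tpm.unwrap`, which is exactly the weakness raised below.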

However, this is insufficient. Nothing prevents anyone else with access
to the TPM from asking it to decrypt the key. We need to be able to
distinguish between accesses that come from the kernel directly and
accesses that come from userland.

TPMs have several Platform Configuration Registers (PCRs) which are used
for different purposes. PCRs are initialised to a known value, and
cannot be modified directly by the platform. Instead, the platform can
provide a hash of some data to the TPM. The TPM combines the existing
PCR value with the new hash, and stores the hash of this combination in
the PCR. Most PCRs can only be extended, which means that the only way
to achieve a specific value in a PCR is to perform the same series of
writes.
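The extend operation can be sketched as follows, assuming a SHA-256 PCR bank whose initial value is all zeros (real banks and initial values vary by PCR and platform). `extend` and `replay` are illustrative names, not a TPM API. The point is that the final value depends on every measurement and on their order.

```python
import hashlib

PCR_SIZE = 32  # SHA-256 bank (assumption for this sketch)


def extend(pcr: bytes, data: bytes) -> bytes:
    """Return the new PCR value: hash(old PCR value || hash(data))."""
    measurement = hashlib.sha256(data).digest()
    return hashlib.sha256(pcr + measurement).digest()


def replay(events: list[bytes]) -> bytes:
    """Start from the known initial value and extend with each event in order."""
    pcr = bytes(PCR_SIZE)  # all zeros
    for event in events:
        pcr = extend(pcr, event)
    return pcr
```

For example, `replay([b"a", b"b"])` and `replay([b"b", b"a"])` give different values, so reaching a given PCR value means repeating the exact sequence of extensions.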

When secrets are encrypted by the TPM, they can be accompanied by a
policy that describes the state the TPM must be in before it will
decrypt them. If the TPM is not in this state, it will refuse to decrypt
the material even if it is otherwise capable of doing so. This allows
keys to be tied to specific system state - if the system is not in that
state, the TPM will not grant access.
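A toy model of policy-bound sealing. Everything here is hypothetical: `ToyTPM`, `seal`, `unseal` and `PolicyError` are illustrative names, the policy is simplified to "PCR[index] must equal its value at sealing time", and the TPM's encryption is replaced by a SHA-256 keystream plus an HMAC over the blob so that the embedded policy cannot be edited. Real TPM 2.0 policies are built from policy sessions (for example TPM2_PolicyPCR) and are considerably richer.

```python
import hashlib
import hmac
import os


class PolicyError(Exception):
    """Raised when the TPM state does not satisfy a sealed object's policy."""


class ToyTPM:
    def __init__(self, num_pcrs: int = 24) -> None:
        self._storage_key = os.urandom(32)  # never leaves the toy TPM
        self.pcrs = [bytes(32) for _ in range(num_pcrs)]

    def extend(self, index: int, data: bytes) -> None:
        digest = hashlib.sha256(data).digest()
        self.pcrs[index] = hashlib.sha256(self.pcrs[index] + digest).digest()

    def seal(self, secret: bytes, index: int) -> bytes:
        """Encrypt secret with the policy 'PCR[index] must equal its current value'."""
        if len(secret) > 32:
            raise ValueError("toy seal only handles secrets up to 32 bytes")
        policy = index.to_bytes(1, "big") + self.pcrs[index]  # 33 bytes
        nonce = os.urandom(16)
        stream = hashlib.sha256(self._storage_key + nonce).digest()
        ct = bytes(a ^ b for a, b in zip(secret, stream))
        body = policy + nonce + ct
        mac = hmac.new(self._storage_key, body, hashlib.sha256).digest()
        return body + mac  # the policy travels with the blob but is authenticated

    def unseal(self, blob: bytes) -> bytes:
        body, mac = blob[:-32], blob[-32:]
        good = hmac.new(self._storage_key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(mac, good):
            raise ValueError("blob was not sealed by this TPM or was modified")
        index, required = body[0], body[1:33]
        if self.pcrs[index] != required:
            raise PolicyError(f"PCR {index} does not match the sealing policy")
        nonce, ct = body[33:49], body[49:]
        stream = hashlib.sha256(self._storage_key + nonce).digest()
        return bytes(a ^ b for a, b in zip(ct, stream))
```

Unsealing succeeds only while the named PCR still holds the value it had at sealing time; any further extension makes the TPM refuse.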

PCR 23 is special in that it can be reset on demand. This patchset
re-purposes PCR 23 and blocks userland's ability to extend or reset it.
The kernel is then free to impose policy by resetting PCR 23 to a known
starting point, extending it with a known value, encrypting key material
with a policy that ties it to PCR 23, and then resetting PCR 23 again.
Even if userland obtains the encrypted blob, it cannot decrypt it, since it
has no way to force PCR 23 to be in the same state.
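The PCR 23 arrangement can be modelled with a hypothetical filter between userland and the TPM. In this sketch `ToyTPM`, `TPMGate` and `kernel_seal` are invented names, only PCR 23 is resettable, and a sealed object is kept inside the toy TPM behind an opaque random blob id that stands in for the encrypted blob. It makes no claim about how the patchset implements the restriction; it only shows why userland cannot satisfy the policy.

```python
import hashlib
import os

RESETTABLE_PCR = 23
PCR_INITIAL = bytes(32)


class ToyTPM:
    def __init__(self) -> None:
        self.pcrs = [PCR_INITIAL] * 24
        self._sealed = {}  # blob id -> (PCR index, required value, secret)

    def extend(self, index: int, data: bytes) -> None:
        digest = hashlib.sha256(data).digest()
        self.pcrs[index] = hashlib.sha256(self.pcrs[index] + digest).digest()

    def reset(self, index: int) -> None:
        if index != RESETTABLE_PCR:
            raise PermissionError(f"PCR {index} cannot be reset")
        self.pcrs[index] = PCR_INITIAL

    def seal(self, secret: bytes, index: int) -> bytes:
        blob = os.urandom(16)  # opaque stand-in for the encrypted blob
        self._sealed[blob] = (index, self.pcrs[index], secret)
        return blob

    def unseal(self, blob: bytes) -> bytes:
        index, required, secret = self._sealed[blob]
        if self.pcrs[index] != required:
            raise PermissionError(f"PCR {index} does not satisfy the policy")
        return secret


class TPMGate:
    """Hypothetical filter on userland TPM access: PCR 23 is off limits."""

    def __init__(self, tpm: ToyTPM) -> None:
        self.tpm = tpm

    def user_extend(self, index: int, data: bytes) -> None:
        if index == RESETTABLE_PCR:
            raise PermissionError("PCR 23 is reserved for the kernel")
        self.tpm.extend(index, data)

    def user_reset(self, index: int) -> None:
        if index == RESETTABLE_PCR:
            raise PermissionError("PCR 23 is reserved for the kernel")
        self.tpm.reset(index)

    def user_unseal(self, blob: bytes) -> bytes:
        return self.tpm.unseal(blob)  # allowed, but the policy still applies


def kernel_seal(tpm: ToyTPM, secret: bytes, known_value: bytes) -> bytes:
    """Kernel-only path: reset PCR 23, extend it, seal against it, reset again."""
    tpm.reset(RESETTABLE_PCR)
    tpm.extend(RESETTABLE_PCR, known_value)
    blob = tpm.seal(secret, RESETTABLE_PCR)
    tpm.reset(RESETTABLE_PCR)
    return blob
```

Userland can hand the blob back to the TPM, but with PCR 23 left at its reset value and no way to extend it, the policy check fails; only the kernel can recreate the sealing state.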

So. During hibernation we freeze userland. We then reset PCR 23 and
extend it to a known value. We generate a key, use it to protect the image hash, and then encrypt
it with the TPM. When we encrypt it, we define a policy which states
that the TPM must have the same PCR 23 value as it presently does. We
also store the current PCR 23 value in the key metadata. On resume, we
again freeze userland, reset PCR 23 and extend it to the same value. We
decrypt the key, and verify from the metadata that it was created when
PCR 23 had the expected value. If so, we use it to decrypt the hash used
to verify the hibernation image and ensure that the image matches it. If
everything looks good, we resume. If not, we return to userland. Either
way, we reset PCR 23 before any userland code runs again.
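Putting the steps together, a hedged end-to-end sketch of the hibernate and resume paths. Assumptions: `ToyTPM`, `hibernate`, `resume` and `KNOWN_VALUE` are hypothetical; freezing userland is represented only by a comment; the creation-time PCR 23 value returned by `unseal` stands in for the key metadata; and the step that uses the key on the image hash is modelled as an HMAC-SHA256 tag rather than AES encryption. `seal` takes the required PCR value explicitly because a policy can be computed without the TPM being in that state. That is presumably why the creation-time check matters: userland could seal its own key against a policy naming the expected PCR 23 value, but it cannot create that key while PCR 23 actually holds that value.

```python
import hashlib
import hmac
import os

KNOWN_VALUE = b"hibernation"  # hypothetical value the kernel extends into PCR 23


class ToyTPM:
    """Toy TPM with only PCR 23 and policy-bound sealed objects."""

    def __init__(self) -> None:
        self.pcr23 = bytes(32)
        self._objects = {}  # blob -> (required PCR 23, PCR 23 at creation, secret)

    def reset(self) -> None:
        self.pcr23 = bytes(32)

    def extend(self, data: bytes) -> None:
        digest = hashlib.sha256(data).digest()
        self.pcr23 = hashlib.sha256(self.pcr23 + digest).digest()

    def seal(self, secret: bytes, required: bytes) -> bytes:
        blob = os.urandom(16)  # opaque stand-in for the encrypted key blob
        self._objects[blob] = (required, self.pcr23, secret)
        return blob

    def unseal(self, blob: bytes) -> tuple[bytes, bytes]:
        required, created_at, secret = self._objects[blob]
        if self.pcr23 != required:
            raise PermissionError("PCR 23 does not satisfy the policy")
        return secret, created_at  # creation-time PCR 23 acts as the metadata


def _enter_known_state(tpm: ToyTPM) -> bytes:
    tpm.reset()
    tpm.extend(KNOWN_VALUE)
    return tpm.pcr23


def hibernate(tpm: ToyTPM, image: bytes) -> dict:
    # Userland is assumed to be frozen for the duration of this function.
    try:
        expected = _enter_known_state(tpm)
        key = os.urandom(32)
        tag = hmac.new(key, hashlib.sha256(image).digest(), hashlib.sha256).digest()
        return {"image": image, "tag": tag, "key_blob": tpm.seal(key, expected)}
    finally:
        tpm.reset()  # PCR 23 is reset before any userland code runs again


def resume(tpm: ToyTPM, saved: dict) -> bool:
    """Return True if the image may be resumed, False to return to userland."""
    # Userland is assumed to be frozen here too.
    try:
        expected = _enter_known_state(tpm)
        key, created_at = tpm.unseal(saved["key_blob"])
        if not hmac.compare_digest(created_at, expected):
            return False  # key was not created in the kernel's PCR 23 state
        digest = hashlib.sha256(saved["image"]).digest()
        tag = hmac.new(key, digest, hashlib.sha256).digest()
        return hmac.compare_digest(tag, saved["tag"])
    except (PermissionError, KeyError):
        return False
    finally:
        tpm.reset()  # reset on every path, success or failure
```

Both paths finish with PCR 23 reset, so whichever way `resume` goes, userland never observes the kernel's PCR 23 state.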

This all works on my machine, but it's imperfect: there's a meaningful
performance hit on resume forced by reading all the blocks in order, and
it probably makes more sense to hash the image once the reads are
complete, but I wanted to get the other components of this out for
review first.
It's also not ideal from a security perspective until there's more
ecosystem integration - we need a kernel to be able to assert to a
signed bootloader that it implements this, since otherwise an attacker
can simply downgrade to a kernel that doesn't implement this policy and
gain access to PCR 23 themselves. There's ongoing work in the UEFI
bootloader space that would make this possible.