Re: [REGRESSION] Unable to unlock encrypted disk starting with kernel 5.19-rc1+

From: Dave Hansen
Date: Tue Jun 28 2022 - 12:55:40 EST

Next message: Joel Fernandes: "Re: [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value"
Previous message: Namhyung Kim: "Re: [PATCH 1/6] perf offcpu: Fix a build failure on old kernels"
In reply to: Borislav Petkov: "Re: [REGRESSION] Unable to unlock encrypted disk starting with kernel 5.19-rc1+"
Next in thread: Alexandre Messier: "Re: [REGRESSION] Unable to unlock encrypted disk starting with kernel 5.19-rc1+"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

First of all, thank you for bisecting this! I know those are a lot of work.

That XSAVEC patch modifies the AVX register save/restore code. There is
a set of x86 AES acceleration instructions called AES-NI. Those
instructions use the AVX registers. So, it's at least a plausible
connection between that patch and your symptoms. But, I don't think
anyone's been able to reproduce what you're seeing yet.

The kernel XSAVE buffer formats also differ slightly between AMD and
Intel. That *should* be OK, but it might explain why I can't reproduce
this.

If you get a chance, could you apply this (ugly hackish) patch to the
userspace 'cryptsetup' utility and run it?

https://sr71.net/~dave/intel/cryptsetup-memcmp.patch

On Ubuntu at least, it was as simple as:

apt-get source cryptsetup
apt-get build-dep cryptsetup
cd cryptsetup-1.6.6
./configure
make

Then I could run:

./src/cryptsetup benchmark --cipher=aes-xts --key-size=512
and
./src/cryptsetup benchmark --cipher=aes-xts --key-size=256

With that patch applied, you should see some output like:

# ./src/cryptsetup benchmark --cipher=aes-xts --key-size=512
# Tests are approximate using memory only (no storage IO).
memcmp12: 0
memcmp23: 0
memcmp13: 0
memcmp12: -173
memcmp23: 173
memcmp13: 0
# Algorithm | Key | Encryption | Decryption
aes-xts 512b 4592.2 MiB/s 4192.0 MiB/s

The "memcmp13:" lines should both be 0. That means that an encryption
and decryption cycle didn't change the data. You *might* have to run
this in a loop if there's some kind of bad timing involved in triggering
the bug.

If you see a "memcmp13:" with something other than 0, that will narrow
things down and give us a pretty quick reproducer that doesn't involve
LUKS, which should speed things along.
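
In case it helps with the "run this in a loop" part, here is a rough
sketch of how that could be scripted. It assumes the patched binary
prints "memcmp13: <value>" lines exactly as in the sample output above.
The function names and the CRYPTSETUP and RUNS variables are made up
for illustration and are not part of cryptsetup.

```shell
#!/bin/sh
# Sketch only: helper names, CRYPTSETUP and RUNS are hypothetical.

# Exit 0 if any "memcmp13:" line on stdin carries a non-zero value,
# i.e. an encrypt/decrypt round trip changed the data.
has_bad_memcmp13() {
    awk '$1 == "memcmp13:" && $2 != 0 { bad = 1 } END { exit (bad ? 0 : 1) }'
}

# Run the benchmark up to RUNS times (default 100) and stop at the
# first run that reports a non-zero memcmp13.
run_loop() {
    i=1
    while [ "$i" -le "${RUNS:-100}" ]; do
        out=$("${CRYPTSETUP:-./src/cryptsetup}" benchmark \
            --cipher=aes-xts --key-size=512 2>&1)
        if printf '%s\n' "$out" | has_bad_memcmp13; then
            echo "memcmp13 mismatch on run $i:"
            printf '%s\n' "$out"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no memcmp13 mismatch in ${RUNS:-100} runs"
}
```

After sourcing that, something like "RUNS=1000 run_loop" from the
cryptsetup build directory would keep going until a mismatch shows up
or the run count is exhausted.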
