Re: [PATCH] block: partitions: efi: Always check for alternative GPT at end of drive

From: Austin S. Hemmelgarn
Date: Wed Apr 27 2016 - 09:00:24 EST


On 2016-04-27 02:00, Ard Biesheuvel wrote:
On 26 April 2016 at 22:34, Elliott, Robert (Persistent Memory)
<elliott@xxxxxxx> wrote:


-----Original Message-----
From: linux-kernel-owner@xxxxxxxxxxxxxxx [mailto:linux-kernel-owner@xxxxxxxxxxxxxxx] On Behalf Of Davidlohr Bueso
Sent: Tuesday, April 26, 2016 1:34 PM
To: Karel Zak <kzak@xxxxxxxxxx>
Cc: Julius Werner <jwerner@xxxxxxxxxxxx>; linux-efi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-block@xxxxxxxxxxxxxxx; Gwendal Grignou <gwendal@xxxxxxxxxxxx>; Doug Anderson <dianders@xxxxxxxxxxxx>
Subject: Re: [PATCH] block: partitions: efi: Always check for alternative GPT at end of drive

On Tue, 26 Apr 2016, Karel Zak wrote:

On Mon, Apr 25, 2016 at 06:06:46PM -0700, Julius Werner wrote:
The GUID Partition Table layout maintains two synonymous partition
tables on a block device, one starting in sector 1 and one in the
very last sectors of the block device. This is useful if one of the
tables gets accidentally corrupted (e.g. through a partial write
because of an unexpected power loss).
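To make the layout concrete, here is a minimal C sketch of where the two copies normally sit. It assumes the common default of 128 partition entries of 128 bytes each (a 16384-byte entry array) and places the primary entry array at LBA 2; the struct and function names are hypothetical and not taken from the kernel's efi.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of where the two GPT copies normally live. */
struct gpt_layout {
    uint64_t primary_header_lba;  /* primary header: always LBA 1 */
    uint64_t primary_entries_lba; /* primary entry array: assumed LBA 2 */
    uint64_t backup_entries_lba;  /* backup array: just before backup header */
    uint64_t backup_header_lba;   /* backup header: last LBA of the device */
};

/*
 * num_sectors:       device size in logical blocks
 * sector_size:       logical block size in bytes (e.g. 512 or 4096)
 * entry_array_bytes: size of one partition entry array; 128 entries of
 *                    128 bytes (16384 bytes) is a common default
 */
struct gpt_layout gpt_layout_for(uint64_t num_sectors, uint32_t sector_size,
                                 uint64_t entry_array_bytes)
{
    assert(sector_size > 0);
    /* Round the entry array up to whole sectors. */
    uint64_t entry_sectors =
        (entry_array_bytes + sector_size - 1) / sector_size;
    /* Need room for the MBR, both headers and both entry arrays. */
    assert(num_sectors >= 2 * entry_sectors + 3);

    struct gpt_layout l;
    l.primary_header_lba = 1;
    l.primary_entries_lba = 2;
    l.backup_header_lba = num_sectors - 1;
    l.backup_entries_lba = l.backup_header_lba - entry_sectors;
    return l;
}
```

For a 1,000,000-sector disk with 512-byte sectors and that default array, the backup header lands at LBA 999999 and its entry array starts 32 sectors earlier, at LBA 999967.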

Linux normally only boots if the primary GPT is valid. It will not
even try to find the alternative GPT to an invalid primary one unless
the "gpt" command line option forces more aggressive detection. This
doesn't really make any sense... if the "gpt" option is not set, the
code validates the protective or hybrid MBR in sector 0 anyway before
it even starts looking for the actual GPTs. If we get to the point
where a valid protective or hybrid MBR was found but the primary GPT
was not found (valid), checking the alternative GPT is our best bet:
we know that this

'best bet' in a kernel is not enough :) Which is why userland tools
can fix and/or do any sort of crazy stuff with the backup and recover
the primary etc etc.

Drive blocks go bad; the redundant GPTs are there to let the
system keep booting and running if that happens.

Rewriting the bad GPTs is what should require user intervention.


block device is meant to use GPT (because any other partitioning
system would've presumably overwritten sector 0), and we know that if
the alternative GPT is valid it should contain more accurate
information than parsing the protective/hybrid MBR with
msdos_partition() would yield (which would otherwise be what happens
next).

I guess "force_gpt" (and "gpt" on the kernel command line) exists to
force users to think and care about why the device has an unreadable
(broken) primary GPT header.

Yes, from find_valid_gpt():

* If the Primary GPT header is not valid, the Alternate GPT header
* is not checked unless the 'gpt' kernel command line option is passed.
* This protects against devices which misreport their size, and forces
* the user to decide to use the Alternate GPT.

... so users are at least forced in some way to think about this.
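The difference between the current rule and the one proposed in the patch can be modeled as a small decision function. This is a simplified, hypothetical sketch (none of these names exist in the kernel): it reduces each check to a boolean, ignores details such as comparing the two headers against each other, and treats GPT_NONE as "fall through to other partition parsers such as msdos_partition()".

```c
#include <assert.h>
#include <stdbool.h>

/* Which GPT header ends up being used (hypothetical names). */
enum gpt_choice { GPT_NONE, GPT_PRIMARY, GPT_ALTERNATE };

/*
 * force_gpt:       the 'gpt' kernel command line option
 * proposed:        false = rule quoted from find_valid_gpt(),
 *                  true  = behavior proposed by the patch
 * pmbr_valid:      sector 0 holds a valid protective or hybrid MBR
 * primary_valid:   the primary GPT header at LBA 1 validated
 * alternate_valid: the alternate GPT header at the last LBA validated
 */
enum gpt_choice choose_gpt(bool force_gpt, bool proposed, bool pmbr_valid,
                           bool primary_valid, bool alternate_valid)
{
    /* Without 'gpt', a valid protective/hybrid MBR is required first. */
    if (!force_gpt && !pmbr_valid)
        return GPT_NONE;
    if (primary_valid)
        return GPT_PRIMARY;
    /*
     * Primary failed. The current rule probes the alternate only with
     * 'gpt'; the patch probes it whenever we got this far, i.e. after
     * a valid protective/hybrid MBR (or with 'gpt').
     */
    if ((force_gpt || proposed) && alternate_valid)
        return GPT_ALTERNATE;
    /* GPT_NONE: the caller falls back to other parsers (e.g. msdos). */
    return GPT_NONE;
}
```

With a corrupt primary, a valid alternate, a valid protective MBR and no 'gpt' option, choose_gpt(false, false, true, false, true) yields GPT_NONE under the current rule, while the proposed rule (second argument true) yields GPT_ALTERNATE.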

It seems like a bad (and dangerous) idea to silently ignore a corrupted
primary GPT header and boot from such a device.

Yeah, there's no way in hell I trust a backup gpt in kernel space.
We simply have no way of distinguishing between good and bad devices.

And note that the alternative GPT header at the end of the device is
just a guess. The proper location of the alternative header is
specified within the primary header (pgpt->alternate_lba). The header
at the end of the device (as used for "force_gpt") is a fallback
solution only.
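The lookup order described here can be written as a short C sketch: trust AlternateLBA only when it comes from a primary header that validated, and otherwise fall back to the last LBA of the device. The function name and the range check on AlternateLBA are assumptions made for illustration; this is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: LBA at which to look for the alternate header.
 * primary_valid: the primary header passed its CRC and sanity checks
 * alternate_lba: AlternateLBA as read from the primary header
 * last_lba:      last addressable LBA reported by the device
 */
uint64_t alternate_header_lba(bool primary_valid, uint64_t alternate_lba,
                              uint64_t last_lba)
{
    assert(last_lba > 1);
    /* Fields of an unvalidated header are not trusted at all. */
    if (primary_valid && alternate_lba > 1 && alternate_lba <= last_lba)
        return alternate_lba;
    /* Fallback (the force_gpt case): the device's last LBA. */
    return last_lba;
}
```

If a device misreports its size, the fallback branch may point at a sector that holds no header at all, which is the risk the find_valid_gpt() comment guards against.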

And this only illustrates the ambiguity of the backup.

The UEFI specification is not ambiguous - you should always look
for the backup GPT Header at the last LBA:

"Two GPT Header structures are stored on the device: the primary
and the backup. The primary GPT Header must be located in LBA 1
(i.e., the second logical block), and the backup GPT Header must
be located in the last LBA of the device."

If the primary GPT Header is corrupted (e.g., CRC is bad), you
cannot trust any fields in it, including the Alternate LBA field.
The Alternate LBA field is there to help you tolerate failures
while growing or shrinking the block device size (not important
for individual physical drives, but an issue for logical drives
presented by RAID controllers).
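What "CRC is bad" means in practice: per the UEFI specification, the HeaderCRC32 field (offset 16) is a standard CRC-32 computed over HeaderSize bytes (the field at offset 12) of the header, with the CRC field itself set to zero during the calculation. A minimal C sketch of that check follows; the function names are hypothetical, and it checks only the signature, HeaderSize bounds and the CRC, not the other sanity checks a real parser performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GPT_SIGNATURE    "EFI PART"
#define GPT_HDR_SIZE_OFF 12 /* HeaderSize field */
#define GPT_HDR_CRC_OFF  16 /* HeaderCRC32 field */
#define GPT_HDR_MIN_SIZE 92 /* minimum HeaderSize */

/* Standard reflected CRC-32 (polynomial 0xEDB88320), as used by GPT. */
uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the sector holds a GPT header whose stored CRC matches. */
int gpt_header_crc_ok(const uint8_t *sector, size_t sector_size)
{
    uint8_t copy[4096];
    assert(sector != NULL);
    if (sector_size < GPT_HDR_MIN_SIZE ||
        memcmp(sector, GPT_SIGNATURE, 8) != 0)
        return 0;
    uint32_t hdr_size = get_le32(sector + GPT_HDR_SIZE_OFF);
    if (hdr_size < GPT_HDR_MIN_SIZE || hdr_size > sector_size ||
        hdr_size > sizeof(copy))
        return 0;
    memcpy(copy, sector, hdr_size);
    uint32_t stored = get_le32(copy + GPT_HDR_CRC_OFF);
    /* The CRC field is treated as zero while computing the CRC. */
    memset(copy + GPT_HDR_CRC_OFF, 0, 4);
    return crc32_ieee(copy, hdr_size) == stored;
}
```

CRC-32 detects any single-bit error, so a header damaged by a partial write is very unlikely to pass this check.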


What the UEFI spec stipulates is not really relevant for the kernel.
So the firmware must use the backup GPT if the CRC of the primary one
indicates that it is corrupted, fine. Once we are in the kernel, the
policy is currently different, which makes sense since we are not only
mounting the boot device, but other block devices as well.

No, it is relevant, considering that it's the authoritative standard for the GPT format. Sure, we have to deal with other block devices. The fact is, though, that we currently refuse to do anything with a disk that has a corrupted primary GPT but a valid secondary one. I agree that the user needs to be notified somehow that something is wrong, but refusing to work is not user-friendly behavior, and it doesn't give much specific information about what's wrong (keep in mind, most typical desktop users won't look at kernel logs, and a lot of people using embedded devices can't).

For what it's worth, Windows 7 and newer will properly read partitions on a disk with a corrupt primary GPT and a valid secondary, and I'd be willing to bet that OS X does so as well.