Re: [PATCH v3 1/3] arm64, vmcoreinfo : Append 'PTRS_PER_PGD' to vmcoreinfo

From: James Morse
Date: Fri Jun 07 2019 - 11:16:00 EST


Hi Bhupesh,

(sorry for the delay on this)

On 04/05/2019 13:53, Bhupesh Sharma wrote:
> On 04/03/2019 11:24 PM, Bhupesh Sharma wrote:
>> On 04/02/2019 10:56 PM, James Morse wrote:
>>> Yes the kernel code is going to move around, this is why the information we expose via
>>> vmcoreinfo needs to be thought through: something we would always need, regardless of how
>>> the kernel implements it.
>>>

>>> Pointer-auth changes all this again, as we may prefer to use the bits for pointer-auth in
>>> one TTB or the other. PTRS_PER_PGD may show the 52bit value in this case, but neither TTBR
>>> is mapping 52bits of VA.
>>>
>>>
>>>> So far, I have generally come across discussions where the following variations of the
>>>> address spaces have been proposed/requested:
>>>> - 48bit kernel VA + 48-bit User VA,
>>>> - 48-bit kernel VA + 52-bit User VA,
>>>
>>> + 52bit kernel, because there are excessive quantities of memory, and the kernel maps it
>>> all, but 48-bit user, because it never maps all the memory, and we prefer the bits for
>>> pointer-auth.
>>>
>>>> - 52-bit kernel VA + 52-bit User VA.
>>>
>>> And... all four may happen with the same built image. I don't see how you can tell these
>>> cases apart with the one (build-time-constant!) PTRS_PER_PGD value.
>>>
>>> I'm sure some of these cases are hypothetical, but by considering it all now, we can avoid
>>> three more kernel:vmcoreinfo updates, and three more fix-user-space-to-use-the-new-value.
>>
>> Agree.
>>
>>> I think you probably do need PTRS_PER_PGD, as this is the one value the mm is using to
>>> generate page tables. I'm pretty sure you also need T0SZ and T1SZ to know if that's
>>> actually in use, or the kernel is bodging round it with an offset.
>>
>> Sure, I am open to suggestions (as I realize that we need an additional VA_BITS_ACTUAL
>> variable export'ed for 52-bit kernel VA changes).

(stepping back a bit:)

I'm against exposing arch-specific #ifdefs that correspond to how we've configured the
arch code's interactions with mm. These are all moving targets; we can't have any of it
become ABI.

I have a straw-man for this: What is the value of PTE_FILE_MAX_BITS on your system?
I have no idea what this value is or means, an afternoon's archaeology would be needed(!).
This is something that made sense for one kernel version, a better idea came along, and it
was replaced. If we'd exposed this to user-space, we'd have to generate a value, even if
it doesn't mean anything. Exposing VA_BITS_ACTUAL is the same.

(Keep an eye out for when we change the kernel memory map, and any second-guessing based
on VA_BITS turns out to be wrong)


What we do have are the hardware properties. The kernel can't change these.


>> Also how do we standardize reading T0SZ and T1SZ in user-space? Do you propose I make an
>> enhancement in the cpu-feature-registers interface (see [1]) or the HWCAPS interface
>> (see [2]) for the same?

cpufeature won't help you if you've already panic()d and only have the vmcore file. This
stuff needs to go in vmcoreinfo.

As long as there is a description of how userspace uses these values, I think adding
key/values for TCR_EL1.TxSZ to the vmcoreinfo is a sensible way out of this. You probably
need TTBR1_EL1.BADDR too. (it should be specific fields, to prevent 'new uses' becoming ABI)

This tells you how the hardware was configured, and covers any combination of TxSZ tricks
we play, and whether those address bits are used for VA, or ptrauth for TTBR0 or TTBR1.
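
As a rough illustration of how user-space could consume such fields, here is a minimal
sketch. It assumes the vmcoreinfo note is the usual 'KEY=value' text, and that the fields
appear under the key names NUMBER(TCR_EL1_T0SZ) and NUMBER(TCR_EL1_T1SZ); those key names
and the helper names are hypothetical, nothing in this thread defines them. The only
hardware fact relied on is that a translation region covers 2^(64 - TxSZ) bytes, with
TTBR0 mapping from the bottom of the address space and TTBR1 from the top.

```python
def parse_vmcoreinfo(text):
    """Parse vmcoreinfo 'KEY=value' lines into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            info[key.strip()] = value.strip()
    return info


def va_bits(txsz):
    """Number of VA bits translated via a TTBR, given its TCR_EL1.TxSZ field."""
    if not 0 <= txsz <= 63:  # TxSZ is a 6-bit field
        raise ValueError("TxSZ out of range: %d" % txsz)
    return 64 - txsz


def ttbr0_range(t0sz):
    """(first, last) address covered by TTBR0: grows up from zero."""
    return 0, (1 << va_bits(t0sz)) - 1


def ttbr1_range(t1sz):
    """(first, last) address covered by TTBR1: grows down from 2^64 - 1."""
    return (1 << 64) - (1 << va_bits(t1sz)), (1 << 64) - 1


# Hypothetical key names, for illustration only.
sample = "NUMBER(TCR_EL1_T0SZ)=16\nNUMBER(TCR_EL1_T1SZ)=12\n"
info = parse_vmcoreinfo(sample)
t0sz = int(info["NUMBER(TCR_EL1_T0SZ)"], 0)
t1sz = int(info["NUMBER(TCR_EL1_T1SZ)"], 0)
print("user VA bits:", va_bits(t0sz), "kernel VA bits:", va_bits(t1sz))
print("kernel VA starts at", hex(ttbr1_range(t1sz)[0]))
```

With these sample values the two halves come out differently sized (48-bit user, 52-bit
kernel), which is exactly the case a single build-time PTRS_PER_PGD can't describe.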


> Any comments on the above points? At the moment we have to carry these fixes in the
> distribution kernels and I would like to have these fixed in upstream kernel itself.


Thanks,

James