Re: [RFC PATCH v0 0/6] x86/AMD: Userspace address tagging

From: Andy Lutomirski
Date: Fri Apr 01 2022 - 15:41:38 EST

On Wed, Mar 23, 2022, at 12:48 AM, Bharata B Rao wrote:
> On 3/22/2022 3:59 AM, Andy Lutomirski wrote:
>> On Thu, Mar 10, 2022, at 3:15 AM, Bharata B Rao wrote:
>>> Hi,
>>>
>>> This patchset makes use of Upper Address Ignore (UAI) feature available
>>> on upcoming AMD processors to provide user address tagging support for x86/AMD.
>>>
>>> UAI allows software to store a tag in the upper 7 bits of a logical
>>> address [63:57]. When enabled, the processor will suppress the
>>> traditional canonical address checks on the addresses. More information
>>> about UAI can be found in section 5.10 of 'AMD64 Architecture
>>> Programmer's Manual, Vol 2: System Programming' which is available from
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=300549
>>
>> I hate to be a pain, but I'm really not convinced that this feature is suitable for Linux. There are a few reasons:
>>
>> Right now, the concept that the high bit of an address determines whether it's a user or a kernel address is fairly fundamental to the x86_64 (and x86_32!) code. It may not be strictly necessary to preserve this, but violating it would require substantial thought. With UAI enabled, kernel and user addresses are, functionally, interleaved. This makes things like access_ok checks, and more generally anything that operates on a range of addresses, behave potentially quite differently. A lot of auditing of existing code would be needed to make it safe.
>
> Ok, got that. However, can you point me to a few instances in the current
> kernel code where such an assumption of the high bit being the user/kernel
> address differentiator exists, so that I get some idea of what it takes to
> audit all such cases?

Anything that assumes that an address >= some value is a kernel address.
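
To make that concrete, here is a minimal userspace sketch (not actual kernel code) of that kind of threshold check. USER_LIMIT, looks_like_kernel() and tag_addr() are made-up names for illustration, and the limit assumes a 47-bit user address space; the tag position follows the [63:57] layout described in the cover letter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit: first address past a 47-bit user address space. */
#define USER_LIMIT	0x0000800000000000ULL
/* UAI tag occupies bits [63:57], per the cover letter. */
#define UAI_TAG_SHIFT	57

/* The pattern in question: "address >= some value means kernel". */
static bool looks_like_kernel(uint64_t addr)
{
	return addr >= USER_LIMIT;
}

/* Store a 7-bit tag in bits [63:57] of a user pointer. */
static uint64_t tag_addr(uint64_t addr, uint8_t tag)
{
	return addr | ((uint64_t)(tag & 0x7f) << UAI_TAG_SHIFT);
}
```

With these definitions, any user pointer carrying a nonzero tag compares >= USER_LIMIT, so a check of this shape classifies it as a kernel address even though the hardware would treat it as a user access.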

>
> Also, wouldn't the problem of the high bit be solved by using only
> 6 of the 7 available bits in UAI and leaving the 63rd bit alone?
> The hardware will still ignore the top bit, but this should take
> care of the requirement of high bit being 0/1 for user/kernel in the
> x86_64 kernel. Wouldn't that work?

Maybe, but that seems quite ugly. This will make the userspace and kernel semantics diverge.
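
For reference, the six-bit variant would look roughly like the sketch below. This is only an illustration of the bit layout being discussed; the macro and function names are hypothetical and nothing here is taken from the series. Software would only ever touch bits [62:57], while the hardware, as noted in the quoted text, would still ignore bit 63 as well.

```c
#include <stdint.h>

#define TAG_SHIFT	57
/* Bits software would use in the proposed variant: [62:57]. */
#define TAG6_MASK	(0x3fULL << TAG_SHIFT)
/* Bits the hardware ignores with UAI enabled: [63:57]. */
#define HW_IGNORED_MASK	(0x7fULL << TAG_SHIFT)

/* Replace the 6-bit tag of addr, leaving bit 63 untouched. */
static uint64_t set_tag6(uint64_t addr, uint8_t tag)
{
	return (addr & ~TAG6_MASK) | (((uint64_t)tag & 0x3f) << TAG_SHIFT);
}

/* Strip the 6-bit tag, again leaving bit 63 untouched. */
static uint64_t untag6(uint64_t addr)
{
	return addr & ~TAG6_MASK;
}
```

The difference between the two masks is exactly bit 63: software would treat it as significant while the hardware would not check it.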

>
>>
>> UAI looks like it wasn't intended to be context switched and, indeed, your series doesn't context switch it. As far as I'm concerned, this is an error, and if we support UAI at all, we should context switch it. Yes, this will be slow, perhaps painfully slow. AMD knows how to fix it by, for example, reading the Intel SDM. By *not* context switching UAI, we force it on for all user code, including unsuspecting user code, as well as for kernel code. Do we actually want it on for kernel code? With LAM, in contrast, the semantics for kernel pointers vs user pointers actually make sense and can be set per mm, which will make things like io_uring (in theory) do the right thing.
>
> I plan to enable/disable UAI based on the next task's settings by
> doing MSR write to EFER during context switch. I will have to measure
> how much additional cost an MSR write in context switch path brings in.
> However, given that without a hardware feature like ARM64 MTE this would
> primarily be used in non-production environments, I wonder if the MSR
> write cost could be tolerated?

I'm not sure what you mean by a feature like ARM64 MTE.

>
> Regarding enabling UAI for kernel, I will have to check how clean and
> efficient it would be to disable/enable UAI on user/kernel entry/exit
> points.
>
>>
>> UAI and LAM are incompatible from a userspace perspective. Since LAM is pretty clearly superior [0], it seems like a better long term outcome would be for programs that want tag bits to target LAM and for AMD to support LAM if there is demand. For that matter, do we actually expect any userspace to want to support UAI? (Are there existing too-clever sandboxes that would be broken by enabling UAI?)
>>
>> Given that UAI is not efficiently context switched, the implementation of uaccess is rather bizarre. With the implementation in this series in particular, if the access_ok checks ever get out of sync with actual user access, a user access could be emitted with the high bits not masked despite the range check succeeding due to masking, which would, unless great care is taken, result in a "user" access hitting the kernel range. That's no good.
>
> Okay, I guess if context switching and sticking to 6 bits as mentioned
> earlier are feasible, this concern too goes away, unless I am missing something.

I think it does go away.
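
As a rough userspace sketch of why (not kernel code; USER_LIMIT, range_ok() and the two check helpers are hypothetical names, and the limit assumes a 47-bit user address space): if the range check masks all seven bits, a pointer with bit 63 set passes the check, so an access that then uses the unmasked pointer would be in the kernel half. If only bits [62:57] are masked, the same pointer fails the check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit: first address past a 47-bit user address space. */
#define USER_LIMIT	0x0000800000000000ULL
#define TAG7_MASK	(0x7fULL << 57)	/* bits [63:57] */
#define TAG6_MASK	(0x3fULL << 57)	/* bits [62:57] */

/* Overflow-safe check that [addr, addr + len) lies below USER_LIMIT. */
static bool range_ok(uint64_t addr, uint64_t len)
{
	return len <= USER_LIMIT && addr <= USER_LIMIT - len;
}

/* Range check that masks all seven tag bits, including bit 63. */
static bool check_masking_7(uint64_t addr, uint64_t len)
{
	return range_ok(addr & ~TAG7_MASK, len);
}

/* Range check that masks only six tag bits; bit 63 stays significant. */
static bool check_masking_6(uint64_t addr, uint64_t len)
{
	return range_ok(addr & ~TAG6_MASK, len);
}
```

For a pointer such as 0x00007f0000001000 with tag 0x40 (which sets bit 63), check_masking_7() succeeds while check_masking_6() fails, so in the six-bit scheme a check that succeeds implies the raw pointer has bit 63 clear.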