Re: [PATCH v1 2/2] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory
From: Will Deacon
Date: Thu Oct 12 2023 - 12:39:42 EST

On Thu, Oct 12, 2023 at 12:44:39PM -0300, Jason Gunthorpe wrote:
> On Thu, Oct 12, 2023 at 03:48:08PM +0100, Will Deacon wrote:
>
> > I guess my wider point is that I'm not convinced that non-cacheable is
> > actually much better and I think we're going way off the deep end looking
> > at what particular implementations do and trying to justify to ourselves
> > that non-cacheable is safe, even though it's still a normal memory type
> > at the end of the day.
>
> When we went over this with ARM it became fairly clear there wasn't an
> official statement that Device-* is safe from uncontained
> failures. For instance, looking at the actual IP, our architects
> pointed out that ARM IP already provides ways for Device-* to trigger
> uncontained failures today.
>
> We then mutually concluded that KVM safe implementations must already
> be preventing uncontained failures for Device-* at the system level
> and that same prevention will carry over to NormalNC as well.
>
> IMHO, this seems to be a gap where ARM has not fully defined when
> uncontained failures are allowed and left that as an implementation
> choice.
>
> In other words, KVM safety around uncontained failure is not a
> property that can be reasoned about from the ARM architecture alone.
>
> > The current wording talks about use-cases (I get this) and error containment
> > (it's a property of the system) but doesn't talk at all about why Normal-NC
> > is the right result.
>
> Given that Device-* and NormalNC are equally implementation defined
> with regards to uncontained failures, NormalNC allows more VM
> functionality.
>
> Further, we have a broad agreement that this use case is important,
> and that NormalNC is the correct way to address it.
>
> I think you are right to ask for more formality from ARM team but also
> we shouldn't hold up fixing real functional bugs in real shipping
> server ARM products.

All I'm asking for is justification as to why Normal-NC is the right
memory type rather than any other normal memory type. If it's not possible
to explain that architecturally, then I'm not sure this change belongs in
architecture code.

Ultimately, we need to be able to maintain this stuff, so we can't just
blindly implement changes based on a combination of off-list discussions
and individual product needs. For example, if somebody else rocks up
tomorrow and asks for this to be Normal-writethrough, what grounds do
we have to say no if we've taken this change already?

So please let's get to a point where we can actually reason about this.

Will