Re: [PATCH v1 25/29] cxl/amd: Enable Zen5 address translation using ACPI PRMT
From: Gregory Price
Date: Wed Jan 15 2025 - 12:05:43 EST
On Wed, Jan 15, 2025 at 04:05:16PM +0100, Robert Richter wrote:
> >
> > Should be dpa as argument? Confusing to convert an hpa to an hpa.
>
> We need to handle the decoder address ranges, the argument is always
> the HPA range the decoder belongs to.
I see, and this is where my confusion stems from. Basically these
addresses are considered "HPA" because they are programmed into decoders,
and decoder addresses are "always HPA".
i.e. 2 interleaved devices (endpoint decoders) with normalized addresses:
dev0: base(0x0) len(0x200000000)
dev1: base(0x0) len(0x200000000)
These are HPAs because decoders are programmed with HPAs.
It's just that in this (specific) case HPA=DPA, while root decoders and
host bridge decoders will always have HPA=SPA. We're just translating
up the stack from HPA range to HPA range.
I've been dealing with virtualization for a long time and this has been
painful for me to follow - but I think I'm getting there.
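To check my mental model, here's a rough userspace sketch of what I
understand "translating up the stack" to mean (the struct and the
scaling are mine, not anything from your series): every decoder is
programmed with an HPA range, and walking up a level multiplies the
length by that level's interleave ways until the root, where HPA == SPA:

#include <stdint.h>
#include <stdio.h>

struct decoder {
        uint64_t base;          /* HPA base programmed into the decoder */
        unsigned int ways;      /* interleave ways at this level */
};

int main(void)
{
        /* endpoint decoder: normalized, so HPA == DPA here */
        struct decoder endpoint = { .base = 0x0,          .ways = 1 };
        /* root decoder: HPA == SPA, interleaving two endpoints */
        struct decoder root     = { .base = 0xc050000000, .ways = 2 };

        uint64_t len = 0x2000000000;    /* one 128G endpoint */

        len *= endpoint.ways;           /* still 0x2000000000 */
        len *= root.ways;               /* 0x4000000000 at the root */

        printf("SPA range: %#llx-%#llx\n",
               (unsigned long long)root.base,
               (unsigned long long)(root.base + len - 1));
        /* prints 0xc050000000-0x1004fffffff */
        return 0;
}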
> > DPA(0)
> > dev0: base(0xc050000000) spa(0xc050000000)
> > dev1: base(0xc050000000) spa(0xc050000000)
> >
> > DPA(0x1fffffffff)
> > dev0: base(0xc050000000) spa(0xe04fffffff)
> > dev1: base(0xc050000000) spa(0xe04fffffff)
> >
> > The bases seem correct, the SPAs look suspect.
>
> SPA range length must be 0x4000000000 (2x 128G). That is, the upper SPA
> must be 0x1004fffffff (0xc050000000 + 0x4000000000 - 1). This one is
> too short.
>
> The decoder range lengths below look correct (0x2000000000), the
> interleaving configuration should be checked for the decoders.
>
If I understand correctly, this configuration may be suspect:
[decoder0.0]# cat start size interleave_ways interleave_granularity
0xc050000000
0x4000000000
2 <----- root decoder reports interleave ways = 2
256
[decoder1.0]# cat start size interleave_ways interleave_granularity
0xc050000000
0x4000000000
1 <----- host bridge decoder reports interleave ways = 1
256
[decoder3.0]# cat start size interleave_ways interleave_granularity
0xc050000000
0x4000000000
1 <----- host bridge decoder reports interleave ways = 1
256
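(Sanity checking the sizes myself: 2 * 0x2000000000 = 0x4000000000, so the
root window length is consistent with two 128G endpoints interleaved
2-ways; it's the ways=1 reported at the host bridge level that I want to
double check.)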
> > I do not understand this chunk here, we seem to just be chopping the HPA
> > in half to acquire the DPA. But the value passed in is already a DPA.
> >
> > dpa = (0x1fffffffff & ~(256 * 2 - 1)) / 2 + (0x1fffffffff & (256 - 1))
> > = 0xfffffffff
>
> HPA is:
>
> HPA = 2 * 0x2000000000 - 1 = 0x3fffffffff
>
... snip ...
> There is probably a broken interleaving config causing half the size
> of total device mem.
>
In my case, I never see 0x3fffffffff passed in. The value 0x1fffffffff
from the endpoint decoders is always passed in. This suggests the host
bridge interleave ways should be 2.
I can force this and figure out why it's reporting 1 and get back to you.
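While I dig into that, here is a quick userspace sketch of the interleave
math as I understand it (hpa_to_dpa() mirrors the expression quoted above,
dpa_to_hpa() is just my naive inverse; the names are made up and none of
this is the actual code from the series):

#include <stdint.h>
#include <stdio.h>

/* strip the interleave position bits and undo the ways scaling */
static uint64_t hpa_to_dpa(uint64_t hpa, uint64_t gran, uint64_t ways)
{
        return (hpa & ~(gran * ways - 1)) / ways + (hpa & (gran - 1));
}

/* rebuild the HPA for a device at interleave position 'pos' */
static uint64_t dpa_to_hpa(uint64_t dpa, uint64_t gran, uint64_t ways,
                           uint64_t pos)
{
        return (dpa & ~(gran - 1)) * ways + pos * gran + (dpa & (gran - 1));
}

int main(void)
{
        uint64_t gran = 256, ways = 2;

        /* last DPA of a 128G endpoint, 2-way interleave, position 1 */
        printf("%#llx\n",
               (unsigned long long)dpa_to_hpa(0x1fffffffff, gran, ways, 1));
        /* 0x3fffffffff -- and back again: */
        printf("%#llx\n",
               (unsigned long long)hpa_to_dpa(0x3fffffffff, gran, ways));
        /* 0x1fffffffff */
        return 0;
}

With ways=1 the same expression hands 0x1fffffffff straight back, which
matches what I'm seeing, so which decoder's ways the translation picks up
is where I'll look.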
> > dev0 (dpa -> hpa -> spa): 0x0 -> 0x0 -> 0xc050000000
> > dev1 (dpa -> hpa -> spa): 0x0 -> 0x100 -> 0xc050000100
> > dev0 (dpa -> hpa -> spa): 0x1fffffffff -> 0x3ffffffeff -> 0x1004ffffeff
> > dev1 (dpa -> hpa -> spa): 0x1fffffffff -> 0x3fffffffff -> 0x1004fffffff
>
> Yes, would be the result without the offset applied for spa2 above.
> The check above calculates the *total* length of hpa and spa without
> considering the interleaving position. This is corrected using the
> offset. There is no call prm_cxl_dpa_spa(dev0, 0x1fffffffff) that
> returns 0x1004fffffff, but we want to check the upper boundary of the
> SPA range.
>
This makes sense now: there's no direct dpa->spa translation because you
may have to go through multiple layers of translation to get there, so
the best you can do is calculate the highest possible endpoint and say
"yeah, this range is in there somewhere".
Thank you for taking the time to walk me through this, I'm sorry I've
been confused on DPA/HPA/SPA for so long - it's been a bit of a
struggle.
~Gregory