On Wed, Oct 20, 2021 at 10:25:42PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 20, 2021 at 03:08:41PM -0500, Tom Lendacky wrote:
> > On 10/20/21 2:51 PM, Peter Zijlstra wrote:
> > > On Wed, Oct 20, 2021 at 08:12:51AM -0500, Tom Lendacky wrote:
> > > > On 10/15/21 4:44 AM, tip-bot2 for Tim Chen wrote:
> > > > > The following commit has been merged into the sched/core branch of tip:
> > >
> > > If it does boot, what does something like:
> > >
> > >   for i in /sys/devices/system/cpu/cpu*/topology/*{_id,_list}; do echo -n "${i}: " ; cat $i; done
> > >
> > > produce?
> >
> > The output is about 160K in size, I'll email it to you off-list.
>
> /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list: 0
> /sys/devices/system/cpu/cpu0/topology/core_cpus_list: 0,128
> /sys/devices/system/cpu/cpu128/topology/cluster_cpus_list: 128
> /sys/devices/system/cpu/cpu128/topology/core_cpus_list: 0,128
>
> So for some reason that thing thinks each SMT thread has its own L2,
> which seems rather unlikely. Or SMT has started to mean something
> radically different than it used to be :-)
>
> Let me continue trying to make sense of cacheinfo.c
OK, I think I see what's happening.
AFAICT cacheinfo.c does *NOT* set l2c_id on AMD/Hygon hardware, this
means it's set to BAD_APICID.
This then results in match_l2c() never matching. And as a direct
consequence set_cpu_sibling_map() will generate cpu_l2c_shared_mask with
just the one CPU set.
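
To make the chain of events concrete, here is a small userspace model of it. This is only a sketch under stated assumptions, not the kernel code: every `model_*` / `MODEL_*` name is hypothetical, the sentinel value is a placeholder, and the only rules encoded are the ones described above (an unset id never matches, and a CPU always appears in its own mask, as the cluster_cpus_list output shows).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS    256
#define MODEL_BAD_APICID 0xFFFFu	/* placeholder sentinel, value assumed */

/* Hypothetical per-CPU L2 id table; not the kernel's per-CPU variable. */
static uint16_t model_l2c_id[MODEL_NR_CPUS];

/* Strict match: an unset id never matches, not even another unset id. */
static bool model_match_l2c(int cpu1, int cpu2)
{
	assert(cpu1 >= 0 && cpu1 < MODEL_NR_CPUS);
	assert(cpu2 >= 0 && cpu2 < MODEL_NR_CPUS);

	if (model_l2c_id[cpu1] == MODEL_BAD_APICID)
		return false;

	return model_l2c_id[cpu1] == model_l2c_id[cpu2];
}

/* Number of CPUs that would end up in cpu's L2 shared mask. */
static int model_l2c_mask_weight(int cpu)
{
	int weight = 1;	/* a CPU is always in its own mask */

	for (int o = 0; o < MODEL_NR_CPUS; o++)
		if (o != cpu && model_match_l2c(cpu, o))
			weight++;

	return weight;
}

/* Leave every L2 id unset, as described for AMD/Hygon above. */
static void model_leave_l2c_unset(void)
{
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		model_l2c_id[i] = MODEL_BAD_APICID;
}
```

After model_leave_l2c_unset(), model_l2c_mask_weight() is 1 for every CPU, which is the single-CPU cluster_cpus_list seen in the sysfs output. Giving CPUs 0 and 128 the same valid id makes the weight 2 for both.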
That gives us the above result, and things come unstuck if we assume:
SMT <= L2 <= LLC
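
Read as mask containment, that assumption says every SMT sibling of a CPU is also in its L2 mask, and every L2 sharer is in its LLC mask. The sketch below checks this for cpu0 using the values from the sysfs output. The `topo_*` names and the 256-CPU mask size are made up for illustration, and the LLC membership is an assumption, since the thread does not show it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOPO_WORDS 4	/* 4 * 64 = 256 CPUs, arbitrary size for the sketch */

struct topo_mask {
	uint64_t w[TOPO_WORDS];
};

static void topo_set(struct topo_mask *m, int cpu)
{
	assert(cpu >= 0 && cpu < TOPO_WORDS * 64);
	m->w[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

/* True when every CPU in a is also in b. */
static bool topo_subset(const struct topo_mask *a, const struct topo_mask *b)
{
	for (int i = 0; i < TOPO_WORDS; i++)
		if (a->w[i] & ~b->w[i])
			return false;

	return true;
}

/* SMT <= L2 <= LLC, read as mask containment. */
static bool topo_nesting_ok(const struct topo_mask *smt,
			    const struct topo_mask *l2,
			    const struct topo_mask *llc)
{
	return topo_subset(smt, l2) && topo_subset(l2, llc);
}

/* cpu0 as reported: core_cpus_list 0,128 and cluster_cpus_list 0. */
static bool topo_reported_cpu0_ok(void)
{
	struct topo_mask smt = {{0}}, l2 = {{0}}, llc = {{0}};

	topo_set(&smt, 0);
	topo_set(&smt, 128);
	topo_set(&l2, 0);
	/* LLC membership is not shown in the thread; assumed to cover both. */
	topo_set(&llc, 0);
	topo_set(&llc, 128);

	return topo_nesting_ok(&smt, &l2, &llc);
}
```

With the reported values the check fails at the first step: the SMT mask {0,128} is not contained in the L2 mask {0}.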
Now, the big question, how to fix this... Does AMD have a means of
actually setting l2c_id, or should we fall back to using match_smt() for
l2c_id == BAD_APICID ?
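
For the second option, the shape of the fallback would be roughly as follows. This is a userspace sketch, not a patch: the `sketch_*` names and the reduced struct are hypothetical, the sentinel value is assumed, and the stand-in SMT comparison only looks at a core id, whereas the real match_smt() checks more than that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BAD_APICID 0xFFFFu	/* sentinel value assumed */

/* Hypothetical, reduced view of the per-CPU topology data. */
struct sketch_cpu {
	uint16_t core_id;	/* SMT siblings share this */
	uint16_t l2c_id;	/* SKETCH_BAD_APICID when never filled in */
};

/* Stand-in for the SMT comparison; the real one checks more. */
static bool sketch_match_smt(const struct sketch_cpu *c,
			     const struct sketch_cpu *o)
{
	assert(c && o);
	return c->core_id == o->core_id;
}

/* If no L2 id was set up, fall back to the SMT relation. */
static bool sketch_match_l2c(const struct sketch_cpu *c,
			     const struct sketch_cpu *o)
{
	assert(c && o);

	if (c->l2c_id == SKETCH_BAD_APICID)
		return sketch_match_smt(c, o);

	return c->l2c_id == o->l2c_id;
}
```

With this, two threads of the same core match at the L2 level even when l2c_id was never set, so the L2 mask is at least as large as the SMT mask and the SMT <= L2 half of the assumption holds again. It says nothing about cores that really do share an L2; that still needs a valid l2c_id.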