Re: [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers

From: Dave Hansen
Date: Mon Oct 24 2022 - 13:09:49 EST


On 10/22/22 09:12, Zhang Rui wrote:
>>> I'm not sure if we have a perfect solution here.
>> Are the implementations fixable?
> currently, I don't have any idea.
>
>> Or, at least tolerable?

That would be great to figure out before we start throwing more patches
around.

>> For instance, I can live with the implementation being a bit goofy
>> when kernel commandlines are in play. We can pr_info() about those
>> cases.
> My understanding is that the cpus in the last package may still have
> small cpu id values. This means that 'logical_packages' is hard to
> break unless we boot with a very small CPU count and happen to disable
> all cpus in one or more packages. Feng is experimenting with this and
> may have some update later.
>
> If this is the case, is this a valid case that we need to take care of?

Well, let's talk through it a bit.

What is the triggering event and what's the fallout?

Is the user on a truly TSC stable system or not?

What kind of maxcpus= argument do they need to specify? Is it something
that's likely to get used in production or is it most likely just for
debugging?

What is the maxcpus= fallout? Does it overestimate or underestimate
the number of logical packages?

How many cases outside of maxcpus= do we know of that lead to an
imprecise "logical packages" calculation?

Does this lead to the TSC being mistakenly marked stable when it is not,
or *not* being marked stable when it is?

Let's get all of that info in one place and make sure we are all agreed
on the *problem* before we get to the solution space.