Re: [Xen-devel] [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit

From: Boris Ostrovsky
Date: Thu Apr 20 2017 - 12:02:59 EST


On 04/20/2017 11:40 AM, Vitaly Kuznetsov wrote:
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
>
>> On Thu, Apr 20, 2017 at 03:24:53PM +0200, Vitaly Kuznetsov wrote:
>>> In this patch I suggest we set __max_logical_packages based on the
>>> max_physical_pkg_id and total_cpus,
>> So my 4 socket 144 CPU system will then get max_physical_pkg_id=144,
>> instead of 4.
>>
>> This wastes quite a bit of memory for the per-node arrays. Luckily most
>> are just pointer arrays, but still, wasting 140*8 bytes for each of
>> them.
>>
>>> this should be safe and cover all
>>> possible cases. Alternatively, we may think about eliminating the concept
>>> of __max_logical_packages completely and relying on max_physical_pkg_id/
>>> total_cpus where we currently use topology_max_packages().
>>>
>>> The issue could've been solved in Xen too I guess. CPUID returning
>>> x86_max_cores can be tweaked to be the lowerest(?) possible number of
>>> all logical packages of the guest.
>> This is getting ludicrous. Xen is plain broken, and instead of fixing
>> it, you propose to somehow deal with its obviously crack induced
>> behaviour :-(
> Totally agree and I don't like the solution I propose (and that's why
> this is RFC)... The problem is that there are such Xen setups in the
> wild and with the recent changes some guests will BUG() :-(
>
> Alternatively, we can just remove the BUG() and do something with CPUs
> which have their pkg >= __max_logical_packages, e.g. assign them to the
> last package. Far from ideal but will help to avoid the regression.

Do you observe this failure on PV or HVM guest?

We've had a number of issues with topology discovery for PV guests but
AFAIK they have been addressed (so far). I wonder though whether it
would make sense to have some sort of a callback (or an smp_ops.op) to
override native topology info, if needed.

-boris