Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries
From: Jan Beulich
Date: Tue Nov 15 2016 - 04:45:13 EST
>>> On 15.11.16 at 09:42, <JGross@xxxxxxxx> wrote:
> On 15/11/16 09:01, Jan Beulich wrote:
>>>>> On 15.11.16 at 08:36, <JGross@xxxxxxxx> wrote:
>>> On 15/11/16 08:15, Jan Beulich wrote:
>>>>>>> On 15.11.16 at 07:33, <JGross@xxxxxxxx> wrote:
>>>>> In case I'm right the Xen hypervisor should be prepared for a larger
>>>>> e820 map, but this won't help alone as there would still be additional
>>>>> entries for the IOAPICs created.
>>>>>
>>>>> So I think we need something like:
>>>>>
>>>>> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)
>>>>>
>>>>> and use this for sizing xen_e820_map[].
>>>>
>>>> I would say that if any change gets done here, there shouldn't be
>>>> any static upper limit at all. That could even be viewed as in line
>>>> with recent e820.c changes moving to dynamic allocations. In
>>>> particular I don't see why MAX_IO_APICS would need adding in
>>>> here, but not other (current and future) factors determining the
>>>> (pseudo) E820 map Xen presents to Dom0.
>>>
>>> The hypervisor interface of XENMEM_machine_memory_map requires to
>>> specify the size of the e820 map where the hypervisor can supply
>>> entries. The alternative would be try and error: start with a small
>>> e820 map and in case of error increasing the size until success. I
>>> don't like this approach. Especially as dynamic memory allocations
>>> are not possible at this stage (using RESERVE_BRK() isn't any better
>>> than a static __initdata array IMO).
>>
>> Well, I think as a first step we should extend
>> XENMEM_{,machine_}memory_map to the model used elsewhere
>> where passing in a NULL handle (and perhaps count being zero) is
>> a request for the number of needed entries. Granted this doesn't
>> help with Linux'es way of consuming the output, but it at least
>> allows for dynamic sizing. And Linux would then be free to prepare
>> a static array or have a RESERVE_BRK() as it sees fit.
>
> This still needs the definition of an upper limit, as RESERVE_BRK()
> is a compile time feature.
That's why I did say "as it sees fit".
> For a fully dynamical solution we'd need a way to get a partial
> E820 map from the hypervisor (e.g. first 128 entries) in order to
> be able to setup at least some memory and later get the rest of
> the memory map using some dynamically allocated memory.
And we could of course also make the hypercall allow for that (e.g.
by defining the semantics of a specific error code, so far not used
by it, to avoid mis-interpretation of output on older hypervisors),
or introduce a new clone of the existing one(s).
Jan