Re: [RFC PATCH] xen/x86: Increase xen_e820_map to E820_X_MAX possible entries
From: Juergen Gross
Date: Tue Nov 15 2016 - 04:56:00 EST
On 15/11/16 10:45, Jan Beulich wrote:
>>>> On 15.11.16 at 09:42, <JGross@xxxxxxxx> wrote:
>> On 15/11/16 09:01, Jan Beulich wrote:
>>>>>> On 15.11.16 at 08:36, <JGross@xxxxxxxx> wrote:
>>>> On 15/11/16 08:15, Jan Beulich wrote:
>>>>>>>> On 15.11.16 at 07:33, <JGross@xxxxxxxx> wrote:
>>>>>> In case I'm right the Xen hypervisor should be prepared for a larger
>>>>>> e820 map, but this won't help alone as there would still be additional
>>>>>> entries for the IOAPICs created.
>>>>>>
>>>>>> So I think we need something like:
>>>>>>
>>>>>> #define E820_XEN_MAX (E820_X_MAX + MAX_IO_APICS)
>>>>>>
>>>>>> and use this for sizing xen_e820_map[].
>>>>>
>>>>> I would say that if any change gets done here, there shouldn't be
>>>>> any static upper limit at all. That could even be viewed as in line
>>>>> with recent e820.c changes moving to dynamic allocations. In
>>>>> particular I don't see why MAX_IO_APICS would need adding in
>>>>> here, but not other (current and future) factors determining the
>>>>> (pseudo) E820 map Xen presents to Dom0.
>>>>
>>>> The hypervisor interface of XENMEM_machine_memory_map requires to
>>>> specify the size of the e820 map where the hypervisor can supply
>>>> entries. The alternative would be try and error: start with a small
>>>> e820 map and in case of error increasing the size until success. I
>>>> don't like this approach. Especially as dynamic memory allocations
>>>> are not possible at this stage (using RESERVE_BRK() isn't any better
>>>> than a static __initdata array IMO).
>>>
>>> Well, I think as a first step we should extend
>>> XENMEM_{,machine_}memory_map to the model used elsewhere
>>> where passing in a NULL handle (and perhaps count being zero) is
>>> a request for the number of needed entries. Granted this doesn't
>>> help with Linux'es way of consuming the output, but it at least
>>> allows for dynamic sizing. And Linux would then be free to prepare
>>> a static array or have a RESERVE_BRK() as it sees fit.
>>
>> This still needs the definition of an upper limit, as RESERVE_BRK()
>> is a compile time feature.
>
> That's why I did say "as it sees fit".
>
>> For a fully dynamical solution we'd need a way to get a partial
>> E820 map from the hypervisor (e.g. first 128 entries) in order to
>> be able to setup at least some memory and later get the rest of
>> the memory map using some dynamically allocated memory.
>
> And we could of course also make the hypercall allow for that (e.g.
> by defining the semantics of a specific error code, so far not used
> by it, to avoid mis-interpretation of output on older hypervisors),
> or introduce a new clone of the existing one(s).
I'd go with the new error code. What about E2BIG or ENOSPC?
I think the hypervisor should fill in the number of entries required
in this case.
In case nobody objects I can post patches for this purpose (both Xen
and Linux).
Juergen