Re: [PATCH] SQUASHME: Fixes to e820 handling of pmem

From: Boaz Harrosh
Date: Thu Apr 02 2015 - 07:20:57 EST


On 04/02/2015 12:30 PM, Christoph Hellwig wrote:
> On Wed, Apr 01, 2015 at 05:25:22PM +0300, Boaz Harrosh wrote:
>> pfn = PFN_DOWN(ei->addr + ei->size);
>>
>> - switch (ei->type) {
>> - case E820_RAM:
>> - case E820_PRAM:
>> - case E820_RESERVED_KERN:
>> - break;
>> - default:
>> + if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
>> register_nosave_region(PFN_UP(ei->addr), pfn);
>> - }
>
> I guess this makes sense - if the content is persistent already we don't need
> to save it.
>

Yes

>> - if (e820.map[i].type != E820_RESERVED || res->start < (1ULL<<20)) {
>> - if (e820.map[i].type != E820_PRAM)
>> - res->flags |= IORESOURCE_BUSY;
>> + if (((e820.map[i].type != E820_RESERVED) &&
>> + (e820.map[i].type != E820_PRAM)) ||
>> + res->start < (1ULL<<20)) {
>
> So now we also trigger for PRAM regions under 1ULL<<20, was that the
> intentional change? Honestly I don't really understand this 1ULL<<20
> magic here even for the existing case. Guess this is magic from the
> old ISA PC days?
>

OK so by now I have had a chance to test all cases and figure out
what happened. I was running on your V1 and then V2. And had huge
problems with NUMA. The thing that actually fixed everything and
brought the system back to life, was already in your V3.

It was the remove of reserve_pmem() and the call to memblock_reserve()
and init_memory_mapping() from within. It was making a mess.

So your V3 was already running nice. But I already fixed everything
on my side by the time you sent V3, and what I sent here is the
diff from what I had and your V3.

But these are all good fixes still. (Though not fatal as V2 was)

3 small fixes here:
* Adding back the memmap=! now that everything works perfectly
as expected.

* The one above that fixes register_nosave_region

and ...
>> + res->flags |= IORESOURCE_BUSY;
>
> Guess this is the real change, and I'd love to understand why this
> makes a difference for you. IORESOURCE_BUSY is checked almost never,
> and is intented to mean it's a driver mapping.

This here is a very minor thing. I have lots of experience with this
code and with the internals of insert_resource() from my old e820.c
fixes (Which are BTW still relevant but no longer needed for any
current platform)

So what happens here is just the adding of the IORESOURCE_BUSY flag.
Actually all these resources are already in the resource list and this
code is just changing the flag. So if you are not changing the flag there
is just no point in calling insert_resource(). It will change nothing.

>From what I understand from the history of this file and from prints
that I put in this file and at the resource manager. The logic is that
it wants to make all E820_XXX busy so to not let any driver try and claim them.
And Also the code does not want to allow any kind of e820 resource below the 1M
be allowed for drivers, reserved or otherwise.

Any E820_RESERVED regions it allows for driver use by not setting IORESOURCE_BUSY.

Your code and mine are effectively the same only that I optimize out the call to
insert_resource() since the flags were not changed.

Testing status:
We had some birth problems with the new system and the APIs that changed so the
testing rig broke half through the night. But we have wrinkled out the last
problems and the machines are pumping away full steam, NUMA and everything.
So far it looks good. I will very soon now, Send you a tested-by: That
confirms these patches are just as good as what we had until now out-of-tree.
(I'm running with my page-struct patches on top of your pmem driver. Will publish
trees soon)

Thank you for your patience
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/