Re: [RFC][Patch v11 1/2] mm: page_hinting: core infrastructure

From: David Hildenbrand
Date: Mon Jul 15 2019 - 05:26:30 EST


On 10.07.19 22:45, Dave Hansen wrote:
> On 7/10/19 12:51 PM, Nitesh Narayan Lal wrote:
>> +struct zone_free_area {
>> + unsigned long *bitmap;
>> + unsigned long base_pfn;
>> + unsigned long end_pfn;
>> + atomic_t free_pages;
>> + unsigned long nbits;
>> +} free_area[MAX_NR_ZONES];
>
> Why do we need an extra data structure? What's wrong with putting
> per-zone data in ... 'struct zone'? The cover letter claims that it
> doesn't touch core-mm infrastructure, but if it depends on mechanisms
> like this, I think that's a very bad thing.
>
> To be honest, I'm not sure this series is worth reviewing at this point.
> It's horribly lightly commented and full of kernel antipatterns like
>
> void func()
> {
> if () {
> ... indent entire logic
> ... of function
> }
> }

"full of". Hmm.

>
> It has big "TODO"s. It's virtually comment-free. I'm shocked it's at
> the 11th version and still looking like this.
>
>> +
>> + for (zone_idx = 0; zone_idx < MAX_NR_ZONES; zone_idx++) {
>> + unsigned long pages = free_area[zone_idx].end_pfn -
>> + free_area[zone_idx].base_pfn;
>> + bitmap_size = (pages >> PAGE_HINTING_MIN_ORDER) + 1;
>> + if (!bitmap_size)
>> + continue;
>> + free_area[zone_idx].bitmap = bitmap_zalloc(bitmap_size,
>> + GFP_KERNEL);
>
> This doesn't support sparse zones. We can have zones with massive
> spanned page sizes, but very few present pages. On those zones, this
> will exhaust memory for no good reason.

Yes, AFAIKS, sparse zones are problematic when we have NORMAL/MOVABLE mixed.

1 bit for 2MB, 1 byte for 16MB, 64 bytes for 1GB

IOW, this isn't optimal but only really problematic for big systems /
very huge sparse zones.
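
To put numbers on that, here is a rough userspace sketch of the sizing
arithmetic. It assumes 4 KiB base pages and PAGE_HINTING_MIN_ORDER == 9
(which is what "1 bit for 2MB" implies); the helper names are made up for
illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, so order 9 == 512 pages == one 2 MiB block. */
#define PAGE_HINTING_MIN_ORDER	9

/*
 * Bits needed for the spanned PFN range, one bit per 2 MiB block.
 * Mirrors the "(pages >> PAGE_HINTING_MIN_ORDER) + 1" in the patch.
 * Hypothetical helper, not a function from the series.
 */
unsigned long hinting_bitmap_bits(unsigned long base_pfn, unsigned long end_pfn)
{
	assert(end_pfn >= base_pfn);
	return ((end_pfn - base_pfn) >> PAGE_HINTING_MIN_ORDER) + 1;
}

/*
 * Bytes backing the bitmap, rounded up to whole unsigned longs, which is
 * how bitmap_zalloc() sizes its allocation. Hypothetical helper.
 */
size_t hinting_bitmap_bytes(unsigned long bits)
{
	size_t bits_per_long = 8 * sizeof(unsigned long);

	return ((bits + bits_per_long - 1) / bits_per_long) *
	       sizeof(unsigned long);
}
```

With these assumptions, a zone spanning 1 TiB needs 524289 bits, i.e.
roughly 64 KiB of bitmap, no matter how few of its pages are present.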

>
> Comparing this to Alex's patch set, it's of much lower quality and at a
> much earlier stage of development. The two sets are not really even
> comparable right now. This certainly doesn't sell me on (or even really

To be honest, I find this statement quite harsh. Nitesh's hard work in
the previous RFCs and many discussions with Alex essentially resulted
in the two approaches we have right now. Alex's approach would not look
the way it looks today without Nitesh's RFCs.

So much for that.

> enumerate the deltas in) this approach vs. Alex's.

I am aware that memory hotplug is not properly supported yet (future
work). Sparse zones work but may end up wasting a handful of pages (!) -
future work. Anything else you are aware of that is missing?

My opinion:

1. Alex's solution is clearly beneficial, as we don't need to manage/scan
a bitmap. *However*, we were concerned right from the beginning whether
core-buddy modifications would be accepted upstream for a purely
virtualization-specific (as of now!) feature. If we can get it upstream,
perfect. Back when we discussed the idea with Alex I was skeptical - I
was expecting way more core modifications.

2. We were looking for an alternative solution that doesn't require
modifying the buddy. We have that now - yes, some things have to be worked
out and cleaned up, not arguing against that. A cleaned-up version of
this RFC with some fixes and enhancements should be ready to be used in
*many* (not all) setups. Which is perfectly fine.

So in summary, I think we should try our best to get Alex's series into
shape and accepted upstream. However, if we get upstream resistance or
it takes ages to get it in, I think we can start with this series
here (which requires no major buddy modifications as of now) and then
slowly see if we can convert it into Alex's approach.

The important part for me is that the core<->driver interface and the
virtio interface are in clean shape, so we can essentially swap out the
implementation-specific parts in the core.
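
As a purely illustrative sketch of what "swappable" could mean here:
every name below is hypothetical and appears in neither series. The core
calls through a small backend interface, the driver only ever sees a list
of free blocks, and the bitmap-based tracking could later be replaced by
a buddy-integrated one without touching the driver side.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: how the core finds free blocks (bitmap scan, buddy, ...). */
struct hinting_backend {
	/* Fill pfns[] with up to max free-block start PFNs; return count. */
	size_t (*collect_free)(unsigned long *pfns, size_t max);
};

/* Hypothetical: how free blocks reach the hypervisor (e.g. via virtio). */
struct hinting_driver {
	void (*report)(const unsigned long *pfns, size_t count);
};

/* One hinting pass; independent of which backend is plugged in. */
size_t hinting_run_once(const struct hinting_backend *be,
			const struct hinting_driver *drv,
			unsigned long *buf, size_t max)
{
	size_t n;

	assert(be && drv && buf);
	n = be->collect_free(buf, max);
	if (n)
		drv->report(buf, n);
	return n;
}

/* Toy stand-in for a bitmap-based backend: reports two fixed blocks. */
static size_t toy_bitmap_collect(unsigned long *pfns, size_t max)
{
	size_t n = 0;

	if (n < max)
		pfns[n++] = 0;
	if (n < max)
		pfns[n++] = 512;
	return n;
}

/* Toy stand-in for a buddy-integrated backend: reports one fixed block. */
static size_t toy_buddy_collect(unsigned long *pfns, size_t max)
{
	if (!max)
		return 0;
	pfns[0] = 1024;
	return 1;
}

/* Toy driver: only counts how many blocks were reported. */
size_t toy_reported_blocks;

static void toy_report(const unsigned long *pfns, size_t count)
{
	(void)pfns;
	toy_reported_blocks += count;
}

const struct hinting_backend toy_bitmap_backend = {
	.collect_free = toy_bitmap_collect,
};
const struct hinting_backend toy_buddy_backend = {
	.collect_free = toy_buddy_collect,
};
const struct hinting_driver toy_driver = { .report = toy_report };
```

Passing toy_buddy_backend instead of toy_bitmap_backend to
hinting_run_once() changes nothing on the driver side, which is the
property that matters for converting one approach into the other later.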

Cheers.

--

Thanks,

David / dhildenb