Re: [PATCH v13 4/5] mm: support reporting free page blocks

From: Michal Hocko
Date: Thu Aug 03 2017 - 06:44:26 EST


On Thu 03-08-17 18:42:15, Wei Wang wrote:
> On 08/03/2017 05:11 PM, Michal Hocko wrote:
> >On Thu 03-08-17 14:38:18, Wei Wang wrote:
[...]
> >>+static int report_free_page_block(struct zone *zone, unsigned int order,
> >>+ unsigned int migratetype, struct page **page)
> >This is just too ugly and wrong actually. Never provide struct page
> >pointers outside of the zone->lock. What I've had in mind was to simply
> >walk free lists of the suitable order and call the callback for each one.
> >Something as simple as
> >
> >	for (i = 0; i < MAX_NR_ZONES; i++) {
> >		struct zone *zone = &pgdat->node_zones[i];
> >
> >		if (!populated_zone(zone))
> >			continue;
> >		spin_lock_irqsave(&zone->lock, flags);
> >		for (order = min_order; order < MAX_ORDER; ++order) {
> >			struct free_area *free_area = &zone->free_area[order];
> >			enum migratetype mt;
> >			struct page *page;
> >
> >			if (!free_area->nr_free)
> >				continue;
> >
> >			for (mt = 0; mt < MIGRATE_TYPES; mt++) {
> >				list_for_each_entry(page,
> >						&free_area->free_list[mt], lru) {
> >
> >					pfn = page_to_pfn(page);
> >					visit(opaque2, pfn, 1<<order);
> >				}
> >			}
> >		}
> >
> >		spin_unlock_irqrestore(&zone->lock, flags);
> >	}
> >
> >[...]
>
>
> I think the above would hold the lock for too long. That's why we
> prefer to take one free page block at a time; taking them one by one
> also makes no difference to the performance that we need.

I think you should start with a simple approach and improve incrementally
if this turns out not to be optimal. I really detest taking struct pages
outside of the lock. You never know what might happen after the lock is
dropped. E.g. can you race with memory hotremove?
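
To illustrate the shape of that simple approach, here is a minimal userspace
sketch, not kernel code. Everything in it (struct toy_zone, walk_free_blocks,
count_pages, toy_demo, NR_ORDERS, MAX_BLOCKS) is hypothetical, and a pthread
mutex stands in for zone->lock. The only point it makes is that the callback
runs while the lock is held and receives plain numbers (pfn, nr_pages), so
nothing that could go stale is carried across an unlock.

```c
#include <pthread.h>
#include <string.h>

#define NR_ORDERS  4	/* hypothetical stand-in for MAX_ORDER */
#define MAX_BLOCKS 8	/* hypothetical cap on free blocks per order */

/* Hypothetical stand-in for a zone: per-order arrays of free block PFNs. */
struct toy_zone {
	pthread_mutex_t lock;	/* plays the role of zone->lock */
	unsigned long free_pfn[NR_ORDERS][MAX_BLOCKS];
	unsigned int nr_free[NR_ORDERS];
};

typedef void (*visit_fn)(void *opaque, unsigned long pfn,
			 unsigned long nr_pages);

/*
 * Walk every free block of at least min_order and call visit() for each
 * one. The whole walk happens under the lock; no internal pointer or
 * cursor is handed out for use after the unlock.
 */
static void walk_free_blocks(struct toy_zone *zone, unsigned int min_order,
			     visit_fn visit, void *opaque)
{
	unsigned int order, i;

	pthread_mutex_lock(&zone->lock);
	for (order = min_order; order < NR_ORDERS; order++) {
		for (i = 0; i < zone->nr_free[order]; i++)
			visit(opaque, zone->free_pfn[order][i], 1UL << order);
	}
	pthread_mutex_unlock(&zone->lock);
}

/* Example callback: sum up the number of pages reported. */
static void count_pages(void *opaque, unsigned long pfn,
			unsigned long nr_pages)
{
	(void)pfn;
	*(unsigned long *)opaque += nr_pages;
}

/* Build a small zone and report blocks of order >= 2. */
unsigned long toy_demo(void)
{
	struct toy_zone zone;
	unsigned long total = 0;

	memset(&zone, 0, sizeof(zone));
	pthread_mutex_init(&zone.lock, NULL);

	zone.free_pfn[1][0] = 16;  zone.nr_free[1] = 1;	/* skipped: order 1 */
	zone.free_pfn[2][0] = 32;
	zone.free_pfn[2][1] = 64;  zone.nr_free[2] = 2;	/* 2 blocks of 4 pages */
	zone.free_pfn[3][0] = 128; zone.nr_free[3] = 1;	/* 1 block of 8 pages */

	walk_free_blocks(&zone, 2, count_pages, &total);
	pthread_mutex_destroy(&zone.lock);
	return total;
}
```

If holding the lock for the whole walk proves too long in practice, that is
the point at which to measure and refine, rather than exporting a cursor up
front.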

> The struct page is used as a "state" to get the next free page block. It is
> only given to the internal implementation of a function in mm (not seen by
> the outside caller). Would this be OK?
> If not, how about pfn? We can also pass a pfn in to the function, do
> pfn_to_page each time the function starts, and then do page_to_pfn when it
> returns.

No, just do not try to play tricks with struct pages which might have
gone away.
--
Michal Hocko
SUSE Labs