Re: [PATCH v13 4/5] mm: support reporting free page blocks

From: Michal Hocko
Date: Thu Aug 03 2017 - 08:41:14 EST


On Thu 03-08-17 20:11:58, Wei Wang wrote:
> On 08/03/2017 07:28 PM, Michal Hocko wrote:
> >On Thu 03-08-17 19:27:19, Wei Wang wrote:
> >>On 08/03/2017 06:44 PM, Michal Hocko wrote:
> >>>On Thu 03-08-17 18:42:15, Wei Wang wrote:
> >>>>On 08/03/2017 05:11 PM, Michal Hocko wrote:
> >>>>>On Thu 03-08-17 14:38:18, Wei Wang wrote:
> >>>[...]
> >>>>>>+static int report_free_page_block(struct zone *zone, unsigned int order,
> >>>>>>+ unsigned int migratetype, struct page **page)
> >>>>>This is just too ugly and wrong actually. Never provide struct page
> >>>>>pointers outside of the zone->lock. What I've had in mind was to simply
> >>>>>walk free lists of the suitable order and call the callback for each one.
> >>>>>Something as simple as
> >>>>>
> >>>>>	for (i = 0; i < MAX_NR_ZONES; i++) {
> >>>>>		struct zone *zone = &pgdat->node_zones[i];
> >>>>>
> >>>>>		if (!populated_zone(zone))
> >>>>>			continue;
> >>>>>		spin_lock_irqsave(&zone->lock, flags);
> >>>>>		for (order = min_order; order < MAX_ORDER; ++order) {
> >>>>>			struct free_area *free_area = &zone->free_area[order];
> >>>>>			enum migratetype mt;
> >>>>>			struct page *page;
> >>>>>
> >>>>>			if (!free_area->nr_free)
> >>>>>				continue;
> >>>>>
> >>>>>			for_each_migratetype_order(order, mt) {
> >>>>>				list_for_each_entry(page,
> >>>>>						&free_area->free_list[mt], lru) {
> >>>>>
> >>>>>					pfn = page_to_pfn(page);
> >>>>>					visit(opaque2, pfn, 1<<order);
> >>>>>				}
> >>>>>			}
> >>>>>		}
> >>>>>
> >>>>>		spin_unlock_irqrestore(&zone->lock, flags);
> >>>>>	}
> >>>>>
> >>>>>[...]
> >>>>I think the above would hold the lock for too long. That's why we
> >>>>prefer to take one free page block at a time; taking them one by one
> >>>>also makes no difference in terms of the performance that we
> >>>>need.
> >>>I think you should start with a simple approach and improve incrementally
> >>>if this turns out not to be optimal. I really detest taking struct pages
> >>>outside of the lock. You never know what might happen after the lock is
> >>>dropped. E.g. can you race with memory hotremove?
> >>
> >>The caller won't use pages returned from the function, so I think there
> >>shouldn't be an issue or race if the returned pages are used (i.e. not free
> >>anymore) or simply gone due to hotremove.
> >No, this is just too error prone. Consider that the struct page pointer
> >itself could become invalid in the meantime. Please always keep robustness
> >in mind first. Optimizations are nice but it is not even clear whether
> >the simple variant will cause any problems.
>
>
> how about this:
>
> for_each_populated_zone(zone) {
> 	for_each_migratetype_order_decend(min_order, order, type) {
> 		do {
> =>			spin_lock_irqsave(&zone->lock, flags);
> 			ret = report_free_page_block(zone, order, type,
> 						     &page);
> 			if (!ret) {
> 				pfn = page_to_pfn(page);
> 				nr_pages = 1 << order;
> 				visit(opaque1, pfn, nr_pages);
> 			}
> =>			spin_unlock_irqrestore(&zone->lock, flags);
> 		} while (!ret);
> 	}
> }
>
> In this way, we can still keep the lock granularity at one free page block
> while having the struct page operated under the lock.

How can you continue iteration of free_list after the lock has been
dropped? If you want to keep the lock held for each migrate type then
why not. Just push the lock inside for_each_migratetype_order loop from
my example.
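
To make the locking shape concrete, here is a minimal userspace sketch of
"lock held for one whole free list, never dropped in the middle of it". It is
not kernel code and not a proposed patch: mock_zone, mock_block, mock_lock(),
mock_unlock(), walk_free_blocks() and demo_walk() are all hypothetical names,
the lock is a plain flag standing in for zone->lock, and NR_ORDERS and
NR_MIGRATETYPES are small made-up stand-ins for MAX_ORDER and MIGRATE_TYPES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_ORDERS        4	/* hypothetical stand-in for MAX_ORDER */
#define NR_MIGRATETYPES  3	/* hypothetical stand-in for MIGRATE_TYPES */

struct mock_block {		/* stand-in for a free struct page */
	unsigned long pfn;
	struct mock_block *next;
};

struct mock_zone {		/* stand-in for struct zone */
	bool locked;		/* mock of zone->lock, NOT a real spinlock */
	unsigned int lock_rounds;	/* how many times the lock was taken */
	struct mock_block *free_list[NR_ORDERS][NR_MIGRATETYPES];
};

struct walk_stats {
	const struct mock_zone *zone;	/* lets the callback check the lock */
	unsigned long blocks;
	unsigned long pages;
};

typedef void (*visit_fn)(void *opaque, unsigned long pfn,
			 unsigned long nr_pages);

static void mock_lock(struct mock_zone *z)
{
	assert(!z->locked);
	z->locked = true;
	z->lock_rounds++;
}

static void mock_unlock(struct mock_zone *z)
{
	assert(z->locked);
	z->locked = false;
}

/* Take the lock once per (order, migratetype) list and walk that list fully. */
static void walk_free_blocks(struct mock_zone *zone, unsigned int min_order,
			     visit_fn visit, void *opaque)
{
	for (unsigned int order = min_order; order < NR_ORDERS; order++) {
		for (unsigned int mt = 0; mt < NR_MIGRATETYPES; mt++) {
			struct mock_block *b;

			mock_lock(zone);
			for (b = zone->free_list[order][mt]; b; b = b->next)
				visit(opaque, b->pfn, 1UL << order);
			mock_unlock(zone);
		}
	}
}

/* Example callback: only ever sees a pfn and a size, and only under the lock. */
static void count_block(void *opaque, unsigned long pfn, unsigned long nr_pages)
{
	struct walk_stats *s = opaque;

	(void)pfn;
	assert(s->zone->locked);
	s->blocks++;
	s->pages += nr_pages;
}

/* Builds a tiny zone and walks it from order 1 upwards. */
static struct walk_stats demo_walk(unsigned int *lock_rounds)
{
	struct mock_block o0 = { 64, NULL };	/* order 0, below min_order */
	struct mock_block o1 = { 0, NULL };	/* order 1 */
	struct mock_block o3b = { 8, NULL };	/* order 3 */
	struct mock_block o3a = { 16, &o3b };	/* order 3, same list */
	struct mock_zone zone = { .locked = false };
	struct walk_stats stats = { .zone = &zone };

	zone.free_list[0][2] = &o0;
	zone.free_list[1][0] = &o1;
	zone.free_list[3][1] = &o3a;

	walk_free_blocks(&zone, 1, count_block, &stats);
	*lock_rounds = zone.lock_rounds;
	stats.zone = NULL;	/* the zone does not outlive this function */
	return stats;
}
```

In this sketch the number of lock round trips is bounded by orders times
migrate types rather than by the number of free blocks, and the iterator
never has to resume a list after the lock was released. The price is that
the callback runs with the lock held, so in a real kernel version it would
have to stay short and must not sleep.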

--
Michal Hocko
SUSE Labs