Re: [PATCH v2] mm: vmscan: fix the page state calculation in too_many_isolated

From: Vlastimil Babka
Date: Thu Jan 22 2015 - 10:16:30 EST

On 01/21/2015 03:39 PM, Michal Hocko wrote:

On Mon 19-01-15 09:57:08, Vinayak Menon wrote:

On 01/18/2015 01:18 AM, Christoph Lameter wrote:

On Sat, 17 Jan 2015, Vinayak Menon wrote:

which had not updated the vmstat_diff. This CPU was in idle for around 30
secs. When I looked at the tvec base for this CPU, the timer associated with
vmstat_update had its expiry time less than current jiffies. This timer had
its deferrable flag set, and was tied to the next non-deferrable timer in the

We can remove the deferrable flag now since the vmstat threads are only
activated as necessary with the recent changes. Looks like this could fix
your issue?

Yes, this should fix my issue.

Does it? Because I would prefer not getting into un-synced state much
more than playing around one specific place which shows the problems
right now.

But I think we may still need the fix in too_many_isolated, since there can
still be a delay of a few seconds (HZ by default, and even more for the
reasons pointed out by Michal), which will result in reclaimers
unnecessarily entering congestion_wait. No?
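
To make the concern above concrete, here is a minimal userspace sketch of
how a stale global counter can make an isolation check misfire. It is a
model under stated assumptions, not kernel code: the demo_* names, the
two-CPU layout, and the simplified "isolated > inactive" comparison are all
hypothetical. The snapshot read is only similar in spirit to folding the
pending per-CPU diffs into the value being read.

```c
#include <stdbool.h>

#define DEMO_NR_CPUS 2	/* hypothetical CPU count for the model */

/* One counter: a global value plus per-CPU deltas not yet folded in. */
struct demo_counter {
	long global;
	int diff[DEMO_NR_CPUS];
};

/* Cheap read: ignores pending per-CPU deltas, so it can be stale. */
static long demo_read_global(const struct demo_counter *c)
{
	return c->global;
}

/* Accurate read: adds the pending per-CPU deltas to the global value. */
static long demo_read_snapshot(const struct demo_counter *c)
{
	long x = c->global;

	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		x += c->diff[cpu];
	return x < 0 ? 0 : x;
}

/* Simplified stand-in for the reclaim throttling check (assumption). */
static bool demo_too_many_isolated(long isolated, long inactive)
{
	return isolated > inactive;
}
```

With isolated = { .global = 100, .diff = { -90, 0 } } and an inactive count
of 50, the cheap read makes the check trip, while the snapshot read sees
only 10 isolated pages and does not.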

I think we can solve this as well. We can stick vmstat_shepherd into a
kernel thread with a loop with the configured timeout and then create a
mask of CPUs which need the update and run vmstat_update from
IPI context (smp_call_function_many).
We would have to drop cond_resched from refresh_cpu_vm_stats, of
course. Walking nr_zones x NR_VM_ZONE_STAT_ITEMS in IPI context
shouldn't be excessive, but I haven't measured that, so I might easily
be wrong.
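
The scheme described above can be sketched as a small userspace model. This
is an illustration only, assuming a fixed number of CPUs and counters: all
model_* names are hypothetical, a plain loop stands in for the kernel thread
and the IPI delivery, and a 32-bit integer stands in for a cpumask.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NR_CPUS	4	/* hypothetical; must stay <= 32 for the mask */
#define MODEL_NR_ITEMS	3	/* hypothetical number of stat items */

struct model_vmstat {
	long global[MODEL_NR_ITEMS];
	int diff[MODEL_NR_CPUS][MODEL_NR_ITEMS];
};

static void model_init(struct model_vmstat *vs)
{
	memset(vs, 0, sizeof(*vs));
}

/* Fast path: a CPU only touches its own delta. */
static void model_mod_state(struct model_vmstat *vs, int cpu, int item, int delta)
{
	vs->diff[cpu][item] += delta;
}

/* Does this CPU have anything that still needs folding? */
static bool model_need_update(const struct model_vmstat *vs, int cpu)
{
	for (int i = 0; i < MODEL_NR_ITEMS; i++)
		if (vs->diff[cpu][i])
			return true;
	return false;
}

/* Stand-in for the per-CPU update handler: fold deltas, never sleep. */
static void model_refresh_cpu(struct model_vmstat *vs, int cpu)
{
	for (int i = 0; i < MODEL_NR_ITEMS; i++) {
		vs->global[i] += vs->diff[cpu][i];
		vs->diff[cpu][i] = 0;
	}
}

/* One shepherd pass: build the mask, then update only the CPUs in it. */
static uint32_t model_shepherd_pass(struct model_vmstat *vs)
{
	uint32_t mask = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_need_update(vs, cpu))
			mask |= UINT32_C(1) << cpu;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (UINT32_C(1) << cpu))
			model_refresh_cpu(vs, cpu);

	return mask;
}
```

A CPU with no pending deltas never appears in the mask, which is the part
that avoids pointless wakeups; the cost of each handler is bounded by the
number of items walked.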

Anyway, that should work more reliably than the current scheme and
should help to reduce pointless wakeups which the original patchset was
addressing. Or am I missing something?

Maybe to further reduce wakeups, a CPU could check and update its counters before going idle? (unless that already happens)
