Re: [PATCH 3/3] mm: when handling percpu_pagelist_fraction, use on_each_cpu() to set percpu pageset fields.

From: Cody P Schafer
Date: Mon Apr 08 2013 - 15:50:49 EST

On 04/08/2013 10:28 AM, Cody P Schafer wrote:
On 04/08/2013 05:20 AM, Gilad Ben-Yossef wrote:
On Fri, Apr 5, 2013 at 11:33 PM, Cody P Schafer
<cody@xxxxxxxxxxxxxxxxxx> wrote:
In free_hot_cold_page(), we rely on pcp->batch remaining stable.
Updating it without being on the cpu owning the percpu pageset
potentially destroys this stability.

Change for_each_cpu() to on_each_cpu() to fix.

Are you referring to this? -

This was the case I noticed.

	if (pcp->count >= pcp->high) {
		free_pcppages_bulk(zone, pcp->batch, pcp);
		pcp->count -= pcp->batch;
	}

I'm probably missing the obvious but won't it be simpler to do this in
free_hot_cold_page() -

	if (pcp->count >= pcp->high) {
		unsigned int batch = ACCESS_ONCE(pcp->batch);
		free_pcppages_bulk(zone, batch, pcp);
		pcp->count -= batch;
	}
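
For readers outside the kernel tree, the idea in the snippet above can be modeled in plain userspace C. This is only a sketch under assumptions: struct pcp_model, pages_freed_model() and READ_ONCE_SKETCH() are hypothetical stand-ins (the macro imitates the kernel's ACCESS_ONCE() with a volatile cast and relies on the GCC/Clang __typeof__ extension), and no real page freeing happens. The point it tries to show is that ->batch is loaded exactly once, so the value handed to the bulk-free step and the value subtracted from ->count cannot differ.

```c
/* Userspace imitation of the kernel's ACCESS_ONCE(): the volatile cast
 * forces the compiler to emit exactly one load of x.
 * Assumption: GCC/Clang __typeof__ extension is available. */
#define READ_ONCE_SKETCH(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical, heavily simplified model of struct per_cpu_pages. */
struct pcp_model {
	int count;	/* pages currently on the per-cpu list */
	int high;	/* drain threshold */
	int batch;	/* pages freed per drain; another CPU may rewrite it */
};

/* Hypothetical stand-in for free_pcppages_bulk(): it only reports how
 * many pages it was asked to free. */
static int pages_freed_model(int to_free)
{
	return to_free;
}

/* Drain one batch if the list is over its threshold.
 * Returns the number of pages "freed" (0 if no drain happened). */
static int drain_if_over_high(struct pcp_model *pcp)
{
	if (pcp->count >= pcp->high) {
		/* Single load: both uses below see the same value. */
		int batch = READ_ONCE_SKETCH(pcp->batch);
		int freed = pages_freed_model(batch);

		pcp->count -= batch;
		return freed;
	}
	return 0;
}
```

With count = 8, high = 6 and batch = 3, one call frees 3 pages and leaves count at 5; a second call does nothing because count is now below high. Note that this only stabilizes ->batch within one drain; it does not address ->high and ->batch changing independently.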

Potentially, yes. Note that this was simply the one case I noticed,
rather than certainly the only case.

I also wonder whether there could be unexpected interactions between
->high and ->batch not changing together atomically. For example, could
adjusting this knob cause ->batch to rise enough that it is greater than
the previous ->high? If the code above then runs with the previous
->high, ->count wouldn't be correct (checking this inside
free_pcppages_bulk() might help on this one issue).
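
To make the concern above concrete, here is a small userspace model. It is a sketch under assumptions, not kernel code: struct pcp_state and both functions are hypothetical, and the clamp is just one possible reading of "checking this inside free_pcppages_bulk()". It assumes a CPU that tests ->count against the previous ->high while already seeing the new, larger ->batch.

```c
/* Hypothetical, heavily simplified model of struct per_cpu_pages. */
struct pcp_state {
	int count;	/* pages currently on the per-cpu list */
	int high;	/* drain threshold (possibly the stale, previous value) */
	int batch;	/* pages freed per drain (possibly the new value) */
};

/* Same shape as the free_hot_cold_page() snippet: subtracts ->batch
 * unconditionally, so ->count can go negative if batch > count. */
static void drain_unclamped(struct pcp_state *pcp)
{
	if (pcp->count >= pcp->high)
		pcp->count -= pcp->batch;
}

/* Variant that never frees more pages than are on the list.
 * Returns the number of pages it would free. */
static int drain_clamped(struct pcp_state *pcp)
{
	int to_free = 0;

	if (pcp->count >= pcp->high) {
		to_free = pcp->batch < pcp->count ? pcp->batch : pcp->count;
		pcp->count -= to_free;
	}
	return to_free;
}
```

Starting from count = 6, a previous high of 6 and a new batch of 10, the unclamped version leaves count at -4, while the clamped version frees 6 pages and leaves count at 0.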

Now the batch value used is stable and you don't have to IPI every CPU
in the system just to change a config knob...

Is this really considered an issue? I wouldn't have expected someone to
adjust the config knob often enough (or even more than once) to cause
problems. Of course, as an "it'd be nice" thing, I completely agree.

Would using schedule_on_each_cpu() instead of on_each_cpu() be an improvement, in your opinion?
