On Mon 19-01-15 09:57:08, Vinayak Menon wrote:
On 01/18/2015 01:18 AM, Christoph Lameter wrote:
On Sat, 17 Jan 2015, Vinayak Menon wrote:
which had not updated the vmstat_diff. This CPU was in idle for around 30
secs. When I looked at the tvec base for this CPU, the timer associated with
vmstat_update had its expiry time less than current jiffies. This timer had
its deferrable flag set, and was tied to the next non-deferrable timer in the
We can remove the deferrable flag now since the vmstat threads are only
activated as necessary with the recent changes. Looks like this could fix
your issue?
Yes, this should fix my issue.
Does it? I would much rather avoid getting into an un-synced state than
play around with the one specific place which shows the problem right now.
But I think we may still need the fix in too_many_isolated, since there can
be a delay of a second or more (HZ jiffies by default, and even longer for the
reasons pointed out by Michal), which will result in reclaimers unnecessarily
entering congestion_wait. No?
I think we can solve this as well. We can move vmstat_shepherd into a
kernel thread which loops with the configured timeout, builds a mask of
the CPUs which need the update, and runs vmstat_update on them from IPI
context (smp_call_function_many).
We would have to drop cond_resched from refresh_cpu_vm_stats, of course,
because IPI handlers cannot sleep. The nr_zones x NR_VM_ZONE_STAT_ITEMS
work in IPI context shouldn't be excessive, but I haven't measured it, so
I might easily be wrong.
Anyway, that should work more reliably than the current scheme and
should also help reduce the pointless wakeups which the original
patchset was addressing. Or am I missing something?