Re: Make vmstat deferrable again (was Re: [PATCH] mm,vmscan: Use accurate values for zone_reclaimable() checks)
From: Christoph Lameter
Date: Fri Oct 23 2015 - 10:12:56 EST
On Fri, 23 Oct 2015, Sergey Senozhatsky wrote:
> On (10/23/15 06:43), Christoph Lameter wrote:
> > Is this ok?
>
> kernel/sched/loadavg.c: In function 'calc_load_enter_idle':
> kernel/sched/loadavg.c:195:2: error: implicit declaration of function 'quiet_vmstat' [-Werror=implicit-function-declaration]
> quiet_vmstat();
> ^
Oww... Not good to do that in the scheduler. Ok, a new patch follows that
makes the call from tick_nohz_stop_sched_tick() instead. Hopefully that is
the right location to call quiet_vmstat().
> > + if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
> > + cancel_delayed_work(this_cpu_ptr(&vmstat_work));
>
> shouldn't preemption be disabled for smp_processor_id() here?
Preemption is disabled when quiet_vmstat() is called.
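For anyone following along, here is a rough user-space model of how
quiet_vmstat() and the shepherd are meant to interact. It is only a sketch
under stated assumptions: the *_model function names, NR_CPUS_MODEL, and the
plain arrays standing in for the per-cpu differentials, cpu_stat_off and
vmstat_work are all made up for illustration and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4   /* hypothetical cpu count for the model */

/* Hypothetical stand-ins for kernel state; none of these are real symbols. */
static long global_stat;                        /* models a global counter   */
static long cpu_diff[NR_CPUS_MODEL];            /* models per-cpu diffs      */
static bool cpu_stat_off_model[NR_CPUS_MODEL];  /* models cpu_stat_off mask  */
static bool work_pending[NR_CPUS_MODEL];        /* models queued vmstat_work */

/* Fold one cpu's differential into the global counter.
 * Returns nonzero if there was anything to fold. */
static int refresh_cpu_vm_stats_model(int cpu)
{
	long d;

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	d = cpu_diff[cpu];
	if (d == 0)
		return 0;
	cpu_diff[cpu] = 0;
	global_stat += d;
	return 1;
}

/* Mark the cpu as "stat off", drop its pending work, then keep folding
 * until a pass finds nothing left to fold. */
static void quiet_vmstat_model(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	do {
		if (!cpu_stat_off_model[cpu]) {
			cpu_stat_off_model[cpu] = true;
			work_pending[cpu] = false;
		}
	} while (refresh_cpu_vm_stats_model(cpu));
}

/* Scan quiesced cpus from "another processor" and re-arm the worker on
 * any cpu whose differentials became nonzero. Returns how many were
 * re-armed. */
static int vmstat_shepherd_model(void)
{
	int cpu, rearmed = 0;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (cpu_stat_off_model[cpu] && cpu_diff[cpu] != 0) {
			cpu_stat_off_model[cpu] = false;
			work_pending[cpu] = true;
			rearmed++;
		}
	}
	return rearmed;
}
```

In this model, calling quiet_vmstat_model(1) on a cpu with outstanding diffs
leaves cpu_diff[1] at zero and no work pending, and a later
vmstat_shepherd_model() pass re-arms that cpu only if its diffs have changed
in the meantime.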
Subject: Fix vmstat: make vmstat_updater deferrable again and shut down on idle V2
V1->V2
- Call quiet_vmstat() from tick_nohz_stop_sched_tick() instead.
Currently the vmstat updater is not deferrable as a result of commit
ba4877b9ca51f80b5d30f304a46762f0509e1635. This in turn can cause multiple
interruptions of the applications because the vmstat updater may run at
different times than tick processing. No good.
Make vmstat_update deferrable again and provide a function that
shuts down the vmstat updater when we go idle by folding the differentials.
Shut it down from tick_nohz_stop_sched_tick() when the tick is stopped.
Note that the shepherd thread will continue scanning the differentials
from another processor and will reenable the vmstat workers if it
detects any changes.
Fixes: ba4877b9ca51f80b5d30f304a46762f0509e1635 (do not use deferrable delay)
Signed-off-by: Christoph Lameter <cl@xxxxxxxxx>
Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1395,6 +1395,20 @@ static void vmstat_update(struct work_st
}
/*
+ * Switch off vmstat processing and then fold all the remaining differentials
+ * until the diffs stay at zero. The function is used by NOHZ and can only be
+ * invoked when tick processing is not active.
+ */
+void quiet_vmstat(void)
+{
+ do {
+ if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
+ cancel_delayed_work(this_cpu_ptr(&vmstat_work));
+
+ } while (refresh_cpu_vm_stats());
+}
+
+/*
* Check if the diffs for a certain cpu indicate that
* an update is needed.
*/
@@ -1426,7 +1440,7 @@ static bool need_update(int cpu)
*/
static void vmstat_shepherd(struct work_struct *w);
-static DECLARE_DELAYED_WORK(shepherd, vmstat_shepherd);
+static DECLARE_DEFERRABLE_WORK(shepherd, vmstat_shepherd);
static void vmstat_shepherd(struct work_struct *w)
{
Index: linux/include/linux/vmstat.h
===================================================================
--- linux.orig/include/linux/vmstat.h
+++ linux/include/linux/vmstat.h
@@ -211,6 +211,7 @@ extern void __inc_zone_state(struct zone
extern void dec_zone_state(struct zone *, enum zone_stat_item);
extern void __dec_zone_state(struct zone *, enum zone_stat_item);
+void quiet_vmstat(void);
void cpu_vm_stats_fold(int cpu);
void refresh_zone_stat_thresholds(void);
@@ -272,6 +273,7 @@ static inline void __dec_zone_page_state
static inline void refresh_cpu_vm_stats(int cpu) { }
static inline void refresh_zone_stat_thresholds(void) { }
static inline void cpu_vm_stats_fold(int cpu) { }
+static inline void quiet_vmstat(void) { }
static inline void drain_zonestat(struct zone *zone,
struct per_cpu_pageset *pset) { }
Index: linux/kernel/time/tick-sched.c
===================================================================
--- linux.orig/kernel/time/tick-sched.c
+++ linux/kernel/time/tick-sched.c
@@ -667,6 +667,7 @@ static ktime_t tick_nohz_stop_sched_tick
*/
if (!ts->tick_stopped) {
nohz_balance_enter_idle(cpu);
+ quiet_vmstat();
calc_load_enter_idle();
ts->last_tick = hrtimer_get_expires(&ts->sched_timer);