Re: [LKP] [SUNRPC] c4a7ca77494: +6.0% fsmark.time.involuntary_context_switches, no primary result change

From: Huang Ying
Date: Sun Feb 15 2015 - 19:28:47 EST


On Sun, 2015-02-15 at 13:02 -0500, Trond Myklebust wrote:
> Hi guys,
>
> On Sun, Feb 15, 2015 at 2:57 AM, Huang Ying <ying.huang@xxxxxxxxx> wrote:
> > FYI, we noticed the below changes on
> >
> > commit c4a7ca774949960064dac11b326908f28407e8c3 ("SUNRPC: Allow waiting on memory allocation")
> >
> >
> > testbox/testcase/testparams: nhm4/fsmark/performance-1x-32t-1HDD-f2fs-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd
> >
> > 127b21b89f9d8ba0 c4a7ca774949960064dac11b32
> > ---------------- --------------------------
> > %stddev %change %stddev
> > \ | \
> > 52524 ± 0% +6.0% 55672 ± 0% fsmark.time.involuntary_context_switches
> > 436 ± 14% +54.9% 676 ± 20% sched_debug.cfs_rq[0]:/.tg_load_contrib
> > 433 ± 15% +54.7% 670 ± 21% sched_debug.cfs_rq[0]:/.blocked_load_avg
> > 8348 ± 7% +27.0% 10602 ± 9% sched_debug.cfs_rq[0]:/.min_vruntime
> > 190081 ± 13% +32.7% 252269 ± 13% sched_debug.cpu#0.sched_goidle
> > 205783 ± 12% +30.2% 267903 ± 13% sched_debug.cpu#0.ttwu_local
> > 464065 ± 11% +26.6% 587524 ± 12% sched_debug.cpu#0.nr_switches
> > 464278 ± 11% +26.6% 587734 ± 12% sched_debug.cpu#0.sched_count
> > 15807 ± 11% +19.6% 18910 ± 12% sched_debug.cpu#4.nr_load_updates
> > 300041 ± 8% +20.3% 360969 ± 10% sched_debug.cpu#0.ttwu_count
> > 1863 ± 9% +18.1% 2201 ± 10% sched_debug.cfs_rq[4]:/.exec_clock
> >
> > testbox/testcase/testparams: nhm4/fsmark/performance-1x-32t-1HDD-btrfs-nfsv4-8K-400M-fsyncBeforeClose-16d-256fpd
> >
> > 127b21b89f9d8ba0 c4a7ca774949960064dac11b32
> > ---------------- --------------------------
> > %stddev %change %stddev
> > \ | \
> > 52184 ± 0% +5.6% 55122 ± 0% fsmark.time.involuntary_context_switches
> > 557 ± 19% +21.5% 677 ± 9% sched_debug.cfs_rq[5]:/.blocked_load_avg
> > 217 ± 19% -42.9% 124 ± 21% sched_debug.cfs_rq[2]:/.load
> > 45852 ± 14% -39.4% 27773 ± 24% sched_debug.cpu#7.ttwu_local
> > 457 ± 18% +50.1% 686 ± 20% sched_debug.cfs_rq[0]:/.tg_load_contrib
> > 455 ± 18% +46.7% 668 ± 19% sched_debug.cfs_rq[0]:/.blocked_load_avg
> > 66605 ± 10% -26.7% 48826 ± 14% sched_debug.cpu#7.sched_goidle
> > 78249 ± 9% -22.5% 60678 ± 11% sched_debug.cpu#7.ttwu_count
> > 153506 ± 9% -22.7% 118649 ± 12% sched_debug.cpu#7.nr_switches
> > 153613 ± 9% -22.7% 118755 ± 12% sched_debug.cpu#7.sched_count
> > 15806 ± 6% +19.2% 18833 ± 18% sched_debug.cpu#4.nr_load_updates
> > 2171 ± 5% +15.6% 2510 ± 13% sched_debug.cfs_rq[4]:/.exec_clock
> > 9924 ± 11% -27.0% 7244 ± 25% sched_debug.cfs_rq[3]:/.min_vruntime
> > 3156 ± 4% -13.4% 2734 ± 8% sched_debug.cfs_rq[7]:/.min_vruntime
> >
> > testbox/testcase/testparams: nhm4/fsmark/performance-1x-32t-1HDD-ext4-nfsv4-9B-400M-fsyncBeforeClose-16d-256fpd
> >
> > 127b21b89f9d8ba0 c4a7ca774949960064dac11b32
> > ---------------- --------------------------
> > %stddev %change %stddev
> > \ | \
> > 104802 ± 0% +7.7% 112883 ± 0% fsmark.time.involuntary_context_switches
> > 471755 ± 0% -1.3% 465592 ± 0% fsmark.time.voluntary_context_switches
> > 1977 ± 36% +90.8% 3771 ± 8% sched_debug.cpu#4.curr->pid
> > 2 ± 34% +80.0% 4 ± 24% sched_debug.cpu#6.cpu_load[1]
> > 4 ± 33% +83.3% 8 ± 31% sched_debug.cpu#6.cpu_load[0]
> > 193 ± 17% +48.0% 286 ± 19% sched_debug.cfs_rq[2]:/.blocked_load_avg
> > 196 ± 17% +47.5% 290 ± 19% sched_debug.cfs_rq[2]:/.tg_load_contrib
> > 96 ± 18% +40.6% 135 ± 11% sched_debug.cfs_rq[7]:/.load
> > 97 ± 18% +38.5% 135 ± 11% sched_debug.cpu#7.load
> > 2274 ± 7% -16.5% 1898 ± 3% proc-vmstat.pgalloc_dma
> > 319 ± 6% -29.7% 224 ± 24% sched_debug.cfs_rq[1]:/.tg_load_contrib
> > 314 ± 5% -29.4% 222 ± 25% sched_debug.cfs_rq[1]:/.blocked_load_avg
> > 621 ± 10% +41.9% 881 ± 37% sched_debug.cfs_rq[4]:/.avg->runnable_avg_sum
> >
> > nhm4: Nehalem
> > Memory: 4G
> >
> >
> >
> >
> > fsmark.time.involuntary_context_switches
> >
> > 114000 ++-----------------------------------------------------------------+
> > 113000 O+ O O O O O O O O O O O O O O O O O |
> > | O O O O O
> > 112000 ++ |
> > 111000 ++ |
> > | |
> > 110000 ++ |
> > 109000 ++ |
> > 108000 ++ |
> > | |
> > 107000 ++ |
> > 106000 ++ |
> > | |
> > 105000 *+.*..*..*..*..*..*..*..*..*..*...*..*..*..*..*..*..*..* |
> > 104000 ++-----------------------------------------------------------------+
> >
> >
> > [*] bisect-good sample
> > [O] bisect-bad sample
> >
> > To reproduce:
> >
> > apt-get install ruby
> > git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> > cd lkp-tests
> > bin/setup-local job.yaml # the job file attached in this email
> > bin/run-local job.yaml
>
> So this is on a loopback NFS setup (i.e. the server resides on the
> same node as the client, which just mounts from the loopback IP
> address 127.0.0.1)?

Yes. This is on a loopback NFS setup.
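
To be concrete, the job runs fsmark against an NFSv4 mount that points back
at the local NFS server, roughly along these lines (the device, export path
and options here are placeholders for illustration; the real ones are
generated from the attached job.yaml):

mkfs.f2fs /dev/sda1                          # local filesystem under test
mount /dev/sda1 /fs/f2fs
echo "/fs/f2fs 127.0.0.1(rw,no_root_squash)" >> /etc/exports
exportfs -ra                                 # export it from the local knfsd
mount -t nfs4 127.0.0.1:/fs/f2fs /nfs/f2fs   # mount it back over loopback

So the NFS client and server share the same memory and page cache on one node.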

> That's a fairly quirky setup as far as memory management goes. In low
> memory situations, you have a very nasty feedback mechanism whereby
> the NFS server ends up pushing the client to write back more data,
> increasing the memory pressure on the NFS server, etc.
> It is quite possible that allowing the NFS client to block more
> aggressively in low memory situations could worsen that feedback
> mechanism. However, that's not our main target platform; we actively
> discourage people from using loopback NFS in production systems.
>
> Is there any way you could confirm this performance change using a
> remote NFS server instead of the loopback NFS?

We are working on a remote NFS setup now :)
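
Concretely, the remote run will use the same fsmark workload, only with the
NFSv4 mount pointing at a separate test machine rather than at 127.0.0.1,
e.g. (hostname and export path are placeholders):

mount -t nfs4 nfs-server:/export/fsmark /nfs/fsmark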

Best Regards,
Huang, Ying

