Re: [PATCH] add some drop_caches documentation and info messsge

From: Andrew Morton
Date: Tue Oct 23 2012 - 19:45:39 EST


On Fri, 12 Oct 2012 14:57:08 +0200
Michal Hocko <mhocko@xxxxxxx> wrote:

> Hi,
> I would like to resurrect the following patch from Dave. It was last
> posted at https://lkml.org/lkml/2010/9/16/250 and there didn't seem to
> be any strong opposition.
> Kosaki was worried about possible excessive logging when somebody drops
> caches too often (but then he said he didn't have a strong opinion
> on that), but I would say the opposite. If somebody does that, then I
> would really like to see it in the log when supporting a system, because
> it almost certainly means that there is something fishy going on. It is
> also worth mentioning that only root can write to drop_caches, so this
> is not a flooding attack vector.
> I am bringing that up again because this can be really helpful when
> chasing strange performance issues which (surprise surprise) turn out to
> be related to artificially dropped caches done because the admin thinks
> this would help...
>
> I have just refreshed the original patch on top of the current mm tree
> but I could live with KERN_INFO as well if people think that KERN_NOTICE
> is too hysterical.
> ---
> From 1f4058be9b089bc9d43d71bc63989335d7637d8d Mon Sep 17 00:00:00 2001
> From: Dave Hansen <dave@xxxxxxxxxxxxxxxxxx>
> Date: Fri, 12 Oct 2012 14:30:54 +0200
> Subject: [PATCH] add some drop_caches documentation and info messsge
>
> There is plenty of anecdotal evidence and a load of blog posts
> suggesting that using "drop_caches" periodically keeps your system
> running in "tip top shape". Perhaps adding some kernel
> documentation will increase the amount of accurate data on its use.
>
> If we are not shrinking caches effectively, then we have real bugs.
> Using drop_caches will simply mask the bugs and make them harder
> to find, but certainly does not fix them, nor is it an appropriate
> "workaround" to limit the size of the caches.
>
> It's a great debugging tool, and is really handy for doing things
> like repeatable benchmark runs. So, add a bit more documentation
> about it, and add a little KERN_NOTICE. It should help developers
> who are chasing down reclaim-related bugs.
>
> ...
>
> +	printk(KERN_NOTICE "%s (%d): dropped kernel caches: %d\n",
> +		current->comm, task_pid_nr(current), sysctl_drop_caches);

urgh. Are we really sure we want to do this? The system operators who
are actually using this thing will hate us :(


More friendly alternatives might be:

- Taint the kernel. But that will only become apparent with an oops
  trace or similar.

- Add a drop_caches counter and make that available in /proc/vmstat,
  show_mem() output and perhaps other places.

I suspect the /proc/vmstat counter will suffice - if someone is having
vm issues, we'll be seeing their /proc/vmstat at some stage and if the
drop_caches counter is high, that's enough to get suspicious?
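
To illustrate the consumer side of that idea, here is a rough userspace
sketch of how a support script might pull a single counter out of
/proc/vmstat-style "name value" lines. The helper name vmstat_lookup()
is made up for this example, and the counter name is whatever the kernel
side would end up exporting; nothing here is taken from the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): scan a stream of
 * "name value" lines, the layout /proc/vmstat uses, for one counter.
 * Returns 0 and stores the value in *out on success, or -1 if the
 * counter is not present in the stream.
 */
int vmstat_lookup(FILE *f, const char *name, unsigned long long *out)
{
	char key[128];
	unsigned long long val;

	/* Each iteration consumes one "name value" pair. */
	while (fscanf(f, "%127s %llu", key, &val) == 2) {
		if (strcmp(key, name) == 0) {
			*out = val;
			return 0;
		}
	}
	return -1;
}
```

A caller would fopen("/proc/vmstat", "r"), pass the stream in together
with the counter name, and treat a return of -1 as "this kernel does not
export such a counter".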
