Re: [PATCH 0/4] dcache: make Oracle more scalable on large systems

From: Waiman Long
Date: Thu Feb 28 2013 - 15:39:22 EST

On 02/22/2013 07:13 PM, Andi Kleen wrote:
>> That seems to me like an application problem - poking at what the
>> kernel is doing via diagnostic interfaces so often that it gets in
>> the way of the kernel actually doing stuff is not a problem the
>> kernel can solve.
> I agree with you that the application shouldn't be doing that, but
> if there is a cheap way to lower the d_path overhead that is also
> attractive. There will be always applications doing broken things.
> Any scaling problem less in the kernel is good.
>
> But the real fix in this case is to fix the application.

Further investigation into the d_path() bottleneck revealed some
interesting facts about Oracle. First of all, invocations of the
d_path() kernel function are concentrated in a few processes
rather than distributed across many of them. In a 1-minute test run,
the following three standard long-running Oracle processes call
into d_path():

1. MMNL - Memory monitor light (gathers and stores AWR statistics) [272]
2. CKPT - Checkpoint process [17]
3. DBRM - DB resource manager (new in 11g) [16]

The numbers within [] are the number of times d_path() was
called, which is not much for a 1-minute interval. Beyond those
standard processes, Oracle also seems to periodically spawn transient
processes (lasting a few seconds) that issue a burst of d_path() calls
(about 1000) within a short time before they die. I am not sure what
the purpose of those processes is. In a one-minute interval, 2-7
of those transient processes may be spawned, probably depending on the
activity level. Most of the d_path() calls last for about 1ms. A
couple of them last for more than 10ms.
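
To get a feel for what one d_path()-backed call costs from user space,
here is a minimal sketch. It assumes a Linux system where readlink() on
/proc/self/exe is resolved by the kernel through d_path(). The function
name time_readlink and the iteration count are made up for illustration,
and nothing above says which interface Oracle actually uses, so treat
this as a rough probe rather than a reproduction of the test.

```python
import os
import time


def time_readlink(path="/proc/self/exe", iterations=1000):
    """Return the mean seconds per readlink() call, or None if path is not a symlink.

    Assumption: on Linux, resolving a /proc/<pid>/exe link goes through d_path().
    """
    if not os.path.islink(path):
        return None
    start = time.perf_counter()
    for _ in range(iterations):
        os.readlink(path)
    return (time.perf_counter() - start) / iterations


if __name__ == "__main__":
    mean = time_readlink()
    if mean is None:
        print("no /proc/self/exe on this system; skipping")
    else:
        print(f"mean readlink() latency: {mean * 1e6:.1f} us")
```

The number includes system call entry and exit, so it is an upper bound
on the d_path() time for that path on an otherwise idle system. It does
not reproduce the lock contention seen on the large test machine.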

Other system daemons that call into d_path() include irqbalance and
automount. irqbalance issues about 2000 d_path() calls a minute in
a bursty fashion. automount contributes only about 50 in
the same time period, which is not really significant. Regular commands
like cp and ps may also issue a couple of d_path() calls per invocation.
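
A rough tally of the per-minute figures quoted so far can be sketched
as below. It assumes that "about 1000" is the count per transient
process (the text could also be read as a total), and it leaves out the
occasional calls from commands like cp and ps. The variable names are
only illustrative.

```python
# Per-minute d_path() call counts quoted in this message.
standard = {"MMNL": 272, "CKPT": 17, "DBRM": 16}
irqbalance = 2000
automount = 50

# Assumption: "about 1000" calls per transient process, 2-7 processes per minute.
calls_per_transient = 1000
transient_min, transient_max = 2, 7

standard_total = sum(standard.values())
fixed = standard_total + irqbalance + automount
low_total = fixed + transient_min * calls_per_transient
high_total = fixed + transient_max * calls_per_transient

print(f"standard Oracle processes: {standard_total}")
print(f"estimated total per minute: {low_total} to {high_total}")
```

Under those assumptions the three standard processes account for 305
calls out of an estimated 4355 to 9355 per minute, so the bursts from
the transient processes and irqbalance dominate.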

As I was using the "perf record --call-graph" command to profile the Oracle
application, I found that another major user of the d_path()
function is perf_event_mmap_event() of the perf-event
subsystem. It accounted for about 10% of the total d_path() calls, so the
metrics that I collected were skewed a bit by that.

I think that the impact of my patch on Oracle write performance
is probably due to its effect on the open() system call, which has
to update the reference counts on dentries. In the collected perf
traces, a certain portion of the spinlock time was consumed by
dput() and path_get(). The 2 major consumers of those calls are
d_path() and the open() system call. On a test run with no writer,
I saw significantly fewer open() calls in the perf trace and hence much
less impact on Oracle performance.

I do agree that Oracle should probably fix the application to issue fewer
calls to the d_path() function. However, I would argue that my patch will
still be useful for the following reasons:

1. Changing how the reference counting works (patch 1 of 4) will certainly
help in situations where processes issue intensive batches of file
system operations, as is the case here.
2. Changing the rename_lock to use a sequence r/w lock (patches 2-4 of 4)
will help to minimize the overhead of the perf-event subsystem when it
is activated with the call-graph feature, which is pretty common.

