Re: [RFC][PATCH] Reduce ext3 allocate-with-reservation locklatencies

From: Mingming Cao
Date: Fri Apr 29 2005 - 11:43:52 EST


On Thu, 2005-04-28 at 11:34 -0700, Mingming Cao wrote:
> On Thu, 2005-04-28 at 12:12 -0400, Lee Revell wrote:
> > On Thu, 2005-04-28 at 00:37 -0700, Mingming Cao wrote:
> > > On Wed, 2005-04-27 at 23:45 -0400, Lee Revell wrote:
> > > > On Fri, 2005-04-22 at 15:10 -0700, Mingming Cao wrote:
> > > > > Please review. I have tested on fsx on SMP box. Lee, if you got time,
> > > > > please try this patch.
> > > >
> > > > I have tested and this does fix the problem. I ran my tests and no ext3
> > > > code paths showed up on the latency tracer at all, it never went above
> > > > 33 usecs.
> > > >
> > > Thanks, Lee.
> > >
> > > The patch survived on many fsx test over 20 hours on a 2cpu machine.
> > > Tested the patch on the same machine with tiobench (1-64 threads), and
> > > untar a kernel tree test, no regression there.
> > > However I see about 5-7% throughput drop on dbench with 16 threads. It
> > > probably due to the cpu cost that we have discussed.
> >
> > Hmm, I guess someone needs to test it on a bigger system. AFAICT this
> > should improve SMP scalability quite a bit. Maybe that lock is rarely
> > contended.
>
> Probably. I will try it on a relatively full filesystem(where bitmap
> scan to find a free block takes time), and on a 4 cpu box with many
> threads allocating at the same time.

I run 64 threads dbench on a 4way box on a 90% full filesystem, patched
kernel is slightly faster. I think part of the reason is on a 90% full
filesystem, block bitmap scan will take a long time. Thus drop the lock
before bitmap scan will help, although there are some overhead of
drop/re-grab lock and extra window search/link/unlink before we found a
window with free block. On a freshly created filesystem where the
regression is seen, the bitmap scan takes shorter time, thus the
overhead of drop/re-grab lock and extra window search/link/unlink
probably is probably be outstanding.

I also run 1000 dd tests in parallel, each of them writing to different
file(2M size). Seems no gain or no regression.

Anyway, we should not hold the global reservation lock while scanning
the bitmap, that's a scalability issue and a latency issue. To drop the
lock during bitmap scan we need to do extra work, I don't think it's
avoidable. Andrew, could you add it to mm tree? I will re-send the
patch with the change log.

Thanks,
Mingming

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/