Re: [patch] speed up single bio_vec allocation

From: Nate Diller
Date: Thu Dec 07 2006 - 16:46:28 EST


On 12/7/06, Chen, Kenneth W <kenneth.w.chen@xxxxxxxxx> wrote:
> Nate Diller wrote on Thursday, December 07, 2006 11:22 AM:
> > > > I still can't help but think we can do better than this, and that this
> > > > is nothing more than optimizing for a benchmark. For high-performance
> > > > I/O, you will be doing > 1 page bio's anyway and this patch won't help
> > > > you at all. Perhaps we can just kill bio_vec slabs completely, and
> > > > create bio slabs instead with differing sizes. So instead of having 1
> > > > bio slab and 5 bio_vec slabs, change that to 5 bio slabs that leave room
> > > > for the bio_vec list at the end. That would always eliminate the extra
> > > > allocation, at the cost of blowing the 256-page case into an order-1
> > > > page allocation (256*16 + sizeof(*bio) > PAGE_SIZE) for the 4KB 64-bit
> > > > archs, which is something I've always tried to avoid.
> > >
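To make the arithmetic above concrete, here is a quick userspace sketch (the
struct layouts below are simplified stand-ins, not the real kernel
definitions; the field names are illustrative) of why embedding the bio_vec
list in the bio pushes the 256-segment case past a single 4KB page on 64-bit:

/*
 * Userspace sketch, not kernel code: models the size trade-off described
 * above.  The only number that matters is sizeof(struct bio_vec) being
 * 16 bytes on a 64-bit arch, which is where the 256*16 comes from.
 */
#include <stdio.h>

#define PAGE_SIZE 4096UL

struct bio_vec_stub {			/* 16 bytes on LP64 */
	void		*bv_page;
	unsigned int	 bv_len;
	unsigned int	 bv_offset;
};

struct bio_stub {			/* stand-in; the real struct bio is bigger */
	unsigned long	 bi_sector;
	void		*bi_next, *bi_bdev, *bi_end_io, *bi_private;
	unsigned long	 bi_flags, bi_rw;
	unsigned int	 bi_vcnt, bi_idx, bi_size, bi_max_vecs;
	struct bio_vec_stub bi_inline_vecs[];	/* hypothetical embedded bio_vec list */
};

int main(void)
{
	static const unsigned int nr_vecs[] = { 1, 4, 16, 64, 128, 256 };
	unsigned int i;

	for (i = 0; i < sizeof(nr_vecs) / sizeof(nr_vecs[0]); i++) {
		unsigned long sz = sizeof(struct bio_stub) +
				   nr_vecs[i] * sizeof(struct bio_vec_stub);

		printf("bio-%-3u object: %4lu bytes (%s)\n", nr_vecs[i], sz,
		       sz > PAGE_SIZE ? "needs an order-1 slab page"
				      : "fits in one 4KB page");
	}
	return 0;
}

With these stand-in sizes only the 256-entry case spills past PAGE_SIZE,
which is the order-1 slab allocation being objected to.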
> > > I took a quick query of biovec-* slab stats on various production
> > > machines; the majority of the allocations are for 1 and 4 segments, and
> > > usage falls off quickly at 16 or more. 256-segment biovec allocations are
> > > really rare. I think it makes sense to heavily bias towards smaller
> > > biovec allocations and have separate biovec allocation for really large
> > > ones.
> >
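The message doesn't say exactly how those numbers were gathered; one simple
way to take that kind of snapshot is just to pull the bio/biovec-* lines out
of /proc/slabinfo, e.g.:

/*
 * Minimal sketch of one way to take such a snapshot: dump the bio and
 * biovec-* cache lines from /proc/slabinfo.  (The column layout of
 * /proc/slabinfo varies by kernel version, and reading it may need root.)
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[512];
	FILE *f = fopen("/proc/slabinfo", "r");

	if (!f) {
		perror("/proc/slabinfo");
		return 1;
	}
	while (fgets(line, sizeof(line), f))
		if (strncmp(line, "bio", 3) == 0)	/* matches "bio" and "biovec-*" */
			fputs(line, stdout);
	fclose(f);
	return 0;
}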
> > What file system? Have you tested with more than one? Have you
> > tested with file systems that build their own bios instead of using
> > get_block() calls? Have you tested with large files or streaming
> > workloads? How about direct I/O?
> >
> > I think that a "heavy bias" toward small biovecs is FS- and workload-
> > dependent, and that it's irresponsible to make such unjustified
> > changes just to show improvement on your particular benchmark.
>
> There is no doubt that the above data is just a quick estimate of one
> usage model; there are tons of other usages in the world. After all,
> any algorithm in the kernel has to be generic and to tune itself to the
> specific environment.
>
> On very large I/O, the relative overhead of allocating the biovec
> decreases, because a larger I/O needs more code to do the setup, more
> code to perform I/O completion, more code in the device driver, etc.,
> so the time spent on one mempool alloc is amortized over the size of
> the I/O. At smaller I/O sizes the overhead is more visible, and thus it
> makes sense to me to cut down that relative overhead.
>
> In fact, large I/O already has an unfair advantage: if you do a 1MB
> I/O, only one call to mempool is needed to get a 256-segment bio, but
> if you do two 512KB I/Os, two calls to mempool are made. So in some
> sense, the current scheme is unfair to small I/O.
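To make the 1MB vs. two 512KB comparison concrete, here is a small userspace
sketch of the pool-selection step (modeled loosely on the 2.6-era
biovec-1/4/16/64/128/256 mempools; biovec_pool_for() is a made-up helper, not
a kernel function):

/*
 * Userspace sketch of the pool selection described above.  It shows why one
 * 1MB request costs a single mempool allocation while two 512KB requests
 * cost two.
 */
#include <stdio.h>

static const unsigned int pool_sizes[] = { 1, 4, 16, 64, 128, 256 };

/* Smallest pool (by entry count) whose bio_vec array can hold nr_pages
 * segments, or 0 if the request is too large for a single bio. */
static unsigned int biovec_pool_for(unsigned int nr_pages)
{
	unsigned int i;

	for (i = 0; i < sizeof(pool_sizes) / sizeof(pool_sizes[0]); i++)
		if (nr_pages <= pool_sizes[i])
			return pool_sizes[i];
	return 0;
}

int main(void)
{
	unsigned int one_big   = 1024 / 4;	/* 1MB of 4KB pages = 256 segments */
	unsigned int two_small = 512 / 4;	/* 512KB of 4KB pages = 128 segments */

	printf("one 1MB I/O    -> 1 allocation from the biovec-%u pool\n",
	       biovec_pool_for(one_big));
	printf("two 512KB I/Os -> 2 allocations from the biovec-%u pool\n",
	       biovec_pool_for(two_small));
	return 0;
}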

So that's a fancy way of saying "no, I didn't do any benchmarks", yes?

The current code is straightforward and obviously correct. You want
to make the alloc/dealloc paths more complex by special-casing for an
arbitrary limit of "small" I/O, AFAICT. Of *course* you can expect
less overhead when you're doing one large I/O vs. two small ones;
that's the whole reason we have all this code to try to coalesce
contiguous I/O, do readahead, swap page clustering, etc. We *want*
more complexity if it will get us bigger I/Os. I don't see why we
want more complexity to reduce the *inherent* penalty of doing smaller
ones.

BTW, I am happy to see that you are working on performance, especially
to the degree that you reduce overhead by cleaning things up and removing
cruft and unneeded complexity...

NATE