Re: Distributed storage.
From: Daniel Phillips
Date: Sun Aug 12 2007 - 19:37:41 EST
On Tuesday 07 August 2007 13:55, Jens Axboe wrote:
> I don't like structure bloat, but I do like nice design. Overloading
> is a necessary evil sometimes, though. Even today, there isn't enough
> room to hold bi_rw and bi_flags in the same variable on 32-bit archs,
> so that concern can be scratched. If you read bio.h, that much is
> obvious.
Sixteen bits in bi_rw are consumed by queue priority. Is there a reason
this lives in struct bio instead of struct request?
> If you check up on the iommu virtual merging, you'll understand the
> front and back size members. They may smell dubious to you, but
> please take the time to understand why it looks the way it does.
Virtual merging is only needed at the physical device, so why do these
fields live in struct bio instead of struct request?
> Changing the number of bvecs is integral to how bio buildup current
> works.
Right, that is done by bi_vcnt. I meant bi_max_vecs, which you can
derive efficiently from BIO_POOL_IDX() provided the bio was allocated
in the standard way. This leaves a little bit of clean up to do for
bios not allocated from a standard pool.
Incidentally, why does the bvl need to be memset to zero on allocation?
bi_vcnt already tells you which bvecs are valid and the only field in a
bvec that can reasonably default to zero is the offset, which ought to
be set set every time a bvec is initialized anyway.
> > bi_destructor could be combined. I don't see a lot of users of
> > bi_idx,
>
> bi_idx is integral to partial io completions.
Struct request has a remaining submission sector count so what does
bi_idx do that is different?
> > that looks like a soft target. See what happened to struct page
> > when a couple of folks got serious about attacking it, some really
> > deep hacks were done to pare off a few bytes here and there. But
> > struct bio as a space waster is not nearly in the same ballpark.
>
> So show some concrete patches and examples, hand waving and
> assumptions is just a waste of everyones time.
Average struct bio memory footprint ranks near the bottom of the list of
things that suck most about Linux storage. At idle I see 8K in use
(reserves); during updatedb it spikes occasionally to 50K; under a
heavy load generated by ddsnap on a storage box it sometimes goes to
100K with bio throttling in place. Really not moving the needle.
On the other hand, vm writeout deadlock ranks smack dab at the top of
the list, so that is where the patching effort must go for the
forseeable future. Without bio throttling, the ddsnap load can go to
24 MB for struct bio alone. That definitely moves the needle. in
short, we save 3,200 times more memory by putting decent throttling in
place than by saving an int in struct bio.
That said, I did a little analysis to get an idea of where the soft
targets are in struct bio, and to get to know the bio layer a little
better. Maybe these few hints will get somebody interested enough to
look further.
> > It would be interesting to see if bi_bdev could be made read only.
> > Generally, each stage in the block device stack knows what the next
> > stage is going to be, so why do we have to write that in the bio?
> > For error reporting from interrupt context? Anyway, if Evgeniy
> > wants to do the patch, I will happily unload the task of convincing
> > you that random fields are/are not needed in struct bio :-)
>
> It's a trade off, otherwise you'd have to pass the block device
> around a lot.
Which costs very little, probably less than trashing an extra field's
worth of cache.
> And it's, again, a design issue. A bio contains
> destination information, that means device/offset/size information.
> I'm all for shaving structure bytes where it matters, but not for the
> sake of sacrificing code stability or design. I consider struct bio
> quite lean and have worked hard to keep it that way. In fact, iirc,
> the only addition to struct bio since 2001 is the iommu front/back
> size members. And I resisted those for quite a while.
You did not comment on the one about putting the bio destructor in
the ->endio handler, which looks dead simple. The majority of cases
just use the default endio handler and the default destructor. Of the
remaining cases, where a specialized destructor is needed, typically a
specialized endio handler is too, so combining is free. There are few
if any cases where a new specialized endio handler would need to be
written.
As far as code stability goes, current kernels are horribly unstable in
a variety of contexts because of memory deadlock and slowdowns related
to the attempt to fix the problem via dirty memory limits. Accurate
throttling of bio traffic is one of the two key requirements to fix
this instability, the other other is accurate writeout path reserve
management, which is only partially addressed by BIO_POOL.
Nice to see you jumping in Jens. Now it is over to the other side of
the thread where Evgeniy has posted a patch that a) grants your wish to
add no new field in struct bio and b) does not fix the problem.
Regards,
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/