Re: jbd2 inside a device mapper module
From: Alberto Bertogli
Date: Mon Dec 29 2008 - 16:31:57 EST
On Sat, Dec 27, 2008 at 02:29:50PM -0500, Theodore Tso wrote:
> On Sat, Dec 27, 2008 at 01:00:20AM -0200, Alberto Bertogli wrote:
> > I have a couple of alternatives in mind, the most decent one at the
> > moment is having two metadatas (M1 and M2) for the each block, and
> > update M1 on the first write to the given block, M2 on the second, M1 on
> > the third, and so on.
>
> I don't see how this would help. You still have to do synchronous
> writes for safety, which is what is going to kill your performance.
I was thinking of queueing the writes to the metadata, and then queue
the writes of the data marked with bio_barrier(); when the data write
completes I end the original bio.
Although if they metadata is on a different device, I do have to wait
for the metadata to be written because the barrier is useless; but OTOH
if I use a journal I can't split my data and metadata in two different
devices, can I? (without using two journals or doing more complex
stuff).
> What you want to do is to batch as many writes as possible. Until the
> underlying filesystem requests a flush, you can afford to hold off
> writing the block to disk. Otherwise, you'll end up turning each 4k
I think I can't do this at the device-mapper layer. There's a .flush
function pointer, but I think it's suspend-related; and in any case I
gave it a try and it's never called during normal operation.
> > > Why not just use the ext3/4 external journal format?
> >
> > Wouldn't that lead to confusion, because people can think the device
> > holds an ext3/4 external journal, while it actually holds a
> > device-mapper backing device that happens to contain a journal?
>
> Not really; the external journal has a label and uuid, and the journal
> superblock has a place to store the uuid of the "client" of the
> journal. So there is plenty of information available to tie an
> external journal to some device-mapper backing device.
>
> > What would be the advantages of using the ext3/4 journal format, over a
> > simple initial sector and the journal following?
>
> There already existing tools to find the external journal, using the
> blkid library. So you only have to store the UUID of the journal in
> the superblock of the device-mapper backing device, and then you can
> easily find the external journal as follows:
>
> journal_fn = blkid_get_devname(ctx->blkid, "UUID", uuid);
Thanks a lot for the suggestions!
As I said in the other email, I'll give the writes a try and see how it
goes. If their performance suck (what, from what you tell me, it's
likely) at least I'll have something that works.
Thanks a lot,
Alberto
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/