Re: (fwd) Re: [RFC] mount flag "direct"

From: Peter T. Breuer (ptb@it.uc3m.es)
Date: Fri Sep 06 2002 - 09:25:16 EST


"A month of Sundays ago Anton Altaparmakov wrote:"
> The procedure described is _identical_ if you want to access 1TiB at a
> time, because the request is broken down by the VFS into 512 byte size

I think you said "1 byte". And aren't these units 4096 B, or whatever the
block size is?

> units and I think I explained that, too. And for _each_ 512 byte sized unit
> of those 1TiB you would have to repeat the _whole_ of the described

Why? Doesn't the inode usually point at a contiguous lump of many
blocks? Are you perhaps looking at the worst case, not the usual case?
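Peter's point about contiguous runs can be illustrated with a toy extent map. This is a minimal sketch, not kernel code: the `Extent` record, `map_block` and `map_range` are hypothetical names invented for illustration. A run of blocks that sits contiguously on disk needs one extent search for the whole run, whereas mapping every block from scratch costs one search per block.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Extent:
    """Hypothetical contiguous run: logical blocks [logical, logical+length)
    are stored at physical blocks [physical, physical+length)."""
    logical: int
    physical: int
    length: int

    def contains(self, block: int) -> bool:
        return self.logical <= block < self.logical + self.length


def map_block(extents: List[Extent], block: int) -> Optional[int]:
    """Translate one logical block to a physical block (None means a hole)."""
    for ext in extents:
        if ext.contains(block):
            return ext.physical + (block - ext.logical)
    return None


def map_range(extents: List[Extent], first: int,
              count: int) -> Tuple[List[Optional[int]], int]:
    """Map blocks [first, first+count), remembering the last extent found so
    that a contiguous run costs one search rather than one per block.
    Returns (physical block numbers, number of extent searches)."""
    result: List[Optional[int]] = []
    searches = 0
    current: Optional[Extent] = None
    for block in range(first, first + count):
        if current is None or not current.contains(block):
            searches += 1  # only pay for a search when leaving the current run
            current = next((e for e in extents if e.contains(block)), None)
        result.append(None if current is None
                      else current.physical + (block - current.logical))
    return result, searches
```

With two extents of 8 and 4 blocks, mapping all 12 blocks takes 2 searches instead of 12.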

> procedure. So just replace 1 byte with 512 bytes in my post and then repeat
> the procedure as many times as it takes to make up the 1TiB. Surely you
> should know this... just fundamental Linux kernel VFS operation.

Well, you seem to have improved the situation by a factor of 512 in
just a few lines. Perhaps you can improve it again ...?
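For a sense of scale, here is the plain arithmetic behind the "factor of 512": how many unit-sized lookups a 1 TiB read needs if, as Anton describes, every unit is looked up individually. The unit sizes are the ones mentioned in this exchange; `lookups_needed` is a hypothetical helper, not a kernel function.

```python
# Count of per-unit mapping lookups for a 1 TiB read, assuming every
# unit is looked up separately (the worst case argued about above).
TIB = 2 ** 40  # 1 TiB in bytes


def lookups_needed(total_bytes: int, unit_bytes: int) -> int:
    """Number of unit-sized pieces covering total_bytes (ceiling division)."""
    return -(-total_bytes // unit_bytes)


for unit in (1, 512, 1024, 4096):
    print(f"{unit:>5} B units: {lookups_needed(TIB, unit):,} lookups")
```

Going from 1-byte to 512-byte units divides the count by 512 (2^40 down to 2^31); 4096-byte blocks divide it by a further 8 (2^28).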

> It is not clear to me he understands the concept at all. He thinks that you

Well, "let it be clear to you".

> need to read the disk inode just once and then you immediately read all the

No, I think that's likely; I don't "think it". But yes, I assume that
in the case of a normal file it is quite likely that all the info
involved is in the inode, and we don't need to go hunting elsewhere.

Wrong?

> 1TiB of data and somehow all this magic is done by the VFS. This is

No, the fs does the reads. But yes, I believe the inode is "looked up
once" on average, and if it says there's 1 TB of data on disk here, here
and here, then I would expect the inode to stay locked in the fs while we
go and look up that data on disk.

What I am not clear about now is exactly when the data is looked up. I
get the impression from what I have seen of the code that the VFS passes
down a complete request for 1 TB, and that the fs then locks the inode
and chases down the data. What you are saying above gives me the
impression that things are broken down into 512 B lumps FIRST, in or
above the VFS, and then sent to the fs as individual requests with no
inode or fs locking. And I simply don't buy that as a strategy.

> complete crap and is _NOT_ how the Linux kernel works. The VFS breaks every
> request into super_block->s_blocksize sized units and _each_ and _every_

Well, that's 4096 (or 1024), not 512.

> request is _individually_ looked up as to where it is on disk.

Then that's crazy, but, shrug, if you want to do things that way, it
means that locking the inode becomes even more important, so that you
can cache it safely for a while. I'm quite happy with that. Just tell
me about it... after all, I want to issue an appropriate "lock"
instruction from the VFS before every VFS op. I would also like to flush
the dcache after every VFS op, as we unlock. I'm asking for insight
into how to do this...

> Each of those lookups requires a lot of seeks, reads, memory allocations,
> etc. Just _read_ my post...

No, they don't. You don't seem to realize that the remote disk server is
doing all that and has the data cached. What we, the client kernels, get
is latency in accessing that info, and not even that if we lock, cache,
and unlock.
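The "lock, cache, and unlock" pattern can be sketched as follows. This is a hypothetical illustration under stated assumptions: `LockedMapCache`, `read_blocks` and the `fetch_map` callback (standing in for a round trip to the remote disk server) are invented names, and a local `threading.Lock` stands in for whatever shared lock the VFS would actually take. Within one locked operation, many per-block lookups cost a single remote fetch; the cached map is discarded on unlock.

```python
import threading
from typing import Callable, Dict, List, Optional


class LockedMapCache:
    """Hypothetical client-side cache of a file's block map.

    The map is fetched at most once per locked operation and dropped on
    unlock, so no stale metadata survives between operations.
    """

    def __init__(self, fetch_map: Callable[[], Dict[int, int]]):
        self._fetch_map = fetch_map  # stand-in for a request to the disk server
        self._lock = threading.Lock()  # local stand-in for a shared/cluster lock
        self._cache: Optional[Dict[int, int]] = None
        self.remote_fetches = 0

    def __enter__(self) -> "LockedMapCache":
        self._lock.acquire()  # "lock" before the VFS op
        return self

    def __exit__(self, *exc) -> bool:
        self._cache = None  # drop cached metadata as we unlock
        self._lock.release()
        return False

    def lookup(self, block: int) -> int:
        """Map one block; callers are expected to hold the lock (not enforced)."""
        if self._cache is None:  # first lookup in this op pays the latency
            self._cache = self._fetch_map()
            self.remote_fetches += 1
        return self._cache[block]


def read_blocks(cache: LockedMapCache, blocks: List[int]) -> List[int]:
    """One VFS-style op: per-block lookups, all served under a single lock."""
    with cache:
        return [cache.lookup(b) for b in blocks]
```

Dropping the cache at unlock is the conservative choice: it trades one fetch per operation for never acting on metadata that another client may have changed.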

> Please, I really wouldn't have expected someone like you to come up with a
> statement like that... You really should know better...

If you know the person involved, then try to entertain the idea that
they're not crazy, and suspect that other people are not crazy (or
illogical) either!

Peter
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Sep 07 2002 - 22:00:28 EST