Re: Confused about RAW devices ...

Stephen C. Tweedie (sct@redhat.com)
Fri, 22 Oct 1999 11:27:21 +0100 (BST)

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Alan Cox: "Re: Linux 2.2.13ac1 - clarification"
Previous message: David Weinehall: "Re: 2.2.13 compile error"
In reply to: cyco: "Kernel 2.3.22"

Hi,

On Mon, 18 Oct 1999 17:47:03 +0200 (MESZ), "Dr. Michael Weller"
<eowmob@exp-math.uni-essen.de> said:

>> I'm confused a bit ? What are those raw devices ?

> Well, you just wrote down the difference. Now, linux (apparently this
> is going to change) has no difference between raw and non raw
> devices.

ftp://ftp.uk.linux.org/pub/linux/sct/fs/raw-io

for raw IO on 2.2.12. The 2.3 kernels already include the kernel
support. Red Hat 6.1 shipped with raw IO.

> A more classical Unix has block and char devices (the first can only
> write blocks of data, often in a random seek manner, the latter are
> more like streams of data).

Umm, Unix block devices can write arbitrary bytes of data. Because the
block device access is buffered, the kernel is able to deal with the
underlying block structure entirely transparently to the application.
Raw IO is restricted to entire blocks, however, since it does not have
the buffer cache to assist it.

> Then for block devices, there are the 'non-raw' devices which use the
> cache, and there are the raw once which don't.

Nope. All raw devices are character devices. All block devices
necessarily use the buffer cache: a block device is defined internally
as something which can service buffer cache IO requests. Raw devices
cannot be presented as block devices.

> A current linux has no raw devices.

See above. :)

> Now, some people say they don't want buffering for a disk, or their
> application is more smart to do that, so they want the raw devices.
> Theoretically you can use O_SYNC, but I dunno how well this is
> implemented. Of course, it has at least to go through some layer to decide
> not to buffer or to immediately flush it, so a raw device could be
> faster.

O_SYNC is still going to be cached: it will just write the cache back to
disk immediately the cache has been updated. The copy in memory still
stays there.

> And there are always the voices that say that the kernel buffer is sooo
> smart and better in buffering than any application code so a raw device is
> pointless. Apparently some database gurus now pushed hard enough to have
> raw devices added to a new kernel (if there really will be raw devices in
> the future (1st time I heard that)).

They are there now. There are basically three advantages for raw IO.
First of all there is performance: by doing DMA directly from the user
process's data buffer to disk, we avoid the CPU cost of doing an extra
copy when we write to a raw device. Secondly there is memory usage:
avoiding the buffer cache means we are using memory more efficiently if
the application knows that there is no benefit from O/S caching.

Thirdly, however, there is a major class of applications which simply
cannot work on cached devices. Consider a shared-disk clustered
database where multiple nodes are accessing the same disk at once. If
you are reading from cache, you may read the wrong data --- there is a
chance that another node has modified the disk since you read your copy
into cache. Raw IO is absolutely necessary to support such applications.

>> I did a small test :
>> (2) fd=open("/dev/cdrom",O_RDONLY);
>> read(fd,buffer,1);
>> wait_a_second();
>> read(fd,buffer,1);
>>
>> When the second read is done , the CD-drive spins the CD up and the drive LED
>> blinks. No sign of caching. ( kernel v2.2.12 )

If the reads are truly sequential and only 1 byte long, then the only
time this should happen is if the buffer cache is busy enough that the
block has been discarded from the cache between the reads.

> If you really did this, the second read does not read the same as the
> first. So, it could not be satisfied from the cache.

Yes it will be: the cache reads entire blocks/sectors at a time, necessarily.

> A 2nd read from the hardware is ok. What you could claim is: there is
> no read ahead.

There is, but it will only be triggered on the next actual IO in this
case.

> Now, you say, but I only read 1 byte, the next one is on the same block,
> so in cache already. Well, but /dev/cdrom is a block device, you can't
> read one byte at all.

Yes you can. The kernel will cache the rest of the block for you.

> All that said; In addition: a cdrom is removable. The kernel has to check
> the CD didn't change.

No it doesn't: the kernel locks the drive door while the file is open,
and only tests disk change when it is closed and then reopened.

--Stephen

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Next message: Alan Cox: "Re: Linux 2.2.13ac1 - clarification"
Previous message: David Weinehall: "Re: 2.2.13 compile error"
In reply to: cyco: "Kernel 2.3.22"