Re: Confused about RAW devices ...

Dr. Michael Weller (eowmob@exp-math.uni-essen.de)
Mon, 18 Oct 1999 17:47:03 +0200 (MESZ)


On Mon, 18 Oct 1999, DAVID BALAZIC wrote:

> Supposedly raw devices are going into kernel 2.4
> ( http://linuxtoday.com/story.php3?sn=10698 )
>
> I'm confused a bit ? What are those raw devices ?
>
> The mentioned article says :
> > A raw device is one whose accesses are not handled through the caching
> > layer and whose actions are immediately and always synchronous with the
> > "hard" data on the disk or elsewhere.
>
> How is this different from normal devices ?

Well, you just wrote down the difference. Right now Linux makes no
distinction between raw and non-raw devices (apparently this is going to
change). A more classical Unix has block and char devices: the former can
only transfer whole blocks of data, often with random seeks; the latter
are more like streams of data.

Problems with this scheme arise already for tapes. They are typically non
random access and still only transfer blocks of data, not separate bytes.

So, it's never clear on an OS if the tape is a char or a block device.
Then for block devices, there are the 'non-raw' devices which use the
cache, and there are the raw ones which don't. Current Linux has no raw
devices. All block devices are 'non-raw' and go through the cache. All
'raw' devices are char devices (like a tape). As a side effect, you can
not mount a fs on a tape. If it were a (non-buffered) block device you
might be able to do it, although it's really not recommended to do it that
way.

Now, some people say they don't want buffering for a disk, or that their
application is smarter about caching than the kernel, so they want raw
devices. Theoretically you can use O_SYNC, but I dunno how well this is
implemented. Of course, it still has to go through some layer that decides
not to buffer or to flush immediately, so a raw device could be faster.
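
For illustration, a minimal sketch of the O_SYNC route. The helper name
and the path you pass in are made up for this example. Note that O_SYNC
only makes each write() wait until the data has reached the device; the
data still passes through the cache, so this is not the same thing as a
raw device:

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path with O_SYNC set, so that
 * write() returns only after the data has been handed to the device.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t write_synchronously(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0600);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, buf, len);

    if (close(fd) < 0)
        return -1;
    return n;
}
```

Something like write_synchronously("/tmp/test.bin", "hello", 5) should
return 5 on success.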

And there are always the voices that say the kernel buffer is sooo smart
and so much better at buffering than any application code that a raw
device is pointless. Apparently some database gurus have now pushed hard
enough to get raw devices added to a new kernel (if there really will be
raw devices in the future; this is the first time I've heard that).

> I did a small test :
> (2) fd=open("/dev/cdrom",O_RDONLY);
> read(fd,buffer,1);
> wait_a_second();
> read(fd,buffer,1);
>
> When the second read is done , the CD-drive spins the CD up and the drive LED
> blinks. No sign of caching. ( kernel v2.2.12 )

If you really did this, the second read does not read the same data as
the first, because the file position has moved on. So, it could not be
satisfied from the cache. A 2nd read from the hardware is ok. What you
could claim is: there is no read ahead.
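
To actually ask for the same data twice, the test would have to seek back
before the second read. A minimal sketch (the function name is
hypothetical, and it says nothing by itself about whether the second read
came from the cache; you'd have to time it or watch the drive LED):

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the first len bytes of path twice, seeking
 * back to offset 0 in between so that both reads request the same data.
 * Returns 1 if both reads delivered identical data, 0 if they differ,
 * and -1 on any error.
 */
int reread_same_block(const char *path, char *first, char *second, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t a = read(fd, first, len);
    if (a < 0 || lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }

    ssize_t b = read(fd, second, len);
    close(fd);
    if (b < 0)
        return -1;

    return a == b && memcmp(first, second, (size_t)a) == 0;
}
```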

Now, you say, but I only read 1 byte, and the next one is in the same
block, so it is in the cache already. Well, but /dev/cdrom is a block
device, you can't read one byte at all. If you check the return code,
you'll probably notice the read failed.

All that said, there is one more thing: a CD-ROM is removable, and the
kernel has to check that the CD didn't change. If the drive has stopped
in the meantime and is then asked whether the CD changed, it might want
to spin up first before it can be sure. Now, the 'not-changed' check
might be done by locking the drawer when the device is opened (if
possible) or something like that, but I only know SCSI so I can't tell how
this is done with EIDE CD-ROMs. So, the CD-ROM really might be a bad
example here. Anyway, caching is meant to optimize accesses in quick
succession, and the time until a drive spins down is long. Many things
can happen in that time.

> And even if there were, using the O_SYNC flag should tell the driver :
> "Don't cache this , dude !" and there is no need for new device files.
>
> A related question : Why are accesses to devices ( /dev/cdrom and /dev/fd0 are
> the ones I tested ) NOT cached ?

Accesses to /dev/cdrom and /dev/fd0 ARE cached. Esp. /dev/fd0. Did you
ever read/list a tar from a diskette? It needs some time. If you
list/restore it again without replacing the diskette the answer is
typically instantaneous. In all cases, something else going on on your
system might have caused some of these buffers to be cleaned out.

> Am I missing something important ?

To me, this rather looks like: read(fd, buffer, 1) starts the disk, then
realizes you aren't reading a whole block and fails w/o doing anything.
The second read has no buffer to satisfy the request from, so it starts
the disk, then sees you aren't reading a whole block and fails.

So, you can say, read(fd, buffer, 1) should fail immediately and not start
the device at all. But it probably makes no sense to optimize the system
for broken code, and one can also argue that there might be another
problem with the device (no/unreadable media) which the read should check
for first, before returning an error (or just reporting: 0 bytes read)
about the unsupported block size.

First suggestion: retry your test code with a multiple of the block size.
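
Something along these lines, as a rough sketch. The function name is
hypothetical, and 2048 bytes is only an assumption here (it is the usual
data sector size of a CD-ROM; adjust it for your device). The point is
just to look at what read() returns instead of ignoring it:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed block size; 2048 is typical for CD-ROM data sectors. */
#define ASSUMED_BLOCK_SIZE 2048

/*
 * Hypothetical helper: open path, read len bytes into buf, and print
 * what read() returned (or why it failed). Returns read()'s result,
 * or -1 if the device could not be opened.
 */
ssize_t read_and_report(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return -1;
    }

    ssize_t n = read(fd, buf, len);
    if (n < 0)
        fprintf(stderr, "read of %zu bytes failed: %s\n", len, strerror(errno));
    else
        fprintf(stderr, "read of %zu bytes returned %zd\n", len, n);

    close(fd);
    return n;
}
```

Call it once with a length of 1 and once with ASSUMED_BLOCK_SIZE on
/dev/cdrom and compare what the two calls report.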

Michael.

--

Michael Weller: eowmob@exp-math.uni-essen.de, eowmob@ms.exp-math.uni-essen.de, or even mat42b@spi.power.uni-essen.de. If you encounter an eowmob account on any machine in the net, it's very likely it's me.
