Re: Should raw I/O be added to the kernel?

Gerard Roudier (groudier@club-internet.fr)
Thu, 21 Jan 1999 21:39:26 +0100 (MET)


On Thu, 21 Jan 1999, Stephen C. Tweedie wrote:

> Hi,
>
> On Mon, 21 Dec 1998 22:33:06 +0100 (MET), Gerard Roudier
> <groudier@club-internet.fr> said:
>
> > If you want performance, then you also want to queue several IOs to the
> > device at a time, whenever that is possible. Now imagine that you get an
> > IO error on some IO currently being processed by the device. ...
>
> >> Exactly right. Without raw disk IO, you couldn't guarantee a database
> >> to always be in a consistent state on the disk (i.e. it isn't very
>
> > My reply is that even with raw disk IO, it is not that easy to _really_
> > guarantee such consistency without being _really_ aware of what may
> > _really_ happen with _real_ IOs, and that _real_ IO system services
> > are generally too poor to allow full control over what _really_ happens
> > with IOs.
>
> Yes you can. The way these applications work is to write all of the

Providing integrity using only synchronous IO semantics is so costly for
performance that I do not even want to think about such an approach for a
second. By 'full control' I meant 'full control over the actual IO
ordering requirements needed to maintain integrity and consistency'. I
want an IO sub-system that lets the application state 'ordering'
requirements on the IOs as they really happen. For example, I am thinking
of logical IO services that accept ordering attributes like those
available in SCSI physical IO management, e.g. SIMPLE, ORDERED and HEAD
of QUEUE. This would allow the logical IO layers in the O/S and the
device to implement optimizations _and_ also to provide the ordering
required by the application.
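
To make the ordering attributes above concrete, here is a minimal sketch
of a simplified model of them. Everything in it is an assumption made for
illustration: the enum io_tag, the function io_may_overtake() and the
exact rules are hypothetical and are not an existing kernel or SCSI
driver interface. The model says that a HEAD of QUEUE request may be
serviced before anything still pending, an ORDERED request acts as a
barrier, and SIMPLE requests may be reordered freely between barriers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ordering attributes, named after the SCSI queue tags. */
enum io_tag { IO_SIMPLE, IO_ORDERED, IO_HEAD_OF_QUEUE };

/*
 * tags[] lists pending requests in submission order.
 * Return 1 if the request at index `later` may be serviced before the
 * request at index `earlier`, 0 if submission order must be preserved.
 */
int io_may_overtake(const enum io_tag *tags, size_t earlier, size_t later)
{
    size_t i;

    assert(tags != NULL && earlier < later);

    if (tags[later] == IO_HEAD_OF_QUEUE)
        return 1;   /* jumps ahead of everything still queued */
    if (tags[later] == IO_ORDERED)
        return 0;   /* must wait for every older request */
    if (tags[earlier] != IO_SIMPLE)
        return 0;   /* a SIMPLE request waits for older non-SIMPLE ones */

    /* Both are SIMPLE: reordering is fine unless a barrier sits between. */
    for (i = earlier + 1; i < later; i++)
        if (tags[i] == IO_ORDERED)
            return 0;
    return 1;
}
```

Under this model an application could tag its data writes SIMPLE and its
commit record ORDERED: the data writes may still be optimized among
themselves, but the commit record can never pass them, and no
synchronous wait is needed in between.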

> data for a commit, and to wait for it to complete before finally writing
> a commit record. That way, provided the OS does not acknowledge the

An IO that has been completed by a device may still be sitting in some
device buffer. If you want a guarantee that the data is really written to
the medium, then you also need to give the application control, via the
logical IO layer, over the buffering of the physical data involved in the
logical operations the application has asked for.
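
As one illustration of what such control looks like at the physical
level, SCSI lets the initiator set the FUA (Force Unit Access) bit in a
WRITE(10) command so that the device does not report completion from its
cache alone. The text above does not name FUA; picking it is an
assumption of this sketch, and build_write10() is a hypothetical helper,
not a kernel function. How a logical IO layer would expose this to
applications is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCSI_WRITE_10      0x2A  /* WRITE(10) operation code */
#define SCSI_WRITE_10_FUA  0x08  /* byte 1, bit 3: Force Unit Access */

/*
 * Fill a 10-byte WRITE(10) command descriptor block.
 * lba and nblocks are stored big-endian; fua != 0 sets the FUA bit.
 */
void build_write10(uint8_t cdb[10], uint32_t lba, uint16_t nblocks, int fua)
{
    assert(cdb != NULL);

    memset(cdb, 0, 10);
    cdb[0] = SCSI_WRITE_10;
    if (fua)
        cdb[1] |= SCSI_WRITE_10_FUA;
    cdb[2] = (uint8_t)(lba >> 24);      /* logical block address */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[7] = (uint8_t)(nblocks >> 8);   /* transfer length in blocks */
    cdb[8] = (uint8_t)nblocks;
}
```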

> write before it actually reaches the disk, the application can guarantee
> enough about the write ordering to be sure about data consistency. Raw
> IO, O_SYNC and fsync() provide enough support for the application to get
> this right, but fsync() does not provide guaranteed IO error
> notification so is not sufficient in the presence of errors.
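
For reference, the protocol quoted above (write all the data, wait for
it to complete, then write the commit record) can be sketched with
O_SYNC roughly as follows. This is only a sketch under assumptions:
commit_txn(), write_all() and the "COMMIT\n" record format are
hypothetical, and the code assumes that a successful O_SYNC write()
means the data really reached the disk, which is the very point under
discussion.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return -1;          /* IO error: the caller must not commit */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Append `data`, then a commit record, to the log at `path`.
 * With O_SYNC each write() returns only after the OS considers the data
 * written, so the commit record is issued strictly after the data.
 */
int commit_txn(const char *path, const void *data, size_t len)
{
    static const char commit_rec[] = "COMMIT\n"; /* hypothetical format */
    int rc = -1;
    int fd;

    assert(path != NULL && data != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0600);
    if (fd < 0)
        return -1;
    if (write_all(fd, data, len) == 0 &&
        write_all(fd, commit_rec, sizeof commit_rec - 1) == 0)
        rc = 0;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Note that every write here blocks until completion, so only one IO is
in flight at a time.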

Raw IO, O_SYNC and 'but fsync() ;-)' are what is commonly available at
the moment on most UNIX systems to guarantee the consistency of data
systems. I understand that one may want to implement boring commercial
applications using these semantics, and that this may work well enough
for most applications. Current database managers, for example, want
kernels to be mere wrappers around physical IO services and memory
management, because they want portable code in user land and have poorly
optimized their software around the commonly available SYNC/RAW IO
services. If their portability concerns, probably motivated by some
market domination strategy, lead to such abominations, then I am not
interested in their stuff.

Obviously, I think that any O/S should provide some direct IO services,
since there are applications that really need them in order to work
correctly, or that will make good use of them.
But I strongly disagree with the assertion that raw IO or sync IO is
some panacea for dealing with data consistency. It is just what some
dinosaurs, old-minded or $$-interested people want us to believe, in my
opinion.

Regards,
Gerard.

