Re: PATCH: Raw device IO for 2.1.131

Khimenko Victor (khim@sch57.msk.ru)
Mon, 21 Dec 1998 21:27:07 +0300 (MSK)

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: WCOEKAER.US.ORACLE.COM: "Re: PATCH: Raw device IO for 2.1.131"
Previous message: Jamie Seeman: "Reboot on "calibrating the delay loop""

In <19981221180256.B27974@arthur.zdv.Uni-Mainz.DE> Dominik Kubla (dominik.kubla@uni-mainz.de) wrote:
DK> On Wed, Dec 16, 1998 at 11:01:54AM +0300, Khimenko Victor wrote:
>> > OK, a possible "technical" problem is, I want to have 2 linux boxes(or more)
>> > connected to the same scsi disks. (twin tailed or what have you). I have
>> > running 2 instances of the same software both accessing those disks. For
>> > obvious reasons, load balancing, spread load of jobs, and failover, if a
>> > node fails, at least the other instance still has access to the disk and can
>> > RECOVER the data. Because my logfiles are also 'shared' so I can access the
>> > other node's logfiles and recover from that.
>>
>> I'm could not see how this all will work without specially designed software
>> and hardware !

DK> So what? That just tells you that you don't know everything: A lot of people
DK> could show you how it is done reliably with of the shelf hardware and software.

"Specially designed hardware" != "not shelf hardware" :-) BTW hardware part is
more flexible here then software part ...

>> Since this problem is not raised in this thread yet :-)) IMO the only clear
>> solution would be changes in ext2fs or may be special filesystem.
DK> [...]
>> > There is also the fact that raw io for databases IS faster. Whatever type
>> > filesystem you design, doesn't matter since we know which blocks to write
>> > where. An index entry points to a specific block/file/slot so its easy to
>> > calculate the offset in the 'file' ;) And except for full table scans, the
>> > data is spread allover the place, so read-ahead into buffercache doesn't do
>> > didley squad in that case.
>>
>> But you still should keep track of space used for different tables in
>> database :-)) This is EXACTLY filesystem work. Of course you could make
>> internal filesystem in database but of course much more clear way is to
>> fix/extend existing filesystem.

DK> No it is not, ever heard of OS/400?

Of course. There are database as core of the system. *nix is not designed this
way. Both designs has benefits but analogy: both swiss roll and stockfish
are nice food while swiss roll with stockfish is ...

DK> And in addition a filesystem can not do all the things databases would like
DK> it to do unless the filesystem was specially tailored for the specific
DK> database APPLICATION.

DK> RAW DEVICES are simply a short cut of the system to allow databases to use
DK> "filesystems" specially tailored for the specific application (the on site
DK> application, not the database as distributed by the vendor). Not more and not
DK> less. So why don't they use the VFS API provided by most (but not all!)
DK> systems? Simple: because this makes the database system-dependant and
DK> unnecessary complex. And complexity kills software.

Exactly. And raw devices support makes kernel unnecessary complex and (even more
important!) much more flexible. "And complexity kills software" :-)) And since
this is not needed by 99% (more like 99.99% :-) of users mainstream kernel will
not include raw devices support :-))

DK> Above you were stating that special filesystems would be the solutions
DK> to the problem. They are and the only system independant way to do this are
DK> raw devices.

Create module for raw access. Create special filesystem (better way IMO). Do not
pollute mainstream kernel with stuff needed by only 1% (more like 0.01%) of
users just as "temporary solution". Linux is not Solaris or HP-UX: it could be
tweacked as needed and when needed.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Next message: WCOEKAER.US.ORACLE.COM: "Re: PATCH: Raw device IO for 2.1.131"
Previous message: Jamie Seeman: "Reboot on "calibrating the delay loop""