Re: "raw" block devices?

Ingo Molnar (mingo@pc5829.hil.siemens.at)
Thu, 17 Oct 1996 20:03:39 +0100 (MET)


On Thu, 17 Oct 1996, Linus Torvalds wrote:

> > one not-so obvious problem is that an RDBMS >has< to implement a
> > write-cache for itself. Thus if the block device would be buffered too (in
> > the kernel), then we would have double buffering. [as it is buffered now]
>
> Not strictly true. [...]

> You can handle write ordering by using a log-based database (never overwrite
> any old data, so write ordering doesn't matter), and do a "fsync()" on the
> file when you commit. [...]
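The log-based scheme quoted above (append, never overwrite, and fsync() at commit) can be sketched as below. This is a minimal illustration under stated assumptions, not how any particular RDBMS does it: the class name `AppendOnlyLog`, the helper `read_records()` and the 4-byte little-endian length prefix are hypothetical choices made for this example. Because records are only appended, the order in which the kernel flushes dirty pages cannot damage data that an earlier fsync() already made durable. The sketch does not detect torn or partially flushed records at the tail; a real log would add per-record checksums.

```python
import os
import struct


class AppendOnlyLog:
    """Hypothetical append-only log: records are never overwritten in place."""

    def __init__(self, path):
        # O_APPEND makes every write land at the current end of the file.
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def append(self, payload: bytes) -> None:
        # Assumed record format: 4-byte little-endian length, then the payload.
        buf = struct.pack("<I", len(payload)) + payload
        # After this loop the record sits in the kernel's write cache only.
        while buf:
            written = os.write(self.fd, buf)
            buf = buf[written:]

    def commit(self) -> None:
        # The durability point: block until the file's data reaches storage.
        os.fsync(self.fd)

    def close(self) -> None:
        os.close(self.fd)


def read_records(path):
    """Scan the log from the start, stopping at an incomplete trailing record."""
    with open(path, "rb") as f:
        data = f.read()
    records, offset = [], 0
    while offset + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, offset)
        offset += 4
        if offset + length > len(data):
            break  # truncated record at the tail: ignore it
        records.append(data[offset:offset + length])
        offset += length
    return records
```

Note that reads are sequential scans of the log, which is exactly the loss of locality that the reply below objects to: an updated row lives wherever its newest record was appended, not next to its neighbours.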

[ I really don't want to flame ... IMHO it's a very interesting topic that
should be cleared up ]

This brings up problems like locality. Must a log-based RDBMS give up
locality only because the kernel can't guarantee ordering? Log-based
filesystems and RDBMSs write fast and read slow. [This is an access-pattern
thing: a typical RDBMS application does more reads than writes.]

So we have two conflicting constraints [if we accept the current
non-ordered write-cache as our only cache]: locality and ordering. I would
rather change the cache behaviour and force ordering at that level. And
this is how Oracle works [I might be wrong: I have never seen their code,
I can only judge based on what they document].

[ ... I'm ready to stand corrected ]

> device accesses do to the kernel and device layer, and that insight allows me
> to call raw devices a bad idea. I suspect that whatever can be done with raw
> devices can generally be done better (often in another way: I'm not saying
> "done better the SAME way") with a filesystem approach.

How would you achieve locality with a non-ordering write cache?

> The problem is generally the fact that people don't want to do the better
> way, they want to do it the way they are used to ;)

IMHO, this one is a conceptual problem. You cannot have both
transaction-safe and physically localized databases with the current
caching scheme.

> And yes, my opinions are definitely coloured by the fact that I don't like
> raw devices. Don't take the above as gospel truth, but rather take it as the
> reason why the raw devices don't exist..

True raw devices are very ugly. Current RDBMS servers are "kernels
implementing a buffer cache and a filesystem by themselves", which is both
ugly and inefficient. But I can see no other way currently.

Ingo