> > one not-so obvious problem is that an RDBMS >has< to implement a
> > write-cache for itself. Thus if the block device would be buffered too (in
> > the kernel), then we had double buffering. [as it is buffered now]
>
> Not strictly true. [...]
> You can handle write ordering by using a log-based database (never overwrite
> any old data, so write ordering doesn't matter), and do a "fsync()" on the
> file when you commit. [...]
[ I really don't want to flame ... IMHO it's a very interesting topic which
should be cleared up ]
This brings up problems like locality. A log-based RDBMS has to give up
locality only because the kernel can't guarantee ordering? Log-based
filesystems and RDBMSs write fast and read slow. [this is an access pattern
thing. A typical RDBMS application does more reads than writes]
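
[ to make the trade-off concrete, here is a minimal sketch of the log-based
scheme quoted above, in Python. The record format and the names
append_commit() and read_latest() are made up for illustration, this is not
any real RDBMS. Commits are cheap appends followed by fsync(), but the
versions of one key end up scattered through the log, so a read has to scan ]

```python
import os
import tempfile

def append_commit(fd, key, value):
    """Append one record and force it to disk before returning."""
    record = f"{key}={value}\n".encode()
    os.write(fd, record)   # always appended; old data is never overwritten
    os.fsync(fd)           # commit point: ask the kernel to flush this file to the device

def read_latest(path, key):
    """Scan the whole log; the last record for a key wins."""
    latest = None
    with open(path, "rb") as f:
        for line in f:
            k, _, v = line.decode().rstrip("\n").partition("=")
            if k == key:
                latest = v
    return latest

path = os.path.join(tempfile.mkdtemp(), "db.log")
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
append_commit(fd, "a", "1")
append_commit(fd, "b", "2")
append_commit(fd, "a", "3")   # the newer version of "a" lands away from the old one
os.close(fd)
print(read_latest(path, "a"))  # prints 3
```

[ the write path never seeks, the read path pays for it: that is the
"write fast, read slow" pattern ]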
So we have two conflicting constraints [if we accept the current
non-ordered write-cache as our only cache]: locality and ordering. I would
rather say let's change the cache behaviour, and let's force ordering at
that level. And this is how Oracle works [I might be wrong: I have never
seen their code, I can only judge based on documented things].
[ ... I'm ready to stand corrected ]
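
[ a rough sketch of what "forcing ordering at the cache level" could look
like, in Python. It is only an assumption-laden toy: the class
OrderedWriteCache, its barrier() call and the 16-byte block size are
invented here, and it is neither Oracle's design nor the kernel's buffer
cache. Blocks are written in place (so locality is kept), and a barrier
guarantees that earlier writes reach the disk before later ones ]

```python
import os
import tempfile

BLOCK_SIZE = 16  # tiny block size, for illustration only

class OrderedWriteCache:
    """Write-back cache that writes blocks in place but honours barriers."""

    def __init__(self, fd):
        self.fd = fd
        self.groups = [{}]   # each group maps block number -> data; groups flush in order

    def write(self, block_no, data):
        # Within a group, rewriting a block just replaces the cached copy.
        self.groups[-1][block_no] = data.ljust(BLOCK_SIZE, b"\0")[:BLOCK_SIZE]

    def barrier(self):
        # Everything written so far must hit the disk before anything written later.
        if self.groups[-1]:
            self.groups.append({})

    def flush(self):
        flushed = []
        for group in self.groups:
            if not group:
                continue
            # Inside a group the order is free, so go by ascending block number.
            for block_no in sorted(group):
                os.pwrite(self.fd, group[block_no], block_no * BLOCK_SIZE)
                flushed.append(block_no)
            os.fsync(self.fd)   # wait for this group before starting the next one
        self.groups = [{}]
        return flushed

path = os.path.join(tempfile.mkdtemp(), "blocks.db")
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
cache = OrderedWriteCache(fd)
cache.write(7, b"journal entry")
cache.write(2, b"journal entry 2")
cache.barrier()               # journal blocks first, data blocks afterwards
cache.write(5, b"data")
cache.write(1, b"data")
order = cache.flush()
os.close(fd)
print(order)  # prints [2, 7, 1, 5]
```

[ the point of the toy: ordering is enforced only between groups, so the
cache is still free to sort and merge writes inside a group ]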
> device accesses do to the kernel and device layer, and that insight allows me
> to call raw devices a bad idea. I suspect that whatever can be done with raw
> devices can generally be done better (often in another way: I'm not saying
> "done better the SAME way") with a filesystem approach.
How would you achieve locality with a non-ordering write cache?
> The problem is generally the fact that people don't want to do the better
> way, they want to do it the way they are used to ;)
IMHO, this one is a conceptual problem. You cannot have a database that is
both transaction-safe and physically localized with the current caching
scheme.
> And yes, my opinions are definitely coloured by the fact that I don't like
> raw devices. Don't take the above as gospel truth, but rather take it as the
> reason why the raw devices don't exist..
True, raw devices are very ugly. Current RDBMS servers are "kernels
implementing a buffer cache and a filesystem by themselves", which is both
ugly and inefficient. But I can see no other way currently.
Ingo