Re: DEVFSv50 and /dev/fb? (or /dev/fb/? ???)

Wim Coekaerts (wcoekaer@earthlink.net)
Wed, 05 Aug 1998 22:16:36 -0700


Chris Wedgwood wrote:

>The syncing of the data hard to disk isn't the issue - it's the copy
>cost, and sometimes having multiple copies of the data around is
>unnecessary and wasteful of memory - and memory can be very precious
>for database performance.

Well, the syncing of data hard to disk is a real issue. The only way we
can guarantee that logging information is written to disk is by a hard
sync, i.e. using O_SYNC on the logfiles. You don't want a database that
just writes to cache and then doesn't worry anymore about what happens.
If the write() call returns, you'd better be sure that the data is ON
disk. Also, if you have a high-volume OLTP database, the logfiles are
written to all the time, so part of the performance depends on how fast
things get ON disk. So you want them on raw partitions...
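
To make the O_SYNC point concrete, here is a minimal sketch of a
synchronous log append. The function name and the idea of one record per
call are assumptions for illustration only, not how any particular
database writes its logs; the only thing taken from the text above is
opening the logfile with O_SYNC so that write() does not return until
the system reports the data as written.

```python
import os


def append_log_record(path: str, record: bytes) -> int:
    """Append one record to a logfile opened with O_SYNC.

    Hypothetical helper: with O_SYNC set, each write() returns only
    after the OS reports the data as written to the underlying storage.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | os.O_SYNC
    fd = os.open(path, flags, 0o600)
    try:
        written = 0
        # write() may be partial, so loop until the whole record is out.
        while written < len(record):
            written += os.write(fd, record[written:])
        return written
    finally:
        os.close(fd)
```

Each call pays the full cost of getting the record to disk, which is
exactly why the speed of the log device matters so much for OLTP.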

Logfiles are written sequentially, and sequential read/write on raw
devices is a ton faster than on a file system...

The other thing is, the database obviously caches datablocks in its own
buffer cache (the database buffer cache) and also can/will do read-ahead
of blocks, so the unix buffer cache is useless in this case. You can
make better use of memory by giving it to the database engine rather
than to the unix system. What's the point in double caching...

Very often there is a selective read of blocks, and then the unix
buffer cache is kind of useless and in the way, since the db already
caches. Using raw devices, you eliminate that layer, so it's faster. The
general idea is that this was definitely true on older unix systems
because their caches weren't as optimized. These days the unix cache is
quite optimized, so the raw device gain is smaller, but, well, 10% is a
lot when we talk about 1000s of users on a single box...

Imagine reading blocks 1, 10000, 400, 255, 9000... That's only 5 blocks,
but the buffer cache will probably do some read-ahead and all, for
what use? No one needs 2, 3, 4, 5... And even if the disk heads are on
the spot anyway, it uses up cache memory, and it takes more processing
time to scan and add blocks and whatnot. Overhead is overhead.
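
A rough way to put numbers on that: the toy model below counts how many
blocks end up in the cache when every requested block drags in a fixed
number of following blocks. The function name and the read-ahead window
of 4 are assumptions picked for illustration; real read-ahead logic is
adaptive and more complicated than this.

```python
def readahead_cost(requested, window):
    """Return (blocks_cached, blocks_wasted) for a naive read-ahead.

    Toy model: each requested block also pulls in the next `window`
    blocks, whether or not anyone will ever ask for them.
    """
    cached = set()
    for block in requested:
        cached.update(range(block, block + window + 1))
    wasted = cached - set(requested)
    return len(cached), len(wasted)


# The five scattered blocks from the example, hypothetical window of 4.
print(readahead_cost([1, 10000, 400, 255, 9000], 4))  # -> (25, 20)
```

Under those assumptions, 20 of the 25 cached blocks are never used.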

Async IO would be useful for writes... to datafiles.

>Also, a smart database will limit the amount of cache a particular user
>or process can pollute, this is really useful because it means if joe
>user is doing a table scan on a multi-gigabyte table, he won't be
>able to trash more than x% of the cache and adversely affect other
>processes which are running.

Yes, usually full tablescans are loaded directly into user processes
rather than into the buffer cache. At least for Oracle. Like you say,
this prevents a user that does an FTS from trashing the cache. And I
guess it's a tuneable parameter, but that's not the issue here :)
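
As an illustration of why keeping scans out of the shared cache helps,
here is a small LRU cache sketch where scan reads bypass the cache.
This is a toy model with made-up names (BufferCache, the scan flag, the
fetch callback), not Oracle's actual implementation.

```python
from collections import OrderedDict


class BufferCache:
    """Toy LRU block cache; reads flagged as scans are not cached."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block number -> data, oldest first

    def read(self, block_no, fetch, scan=False):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)  # mark as recently used
            return self.blocks[block_no]
        data = fetch(block_no)
        if not scan:
            # Only ordinary reads may occupy (and evict from) the cache.
            self.blocks[block_no] = data
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recent
        return data


def fetch(block_no):
    """Stand-in for a disk read (hypothetical)."""
    return f"block-{block_no}"


cache = BufferCache(capacity=3)
for hot in (1, 2, 3):
    cache.read(hot, fetch)
for block in range(100, 200):  # full table scan, kept out of the cache
    cache.read(block, fetch, scan=True)
print(sorted(cache.blocks))  # -> [1, 2, 3]
```

Run the same scan without scan=True and the three hot blocks are evicted
after the first few reads.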

>For killer database speed, people should use raw devices, so I don't
>think the 2GB limit is a problem.

Well, it depends on where you want to go. Large files are useful for
large database systems; if you want to be able to support VLDB systems,
well, you need > 2GB files. It's still easier to configure files than
raw devices because files are more flexible... Raw devices would be more
useful if they were dynamically resizable... LVM for Linux?... AIX
can do this.

>It would be really nice to hear from someone at Oracle or Informix
>about this. Presumably they know more about database performance and
>requirements than most of us here.

;)

Lots of customers that have high-end optimized databases today use raw
devices... Even if the gain is 'only' 10%, that is a huge difference for
certain types of applications out there...

Optimization is done through good partitioning of data, having enough
controllers and disks, and well-spread IO. But very much so for logfiles,
raw devices would be a very big help... 1- direct write, which returns
with an ACK that it's on disk, 2- sequential access, 3- bypass of the
buffer cache, no delays, no nothing.

Not sure if this is really part of the thread that's going on, but oh
well.

There is more to it, but I guess this kinda quickly sums up what has to
be said...

cheers
wim
