Re: [big performances boost for DataBases] Re: cache killer memory

Andrea Arcangeli (andrea@e-mind.com)
Mon, 26 Apr 1999 00:38:01 +0200 (CEST)


On 25 Apr 1999, Harvey J. Stein wrote:

>Interesting. I'm surprised it makes such a difference. First of all,

I'm not surprised, on the other hand. Your code is writing to a very small
region of the disk, but it's writing, seeking and rewriting to it all the
time (and it's incredibly buffer-query-structure intensive). The disk
working set is quite small and fits entirely in the buffer cache without
problems (it's on the order of 10mbyte), and if you avoid syncing you'll
never get a _single_ disk access during the whole run.

>Secondly, I think I originally put in the gdbm_sync calls in an effort
>to force things to disk so that the kernel *would* be able to discard
>pages and not get into a memory bind. I'm surprised taking it out

If you fsync you won't discard pages, but it will make the pages clean, so
those pages can be recycled very quickly and more other pages are allowed
to be dirty at the same time without needing to flush anything to disk
early. Syncing tells the kernel that those buffers are more appropriate to
go to disk than others.

It makes tons of sense to use fsync, but only when you won't need to
_write_ to that region of the file anymore. If you fsync after the _last_
write, you'll probably improve the global VM performance of the system and
you'll also get data integrity (at least assuming you run your proggy only
once).

>Now that I think about it, I think I put in the gdbm_sync calls
>because the system was freezing up for a long time when the program
>was running. Maybe it was freezing up in gdbm_close, which does an

Makes tons of sense. The old buffer.c was really quite broken with regard
to flushtime and the handling of dirty buffers (this is why I am a bit
worried about also having to get the case of dirty pages in the page cache
right; it's not so trivial to tell the kernel what to do with dirty pages
in a way that works well in all possible scenarios...).

>fsync, so the repeated gdbm_sync calls were probably causing lots of
>little freezes instead of one big freeze.

Exactly. With my code this kind of ugly workaround is not needed, and it
actually harms performance a _lot_.

> > Here it takes 9 sec compared to the 2:50 of 2.2.5 and 1:24 of
> > 2.2.5_arca12 ;). But I also changed your app and not only the kernel (and
> > you must also consider that it's running on a PII 450 and not on a P5).
>
>Not to mention that you also have 128mb of memory & I'm running tests
>on a 32mb machine.

Yes, of course. Along these lines, I suggest you consider the possibility
that by making one machine with 64mbyte of RAM out of your two machines
with 32mbyte of RAM, you may get better global performance ;). It all
depends on your I/O throughput.

> > long time (this doesn't apply to a gdbm_sync at the end of the
> > computation, that is still a good idea).
>
>Given that I'm just building the file & exiting, if the kernel behaves
>well, then I don't see any reason then to do any gdbm_sync calls at
>all. Closing the dbase does an fsync & forces everything to disk
>anyway. Why do you think the additional gdbm_sync calls are a good
>idea?

Closing the DB doesn't do an fsync. That's the whole point. If you do a
gdbm_sync at the end you'll get data integrity on disk ASAP (whereas you
won't get better data integrity by syncing in the middle of the proggy if
you still write to disk after the sync; that's also one of the reasons the
current buffer.c is broken, in my opinion). But if you need to run the
dbase executable many times you can also drop the gdbm_sync before exit.
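To make that concrete, a minimal sketch (assumed, not Harvey's actual
program; the file name and record layout are made up) of the gdbm usage
being recommended: no gdbm_sync inside the update loop, one optional
gdbm_sync just before gdbm_close if you want the data on disk ASAP.

#include <gdbm.h>
#include <stdio.h>

int main(void)
{
	char kbuf[32], vbuf[64];
	datum key, val;
	int i;
	GDBM_FILE db = gdbm_open("test.gdbm", 0, GDBM_WRCREAT, 0644, NULL);

	if (!db) {
		fprintf(stderr, "gdbm_open failed\n");
		return 1;
	}

	for (i = 0; i < 100000; i++) {
		key.dptr = kbuf;
		key.dsize = sprintf(kbuf, "key-%d", i % 1000) + 1;
		val.dptr = vbuf;
		val.dsize = sprintf(vbuf, "value-%d", i) + 1;
		/* No gdbm_sync here: let the dirty buffers sit in the cache. */
		gdbm_store(db, key, val, GDBM_REPLACE);
	}

	gdbm_sync(db);	/* optional: after the last write, for integrity ASAP */
	gdbm_close(db);
	return 0;
}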

> > I thank you very much for your really nice testcase that allowed me to
> > optimize the kernel for database activities (I bet all db (also DBMS
> > servers) will get a _big_ boost with my new code).
>
>No problem! You don't know how happy I am that this is getting looked
>into.

I am happy too ;).

>But, it's not clear to me that DBMS servers will behave similarly -
>gdbm uses hash tables.

I expect them to perform better, at least to the degree that a kernel
compile performs better.

But yes, maybe the boost won't be of the same order as with gdbm
databases. Maybe a DBMS uses logging storage and so never rewrites the
same region of disk twice (well, I don't know DBMS internals). But if they
do rewrite the same region of disk many times as gdbm does, the boost will
be _impressive_ too (removing every possible fsync between two consecutive
writes, of course ;).

Anyway I think it's worth a try. My code is at least not obviously buggy,
and I think it's worth trying it out on a real DBMS and running some
benchmarks.

> > Ah and I am using rb-trees in both page cache and buffer cache (no
> > hashtable anymore), and your proggy (as every db software I think) stress
> > very _much_ find_buffer looking the profiling results I showed you above.
> >
> > All tests above are been run here over 2.2.6_andrea2.bz2, so I would be
> > glad if you would try my new code and feedback on your machines too ;).
> >
> > ftp://e-mind.com/pub/andrea/kernel/2.2.6_andrea2.bz2
>
>Downloading now. Dying to try it out. Why is it _andrea again

Ah, I forgot to mention ;) there are two fast mirrors (though maybe they
are not up to date yet; they will be tomorrow):

ftp://ftp.linux.it/pub/People/andrea/
(Italy, Thanks to linux.it guys)

ftp://master.softaplic.com.br/pub/andrea/
(Brazil, Thanks to Edesio Costa e Silva <edesio@acm.org>, 2MBits/sec)

>instead of _arca? And what exactly is the difference btw 2.2.6_arca1

Because you all call me Andrea ;). All my friends instead have called me
`arca' since I was in middle school (from the surname, obviously ;). I
also saw that there's already some software company called arca, so I
preferred to call it andrea to avoid any kind of mess. But the name makes
no difference at all; the patch-set is the same.

>& 2.2.6_andrea2? I already tested with 2.2.6_arca1 (vs 2.2.6). Here
>are the full results:

Thanks for your great feedback.

Andrea Arcangeli
