Re: All the problems with 2.2.8/2.3.x and bdflush/update

Steve Willer (willer@interlog.com)
Thu, 13 May 1999 21:51:48 -0400 (EDT)


[I'm really unsure about this huge cc list, but the kernel list is giving
me 12-hour delays, so I thought I'd keep it for now]

On Fri, 14 May 1999, Andrea Arcangeli wrote:

> I think that we shouldn't kill update in 2.2.x. I bet many people will
> continue to run it and they will harm performances.
>
> There is _no_ one good point in killing update other than saving some page
> of memory for the update task (kernel stack and some minor thing).

Ideally, the flushing algorithm would be tightly coupled to the wakeup
time calculation. update is quite simplistic, and also isn't part of the
kernel proper. This makes it harder to improve buffer performance.

> >2. If you want your disks to spin down, try mounting filesystems with
> >noatime. I do not like the idea of postponing writes indefinitely at
> >all.
>
> I don't understand exactly why you talk about noatime...

Having an atime means that a system (on a laptop) that reads files, but
doesn't write to any, will still want to wake up the disk. Some people
suggested that turning off writes by killing update is a great way to let
some guy sleep (I guess hard drives keep him up), but this has some pretty
big recoverability issues as in ("you lose everything if it crashes").

> >3. The appended patch should correct the buffer backlog problems
> >observed by Steve Willer. It is NOT ready to go into the official
>
> What is this backlog problem?

The per-wakeup write limit was too small, and there was no hard limit of
dirty buffer age. The result was that in situations where there's a lot of
write I/O happening, the disk would be underutilized, but then the buffer
would eventually hit its max dirty buffer percentage and it would start
writing like crazy. It was really bad with update running, doing its sync
every 30s. System capacity would drop to 50% for 5s every 30s. Very bad.
In short, the writes weren't being paced.

The problem was alleviated by increasing the write limit to 5000, but the
whole algorithm is pretty simplistic. I would love to see it scale up its
writes depending on how dirty the buffer is, the amount of idle time left,
and how old the dirty buffers are. That's hard to implement in the kernel
when update is controlling part of the algorithm.

I guess you can file this particular problem under 'scalability'. Buffer
performance is critical, but so is consistent performance. Particularly on
systems that periodically do heavy writes, and have maybe 200MB buffer
sizes.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/