Re: fsync on large files

Zygo Blaxell (uixjjji1@umail.furryterror.org)
16 Feb 1999 18:13:57 -0500


In article <19990214061302.7472.qmail@defiant.cqc.com>,
Alan Curry <pacman-kernel@cqc.com> wrote:
>Simon Kirby responded:
>>Hmm...I have yet to find the fsync()ing useful...I agree that it
>>could help log something helpful, but I don't see disabling fsync() as a
>>kludge, though. Perhaps you should try remote logging and disabling
>
>Someone wiser than me puts a safety feature into a critical daemon, and I
>take it out because it's inconvenient? That's a kludge.

It's only a kludge if you assume that the people who wrote syslogd
are in fact smarter than you. I have reason to disagree.

The people who wrote syslogd with that "safety" feature were university
grad students who have consistently demonstrated a lack of understanding
of real-world computer issues including performance, security, and
administration over the last decade. Their coding was atrocious and they
failed to do some basic research on the APIs they were using in a few
notorious cases.

In fact, I think the wiser people are the ones who implemented the
feature to _disable_ the fsync() mode, which actually occurred after the
source code had been bouncing around the Linux world for a few years.

I've found syslogd in fsync() mode next to useless for crash analysis.
For me, Linux systems crash in three ways:

1. They crash over a period of several hours, days, weeks,
or months. I've had Linux systems that ran for six months after
part of the networking subsystem died a horrible death.

2. They crash by going into a kernel panic instantly, and sometimes
crash before even that.

3. The SCSI subsystem (or a device on it) fails. Really.
This happens to me more often than power failures.

In case #1, there's plenty of time for the normal filesystem to sync.

In case #2, there's not enough time for syslogd to record the messages--
the kernel stops running any processes at all, so syslogd is irrelevant.
The serial console doesn't have this problem. If you want to log crash
messages, use that.

In case #3, fsync() in syslogd only exacerbates the problem (that is,
although it doesn't cause the problem in and of itself, it provides extra
opportunities for the problem to occur). Quiz: What's the worst thing
you can do if one of your SCSI disks has bad firmware and can't handle
the load? Send it _more_ I/O requests.

The cost in I/O performance is incredibly high, given how unlikely it is
that syslog without fsync() will lose a message that syslog with fsync()
would _not_ lose (and there's a window of only a few seconds here).
There are cases where fsync() still won't help you: it's likely that
anything that can _deliberately_ bring down your system can be turned
into something that can destroy the log files too.
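
To get a rough feel for that cost on a given machine, here is a minimal
sketch that appends the same messages to a scratch file twice, once with
an fsync() after every message and once without. It is not syslogd's
code; the names write_log and time_log are made up for illustration, and
the numbers depend entirely on the disk and filesystem underneath.

```python
import os
import tempfile
import time


def write_log(path, messages, sync_each):
    """Append messages to path; fsync after each one if sync_each is true."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for msg in messages:
            os.write(fd, (msg + "\n").encode("ascii"))
            if sync_each:
                os.fsync(fd)  # force this message out to the disk now
    finally:
        os.close(fd)


def time_log(messages, sync_each):
    """Return the seconds taken to write messages to a fresh scratch file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "messages")
        start = time.perf_counter()
        write_log(path, messages, sync_each)
        return time.perf_counter() - start


if __name__ == "__main__":
    msgs = ["test message %d" % i for i in range(1000)]
    print("fsync per message: %.3f s" % time_log(msgs, True))
    print("no fsync:          %.3f s" % time_log(msgs, False))
```

Run it on the same disk that holds your real logs, ideally while that
disk is busy with its normal load; a scratch file on an idle or
memory-backed filesystem will understate the difference.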

If you really have an entire disk spindle to spend on syslog messages,
put that spindle into a separate dedicated log server (on a separate UPS)
and store your log messages there.
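
As a rough illustration of what shipping a message to such a log server
involves, here is a minimal sketch that sends one datagram in the
traditional BSD syslog style: a "<PRI>" prefix, where PRI is
facility * 8 + severity, sent over UDP to port 514. The function names
and the host name "loghost" are assumptions for the example only; a real
setup would normally let syslogd do the forwarding itself.

```python
import socket


def format_syslog(message, facility=1, severity=6):
    """Build a syslog-style payload; PRI = facility * 8 + severity."""
    pri = facility * 8 + severity  # defaults: user.info -> 14
    return ("<%d>%s" % (pri, message)).encode("ascii", "replace")


def send_syslog(message, host, port=514, facility=1, severity=6):
    """Send one message to host:port as a single UDP datagram."""
    payload = format_syslog(message, facility, severity)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    # "loghost" is a hypothetical name for the dedicated log server.
    send_syslog("disk sda: test message", "loghost")
```

UDP gives no delivery guarantee, so this trades the fsync() window for a
network one; the point is only that the local disk is no longer in the
path of every log message.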

-- 
Zygo Blaxell, Linux Engineer, Corel Corporation, zygob@corel.ca (work),
zblaxell@furryterror.org (play).  It's my opinion, I tell you! Mine! All MINE!
Size of 'diff -Nurw [...] winehq corel' as of Tue Feb 16 17:14:00 EST 1999
Lines/files:  In 277 / 3, Out 20653 / 255, Both 20923 / 256

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/