Re: Linux should better cope with power failure

From: Jeremy Jackson (jerj@coplanar.net)
Date: Mon Mar 19 2001 - 17:15:38 EST


"Richard B. Johnson" wrote:

> On Mon, 19 Mar 2001, Brian Gerst wrote:
> [SNIPPED...]
>
> >
> > At the very least the disk should be consistent with memory. If the
> > dirty pages aren't written back to the disk (but not necessarily removed
> > from memory) after a reasonable idle period, then there is room for
> > improvement.
> >
>
> Hmmm. Now think about it a minute. You have a database operation
> with a few hundred files open, most of which will be deleted after
> a sort/merge completes. At the same time, you've got a few thousand
> directories with their ATIME being updated. Also, you have thousands
> of temporary files being created in /tmp during a compile that didn't
> use "-pipe".
>
> If you periodically write everything to disk, you don't have many
> CPU cycles available to do anything useful.
>
> It is up to the application to decide if anything is 'precious'.
> If you've got some database running, it's got to be checkpointed
> with some "commitable" file-system so it can be interrupted at any time.
>
> If you make your file-systems up of "slices", you can mount the
> executable stuff read/only. Currently, only /var and /tmp need to
> be writable for normal use, plus any user "slices", of course.
> -- Yes I know you need to modify /etc/stuff occasionally (startup
> and shutdown, plus an occasional password change). I proposed
> a long time ago that /etc/mtab get moved to /var.

So how does mount update /var/mtab when mounting /var? He he.

Actually, I think /etc/mtab is not needed at all. Originally, UNIX
kept as much as possible on disk (and not in "core"), so a lot of
state information related only to one boot cycle was taken out of the
kernel and stored on disk: /var/run/utmp, /etc/mtab, rmtab, and many
others. All are invalidated by a reboot, and yet are stored in
non-volatile storage. Kernel memory is not swappable, so they manually
separated out the minimum needed in core.

Linux currently has a lot of this info in core, and maintains the disk files
for backwards compatibility. In the case of /etc/mtab, I believe /proc/mounts
has the same info. It appears to be in the same format as /etc/mtab,
so most of the groundwork has already been done.
I've considered just changing /etc/mtab to /proc/mounts in some
utilities, to remove the need for a read-write root. This (and other cases)
would guarantee consistency (look at /etc/mtab after a restart in single
user mode - ugh).
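
To make the /proc/mounts idea concrete, here is a rough sketch of a utility
reading the mount table from the kernel instead of from /etc/mtab. It assumes
the usual mtab-style layout (device, mount point, type, options, dump, pass)
with octal escapes such as \040 for a space; the names parse_mounts and
read_mounts are made up for illustration and are not taken from any existing
tool.

```python
import re
from collections import namedtuple

# One entry per line of an mtab-style file.
Mount = namedtuple("Mount", "device mountpoint fstype options dump passno")

_OCTAL = re.compile(r"\\([0-7]{3})")


def _unescape(field):
    # Whitespace inside a field is written as a 3-digit octal escape (\040).
    return _OCTAL.sub(lambda m: chr(int(m.group(1), 8)), field)


def parse_mounts(text):
    """Parse mtab-style text into a list of Mount tuples."""
    mounts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed lines rather than guess
        dump = int(fields[4]) if len(fields) > 4 else 0
        passno = int(fields[5]) if len(fields) > 5 else 0
        mounts.append(Mount(_unescape(fields[0]), _unescape(fields[1]),
                            fields[2], fields[3].split(","), dump, passno))
    return mounts


def read_mounts(path="/proc/mounts"):
    # Hypothetical helper: needs /proc mounted, but no writable root.
    with open(path) as f:
        return parse_mounts(f.read())
```

Because the parser takes text, the same code handles either file, so a utility
could try /proc/mounts first and fall back to /etc/mtab only when /proc is
not mounted.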

I wonder if embedded folks would like to at least keep the old behaviour
as a compile-time option; they're in almost the same boat as early (64k
core memory) unix folks.

My favorite compromise between journaling and performance is the
Compaq Smart Array controllers, with a battery-backed SRAM
write cache; they can do (fast) lazy writes and still be perfectly reliable.
Plus they keep *everything* reliable, not just metadata.

I find this a fascinating topic... the ultimate would be to use the NVRAM
(it's mostly empty if using LinuxBIOS) to store a clean shutdown flag,
and/or a system heartbeat timestamp (like syslogd's)... except that this
timestamp, unlike syslogd's, would let the HDD spin down (the disk would not
be hit every 20 minutes or so with a timestamp log entry) and would not wear
out a flash-disk-based system.
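
A minimal sketch of the flag-plus-heartbeat bookkeeping, with a plain
bytearray standing in for the NVRAM. The record layout (one flag byte followed
by a 32-bit timestamp) and all function names are assumptions made up for
illustration; real NVRAM access, checksums, and offsets are left out.

```python
import struct

# Hypothetical layout: 1-byte clean flag, then a 32-bit heartbeat time.
RECORD = struct.Struct("<BI")
CLEAN, DIRTY = 1, 0


def write_record(nvram, flag, heartbeat):
    nvram[:RECORD.size] = RECORD.pack(flag, heartbeat)


def read_record(nvram):
    return RECORD.unpack(bytes(nvram[:RECORD.size]))


def on_boot(nvram, now):
    """Return (was_clean, last_heartbeat), then mark the system as running."""
    flag, last = read_record(nvram)
    write_record(nvram, DIRTY, now)
    return flag == CLEAN, last


def on_heartbeat(nvram, now):
    # Periodic update; touches only the NVRAM, never the disk.
    write_record(nvram, DIRTY, now)


def on_shutdown(nvram, now):
    write_record(nvram, CLEAN, now)
```

On boot, a false was_clean means the previous session ended without a
shutdown, and last_heartbeat gives roughly when it died, without a single
disk write having been needed to record it.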

Regards,

Jeremy



