Re: linux-2.4.0 breaks grub install into partition

From: Khimenko Victor (khim@sch57.msk.ru)
Date: Sun Jul 09 2000 - 13:17:56 EST


In <Pine.LNX.4.21.0007091807560.723-100000@inspiron.random> Andrea Arcangeli (andrea@suse.de) wrote:
AA> [ disclaimer: I have no idea what GRUS is ]

Ok. Then I'll write short explanation (enough to understood discussed
problem). GRUB is bootloader. It DOES NOT use linux's kernel to parse
partition tables and access disk via /dev/hda1 or /dev/hda10. Instead
it has it's own mechanism to find list of partitions and placement of
partitions on disk (it needs such mechanism anyway since THE SAME CODE
is used on bootup when there are no /dev/hda1 but only BIOS's disk 0x80
instead of /dev/hda :-). So even if you ask it to install inself on
linux's partition it'll NOT use /dev/hda1. It will write to /dev/hda in
known (from partition table) place. The question is: how to ask linux
to re-read this new information and not trash it afterwards ? Or there
are just no way to access boot sector of MOUNTED (you can not unmount
root partition, right?) /dev/hda1 via /dev/hda and then ask linux kernel
to update internal tables. Program will do it from root, of course so it
can do privileged oprations just fine. If there are NO way to do so
via /dev/hda1 it'll be acceptable if it's reliable doable via /dev/hda1 at
least (it'll require quite a lot of changes in GRUB but if it's the only
choice then...).

AA> On Sun, 9 Jul 2000, OKUJI Yoshinori wrote:

>> I don't think what GRUB does is a wrong thing basically. Some types
>>of software always need (or want) to access raw devices, for example,
>>FDISK programs, filesystem resizers, and fast database servers. So,

AA> fdisk works on the partition table and the kernel never touch in write
AA> mode the partition table so as far as you run only 1 fdisk/lilo at once,
AA> you're obviously safe. (that's a simple userspace/admin issue)

cfdisk can alter partition label, for example. Partition label sits not
in partition table but in filesystem, so for fdisk this problem also exists.

>>AFAIK, all the realistic operating systems export raw devices to
>>user-level programs and support one or more system calls to keep
>>anything in the kernel consistent.

AA> We need callbacks in the filesystem for sure in order to do live snapshots
AA> of a logical volume and reiserfs should provide it soon IIRC (even if by
AA> doing tricks to allow to mount a not cleanly unmounted journaling fs in
AA> read only, or doing writeable snapshots (to allow journal reply on mount)
AA> we could avoid such callbacks for snapshotting a journaling fs, but we'll
AA> for sure need it for ext2 for example). For ext2 the thing looks pretty
AA> trivial we'll probably only need to grab the superblock lock while doing
AA> the snapshot (so we theorically don't even need the callback in the ext2
AA> case but we need a callback to allow other fs to potentially do other
AA> things of course).

>> For now, the grub shell calls sync() (twice before any operation)
>>and ioctl(fd, BLKFLSBUF, 0) (after and before operations) under

AA> Both things are useless if your object is to read consistent data from a
AA> live fs, so you can remove them. If your object is to write to a live fs
AA> via raw device (note I'm not talking about rawio device here) then you
AA> can't achive your object unless you are also able to tell the fs what you
AA> changed (some metadata can be cached in dentries or also in 2.4.x the
AA> data is never in buffer cache so the fs won't notice your changes).

It worked in 2.2 (with ugly kludge but WORKED). We need SOME way to do some
with 2.4 (it's needed for LiLo as well if you are installing LiLo in /dev/hda1
and not in /dev/hda).

>>Linux. I thought that was enough, since sync should make filesystems
>>and buffer caches consistent, and BLKFLSBUF should flush buffer caches

AA> In 2.2.x when you read from raw device or when you write to raw device
AA> you're assured to see the same data that the fs is seeing too because of
AA> page-cache/buffer-cache costly synchronization.

You are NOT installing GRUB (or LiLo) 10 times per minute. No matter how
costly synchronization is if we have way to force it we are saved.

AA> However on a live filesystem the fs can change from under you while you
AA> take a page fault during the read(/dev/hda) syscall for example

mlock will help here, right ?

AA> and if you're not holding the superblock lock (in the ext2 case) you may
AA> read not consistent data anyway.

I'm not know how to hold superblock lock from userspace unfortunatelly :-/

AA> In 2.4.x buffer cache is completly unsynchronized with page cache so
AA> running BLKFLSBUF would make some more sense there to make sure that the
AA> next time you'll read data from the raw blockdevice you'll see the data
AA> that the kernel _was_ (not _is_) seeing at the time of the BLKFLSBUF but
AA> between the ioctl(BLKFLSBUF) and the read(/dev/hda) you'll be rescheduled
AA> and somebody will write to the page-cache again... (for metadata ext2
AA> 2.2.x and 2.4.x are the same here I was only talking about data here)

:-((

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Jul 15 2000 - 21:00:10 EST