Re: [Lguest] lguest, virtio_blk, id is not a head!
From: Andre Grueneberg
Date: Mon Dec 15 2008 - 22:24:41 EST
Hi Rusty,
Rusty Russell wrote:
> > Lately I upgraded my host and guest from kernel 2.6.25 (Debian) to
> > vanilla 2.6.27.8. I have two virtual machines and in both I recognised
> > the same issue after a while.
> >
> > After running for some hours the first guest showed the following
> > message on the console:
> > virtio_blk virtio1: id 1 is not a head!
> This is interesting. There hasn't been a great deal of change between
> those versions on the lguest side. This sounds like a race, and the only
> thing I can see which would have introduced this is the change below.
I removed the "lguest: turn Waker into a thread, not a process" patch as
proposed. Short result: The problem still exists.
Longer story:
As this issue is extremly hard to reproduce, I made an attempt to cause
many file system reads. If I understood things correctly this is when
the guest side of virtio_blk is likely to call vring_get_buf?! I did so
by creating a large file (300MiB vs. 256MiB memory) in the guest and
reading it multiple times in 1k pieces (dd).
To verify that it works, I ran this loop on 2.6.27.8 guest with the
original launcher: failure after ~200 iterations.
So I went on with the patched version of lguest (launcher with waker
process) and the same 2.6.27.8 guest: failure after ~600 iterations.
Next step was a 2.6.25 guest with the original launcher: I stopped it
after ~1600 iterations did not show a fault.
As it occured to me that using a file in a guest's file system might not
be the optimal target, I have now modified my loop to use the first
300MiB of the block device itself. With the original combination, I now
got the failure after ~750 iterations.
I am now running this test with 2.6.25 guest again. As the host is also
serving some serious purpose, I can only run this kind of long lasting
tests during night time. So I'll be forced to stop this test soon (but
I'll move into bed first).
My next step will be to use some non-root block device, to have a chance
to access the guest after the crash. Do you have any idea what to look
at then?
One thing, I just recognised:
While my 2.6.25 guest kernel is configured with "NOHz mode", the
2.6.27.8 guest is using "high resolution mode".
Thanks so far,
André
--
Reality is a crutch for people who can't handle drugs.
Attachment:
signature.asc
Description: Digital signature