Re: [RFC] Unify KVM kernel-space and user-space code into a single project

From: Avi Kivity
Date: Mon Mar 22 2010 - 16:16:22 EST


On 03/22/2010 10:06 PM, Ingo Molnar wrote:
> * Avi Kivity <avi@xxxxxxxxxx> wrote:
>
>> On 03/22/2010 09:20 PM, Ingo Molnar wrote:
>>> * Avi Kivity <avi@xxxxxxxxxx> wrote:
>>>
>>>>> Let's look at the ${HOME}/.qemu/qmp/ enumeration method suggested by
>>>>> Anthony. There are numerous ways that this can break:
>>>> I don't like it either. We have libvirt for enumerating guests.
>>> Which has pretty much the same problems as the ${HOME}/.qemu/qmp/ solution,
>>> obviously.
>> It doesn't follow. The libvirt daemon could/should own guests from all
>> users. I don't know if it does so now, but nothing is preventing it
>> technically.
> It's hard for me to argue against a hypothetical implementation, but all
> user-space driven solutions for resource enumeration I've seen so far had
> weaknesses that kernel-based solutions don't have.

Correct. Kernel-based solutions also have issues.

>> If qemu hangs, the guest hangs a few milliseconds later.
> I think you didn't understand my point. I am talking about 'perf kvm top'
> hanging if Qemu hangs.

Use non-blocking I/O and report that guest as dead. There's no point in profiling it; it isn't making any progress.
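A minimal sketch of the non-blocking approach, in Python, assuming the hypothetical ${HOME}/.qemu/qmp/ layout from Anthony's proposal (one QMP UNIX socket per guest; socket names are illustrative). A QMP server sends a one-line JSON greeting with a "QMP" key when a client connects. Here every connect() and recv() is bounded by a timeout, so a hung qemu is reported as dead instead of stalling the tool. The names probe_guest and enumerate_guests are hypothetical and are not part of perf or qemu.

```python
import glob
import json
import os
import socket


def probe_guest(sock_path, timeout=0.5):
    """Return "alive" if a QMP greeting arrives on sock_path, else "dead".

    The socket is put in timeout mode, so each connect()/recv() call gives up
    after `timeout` seconds instead of blocking on a hung qemu.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(sock_path)
        data = b""
        while b"\n" not in data:          # the greeting is one JSON line
            chunk = s.recv(4096)
            if not chunk:                 # peer closed without a greeting
                return "dead"
            data += chunk
        greeting = json.loads(data.split(b"\n", 1)[0])
        ok = isinstance(greeting, dict) and "QMP" in greeting
        return "alive" if ok else "dead"
    except (OSError, ValueError):
        # OSError covers timeouts, refused connections and stale socket files;
        # ValueError covers a greeting that is not valid JSON.
        return "dead"
    finally:
        s.close()


def enumerate_guests(qmp_dir, timeout=0.5):
    """Probe every entry in qmp_dir (hypothetical layout: one socket per guest)."""
    return {
        os.path.basename(path): probe_guest(path, timeout)
        for path in sorted(glob.glob(os.path.join(qmp_dir, "*")))
    }
```

Calling enumerate_guests(os.path.expanduser("~/.qemu/qmp")) would return a dict mapping each socket name to "alive" or "dead"; a profiler built this way would skip the dead guests rather than block on them.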

> With a proper in-kernel enumeration the kernel would always guarantee the
> functionality, even if the vcpu does not make progress (i.e. it's "hung").
>
> With this implemented in Qemu we lose that kind of robustness guarantee.

If qemu has a bug in the resource enumeration code, you can't profile one guest. If the kernel has a bug in the resource enumeration code, the system either panics or needs to be rebooted later.

> And especially during development (when developers use instrumentation the
> most) it is important to have robust instrumentation that does not hang along
> with the Qemu process.

It's nice not to have kernel oopses either. So when code can be in userspace, that's where it should be.

>> If qemu fails, you lose your guest. If libvirt forgets about a
>> guest, you can't do anything with it any more. These are more
>> serious problems than 'perf kvm' not working. [...]
> How on earth can you justify a bug ("perf kvm top" hanging) by pointing out
> that there are other bugs as well?

There's no reason for 'perf kvm top' to hang if some process is not responsive. That would be a perf bug.

> Basically you are arguing that it would be fine for a gdb session to
> become unresponsive if the debugged task hangs. Fortunately ptrace is
> kernel-based and it never 'hangs' if the user-space process hangs somewhere.

Neither gdb nor perf should hang.

> This is an essential property of good instrumentation.
>
> So the enumeration method you suggested is a poor, subpar solution, simple
> as that.

Or, you misunderstood it.

>> [...] Qemu and libvirt have to be robust anyway, so we can rely on them, just
>> as we have to rely on init, X, sshd, and a zillion other critical tools.
> We can still profile any of those tools without the profiler breaking if the
> debugged tool breaks ...

You can't profile without qemu.

>>> By your argument it would be perfectly fine to implement /proc purely via
>>> user-space, correct?
>> I would have preferred /proc to be implemented via syscalls called directly
>> from tools, and good tools written to expose the information in it. When
>> computers were slower, 'top' would spend tons of time opening and closing all
>> those tiny files and parsing them. Of course the kernel needs to provide
>> the information.
> (Then you'll be pleased to hear that perf has enabled exactly that, and that we
> are working towards that precise use case.)

Are you exporting /proc/pid data via the perf syscall? If so, I think that's a good move.
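For illustration, a Python sketch of the file-per-process pattern described above for 'top': one open()/read()/close() plus text parsing for every process on each refresh. Field positions follow the proc(5) layout of /proc/<pid>/stat (state is field 3; utime and stime are fields 14 and 15, in clock ticks). The function names are hypothetical; this is not procps code.

```python
import os


def parse_stat(text):
    """Parse one /proc/<pid>/stat line into a few named fields."""
    # Layout is "pid (comm) state ppid ...". comm may itself contain spaces
    # or parentheses, so split on the first " (" and the LAST ") ".
    pid_part, _, rest = text.partition(" (")
    comm, _, tail = rest.rpartition(") ")
    fields = tail.split()              # fields[0] is field 3 (state)
    return {
        "pid": int(pid_part),
        "comm": comm,
        "state": fields[0],
        "utime": int(fields[11]),      # field 14, clock ticks
        "stime": int(fields[12]),      # field 15, clock ticks
    }


def snapshot():
    """Read every process's stat file: one open/read/close per process."""
    procs = []
    for name in os.listdir("/proc"):
        if not name.isdigit():         # skip "self", "meminfo", ...
            continue
        try:
            with open(f"/proc/{name}/stat") as f:
                procs.append(parse_stat(f.read()))
        except (FileNotFoundError, ProcessLookupError):
            continue                   # process exited between listdir() and read()
    return procs
```

Each call to snapshot() costs three system calls and a text parse per process, which is the overhead a syscall interface returning structured data would avoid.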

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
