Re: Getting rid of inside_vm in intel8x0
From: Takashi Iwai
Date: Sat Apr 02 2016 - 01:34:16 EST
On Sat, 02 Apr 2016 00:28:31 +0200,
Luis R. Rodriguez wrote:
>
> On Fri, Apr 01, 2016 at 07:34:10AM +0200, Takashi Iwai wrote:
> > On Fri, 01 Apr 2016 00:26:18 +0200,
> > Luis R. Rodriguez wrote:
> > >
> > > On Wed, Mar 30, 2016 at 08:07:04AM +0200, Takashi Iwai wrote:
> > > > On Tue, 29 Mar 2016 23:37:32 +0200,
> > > > Andy Lutomirski wrote:
> > > > >
> > > > > Would it be possible to revert:
> > > > >
> > > > > commit 228cf79376f13b98f2e1ac10586311312757675c
> > > > > Author: Konstantin Ozerkov <kozerkov@xxxxxxxxxxxxx>
> > > > > Date: Wed Oct 26 19:11:01 2011 +0400
> > > > >
> > > > > ALSA: intel8x0: Improve performance in virtual environment
> > > > >
> > > > > Presumably one or more of the following is true:
> > > > >
> > > > > a) The inside_vm == true case is just an optimization and should apply
> > > > > unconditionally.
> > > > >
> > > > > b) The inside_vm == true case is incorrect and should be fixed or disabled.
> > > > >
> > > > > c) The inside_vm == true case is a special case that makes sense then
> > > > > IO is very very slow but doesn't make sense when IO is fast. If so,
> > > > > why not literally measure the time that the IO takes and switch over
> > > > > to the "inside VM" path when IO is slow?
> > >
> > > BTW can we simulate this on bare metal by throttling an IO bus, or
> > > perhaps mucking with the scheduler ?
> > >
> > > I ask as I wonder if similar type of optimization may be useful
> > > to first simulate with other types of buses for other IO devices
> > > we might use in virtualization environments. If so, I'd be curious
> > > to know if similar type of optimizations might be possible for
> > > other sounds cards, or other IO devices.
> >
> > There aren't so many sound devices requiring such a workaround.
>
> Why not, what makes this special?
The hardware buggyness.
> > > > More important condition is rather that the register updates of CIV
> > > > and PICB are atomic.
> > >
> > > To help with this can you perhaps elaborate a bit more on what the code
> > > does? As I read it snd_intel8x0_pcm_pointer() gets a pointer to some
> > > sort of audio frame we are in and uses two values to see if we are
> > > going to be evaluating the right frame, we use an optimization of
> > > some sort to skip one check for virtual environments. We seem to need
> > > this given that on a virtual environment it is assumed that the sound
> > > card is emulated, and as such an IO read there is rather expensive.
> > >
> > > Can you confirm and/or elaborate a bit more what this does ?
> > >
> > > To try to help understand what is going on can you describe what CIV
> > > and PICB are exactly ?
> >
> > CIV and PICB registers are a pair and we calculate the linear position
> > in a ring buffer from both two.
>
> What type of ring buffer is this ?
A normal PCM ring buffer via PCI DMA transfer.
> > However, they are divorced sometimes
> > under stress, and the position calculated from such values may go
> > backward wrongly. For avoiding it, there is the second read of the
> > PICB register and compare with the previous value, and loop until it
> > matches. This check is skipped on VM.
>
> I see. Is this a software emulation *bug*, or an IO issue due to
> virtualization? I tried to read what the pointer() (that's what its called)
> callback does but since there is no documentation for any of the callbacks I
> have no clue what so ever.
>
> If the former, could a we somehow detect an emulated device other than through
> this type of check ? Or could we *add* a capability of some sort to detect it
> on the driver ? This would not address the removal, but it could mean finding a
> way to address emulation issues.
>
> If its an IO issue -- exactly what is the causing the delays in IO ?
Luis, there is no problem about emulation itself. It's rather an
optimization to lighten the host side load, as I/O access on a VM is
much heavier.
> > > > This is satisfied mostly only on VM, and can't
> > > > be measured easily unlike the IO read speed.
> > >
> > > Interesting, note the original patch claimed it was for KVM and
> > > Parallels hypervisor only, but since the code uses:
> > >
> > > +#if defined(__i386__) || defined(__x86_64__)
> > > + inside_vm = inside_vm || boot_cpu_has(X86_FEATURE_HYPERVISOR);
> > > +#endif
> > >
> > > This makes it apply also to Xen as well, this makes this hack more
> > > broad, but does is it only applicable when an emulated device is
> > > used ? What about if a hypervisor is used and PCI passthrough is
> > > used ?
> >
> > A good question. Xen was added there at the time from positive
> > results by quick tests, but it might show an issue if it's running on
> > a very old chip with PCI passthrough. But I'm not sure whether PCI
> > passthrough would work on such old chipsets at all.
>
> If it did have an issue then that would have to be special cased, that
> is the module parameter would not need to be enabled for such type of
> systems, and heuristics would be needed. As you note, fortunately this
> may not be common though...
Actually this *is* module parametered. If set to a boolean value, it
can be applied / skipped forcibly. So, if there has been a problem on
Xen, this should have been reported. That's why I wrote it's no
common case. This comes from the real experience.
> but if this type of work around may be
> taken as a precedent to enable other types of hacks in other drivers
> I'm very fearful of more hacks later needing these considerations as
> well.
>
> > > > > There are a pile of nonsensical "are we in a VM" checks of various
> > > > > sorts scattered throughout the kernel, they're all a mess to maintain
> > > > > (there are lots of kinds of VMs in the world, and Linux may not even
> > > > > know it's a guest), and, in most cases, it appears that the correct
> > > > > solution is to delete the checks. I just removed a nasty one in the
> > > > > x86_32 entry asm, and this one is written in C so it should be a piece
> > > > > of cake :)
> > > >
> > > > This cake looks sweet, but a worm is hidden behind the cream.
> > > > The loop in the code itself is already a kludge for the buggy hardware
> > > > where the inconsistent read happens not so often (only at the boundary
> > > > and in a racy way). It would be nice if we can have a more reliably
> > > > way to know the hardware buggyness, but it's difficult,
> > > > unsurprisingly.
> > >
> > > The concern here is setting precedents for VM cases sprinkled in the kernel.
> > > The assumption here is such special cases are really paper'ing over another
> > > type of issue, so its best to ultimately try to root cause the issue in
> > > a more generalized fashion.
> >
> > Well, it's rather bare metal that shows the buggy behavior, thus we
> > need to paper over it. In that sense, it's other way round; we don't
> > tune for VM. The VM check we're discussing is rather for skipping the
> > strange workaround.
>
> What is it exactly about a VM that enables this work around to be skipped?
> I don't quite get it yet.
VM -- at least the full one with the sound hardware emulation --
doesn't have the hardware bug. So, the check isn't needed.
> > You may ask whether we can reduce the whole workaround instead. It's
> > practically impossible. We don't know which models doing so and which
> > not. And, the hardware in question are (literally) thousands of
> > variants of damn old PC mobos. Any fundamental change needs to be
> > verified on all these machines...
>
> What if we can come up with algorithm on the ring buffer that would
> satisfy both cases without special casing it ? Is removing this VM
> check impossible really?
Yes, it's impossible practically, see my comment above.
Whatever you change, you need to verify it on real machines. And it's
very difficult to achieve.
thanks,
Takashi