Re: [patch 5/9] x86/ioport: Reduce ioperm impact for sane usage further

From: Brian Gerst
Date: Thu Nov 07 2019 - 21:12:52 EST


On Thu, Nov 7, 2019 at 8:12 PM H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>
> On 2019-11-07 13:44, Linus Torvalds wrote:
> > On Thu, Nov 7, 2019 at 1:00 PM Brian Gerst <brgerst@xxxxxxxxx> wrote:
> >>
> >> There wouldn't have to be a flush on every task switch.
> >
> > No. But we'd have to flush on any switch that currently does that memcpy.
> >
> > And my point is that a tlb flush (even the single-page case) is likely
> > more expensive than the memcpy.
> >
> >> Going a step further, we could track which task is mapped to the
> >> current cpu like proposed above, and only flush when a different task
> >> needs the IO bitmap, or when the bitmap is being freed on task exit.
> >
> > Well, that's exactly my "track the last task" optimization for copying
> > the thing.
> >
> > IOW, it's the same optimization as avoiding the memcpy.
> >
> > Which I think is likely very effective, but also makes it fairly
> > pointless to then try to be clever..
> >
> > So the basic issue remains that playing VM games has almost
> > universally been slower and more complex than simply not playing VM
> > games. TLB flushes - even invlpg - tend to be pretty slow.
> >
> > Of course, we probably end up invalidating the TLBs anyway, so maybe
> > in this case we don't care. The ioperm bitmap is _technically_
> > per-thread, though, so it should be flushed even if the VM isn't
> > flushed...
> >
>
> One option, probably a lot saner (if we care at all, after all, copying 8K
> really isn't that much, but it might have some impact on real-time processes,
> which is one of the rather few use cases for direct I/O) would be to keep the
> bitmask in a pre-formatted TSS (ioperm being per thread, so no concerns about
> the TSS being in use on another processor), and copy the TSS fields (88 bytes)
> over if and only if the thread has been migrated to a different CPU, then
> switch the TSS rather than switching For the common case (no ioperms) we use
> the standard per-cpu TSS.
>
> That being said, I don't actually know that copying 88 bytes + LTR is any
> cheaper than copying 8K.

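I don't think that can work. The TSS has to be at a fixed address in
the cpu_entry_area so that it stays mapped while running in user mode.
With the Meltdown mitigation (PTI), the user page tables map only the
cpu_entry_area and little else of the kernel, so a per-thread TSS
living anywhere else would not be visible to the CPU.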
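
For what it's worth, the "track the last task" idea quoted above can be
modeled in plain userspace C roughly like this. It is only a sketch:
struct task, struct cpu_tss, switch_io_bitmap() and forget_io_bitmap()
are made-up names, not the kernel's, and the bitmap_valid flag merely
stands in for however the real TSS marks its I/O bitmap as unusable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IO_BITMAP_BYTES 8192	/* 65536 ports, one bit per port */

/* Hypothetical stand-ins for the task and the per-CPU TSS state. */
struct task {
	unsigned char *io_bitmap;	/* NULL if the task never used ioperm() */
};

struct cpu_tss {
	unsigned char io_bitmap[IO_BITMAP_BYTES];
	const struct task *bitmap_owner;	/* whose bitmap was copied in last */
	int bitmap_valid;			/* 0: all port access is denied */
	unsigned long copies;			/* number of 8K copies, for illustration */
};

/* Called on a context switch to 'next' on the CPU owning 'tss'. */
static void switch_io_bitmap(struct cpu_tss *tss, const struct task *next)
{
	assert(tss != NULL && next != NULL);

	if (next->io_bitmap == NULL) {
		/*
		 * Common case, no ioperm: only mark the bitmap invalid.
		 * The stale copy and its owner stay in place so that a
		 * switch back to the owner can skip the copy.
		 */
		tss->bitmap_valid = 0;
		return;
	}

	if (tss->bitmap_owner != next) {
		/* A different task needs the bitmap: pay for the 8K copy. */
		memcpy(tss->io_bitmap, next->io_bitmap, IO_BITMAP_BYTES);
		tss->bitmap_owner = next;
		tss->copies++;
	}
	tss->bitmap_valid = 1;
}

/*
 * Must be called when a task changes or frees its bitmap (e.g. on
 * exit); otherwise the pointer comparison above could match stale data.
 */
static void forget_io_bitmap(struct cpu_tss *tss, const struct task *t)
{
	if (tss->bitmap_owner == t) {
		tss->bitmap_owner = NULL;
		tss->bitmap_valid = 0;
	}
}
```

Switching from an ioperm user to an ordinary task and back costs no
copy in this model. The copy happens only when a different ioperm user
comes in, or after forget_io_bitmap() has dropped the cached owner.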

--
Brian Gerst