Re: [patch 5/9] x86/ioport: Reduce ioperm impact for sane usage further

From: Andy Lutomirski
Date: Sun Nov 10 2019 - 12:17:44 EST


On 11/7/19 12:25 AM, Ingo Molnar wrote:
>
> * Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
>> On Wed, Nov 6, 2019 at 12:57 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>>>
>>> Calculate both the position of the first zero bit and the last zero bit to
>>> limit the range which needs to be copied. This does not solve the problem
>>> when the previous tasked had only byte 0 cleared and the next one has only
>>> byte 65535 cleared, but trying to solve that would be too complex and
>>> heavyweight for the context switch path. As the ioperm() usage is very rare
>>> the case which is optimized is the single task/process which uses ioperm().
>>
>> Hmm.
>>
>> I may read this patch wrong, but from what I can tell, if we really
>> have just one process with an io bitmap, we're doing unnecessary
>> copies.
>>
>> If we really have just one process that has an iobitmap, I think we
>> could just keep the bitmap of that process entirely unchanged. Then,
>> when we switch away from it, we set the io_bitmap_base to an invalid
>> base outside the TSS segment, and when we switch back, we set it back
>> to the valid one. No actual bitmap copies at all.
>>
>> So I think that rather than the "begin/end offset" games, we should
>> perhaps have a "what was the last process that used the IO bitmap for
>> this TSS" pointer (and, I think, some sequence counter, so that when
>> the process updates its bitmap, it invalidates that case)?
>>
>> Of course, you can do *nboth*, but if we really think that the common
>> case is "one special process", then I think the begin/end offset is
>> useless, but a "last bitmap process" would be very useful.
>>
>> Am I missing something?
>
> In fact on SMP systems this would result in a very nice optimization:
> pretty quickly *all* TSS's would be populated with that single task's
> bitmap, and it would persist even across migrations from CPU to CPU.
>
> I'd love to get rid of the offset caching and bit scanning games as well
> - it doesn't really help in a number of common scenarios and it
> complicates this non-trivial piece of code a *LOT* - and we probably
> don't really have the natural testing density of this code anymore to
> find any regressions quickly.

I think we should not over-optimize this. I am all for penalizing
ioperm() and iopl() users as much as is convenient for us. There is
simply no legitimate use case. Sorry, DPDK, but "virtio-legacy sucks,
let's optimize the crap out of something that is slow anyway and use
iopl()" is not a good excuse. Just use the %*!7 syscall to write to
/sys/.../resource0 and suck up the probably negligible performance hit.
And tell your customers to upgrade their hypervisors. And quite
kvetching about performance of the control place on an old
software-emulated NIC while you're at it.

For the TLB case, it's worth tracking who last used which ASID and
whether it's still up to date, since *everyone* uses the MMU. For
ioperm, I don't really believe this is worth it.

>
> So intuitively I'd suggest we gravitate towards the simplest
> implementation, with a good caching optimization for the single-task
> case.

I agree with the first bit, but caching on an SMP system is necessarily
subtle. Some kind of invalidation is needed.

>
> I.e. the model I'm suggesting is that if a task uses ioperm() or iopl()
> then it should have a bitmap from that point on until exit(), even if
> it's all zeroes or all ones. Most applications that are using those
> primitives really need it all the time and are using just a few ioports,
> so all the tracking doesn't help much anyway.
>
> On a related note, another simplification would be that in principle we
> could also use just a single bitmap and emulate iopl() as an ioperm(all)
> or ioperm(none) calls. Yeah, it's not fully ABI compatible for mixed
> ioperm()/iopl() uses, but is that ABI actually being relied on in
> practice?
>

Let's please keep the ABI. Or rather, let's attempt to eventually
remove the ABI, but let's not change it in the mean time please.