Re: [dpdk-dev] Please stop using iopl() in DPDK
From: Andy Lutomirski
Date: Fri Oct 25 2019 - 20:28:09 EST
> On Oct 25, 2019, at 9:13 AM, Stephen Hemminger <stephen@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, 24 Oct 2019 21:45:56 -0700
> Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
>> Hi all-
>>
>> Supporting iopl() in the Linux kernel is becoming a maintainability
>> problem. As far as I know, DPDK is the only major modern user of
>> iopl().
>>
>> After doing some research, DPDK uses direct io port access for only a
>> single purpose: accessing legacy virtio configuration structures.
>> These structures are mapped in IO space in BAR 0 on legacy virtio
>> devices.
>
> Yes. Legacy virtio seems to have been designed without consideration
> of how to use it in userspace. Xen, Vmware and Hyper-V all use memory
> as a doorbell mechanism which is easier to use from userspace.
>
>
>> There are at least three ways you could avoid using iopl(). Here they
>> are in rough order of quality in my opinion:
>>
>> 1. Change pci_uio_ioport_read() and pci_uio_ioport_write() to use
>> read() and write() on resource0 in sysfs.
>
> The cost of entering the kernel for a doorbell mechanism is too
> expensive and would kill performance.
>
>
>> 2. Use the alternative access mechanism in the virtio legacy spec:
>> there is a way to access all of these structures via configuration
>> space.
>
> There is no way to use memory doorbell on older versions of virtio.
> Users want to run DPDK on old stuff like RHEL6 and even older
> kernel forks. There are even use cases where virtio is used for
> a non-Linux host; such as GCP.
>
>
>> 3. Use ioperm() instead of iopl().
>
> Ioperm has the wrong thread semantics. All DPDK applications have
> multiple threads and the initialization logic needs to work even
> if the thread is started later; threads can also be started by
> the user application.
>
> Iopl applies to whole process so this is not an issue.

This is not true. ioperm() and iopl() have identical thread semantics.
I think what you're seeing is that you can set iopl(3) early without
knowing which port range to request. You could alternatively set
ioperm() early and ask for a very wide range. In principle, we could
make ioperm() be per thread, but I'm not sure we should add that kind
of complexity to support a mostly obsolete use case like this.
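
For illustration, the wide ioperm() request could look roughly like the
following. This is only a sketch, not DPDK code: request_all_ioports()
is a made-up name, and it assumes x86 Linux with glibc's <sys/io.h> and
a caller that holds CAP_SYS_RAWIO.

```c
#include <errno.h>
#include <sys/io.h>

/*
 * Hypothetical init helper (not an existing DPDK function).
 * Asks for access to the whole 64K x86 I/O port space, so the caller
 * does not need to know the device's port range in advance.
 * Returns 0 on success or a negative errno value on failure.
 */
static int request_all_ioports(void)
{
    /* from = 0, num = 65536 ports, turn_on = 1 */
    if (ioperm(0, 65536, 1) != 0)
        return -errno;
    return 0;
}
```

The intent would be to call this once during early init, in the same
place iopl(3) is called today, so that threads created afterwards start
out with the same port permissions.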

There's actually an argument to be made that per-mm ioperm would be
easier to handle in the kernel than per-task due to the vagaries of
KPTI.

All this being said, what are the actual performance implications of
write() to /sys/.../resource0? Off the top of my head, I would guess
that the OUTB or OUTL instruction itself is incredibly slow because it
is trapped and emulated, and that virtio-legacy hypervisors aren't
particularly fast to begin with. As a result, the write() might not
actually matter that much.
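
To make the comparison concrete, here is a minimal sketch of port access
through the sysfs resource file. The helper names are made up for this
mail and are not the existing DPDK functions. It assumes fd is an open
descriptor for the device's resource0 file, that the BAR is an I/O port
BAR, and that accesses are 1, 2 or 4 bytes wide at an offset relative to
the start of the BAR.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one register through resource0.
 * Returns 0 on success, -1 on error or short write (errno from pwrite).
 */
static int sysfs_ioport_write(int fd, off_t offset, const void *data, size_t len)
{
    /* Port accesses are byte, word or dword sized. */
    assert(len == 1 || len == 2 || len == 4);
    ssize_t n = pwrite(fd, data, len, offset);
    return n == (ssize_t)len ? 0 : -1;
}

/*
 * Hypothetical helper: read one register through resource0.
 * Returns 0 on success, -1 on error or short read (errno from pread).
 */
static int sysfs_ioport_read(int fd, off_t offset, void *data, size_t len)
{
    assert(len == 1 || len == 2 || len == 4);
    ssize_t n = pread(fd, data, len, offset);
    return n == (ssize_t)len ? 0 : -1;
}
```

A doorbell would then be a single 2-byte sysfs_ioport_write() of the
queue index to the queue notify register, which sits at offset 16 in the
legacy virtio header. That is one syscall per notification, and it is
that cost that would need to be measured against the trapped OUT
instruction.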
>
>>
>>
>> We are considering changes to the kernel that will potentially harm
>> the performance of any program that uses iopl(3) -- in particular,
>> context switches will become more expensive, and the scheduler might
>> need to explicitly penalize such programs to ensure fairness. Using
>> ioperm() already hurts performance, and the proposed changes to iopl()
>> will make it even worse. Alternatively, the kernel could drop iopl()
>> support entirely. I will certainly make a change to allow
>> distributions to remove iopl() support entirely from their kernels,
>> and I expect that distributions will do this.
>>
>> Please fix DPDK.
>
> Please fix virtio.

Done, with the new version of virtio :)