Re: [PATCH v4 bpf-next 0/8] bpf_prog_pack followup

From: Aaron Lu
Date: Mon Jun 20 2022 - 23:26:29 EST


On Mon, Jun 20, 2022 at 07:51:24PM -0700, Song Liu wrote:
> On Mon, Jun 20, 2022 at 6:32 PM Aaron Lu <aaron.lu@xxxxxxxxx> wrote:
> >
> > On Mon, Jun 20, 2022 at 09:03:52AM -0700, Song Liu wrote:
> > > Hi Aaron,
> > >
> > > On Mon, Jun 20, 2022 at 4:12 AM Aaron Lu <aaron.lu@xxxxxxxxx> wrote:
> > > >
> > > > Hi Song,
> > > >
> > > > On Fri, May 20, 2022 at 04:57:50PM -0700, Song Liu wrote:
> > > >
> > > > ... ...
> > > >
> > > > > The primary goal of bpf_prog_pack is to reduce iTLB miss rate and reduce
> > > > > direct memory mapping fragmentation. This leads to non-trivial performance
> > > > > improvements.
> > > > >
> > > > > For our web service production benchmark, bpf_prog_pack on 4kB pages
> > > > > gives 0.5% to 0.7% more throughput than not using bpf_prog_pack.
> > > > > bpf_prog_pack on 2MB pages gives 0.6% to 0.9% more throughput than not
> > > > > using bpf_prog_pack. Note that 0.5% is a huge improvement for our fleet.
> > > > > I believe this is also significant for other companies with many
> > > > > thousands of servers.
> > > > >
> > > >
> > > > I'm evaluating the performance impact of direct memory mapping
> > > > fragmentation and, seeing the above, I wonder: is the performance
> > > > improvement mostly due to prog pack and hugepages rather than reduced
> > > > direct mapping fragmentation?
> > > >
> > > > I can understand that when progs are packed together, the iTLB miss rate
> > > > is reduced and thus performance can improve. But I don't immediately see
> > > > how direct mapping fragmentation can impact performance, since the bpf
> > > > code is running from the module alias addresses, not the direct mapping
> > > > addresses, IIUC?
> > >
> > > You are right that BPF code runs from module alias addresses. However, to
> > > protect text from overwrites, we use set_memory_x() and set_memory_ro()
> > > for the BPF code. These two functions will set permissions for all aliases
> > > of the memory, including the direct map, and thus cause fragmentation of
> > > the direct map. Does this make sense?
> >
> > I guess I didn't make myself clear.
> >
> > I understand that set_memory_XXX() will cause the direct mapping to be
> > split and thus fragmented. What is not clear to me is how much impact
> > direct mapping fragmentation has on performance, in your case and in
> > general.
> >
> > In your case, I guess the performance gain comes from code being packed
> > together and iTLB misses being reduced. When there is a lot of code,
> > packing it together into a hugepage is a further gain. Meanwhile, the
> > direct mapping split (or not) seems to be a side effect of this packing,
> > but it doesn't have a direct impact on performance.
> >
> > One thing I can imagine is this: when an area of the direct mapping gets
> > split for permission reasons and that reason later goes away (like module
> > unload or bpf code unload), the area will remain fragmented. Later
> > operations that touch the same area will then use more dTLB entries,
> > which can be bad for performance, but it's hard to say how much impact
> > this can have.
>
> Yes, we have data showing that the direct mapping remaining fragmented
> can cause non-trivial performance degradation. For our web workload,
> the difference is on the order of 1%.

Many thanks for the info, really appreciate it.
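
As a rough illustration of the dTLB concern quoted above, here is a
back-of-the-envelope sketch. It assumes x86-64 page sizes of 4kB and 2MB and
a single 2MB direct-map region; the helper name tlb_entries is hypothetical
and nothing here is measured data.

```python
# Hypothetical helper: count the translations (TLB entries) needed to cover
# a region of the direct map, before and after a 2MB mapping is split into
# 4kB pages. Page sizes are assumed x86-64 values.
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024


def tlb_entries(region_bytes, page_bytes):
    """Return how many page-sized translations cover region_bytes."""
    return -(-region_bytes // page_bytes)  # ceiling division


intact = tlb_entries(PAGE_2M, PAGE_2M)  # region still mapped by one 2MB page
split = tlb_entries(PAGE_2M, PAGE_4K)   # same region after a split to 4kB
print(f"intact: {intact} entry, split: {split} entries")
```

So a 2MB region that stays split needs up to 512 dTLB entries instead of 1
when it is touched in full, which is the kind of extra pressure described
above. How much that costs in practice depends on the workload.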

Regards,
Aaron