Re: [PATCH bpf-next v4 0/3] bpf, arm64: use BPF prog pack allocator in BPF JIT

From: Alexei Starovoitov
Date: Thu Aug 03 2023 - 12:16:21 EST


On Thu, Aug 3, 2023 at 4:13 AM Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> Hi Alexei,
>
> On Wed, Aug 02, 2023 at 02:02:39PM -0700, Alexei Starovoitov wrote:
> > On Sun, Jul 30, 2023 at 10:22 AM Puranjay Mohan <puranjay12@xxxxxxxxx> wrote:
> > >
> > > Hi Mark,
> > > I am really looking forward to your feedback on this series.
> > >
> > > On Mon, Jul 17, 2023 at 9:50 AM Puranjay Mohan <puranjay12@xxxxxxxxx> wrote:
> > > >
> > > > Hi Mark,
> > > >
> > > > On Mon, Jul 3, 2023 at 7:15 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > > > >
> > > > > On Mon, Jul 03, 2023 at 06:40:21PM +0200, Daniel Borkmann wrote:
> > > > > > Hi Mark,
> > > > >
> > > > > Hi Daniel,
> > > > >
> > > > > > On 6/26/23 10:58 AM, Puranjay Mohan wrote:
> > > > > > > BPF programs currently consume a page each on ARM64. For systems with many BPF
> > > > > > > > programs, this adds significant pressure on the instruction TLB. High iTLB pressure
> > > > > > > > usually causes a slowdown for the whole system.
> > > > > > >
> > > > > > > Song Liu introduced the BPF prog pack allocator[1] to mitigate the above issue.
> > > > > > > It packs multiple BPF programs into a single huge page. It is currently only
> > > > > > > enabled for the x86_64 BPF JIT.
> > > > > > >
> > > > > > > This patch series enables the BPF prog pack allocator for the ARM64 BPF JIT.
> > > > >
> > > > > > If you get a chance to take another look at the v4 changes from Puranjay and
> > > > > > in case they look good to you reply with an Ack, that would be great.
> > > > >
> > > > > Sure -- this is on my queue of things to look at; it might just take me a few
> > > > > days to get the time to give this a proper look.
> > > > >
> > > > > Thanks,
> > > > > Mark.
> > > >
> > > > I am eagerly looking forward to your feedback on this series.
> >
> > Mark, Catalin, Florent, KP,
> >
> > This patch set was submitted on June 26!
>
> I appreciate this was sent a while ago, but I have been stuck on some urgent
> bug-fixing for the last few weeks, and my review bandwidth is therefore very
> limited.
>
> Given Puranjay had previously told me he was doing this as a side project for
> fun, and given no-one had told me this was urgent, I assumed that this wasn't a
> major blocker and could wait.
>
> I should have sent a holding reply to that effect; sorry.
>
> The series addresses my original concern. However, in looking at it I think
> there may be a wider potential issue w.r.t. the way instruction memory gets
> reused, because as written today the architecture doesn't seem to have a
> guarantee on when instruction fetches are completed and therefore when it's
> safe to modify instruction memory. Usually we're saved by TLB maintenance,
> which this series avoids by design.
>
> I unfortunately haven't had the time to dig into that, poke our architects,
> etc.
>
> So how urgent is this?

The performance wins are substantial.
We'd like to realize them sooner rather than later.