Re: [RFC net-next 08/15] ipxlat: add translation engine and dispatch core

From: Toke Høiland-Jørgensen

Date: Mon Jun 29 2026 - 07:13:33 EST

Ralf Lici <ralf@xxxxxxxxxxxxx> writes:

> On Tue, 23 Jun 2026 21:59:44 +0200, Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>> Ralf Lici <ralf@xxxxxxxxxxxxx> writes:
>> > On the BPF point specifically: I agree a BPF program should be able to
>> > decide whether to translate. What I am less sure about is whether
>> > redirecting to a netdevice is the best way to expose that. A TC action
>> > (yet another model, I know :)) gives you the same thing in-pipeline and
>> > more directly:
>> >
>> >   tc filter add dev wwan0 egress \
>> >     bpf obj match.o action ipxlat4to6 domain clat0
>> >
>> > Let BPF make the policy decision, with the native action doing the
>> > translation work that the current BPF CLAT implementations have trouble
>> > with: fragmentation, checksum corner cases, and ICMP error inner
>> > headers (as explained by Beniamino).
>> >
>> > So TC clsact looks like the natural in-kernel replacement for today's
>> > TC-BPF CLAT programs: no extra netdev, you attach to the existing
>> > uplink, direction is explicit, and on egress you sit on the real route
>> > dst, so the synthetic-dst and double-routing problems above just don't
>> > arise. The cost is more moving parts than a single bpf_redirect since
>> > userspace has to manage clsact, filters, priorities and action
>> > lifecycle/cleanup.
>>
>> Hmm, so no one really uses the bpf filter mechanism, since you can just
>> do everything from an action anyway (and with TCX attachment, you can
>> even avoid the overhead of the TC filter/action infrastructure
>> entirely). However, point taken wrt how to integrate this with BPF. I
>> guess the most flexible thing would be to expose the functionality
>> directly (as a kfunc callable from a BPF program). Which also fits with
>> your point below:
>>
>
> Ah, I see, the cls_bpf example was dated, and I like the kfunc angle
> better than a new TC action.
>
> I would probably keep that as the minimal per-packet interface: BPF can
> decide whether a packet should be translated, and the kfunc can do the
> actual translation work for packets whose translated form still fits the
> output MTU. The full 4->6 fragmentation case still looks like
> output-path/harness territory to me, since it is a 1->N fan-out
> operation.

Yeah, that would probably be fine; I would expect that in most cases
you'd want to configure your MTU to avoid fragmentation anyway :)
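
To make the size arithmetic concrete, here is a rough userspace sketch of
the "does the translated form still fit" check. The function and enum
names are hypothetical and not taken from the ipxlat patches; it also
assumes IPv4 options are not carried over into the IPv6 packet. The only
hard facts in it are the header sizes (20-byte minimum IPv4 header,
40-byte IPv6 header, 8-byte IPv6 Fragment header):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fixed header sizes from the IP specifications. */
enum {
	IPV4_HDR_MIN  = 20, /* IPv4 header without options */
	IPV6_HDR      = 40, /* fixed IPv6 header */
	IPV6_FRAG_HDR = 8,  /* IPv6 Fragment extension header */
};

/* Hypothetical verdicts, for illustration only. */
enum xlat_verdict {
	XLAT_FITS,       /* single output packet, per-packet path can handle it */
	XLAT_NEEDS_FRAG, /* 1->N fan-out, left to the output path/harness */
};

/*
 * Length of a packet after 4->6 translation.
 * tot_len: IPv4 total length; ihl: IPv4 header length in bytes.
 * Assumption: the whole IPv4 header (options included) is replaced by the
 * IPv6 header, plus a Fragment header when add_frag_hdr is set.
 */
static size_t xlat46_len(size_t tot_len, size_t ihl, bool add_frag_hdr)
{
	assert(ihl >= IPV4_HDR_MIN && ihl <= tot_len);
	return tot_len - ihl + IPV6_HDR + (add_frag_hdr ? IPV6_FRAG_HDR : 0);
}

/* Does the translated packet fit the output MTU without fragmenting? */
static enum xlat_verdict xlat46_check(size_t tot_len, size_t ihl,
				      bool add_frag_hdr, size_t out_mtu)
{
	return xlat46_len(tot_len, ihl, add_frag_hdr) <= out_mtu ?
		XLAT_FITS : XLAT_NEEDS_FRAG;
}

/*
 * Largest IPv4-side MTU for which translation never exceeds v6_mtu.
 * v6_mtu is assumed to be at least the IPv6 minimum of 1280, so the
 * subtraction cannot underflow.
 */
static size_t xlat46_safe_v4_mtu(size_t v6_mtu, bool add_frag_hdr)
{
	return v6_mtu - (IPV6_HDR - IPV4_HDR_MIN) -
	       (add_frag_hdr ? IPV6_FRAG_HDR : 0);
}
```

So with a 1500-byte IPv6 uplink, xlat46_safe_v4_mtu(1500, false) gives
1480 for the IPv4 side, or 1472 if a Fragment header may have to be
added; a full 1500-byte IPv4 packet would come out as XLAT_NEEDS_FRAG.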

>> > For a gateway translator, though, I still think a device-bound model is
>> > less natural. There the translation point is more like a forwarding
>> > decision across routes and nexthops, so a route/LWT attachment, or
>> > possibly a netfilter attachment seems easier to reason about. Also, as
>> > you already pointed out while discussing LWT, an admin setting up NAT64
>> > is more likely to reach for an nft rule than for a clsact filter on a
>> > specific device.
>> >
>> > Taking a step back, ipxlat is really a generic translation engine plus a
>> > thin harness around it. So rather than pick one attachment, it might be
>> > worth structuring the engine so different harnesses can drive it.
>> > There's interesting precedent for this shape:
>> >
>> > - ILA, again, is the closest sibling: stateless IPv6 address translation
>> >   with a shared core in ila_common.c, driven both by an LWT frontend in
>> >   ila_lwt.c and by an inline netfilter hook with a netlink-configured
>> >   mapping table in ila_xlat.c.
>> >
>> > - act_ct is the precedent for the TC side specifically: a TC action that
>> >   reuses the netfilter conntrack engine rather than reimplementing it.
>> >
>> > And act_nat is the cautionary counter-example: a standalone TC
>> > reimplementation of stateless NAT that shares no code with nf_nat, and
>> > carries a "would be nice to share code" comment :)
>> >
>> > So I am wondering whether the right direction is to factor the
>> > translation engine cleanly, land it with one harness first, and keep the
>> > other attachment points as follow-up work once the core semantics are
>> > settled.
>> >
>> > Does that direction seem reasonable to you?
>>
>> Yes, reusable functionality that can be called from multiple places
>> sounds like a good fit; let's try to structure it that way!
>>
>
> Great, that's the direction I'll take then.
>
>> As for which hook to start with, well, let's see if we hear back from
>> the netfilter devs, but either netfilter or the routing subsystem (LWT
>> style) would be OK for me I think.
>>
>
> Works for me. The engine factoring is common to all of them, so I'll
> start there. Once it's in shape I can sketch a harness against it to
> sanity-check the interface.

Awesome, sounds good!

-Toke