Re: [PATCH net-next] net: Implement fault injection forcing skb reallocation

From: Breno Leitao
Date: Tue Oct 08 2024 - 07:11:54 EST

On Mon, Oct 07, 2024 at 07:00:28PM +0100, Pavel Begunkov wrote:
> On 10/7/24 18:09, Breno Leitao wrote:
> > Hello Pavel,
> >
> > On Mon, Oct 07, 2024 at 05:48:39PM +0100, Pavel Begunkov wrote:
> > > On 10/7/24 17:20, Breno Leitao wrote:
> > > > On Sat, Oct 05, 2024 at 01:38:59PM +0900, Akinobu Mita wrote:
> > > > > On Wed, Oct 2, 2024 at 20:37, Breno Leitao <leitao@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > Introduce a fault injection mechanism to force skb reallocation. The
> > > > > > primary goal is to catch bugs related to pointer invalidation after
> > > > > > potential skb reallocation.
> > > > > >
> > > > > > The fault injection mechanism aims to identify scenarios where callers
> > > > > > retain pointers to various headers in the skb but fail to reload these
> > > > > > pointers after calling a function that may reallocate the data. This
> > > > > > type of bug can lead to memory corruption or crashes if the old,
> > > > > > now-invalid pointers are used.
> > > > > >
> > > > > > By forcing reallocation through fault injection, we can stress-test code
> > > > > > paths and ensure proper pointer management after potential skb
> > > > > > reallocations.
> > > > > >
> > > > > > Add a hook for fault injection in the following functions:
> > > > > >
> > > > > > * pskb_trim_rcsum()
> > > > > > * pskb_may_pull_reason()
> > > > > > * pskb_trim()
> > > > > >
> > > > > > Like the other fault injection mechanisms, protect it under a debug
> > > > > > Kconfig option called CONFIG_FAIL_SKB_FORCE_REALLOC.
> > > > > >
> > > > > > This patch was *heavily* inspired by Jakub's proposal from:
> > > > > > https://lore.kernel.org/all/20240719174140.47a868e6@xxxxxxxxxx/
> > > > > >
> > > > > > CC: Akinobu Mita <akinobu.mita@xxxxxxxxx>
> > > > > > Suggested-by: Jakub Kicinski <kuba@xxxxxxxxxx>
> > > > > > Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
> > > > >
> > > > > This new addition seems sensible. It might be more useful to have a filter
> > > > > that allows you to specify things like protocol family.
> > > >
> > > > I think it might make more sense to make it network-interface specific.
> > > > For instance, only inject faults on interface `ethx`.
> > >
> > > Wasn't there some error injection infrastructure that can optionally
> > > run BPF? That would cover the filtering problem. ALLOW_ERROR_INJECTION,
> > > maybe?
> >
> > Isn't ALLOW_ERROR_INJECTION focused on specifying which functions can
> > be faulted? I.e., you mark a function as eligible for fault injection?
> >
> > In the case I have in mind, I want to specify the interface that should
> > have the error injected. For instance, only inject errors on
> > interface eth1. In this case, I am not sure ALLOW_ERROR_INJECTION will
> > help.
>
> I've never looked into it and might be wrong, but I view
> ALLOW_ERROR_INJECTION'ed functions as a yes/no (err code) switch on
> steroids enabling debug code but not doing actual failing. E.g.

Right. I think there are two things here:

1) Functions that can be made to fail on request. For instance, you can
force functions marked with ALLOW_ERROR_INJECTION to fail under certain
conditions. See the documentation:

/*
 * Whitelist generating macro. Specify functions which can be error-injectable
 * using this macro. (ALLOW_ERROR_INJECTION)
 */

For instance, you can mark an arbitrary function as error-injectable.
That is not what the problem this patch solves requires.
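As a rough userspace model of category 1 (not kernel code), the sketch below shows what a return-value override amounts to. In the kernel this is done through kprobes, e.g. the fail_function facility or the bpf_override_return() helper, on functions marked with ALLOW_ERROR_INJECTION; here a plain flag stands in for that machinery, and the fake_* names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a return-value override: when the override is
 * active, the marked function's body is skipped and the caller only
 * sees the injected error code. */
struct fake_override {
	bool active;
	int errcode;   /* e.g. -ENOMEM */
};

/* Stand-in for an error-injectable function: consume `want` bytes. */
static int fake_pull(int *len, int want)
{
	if (*len < want)
		return -EINVAL;
	*len -= want;
	return 0;
}

/* The call as seen by a caller when the override may be active. */
static int fake_pull_maybe_overridden(const struct fake_override *ov,
				      int *len, int want)
{
	if (ov->active)
		return ov->errcode;   /* body never runs: no side effects */
	return fake_pull(len, want);
}
```

In this model the only observable effect of injection is an error code. There is no way to express "succeed, but move the data to a new buffer", which is the behavior the skb reallocation fault needs.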

2) Helpers that query the fault injection framework to decide whether a
given operation should fail. This is exactly what should_fail_bio()
does: such helpers eventually call should_fail().

In my patch, this is done by the skb_might_realloc() function: it calls
should_fail(), and if the fault injection framework says it is time to
"fail", it does so (in this patch's context, "failing" means forcing the
skb to be reallocated).
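A minimal userspace model of this pattern, assuming that "forcing a reallocation" means copying the payload into a freshly allocated buffer and freeing the old one. The fake_* names are hypothetical stand-ins, not kernel APIs: fake_should_fail() replaces should_fail() with a simple interval/times rule (the real fault_attr also supports probability and other knobs), and fake_skb_might_realloc() only mimics the role that skb_might_realloc() plays in the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct fault_attr: fail on every
 * `interval`-th call, at most `times` times (-1 means unlimited). */
struct fake_fault_attr {
	unsigned long interval;
	long times;
	unsigned long count;
};

/* Models the yes/no decision returned by should_fail(). */
static bool fake_should_fail(struct fake_fault_attr *attr)
{
	if (attr->times == 0)
		return false;
	attr->count++;
	if (attr->interval > 1 && attr->count % attr->interval != 0)
		return false;
	if (attr->times > 0)
		attr->times--;
	return true;
}

/* Minimal packet buffer: `data` points into the `head` allocation. */
struct fake_skb {
	unsigned char *head;
	unsigned char *data;
	size_t len;
};

/* When the fault attribute says "fail", move the buffer to a new
 * allocation. The new buffer is allocated before the old one is freed,
 * so its address always differs, and any pointer a caller cached into
 * the old buffer is now dangling. */
static int fake_skb_might_realloc(struct fake_skb *skb,
				  struct fake_fault_attr *attr)
{
	size_t off;
	unsigned char *nhead;

	if (!fake_should_fail(attr))
		return 0;

	off = (size_t)(skb->data - skb->head);
	nhead = malloc(off + skb->len);
	if (!nhead)
		return -ENOMEM;
	memcpy(nhead, skb->head, off + skb->len);
	free(skb->head);
	skb->head = nhead;
	skb->data = nhead + off;   /* callers must reload from skb->data */
	return 0;
}
```

A caller that saved skb->data before the call and keeps using it afterwards reads freed memory whenever the reallocation is forced; the correct pattern is to reload every header pointer after any call that may reallocate.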

That said, it is unclear to me how ALLOW_ERROR_INJECTION could help with
forcing skb reallocation.
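The interface-specific filtering suggested earlier in the thread fits the second category: the helper that consults should_fail() could first check whether the packet's device matches a configured name. Below is a userspace sketch with hypothetical names (fake_realloc_filter, fake_should_force_realloc); an interval counter again stands in for should_fail(), and a kernel version would take the name from the skb's device and would need to handle skbs that have no device attached.

```c
#include <stdbool.h>
#include <string.h>

#define FAKE_IFNAMSIZ 16   /* mirrors the kernel's IFNAMSIZ */

/* Hypothetical filter state: an empty ifname matches every interface. */
struct fake_realloc_filter {
	char ifname[FAKE_IFNAMSIZ];
	unsigned long interval;  /* force on every interval-th matching packet */
	unsigned long count;     /* matching packets seen so far */
};

/* Decide whether to force a reallocation for a packet on `dev_name`.
 * Non-matching interfaces return early without advancing the counter,
 * so traffic on other devices does not shift the injection schedule. */
static bool fake_should_force_realloc(struct fake_realloc_filter *f,
				      const char *dev_name)
{
	if (f->ifname[0] != '\0' &&
	    strncmp(f->ifname, dev_name, FAKE_IFNAMSIZ) != 0)
		return false;

	f->count++;
	return f->interval <= 1 || f->count % f->interval == 0;
}
```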
