Re: [PATCH v2] net: fix sk_page_frag() recursion from memory reclaim

From: David Miller
Date: Mon Oct 28 2019 - 19:18:33 EST

From: Tejun Heo <tj@xxxxxxxxxx>
Date: Thu, 24 Oct 2019 13:50:27 -0700

> sk_page_frag() optimizes skb_frag allocations by using per-task
> skb_frag cache when it knows it's the only user. The condition is
> determined by seeing whether the socket allocation mask allows
> blocking - if the allocation may block, it obviously owns the task's
> context and ergo exclusively owns current->task_frag.
> Unfortunately, this misses recursion through memory reclaim path.
> Please take a look at the following backtrace.
> In [0], tcp_send_msg_locked() was using current->page_frag when it
> called sk_wmem_schedule(). It already calculated how many bytes can
> be fit into current->page_frag. Due to memory pressure,
> sk_wmem_schedule() called into memory reclaim path which called into
> xfs and then IO issue path. Because the filesystem in question is
> backed by nbd, the control goes back into the tcp layer - back into
> tcp_sendmsg_locked().
> nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes
> sense - it's in the process of freeing memory and wants to be able to,
> e.g., drop clean pages to make forward progress. However, this
> confused sk_page_frag() called from [2]. Because it only tests
> whether the allocation allows blocking which it does, it now thinks
> current->page_frag can be used again although it already was being
> used in [0].
> After [2] used current->page_frag, the offset would be increased by
> the used amount. When the control returns to [0],
> current->page_frag's offset is increased and the previously calculated
> number of bytes now may overrun the end of allocated memory leading to
> silent memory corruptions.
> Fix it by adding gfpflags_normal_context() which tests sleepable &&
> !reclaim and use it to determine whether to use current->task_frag.
> v2: Eric didn't like gfp flags being tested twice. Introduce a new
> helper gfpflags_normal_context() and combine the two tests.
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>

Applied and queued up for -stable, thanks Tejun.