Re: [PATCH] btrfs: remove BUG_ON used as assertions

From: Josef Bacik
Date: Wed Dec 18 2019 - 14:54:50 EST


On 12/18/19 11:47 AM, David Sterba wrote:
On Wed, Dec 18, 2019 at 11:38:18AM -0500, Josef Bacik wrote:
On 12/15/19 12:12 PM, Aditya Pakki wrote:
alloc_extent_state_atomic() allocates extents via GFP_ATOMIC flag
and cannot fail. There are multiple invocations of BUG_ON on the
return value to check for failure. The patch replaces certain
invocations of BUG_ON by returning the error upstream.

Signed-off-by: Aditya Pakki <pakki001@xxxxxxx>

I already tried this a few months ago and gave up. There are a few things if
you want to tackle something like this

1) use bpf's error injection thing to make sure you handle every path that can
error out. This is the script I wrote to do just that

https://github.com/josefbacik/debug-scripts/blob/master/error-injection-stress.py

2) We actually can't fail here. We would need to go back and make _all_ callers
of lock_extent_bits() handle the allocation error. This is theoretically
possible, but a giant pain in the ass. In general we can make allocations here
and we need to be able to make them.

3) We should probably mark this path with __GFP_NOFAIL because again, this is
locking and we need locking to succeed.

NOFAIL can introduce loops that could lead to deadlocks, if not used
carefully. __set_extent_bit is not just locking, so if one thread wants
to set bits, allocate, wait, allocator goes to write some memory

eg.

set_extent_bit on some range
alloc state (NOFAIL)
allocator wants to flush dome dirty data
------------------------------>
set_extent_bit
alloc state (NOFAIL)
(wait)


Yes obviously I just want it for EXTENT_LOCKED. But we could even just use a mempool to be really safe, since most places are going to use GFP_KERNEL or something else related, we only really need the safety in a few critical areas. Thanks,

Josef