Re: [PATCH v3 3/8] ext4: use __GFP_NOFAIL if allocating extents_status cannot fail

From: Baokun Li
Date: Sun Apr 23 2023 - 23:45:30 EST

On 2023/4/13 18:30, Jan Kara wrote:

On Wed 12-04-23 20:41:21, Baokun Li wrote:

If an extent status tree update fails, what is stored in the extent status
tree becomes inconsistent with what is stored on disk. In some cases that
can even cause data corruption.

For extents that cannot be dropped, use __GFP_NOFAIL to allocate memory.
With that in place, the undo operation in __es_remove_extent(), which may
itself cause inconsistency if splitting the extent fails, is unnecessary,
so remove it as well.

Suggested-by: Jan Kara <jack@xxxxxxx>
Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx>

When I was looking through this patch, I've realized there's a problem with
my plan :-|. See below...

 static struct extent_status *
 ext4_es_alloc_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len,
-		     ext4_fsblk_t pblk)
+		     ext4_fsblk_t pblk, int nofail)
 {
 	struct extent_status *es;
-	es = kmem_cache_alloc(ext4_es_cachep, GFP_ATOMIC);
+	gfp_t gfp_flags = GFP_ATOMIC;
+
+	if (nofail)
+		gfp_flags |= __GFP_NOFAIL;
+
+	es = kmem_cache_alloc(ext4_es_cachep, gfp_flags);
 	if (es == NULL)
 		return NULL;

I remembered that the combination of GFP_ATOMIC and __GFP_NOFAIL is
discouraged because the kernel has no sane way of refilling reserves for
atomic allocations when in atomic context. So this combination can result
in lockups.

Indeed. __GFP_NOFAIL is only applicable to sleepable allocations;
GFP_ATOMIC will ignore it. I didn't notice that.

So what I think we'll have to do is return an error from
__es_insert_extent() and __es_remove_extent(), and in the callers drop the
i_es_lock, allocate the needed status entries (one or two depending on the
desired operation) with GFP_KERNEL | __GFP_NOFAIL, take the lock again, and
pass the preallocated entries into __es_insert_extent() /
__es_remove_extent(). It's a bit ugly but we can at least remove those
__es_shrink() calls which are not pretty either.

Honza
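
The drop-lock, preallocate, retry scheme described above can be sketched
in userspace C roughly as follows. This is only an illustrative model, not
the v4 patch: a pthread mutex stands in for i_es_lock, malloc() stands in
for kmem_cache_alloc(), and the names es_entry, es_tree, alloc_atomic(),
alloc_nofail(), es_insert_locked() and es_insert() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical stand-in for an extent status entry. */
struct es_entry {
	unsigned int lblk;
	unsigned int len;
	struct es_entry *next;
};

/* Hypothetical stand-in for the per-inode extent status tree. */
struct es_tree {
	pthread_mutex_t lock;		/* plays the role of i_es_lock */
	struct es_entry *head;
	int atomic_alloc_fails;		/* knob: simulate failure under the lock */
};

/* Models a non-sleeping allocation made with the lock held; may fail. */
static struct es_entry *alloc_atomic(struct es_tree *t)
{
	return t->atomic_alloc_fails ? NULL : malloc(sizeof(struct es_entry));
}

/* Models GFP_KERNEL | __GFP_NOFAIL: may block, retries until it succeeds. */
static struct es_entry *alloc_nofail(void)
{
	struct es_entry *e;

	while ((e = malloc(sizeof(*e))) == NULL)
		sched_yield();
	return e;
}

/*
 * Must be called with t->lock held. Uses *prealloc if the caller supplied
 * one, otherwise tries the non-sleeping allocator. Returns 0 or -ENOMEM.
 */
static int es_insert_locked(struct es_tree *t, unsigned int lblk,
			    unsigned int len, struct es_entry **prealloc)
{
	struct es_entry *e;

	if (*prealloc) {
		e = *prealloc;
		*prealloc = NULL;	/* ownership moves into the tree */
	} else {
		e = alloc_atomic(t);
	}
	if (!e)
		return -ENOMEM;

	e->lblk = lblk;
	e->len = len;
	e->next = t->head;
	t->head = e;
	return 0;
}

/* Caller side: on -ENOMEM, allocate outside the lock and retry. */
static int es_insert(struct es_tree *t, unsigned int lblk, unsigned int len)
{
	struct es_entry *prealloc = NULL;
	int err;

	for (;;) {
		pthread_mutex_lock(&t->lock);
		err = es_insert_locked(t, lblk, len, &prealloc);
		pthread_mutex_unlock(&t->lock);
		if (err != -ENOMEM)
			break;
		/* Lock is dropped here, so a blocking allocation is safe. */
		assert(prealloc == NULL);
		prealloc = alloc_nofail();
	}
	free(prealloc);			/* NULL once the entry was consumed */
	return err;
}
```

In this model the second pass cannot fail, because the entry is already in
hand before the lock is retaken. A remove operation that may split an
extent would, under the same assumptions, preallocate two entries instead
of one.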

Yes, there's really no better way. Thank you very much for your review!
I've sent v4 of the patch as you suggested.
Thanks again!

--
With Best Regards,
Baokun Li