Re: [PATCH] mm: workingset: fix NULL ptr dereference

From: Jan Kara
Date: Tue Apr 10 2018 - 04:51:01 EST


On Tue 10-04-18 11:59:03, Minchan Kim wrote:
> On Mon, Apr 09, 2018 at 07:41:52PM -0700, Matthew Wilcox wrote:
> > On Tue, Apr 10, 2018 at 11:33:39AM +0900, Minchan Kim wrote:
> > > @@ -522,7 +532,7 @@ EXPORT_SYMBOL(radix_tree_preload);
> > > */
> > > int radix_tree_maybe_preload(gfp_t gfp_mask)
> > > {
> > > - if (gfpflags_allow_blocking(gfp_mask))
> > > + if (gfpflags_allow_blocking(gfp_mask) && !(gfp_mask & __GFP_ZERO))
> > > return __radix_tree_preload(gfp_mask, RADIX_TREE_PRELOAD_SIZE);
> > > /* Preloading doesn't help anything with this gfp mask, skip it */
> > > preempt_disable();
> >
> > No, you've completely misunderstood what's going on in this function.
>
> Okay, I hope this version clear current concerns.
>
> From fb37c41b90f7d3ead1798e5cb7baef76709afd94 Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@xxxxxxxxxx>
> Date: Tue, 10 Apr 2018 11:54:57 +0900
> Subject: [PATCH v3] mm: workingset: fix NULL ptr dereference
>
> It assumes shadow entries of radix tree rely on the init state
> that node->private_list allocated newly is list_empty state
> for the working. Currently, it's initailized in SLAB constructor
> which means node of radix tree would be initialized only when
> *slub allocates new page*, not *slub alloctes new object*.
>
> If some FS or subsystem pass gfp_mask to __GFP_ZERO, that means
> newly allocated node can have !list_empty(node->private_list)
> by memset of slab allocator. It ends up calling NULL deference
> at workingset_update_node by failing list_empty check.
>
> This patch fixes it.

The patch looks good. I'd just rephrase the changelog to be more
understandable. Something like:

GFP mask passed to page cache functions (often coming from
mapping->gfp_mask) is used both for allocation of page cache page and for
allocation of radix tree metadata necessary to add the page to the page
cache. When the mask contains __GFP_ZERO (as is the case for some f2fs
metadata mappings), this breaks radix tree code as that code expects
allocated radix tree nodes to be properly initialized by the slab
constructor and not zeroed. In particular node->private_list is failing
list_empty() check and the following list operation in
workingset_update_node() will dereference NULL.

Fix the problem by removing __GFP_ZERO from the mask for radix tree
allocations. Also warn if __GFP_ZERO gets passed to __radix_tree_preload()
to avoid silent breakage in the future for other radix tree users.

With that fixed you can add:

Reviewed-by: Jan Kara <jack@xxxxxxx>

Honza
>
> Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Jan Kara <jack@xxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: Jaegeuk Kim <jaegeuk@xxxxxxxxxx>
> Cc: Chao Yu <yuchao0@xxxxxxxxxx>
> Cc: Christopher Lameter <cl@xxxxxxxxx>
> Cc: linux-fsdevel@xxxxxxxxxxxxxxx
> Cc: stable@xxxxxxxxxxxxxxx
> Reported-by: Chris Fries <cfries@xxxxxxxxxx>
> Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
> ---
> lib/radix-tree.c | 9 +++++++++
> mm/filemap.c | 5 +++--
> 2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index da9e10c827df..7569e637dbaa 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -470,6 +470,15 @@ static __must_check int __radix_tree_preload(gfp_t gfp_mask, unsigned nr)
> struct radix_tree_node *node;
> int ret = -ENOMEM;
>
> + /*
> + * New allocate node must have node->private_list as INIT_LIST_HEAD
> + * state by workingset shadow memory implementation.
> + * If user pass __GFP_ZERO by mistake, slab allocator will clear
> + * node->private_list, which makes a BUG. Rather than going Oops,
> + * just fix and warn about it.
> + */
> + if (WARN_ON(gfp_mask & __GFP_ZERO))
> + gfp_mask &= ~__GFP_ZERO;
> /*
> * Nodes preloaded by one cgroup can be be used by another cgroup, so
> * they should never be accounted to any particular memory cgroup.
> diff --git a/mm/filemap.c b/mm/filemap.c
> index ab77e19ab09c..b6de9d691c8a 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -786,7 +786,7 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
> VM_BUG_ON_PAGE(!PageLocked(new), new);
> VM_BUG_ON_PAGE(new->mapping, new);
>
> - error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
> + error = radix_tree_preload(gfp_mask & ~(__GFP_HIGHMEM | __GFP_ZERO));
> if (!error) {
> struct address_space *mapping = old->mapping;
> void (*freepage)(struct page *);
> @@ -842,7 +842,8 @@ static int __add_to_page_cache_locked(struct page *page,
> return error;
> }
>
> - error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM);
> + error = radix_tree_maybe_preload(gfp_mask &
> + ~(__GFP_HIGHMEM | __GFP_ZERO));
> if (error) {
> if (!huge)
> mem_cgroup_cancel_charge(page, memcg, false);
> --
> 2.17.0.484.g0c8726318c-goog
>
>
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR