Re: [BUG] fs/super: a possible sleep-in-atomic bug in put_super

From: Al Viro
Date: Sat Oct 07 2017 - 17:15:10 EST


On Sat, Oct 07, 2017 at 06:06:51PM +0100, Al Viro wrote:
> On Sat, Oct 07, 2017 at 02:56:40PM +0300, Vladimir Davydov wrote:
> > Hello,
> >
> > On Fri, Oct 06, 2017 at 11:06:04AM +0200, Michal Hocko wrote:
> > > On Fri 06-10-17 16:59:18, Jia-Ju Bai wrote:
> > > > According to fs/super.c, the kernel may sleep under a spinlock.
> > > > The function call path is:
> > > > put_super (acquire the spinlock)
> > > > __put_super
> > > > destroy_super
> > > > list_lru_destroy
> > > > list_lru_unregister
> > > > mutex_lock --> may sleep
> > > > memcg_get_cache_ids
> > > > down_read --> may sleep
> > > >
> > > > This bug is found by my static analysis tool and my code review.
> >
> > This is a false positive: by the time we get to destroy_super(), the lru
> > lists have already been destroyed - see deactivate_locked_super() - so
> > list_lru_destroy() will return right away without attempting to take any
> > locks. That's why there are no lockdep warnings regarding this issue.
> >
> > I think we can move list_lru_destroy() to destroy_super_work() to
> > suppress this warning. Not sure if it's really worth the trouble though.
>
> It's a bit trickier than that (for callers of destroy_super() that run before
> the superblock becomes reachable via shared data structures, list_lru_destroy()
> is not a no-op, but those callers are not under spinlocks).
>
> Locking in mm/list_lru.c looks excessive, but then I'm not well familiar with
> that code.

It *is* excessive.

1) co-allocate struct list_lru and the array of struct list_lru_node
hanging off it. Turn all existing variables and struct members of that
type into pointers. init would allocate and return a pointer; destroy
would free it (and leave it to callers to clear their pointers, of course).
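A userspace sketch of step 1 only; list_lru_alloc()/list_lru_free() and the
field layout are invented for illustration, the point being a single
allocation covering both the lru and its per-node array:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Sketch: co-allocate struct list_lru and its node array via a
 * flexible array member, so init/destroy are one alloc/one free.
 * Not the kernel's actual layout.
 */
struct list_lru_node {
	long nr_items;
};

struct list_lru {
	int nr_nodes;
	struct list_lru_node node[];	/* co-allocated flexible array */
};

static struct list_lru *list_lru_alloc(int nr_nodes)
{
	struct list_lru *lru = calloc(1, sizeof(*lru) +
				      nr_nodes * sizeof(struct list_lru_node));
	if (lru)
		lru->nr_nodes = nr_nodes;
	return lru;
}

static void list_lru_free(struct list_lru *lru)
{
	free(lru);	/* one chunk; caller clears its own pointer */
}
```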

2) store the value of memcg_nr_cache_ids as of the last creation
or resize in list_lru. Pass that through to memcg_destroy_list_lru_node().
Result: no need for memcg_get_cache_ids() in list_lru_destroy().
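Something like this userspace sketch of step 2 (memcg_nr_cache_ids is the
kernel global being mimicked; the field and function names here are invented
for illustration):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Sketch: snapshot memcg_nr_cache_ids at init/resize time so destroy
 * can size its walk without memcg_get_cache_ids() (a sleeping rwsem).
 */
static int memcg_nr_cache_ids = 16;	/* stand-in for the kernel global */

struct list_lru {
	int memcg_nr_ids;	/* memcg_nr_cache_ids as of init/last resize */
	void **memcg_lrus;	/* per-memcg state, memcg_nr_ids entries */
};

static int list_lru_init_memcg(struct list_lru *lru)
{
	lru->memcg_nr_ids = memcg_nr_cache_ids;		/* the snapshot */
	lru->memcg_lrus = calloc(lru->memcg_nr_ids, sizeof(void *));
	return lru->memcg_lrus ? 0 : -1;
}

static void list_lru_destroy_memcg(struct list_lru *lru)
{
	/* iterate over the snapshotted size; no locks needed */
	for (int i = 0; i < lru->memcg_nr_ids; i++)
		free(lru->memcg_lrus[i]);
	free(lru->memcg_lrus);
	lru->memcg_lrus = NULL;
}
```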

3) add a static struct list_lru *currently_resized, protected by
list_lrus_mutex. It is NULL when memcg_update_all_list_lrus() is not
running and points to the list_lru currently being resized when it is.

4) have list_lru_destroy() check (under list_lrus_mutex) whether it's
being asked to kill the currently resized one. If it is, do

	victim->list.prev->next = victim->list.next;
	victim->list.next->prev = victim->list.prev;
	victim->list.prev = NULL;

and bugger off, otherwise act as now. Turn the loop in
memcg_update_all_list_lrus() into

	mutex_lock(&list_lrus_mutex);
	lru = list_lrus.next;
	while (lru != &list_lrus) {
		currently_resized = list_entry(lru, struct list_lru, list);
		mutex_unlock(&list_lrus_mutex);
		ret = memcg_update_list_lru(lru, old_size, new_size);
		mutex_lock(&list_lrus_mutex);
		if (unlikely(!lru->prev)) {
			lru = lru->next;
			/* free currently_resized as list_lru_destroy() would have */
			continue;
		}
		if (ret)
			goto fail;
		lru = lru->next;
	}
	currently_resized = NULL;
	mutex_unlock(&list_lrus_mutex);
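The destroy/resizer handoff in (4) can be exercised single-threaded; in the
sketch below the locking is faked (the comment marks where the mutex would be
dropped), the list is a simplified stand-in, and everything except the
currently_resized/list_lrus/list_lru_destroy/memcg_update_all_list_lrus names
is invented:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Sketch of the step-4 handoff: when list_lru_destroy() hits the entry
 * the resizer is working on, it only unlinks it (->prev == NULL as the
 * marker) and the resize loop frees it afterwards.
 */
struct list_head { struct list_head *next, *prev; };

struct list_lru {
	struct list_head list;		/* must be first for the cast below */
	int resized;
};

static struct list_head list_lrus = { &list_lrus, &list_lrus };
static struct list_lru *currently_resized;

static void lru_list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev; n->next = h;
	h->prev->next = n; h->prev = n;
}

static void list_lru_destroy(struct list_lru *victim)
{
	victim->list.prev->next = victim->list.next;
	victim->list.next->prev = victim->list.prev;
	if (victim == currently_resized) {
		victim->list.prev = NULL;	/* resizer will free it */
		return;
	}
	free(victim);
}

static void memcg_update_all_list_lrus(struct list_lru *doomed)
{
	struct list_head *lru = list_lrus.next;

	while (lru != &list_lrus) {
		currently_resized = (struct list_lru *)lru;
		currently_resized->resized = 1;
		/* lock would be dropped here; simulate a racing destroy */
		if (currently_resized == doomed)
			list_lru_destroy(doomed);
		if (!lru->prev) {		/* destroy got to it meanwhile */
			lru = lru->next;
			free(currently_resized);
			continue;
		}
		lru = lru->next;
	}
	currently_resized = NULL;
}
```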

5) replace list_lrus_mutex with a spinlock.

At that point list_lru_destroy() becomes non-blocking. I think it should work,
but that's from several hours of looking through that code, so I might be
missing something subtle here...