Re: [BUG] fs/super: a possible sleep-in-atomic bug in put_super

From: Al Viro
Date: Sat Oct 07 2017 - 20:57:08 EST

On Sat, Oct 07, 2017 at 10:14:44PM +0100, Al Viro wrote:

> 1) coallocate struct list_lru and array of struct list_lru_node
> hanging off it. Turn all existing variables and struct members of that
> type into pointers. init would allocate and return a pointer, destroy
> would free (and leave it for callers to clear their pointers, of course).

Better yet, keep list_lru containing just the pointer to list_lru_node
array. And put that array into the tail of struct list_lru_nodes. That
way normal accesses are kept exactly as-is and we don't need to update
the users of that thing at all.
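
A minimal userspace sketch of that layout, for illustration only. The field names (nr_nodes, nr_items) and the helpers sketch_lru_init(), sketch_lru_destroy() and to_nodes() are hypothetical, and calloc()/free() stand in for the kernel allocators; this is not the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the per-node structure; the real one holds a lock and lists. */
struct list_lru_node {
	long nr_items;
};

/* Hypothetical container: bookkeeping up front, node array in the tail. */
struct list_lru_nodes {
	size_t nr_nodes;
	struct list_lru_node node[];	/* flexible array member */
};

/* Users keep embedding this by value and indexing lru->node[nid] as before. */
struct list_lru {
	struct list_lru_node *node;
};

/* Step back from the node array to the container that holds it. */
static struct list_lru_nodes *to_nodes(struct list_lru *lru)
{
	assert(lru->node != NULL);
	return (struct list_lru_nodes *)((char *)lru->node -
			offsetof(struct list_lru_nodes, node));
}

static int sketch_lru_init(struct list_lru *lru, size_t nr_nodes)
{
	struct list_lru_nodes *p;

	p = calloc(1, sizeof(*p) + nr_nodes * sizeof(p->node[0]));
	if (!p)
		return -1;
	p->nr_nodes = nr_nodes;
	lru->node = p->node;	/* callers only ever see this pointer */
	return 0;
}

static void sketch_lru_destroy(struct list_lru *lru)
{
	if (!lru->node)
		return;
	free(to_nodes(lru));
	lru->node = NULL;
}
```

The point of the arrangement is that lru->node[nid] works unchanged for existing users, while the container can later be handed off or freed as one object.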

> 4) have list_lru_destroy() check (under list_lrus_mutex) whether it's
> being asked to kill the currently resized one. If it is, do
> victim->list.prev->next = victim->list.next;
> victim->list.next->prev = victim->list.prev;
> victim->list.prev = NULL;

Doesn't work, unfortunately - it needs to stay on the list and be marked
in some other way.

> and bugger off, otherwise act as now. Turn the loop in
> memcg_update_all_list_lrus() into
> mutex_lock(&list_lrus_mutex);
> lru = list_lrus.next;
> while (lru != &list_lrus) {
> currently_resized = list_entry(lru, struct list_lru, list);
> mutex_unlock(&list_lrus_mutex);
> ret = memcg_update_list_lru(lru, old_size, new_size);
> mutex_lock(&list_lrus_mutex);
> if (unlikely(!lru->prev)) {
> lru = lru->next;

... because this might very well be pointing to already freed object.

> free currently_resized as list_lru_destroy() would have
> continue;

What's more, we need to be careful about resize vs. drain. Right now it's
on list_lrus_mutex, but if we drop that around actual resize of an individual
list_lru, we'll need something else. Would there be any problem if we
took memcg_cache_ids_sem shared in memcg_offline_kmem()?

The first problem is not fatal - we can e.g. use the sign of the field used
to store the number of ->memcg_lrus elements (i.e. stashed value of
memcg_nr_cache_ids at allocation or last resize) to indicate that actual
freeing is left for resizer...
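
A rough single-threaded sketch of that sign trick, under stated assumptions: struct sketch_lru, sketch_destroy() and sketch_resize_finish() are hypothetical names, locking is not modeled (both functions are assumed to run with list_lrus_mutex held), and the -n - 1 encoding is just one way to keep a zero count distinguishable once negated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_lru {
	int nr_ids;		/* >= 0: live; < 0: doomed, stored as -n - 1 */
	int *memcg_lrus;	/* stand-in for the per-memcg array */
};

/* Set by the resizer before it drops the mutex to do the actual resize. */
static struct sketch_lru *currently_resized;

/*
 * Destroy path. Returns true if the array was freed here, false if the
 * object was only marked and the freeing is left to the resizer.
 */
static bool sketch_destroy(struct sketch_lru *lru)
{
	if (lru == currently_resized) {
		/* Stays on the list; the sign flip is the only mark. */
		lru->nr_ids = -lru->nr_ids - 1;
		return false;
	}
	free(lru->memcg_lrus);
	lru->memcg_lrus = NULL;
	return true;
}

/*
 * Resizer, after retaking the mutex. Returns true if it found the mark
 * and did the deferred freeing on behalf of the destroy path.
 */
static bool sketch_resize_finish(struct sketch_lru *lru)
{
	assert(currently_resized == lru);
	currently_resized = NULL;
	if (lru->nr_ids >= 0)
		return false;
	free(lru->memcg_lrus);
	lru->memcg_lrus = NULL;
	return true;
}
```

Since the doomed object is never unlinked by the destroy path, the resizer can still read its ->next safely before unlinking and freeing it, which is what the earlier unlink-in-destroy variant could not guarantee.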