Re: [PATCH] mm: kmemleak: Fix crashing during kmemleak disabling

From: Catalin Marinas
Date: Thu Jun 04 2015 - 05:36:11 EST


Hi Andrew,

On Thu, Jun 04, 2015 at 12:29:36AM +0100, Andrew Morton wrote:
> On Wed, 3 Jun 2015 16:42:56 +0100 Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
>
> > With the current implementation, if kmemleak is disabled because of an
> > error condition (e.g. fails to allocate metadata), alloc/free calls are
> > no longer tracked. Usually this is not a problem since the kmemleak
> > metadata is being removed via kmemleak_do_cleanup(). However, if the
> > scanning thread is running at the time of disabling, kmemleak would no
> > longer notice a potential vfree() call and the freed/unmapped object may
> > still be accessed, causing a fault.
> >
> > This patch separates the kmemleak_free() enabling/disabling from the
> > overall kmemleak_enabled knob so that we can defer the disabling of
> > object freeing tracking until the scanning thread has completed.
> > kmemleak_free_part() is deliberately ignored by this patch since it is
> > only called during boot, before the scanning thread has started.
>
> I'm having trouble with this. afacit, kmemleak_free() can still be
> called while kmemleak_scan() is running on another CPU.
> kmemleak_free_enabled hasn't been cleared yet so the races remain.

It's not about kmemleak_free() racing with kmemleak_scan(); that race is
by design. kmemleak_free() (and __delete_object()) are meant to run
concurrently with the scanning thread (which can run for minutes at a
time), with the synchronisation done via the lock in each individual
kmemleak_object.
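
For reference, a condensed (non-verbatim) sketch of how the scanning
side takes the per-object lock, paraphrasing scan_object() in
mm/kmemleak.c:

static void scan_object(struct kmemleak_object *object)
{
	unsigned long flags;

	/*
	 * While object->lock is held the memory block cannot be
	 * returned to the allocator: __delete_object() takes the same
	 * lock, so a concurrent kmemleak_free() blocks here until the
	 * scan of this object has finished.
	 */
	spin_lock_irqsave(&object->lock, flags);
	if (!(object->flags & OBJECT_ALLOCATED))
		/* freed while we were waiting for the lock */
		goto out;
	/* ... scan the block word by word for pointer values ... */
out:
	spin_unlock_irqrestore(&object->lock, flags);
}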

> However your statement "if the scanning thread is running at the time
> of disabling" implies that the race is between kmemleak_scan() and
> kmemleak_disable(). Yet the race avoidance code is placed in
> kmemleak_free().

The race is indeed between kmemleak_disable() and kmemleak_scan(). Since
the former may be called from atomic context, it cannot issue a
kthread_stop() and wait for the scanning thread to finish; stopping the
thread is instead deferred to the kmemleak_do_cleanup() workqueue
callback.
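
A condensed, non-verbatim sketch of that path, paraphrasing
mm/kmemleak.c (the cmpxchg guard against double-disabling and the
early-boot checks are omitted for brevity):

#include <linux/workqueue.h>

static int kmemleak_enabled = 1;
static int kmemleak_free_enabled = 1;

/* runs later in process context, where sleeping is allowed */
static void kmemleak_do_cleanup(struct work_struct *work)
{
	stop_scan_thread();		/* kthread_stop(), may sleep */
	kmemleak_free_enabled = 0;	/* stop tracking frees only now */
	__kmemleak_do_cleanup();	/* drop the remaining metadata */
}
static DECLARE_WORK(cleanup_work, kmemleak_do_cleanup);

/* may be called from atomic context: flag the error and defer */
static void kmemleak_disable(void)
{
	kmemleak_enabled = 0;
	schedule_work(&cleanup_work);
}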

> All confused. A more detailed description of the race would help.

I'll try to improve it and re-post.

> Also, the words "kmemleak would no longer notice a potential vfree()
> call" aren't sufficiently specific. kmemleak is a big place - what
> *part* of kmemleak are you referring to here?

What I meant is that without this patch, kmemleak_free() simply returns
early on !kmemleak_enabled, so kmemleak_scan() does not notice that the
objects it is scanning are being freed (__delete_object() is no longer
called). This is worse with vfree() since the object is not just freed
but also unmapped.

There are other ways of fixing this, like adding heavier locking, but I
found that simply allowing kmemleak_free() to get through to
__delete_object() until kmemleak_scan() has stopped is the simplest.
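
In other words, the patch keeps the free hook live under its own flag.
Roughly, as a sketch based on the description above rather than the
exact posted diff:

/* cleared only from kmemleak_do_cleanup(), after the scan thread stops */
static int kmemleak_free_enabled = 1;

void __ref kmemleak_free(const void *ptr)
{
	pr_debug("%s(0x%p)\n", __func__, ptr);

	/* gate on kmemleak_free_enabled rather than kmemleak_enabled */
	if (kmemleak_free_enabled && ptr && !IS_ERR(ptr))
		delete_object_full((unsigned long)ptr);
	else if (kmemleak_early_log)
		log_early(KMEMLEAK_FREE, ptr, 0, 0);
}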

> Finally, I'm concerned that a bare
>
> kmemleak_free_enabled = 0;
>
> lacks sufficient synchronization with respect to the
> kmemleak_free_enabled readers from a locking/reordering point of view.

I thought about this as well and didn't initially see an issue. From an
atomicity perspective, I'm not sure using atomic_t would bring any
benefit (we used to have atomics here until commit 8910ae896c8c
"kmemleak: change some global variables to int").

As for the ordering, we need to ensure the visibility of the
kmemleak_free_enabled = 0 update to other CPUs in two cases:

1. after kmemleak_scan() is stopped. This is done by calling
   kthread_stop() -> put_task_struct() -> atomic_dec_and_test(). The
   latter implies barriers on each side of the operation, so I think
   this case is safe.

2. before __kmemleak_do_cleanup(). The risk here is that a
   delete_object_full() call from __kmemleak_do_cleanup() races with
   the same call from kmemleak_free(). The object_tree_root look-up
   (via find_and_get_object()) is covered by kmemleak_lock. However,
   it looks to me like two delete_object_full() calls on the same
   object can race to __delete_object() and call rb_erase() and
   list_del_rcu() twice on the same object; see the interleaving
   sketch below.
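
To make the second case concrete, here is a hypothetical interleaving
(it assumes both callers complete the look-up before either reaches
the erase):

/*
 *   CPU0: kmemleak_free(ptr)           CPU1: __kmemleak_do_cleanup()
 *     delete_object_full(ptr)            delete_object_full(ptr)
 *       find_and_get_object()              find_and_get_object()
 *         (kmemleak_lock covers              (object still in the tree,
 *          only the look-up)                  so this succeeds too)
 *       __delete_object(object)            __delete_object(object)
 *         rb_erase(...)                      rb_erase(...)     <- twice
 *         list_del_rcu(...)                  list_del_rcu(...) <- twice
 *
 * Nothing makes the look-up and the erase a single atomic step, so
 * both callers can reach __delete_object() on the same object.
 */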

The second case is trickier. A barrier after kmemleak_free_enabled = 0
does not help; we need locking from the object look-up all the way to
removing the object from the object list and tree. Alternatively, I
could take the object->lock for the whole __delete_object() function
and use the OBJECT_ALLOCATED flag to decide whether to call rb_erase()
and list_del_rcu(). Something like the diff below, though untested yet
(I'm off most of the day); I would also need to pass it through lock
proving (lockdep):

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 11d6f8015896..27e2e0b688a9 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -607,20 +607,26 @@ static void __delete_object(struct kmemleak_object *object)
 {
 	unsigned long flags;
 
+	/*
+	 * Locking here ensures that the corresponding memory block cannot be
+	 * freed when it is being scanned. It also avoids a __delete_object()
+	 * race when being called from __kmemleak_do_cleanup().
+	 */
+	spin_lock_irqsave(&object->lock, flags);
+	if (!(object->flags & OBJECT_ALLOCATED))
+		goto out;
+
-	write_lock_irqsave(&kmemleak_lock, flags);
+	/* interrupts are already disabled by the object->lock above */
+	write_lock(&kmemleak_lock);
 	rb_erase(&object->rb_node, &object_tree_root);
 	list_del_rcu(&object->object_list);
-	write_unlock_irqrestore(&kmemleak_lock, flags);
+	write_unlock(&kmemleak_lock);
 
-	WARN_ON(!(object->flags & OBJECT_ALLOCATED));
 	WARN_ON(atomic_read(&object->use_count) < 2);
 
-	/*
-	 * Locking here also ensures that the corresponding memory block
-	 * cannot be freed when it is being scanned.
-	 */
-	spin_lock_irqsave(&object->lock, flags);
 	object->flags &= ~OBJECT_ALLOCATED;
+
+out:
 	spin_unlock_irqrestore(&object->lock, flags);
 	put_object(object);
 }
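
With the object->lock now taken first, interrupts are already disabled
by the time kmemleak_lock is acquired, hence the switch of the inner
lock to plain write_lock()/write_unlock(); nesting two _irqsave calls
on the same flags variable would clobber the saved interrupt state.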

--
Catalin