Re: tty: memory leak in tty_register_driver

From: Catalin Marinas
Date: Mon Feb 29 2016 - 06:34:59 EST

On Mon, Feb 29, 2016 at 11:22:58AM +0100, Dmitry Vyukov wrote:
> On Mon, Feb 29, 2016 at 12:47 AM, Catalin Marinas
> <catalin.marinas@xxxxxxx> wrote:
> > On Sun, Feb 28, 2016 at 05:42:24PM +0100, Dmitry Vyukov wrote:
> >> On Mon, Feb 15, 2016 at 11:42 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >> > When I am running the following program in a parallel loop, kmemleak
> >> > starts reporting memory leaks of objects allocated in
> >> > tty_register_driver during boot. These leaks start popping up
> >> > chaotically and as you can see they originate in different drivers
> >> > (synclinkmp_init, isdn_init, chr_dev_init, sysfs_init).
> >> >
> >> > On commit 388f7b1d6e8ca06762e2454d28d6c3c55ad0fe95 (4.5-rc3).
> > [...]
> >> > unreferenced object 0xffff88006708dc20 (size 8):
> >> > comm "swapper/0", pid 1, jiffies 4294672590 (age 930.839s)
> >> > hex dump (first 8 bytes):
> >> > 74 74 79 53 4c 4d 38 00 ttySLM8.
> >> > backtrace:
> >> > [<ffffffff81765d10>] __kmalloc_track_caller+0x1b0/0x320 mm/slub.c:4068
> >> > [<ffffffff816b37a9>] kstrdup+0x39/0x70 mm/util.c:53
> >> > [<ffffffff816b3826>] kstrdup_const+0x46/0x60 mm/util.c:74
> >> > [<ffffffff8194e5bb>] __kernfs_new_node+0x2b/0x2b0 fs/kernfs/dir.c:536
> >> > [<ffffffff81951c70>] kernfs_new_node+0x80/0xe0 fs/kernfs/dir.c:572
> >> > [<ffffffff81957223>] kernfs_create_link+0x33/0x150 fs/kernfs/symlink.c:32
> >> > [<ffffffff81959c4b>] sysfs_do_create_link_sd.isra.2+0x8b/0x120
> > [...]
> >> +Catalin (kmemleak maintainer)
> >>
> >> I am noticed a weird thing. I am not 100% sure but it seems that the
> >> leaks are reported iff I run leak checking concurrently with the
> >> programs running. And if I run the program several thousand times and
> >> then run leak checking, then no leaks reported.
> >>
> >> Catalin, it is possible that it is a kmemleak false positive?
> >
> > Yes, it's possible. If you run kmemleak scanning continuously (or at
> > very short intervals) and especially in parallel with some intensive
> > tasks, it will miss pointers that may be stored in registers (on other
> > CPUs) or moved between task stacks, other memory locations. Linked lists
> > are especially prone to such false positives.
> >
> > Kmemleak tries to work around this by checksumming each object, so it
> > will only be reported if it hasn't changed on two consecutive scans.
> > Since the default scanning is 10min, it is very unlikely to trigger
> > false positives in such scenarios. However, if you reduce the scanning
> > time (or trigger it manually in a loop), you can hit this condition.
> >
> >> I see that kmemleak just scans thread stacks one-by-one. I would
> >> expect that kmemleak should stop all threads, then scan all stacks and
> >> all registers of all threads, and then restart threads. If it does not
> >> scan registers or does not stop threads, then I think it should be
> >> possible that a pointer value can sneak off kmemleak. Does it make
> >> sense?
> >
> > Given how long it takes to scan the memory, stopping the threads is not
> > really feasible. You could do something like stop_machine() only for
> > scanning the current stack on all CPUs but it still wouldn't catch
> > pointers being moved around in memory unless you stop the system
> > completely for a full scan. The heuristic about periodic scanning and
> > checksumming seems to work fine in normal usage scenarios.
> >
> > For your tests, I would recommend that you run the tests for a long(ish)
> > time and only do two kmemleak scans at the end after they finished (and
> > with a few seconds delay between them). Continuous scanning is less
> > reliable.
> Let me describe my usage scenario first. I am running automatic
> testing 24x7. Currently a VM executes a dozen of small programs (a
> dozen of syscalls each), then I run manual leak scanning.

IIUC, you said that the leak reporting happens iff you run leak checking
concurrently with the test programs running. If you run the kmemleak
scanning afterwards, there are no leaks reported. As I explained, that's
the normal usage I would expect.

> I can't run significantly more programs between scans, because then I
> won't be able to restore reproducers for bugs and they will be
> unactionable. I could run leak checking after each program, but it
> will increase overhead significantly. So a dozen of programs is a
> trade-off. And I disable automatic scanning.

That's fine, not an issue here.

> False positives are super unpleasant in automatic testing. If a tool
> false positive rate if high, I just disable it, it is unusable. It is
> not that bad for leak checking. But each false positive consumes human
> (my) time.

Indeed. I've spent a significant amount of time in the early kmemleak
days just trying to prove whether it's a real leak or not but these days
with the automatic scanning, false positives seem to be very low ratio.

> So I need to run scanning twice, because the first one never reports leaks.

That's because of the checksumming. For example, you have objects stored
in a list. On some delete or insert, the list_head is temporarily
modified, possibly stored in CPU registers on another CPU. For this
brief time, kmemleak may no longer detect a pointer to the rest of the
list, hence reporting a big part of it as leaked objects.

Since list deletion/insertion requires a modification of an object
list_head, it's checksum changes, hence if this value has changed since
the previous scan, kmemleak will not report the object, assuming it is
something transient. If you run two scans in quick succession, it
possible that the transient condition hasn't cleared yet, so you risk a
false positive. Of course, I could place some random delay in kmemleak
between successive scans but I assumed that most people would leave it
running on the default 10min scan.

I had other heuristics like object age but checksumming proved to be the
most efficient, with the minor drawback that you'd have to run the
scanning twice before reporting. And any kind of delayed reporting (e.g.
X secs since the last checksum modification) would most likely break
your testing workflow.

> For the false positives due to registers/pointer jumping, will it help
> if I run scanning one more time if leaks are detected? I mean: run
> scanning twice, if leaks are found sleep for several seconds and run
> scanning third time. Since leaks are usually not detected I can afford
> to sleep more and do one or two additional scans.

This should work.

> The question here: will kmemleak _remove_ an object for leaked
> objects, if it discovered reachable or contents change on subsequent
> scans?

Yes, it will remove them from /sys/kernel/debug/kmemleak even if they
were previously reported.

> Regarding stopping all threads and doing proper scan, why is not it
> feasible? Will kernel break if we stall all CPUs for seconds? In
> automatic testing scenarios a stalled for several seconds machine is
> not a problem. But on the other hand, absence of false positives is a
> must. And it would improve testing bandwidth, because we don't need
> sleep and second scan.

Scanning time is the main issue with it taking minutes on some slow ARM
machines (my primary testing target). Such timing was significantly
improved with commit 93ada579b0ee ("mm: kmemleak: optimise kmemleak_lock
acquiring during kmemleak_scan") but even if it is few seconds, it is
not suitable for a live, interactive system.

What we could do though, since you already trigger the scanning
manually, is to add a "stopscan" command that you echo into
/sys/kernel/debug/kmemleak and performs a stop_machine() during memory
scanning. If you have time, please feel free to give it a try ;).