Re: tty: memory leak in tty_register_driver

From: Dmitry Vyukov
Date: Mon Feb 29 2016 - 05:24:48 EST

On Mon, Feb 29, 2016 at 11:22 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Mon, Feb 29, 2016 at 12:47 AM, Catalin Marinas
> <catalin.marinas@xxxxxxx> wrote:
>> On Sun, Feb 28, 2016 at 05:42:24PM +0100, Dmitry Vyukov wrote:
>>> On Mon, Feb 15, 2016 at 11:42 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>>> > When I am running the following program in a parallel loop, kmemleak
>>> > starts reporting memory leaks of objects allocated in
>>> > tty_register_driver during boot. These leaks start popping up
>>> > chaotically and as you can see they originate in different drivers
>>> > (synclinkmp_init, isdn_init, chr_dev_init, sysfs_init).
>>> >
>>> > On commit 388f7b1d6e8ca06762e2454d28d6c3c55ad0fe95 (4.5-rc3).
>> [...]
>>> > unreferenced object 0xffff88006708dc20 (size 8):
>>> > comm "swapper/0", pid 1, jiffies 4294672590 (age 930.839s)
>>> > hex dump (first 8 bytes):
>>> > 74 74 79 53 4c 4d 38 00 ttySLM8.
>>> > backtrace:
>>> > [<ffffffff81765d10>] __kmalloc_track_caller+0x1b0/0x320 mm/slub.c:4068
>>> > [<ffffffff816b37a9>] kstrdup+0x39/0x70 mm/util.c:53
>>> > [<ffffffff816b3826>] kstrdup_const+0x46/0x60 mm/util.c:74
>>> > [<ffffffff8194e5bb>] __kernfs_new_node+0x2b/0x2b0 fs/kernfs/dir.c:536
>>> > [<ffffffff81951c70>] kernfs_new_node+0x80/0xe0 fs/kernfs/dir.c:572
>>> > [<ffffffff81957223>] kernfs_create_link+0x33/0x150 fs/kernfs/symlink.c:32
>>> > [<ffffffff81959c4b>] sysfs_do_create_link_sd.isra.2+0x8b/0x120
>> [...]
>>> +Catalin (kmemleak maintainer)
>>> I am noticed a weird thing. I am not 100% sure but it seems that the
>>> leaks are reported iff I run leak checking concurrently with the
>>> programs running. And if I run the program several thousand times and
>>> then run leak checking, then no leaks reported.
>>> Catalin, it is possible that it is a kmemleak false positive?
>> Yes, it's possible. If you run kmemleak scanning continuously (or at
>> very short intervals) and especially in parallel with some intensive
>> tasks, it will miss pointers that may be stored in registers (on other
>> CPUs) or moved between task stacks, other memory locations. Linked lists
>> are especially prone to such false positives.
>> Kmemleak tries to work around this by checksumming each object, so it
>> will only be reported if it hasn't changed on two consecutive scans.
>> Since the default scanning is 10min, it is very unlikely to trigger
>> false positives in such scenarios. However, if you reduce the scanning
>> time (or trigger it manually in a loop), you can hit this condition.
>>> I see that kmemleak just scans thread stacks one-by-one. I would
>>> expect that kmemleak should stop all threads, then scan all stacks and
>>> all registers of all threads, and then restart threads. If it does not
>>> scan registers or does not stop threads, then I think it should be
>>> possible that a pointer value can sneak off kmemleak. Does it make
>>> sense?
>> Given how long it takes to scan the memory, stopping the threads is not
>> really feasible. You could do something like stop_machine() only for
>> scanning the current stack on all CPUs but it still wouldn't catch
>> pointers being moved around in memory unless you stop the system
>> completely for a full scan. The heuristic about periodic scanning and
>> checksumming seems to work fine in normal usage scenarios.
>> For your tests, I would recommend that you run the tests for a long(ish)
>> time and only do two kmemleak scans at the end after they finished (and
>> with a few seconds delay between them). Continuous scanning is less
>> reliable.
> Thanks for the explanation, Catalin!
> Let me describe my usage scenario first. I am running automatic
> testing 24x7. Currently a VM executes a dozen of small programs (a
> dozen of syscalls each), then I run manual leak scanning. I can't run
> significantly more programs between scans, because then I won't be
> able to restore reproducers for bugs and they will be unactionable. I
> could run leak checking after each program, but it will increase
> overhead significantly. So a dozen of programs is a trade-off. And I
> disable automatic scanning.
> False positives are super unpleasant in automatic testing. If a tool
> false positive rate if high, I just disable it, it is unusable. It is
> not that bad for leak checking. But each false positive consumes human
> (my) time.
> So I need to run scanning twice, because the first one never reports leaks.
> For the false positives due to registers/pointer jumping, will it help
> if I run scanning one more time if leaks are detected? I mean: run
> scanning twice, if leaks are found sleep for several seconds and run
> scanning third time. Since leaks are usually not detected I can afford
> to sleep more and do one or two additional scans. The question here:
> will kmemleak _remove_ an object for leaked objects, if it discovered
> reachable or contents change on subsequent scans?
> Regarding stopping all threads and doing proper scan, why is not it
> feasible? Will kernel break if we stall all CPUs for seconds? In
> automatic testing scenarios a stalled for several seconds machine is
> not a problem. But on the other hand, absence of false positives is a
> must. And it would improve testing bandwidth, because we don't need
> sleep and second scan.

Paul, regarding this particular leak, let's consider it kmemleak false
positive (until proven otherwise).