Re: net/core: BUG in copy_net_ns()

From: Dmitry Vyukov
Date: Tue Jan 15 2019 - 05:36:22 EST


On Mon, Jan 14, 2019 at 7:30 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>
> zzoru <zzoru007@xxxxxxxxx> writes:
>
> > I think that it is exactly the same as:
> > https://groups.google.com/forum/#!searchin/linux.kernel/cleanup_net$20is$20slow%7Csort:date/linux.kernel/IMJ9OzonDSI/QH86oy1PAQAJ
> > A patch was already made, but maybe he forgot to push it.
>
> That patch was made to address speed, and lifetime of network stack
> objects. At best it will make things go faster (a good thing), and
> reduce the memory consumption during a test (another good thing).
> The patch you point to will not correct your memory corruption.
>
> So right now the best hypothesis seems to be Dmitry's idea that
> there is stack overflow causing corruption. You have a lot of stack
> debugging already enabled but I don't see CONFIG_VMAP_STACK enabled
> which might catch something ordinary stack overflow checking won't.
>
> Any chance you can enable CONFIG_VMAP_STACK and see if it is stack
> overflow?
>
> With a little luck you will catch the stack overflow in the act and we
> can see the problematic code path.

Most likely the stack overflow should be detectable with
CONFIG_VMAP_STACK. But CONFIG_VMAP_STACK is incompatible with KASAN:
https://bugzilla.kernel.org/show_bug.cgi?id=202009

I reproduced the other stack overflow without KASAN and without
CONFIG_VMAP_STACK, and it was detected as "corrupted stack end
detected inside scheduler". We can try the same here. But running
without KASAN and with CONFIG_VMAP_STACK should be more reliable.
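
For reference, the two setups above would roughly correspond to the
.config fragments below. This is an untested sketch, not a complete
config; CONFIG_SCHED_STACK_END_CHECK is my assumption for the option
behind the "corrupted stack end" report, and CONFIG_VMAP_STACK is only
available on architectures that support it.

    # Setup 1: no KASAN, no VMAP_STACK, stack end check in the scheduler
    # CONFIG_KASAN is not set
    # CONFIG_VMAP_STACK is not set
    CONFIG_SCHED_STACK_END_CHECK=y

    # Setup 2: no KASAN, VMAP_STACK enabled
    # CONFIG_KASAN is not set
    CONFIG_VMAP_STACK=y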

But the way I read it, if we see wb_workfn in the stacks, kernel memory
is corrupted. The overflow on that async stack does not depend on how
exactly the low-memory condition was provoked.