Re: exit_mmap BUG_ON in 2.6.23 (and Add qdisc __NET_XMIT_STOLEN)

From: Hugh Dickins
Date: Sat May 26 2012 - 14:07:20 EST


On Fri, 25 May 2012, Sam Portolla wrote:
>
> commit 378a2f090f7a478704a372a4869b8a9ac206234e
> Date:   Mon Aug 4 22:31:03 2008 -0700
> net_sched: Add qdisc __NET_XMIT_STOLEN flag
...
>
>  I wonder if the lack of above patch in our code base could explain the
> exit_mmap() BUG_ON as well due to memory corruption causing MMU to not
> be able to locate the page(s) it had to free. NR_PTES keeps track of
> that? Could you explain that more?

I concur with Eric in thinking it unlikely - though (unlike Eric)
I know far too little about networking to comment with authority.

I'd guess that literally hundreds of fixes have gone into the kernel
since 2.6.23, each more likely to be the fix to such memory corruption
than this one. And I could also be wrong in attributing your BUG to
memory corruption: perhaps I'm forgetting an mm fix.

You ask me to explain more: mm->nr_ptes keeps track of the number of
page tables that have been allocated for the mm. When we free the mm,
we should be freeing exactly the number of page tables we allocated
earlier; a bug in the code maintaining the vmas or the page tables
might break that, hence the BUG_ON to check for it. But equally, if
there has been memory corruption of vmas or of higher-level page
tables, we may now be unable to locate all the page tables we
allocated earlier, and so hit the BUG_ON for that reason.
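
To make that accounting concrete, here is a rough user-space sketch,
not kernel code. The names toy_mm, toy_pte_alloc, toy_pte_free and
toy_exit_mmap are all made up for illustration, and the comparison
against zero is a simplification of the real check. It only models
the idea: count up on allocation, count down for each page table that
teardown can still find, and look at what is left over.

```c
/* Hypothetical stand-in for the kernel's mm_struct: only the counter. */
struct toy_mm {
    long nr_ptes;   /* page tables currently accounted to this mm */
};

/* Called wherever a page table is allocated for this mm. */
static void toy_pte_alloc(struct toy_mm *mm)
{
    mm->nr_ptes++;
}

/* Called for each page table that teardown manages to find and free. */
static void toy_pte_free(struct toy_mm *mm)
{
    mm->nr_ptes--;
}

/*
 * Teardown: free the page tables that are still reachable through the
 * vmas and higher-level tables, then return what the counter says is
 * left. "reachable" is an assumption of this model: corruption or an
 * accounting bug is represented by it differing from what was allocated.
 */
static long toy_exit_mmap(struct toy_mm *mm, long reachable)
{
    for (long i = 0; i < reachable; i++)
        toy_pte_free(mm);
    return mm->nr_ptes;   /* non-zero is the case the BUG_ON catches */
}
```

With three tables allocated and three reachable, the leftover is zero.
With three allocated and only two reachable, the leftover is one, and
the counter cannot tell you whether that was an accounting bug or
corruption that hid a table from the teardown walk.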

Would I be unfair to characterize this as a problem seen once at a
customer site in the 4.5 years since 2.6.23 was released?

As I said before, please just change that BUG_ON to WARN_ON, and
wait to see if more such issues come up: if they do, then you can
start to look for a pattern.
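
The difference in behaviour can be sketched in user space as below.
This is only a model under my own assumptions: toy_warn_on,
toy_warn_count and toy_teardown_check are invented names, not the
kernel's macros. The point is that a warning records the mismatch and
lets execution carry on, so that repeat occurrences can be collected
and compared.

```c
#include <stdio.h>

/* Number of warnings seen so far (hypothetical bookkeeping). */
static int toy_warn_count;

/*
 * Model of a warn-and-continue check: log if the condition holds,
 * then hand the condition back to the caller instead of stopping.
 */
static int toy_warn_on(int cond, const char *what)
{
    if (cond) {
        toy_warn_count++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/* Teardown-time check: complain about leftover page tables, keep going. */
static long toy_teardown_check(long leftover_ptes)
{
    toy_warn_on(leftover_ptes != 0, "page tables left over at teardown");
    return leftover_ptes;
}
```

A BUG_ON-style check would stop at the first mismatch; the model above
just counts it, which is what you want while waiting to see whether
the problem ever recurs.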

Hugh