Re: Linux 4.18-rc7

From: Linus Torvalds
Date: Wed Aug 01 2018 - 13:15:20 EST


On Tue, Jul 31, 2018 at 10:43 AM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>
> On Tue, Jul 31, 2018 at 08:03:28PM +0300, Kirill A. Shutemov wrote:
> > But it's not the only issue unfortunately. Tony reported issue with
> > booting ia64 with the patch. I have no clue why. I rechecked everything
> > ia64-specific and looks fine to me. :-/
>
> If I just revert bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives")
> then ia64 boots again.

Ok, I'd love to have more information about this, but I'm assuming
that Tony isn't running some odd ia64 version of Android, so there's
definitely something else than just the ashmem thing going on. Either
it's some odd ia64-specific special vma, or it's just something
triggered on an ia64 boot that nobody else noticed or cared about.

And I was just going to do the final revert and started this email to
say so, when I looked at the obvious candidate: the
ia64_init_addr_space() function. Trivially fixed.

But as I was doing that, I also noticed another problem with the vma
series: the vma_init() conversion of automatic variables is buggy.
Commit 2c4541e24c55 ("mm: use vma_init() to initialize VMAs on stack
and data segments") is really bad, because it never grew the memset()
that was discussed, and the patch that was applied was the original
one - so vma_init() only initializes a couple of fields. As a result,
doing things like this:

- struct vm_area_struct vma = { .vm_mm = mm };
+ struct vm_area_struct vma;

+ vma_init(&vma, mm);

is just completely wrong, because it actually initializes much less
than it used to, and leaves most of the vma filled with random stack
garbage. In particular, it now fills with garbage the fields that TLB
flushing really can care about: things like vm_flags that says "is
this perhaps an executable-only mapping that only needs to flush the
ITLB?"

I don't actually believe that we should do vma_init() on those
on-stack vma's anyway, since they aren't "real" vma's. They are
literally crafted just for TLB flushing - filling in the vm_mm (and
sometimes vm_flags) pointers so that the TLB flushing knows what to
do.

So using "vma_init()" on them is actively detrimental as things stand
right now. The reason I looked at them was that I was trying to see
who actually uses "vm_area_alloc()" and "vma_init()" right now that
would be affected by that commit bfd40eaff5ab ("mm: fix
vma_is_anonymous() false-positives") outside of actual
honest-to-goodness device file mmaps.

Anyway, the upshot of all this is that I think I know what the ia64
problem was, and John sent the patch for the ashmem case, and I'm
going to hold off reverting that vma_is_anonymous() false-positives
commit after all.

I'm still unhappy about the vma_init() ones, and I have not decided
how to go with those. Either the memset() in vma_init(), or just
reverting the (imho unnecessary) commit 2c4541e24c55. Kirill, Andrew,
comments?

Tony, can you please double-check my commit ebad825cdd4e ("ia64: mark
special ia64 memory areas anonymous") fixes things for you? I didn't
even compile it, but it really looks so obvious that I just committed
it directly. It's not pushed out yet (I'm doing the normal full build
test because of the ashmem commit first), but it should be out in
about 20 minutes when my testing has finished.

I'd like to get this sorted out asap, although at this point I still
think that I'll have to do an rc8 even though I feel like we may have
caught everything.

Linus