Re: Kmap-related crashes and memory leaks on 32bit arch (5.15+)

From: Linus Torvalds
Date: Thu Nov 04 2021 - 19:37:50 EST


On Thu, Nov 4, 2021 at 3:09 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> I'm obviously not on my laptop right now, but I did look at the btrfs
> lzo code earlier today again, and didn't see anything that looked
> remotely suspicious.

Ok, back at the laptop, and once again looking at this.

I'm stumped.

So if I understand correctly,

5.15 + 2cf3f8133bda ("btrfs: fix lzo_decompress_bio() kmap leakage")

is fine.

Also, 5.15 with the folio merge, plus the fix for that (commit
e66435936756: "mm: fix mismerge of folio page flag manipulators") is
fine too. I

The tested "tip of the day" that was bad was dcd68326d29b ("Merge tag
'devicetree-for-5.16' of git://...").

Since you already tested the folio merge, there really isn't a whole
lot of mm changes left in there. Andrew hasn't sent his patch-bombs
yet.

Doing a

gitk 49f8275c7d9247cf..037c50bfbeb33b \
mm/ include/linux/highmem* fs/btrfs/

really doesn't show anything that looks particularly suspicious.
There's some sync_bdev() changes, and some polling changes, but they
look trivial.

The only half-way relevant thing that remains is my merge, which very
much had conflicts around kmap/kunmap because of the revert problems.

So I continue to think that I must have screwed up, but I still don't
see which kmap/kunmap would be wrong.

I'll just repeat my suggestion here since the original email didn't go
to the lists.

> (a) test your side before my merge with your alternate kmap fix
> commit (the one you had in the other branch to make it allegedly
> easier to resolve)?
>
> (b) if that works, re-do the merge using that kmap pattern?

Your kmap() pattern is slightly different from mine. I tried to avoid
an unnecessary "kmap again" in copy_compressed_data_to_page(), so my
kmap lifetime is slightly longer over that loop.

Having looked at it once more, it still looks "ObviouslyCorrect(tm)"
to me, but I suspect I'm just being blind to some obvious bug.

> If (a) works, but (b) still fails, then it must be some odd
> interaction issue with something else. Which sounds unlikely, since I
> don't think we really had anything that should affect kmap or anything
> in this area, but who knows...

And bisection ends up perhaps somewhat painful, but sounds like the
way to go if there's no other path forward.

Linus