Re: [RFC 3/8] mm: Avoid using set_page_count() in set_page_recounted()

From: John Hubbard
Date: Wed Oct 27 2021 - 21:20:30 EST


On 10/27/21 11:27, Pasha Tatashin wrote:
On Wed, Oct 27, 2021 at 1:12 AM John Hubbard <jhubbard@xxxxxxxxxx> wrote:

On 10/26/21 11:21, Pasha Tatashin wrote:
It must return the same thing, if it does not we have a bug in our
kernel which may lead to memory corruptions and security holes.

So today we have this:
VM_BUG_ON_PAGE(page_ref_count(page), page); -> check ref_count is 0
< What if something modified here? Hmm..>
set_page_count(page, 1); -> Yet we reset it to 1.

With my proposed change:
VM_BUG_ON_PAGE(page_ref_count(page), page); -> check ref_count is 0
refcnt = page_ref_inc_return(page); -> ref_count better be 1.
VM_BUG_ON_PAGE(refcnt != 1, page); -> Verify that it is 1.


Yes, you are just repeating what the diffs say.

But it's still not good to have this function name doing something completely
different than its name indicates.

I see, I can rename it to: 'set_page_recounted/get_page_recounted' ?


What? No, that's not where I was going at all. The function is already
named set_page_refcounted(), and one of the problems I see is that your
changes turn it into something that most certainly does not
set_page_refounted(). Instead, this patch *increments* the refcount.
That is not the same thing.

And then it uses a .config-sensitive assertion to "prevent" problems.
And by that I mean, the wording throughout this series seems to equate
VM_BUG_ON_PAGE() assertions with real assertions. They are only active,
however, in CONFIG_DEBUG_VM configurations, and provide no protection at
all for normal (most distros) users. That's something that the wording,
comments, and even design should be tweaked to account for.




I understand where this patchset is going, but this intermediate step is
not a good move.

Also, for the overall series, if you want to change from
"set_page_count()" to "inc_and_verify_val_equals_one()", then the way to
do that is *not* to depend solely on VM_BUG*() to verify. Instead,
return something like -EBUSY if incrementing the value results in a
surprise, and let the caller decide how to handle it.

Actually, -EBUSY would be OK if the problems were because we failed to
modify refcount for some reason, but if we modified refcount and got
an unexpected value (i.e underflow/overflow) we better report it right
away instead of waiting for memory corruption to happen.


Having the caller do the BUG() or VM_BUG*() is not a significant delay.

We cannot guarantee that new callers in the future will check return
values, the idea behind this work is to ensure that we are always
protected from refcount underflow/overflow and invalid refcount
modifications by set_refcount.


I don't have a problem with putting assertions closest to where they should
fire. That's a good thing. I'm looking here for ways to fix up the problems
listed in the points above, though.

And I do want to point out another thing, though, and that is: generally, we
don't have to program to quite the level of defensiveness you seem to be at.
If return values must be checked, they usually are in the kernel--and we even
have tooling to enforce it:

/*
* gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-warn_005funused_005fresult-function-attribute
* clang: https://clang.llvm.org/docs/AttributeReference.html#nodiscard-warn-unused-result
*/
#define __must_check __attribute__((__warn_unused_result__))


Please take that into consideration when weighing tradeoffs, just sort of in
general.



thanks,
--
John Hubbard
NVIDIA