On Tue, Jul 02, 2024 at 03:21:41PM -0700, Yang Shi wrote:
> On 7/1/24 12:43 PM, Catalin Marinas wrote:
> > I don't follow OpenJDK development but I heard that updates are dragging
> > quite a lot. I can't tell whether people have picked up the
> > atomic_add(0) feature and whether, by the time a kernel patch would make
> > it into distros, they'd also move to the MADV_POPULATE_WRITE pattern.
>
> As Christopher said there may be similar use of atomic in other
> applications, so I don't worry too much about dead code problem IMHO.
> OpenJDK is just the usecase that we know. There may be unknown unknowns. And
> the distros typically backport patches from mainline kernel to their kernel
> so there should be combos like old kernel + backported patch + old OpenJDK.

That's a somewhat valid argument I heard internally as well. People tend
to change or patch kernel versions more often than OpenJDK versions
because of the risk of breaking their Java stack. But, arguably, one can
backport the madvise() OpenJDK change since it seems to have other
benefits on x86 as well.

> AFAICT, the users do expect similar behavior as x86 (one fault instead of
> two faults). Actually we noticed this problem due to a customer report.

It's not a correctness problem, only a performance one. Big part of that
could be mitigated by some adjustment to how THP pages are allocated on
a write fault (though we'd still have an unnecessary read fault and some
TLBI). See Ryan's sub-thread.

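For anyone wanting to see the one-fault vs two-fault difference from user
space, a rough sketch follows. It counts minor faults (ru_minflt from
getrusage()) around an atomic add of 0 on each page of a fresh anonymous
mapping. The helper names are made up for illustration, the counts depend
on THP settings and the kernel in use, and this is not the benchmark
referenced elsewhere in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: minor faults taken so far by this process, -1 on error. */
static long minor_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/*
 * Hypothetical helper: map npages of fresh anonymous memory, touch each
 * page with an atomic add of 0, and return how many minor faults that
 * took. Returns -1 on error.
 */
static long faults_for_atomic_touch(size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    long before, after;
    char *buf;

    assert(npages > 0);

    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    before = minor_faults();
    for (size_t off = 0; off < len; off += page)
        __atomic_fetch_add((int *)(buf + off), 0, __ATOMIC_RELAXED);
    after = minor_faults();

    munmap(buf, len);
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

Dividing the result by npages gives an approximate faults-per-page figure
that can be compared between x86 and arm64 machines.
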
> > There's a point (c) as well on the overhead of reading the faulting
> > instruction. I hope that's negligible but I haven't measured it.
>
> I think I showed benchmark data requested by Anshuman in the earlier email
> discussion.

Do you mean this:

https://lore.kernel.org/r/328c4c86-96c8-4896-8b6d-94f2facdac9a@xxxxxxxxxxxxxxxxxxxxxx
I haven't figured out what the +24% case is in there, it seems pretty
large.
What you haven't benchmarked (I think) is the case where the instruction
is in an exec-only mapping. The subsequent instruction read will fault
and it adds to the overhead. Currently exec-only mappings are not
widespread but I heard some people planning to move in this direction as
a default build configuration.
It could be worked around with a new flavour of get_user() that uses the
non-T LDR instruction and the user mapping is readable by the kernel
(that's the case with EPAN, prior to PIE and I think we can change this
for PIE configurations as well). But it adds to the complexity of this
patch when the kernel already offers a MADV_POPULATE_WRITE solution.
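
To make the user-space alternative concrete, here is a minimal sketch of
a pre-touch routine that tries MADV_POPULATE_WRITE first and only falls
back to the atomic add of 0 when the kernel rejects the advice. This is
an illustration under assumptions, not OpenJDK's actual code: pretouch()
and pretouch_atomic() are made-up names, and treating EINVAL as "advice
not supported" is a simplification since EINVAL can also indicate bad
arguments.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23 /* uapi value since Linux 5.14; older libc headers lack it */
#endif

/* Hypothetical helper: touch every page with an atomic add of 0 (contents preserved). */
static void pretouch_atomic(void *addr, size_t len, size_t page)
{
    for (size_t off = 0; off < len; off += page)
        __atomic_fetch_add((int *)((char *)addr + off), 0, __ATOMIC_RELAXED);
}

/*
 * Hypothetical helper: pre-fault [addr, addr + len) as writable.
 * Returns 1 if madvise() populated the range, 0 if the atomic fallback
 * was used, and -1 on any other madvise() error (errno is left set).
 */
static int pretouch(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    /* madvise() needs a page-aligned start address. */
    assert(page != 0 && (uintptr_t)addr % page == 0);

    if (madvise(addr, len, MADV_POPULATE_WRITE) == 0)
        return 1;
    if (errno != EINVAL)
        return -1;

    /* Assumed here: EINVAL means the kernel predates MADV_POPULATE_WRITE. */
    pretouch_atomic(addr, len, page);
    return 0;
}
```

With something like this, the two-fault behaviour on arm64 would only be
hit on kernels that lack MADV_POPULATE_WRITE.
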