There's nothing about arm64 in there and it looks like the code prefers
MADV_POPULATE_WRITE if THPs are enabled (which is the case in all
enterprise distros). I can't tell whether the change was made to work
around the arm64 behaviour, there's no commit log (it was contributed by
Ampere).
There's a separate thread with the mm folk on the THP behaviour for
pmd_none() vs pmd mapping the zero huge page but it is more portable for
OpenJDK to use madvise() than guess the kernel behaviour and touch small
pages or a single large pages. Even if one claims that atomic_add(0) is
portable across operating systems, the OpenJDK code was already treating
Linux as a special case in the presence of THP.
It would be much simpler to just merge the patch and be done with it.
Otherwise this issue will continue to cause uncountably many hours of
anguish for sysadmins and developers all over the Linux ecosystem trying to
figure out what in the world is going on with ARM.
People will be happy until one enables execute-only ELF text sections in
a distro and all that opcode parsing will add considerable overhead for
many read faults (those with a writeable vma).
I'd also like to understand (probably have to re-read the older threads)
whether the overhead is caused mostly by the double fault or the actual
breaking of a THP. For the latter, the mm folk are willing to change the
behaviour so that pmd_none() and pmd to the zero high page are treated
similarly (i.e. allocate a huge page on write fault). If that's good
enough, I'd rather not merge this patch (or some form of it) and wait
for a proper fix in hardware in the future.
Just to be clear, there are still potential issues to address (or
understand the impact of) in this patch with exec-only mappings and
the performance gain _after_ the THP behaviour changed in the mm code.
We can make a call once we have more data but, TBH, my inclination is
towards 'no' given that OpenJDK already support madvise() and it's not
arm64 specific.