From: David Daney <david.daney@xxxxxxxxxx>
Most broadcast TLB invalidations are unnecessary. So when
invalidating for a given mm/vma, target only the needed CPUs via
an IPI.
For global TLB invalidations, also use IPI.
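The targeting idea can be illustrated with a small userspace sketch. This is not the code from the patch: the bitmask, the counter array, and the function names flush_tlb_mm_targeted(), flush_tlb_all_ipi() and ipi_flush_local_tlb() are hypothetical stand-ins chosen for illustration, and a 48-CPU system is assumed only because the benchmark uses -j48.

```c
#include <stdint.h>

#define NR_CPUS 48	/* assumption: matches the -j48 benchmark machine */

/* Hypothetical per-CPU counter standing in for a local TLB flush. */
static unsigned int local_flush_count[NR_CPUS];

/* Hypothetical stand-in for the IPI handler that runs on the target CPU. */
static void ipi_flush_local_tlb(int cpu)
{
	local_flush_count[cpu]++;
}

/*
 * Flush only the CPUs whose bit is set in mm_cpumask (the CPUs that
 * have run this mm). Returns the number of simulated IPIs sent.
 */
static int flush_tlb_mm_targeted(uint64_t mm_cpumask)
{
	int sent = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (mm_cpumask & (UINT64_C(1) << cpu)) {
			ipi_flush_local_tlb(cpu);
			sent++;
		}
	}
	return sent;
}

/* Global invalidation: same mechanism, with every CPU in the mask. */
static int flush_tlb_all_ipi(void)
{
	return flush_tlb_mm_targeted((UINT64_C(1) << NR_CPUS) - 1);
}
```

In this sketch, an mm that has only run on CPUs 0 and 2 (mask 0x5) costs two IPIs and leaves the other 46 CPUs undisturbed, whereas a broadcast invalidation would reach all of them.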
Tested on Cavium ThunderX.
This change reduces the 'time make -j48' kernel build from 139s to
116s (83% as long).
The patch is needed because of a ThunderX Pass1 erratum: exclusive
store operations are unreliable in the presence of broadcast TLB
invalidations. The performance improvements shown make it compelling
even without the need for the erratum workaround.
Signed-off-by: David Daney <david.daney@xxxxxxxxxx>