ARM v8.4 extensions add new neon instructions for performing a
multiplication of each FP16 element of one vector with the corresponding
FP16 element of a second vector, and to add or subtract this without an
intermediate rounding to the corresponding FP32 element in a third vector.
This patch detects this feature and let the userspace know about it via a
HWCAP bit and MRS emulation.
Cc: Dave Martin <Dave.Martin@xxxxxxx>
Cc: Suzuki K Poulose <suzuki.poulose@xxxxxxx>
Signed-off-by: Dongjiu Geng <gengdongjiu@xxxxxxxxxx>
Reviewed-by: Dave Martin <Dave.Martin@xxxxxxx>