Re: [PATCH] arm64: Delete the space separator in __emit_inst

From: Fangrui Song
Date: Tue Apr 14 2020 - 11:43:17 EST



On 2020-04-14, Mark Rutland wrote:
Hi Fangrui,

On Sun, Apr 12, 2020 at 08:38:11PM -0700, Fangrui Song wrote:
Many instances of __emit_inst(x) expand to a directive. In a few places
it is used as a macro argument, e.g.

arch/arm64/include/asm/sysreg.h
#define __emit_inst(x) .inst (x)

arch/arm64/include/asm/sysreg.h
#define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift))

arch/arm64/kvm/hyp/entry.S
ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN)

Clang integrated assembler parses `.inst (x)` as two arguments passing
to a macro. We delete the space separator so that `.inst(x)` will be
parsed as one argument.

I'm a little confused by the above; sorry if the below sounds stupid or
pedantic, but I just want to make sure I've understood the problem
correctly.

For the above, ALTERNATIVE() and SET_PSTATE_PAN() are both preprocessor
macros, so I would expect those to be expanded before either the
integrated assembler or an external assembler consumes any of the
assembly (and both would see the same expanded text). Given that, I'm a
bit confused as to why the integrated assembly would have an impact on
preprocessing.

Does compiling the pre-processed source using the integrated assembler
result in the same behaviour? Can we see the expanded text to make that
clear?

... at what stage exactly does this go wrong?

Thanks,
Mark.

Hi Mark,

The C preprocessor expands arch/arm64/kvm/hyp/entry.S
ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN)

to:
alternative_insn nop, .inst (0xd500401f | ((0) << 16 | (4) << 5) | ((!!1) << 8)), 4, 1

`alternative_insn` is an assembler macro, not handled by the C preprocessor.

Both comma and space are separators, with an exception that content
inside a pair of parentheses/quotes is not split, so clang -cc1as or GNU
as x86 splits arguments this way:

nop, .inst, (0xd500401f | ((0) << 16 | (4) << 5) | ((!!1) << 8)), 4, 1

I actually feel that GNU as arm64's behavior is a little bit buggy. It
works just because GNU as has another preprocessing step `do_scrub_chars`
and its arm64 backend deletes the space before '('

alternative_insn nop,.inst(0xd500401f|((0)<<16|(4)<<5)|((!!1)<<8)),4,1

The x86 backend keeps the space before the outmost '('

alternative_insn nop,.inst (0xd500401f|((0)<<16|(4)<<5)|((!!1)<<8)),4,1

By reading its state machine, I think keeping the spaces will be the
most reasonable behavior.

If .inst were only used as arguments,

alternative_insn nop, ".inst (0xd500401f | ((0) << 16 | (4) << 5) | ((!!1) << 8))", 4, 1

would be the best to avoid parsing issues.


Note, GNU as parsing `.inst (x)` as one argument is unintentional (for
example the x86 backend will parse the construct as two arguments).
See https://sourceware.org/bugzilla/show_bug.cgi?id=25750#c10

Link: https://github.com/ClangBuiltLinux/linux/issues/939
Cc: clang-built-linux@xxxxxxxxxxxxxxxx
Signed-off-by: Fangrui Song <maskray@xxxxxxxxxx>
---
arch/arm64/include/asm/sysreg.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index ebc622432831..af21e2ec5e3e 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -49,7 +49,9 @@
#ifndef CONFIG_BROKEN_GAS_INST

#ifdef __ASSEMBLY__
-#define __emit_inst(x) .inst (x)
+// The space separator is omitted so that __emit_inst(x) can be parsed as
+// either a directive or a macro argument.
+#define __emit_inst(x) .inst(x)
#else
#define __emit_inst(x) ".inst " __stringify((x)) "\n\t"
#endif
--
2.26.0.110.g2183baf09c-goog