Re: [RFC PATCH 0/5] make use of gcc 9's "asm inline()"

From: Masahiro Yamada
Date: Thu Aug 29 2019 - 15:42:02 EST


On Fri, Aug 30, 2019 at 2:36 AM Nick Desaulniers
<ndesaulniers@xxxxxxxxxx> wrote:
>
> + Masahiro, who sent patches earlier this week also looking to clean
> up our redefinition of `inline` more.
> https://lkml.org/lkml/2019/8/28/44

I think you noticed what I was trying to do.

Yes, my work for cleaning 'inline' was initially motivated
by the 'asm inline' of GCC 9.

The ultimate goal is to stop macrofying 'inline', but
I am not sure whether or not it is achievable.



> On Thu, Aug 29, 2019 at 1:32 AM Rasmus Villemoes
> <linux@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > gcc 9 provides a way to override the otherwise crude heuristic that
> > gcc uses to estimate the size of the code represented by an asm()
> > statement. From the gcc docs
> >
> > If you use 'asm inline' instead of just 'asm', then for inlining
> > purposes the size of the asm is taken as the minimum size, ignoring
> > how many instructions GCC thinks it is.
>
> Hi Rasmus,
> Thanks for the RFC and including me in the discussion. Can you link
> me to the release notes this came from, and the bug related to this
> features development?

Just in care you are interested, this started from this thread:
https://lkml.org/lkml/2018/10/7/92



> I'm curious what "the size of the asm" means, and how it differs
> precisely from "how many instructions GCC thinks it is." I would
> think those are one and the same? Or maybe "the size of the asm"
> means the size in bytes when assembled to machine code, as opposed to
> the count of assembly instructions?
>
> It looks like LLVM estimates based on instruction count, not assembled
> byte size:
> https://github.com/llvm/llvm-project/blob/6289ee941d6f8fc222225fb6845efce477bf5094/llvm/lib/CodeGen/TargetInstrInfo.cpp#L75-L125
> (That's an arch independent version, that only has an override for hexagon)
> I'd imagine it would be possible to implement a machine-code-size
> based estimate.
>
> That one snippet alludes to a problem with the existing `asm` GNU C
> extension. (I personally have fixed bugs in LLVM where forgetting to
> measure the estimated size of inline assembly leads to instruction
> selection that can't represent the appropriate jump ranges. Knowing
> the size of inline assembly is important for the inliner's cost model,
> and AIUI gets difficult for some of the assembler directives) If the
> internal heuristic had an issue, I'm curious to know more about the
> design decision that led to the introduction of a new GNU C extension
> rather than adjustments to the existing heuristic or cost model, and I
> assume the bug would have more info? (Maybe giving developers choice
> between two different cost models? Did actual code break changing the
> existing `asm` cost model? Feels like a choice between an imprecise
> and a precise cost model, but at this point I'm speculating too far
> ahead.)
>
> In particular the reuse of the keyword `inline` in the new GNU C
> extension is problematic for our code, as your patchset clearly
> illustrates. I understand that adding new keywords to a language is
> impossibly difficult (how many different meanings does `static` have
> now, in different contexts) and thus the reuse of them in differing
> parse contexts is common (reuse of `auto` in C++11 comes to mind,
> though it's not a differing parse context). I think if the GCC
> developers were aware that we redefine the `inline` keyword via C
> preprocessor, they may have chosen a different design. But as you
> illustrate, the changes we'd have to make to the kernel are not
> insurmountable.
>
> Tangent:
> There are many GNU C extensions I love and I hope ISO will adopt.
> I've been talking with Hans Boehm about joining up ISO WG14
> specifically to help standardize them. But then again, I don't like
> standards that I can't download (the ratified version, not a draft
> version). I also think it's better for implementers to have more say
> over standards, and the W3C/WHATWG split is a very very interesting
> case study in this regard. That said, I wish more LLVM folks were
> included in the design discussion of GNU C extensions; as trying to
> reimplement new language features can flush out edge cases that allow
> us to nail down behavior in the spec (let alone have a spec) ASAP.
> The only way I find out about new GNU C extensions is when someone in
> the kernel goes to use them, and Clang doesn't support it, then the
> build is broken until we support them. :(
>
> >
> > For compatibility with older compilers, we obviously want a
> >
> > #if new enough
> > #define asm_inline asm inline
> > #else
> > #define asm_inline asm
> > #endif
>
> Requesting a feature test macro to the GCC developers would be great;
> then we could use feature detection in one place rather than brittle
> version checks in multiple places. Imagine the C preprocessor would
> define something like HAS_GNU_ASM_INLINE, then writing a guard would
> be simple, and other compilers could define the same preprocessor
> token when they add support. GCC already does this for a recent
> extension to inline asm for x86 (I forget the feature name, something
> to do with hinting at the flags or output IIRC, Redhat had a blog post
> and we recently implemented support).


If interested, my prototype-patch is here:
https://lkml.org/lkml/2018/12/13/212

I implemented the feature test in Kconfig.

config CC_HAS_ASM_INLINE
def_bool $(success,echo 'void foo(void) { asm inline (""); }' |
$(CC) -x c - -c -o /dev/null)

I am also fine with checking it by version
since we know GCC 9 started supporting it.


> >
> > But since we #define the identifier inline to attach some attributes,
> > we have to use the alternate spelling __inline__ of that
> > keyword. Unfortunately, we also currently #define that one (to
> > inline), so we first have to get rid of all (mis)uses of
> > __inline__. Hence the huge diffstat. I think patch 1 should be
> > regenerated and applied just before -rc1.
> >
> > There are a few remaining users of __inline__, in particular in uapi
> > headers, which I'm not sure what to do about (kernel-side users of
> > those will get slightly different semantics since we no longer
> > automatically attach the __attribute__((__gnu_inline__)) etc.). So RFC.

The exported headers must use '__inline__' instead of 'inline':
This is now compile-tested.
https://github.com/torvalds/linux/blob/v5.3-rc6/usr/include/Makefile#L5


At this moment, 'inline' is replaced with '__inline__' with sed:
https://github.com/torvalds/linux/blob/v5.3-rc6/scripts/headers_install.sh#L37

But, I'd like to reduce the sed patterns in the future.



I think using '__inline' for asm_inlie will be OK.




> No thoughts on uapi, but I think we should break this work logically into two:
> 1. remove our redefinition of inline
> 2. implement asm_inline for GCC
>
> I think what you have for 2 so far is ok, but I need to spend more
> time thinking about it.
>
> For 1:
> Our redefinition of inline currently looks like:
>
> 146 #define inline inline __attribute__((__always_inline__))
> __gnu_inline \
> 147 __maybe_unused notrace
>
> So we have:
> 1. always_inline attribute
> 2. gnu_inline attribute
> 3. maybe_unused
> 4. notrace
>
> I'm not convinced that always_inline works as intended. An inliner
> can still refuse to inline something if it doesn't have the machinery
> to perform all of the transformations required for inlining or it's
> not safe to do so. The C preprocessor is the one true always inline
> (and type safety be damned). It would be interesting to study the
> effects of removing that attribute. Android's Java runtime, ART uses
> this everywhere for all functions, and it's not clear that adding
> attribute always_inline everywhere is an "optimization." Research
> folks at Google are playing with finding better inlining heuristics
> via machine learning, which is quite exciting.

I want to get rid of always_inline.

My commit 9012d011660ea5cf2a623e1de207a2bc0ca6936d
is the first step towards the goal.
All architectures support CONFIG_OPTIMIZE_INLINING,
and it is stable up to now, I think.

Next, set CONFIG_OPTIMIZE_INLINING=y forcibly,
and lastly, remove CONFIG_OPTIMIZE_INLINING and always_inline
entirely.



> I introduced gnu_inline; it's like the one semantically different
> thing from C89 to C99 IIRC. I introduced it because a few places in
> the kernel were redefining KBUILD_CFLAGS, dropping `-std=gnu89`. It
> seems now that there's only a few places left that do that, and
> they're all under arch/x86/ (see:
> https://github.com/ClangBuiltLinux/linux/issues/618#issuecomment-525712390).
> Note that it's a little tricky to undo this; someone just reported
> yesterday that I broke kexec, and we're working on cleaning that up,
> but I think doing that then adding a check to not redefine
> KBUILD_CFLAGS (cc Joe) to scripts/checkpatch.pl would doable. If we
> fixed that, than we could use `-fgnu_inline` (or w/e the spelling is)
> command line flag instead of the compiler attribute.

Probably we can get rid of this too.

> Masahiro is playing around with the maybe_unused part. Seems to be a
> difference in GCC and Clang warning on unused static inline functions.
> I think this can be solved with correct usage of #ifdef guards for the
> appropriate CONFIG_'s, rather than __maybe_unused.

Yes.

> notrace's definition is pretty complicated, I have no idea what any of
> those attributes do...
>
> But maybe all of that is moot, if we just use __inline__. Looking
> more at your patchset after typing this all out, seems like that will
> work.
>
> >
> > The two x86 changes cause smaller code gen differences than I'd
> > expect, but I think we do want the asm_inline thing available sooner
> > or later, so this is just to get the ball rolling.
>
> Is it a net win for all arch's? Or just x86? `differences` being an
> improvement or a regression?
>
>
> >
> > Rasmus Villemoes (5):
> > treewide: replace __inline__ by inline
> > compiler_types.h: don't #define __inline__
> > compiler-gcc.h: add asm_inline definition
> > x86: alternative.h: use asm_inline for all alternative variants
> > x86: bug.h: use asm_inline in _BUG_FLAGS definitions
> >
> > arch/alpha/include/asm/atomic.h | 12 +++---
> > arch/alpha/include/asm/bitops.h | 6 +--
> > arch/alpha/include/asm/dma.h | 22 +++++-----
> > arch/alpha/include/asm/floppy.h | 2 +-
> > arch/alpha/include/asm/irq.h | 2 +-
> > arch/alpha/include/asm/local.h | 4 +-
> > arch/alpha/include/asm/smp.h | 2 +-
> > .../arm/mach-iop32x/include/mach/uncompress.h | 2 +-
> > .../arm/mach-iop33x/include/mach/uncompress.h | 2 +-
> > .../arm/mach-ixp4xx/include/mach/uncompress.h | 2 +-
> > arch/ia64/hp/common/sba_iommu.c | 2 +-
> > arch/ia64/hp/sim/simeth.c | 2 +-
> > arch/ia64/include/asm/atomic.h | 8 ++--
> > arch/ia64/include/asm/bitops.h | 34 ++++++++--------
> > arch/ia64/include/asm/delay.h | 14 +++----
> > arch/ia64/include/asm/irq.h | 2 +-
> > arch/ia64/include/asm/page.h | 2 +-
> > arch/ia64/include/asm/sn/leds.h | 2 +-
> > arch/ia64/include/asm/uaccess.h | 4 +-
> > arch/ia64/oprofile/backtrace.c | 4 +-
> > arch/m68k/include/asm/blinken.h | 2 +-
> > arch/m68k/include/asm/checksum.h | 2 +-
> > arch/m68k/include/asm/dma.h | 32 +++++++--------
> > arch/m68k/include/asm/floppy.h | 8 ++--
> > arch/m68k/include/asm/nettel.h | 8 ++--
> > arch/m68k/mac/iop.c | 14 +++----
> > arch/mips/include/asm/atomic.h | 16 ++++----
> > arch/mips/include/asm/checksum.h | 2 +-
> > arch/mips/include/asm/dma.h | 20 +++++-----
> > arch/mips/include/asm/jazz.h | 2 +-
> > arch/mips/include/asm/local.h | 4 +-
> > arch/mips/include/asm/string.h | 8 ++--
> > arch/mips/kernel/binfmt_elfn32.c | 2 +-
> > arch/nds32/include/asm/swab.h | 4 +-
> > arch/parisc/include/asm/atomic.h | 20 +++++-----
> > arch/parisc/include/asm/bitops.h | 18 ++++-----
> > arch/parisc/include/asm/checksum.h | 4 +-
> > arch/parisc/include/asm/compat.h | 2 +-
> > arch/parisc/include/asm/delay.h | 2 +-
> > arch/parisc/include/asm/dma.h | 20 +++++-----
> > arch/parisc/include/asm/ide.h | 8 ++--
> > arch/parisc/include/asm/irq.h | 2 +-
> > arch/parisc/include/asm/spinlock.h | 12 +++---
> > arch/powerpc/include/asm/atomic.h | 40 +++++++++----------
> > arch/powerpc/include/asm/bitops.h | 28 ++++++-------
> > arch/powerpc/include/asm/dma.h | 20 +++++-----
> > arch/powerpc/include/asm/edac.h | 2 +-
> > arch/powerpc/include/asm/irq.h | 2 +-
> > arch/powerpc/include/asm/local.h | 14 +++----
> > arch/sh/include/asm/pgtable_64.h | 2 +-
> > arch/sh/include/asm/processor_32.h | 4 +-
> > arch/sh/include/cpu-sh3/cpu/dac.h | 6 +--
> > arch/x86/include/asm/alternative.h | 14 +++----
> > arch/x86/include/asm/bug.h | 4 +-
> > arch/x86/um/asm/checksum.h | 4 +-
> > arch/x86/um/asm/checksum_32.h | 4 +-
> > arch/xtensa/include/asm/checksum.h | 14 +++----
> > arch/xtensa/include/asm/cmpxchg.h | 4 +-
> > arch/xtensa/include/asm/irq.h | 2 +-
> > block/partitions/amiga.c | 2 +-
> > drivers/atm/he.c | 6 +--
> > drivers/atm/idt77252.c | 6 +--
> > drivers/gpu/drm/mga/mga_drv.h | 2 +-
> > drivers/gpu/drm/mga/mga_state.c | 14 +++----
> > drivers/gpu/drm/r128/r128_drv.h | 2 +-
> > drivers/gpu/drm/r128/r128_state.c | 14 +++----
> > drivers/gpu/drm/via/via_irq.c | 2 +-
> > drivers/gpu/drm/via/via_verifier.c | 30 +++++++-------
> > drivers/media/pci/ivtv/ivtv-ioctl.c | 2 +-
> > drivers/net/ethernet/sun/sungem.c | 8 ++--
> > drivers/net/ethernet/sun/sunhme.c | 6 +--
> > drivers/net/hamradio/baycom_ser_fdx.c | 2 +-
> > drivers/net/wan/lapbether.c | 2 +-
> > drivers/net/wan/n2.c | 4 +-
> > drivers/parisc/led.c | 4 +-
> > drivers/parisc/sba_iommu.c | 2 +-
> > drivers/parport/parport_gsc.h | 4 +-
> > drivers/scsi/lpfc/lpfc_scsi.c | 2 +-
> > drivers/scsi/pcmcia/sym53c500_cs.c | 4 +-
> > drivers/scsi/qla2xxx/qla_inline.h | 2 +-
> > drivers/scsi/qla2xxx/qla_os.c | 2 +-
> > drivers/tty/amiserial.c | 2 +-
> > drivers/tty/serial/ip22zilog.c | 2 +-
> > drivers/tty/serial/sunsab.c | 4 +-
> > drivers/tty/serial/sunzilog.c | 2 +-
> > drivers/video/fbdev/core/fbcon.c | 20 +++++-----
> > drivers/video/fbdev/ffb.c | 2 +-
> > drivers/video/fbdev/intelfb/intelfbdrv.c | 8 ++--
> > drivers/video/fbdev/intelfb/intelfbhw.c | 2 +-
> > drivers/w1/masters/matrox_w1.c | 4 +-
> > fs/coda/coda_linux.h | 6 +--
> > fs/freevxfs/vxfs_inode.c | 2 +-
> > fs/nfsd/nfsfh.h | 4 +-
> > include/acpi/platform/acgcc.h | 2 +-
> > include/asm-generic/ide_iops.h | 8 ++--
> > include/linux/atalk.h | 4 +-
> > include/linux/compiler-gcc.h | 4 ++
> > include/linux/compiler_types.h | 5 ++-
> > include/linux/hdlc.h | 4 +-
> > include/linux/inetdevice.h | 6 +--
> > include/linux/parport.h | 4 +-
> > include/linux/parport_pc.h | 22 +++++-----
> > include/net/ax25.h | 2 +-
> > include/net/checksum.h | 2 +-
> > include/net/dn_nsp.h | 16 ++++----
> > include/net/ip.h | 2 +-
> > include/net/ip6_checksum.h | 2 +-
> > include/net/ipx.h | 10 ++---
> > include/net/llc_c_ev.h | 4 +-
> > include/net/llc_conn.h | 4 +-
> > include/net/llc_s_ev.h | 2 +-
> > include/net/netrom.h | 8 ++--
> > include/net/scm.h | 14 +++----
> > include/net/udplite.h | 2 +-
> > include/net/x25.h | 8 ++--
> > include/net/xfrm.h | 18 ++++-----
> > include/video/newport.h | 12 +++---
> > net/appletalk/atalk_proc.c | 4 +-
> > net/appletalk/ddp.c | 2 +-
> > net/core/neighbour.c | 2 +-
> > net/core/scm.c | 2 +-
> > net/decnet/dn_nsp_in.c | 2 +-
> > net/decnet/dn_nsp_out.c | 2 +-
> > net/decnet/dn_route.c | 2 +-
> > net/decnet/dn_table.c | 4 +-
> > net/ipv6/af_inet6.c | 2 +-
> > net/ipv6/icmp.c | 4 +-
> > net/ipv6/udp.c | 2 +-
> > net/lapb/lapb_iface.c | 4 +-
> > net/llc/llc_input.c | 2 +-
> > sound/sparc/amd7930.c | 6 +--
> > 131 files changed, 449 insertions(+), 442 deletions(-)
> >
> > --
> > 2.20.1
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers



--
Best Regards

Masahiro Yamada