Re: Avoiding unnecessary jump relocations in gas?

From: H.J. Lu
Date: Fri May 08 2015 - 08:09:17 EST


On Thu, May 7, 2015 at 8:22 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Thu, May 7, 2015 at 9:21 AM, H.J. Lu <hjl.tools@xxxxxxxxx> wrote:
>> On Thu, May 7, 2015 at 4:52 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>>> On 07.05.15 at 08:02, <luto@xxxxxxxxxxxxxx> wrote:
>>>> AFAICT gas will produce relocations for jumps to global labels in the
>>>> same file. This doesn't seem directly harmful to me, except that, on
>>>> x86, it forces five-byte jumps instead of two-byte jumps.
>>>>
>>>> This seems especially unfortunate, since even hidden and protected
>>>> symbols have this problem.
>>>>
>>>> Given that many users don't want interposition support (especially the
>>>> kernel and anyone using .hidden or .protected), it would be nice to
>>>> have a command-line option to turn this off and probably also to turn
>>>> it off by default for hidden and protected symbols. Can gas do this?
>>>
>>> I've been running with the below changes (taken off of a bigger set
>>> of changes, so the line numbers may look a little odd) for the last
>>> couple of years. I never tried to submit this change because so far
>>> I couldn't find the time to check whether this would have any
>>> unwanted side effects on cases I don't normally use.
>>>
>>
>> This is the patch I checked in.
>>
>> Thanks.
>>
>> --
>> H.J.
>> ---
>> Branches to global non-weak symbols defined in the same segment with
>> non-default visibility can be optimized the same way as branches to
>> local symbols.
>
> Would it make sense to also add a command line option along the lines
> of gcc's -fno-semantic-interposition or some way to override the
> default visibility? AFAICS this patch helps but only if asm code gets
> liberally sprinkled with .hidden or .protected directives.
>

This is what I checked in. With

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 2fda005..186e6f7 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -107,6 +107,10 @@ else
KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args)
endif

+NO_SHARED_CFLAGS = $(call as-option,-Wa$(comma)-mno-shared)
+KBUILD_CFLAGS += $(NO_SHARED_CFLAGS)
+KBUILD_AFLAGS += $(NO_SHARED_CFLAGS)
+
# Make sure compiler does not have buggy stack-protector support.
ifdef CONFIG_CC_STACKPROTECTOR
cc_has_sp := $(srctree)/scripts/gcc-x86_$(BITS)-has-stack-protector.sh

on the kernel master branch, I got

    text     data      bss       dec     hex  filename
10934167  2275232  1609728  14819127  e21f37  vmlinux.old
10934119  2275232  1609728  14819079  e21f07  vmlinux

It saves 48 bytes of text.
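
For context on where the savings come from: a branch that the assembler can resolve itself may use the two-byte short form (opcode EB plus an 8-bit displacement), while a branch that needs a relocation keeps the five-byte near form (opcode E9 plus a 32-bit displacement). The following is only a minimal sketch of that size choice, not gas code; `encode_jmp` is a hypothetical helper, and the addresses in the comments are taken from the relax-4.d expectations in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of gas: encode "jmp target" for an
   instruction located at address PC.  Uses the 2-byte short form
   (EB rel8) when the displacement fits in a signed byte, otherwise
   the 5-byte near form (E9 rel32).  Returns the encoded length.  */
size_t
encode_jmp (uint32_t pc, uint32_t target, uint8_t buf[5])
{
  /* Displacements are relative to the end of the instruction.  */
  int64_t rel8 = (int64_t) target - ((int64_t) pc + 2);

  if (rel8 >= -128 && rel8 <= 127)
    {
      buf[0] = 0xeb;
      buf[1] = (uint8_t) rel8;
      return 2;
    }

  int32_t rel32 = (int32_t) ((int64_t) target - ((int64_t) pc + 5));
  buf[0] = 0xe9;
  for (int i = 0; i < 4; i++)       /* little-endian displacement */
    buf[1 + i] = (uint8_t) ((uint32_t) rel32 >> (8 * i));
  return 5;
}

/* From relax-4.d: the jump at 0x0 to <local> at 0x1e is "eb 1c",
   the jump at 0x2 to <hidden_def> at 0x1a is "eb 16", and the jump
   at 0x4 to <global_def> at 0x1c is "eb 16".  */
```

Each branch relaxed this way is three bytes shorter, although alignment padding means the total change in a linked image need not be an exact multiple of three.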

--
H.J.
From 7ae909be2dce6c47045e66fe94bd1a8261db1761 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@xxxxxxxxx>
Date: Fri, 8 May 2015 05:04:12 -0700
Subject: [PATCH] Add -mno-shared to x86 assembler

On ELF targets, the assembler normally generates code which can go into a
shared library, where non-weak symbols can be preempted. The -mno-shared
option tells the assembler to generate code that is not meant for a shared
library, so non-weak symbols won't be preempted. The resulting code is
slightly smaller. This option mainly affects the handling of branch
instructions.

gas/

* config/tc-i386.c (no_shared): New.
(OPTION_MNO_SHARED): Likewise.
(elf_symbol_resolved_in_segment_p): Check no_shared.
(md_longopts): Add mno-shared.
(md_parse_option): Handle OPTION_MNO_SHARED.
(md_show_usage): Add -mno-shared.
* doc/c-i386.texi: Document -mno-shared.

gas/testsuite/

* gas/i386/i386.exp: Run relax-4 and x86-64-relax-3.
* gas/i386/relax-4.d: New file.
* gas/i386/x86-64-relax-3.d: Likewise.
---
gas/ChangeLog | 10 ++++++++++
gas/config/tc-i386.c | 17 +++++++++++++++++
gas/doc/c-i386.texi | 10 ++++++++++
gas/testsuite/ChangeLog | 6 ++++++
gas/testsuite/gas/i386/i386.exp | 2 ++
gas/testsuite/gas/i386/relax-4.d | 32 ++++++++++++++++++++++++++++++++
gas/testsuite/gas/i386/x86-64-relax-3.d | 33 +++++++++++++++++++++++++++++++++
7 files changed, 110 insertions(+)
create mode 100644 gas/testsuite/gas/i386/relax-4.d
create mode 100644 gas/testsuite/gas/i386/x86-64-relax-3.d

diff --git a/gas/ChangeLog b/gas/ChangeLog
index 9758e72..9bf6931 100644
--- a/gas/ChangeLog
+++ b/gas/ChangeLog
@@ -1,3 +1,13 @@
+2015-05-08 H.J. Lu <hongjiu.lu@xxxxxxxxx>
+
+ * config/tc-i386.c (no_shared): New.
+ (OPTION_MNO_SHARED): Likewise.
+ (elf_symbol_resolved_in_segment_p): Check no_shared.
+ (md_longopts): Add mno-shared.
+ (md_parse_option): Handle OPTION_MNO_SHARED.
+ (md_show_usage): Add -mno-shared.
+ * doc/c-i386.texi: Document -mno-shared.
+
2015-05-07 H.J. Lu <hongjiu.lu@xxxxxxxxx>

* config/tc-i386.c (elf_symbol_resolved_in_segment_p): New.
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index c4ba13d..bbc0969 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -524,6 +524,11 @@ static enum x86_elf_abi x86_elf_abi = I386_ABI;
static int use_big_obj = 0;
#endif

+#if defined (OBJ_ELF) || defined (OBJ_MAYBE_ELF)
+/* 1 if not generating code for a shared library. */
+static int no_shared = 0;
+#endif
+
/* 1 for intel syntax,
0 if att syntax. */
static int intel_syntax = 0;
@@ -8785,6 +8790,10 @@ elf_symbol_resolved_in_segment_p (symbolS *fr_symbol)
/* Symbol may be weak or local. */
return !S_IS_WEAK (fr_symbol);

+ /* Non-weak symbols won't be preempted. */
+ if (no_shared)
+ return 1;
+
/* Global symbols with default visibility in a shared library may be
preempted by another definition. */
return ELF_ST_VISIBILITY (S_GET_OTHER (fr_symbol)) != STV_DEFAULT;
@@ -9484,6 +9493,7 @@ const char *md_shortopts = "qn";
#define OPTION_MBIG_OBJ (OPTION_MD_BASE + 18)
#define OPTION_OMIT_LOCK_PREFIX (OPTION_MD_BASE + 19)
#define OPTION_MEVEXRCIG (OPTION_MD_BASE + 20)
+#define OPTION_MNO_SHARED (OPTION_MD_BASE + 21)

struct option md_longopts[] =
{
@@ -9494,6 +9504,7 @@ struct option md_longopts[] =
#endif
#if defined (OBJ_ELF) || defined (OBJ_MAYBE_ELF)
{"x32", no_argument, NULL, OPTION_X32},
+ {"mno-shared", no_argument, NULL, OPTION_MNO_SHARED},
#endif
{"divide", no_argument, NULL, OPTION_DIVIDE},
{"march", required_argument, NULL, OPTION_MARCH},
@@ -9554,6 +9565,10 @@ md_parse_option (int c, char *arg)
/* -s: On i386 Solaris, this tells the native assembler to use
.stab instead of .stab.excl. We always use .stab anyhow. */
break;
+
+ case OPTION_MNO_SHARED:
+ no_shared = 1;
+ break;
#endif
#if (defined (OBJ_ELF) || defined (OBJ_MAYBE_ELF) \
|| defined (TE_PE) || defined (TE_PEP) || defined (OBJ_MACH_O))
@@ -9980,6 +9995,8 @@ md_show_usage (FILE *stream)
-mold-gcc support old (<= 2.8.1) versions of gcc\n"));
fprintf (stream, _("\
-madd-bnd-prefix add BND prefix for all valid branches\n"));
+ fprintf (stream, _("\
+ -mno-shared enable branch optimization for non shared code\n"));
# if defined (TE_PE) || defined (TE_PEP)
fprintf (stream, _("\
-mbig-obj generate big object files\n"));
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 7f0e79f..eb6790c 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -297,6 +297,16 @@ The @code{.att_syntax} and @code{.intel_syntax} directives will take precedent.
This option forces the assembler to add BND prefix to all branches, even
if such prefix was not explicitly specified in the source code.

+@cindex @samp{-mno-shared} option, i386
+@cindex @samp{-mno-shared} option, x86-64
+@item -mno-shared
+On ELF target, the assembler normally generates code which can go into a
+shared library where non-weak symbols can be preempted. The
+@samp{-mno-shared} option tells the assembler to generate code not for
+a shared library, where non-weak symbols won't be preempted. The
+resulting code is slightly smaller. This option mainly affects the
+handling of branch instructions.
+
@cindex @samp{-mbig-obj} option, x86-64
@item -mbig-obj
On x86-64 PE/COFF target this option forces the use of big object file
diff --git a/gas/testsuite/ChangeLog b/gas/testsuite/ChangeLog
index 1cfa577..d63f48a 100644
--- a/gas/testsuite/ChangeLog
+++ b/gas/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2015-05-08 H.J. Lu <hongjiu.lu@xxxxxxxxx>
+
+ * gas/i386/i386.exp: Run relax-4 and x86-64-relax-3.
+ * gas/i386/relax-4.d: New file.
+ * gas/i386/x86-64-relax-3.d: Likewise.
+
2015-05-07 H.J. Lu <hongjiu.lu@xxxxxxxxx>

* gas/i386/i386.exp: Run relax-3 and x86-64-relax-2.
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index af56c26..bedd84c 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -396,6 +396,7 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_32_check]]
run_dump_test "note"

run_dump_test "relax-3"
+ run_dump_test "relax-4"
}

# This is a PE specific test.
@@ -752,6 +753,7 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
run_list_test "x86-64-size-inval-1" "-al"

run_dump_test "x86-64-relax-2"
+ run_dump_test "x86-64-relax-3"
}

set ASFLAGS "$old_ASFLAGS"
diff --git a/gas/testsuite/gas/i386/relax-4.d b/gas/testsuite/gas/i386/relax-4.d
new file mode 100644
index 0000000..b188841
--- /dev/null
+++ b/gas/testsuite/gas/i386/relax-4.d
@@ -0,0 +1,32 @@
+#source: relax-3.s
+#as: -mno-shared
+#objdump: -dwr
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <foo>:
+[ ]*[a-f0-9]+: eb 1c jmp 1e <local>
+[ ]*[a-f0-9]+: eb 16 jmp 1a <hidden_def>
+[ ]*[a-f0-9]+: eb 16 jmp 1c <global_def>
+[ ]*[a-f0-9]+: e9 fc ff ff ff jmp 7 <foo\+0x7> 7: (R_386_PC)?(DISP)?32 weak_def
+[ ]*[a-f0-9]+: e9 fc ff ff ff jmp c <foo\+0xc> c: (R_386_PC)?(DISP)?32 weak_hidden_undef
+[ ]*[a-f0-9]+: e9 fc ff ff ff jmp 11 <foo\+0x11> 11: (R_386_PC)?(DISP)?32 weak_hidden_def
+[ ]*[a-f0-9]+: e9 fc ff ff ff jmp 16 <foo\+0x16> 16: (R_386_PC)?(DISP)?32 hidden_undef
+
+0+1a <hidden_def>:
+[ ]*[a-f0-9]+: c3 ret
+
+0+1b <weak_hidden_def>:
+[ ]*[a-f0-9]+: c3 ret
+
+0+1c <global_def>:
+[ ]*[a-f0-9]+: c3 ret
+
+0+1d <weak_def>:
+[ ]*[a-f0-9]+: c3 ret
+
+0+1e <local>:
+[ ]*[a-f0-9]+: c3 ret
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-relax-3.d b/gas/testsuite/gas/i386/x86-64-relax-3.d
new file mode 100644
index 0000000..d0c7ee4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-relax-3.d
@@ -0,0 +1,33 @@
+#source: relax-3.s
+#as: -mno-shared
+#objdump: -dwr
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <foo>:
+[ ]*[a-f0-9]+: eb 1c jmp 1e <local>
+[ ]*[a-f0-9]+: eb 16 jmp 1a <hidden_def>
+[ ]*[a-f0-9]+: eb 16 jmp 1c <global_def>
+[ ]*[a-f0-9]+: e9 00 00 00 00 jmpq b <foo\+0xb> 7: R_X86_64_PC32 weak_def-0x4
+[ ]*[a-f0-9]+: e9 00 00 00 00 jmpq 10 <foo\+0x10> c: R_X86_64_PC32 weak_hidden_undef-0x4
+[ ]*[a-f0-9]+: e9 00 00 00 00 jmpq 15 <foo\+0x15> 11: R_X86_64_PC32 weak_hidden_def-0x4
+[ ]*[a-f0-9]+: e9 00 00 00 00 jmpq 1a <hidden_def> 16: R_X86_64_PC32 hidden_undef-0x4
+
+0+1a <hidden_def>:
+[ ]*[a-f0-9]+: c3 retq
+
+0+1b <weak_hidden_def>:
+[ ]*[a-f0-9]+: c3 retq
+
+0+1c <global_def>:
+[ ]*[a-f0-9]+: c3 retq
+
+0+1d <weak_def>:
+[ ]*[a-f0-9]+: c3 retq
+
+0+1e <local>:
+[ ]*[a-f0-9]+: c3 retq
+#pass
--
2.1.0
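
As a reading aid, the decision that the patch changes can be modeled outside of gas roughly as follows. This is a simplified sketch under stated assumptions, not the code in tc-i386.c: `struct sym_model`, `enum visibility`, and `resolved_in_segment` are hypothetical names, the `defined_here` check stands in for the same-segment test that happens before this function is reached, and only the conditions visible in the hunk above are modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for gas symbol state; not the real symbolS.  */
enum visibility { VIS_DEFAULT, VIS_HIDDEN, VIS_PROTECTED };

struct sym_model
{
  bool defined_here;    /* defined in the same segment as the branch */
  bool external;        /* global binding; weak symbols are not external */
  bool weak;
  enum visibility vis;
};

/* Return true if a branch to SYM can be resolved by the assembler,
   i.e. relaxed to a short jump with no relocation.  */
bool
resolved_in_segment (const struct sym_model *sym, bool no_shared)
{
  if (!sym->defined_here)
    return false;               /* assumption: undefined symbols need a reloc */

  if (!sym->external)
    return !sym->weak;          /* symbol may be weak or local */

  if (no_shared)
    return true;                /* non-weak symbols won't be preempted */

  /* Global symbols with default visibility in a shared library may be
     preempted by another definition.  */
  return sym->vis != VIS_DEFAULT;
}
```

Under this model, the only case whose result depends on `no_shared` is a defined, non-weak global symbol with default visibility, which matches the `global_def` line becoming a short jump in relax-4.d and x86-64-relax-3.d while the weak and undefined symbols keep their relocations.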