[PATCH 6/7] kallsyms: add /proc/kallmodsyms

From: Nick Alcock
Date: Fri Jul 09 2021 - 18:26:30 EST


Use the tables added in the previous commits to introduce a new
/proc/kallmodsyms, in which [module names] are also given for things
that *could* have been modular had they not been built in to the kernel.
So symbols that are part of, say, ext4 are reported as [ext4] even if
ext4 happens to be buiilt in to the kernel in this configuration.

Symbols that are part of multiple modules at the same time are shown
with [multiple] [module names]: consumers will have to be ready to
handle such lines. Also, kernel symbols for built-in modules will be
sorted by size, as usual for the core kernel, so will probably appear
interspersed with other symbols that are part of different modules and
non-modular always-built-in symbols, which, as usual, have no
square-bracketed module denotation. This differs from /proc/kallsyms,
where all symbols associated with a module will always appear in a group
(and randomly ordered).

The result looks like this:

ffffffff8b013d20 t pt_buffer_setup_aux
ffffffff8b014130 T intel_pt_interrupt
ffffffff8b014250 T cpu_emergency_stop_pt
ffffffff8b014280 t rapl_pmu_event_init [intel_rapl_perf]
ffffffff8b0143c0 t rapl_event_update [intel_rapl_perf]
ffffffff8b014480 t rapl_pmu_event_read [intel_rapl_perf]
ffffffff8b014490 t rapl_cpu_offline [intel_rapl_perf]
ffffffff8b014540 t __rapl_event_show [intel_rapl_perf]
ffffffff8b014570 t rapl_pmu_event_stop [intel_rapl_perf]

This is emitted even if intel_rapl_perf is built into the kernel (but,
obviously, not if it's not in the .config at all, or is in a module that
is not loaded).

Further down, we see what happens when object files are reused by
multiple modules, all of which are built in to the kernel:

ffffffffa22b3aa0 t handle_timestamp [liquidio]
ffffffffa22b3b50 t free_netbuf [liquidio]
ffffffffa22b3ba0 t liquidio_ptp_settime [liquidio]
ffffffffa22b3c30 t liquidio_ptp_adjfreq [liquidio]
[...]
ffffffffa22b9490 t lio_vf_rep_create [liquidio]
ffffffffa22b96a0 t lio_vf_rep_destroy [liquidio]
ffffffffa22b9810 t lio_vf_rep_modinit [liquidio]
ffffffffa22b9830 t lio_vf_rep_modexit [liquidio]
ffffffffa22b9850 t lio_ethtool_get_channels [liquidio] [liquidio_vf]
ffffffffa22b9930 t lio_ethtool_get_ringparam [liquidio] [liquidio_vf]
ffffffffa22b99d0 t lio_get_msglevel [liquidio] [liquidio_vf]
ffffffffa22b99f0 t lio_vf_set_msglevel [liquidio] [liquidio_vf]
ffffffffa22b9a10 t lio_get_pauseparam [liquidio] [liquidio_vf]
ffffffffa22b9a40 t lio_get_ethtool_stats [liquidio] [liquidio_vf]
ffffffffa22ba180 t lio_vf_get_ethtool_stats [liquidio] [liquidio_vf]
ffffffffa22ba4f0 t lio_get_regs_len [liquidio] [liquidio_vf]
ffffffffa22ba530 t lio_get_priv_flags [liquidio] [liquidio_vf]
ffffffffa22ba550 t lio_set_priv_flags [liquidio] [liquidio_vf]
ffffffffa22ba580 t lio_set_fecparam [liquidio] [liquidio_vf]
ffffffffa22ba5f0 t lio_get_fecparam [liquidio] [liquidio_vf]
[...]
ffffffffa22cbd10 t liquidio_set_mac [liquidio_vf]
ffffffffa22cbe90 t handle_timestamp [liquidio_vf]
ffffffffa22cbf40 t free_netbuf [liquidio_vf]
ffffffffa22cbf90 t octnet_link_status_change [liquidio_vf]
ffffffffa22cbfc0 t liquidio_vxlan_port_command.constprop.0 [liquidio_vf]

Like /proc/kallsyms, the output is driven by address, so keeps the
curious property of /proc/kallsyms that symbols (like free_netbuf above)
may appear repeatedly with different addresses: but now, unlike in
/proc/kallsyms, we can see that those symbols appear repeatedly because
they are *different symbols* that ultimately belong to different
modules, all of which are built in to the kernel.

As with /proc/kallsyms, non-root usage produces addresses that are
all zero.

I am not wedded to the name or format of /proc/kallmodsyms, but felt it
best to split it out of /proc/kallsyms to avoid breaking existing
kallsyms parsers. Another possible syntax might be to use {curly
brackets} or something to denote built-in modules: it might be possible
to drop /proc/kallmodsyms and make /proc/kallsyms emit things in this
format. (Equally, now kallmodsyms data uses very little space, the
CONFIG_KALLMODSYMS config option might be something people don't want to
bother with.)

Internally, this uses a new kallsyms_builtin_module_address() almost
identical to kallsyms_sym_address() to get the address corresponding to
a given .kallsyms_modules index, and a new get_builtin_module_idx quite
similar to get_symbol_pos to determine the index in the
.kallsyms_modules array that relates to a given address. Save a little
time by exploiting the fact that all callers will only ever traverse
this list from start to end by allowing them to pass in the previous
index returned from this function as a hint: thus very few bsearches are
actually needed. (In theory this could change to just walk straight
down kallsyms_module_addresses/offsets and not bother bsearching at all,
but doing it this way is hardly any slower and much more robust.)

The display process is complicated a little by the weird format of the
.kallsyms_module_names table: we have to look for multimodule entries
and print them as space-separated lists of module names.

Signed-off-by: Nick Alcock <nick.alcock@xxxxxxxxxx>
---
kernel/kallsyms.c | 241 +++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 226 insertions(+), 15 deletions(-)

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index c851ca0ed357..ac095691008a 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -45,8 +45,18 @@ __section(".rodata") __attribute__((weak));
extern const unsigned long kallsyms_relative_base
__section(".rodata") __attribute__((weak));

+extern const unsigned long kallsyms_num_modules
+__section(".rodata") __attribute__((weak));
+
+extern const unsigned long kallsyms_module_names_len
+__section(".rodata") __attribute__((weak));
+
extern const char kallsyms_token_table[] __weak;
extern const u16 kallsyms_token_index[] __weak;
+extern const unsigned long kallsyms_module_addresses[] __weak;
+extern const int kallsyms_module_offsets[] __weak;
+extern const u32 kallsyms_modules[] __weak;
+extern const char kallsyms_module_names[] __weak;

extern const unsigned int kallsyms_markers[] __weak;

@@ -182,6 +192,25 @@ static inline bool cleanup_symbol_name(char *s)
static inline bool cleanup_symbol_name(char *s) { return false; }
#endif

+#ifdef CONFIG_KALLMODSYMS
+static unsigned long kallsyms_builtin_module_address(int idx)
+{
+ if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE))
+ return kallsyms_module_addresses[idx];
+
+ /* values are unsigned offsets if --absolute-percpu is not in effect */
+ if (!IS_ENABLED(CONFIG_KALLSYMS_ABSOLUTE_PERCPU))
+ return kallsyms_relative_base + (u32)kallsyms_module_offsets[idx];
+
+ /* ...otherwise, positive offsets are absolute values */
+ if (kallsyms_module_offsets[idx] >= 0)
+ return kallsyms_module_offsets[idx];
+
+ /* ...and negative offsets are relative to kallsyms_relative_base - 1 */
+ return kallsyms_relative_base - 1 - kallsyms_module_offsets[idx];
+}
+#endif
+
/* Lookup the address for this symbol. Returns 0 if not found. */
unsigned long kallsyms_lookup_name(const char *name)
{
@@ -285,6 +314,54 @@ static unsigned long get_symbol_pos(unsigned long addr,
return low;
}

+/*
+ * The caller passes in an address, and we return an index to the corresponding
+ * builtin module index in .kallsyms_modules, or (unsigned long) -1 if none
+ * match.
+ *
+ * The hint_idx, if set, is a hint as to the possible return value, to handle
+ * the common case in which consecutive runs of addresses relate to the same
+ * index.
+ */
+#ifdef CONFIG_KALLMODSYMS
+static unsigned long get_builtin_module_idx(unsigned long addr, unsigned long hint_idx)
+{
+ unsigned long low, high, mid;
+
+ if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE))
+ BUG_ON(!kallsyms_module_addresses);
+ else
+ BUG_ON(!kallsyms_module_offsets);
+
+ /*
+ * Do a binary search on the sorted kallsyms_modules array. The last
+ * entry in this array indicates the end of the text section, not an
+ * object file.
+ */
+ low = 0;
+ high = kallsyms_num_modules - 1;
+
+ if (hint_idx > low && hint_idx < (high - 1) &&
+ addr >= kallsyms_builtin_module_address(hint_idx) &&
+ addr < kallsyms_builtin_module_address(hint_idx + 1))
+ return hint_idx;
+
+ if (addr >= kallsyms_builtin_module_address(low)
+ && addr < kallsyms_builtin_module_address(high)) {
+ while (high - low > 1) {
+ mid = low + (high - low) / 2;
+ if (kallsyms_builtin_module_address(mid) <= addr)
+ low = mid;
+ else
+ high = mid;
+ }
+ return low;
+ }
+
+ return (unsigned long) -1;
+}
+#endif
+
/*
* Lookup an address but don't bother to find any names.
*/
@@ -495,6 +572,8 @@ struct kallsym_iter {
char type;
char name[KSYM_NAME_LEN];
char module_name[MODULE_NAME_LEN];
+ const char *builtin_module_names;
+ unsigned long hint_builtin_module_idx;
int exported;
int show_value;
};
@@ -525,6 +604,8 @@ static int get_ksymbol_mod(struct kallsym_iter *iter)
&iter->value, &iter->type,
iter->name, iter->module_name,
&iter->exported);
+ iter->builtin_module_names = NULL;
+
if (ret < 0) {
iter->pos_mod_end = iter->pos;
return 0;
@@ -544,6 +625,8 @@ static int get_ksymbol_ftrace_mod(struct kallsym_iter *iter)
&iter->value, &iter->type,
iter->name, iter->module_name,
&iter->exported);
+ iter->builtin_module_names = NULL;
+
if (ret < 0) {
iter->pos_ftrace_mod_end = iter->pos;
return 0;
@@ -558,6 +641,7 @@ static int get_ksymbol_bpf(struct kallsym_iter *iter)

strlcpy(iter->module_name, "bpf", MODULE_NAME_LEN);
iter->exported = 0;
+ iter->builtin_module_names = NULL;
ret = bpf_get_kallsym(iter->pos - iter->pos_ftrace_mod_end,
&iter->value, &iter->type,
iter->name);
@@ -578,23 +662,52 @@ static int get_ksymbol_kprobe(struct kallsym_iter *iter)
{
strlcpy(iter->module_name, "__builtin__kprobes", MODULE_NAME_LEN);
iter->exported = 0;
+ iter->builtin_module_names = NULL;
return kprobe_get_kallsym(iter->pos - iter->pos_bpf_end,
&iter->value, &iter->type,
iter->name) < 0 ? 0 : 1;
}

/* Returns space to next name. */
-static unsigned long get_ksymbol_core(struct kallsym_iter *iter)
+static unsigned long get_ksymbol_core(struct kallsym_iter *iter, int kallmodsyms)
{
unsigned off = iter->nameoff;
+ unsigned long mod_idx;

- iter->module_name[0] = '\0';
+ iter->exported = 0;
iter->value = kallsyms_sym_address(iter->pos);

iter->type = kallsyms_get_symbol_type(off);

+ iter->module_name[0] = '\0';
+ iter->builtin_module_names = NULL;
+
off = kallsyms_expand_symbol(off, iter->name, ARRAY_SIZE(iter->name));
+#ifdef CONFIG_KALLMODSYMS
+ if (kallmodsyms) {
+ if (kallsyms_module_offsets)
+ mod_idx =
+ get_builtin_module_idx(iter->value,
+ iter->hint_builtin_module_idx);

+ /*
+ * This is a built-in module iff the tables of built-in modules
+ * (address->module name mappings) and module names are known,
+ * and if the address was found there, and if the corresponding
+ * module index is nonzero. All other cases mean off the end of
+ * the binary or in a non-modular range in between one or more
+ * modules. (Also guard against a corrupt kallsyms_objfiles
+ * array pointing off the end of kallsyms_modules.)
+ */
+ if (kallsyms_modules != NULL && kallsyms_module_names != NULL &&
+ mod_idx != (unsigned long) -1 &&
+ kallsyms_modules[mod_idx] != 0 &&
+ kallsyms_modules[mod_idx] < kallsyms_module_names_len)
+ iter->builtin_module_names =
+ &kallsyms_module_names[kallsyms_modules[mod_idx]];
+ iter->hint_builtin_module_idx = mod_idx;
+ }
+#endif
return off - iter->nameoff;
}

@@ -640,7 +753,7 @@ static int update_iter_mod(struct kallsym_iter *iter, loff_t pos)
}

/* Returns false if pos at or past end of file. */
-static int update_iter(struct kallsym_iter *iter, loff_t pos)
+static int update_iter(struct kallsym_iter *iter, loff_t pos, int kallmodsyms)
{
/* Module symbols can be accessed randomly. */
if (pos >= kallsyms_num_syms)
@@ -650,7 +763,7 @@ static int update_iter(struct kallsym_iter *iter, loff_t pos)
if (pos != iter->pos)
reset_iter(iter, pos);

- iter->nameoff += get_ksymbol_core(iter);
+ iter->nameoff += get_ksymbol_core(iter, kallmodsyms);
iter->pos++;

return 1;
@@ -660,14 +773,14 @@ static void *s_next(struct seq_file *m, void *p, loff_t *pos)
{
(*pos)++;

- if (!update_iter(m->private, *pos))
+ if (!update_iter(m->private, *pos, 0))
return NULL;
return p;
}

static void *s_start(struct seq_file *m, loff_t *pos)
{
- if (!update_iter(m->private, *pos))
+ if (!update_iter(m->private, *pos, 0))
return NULL;
return m->private;
}
@@ -676,7 +789,7 @@ static void s_stop(struct seq_file *m, void *p)
{
}

-static int s_show(struct seq_file *m, void *p)
+static int s_show_internal(struct seq_file *m, void *p, int kallmodsyms)
{
void *value;
struct kallsym_iter *iter = m->private;
@@ -687,23 +800,67 @@ static int s_show(struct seq_file *m, void *p)

value = iter->show_value ? (void *)iter->value : NULL;

- if (iter->module_name[0]) {
+ /*
+ * Real module, or built-in module and /proc/kallsyms being shown.
+ */
+ if (iter->module_name[0] != '\0' ||
+ (iter->builtin_module_names != NULL && kallmodsyms != 0)) {
char type;

/*
- * Label it "global" if it is exported,
- * "local" if not exported.
+ * Label it "global" if it is exported, "local" if not exported.
*/
type = iter->exported ? toupper(iter->type) :
tolower(iter->type);
- seq_printf(m, "%px %c %s\t[%s]\n", value,
- type, iter->name, iter->module_name);
+#ifdef CONFIG_KALLMODSYMS
+ if (kallmodsyms) {
+ /*
+ * /proc/kallmodsyms, built as a module.
+ */
+ if (iter->builtin_module_names == NULL)
+ seq_printf(m, "%px %c %s\t[%s]\n", value,
+ type, iter->name,
+ iter->module_name);
+ /*
+ * /proc/kallmodsyms, single-module symbol.
+ */
+ else if (*iter->builtin_module_names != '\0')
+ seq_printf(m, "%px %c %s\t[%s]\n", value,
+ type, iter->name,
+ iter->builtin_module_names);
+ /*
+ * /proc/kallmodsyms, multimodule symbol. Formatted
+ * as \0MODULE_COUNTmodule-1\0module-2\0, where
+ * MODULE_COUNT is a single byte, 2 or higher.
+ */
+ else {
+ size_t i = *(char *)(iter->builtin_module_names + 1);
+ const char *walk = iter->builtin_module_names + 2;
+
+ seq_printf(m, "%px %c %s\t[%s]", value,
+ type, iter->name, walk);
+
+ while (--i > 0) {
+ walk += strlen(walk) + 1;
+ seq_printf (m, " [%s]", walk);
+ }
+ seq_printf(m, "\n");
+ }
+ } else /* !kallmodsyms */
+#endif /* CONFIG_KALLMODSYMS */
+ seq_printf(m, "%px %c %s\t[%s]\n", value,
+ type, iter->name, iter->module_name);
} else
seq_printf(m, "%px %c %s\n", value,
iter->type, iter->name);
return 0;
}

+static int s_show(struct seq_file *m, void *p)
+{
+ return s_show_internal(m, p, 0);
+}
+
static const struct seq_operations kallsyms_op = {
.start = s_start,
.next = s_next,
@@ -711,6 +868,35 @@ static const struct seq_operations kallsyms_op = {
.show = s_show
};

+#ifdef CONFIG_KALLMODSYMS
+static int s_mod_show(struct seq_file *m, void *p)
+{
+ return s_show_internal(m, p, 1);
+}
+static void *s_mod_next(struct seq_file *m, void *p, loff_t *pos)
+{
+ (*pos)++;
+
+ if (!update_iter(m->private, *pos, 1))
+ return NULL;
+ return p;
+}
+
+static void *s_mod_start(struct seq_file *m, loff_t *pos)
+{
+ if (!update_iter(m->private, *pos, 1))
+ return NULL;
+ return m->private;
+}
+
+static const struct seq_operations kallmodsyms_op = {
+ .start = s_mod_start,
+ .next = s_mod_next,
+ .stop = s_stop,
+ .show = s_mod_show
+};
+#endif
+
static inline int kallsyms_for_perf(void)
{
#ifdef CONFIG_PERF_EVENTS
@@ -746,7 +932,8 @@ bool kallsyms_show_value(const struct cred *cred)
}
}

-static int kallsyms_open(struct inode *inode, struct file *file)
+static int kallsyms_open_internal(struct inode *inode, struct file *file,
+ const struct seq_operations *ops)
{
/*
* We keep iterator in m->private, since normal case is to
@@ -754,7 +941,7 @@ static int kallsyms_open(struct inode *inode, struct file *file)
* using get_symbol_offset for every symbol.
*/
struct kallsym_iter *iter;
- iter = __seq_open_private(file, &kallsyms_op, sizeof(*iter));
+ iter = __seq_open_private(file, ops, sizeof(*iter));
if (!iter)
return -ENOMEM;
reset_iter(iter, 0);
@@ -767,6 +954,18 @@ static int kallsyms_open(struct inode *inode, struct file *file)
return 0;
}

+static int kallsyms_open(struct inode *inode, struct file *file)
+{
+ return kallsyms_open_internal(inode, file, &kallsyms_op);
+}
+
+#ifdef CONFIG_KALLMODSYMS
+static int kallmodsyms_open(struct inode *inode, struct file *file)
+{
+ return kallsyms_open_internal(inode, file, &kallmodsyms_op);
+}
+#endif
+
#ifdef CONFIG_KGDB_KDB
const char *kdb_walk_kallsyms(loff_t *pos)
{
@@ -777,7 +976,7 @@ const char *kdb_walk_kallsyms(loff_t *pos)
reset_iter(&kdb_walk_kallsyms_iter, 0);
}
while (1) {
- if (!update_iter(&kdb_walk_kallsyms_iter, *pos))
+ if (!update_iter(&kdb_walk_kallsyms_iter, *pos, 0))
return NULL;
++*pos;
/* Some debugging symbols have no name. Ignore them. */
@@ -794,9 +993,21 @@ static const struct proc_ops kallsyms_proc_ops = {
.proc_release = seq_release_private,
};

+#ifdef CONFIG_KALLMODSYMS
+static const struct proc_ops kallmodsyms_proc_ops = {
+ .proc_open = kallmodsyms_open,
+ .proc_read = seq_read,
+ .proc_lseek = seq_lseek,
+ .proc_release = seq_release_private,
+};
+#endif
+
static int __init kallsyms_init(void)
{
proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
+#ifdef CONFIG_KALLMODSYMS
+ proc_create("kallmodsyms", 0444, NULL, &kallmodsyms_proc_ops);
+#endif
return 0;
}
device_initcall(kallsyms_init);
--
2.32.0.255.gd9b1d14a2a