[PATCH 5.15 01/20] sysctl: add a new register_sysctl_init() interface
From: Eric Biggers
Date: Tue Jan 24 2023 - 13:52:09 EST
From: Xiaoming Ni <nixiaoming@xxxxxxxxxx>
commit 3ddd9a808cee7284931312f2f3e854c9617f44b2 upstream.
Patch series "sysctl: first set of kernel/sysctl cleanups", v2.
Finally had time to respin the series of the work we had started last
year on cleaning up the kernel/sysct.c kitchen sink. People keeps
stuffing their sysctls in that file and this creates a maintenance
burden. So this effort is aimed at placing sysctls where they actually
belong.
I'm going to split patches up into series as there is quite a bit of
work.
This first set adds register_sysctl_init() for uses of registerting a
sysctl on the init path, adds const where missing to a few places,
generalizes common values so to be more easy to share, and starts the
move of a few kernel/sysctl.c out where they belong.
The majority of rework on v2 in this first patch set is 0-day fixes.
Eric Biederman's feedback is later addressed in subsequent patch sets.
I'll only post the first two patch sets for now. We can address the
rest once the first two patch sets get completely reviewed / Acked.
This patch (of 9):
The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.
Today though folks heavily rely on tables on kernel/sysctl.c so they can
easily just extend this table with their needed sysctls. In order to
help users move their sysctls out we need to provide a helper which can
be used during code initialization.
We special-case the initialization use of register_sysctl() since it
*is* safe to fail, given all that sysctls do is provide a dynamic
interface to query or modify at runtime an existing variable. So the
use case of register_sysctl() on init should *not* stop if the sysctls
don't end up getting registered. It would be counter productive to stop
boot if a simple sysctl registration failed.
Provide a helper for init then, and document the recommended init levels
to use for callers of this routine. We will later use this in
subsequent patches to start slimming down kernel/sysctl.c tables and
moving sysctl registration to the code which actually needs these
sysctls.
[mcgrof@xxxxxxxxxx: major commit log and documentation rephrasing also moved to fs/proc/proc_sysctl.c ]
Link: https://lkml.kernel.org/r/20211123202347.818157-1-mcgrof@xxxxxxxxxx
Link: https://lkml.kernel.org/r/20211123202347.818157-2-mcgrof@xxxxxxxxxx
Signed-off-by: Xiaoming Ni <nixiaoming@xxxxxxxxxx>
Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>
Cc: Iurii Zaikin <yzaikin@xxxxxxxxxx>
Cc: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: Paul Turner <pjt@xxxxxxxxxx>
Cc: Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx>
Cc: Sebastian Reichel <sre@xxxxxxxxxx>
Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Cc: Petr Mladek <pmladek@xxxxxxxx>
Cc: Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx>
Cc: Qing Wang <wangqing@xxxxxxxx>
Cc: Benjamin LaHaise <bcrl@xxxxxxxxx>
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Jan Kara <jack@xxxxxxx>
Cc: Amir Goldstein <amir73il@xxxxxxxxx>
Cc: Stephen Kitt <steve@xxxxxxx>
Cc: Antti Palosaari <crope@xxxxxx>
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
Cc: Clemens Ladisch <clemens@xxxxxxxxxx>
Cc: David Airlie <airlied@xxxxxxxx>
Cc: Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx>
Cc: Joel Becker <jlbec@xxxxxxxxxxxx>
Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
Cc: Joseph Qi <joseph.qi@xxxxxxxxxxxxxxxxx>
Cc: Julia Lawall <julia.lawall@xxxxxxxx>
Cc: Lukas Middendorf <kernel@xxxxxxxxxxx>
Cc: Mark Fasheh <mark@xxxxxxxxxx>
Cc: Phillip Potter <phil@xxxxxxxxxxxxxxxx>
Cc: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>
Cc: Douglas Gilbert <dgilbert@xxxxxxxxxxxx>
Cc: James E.J. Bottomley <jejb@xxxxxxxxxxxxx>
Cc: Jani Nikula <jani.nikula@xxxxxxxxx>
Cc: John Ogness <john.ogness@xxxxxxxxxxxxx>
Cc: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx>
Cc: Steven Rostedt (VMware) <rostedt@xxxxxxxxxxx>
Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
Cc: "Theodore Ts'o" <tytso@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Eric Biggers <ebiggers@xxxxxxxxxx>
---
fs/proc/proc_sysctl.c | 33 +++++++++++++++++++++++++++++++++
include/linux/sysctl.h | 3 +++
2 files changed, 36 insertions(+)
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 013fc5931bc37..0b7a00ed6c49b 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -16,6 +16,7 @@
#include <linux/module.h>
#include <linux/bpf-cgroup.h>
#include <linux/mount.h>
+#include <linux/kmemleak.h>
#include "internal.h"
static const struct dentry_operations proc_sys_dentry_operations;
@@ -1384,6 +1385,38 @@ struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *tab
}
EXPORT_SYMBOL(register_sysctl);
+/**
+ * __register_sysctl_init() - register sysctl table to path
+ * @path: path name for sysctl base
+ * @table: This is the sysctl table that needs to be registered to the path
+ * @table_name: The name of sysctl table, only used for log printing when
+ * registration fails
+ *
+ * The sysctl interface is used by userspace to query or modify at runtime
+ * a predefined value set on a variable. These variables however have default
+ * values pre-set. Code which depends on these variables will always work even
+ * if register_sysctl() fails. If register_sysctl() fails you'd just loose the
+ * ability to query or modify the sysctls dynamically at run time. Chances of
+ * register_sysctl() failing on init are extremely low, and so for both reasons
+ * this function does not return any error as it is used by initialization code.
+ *
+ * Context: Can only be called after your respective sysctl base path has been
+ * registered. So for instance, most base directories are registered early on
+ * init before init levels are processed through proc_sys_init() and
+ * sysctl_init().
+ */
+void __init __register_sysctl_init(const char *path, struct ctl_table *table,
+ const char *table_name)
+{
+ struct ctl_table_header *hdr = register_sysctl(path, table);
+
+ if (unlikely(!hdr)) {
+ pr_err("failed when register_sysctl %s to %s\n", table_name, path);
+ return;
+ }
+ kmemleak_not_leak(hdr);
+}
+
static char *append_path(const char *path, char *pos, const char *name)
{
int namelen;
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index fa372b4c23132..47cf70c8eb93c 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -206,6 +206,9 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
void unregister_sysctl_table(struct ctl_table_header * table);
extern int sysctl_init(void);
+extern void __register_sysctl_init(const char *path, struct ctl_table *table,
+ const char *table_name);
+#define register_sysctl_init(path, table) __register_sysctl_init(path, table, #table)
void do_sysctl_args(void);
extern int pwrsw_enabled;
--
2.39.1