Re: [PATCH 6/8] ucounts: Handle inc_rlimit_ucounts wrapping in fork

From: Shuah Khan
Date: Fri Feb 11 2022 - 13:32:12 EST


On 2/11/22 10:50 AM, Eric W. Biederman wrote:
Alexey Gladkov <legion@xxxxxxxxxx> writes:

On Thu, Feb 10, 2022 at 08:13:22PM -0600, Eric W. Biederman wrote:
Move inc_rlimit_ucounts from copy_creds into copy_process immediately
after copy_creds where it can be called exactly once. Test for and
handle it when inc_rlimit_ucounts returns LONG_MAX indicating the
count has wrapped.

This is good hygiene and fixes a theoretical bug. In practice
PID_MAX_LIMIT is at most 2^22 so there is not a chance the number of
processes would ever wrap even on an architecture with a 32bit long.

Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
---
kernel/cred.c | 2 --
kernel/fork.c | 2 ++
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/cred.c b/kernel/cred.c
index 229cff081167..96d5fd6ff26f 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -358,7 +358,6 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
kdebug("share_creds(%p{%d,%d})",
p->cred, atomic_read(&p->cred->usage),
read_cred_subscribers(p->cred));
- inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
return 0;
}
@@ -395,7 +394,6 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
#endif
p->cred = p->real_cred = get_cred(new);
- inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
alter_cred_subscribers(new, 2);
validate_creds(new);
return 0;
diff --git a/kernel/fork.c b/kernel/fork.c
index 6f62d37f3650..69333078259c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2026,6 +2026,8 @@ static __latent_entropy struct task_struct *copy_process(
goto bad_fork_free;
retval = -EAGAIN;
+ if (inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1) == LONG_MAX)
+ goto bad_fork_cleanup_count;
if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
if ((task_ucounts(p) != &init_ucounts) &&
!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))

It might make sense to do something like:

if (inc_rlimit_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1, rlimit(RLIMIT_NPROC)) == LONG_MAX) {
if ((task_ucounts(p) != &init_ucounts) &&
!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))

and the new function:

long inc_rlimit_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, long v, unsigned long rlimit)
{
	struct ucounts *iter;
	long ret = 0;
	long max = rlimit;

	if (rlimit > LONG_MAX)
		max = LONG_MAX;
	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
		long new = atomic_long_add_return(v, &iter->ucount[type]);
		if (new < 0 || new > max)
			ret = LONG_MAX;
		else if (iter == ucounts)
			ret = new;
		max = READ_ONCE(iter->ns->ucount_max[type]);
	}
	return ret;
}

This will avoid double checking the same userns tree.
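
For illustration only, the walk in the proposed function can be modelled
in userspace C. This is a sketch under stated assumptions, not kernel
code: struct model_ucounts, its parent_max field and
model_inc_overlimit() are hypothetical names, plain longs stand in for
atomic_long_t, and parent_max plays the role of ns->ucount_max[type].

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ucounts: one counter per level. */
struct model_ucounts {
	struct model_ucounts *parent;	/* NULL at the top of the tree */
	long count;			/* models ucount[type], no atomics */
	long parent_max;		/* limit applied to the parent's counter */
};

/*
 * Add v at every level, checking each level against its limit in the
 * same pass. Returns the new count of the first level, or LONG_MAX if
 * any level wrapped or went over its limit. The counters stay
 * incremented on failure, so the caller has to undo the charge.
 */
long model_inc_overlimit(struct model_ucounts *uc, long v, unsigned long rlimit)
{
	struct model_ucounts *iter;
	long ret = 0;
	long max = rlimit > (unsigned long)LONG_MAX ? LONG_MAX : (long)rlimit;

	assert(v > 0);
	for (iter = uc; iter; iter = iter->parent) {
		/*
		 * Add in unsigned arithmetic to avoid signed-overflow UB.
		 * The cast back is implementation-defined before C23; it
		 * wraps on gcc and clang.
		 */
		long new = (long)((unsigned long)iter->count + (unsigned long)v);

		iter->count = new;
		if (new < 0 || new > max)
			ret = LONG_MAX;
		else if (iter == uc)
			ret = new;
		max = iter->parent_max;
	}
	return ret;
}
```

In this model, a child at count 1 under a parent at count 5 with
parent_max 6 accepts one increment (the call returns 2) and rejects the
next one with LONG_MAX, because the parent would go to 7.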

Or even modify inc_rlimit_ucounts. This function is used elsewhere like
this:


msgqueue = inc_rlimit_ucounts(info->ucounts, UCOUNT_RLIMIT_MSGQUEUE, mq_bytes);
if (msgqueue == LONG_MAX || msgqueue > rlimit(RLIMIT_MSGQUEUE)) {


memlock = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);
if (!allowed && (memlock == LONG_MAX || memlock > lock_limit) && !capable(CAP_IPC_LOCK)) {


In all cases, we have max value for comparison.

Good point. The downside is that it means we can't use the same code
in exec. The upside is that the code is more idiomatic.


Looking a bit more at the other callers of inc_rlimit_ucounts(),
we might have another issue:

1. mqueue_get_inode() does:
spin_lock(&mq_lock);
msgqueue = inc_rlimit_ucounts(info->ucounts, UCOUNT_RLIMIT_MSGQUEUE, mq_bytes);
if (msgqueue == LONG_MAX || msgqueue > rlimit(RLIMIT_MSGQUEUE)) {
dec_rlimit_ucounts(info->ucounts, UCOUNT_RLIMIT_MSGQUEUE, mq_bytes);
spin_unlock(&mq_lock);


2. user_shm_lock() does, under shmlock_user_lock:
spin_lock(&shmlock_user_lock);
memlock = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);

if (!allowed && (memlock == LONG_MAX || memlock > lock_limit) && !capable(CAP_IPC_LOCK)) {
dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);

3. user_namespace_sysctl_init() doesn't check for max value.

4. copy_creds() doesn't check for max value in its 3 calls to inc_rlimit_ucounts().

You can see that in some of these instances the caller undoes the
increment with dec_rlimit_ucounts(), and that the callers hold different
locks. So do we have a window where the count could wrap past LONG_MAX
and go unnoticed?
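
To make the question concrete, here is a small userspace model of the
inc, check, dec pattern. It is only a sketch: model_count,
model_charge_failed() and model_uncharge() are hypothetical names, a
plain long stands in for the atomic counter, and the interleaving is
written out by hand. It shows what a second caller would observe between
a failed increment and its rollback if nothing serialized the two; it
does not show that the kernel paths above can actually interleave this
way.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical shared counter standing in for one ucount[type] slot. */
long model_count;

/* Wrapping add, modelling atomic_long_add_return() without atomics. */
static long model_add_return(unsigned long v)
{
	/* The unsigned add wraps; the cast back wraps on gcc and clang. */
	model_count = (long)((unsigned long)model_count + v);
	return model_count;
}

/* Charge v and report whether the caller's limit check would fail. */
bool model_charge_failed(long v, long limit)
{
	long new;

	assert(v > 0);
	new = model_add_return((unsigned long)v);
	return new < 0 || new > limit;
}

/* Roll back a charge made by model_charge_failed(). */
void model_uncharge(long v)
{
	assert(v > 0);
	model_add_return(-(unsigned long)v);
}
```

With model_count at 0 and a limit of 100: caller A charges LONG_MAX and
fails. If caller B charges 1 before A has rolled back, the counter wraps
negative and B fails as well, even though B's own request was tiny. Once
both have called model_uncharge(), the counter is back at 0.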

thanks,
-- Shuah