Re: [PATCH 6/8] ucounts: Handle inc_rlimit_ucounts wrapping in fork

From: Eric W. Biederman
Date: Fri Feb 11 2022 - 12:51:03 EST


Alexey Gladkov <legion@xxxxxxxxxx> writes:

> On Thu, Feb 10, 2022 at 08:13:22PM -0600, Eric W. Biederman wrote:
>> Move inc_rlimit_ucounts from copy_creds into copy_process immediately
>> after copy_creds where it can be called exactly once. Test for and
>> handle it when inc_rlimit_ucounts returns LONG_MAX indicating the
>> count has wrapped.
>>
>> This is good hygiene and fixes a theoretical bug. In practice
>> PID_MAX_LIMIT is at most 2^22 so there is not a chance the number of
>> processes would ever wrap even on an architecture with a 32bit long.
>>
>> Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>> ---
>>  kernel/cred.c | 2 --
>>  kernel/fork.c | 2 ++
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/cred.c b/kernel/cred.c
>> index 229cff081167..96d5fd6ff26f 100644
>> --- a/kernel/cred.c
>> +++ b/kernel/cred.c
>> @@ -358,7 +358,6 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
>>  		kdebug("share_creds(%p{%d,%d})",
>>  		       p->cred, atomic_read(&p->cred->usage),
>>  		       read_cred_subscribers(p->cred));
>> -		inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
>>  		return 0;
>>  	}
>>
>> @@ -395,7 +394,6 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags)
>>  #endif
>>
>>  	p->cred = p->real_cred = get_cred(new);
>> -	inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
>>  	alter_cred_subscribers(new, 2);
>>  	validate_creds(new);
>>  	return 0;
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index 6f62d37f3650..69333078259c 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -2026,6 +2026,8 @@ static __latent_entropy struct task_struct *copy_process(
>>  		goto bad_fork_free;
>>
>>  	retval = -EAGAIN;
>> +	if (inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1) == LONG_MAX)
>> +		goto bad_fork_cleanup_count;
>>  	if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
>>  		if ((task_ucounts(p) != &init_ucounts) &&
>>  		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
>
> It might make sense to do something like:
>
> 	if (inc_rlimit_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1, rlimit(RLIMIT_NPROC)) == LONG_MAX) {
> 		if ((task_ucounts(p) != &init_ucounts) &&
> 		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
>
> and the new function:
>
> long inc_rlimit_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, long v, unsigned long rlimit)
> {
> 	struct ucounts *iter;
> 	long ret = 0;
> 	long max = rlimit;
>
> 	if (rlimit > LONG_MAX)
> 		max = LONG_MAX;
> 	for (iter = ucounts; iter; iter = iter->ns->ucounts) {
> 		long new = atomic_long_add_return(v, &iter->ucount[type]);
>
> 		if (new < 0 || new > max)
> 			ret = LONG_MAX;
> 		else if (iter == ucounts)
> 			ret = new;
> 		max = READ_ONCE(iter->ns->ucount_max[type]);
> 	}
> 	return ret;
> }
>
> This would avoid walking the same userns tree twice.
>
> Or even modify inc_rlimit_ucounts. This function is used elsewhere like
> this:
>
>
> 	msgqueue = inc_rlimit_ucounts(info->ucounts, UCOUNT_RLIMIT_MSGQUEUE, mq_bytes);
> 	if (msgqueue == LONG_MAX || msgqueue > rlimit(RLIMIT_MSGQUEUE)) {
>
>
> 	memlock = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked);
> 	if (!allowed && (memlock == LONG_MAX || memlock > lock_limit) && !capable(CAP_IPC_LOCK)) {
>
>
> In all cases, we have a max value available for comparison.

Good point. The downside is that it means we can't use the same code
in exec. The upside is that the code is more idiomatic.

Eric