Re: Linux 5.18-rc4

From: John Johansen
Date: Mon Jun 06 2022 - 15:20:14 EST


On 6/6/22 11:28, Linus Torvalds wrote:
> On Mon, Jun 6, 2022 at 8:19 AM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>> Has anyone looked into this lock ordering issues?
>
> The deadlock is
>
>>>> [78140.503821] CPU0 CPU1
>>>> [78140.503823] ---- ----
>>>> [78140.503824] lock(&newf->file_lock);
>>>> [78140.503826] lock(&p->alloc_lock);
>>>> [78140.503828] lock(&newf->file_lock);
>>>> [78140.503830] lock(&ctx->lock);
>
> and the alloc_lock -> file_lock on CPU1 is trivial - it's seq_show()
> in fs/proc/fd.c:
>
> task_lock(task);
> files = task->files;
> if (files) {
> unsigned int fd = proc_fd(m->private);
>
> spin_lock(&files->file_lock);
>
> and that looks all normal.
>
> But the other chains look painful.
>
> I do see the IPC code doing ugly things, in particular I detest this code:
>
> task_lock(current);
> list_add(&shp->shm_clist, &current->sysvshm.shm_clist);
> task_unlock(current);
>
> where it is using the task lock to protect the shm_clist list. Nasty.
>
> And it's doing that inside the shm_ids.rwsem lock _and_ inside the
> shp->shm_perm.lock.
>
> So the IPC code has newseg() doing
>
> shmget ->
> ipcget():
> down_write(ids->rwsem) ->
> newseg():
> ipc_addid gets perm->lock
> task_lock(current)
>
> so you have
>
> ids->rwsem -> perm->lock -> alloc_lock
>
> there.
>
> So now we have that
>
> ids->rwsem -> ipcperm->lock -> alloc_lock -> file_lock
>
> when you put those sequences together.
>
> But I didn't figure out what the security subsystem angle is and how
> that then apparently mixes things up with execve.
>
> Yes, newseg() is doing that
>
> error = security_shm_alloc(&shp->shm_perm);
>
> while holding rwsem, but I can't see how that matters. From the
> lockdep output, rwsem doesn't actually seem to be part of the whole
> sequence.
>
> It *looks* like we have
>
> apparmour ctx->lock -->
> radix_tree_preloads.lock -->
> ipcperm->lock
>
> and apparently that's called under the file_lock somewhere, completing
> the circle.
>
> I guess the execve component is that
>
> begin_new_exec ->
> security_bprm_committing_creds ->
> apparmor_bprm_committing_creds ->
> aa_inherit_files ->
> iterate_fd -> *takes file_lock*
> match_file ->
> aa_file_perm ->
> update_file_ctx *takes ctx->lock*
>
> so that's how you get file_lock -> ctx->lock.
>
yes

> So you have:
>
> SHMGET:
> ipcperm->lock -> alloc_lock
> /proc:
> alloc_lock -> file_lock
> apparmor_bprm_committing_creds:
> file_lock -> ctx->lock
>
> and then all you need is ctx->lock -> ipcperm->lock but I didn't find that part.
>
yeah that is the part I got stuck on, before being pulled away from this

> I suspect that part is that both Apparmor and IPC use the idr local lock.
>
bingo,

apparmor moved its secids allocation from a custom radix tree to idr in

99cc45e48678 apparmor: Use an IDR to allocate apparmor secids

and ipc is using the idr for its id allocation as well

I can easily lift the secid() allocation out of the ctx->lock but that
would still leave it happening under the file_lock and not fix the problem.
I think the quick solution would be for apparmor to stop using idr, reverting
back at least temporarily to the custom radix tree.