Re: Linux 5.18-rc4

From: Linus Torvalds
Date: Mon Jun 06 2022 - 14:28:51 EST


On Mon, Jun 6, 2022 at 8:19 AM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> Has anyone looked into this lock ordering issues?

The deadlock is

> >> [78140.503821] CPU0 CPU1
> >> [78140.503823] ---- ----
> >> [78140.503824] lock(&newf->file_lock);
> >> [78140.503826] lock(&p->alloc_lock);
> >> [78140.503828] lock(&newf->file_lock);
> >> [78140.503830] lock(&ctx->lock);

and the alloc_lock -> file_lock on CPU1 is trivial - it's seq_show()
in fs/proc/fd.c:

task_lock(task);
files = task->files;
if (files) {
unsigned int fd = proc_fd(m->private);

spin_lock(&files->file_lock);

and that looks all normal.

But the other chains look painful.

I do see the IPC code doing ugly things, in particular I detest this code:

task_lock(current);
list_add(&shp->shm_clist, &current->sysvshm.shm_clist);
task_unlock(current);

where it is using the task lock to protect the shm_clist list. Nasty.

And it's doing that inside the shm_ids.rwsem lock _and_ inside the
shp->shm_perm.lock.

So the IPC code has newseg() doing

shmget ->
ipcget():
down_write(ids->rwsem) ->
newseg():
ipc_addid gets perm->lock
task_lock(current)

so you have

ids->rwsem -> perm->lock -> alloc_lock

there.

So now we have that

ids->rwsem -> ipcperm->lock -> alloc_lock -> file_lock

when you put those sequences together.

But I didn't figure out what the security subsystem angle is and how
that then apparently mixes things up with execve.

Yes, newseg() is doing that

error = security_shm_alloc(&shp->shm_perm);

while holding rwsem, but I can't see how that matters. From the
lockdep output, rwsem doesn't actually seem to be part of the whole
sequence.

It *looks* like we have

apparmour ctx->lock -->
radix_tree_preloads.lock -->
ipcperm->lock

and apparently that's called under the file_lock somewhere, completing
the circle.

I guess the execve component is that

begin_new_exec ->
security_bprm_committing_creds ->
apparmor_bprm_committing_creds ->
aa_inherit_files ->
iterate_fd -> *takes file_lock*
match_file ->
aa_file_perm ->
update_file_ctx *takes ctx->lock*

so that's how you get file_lock -> ctx->lock.

So you have:

SHMGET:
ipcperm->lock -> alloc_lock
/proc:
alloc_lock -> file_lock
apparmor_bprm_committing_creds:
file_lock -> ctx->lock

and then all you need is ctx->lock -> ipcperm->lock but I didn't find that part.

I suspect that part is that both Apparmor and IPC use the idr local lock.

Linus