Re: [PATCH 0/2] shm: omit forced shm destroy if task IPC namespace was changed

From: Alexander Mihalicyn
Date: Sun Jul 11 2021 - 06:33:38 EST


Hi, Manfred,

On Sun, Jul 11, 2021 at 12:13 PM Manfred Spraul
<manfred@xxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
>
> On Saturday, 10 July 2021, Alexander Mihalicyn <alexander@xxxxxxxxxxxxx> wrote:
>>
>>
>> Now, using the setns() syscall, we can construct a situation where the
>> task->sysvshm.shm_clist list
>> holds shm items from several (!) IPC namespaces.
>>
>>
> Does this imply that locking is affected as well? According to the initial patch, accesses to shm_clist are protected by "the" IPC shm namespace rwsem. This can't work if the list contains objects from several namespaces.

Of course, you are right. I have to rework this part: I can add a check to the
static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
function so that, before adding a new shm to the task list, it verifies that
the list is empty OR that the item already on the list is from the same
namespace as current->nsproxy->ipc_ns.
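
To make the proposed check concrete, here is a minimal userspace sketch of
the rule described above (list empty OR existing item from the current
namespace). Everything named model_* is hypothetical and only stands in for
the kernel structures; this is not the actual newseg() change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structures. */
struct model_ns { int id; };

struct model_shm {
    const struct model_ns *ns;   /* namespace the segment was created in */
    struct model_shm *next;      /* next entry on the creator's list */
};

struct model_task {
    const struct model_ns *ipc_ns;  /* models current->nsproxy->ipc_ns */
    struct model_shm *clist;        /* models task->sysvshm.shm_clist */
};

/*
 * Returns true if a new segment may be linked onto the task's list:
 * either the list is empty, or the entry already present belongs to
 * the task's current IPC namespace.
 */
static bool model_may_link(const struct model_task *t)
{
    assert(t != NULL);
    return t->clist == NULL || t->clist->ns == t->ipc_ns;
}

/* Create a segment and link it only when the check passes. */
static bool model_newseg(struct model_task *t, struct model_shm *shp)
{
    assert(t != NULL && shp != NULL);
    shp->ns = t->ipc_ns;
    shp->next = NULL;
    if (!model_may_link(t))
        return false;            /* segment exists but is not tracked */
    shp->next = t->clist;        /* push onto the head of the list */
    t->clist = shp;
    return true;
}
```

Looking only at the head entry is enough in this model: if every insertion
passes the check, all entries on the list share one namespace.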

>
> From ipc/shm.c:
>
>     down_read(&shm_ids(ns).rwsem);
>     list_for_each_entry(shp, &task->sysvshm.shm_clist, shm_clist)
>         shp->shm_creator = NULL;
>     /*
>      * Only under read lock but we are only called on current
>      * so no entry on the list will be shared.
>      */
>     list_del(&task->sysvshm.shm_clist);
>     up_read(&shm_ids(ns).rwsem);
>
>
> Task A and B in same namespace
>
> - A: create shm object
>
> - A: setns()
>
> - in parallel: B does shmctl(IPC_RMID), A does exit()

Yep.
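
A small userspace model of why this sequence is a problem, assuming each
path picks its rwsem from the calling task's current IPC namespace (as the
shm_ids(ns) in the quoted snippet suggests). All model_* names are
hypothetical and this is only a sketch of the reasoning, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one lock per IPC namespace, identified by address. */
struct model_ns { int rwsem; /* placeholder for shm_ids(ns).rwsem */ };

struct model_task { struct model_ns *ipc_ns; /* models nsproxy->ipc_ns */ };

/* Lock the exit path would take: derived from the exiting task's namespace. */
static int *model_exit_lock(const struct model_task *exiting)
{
    assert(exiting != NULL && exiting->ipc_ns != NULL);
    return &exiting->ipc_ns->rwsem;
}

/* Lock the IPC_RMID path would take: derived from the caller's namespace. */
static int *model_rmid_lock(const struct model_task *remover)
{
    assert(remover != NULL && remover->ipc_ns != NULL);
    return &remover->ipc_ns->rwsem;
}

/* The two paths exclude each other only if they take the same lock. */
static bool model_paths_serialize(const struct model_task *exiting,
                                  const struct model_task *remover)
{
    return model_exit_lock(exiting) == model_rmid_lock(remover);
}
```

With A and B in the same namespace the function returns true; once A's
ipc_ns points at a different namespace (the setns() step) it returns false,
so A's exit and B's IPC_RMID no longer exclude each other.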

>
>
>>
>>
>> So, the semantics of setns() and unshare() are different here. We can fix
>> this problem by adding
>> analogous calls to exit_shm() and shm_init_task() in
>>
>> static void commit_nsset(struct nsset *nsset)
>> ...
>> #ifdef CONFIG_IPC_NS
>>     if (flags & CLONE_NEWIPC) {
>>         exit_sem(me);
>> +       exit_shm(current);
>> +       shm_init_task(current);
>>     }
>> #endif
>>
>> With this change, the semantics of unshare() and setns() will be equal in
>> terms of the shm_rmid_forced
>> feature.
>
> Additional advantage: exit_sem() and exit_shm() would appear as pairs, both in unshare() and setns().
>
>> But this may break some applications that use setns() and
>> do not destroy their IPC resources
>> after that syscall. (CRIU uses setns() extensively, and we would have to
>> change our IPC ns C/R implementation
>> a little if we fix the problem this way.)
>>
>> I've proposed a change which keeps the old behaviour of setns() but
>> fixes double free.
>>
> Assuming that locking works, I would consider this as a namespace design question: Do we want to support that a task contains shm objects from several ipc namespaces?

This depends on what we mean by "task contains shm objects from
several ipc namespaces". There are two meanings:

1. Task has attached shm objects from different IPC namespaces

We already support that by design. When we change namespace
using unshare(CLONE_NEWIPC), even with
sysctl shm_rmid_forced=1, we do not detach all IPC objects from the task!
Look at the exit_shm() function, which is validly called
when we do unshare():

    /*
     * shm_may_destroy() is:
     * (shp->shm_nattch == 0) &&
     * (ns->shm_rmid_forced || (shp->shm_perm.mode & SHM_DEST))
     */
    if (shm_may_destroy(ns, shp)) {
        shm_lock_by_ptr(shp);
        shm_destroy(ns, shp);
    }

Here everything depends on shp->shm_nattch, which will be non-zero if the
user does something like this:

    int id = shmget(0xAAAA, 4096, IPC_CREAT|0700);
    /* map shm into the task's address space */
    void *addr = shmat(id, NULL, 0);
    /* task->sysvshm.shm_clist is cleared, but shm 0xAAAA remains attached */
    unshare(CLONE_NEWIPC);
    /*
     * add an item to task->sysvshm.shm_clist; the list now contains
     * objects only from the new IPC namespace
     */
    id = shmget(0xBBBB, 4096, IPC_CREAT|0700);
    addr = shmat(id, NULL, 0);

So, the task->sysvshm.shm_clist list is used only for the shm_rmid_forced
feature. It doesn't affect any mm-related things like /proc/<pid>/maps
or anything similar.
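
As an illustration of the condition quoted above, here is a minimal
userspace model of the predicate. The model_* names and the MODEL_SHM_DEST
value are assumptions made for this sketch; only the shape of the condition
is taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag value for this model only. */
#define MODEL_SHM_DEST 01000

/* Hypothetical stand-ins for the namespace and the shm segment. */
struct model_ns  { bool shm_rmid_forced; };
struct model_shm { unsigned long shm_nattch; unsigned int mode; };

/*
 * Mirrors the quoted condition: a segment may be destroyed only when
 * nothing is attached AND either forced removal is enabled for the
 * namespace or the segment is already marked for destruction.
 */
static bool model_may_destroy(const struct model_ns *ns,
                              const struct model_shm *shp)
{
    assert(ns != NULL && shp != NULL);
    return shp->shm_nattch == 0 &&
           (ns->shm_rmid_forced || (shp->mode & MODEL_SHM_DEST) != 0);
}
```

In this model a segment with shm_nattch == 1 is never destroyed, even with
shm_rmid_forced set, which matches the 0xAAAA segment staying attached in
the example above.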

2. The task->sysvshm.shm_clist list has items from different IPC namespaces.

I'm not sure whether we need that or not. But I'm ready to prepare a patch
for whichever option we choose:
a) just add exit_shm(current)+shm_init_task(current);
b) prepare PATCHv2 with an appropriate check in newseg() to prevent
adding new items from a different namespace to the list
c) rework the algorithm so we can safely have items from different
namespaces in task->sysvshm.shm_clist

and, of course, prepare a test case with a reproducer for this particular
bug to prevent future regressions and increase coverage.
(I'm not publishing the reproducer program directly on the list at the
moment because it may not be fully safe.
But I think all of us already know how to trigger the problem.)

>
> Does it work everywhere (/proc/{pid}, ...)?
> --
> Manfred

Thanks,
Alex