Re: [patch V3a 17/18] posix-timers: Provide a mechanism to allocate a given timer ID

From: Frederic Weisbecker
Date: Tue Mar 11 2025 - 18:33:07 EST


On Tue, Mar 11, 2025 at 11:07:44PM +0100, Thomas Gleixner wrote:
> Checkpoint/Restore in Userspace (CRIU) needs to reconstruct posix timers
> with the same timer ID on restore. It uses sys_timer_create() and relies on
> the monotonically increasing timer ID provided by this syscall. It creates
> and deletes timers until the desired ID is reached. This can loop for a
> long time when the checkpointed process had a very sparse timer ID range.
>
> It has been debated to implement a new syscall to allow the creation of
> timers with a given timer ID, but that's tedious due to the 32/64bit compat
> issues of sigevent_t and of dubious value.
>
> The restore mechanism of CRIU creates the timers in a state where all
> threads of the restored process are held on a barrier and cannot issue
> syscalls. That means the restorer task has exclusive control.
>
> This allows the issue to be addressed with a prctl() so that the restorer
> thread can do:
>
>    if (prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_ON))
>        goto linear_mode;
>    create_timers_with_explicit_ids();
>    prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_OFF);
>
> This is backwards compatible because the prctl() fails on older kernels and
> CRIU can fall back to the linear timer ID mechanism. CRIU versions which do
> not know about the prctl() just work as before.
>
> Implement the prctl() and modify timer_create() so that it copies the
> requested timer ID from userspace by utilizing the existing timer_t
> pointer, which is used to copy out the allocated timer ID on success.
>
> If the prctl() is disabled, which it is by default, timer_create() works as
> before and does not try to read from the userspace pointer.
>
> There is no problem when a broken or rogue user space application enables
> the prctl(). If the user space pointer does not contain a valid ID, then
> timer_create() fails. If the data is not initialized, but contains a
> random valid ID, timer_create() will create that random timer ID or fail if
> the ID is already given out.
>
> As CRIU must use the raw syscall to avoid manipulating the internal state
> of the restored process, this has no library dependencies and can be
> adopted by CRIU right away.
>
> Recreating two timers with IDs 1000000 and 2000000 takes 1.5 seconds with
> the create/delete method. With the prctl() it takes 3 microseconds.
>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>

Reviewed-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
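
For readers following the thread, here is a minimal sketch of how a restorer
might use the interface described in the changelog. It is not taken from the
patch or from CRIU. The helper name restore_timer_with_id() is hypothetical,
and the fallback #define values are an assumption about the uapi header of
kernels that carry this change. The raw syscall is used, as the changelog
says CRIU must, and the requested ID is passed in through the same int that
receives the allocated ID.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Fallback definitions for older userspace headers. The numeric values are
 * an assumption and should be checked against <linux/prctl.h>.
 */
#ifndef PR_TIMER_CREATE_RESTORE_IDS
#define PR_TIMER_CREATE_RESTORE_IDS      77
#define PR_TIMER_CREATE_RESTORE_IDS_OFF  0
#define PR_TIMER_CREATE_RESTORE_IDS_ON   1
#endif

/*
 * Hypothetical helper: create a timer with exactly @wanted_id, then delete
 * it again so the sketch leaves no state behind.
 *
 * Returns 0 if the timer was created with the requested ID,
 *         1 if the prctl() is unavailable (caller should use linear mode),
 *        -1 if timer creation failed or returned a different ID.
 */
int restore_timer_with_id(int wanted_id)
{
    struct sigevent sev;
    int id = wanted_id;     /* the kernel reads the requested ID from here */
    int ret = 0;

    /* Fails on kernels without the feature: fall back to linear mode. */
    if (prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_ON,
              0, 0, 0))
        return 1;

    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_NONE;  /* no signal delivery needed here */

    /* Raw syscall, bypassing the C library's timer bookkeeping. */
    if (syscall(SYS_timer_create, CLOCK_MONOTONIC, &sev, &id)) {
        ret = -1;
    } else {
        if (id != wanted_id)
            ret = -1;
        syscall(SYS_timer_delete, id);
    }

    /* Restore the default behaviour for subsequent timer_create() calls. */
    prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_OFF,
          0, 0, 0);
    return ret;
}
```

A caller would treat a return value of 1 as the signal to use the existing
create/delete loop, which mirrors the "goto linear_mode" path in the
changelog.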