Re: [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls

From: Peter Zijlstra
Date: Wed Nov 24 2021 - 15:08:44 EST


On Mon, Nov 22, 2021 at 01:13:24PM -0800, Peter Oskolkov wrote:
> +/**
> + * struct umcg_task - controls the state of UMCG tasks.
> + *
> + * The struct is aligned to 64 bytes to ensure that it fits into
> + * a single cache line.
> + */
> +struct umcg_task {
> + /**
> + * @state_ts: the current state of the UMCG task described by
> + * this struct, with a unique timestamp indicating
> + * when the last state change happened.
> + *
> + * Readable/writable by both the kernel and the userspace.
> + *
> + * UMCG task state:
> + * bits 0 - 5: task state;
> + * bits 6 - 7: state flags;
> + * bits 8 - 12: reserved; must be zeroes;
> + * bits 13 - 17: for userspace use;
> + * bits 18 - 63: timestamp (see below).
> + *
> + * Timestamp: a 46-bit CLOCK_MONOTONIC timestamp, at 16ns resolution.
> + * See Documentation/userspace-api/umcg.txt for details.
> + */
> + __u64 state_ts; /* r/w */
> +
> + /**
> + * @next_tid: the TID of the UMCG task that should be context-switched
> + * into in sys_umcg_wait(). Can be zero.
> + *
> + * Running UMCG workers must have next_tid set to point to IDLE
> + * UMCG servers.
> + *
> + * Read-only for the kernel, read/write for the userspace.
> + */
> + __u32 next_tid; /* r */
> +
> + __u32 flags; /* Reserved; must be zero. */
> +
> + /**
> + * @idle_workers_ptr: a singly linked list of idle workers. Can be NULL.
> + *
> + * Readable/writable by both the kernel and the userspace: the
> + * kernel adds items to the list, the userspace removes them.
> + */
> + __u64 idle_workers_ptr; /* r/w */
> +
> + /**
> + * @idle_server_tid_ptr: a pointer to a single idle server.
> + * Read-only.
> + */
> + __u64 idle_server_tid_ptr; /* r */
> +} __attribute__((packed, aligned(8 * sizeof(__u64))));

The thing is: I really don't see how this is supposed to be used. Where
did the blocked and runnable lists go?

I also don't see why the kernel cares about idle workers at all; that
seems like something userspace can sort out by itself just fine.

The whole next_tid thing seems confused too: how can it be the next task
when, for a running worker, it must be an idle server? Also, what if there
isn't an idle server?

None of this makes any sense to me.