Re: [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls

From: Peter Zijlstra
Date: Wed Nov 24 2021 - 16:32:42 EST


On Wed, Nov 24, 2021 at 09:08:23PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 22, 2021 at 01:13:24PM -0800, Peter Oskolkov wrote:
> > +/**
> > + * struct umcg_task - controls the state of UMCG tasks.
> > + *
> > + * The struct is aligned at 64 bytes to ensure that it fits into
> > + * a single cache line.
> > + */
> > +struct umcg_task {
> > + /**
> > + * @state_ts: the current state of the UMCG task described by
> > + * this struct, with a unique timestamp indicating
> > + * when the last state change happened.
> > + *
> > + * Readable/writable by both the kernel and the userspace.
> > + *
> > + * UMCG task state:
> > + * bits 0 - 5: task state;
> > + * bits 6 - 7: state flags;
> > + * bits 8 - 12: reserved; must be zeroes;
> > + * bits 13 - 17: for userspace use;
> > + * bits 18 - 63: timestamp (see below).
> > + *
> > + * Timestamp: a 46-bit CLOCK_MONOTONIC timestamp, at 16ns resolution.
> > + * See Documentation/userspace-api/umcg.txt for details.
> > + */
> > + __u64 state_ts; /* r/w */
> > +
> > + /**
> > + * @next_tid: the TID of the UMCG task that should be context-switched
> > + * into in sys_umcg_wait(). Can be zero.
> > + *
> > + * Running UMCG workers must have next_tid set to point to IDLE
> > + * UMCG servers.
> > + *
> > + * Read-only for the kernel, read/write for the userspace.
> > + */
> > + __u32 next_tid; /* r */
> > +
> > + __u32 flags; /* Reserved; must be zero. */
> > +
> > + /**
> > + * @idle_workers_ptr: a single-linked list of idle workers. Can be NULL.
> > + *
> > + * Readable/writable by both the kernel and the userspace: the
> > + * kernel adds items to the list, the userspace removes them.
> > + */
> > + __u64 idle_workers_ptr; /* r/w */
> > +
> > + /**
> > + * @idle_server_tid_ptr: a pointer pointing to a single idle server.
> > + * Readonly.
> > + */
> > + __u64 idle_server_tid_ptr; /* r */
> > +} __attribute__((packed, aligned(8 * sizeof(__u64))));
>
> The thing is; I really don't see how this is supposed to be used. Where
> did the blocked and runnable list go ?
>
> I also don't see why the kernel cares about idle workers at all; that
> seems something userspace can sort itself just fine.
>
> The whole next_tid thing seems confused too, how can it be the next task
> when it must be the server? Also, what if there isn't an idle server?
>
> This just all isn't making any sense to me.

Oooh, someone made things super confusing by doing s/runnable/idle/ on
the whole thing :-( That only took me most of the day to figure out.
Naming is important, don't mess about with stuff like this.
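
For reference, the state_ts layout quoted above can be sketched as a set of
pack/unpack helpers. This is only an illustration of the documented bit
ranges, not code from the patch: every macro and function name below is
hypothetical, and it assumes the 46-bit field stores CLOCK_MONOTONIC
nanoseconds shifted right by 4 (one tick per 16ns), which the comment
implies but does not spell out.

```c
#include <stdint.h>

/* Hypothetical names; only the bit ranges come from the quoted comment. */
#define UMCG_STATE_MASK     0x3fULL  /* bits 0 - 5: task state */
#define UMCG_FLAGS_SHIFT    6
#define UMCG_FLAGS_MASK     0x3ULL   /* bits 6 - 7: state flags */
#define UMCG_RSVD_SHIFT     8
#define UMCG_RSVD_MASK      0x1fULL  /* bits 8 - 12: reserved, must be zero */
#define UMCG_USER_SHIFT     13
#define UMCG_USER_MASK      0x1fULL  /* bits 13 - 17: for userspace use */
#define UMCG_TS_SHIFT       18       /* bits 18 - 63: 46-bit timestamp */
#define UMCG_TS_MASK        ((1ULL << 46) - 1)
#define UMCG_TS_NS_SHIFT    4        /* assumed: one tick == 16ns */

/* Build a state_ts word; the reserved bits are left as zero. */
static inline uint64_t umcg_pack(uint64_t state, uint64_t flags,
				 uint64_t user, uint64_t mono_ns)
{
	uint64_t ts = (mono_ns >> UMCG_TS_NS_SHIFT) & UMCG_TS_MASK;

	return (state & UMCG_STATE_MASK) |
	       ((flags & UMCG_FLAGS_MASK) << UMCG_FLAGS_SHIFT) |
	       ((user & UMCG_USER_MASK) << UMCG_USER_SHIFT) |
	       (ts << UMCG_TS_SHIFT);
}

static inline uint64_t umcg_state(uint64_t v)
{
	return v & UMCG_STATE_MASK;
}

static inline uint64_t umcg_flags(uint64_t v)
{
	return (v >> UMCG_FLAGS_SHIFT) & UMCG_FLAGS_MASK;
}

static inline uint64_t umcg_reserved(uint64_t v)
{
	return (v >> UMCG_RSVD_SHIFT) & UMCG_RSVD_MASK;
}

static inline uint64_t umcg_user(uint64_t v)
{
	return (v >> UMCG_USER_SHIFT) & UMCG_USER_MASK;
}

/* Timestamp back in nanoseconds, rounded down to a multiple of 16. */
static inline uint64_t umcg_ts_ns(uint64_t v)
{
	return (v >> UMCG_TS_SHIFT) << UMCG_TS_NS_SHIFT;
}
```

Under that assumption the timestamp wraps after 2^46 ticks of 16ns, i.e.
2^50 ns, which is roughly 13 days of CLOCK_MONOTONIC time.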