Re: rseq + membarrier programming model

From: Jann Horn
Date: Mon Dec 13 2021 - 14:28:15 EST


On Mon, Dec 13, 2021 at 7:48 PM Florian Weimer <fweimer@xxxxxxxxxx> wrote:
> I've been studying Jann Horn's biased locking example:
>
> Re: [PATCH 0/4 POC] Allow executing code and syscalls in another address space
> <https://lore.kernel.org/linux-api/CAG48ez02UDn_yeLuLF4c=kX0=h2Qq8Fdb0cer1yN8atbXSNjkQ@xxxxxxxxxxxxxx/>
>
> It uses MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ as part of the biased lock
> revocation.
>
> How does this code know that the process has called
> MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ? Could it fall back to
> MEMBARRIER_CMD_GLOBAL instead?

AFAIK no - MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ specifically forces
targeted processes to go through an RSEQ preemption. That only happens
in two cases: when this special membarrier command is used, or when an
actual task switch happens. Other membarrier flavors don't guarantee
that.
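
To make the registration requirement concrete, here is a minimal
sketch (not the code from the biased locking example) of how a process
might check for support, register, and then issue the RSEQ barrier.
The helper names sys_membarrier(), rseq_barrier_register() and
rseq_barrier() are made up for illustration, and the sketch assumes
UAPI headers from Linux 5.10 or later, which define the two RSEQ
commands. Per membarrier(2), a private expedited command issued by an
unregistered process fails with EPERM, so a failing call is how the
caller would notice that registration is missing.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw syscall (hypothetical helper name). */
static int sys_membarrier(int cmd, unsigned int flags, int cpu_id) {
  return (int)syscall(__NR_membarrier, cmd, flags, cpu_id);
}

/*
 * Hypothetical helper: register the calling process for the RSEQ
 * flavor of private expedited membarrier.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int rseq_barrier_register(void) {
  /* MEMBARRIER_CMD_QUERY returns a bitmask of supported commands. */
  int mask = sys_membarrier(MEMBARRIER_CMD_QUERY, 0, 0);
  if (mask < 0)
    return -1;

  int needed = MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ |
               MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ;
  if ((mask & needed) != needed) {
    errno = EOPNOTSUPP; /* kernel lacks the RSEQ commands */
    return -1;
  }

  return sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ, 0, 0);
}

/*
 * Hypothetical helper: issue the RSEQ barrier for all threads of the
 * calling process (flags == 0). Fails with EPERM if the process has
 * not registered beforehand.
 */
static int rseq_barrier(void) {
  return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ, 0, 0);
}
```

A caller would run rseq_barrier_register() once at startup and then
call rseq_barrier() on the revocation path, treating any failure as
"this mechanism is not available" rather than silently falling back to
a membarrier flavor with weaker guarantees.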


Also, MEMBARRIER_CMD_GLOBAL can take really long in terms of wall
clock time - it's basically just synchronize_rcu(), and as the
documentation at
https://www.kernel.org/doc/html/latest/RCU/Design/Requirements/Requirements.html
says:

"The synchronize_rcu() grace-period-wait primitive is optimized for
throughput. It may therefore incur several milliseconds of latency in
addition to the duration of the longest RCU read-side critical
section."


You can see that synchronize_rcu() indeed takes quite long in terms of
wall clock time (but not in terms of CPU time - as the documentation
says, it's optimized for throughput in a parallel context) with a
simple test program:

jannh@laptop:~/test/rcu$ cat rcu_membarrier.c
#define _GNU_SOURCE
#include <stdio.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <time.h>
#include <err.h>

int main(void) {
  for (int i=0; i<20; i++) {
    struct timespec ts1;
    if (clock_gettime(CLOCK_MONOTONIC, &ts1))
      err(1, "time");

    if (syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0, 0))
      err(1, "membarrier");

    struct timespec ts2;
    if (clock_gettime(CLOCK_MONOTONIC, &ts2))
      err(1, "time");

    unsigned long delta_ns = (ts2.tv_nsec - ts1.tv_nsec) +
        (1000UL*1000*1000) * (ts2.tv_sec - ts1.tv_sec);
    printf("MEMBARRIER_CMD_GLOBAL took %lu nanoseconds\n", delta_ns);
  }
}
jannh@laptop:~/test/rcu$ gcc -o rcu_membarrier rcu_membarrier.c -Wall
jannh@laptop:~/test/rcu$ time ./rcu_membarrier
MEMBARRIER_CMD_GLOBAL took 17155142 nanoseconds
MEMBARRIER_CMD_GLOBAL took 19207001 nanoseconds
MEMBARRIER_CMD_GLOBAL took 16087350 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15963711 nanoseconds
MEMBARRIER_CMD_GLOBAL took 16336149 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15931331 nanoseconds
MEMBARRIER_CMD_GLOBAL took 16020315 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15873814 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15945667 nanoseconds
MEMBARRIER_CMD_GLOBAL took 23815452 nanoseconds
MEMBARRIER_CMD_GLOBAL took 23626444 nanoseconds
MEMBARRIER_CMD_GLOBAL took 19911435 nanoseconds
MEMBARRIER_CMD_GLOBAL took 23967343 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15943147 nanoseconds
MEMBARRIER_CMD_GLOBAL took 23914809 nanoseconds
MEMBARRIER_CMD_GLOBAL took 32498986 nanoseconds
MEMBARRIER_CMD_GLOBAL took 19450932 nanoseconds
MEMBARRIER_CMD_GLOBAL took 16281308 nanoseconds
MEMBARRIER_CMD_GLOBAL took 24045168 nanoseconds
MEMBARRIER_CMD_GLOBAL took 15406698 nanoseconds

real 0m0.458s
user 0m0.058s
sys 0m0.031s
jannh@laptop:~/test/rcu$

Every invocation of MEMBARRIER_CMD_GLOBAL on my laptop took >10 ms.
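
If you want to repeat this measurement for other membarrier commands,
the timing logic can be factored into a small helper. This is only a
sketch: timespec_delta_ns() and time_membarrier_ns() are names I made
up for illustration, and the helper passes flags=0 and cpu_id=0 for
every command.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Nanoseconds elapsed between two CLOCK_MONOTONIC samples. */
static int64_t timespec_delta_ns(struct timespec start, struct timespec end) {
  return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL +
         (int64_t)(end.tv_nsec - start.tv_nsec);
}

/*
 * Hypothetical helper: time one invocation of the given membarrier
 * command. Returns the wall clock duration in nanoseconds, or -1 with
 * errno set if clock_gettime() or membarrier fails.
 */
static int64_t time_membarrier_ns(int cmd) {
  struct timespec t1, t2;

  if (clock_gettime(CLOCK_MONOTONIC, &t1))
    return -1;
  if (syscall(__NR_membarrier, cmd, 0, 0))
    return -1;
  if (clock_gettime(CLOCK_MONOTONIC, &t2))
    return -1;

  return timespec_delta_ns(t1, t2);
}
```

For example, after a successful
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED call, passing
MEMBARRIER_CMD_PRIVATE_EXPEDITED to time_membarrier_ns() gives a
number to compare against the MEMBARRIER_CMD_GLOBAL timings above.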