Re: [RFC PATCH v9 for 4.15 01/14] Restartable sequences system call

From: Mathieu Desnoyers
Date: Mon Oct 23 2017 - 16:42:01 EST


----- On Oct 23, 2017, at 7:30 PM, Ben Maurer bmaurer@xxxxxx wrote:

>> if (!((long)ip - (long)start_ip <= (long)post_commit_offset))
>>ÂÂ return 1;
>
>> This introduces an issue here: if "ip" is lower than "start_ip", we
>> can incorrectly think we are in a critical section, when we are in
>> fact not.
>
> This shouldn't be an issue if we used unsigned numbers. Eg if start_ip is X and
> post_commit_offset is L, then (ip - X <= L) means that if ip is less than X ip
> - X will be signed, which will become a large unsigned value.
>
>> or to the kernel to set it back to NULL if it finds out that it is
>> preempting/delivering a signal over an instruction pointer outside
>> of the current rseq_cs start_ip/post_commit_ip range (lazy clear).
>
> I see, lazy clear makes sense. Still, if during most execution periods the user
> code enters some rseq section (likely if rseq is used for something like
> malloc) on every context switch this code will have to be run.
>
>> Moreover, this modification would add a subtraction on the common case
>> (ip - start_ip), and makes the ABI slightly uglier.
>
> We could benchmark it but the subtraction should be similar in cost to the extra
> comparison but reducing the number of branches seems like it will help as well.
> FWIW GCC attempts to translate this kind of sequence to a subtract and compare:
> https://godbolt.org/g/5DGLvo.
>
> I agree the ABI is uglier, but since we're mucking with every context switch I
> thought I'd point it out.

Thanks for following up on this. I did not initially realize the importance
of doing the unsigned comparison. I've pushed a commit in my private dev branch
implementing your suggestion.

https://github.com/compudj/linux-percpu-dev/commit/4cf8e9104636b51741c0118f2c88519e3acab7aa

Thanks!

Mathieu

>
>> If I understand well, you are proposing to speed up .so load time by
>> means of removing relocations of pointers within rseq_cs, done by
>> making those relative to the rseq_cs address.
>
> Yeah, I think this may be overkill as optimization.

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com