Re: [RFC v2 1/6] x86: introduce kernel restartable sequence

From: Nadav Amit
Date: Thu Jan 03 2019 - 17:29:42 EST

Next message: Heiko Carstens: "Re: "bpf: Improve the info.func_info and info.func_info_rec_size behavior" breaks strace self tests"
Previous message: Todd Kjos: "Re: [PATCH v1 2/2] binderfs: reserve devices for initial mount"
In reply to: Andi Kleen: "Re: [RFC v2 1/6] x86: introduce kernel restartable sequence"
Next in thread: Andi Kleen: "Re: [RFC v2 1/6] x86: introduce kernel restartable sequence"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

> On Jan 3, 2019, at 2:21 PM, Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
>
> Nadav Amit <namit@xxxxxxxxxx> writes:
>
> I see another poor man's attempt to reinvent TSX.
>
>> It is sometimes beneficial to have a restartable sequence - very few
>> instructions which if they are preempted jump to a predefined point.
>>
>> To provide such functionality on x86-64, we use an empty REX-prefix
>> (opcode 0x40) as an indication for instruction in such a sequence. Before
>> calling the schedule IRQ routine, if the "magic" prefix is found, we
>> call a routine to adjust the instruction pointer. It is expected that
>> this opcode is not in common use.
>
> You cannot just assume something like that. x86 is a constantly
> evolving architecture. The prefix might well have meaning at
> some point.
>
> Before doing something like that you would need to ask the CPU
> vendors to reserve the sequence you're using for software use.

Ok... I'll try to think about another solution. Just note that this is just
used as a hint to avoid unnecessary lookups. (IOW, nothing will break if the
prefix is used.)
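
To make the "hint" point concrete, here is a minimal user-space sketch of the check. It is not the code from the patch: `struct rseq_range`, `rseq_adjust_ip()`, the linear table walk and the `lookups` counter are all hypothetical names and choices made for illustration. Only the 0x40 prefix value comes from the description quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Empty REX prefix used as the "magic" hint (value from the patch description). */
#define RSEQ_HINT_PREFIX 0x40

/* Hypothetical table entry: one restartable range and its restart point. */
struct rseq_range {
	uintptr_t start;   /* first byte of the sequence */
	uintptr_t end;     /* one past the last byte of the sequence */
	uintptr_t restart; /* where to resume if preempted inside the range */
};

/*
 * Return the instruction pointer to resume at.
 * first_byte is the first opcode byte at ip; lookups counts how often
 * the (comparatively expensive) table walk was actually needed.
 */
static uintptr_t rseq_adjust_ip(const struct rseq_range *tbl, size_t n,
				uintptr_t ip, uint8_t first_byte,
				unsigned int *lookups)
{
	/* Fast path: no hint prefix, so skip the lookup entirely. */
	if (first_byte != RSEQ_HINT_PREFIX)
		return ip;

	(*lookups)++;
	for (size_t i = 0; i < n; i++) {
		if (ip >= tbl[i].start && ip < tbl[i].end)
			return tbl[i].restart;
	}

	/*
	 * The prefix showed up in unrelated code: the lookup was wasted,
	 * but the instruction pointer is left untouched.
	 */
	return ip;
}
```

In this model, a stray 0x40 prefix outside any registered range costs one extra lookup and changes nothing else, which is the sense in which the prefix is only a hint.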

> You're doing the equivalent of patching a private system call
> into your own kernel without working with upstream, don't do that.

I don't understand this comment though. Can you please explain?

> Better to find some other solution to do the restart.
> How about simply using a per cpu variable? That should be cheaper
> anyways.

The problem is that the per-cpu variable needs to be updated after the call
is executed, when we are no longer in the context of the "injected" code.
I can increase it before the call, and decrease it after return - but this
can create (in theory) long periods in which the code is "unpatchable",
increase the code size and slow down performance.
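
A rough user-space model of that counter scheme, to show where the "unpatchable" window comes from. Everything here is an assumption for illustration: `NR_MODEL_CPUS`, `in_guarded_call[]`, `call_with_guard()`, `can_patch()` and `probe_target()` are hypothetical, and a plain array stands in for a real per-cpu variable.

```c
#include <stdbool.h>

#define NR_MODEL_CPUS 4 /* hypothetical CPU count for the model */

/* Stand-in for a per-cpu counter: nonzero while a guarded call is in flight. */
static unsigned int in_guarded_call[NR_MODEL_CPUS];

/* Patching is only allowed when no CPU is inside a guarded call. */
static bool can_patch(void)
{
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
		if (in_guarded_call[cpu] != 0)
			return false;
	}
	return true;
}

/*
 * Increase the counter before the call and decrease it after return.
 * The two extra updates are the added code size and runtime cost; the
 * whole duration of target() is the window in which patching is blocked.
 */
static int call_with_guard(int cpu, int (*target)(int), int arg)
{
	in_guarded_call[cpu]++;
	int ret = target(arg);
	in_guarded_call[cpu]--;
	return ret;
}

/* Sample callee: reports (via the sign) whether patching is allowed mid-call. */
static int probe_target(int arg)
{
	return can_patch() ? arg : -arg;
}
```

With this model, `call_with_guard(0, probe_target, 5)` returns -5, because `can_patch()` is false for as long as the callee runs, however long that is.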

Anyhow, I'll give it more thought. Ideas are welcome.
