Re: [PATCH -tip 1/2] x86/alternative: Sync bp_patching update for avoiding NULL pointer exception

From: Masami Hiramatsu
Date: Mon Dec 02 2019 - 09:39:37 EST


On Mon, 2 Dec 2019 14:43:54 +0100
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Mon, Dec 02, 2019 at 08:50:12PM +0900, Masami Hiramatsu wrote:
> > On Mon, 2 Dec 2019 10:15:19 +0100
> > Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > On Wed, Nov 27, 2019 at 02:56:52PM +0900, Masami Hiramatsu wrote:
>
> > > > --- a/arch/x86/kernel/alternative.c
> > > > +++ b/arch/x86/kernel/alternative.c
> > > > @@ -1134,8 +1134,14 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
> > > > * sync_core() implies an smp_mb() and orders this store against
> > > > * the writing of the new instruction.
> > > > */
> > > > - bp_patching.vec = NULL;
> > > > bp_patching.nr_entries = 0;
> > > > + /*
> > > > + * This sync_core() ensures that all int3 handlers in progress
> > > > + * have finished. This allows poke_int3_handler() after this to
> > > > + * avoid touching bp_patching.vec by checking nr_entries == 0.
> > > > + */
> > > > + text_poke_sync();
> > > > + bp_patching.vec = NULL;
> > > > }
> > >
> > > Hurm.. is there no way we can merge that with the 'last'
> > > text_poke_sync() ? It seems a little daft to do 2 back-to-back IPI
> > > things like that.
> >
> > Maybe we can add a NULL check of bp_patching.vec in poke_int3_handler(),
> > but that doesn't ensure fundamental safety, because the array
> > pointed to by bp_patching.vec itself can be released while
> > poke_int3_handler() accesses it.
>
> No, what I mean is something like:
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 30e86730655c..347a234a7c52 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -1119,17 +1119,13 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
> * Third step: replace the first byte (int3) by the first byte of
> * replacing opcode.
> */
> - for (do_sync = 0, i = 0; i < nr_entries; i++) {
> + for (i = 0; i < nr_entries; i++) {
> if (tp[i].text[0] == INT3_INSN_OPCODE)
> continue;
>
> text_poke(text_poke_addr(&tp[i]), tp[i].text, INT3_INSN_SIZE);
> - do_sync++;
> }
>
> - if (do_sync)
> - text_poke_sync();
> -
> /*
> * sync_core() implies an smp_mb() and orders this store against
> * the writing of the new instruction.
>
>
> Or is that unsafe ?

OK, let's check it.

text_poke_bp_batch() {
    update vec
    update nr_entries
    smp_wmb()
    write int3
    text_poke_sync()
    write rest_bytes
    text_poke_sync() if rest_bytes
    write first_byte
    text_poke_sync() if first_byte ... (*)
    update nr_entries
    text_poke_sync() ... (**)
    update vec
}

Before (*), the first byte can be either the new opcode or int3, thus
poke_int3_handler() can be called. But at that point
nr_entries != 0, so poke_int3_handler() correctly emulates
the new instruction.

Before (**), all int3 should have been removed, so nr_entries must
not be accessed, EXCEPT in the case of writing int3.

If we just remove (*) as you suggest, poke_int3_handler()
can see nr_entries == 0 before (**), so it is still unsafe.
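
To make the argument concrete, here is a rough user-space sketch in C. It
is not kernel code and only models the tail of the sequence above. The
names model_sync(), model_trap(), run_tail() and the int3_in_flight flag
are made up for illustration; int3_in_flight stands for "some CPU may
still trap on the old int3 byte because it has not been synced yet".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum outcome { NO_TRAP, EMULATED, UNHANDLED_TRAP };

/* Hypothetical stand-in for bp_patching plus the state of other CPUs. */
struct model {
	const unsigned long *vec;
	size_t nr_entries;
	bool int3_in_flight;	/* a CPU may still execute the old int3 */
};

/* Stand-in for text_poke_sync(): afterwards no CPU sees the old int3. */
static void model_sync(struct model *m)
{
	m->int3_in_flight = false;
}

/* Stand-in for poke_int3_handler(): checks nr_entries first, then vec. */
static enum outcome model_trap(const struct model *m, unsigned long addr)
{
	if (!m->int3_in_flight)
		return NO_TRAP;
	if (m->nr_entries == 0)
		return UNHANDLED_TRAP;	/* handler says "not mine" */

	assert(m->vec != NULL);		/* vec must outlive nr_entries != 0 */
	for (size_t i = 0; i < m->nr_entries; i++)
		if (m->vec[i] == addr)
			return EMULATED;
	return UNHANDLED_TRAP;
}

/* Tail of the sequence, from "write first_byte" to "update vec". */
static enum outcome run_tail(bool keep_first_byte_sync)
{
	static const unsigned long sites[] = { 0x1000 };
	struct model m = { sites, 1, true };
	enum outcome seen;

	/* write first_byte: memory is updated, other CPUs not yet synced */
	if (keep_first_byte_sync)
		model_sync(&m);			/* (*) */
	m.nr_entries = 0;
	seen = model_trap(&m, 0x1000);		/* worst-case late trap */
	model_sync(&m);				/* (**) */
	m.vec = NULL;
	return seen;
}
```

In this model run_tail(true) never reports a trap after nr_entries is
cleared, while run_tail(false) lets a late int3 reach the handler with
nr_entries == 0, which is the case I am worried about.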

I also considered skipping (**) if !first_byte,
since (*) ensures that the target address (text) no longer hits int3.
However, this would also be unsafe, because another int3 (e.g. from
kprobes) can be hit while nr_entries and vec are being updated.


Thank you,

--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>