Re: [PATCH] ftrace: fix race between ftrace and insmod

From: Zhang, Yanmin
Date: Wed Dec 16 2015 - 05:28:43 EST



On 2015/12/16 1:37, Steven Rostedt wrote:
> On Tue, 15 Dec 2015 11:26:41 +0800
> "Zhang, Yanmin" <yanmin_zhang@xxxxxxxxxxxxxxx> wrote:
>
>>> This seems very hackish, although I can't think of a better way at the
>>> moment. But I would like not to add more code into module.c if
>>> possible, and just use a notifier unless there's a real reason we can't
>>> (as there was when we added the hook in the first place).
>>> We would double-check it again. It's not a good idea to change common
>>> module codes.
>> I double-checked it. If using notifier, we have to add 2 new module notification
>> events in enum module_state.
>> For example, MODULE_STATE_INITING, MODULE_STATE_INITING_FAILED.
>> At MODULE_STATE_INITING, call ftrace_module_init(mod);
>> At MODULE_STATE_INITING_FAILED, call ftrace_release_mod (or similar function);
>> At MODULE_STATE_COMING, call a new function which links the new start_pg
>> to ftrace_pages.
>> Such design can also fix another bug in current kernel that ftrace start_pg
>> of the module is not cleaned up if complete_formation fails.
>> However, we change module common codes. The new events only serve ftrace.
>>
>> If not holding ftrace_mutex across start_pg setup and complete_formation,
>> some other variables are not protected well, such like ftrace_pages,
>> ftrace_start_up, and so on.
> OK, that was basically the reason the hook was added in the first place.
>
> The reason the modules notifier doesn't work here is that there's too
> many exits from the module code that may leave the mutex held.
>
> I thought about this some more. So the issue is that we add the module
> functions to the ftrace records. Then another task enables function
> tracing before the module is fully set up. As ftrace is enabling
> function tracing, the module sets its text to read-only, outside the
> scope of ftrace setting all text to read-write. When ftrace gets to the
> module code, it fails to do the modifications and you get the
> ftrace_bug() splat.
>
> Sounds right?
>
> Now, I really dislike the taking of the ftrace mutex and returning back
> to module handling. That is very subtle, fragile and prone to bugs. I
> took a step back to find another solution and I think I found one.
>
> Take the below patch and add to it (I'll keep this patch as mine, and
> you can submit another patch to do the following). You don't need to
> resend this patch, just send me a patch that does this:
>
>
> Add the module notifier that calls ftrace_module_enable(mod), removing
> it from ftrace_init_module(). Feel free to monkey with that function to
> be able to be called directly if it can't already.
>
> What my patch does is to create a new ftrace record flag DISABLED,
> which is set to all functions in a module as it is added. Then later on
> (in the module notifier that you will add), the flag is cleared. In
> between, the module functions will be ignored.
>
>
> If the module errors out and the notifier is not called, we don't care.
> The records will simply be removed, and the flag will be meaningless.
>
> I needed to move the code around to still let the module code get
> enabled if tracing was already enabled. I even cleaned that code up as
> I found it has gotten a bit messy.
>
> -- Steve
>
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 4736a826baf5..660e7c698f3b 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -357,6 +357,7 @@ bool is_ftrace_trampoline(unsigned long addr);
> * REGS - the record wants the function to save regs
> * REGS_EN - the function is set up to save regs.
> * IPMODIFY - the record allows for the IP address to be changed.
> + * DISABLED - the record is not ready to be touched yet
> *
> * When a new ftrace_ops is registered and wants a function to save
> * pt_regs, the rec->flag REGS is set. When the function has been
> @@ -371,10 +372,11 @@ enum {
> FTRACE_FL_TRAMP = (1UL << 28),
> FTRACE_FL_TRAMP_EN = (1UL << 27),
> FTRACE_FL_IPMODIFY = (1UL << 26),
> + FTRACE_FL_DISABLED = (1UL << 25),
> };
>
> -#define FTRACE_REF_MAX_SHIFT 26
> -#define FTRACE_FL_BITS 6
> +#define FTRACE_REF_MAX_SHIFT 25
> +#define FTRACE_FL_BITS 7
> #define FTRACE_FL_MASKED_BITS ((1UL << FTRACE_FL_BITS) - 1)
> #define FTRACE_FL_MASK (FTRACE_FL_MASKED_BITS << FTRACE_REF_MAX_SHIFT)
> #define FTRACE_REF_MAX ((1UL << FTRACE_REF_MAX_SHIFT) - 1)
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index e290a30f2d0b..e7e15874fb7c 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -2023,6 +2023,9 @@ static int ftrace_check_record(struct dyn_ftrace *rec, int enable, int update)
>
> ftrace_bug_type = FTRACE_BUG_UNKNOWN;
>
> + if (rec->flags & FTRACE_FL_DISABLED)
> + return FTRACE_UPDATE_IGNORE;
> +
> /*
> * If we are updating calls:
> *
> @@ -2849,64 +2852,41 @@ ops_references_rec(struct ftrace_ops *ops, struct dyn_ftrace *rec)
> return 1;
> }
>
> -static int referenced_filters(struct dyn_ftrace *rec)
> -{
> - struct ftrace_ops *ops;
> - int cnt = 0;
> -
> - for (ops = ftrace_ops_list; ops != &ftrace_list_end; ops = ops->next) {
> - if (ops_references_rec(ops, rec))
> - cnt++;
> - }
> -
> - return cnt;
> -}
> -
> static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs)
> {
> struct ftrace_page *pg;
> struct dyn_ftrace *p;
> cycle_t start, stop;
> unsigned long update_cnt = 0;
> - unsigned long ref = 0;
> - bool test = false;
> + unsigned long rec_flags = 0;
> int i;
>
> + start = ftrace_now(raw_smp_processor_id());
> +
> /*
> - * When adding a module, we need to check if tracers are
> - * currently enabled and if they are set to trace all functions.
> - * If they are, we need to enable the module functions as well
> - * as update the reference counts for those function records.
> + * When a module is loaded, this function is called to convert
> + * the calls to mcount in its text to nops, and also to create
> + * an entry in the ftrace data. Now, if ftrace is activated
> + * after this call, but before the module sets its text to
> + * read-only, the modification of enabling ftrace can fail if
> + * the read-only is done while ftrace is converting the calls.
> + * To prevent this, the module's records are set as disabled
> + * and will be enabled after the call to set the module's text
> + * to read-only.
> */
> - if (mod) {
> - struct ftrace_ops *ops;
> -
> - for (ops = ftrace_ops_list;
> - ops != &ftrace_list_end; ops = ops->next) {
> - if (ops->flags & FTRACE_OPS_FL_ENABLED) {
> - if (ops_traces_mod(ops))
> - ref++;
> - else
> - test = true;
> - }
> - }
> - }
> -
> - start = ftrace_now(raw_smp_processor_id());
> + if (mod)
> + rec_flags |= FTRACE_FL_DISABLED;
>
> for (pg = new_pgs; pg; pg = pg->next) {
>
> for (i = 0; i < pg->index; i++) {
> - int cnt = ref;
>
> /* If something went wrong, bail without enabling anything */
> if (unlikely(ftrace_disabled))
> return -1;
>
> p = &pg->records[i];
> - if (test)
> - cnt += referenced_filters(p);
> - p->flags = cnt;
> + p->flags = rec_flags;
>
> /*
> * Do the initial record conversion from mcount jump
> @@ -2916,21 +2896,6 @@ static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs)
> break;
>
> update_cnt++;
> -
> - /*
> - * If the tracing is enabled, go ahead and enable the record.
> - *
> - * The reason not to enable the record immediatelly is the
> - * inherent check of ftrace_make_nop/ftrace_make_call for
> - * correct previous instructions. Making first the NOP
> - * conversion puts the module to the correct state, thus
> - * passing the ftrace_make_call check.
> - */
> - if (ftrace_start_up && cnt) {
> - int failed = __ftrace_replace_code(p, 1);
> - if (failed)
> - ftrace_bug(failed, p);
> - }
> }
> }
>
> @@ -4938,6 +4903,19 @@ static int ftrace_process_locs(struct module *mod,
>
> #define next_to_ftrace_page(p) container_of(p, struct ftrace_page, next)
>
> +static int referenced_filters(struct dyn_ftrace *rec)
> +{
> + struct ftrace_ops *ops;
> + int cnt = 0;
> +
> + for (ops = ftrace_ops_list; ops != &ftrace_list_end; ops = ops->next) {
> + if (ops_references_rec(ops, rec))
> + cnt++;
> + }
> +
> + return cnt;
> +}
> +
> void ftrace_release_mod(struct module *mod)
> {
> struct dyn_ftrace *rec;
> @@ -4980,12 +4958,82 @@ void ftrace_release_mod(struct module *mod)
> mutex_unlock(&ftrace_lock);
> }
>
> +static void ftrace_module_enable(struct module *mod)
> +{
> + struct dyn_ftrace *rec;
> + struct ftrace_page *pg;
> +
> + mutex_lock(&ftrace_lock);
> +
> + if (ftrace_disabled)
> + goto out_unlock;
> +
> + /*
> + * If the tracing is enabled, go ahead and enable the record.
> + *
> + * The reason not to enable the record immediatelly is the
> + * inherent check of ftrace_make_nop/ftrace_make_call for
> + * correct previous instructions. Making first the NOP
> + * conversion puts the module to the correct state, thus
> + * passing the ftrace_make_call check.
> + *
> + * We also delay this to after the module code already set the
> + * text to read-only, as we now need to set it back to read-write
> + * so that we can modify the text.
> + */
> + if (ftrace_start_up)
> + ftrace_arch_code_modify_prepare();
> +
> + do_for_each_ftrace_rec(pg, rec) {
> + int cnt;
> + /*
> + * do_for_each_ftrace_rec() is a double loop.
> + * module text shares the pg. If a record is
> + * not part of this module, then skip this pg,
> + * which the "break" will do.
> + */
> + if (!within_module_core(rec->ip, mod))
> + break;
> +
> + cnt = 0;
> +
> + /*
> + * When adding a module, we need to check if tracers are
> + * currently enabled and if they are, and can trace this record,
> + * we need to enable the module functions as well as update the
> + * reference counts for those function records.
> + */
> + if (ftrace_start_up)
> + cnt += referenced_filters(rec);
> +
> + /* This clears FTRACE_FL_DISABLED */
> + rec->flags = cnt;
> +
> + if (ftrace_start_up && cnt) {
> + int failed = __ftrace_replace_code(rec, 1);

If we choose to call ftrace_module_enable when receiving module notification
MODULE_STATE_COMING, TEXT section of the module is already changed to RO.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/