Re: [PATCH v3 1/1] coresight: prevent deactivate active config while enabling the config
From: Yeoreum Yun
Date: Mon Mar 24 2025 - 15:03:13 EST
Hi Mike,
> Hi,
>
> On Mon, 24 Mar 2025 at 13:59, Yeoreum Yun <yeoreum.yun@xxxxxxx> wrote:
> >
> > Hi Mike,
> >
> > Please ignore my former mail, and please see my comments on your
> > suggestion.
> >
> > > Hi
> > >
> > > On Fri, 14 Mar 2025 at 15:25, Yeo Reum Yun <YeoReum.Yun@xxxxxxx> wrote:
> > > >
> > > > Hi, Mike.
> > > >
> > > > > > static void cscfg_remove_owned_csdev_features(struct coresight_device *csdev, void *load_owner)
> > > > > > @@ -867,6 +870,28 @@ void cscfg_csdev_reset_feats(struct coresight_device *csdev)
> > > > > > }
> > > > > > EXPORT_SYMBOL_GPL(cscfg_csdev_reset_feats);
> > > > > >
> > > > > > +static bool cscfg_config_desc_get(struct cscfg_config_desc *config_desc, bool enable)
> > > > > > +{
> > > > > > + if (enable)
> > > > > > + return atomic_inc_not_zero(&config_desc->active_cnt);
> > > > > > +
> > > > >
> > > > > Not sure why we have an "enable" parameter here - it completely
> > > > > changes the meaning of the function - with no comment at the start.
> > > >
> > > > Sorry. What I intended was to distinguish between
> > > > - activation of a config
> > > > - enabling of an activated config.
> > > > Currently, coresight doesn't grab the module reference when enabling an active config;
> > > > it grabs that reference only on activation.
> > > > That's why I used the "enable" parameter to distinguish the two
> > > > when integrating with the module owner count.
> > > >
> > > > > > list_for_each_entry(config_desc, &cscfg_mgr->config_desc_list, item) {
> > > > > > if ((unsigned long)config_desc->event_ea->var == cfg_hash) {
> > > > > > - atomic_dec(&config_desc->active_cnt);
> > > > > > atomic_dec(&cscfg_mgr->sys_active_cnt);
> > > > > > - cscfg_owner_put(config_desc->load_owner);
> > > > > > + cscfg_config_desc_put(config_desc);
> > > > > > dev_dbg(cscfg_device(), "Deactivate config %s.\n", config_desc->name);
> > > > > > break;
> > > > > > }
> > > > > > @@ -1047,7 +1066,7 @@ int cscfg_csdev_enable_active_config(struct coresight_device *csdev,
> > > > > > unsigned long cfg_hash, int preset)
> > > > > > {
> > > > > > struct cscfg_config_csdev *config_csdev_active = NULL, *config_csdev_item;
> > > > > > - const struct cscfg_config_desc *config_desc;
> > > > > > + struct cscfg_config_desc *config_desc;
> > > > > > unsigned long flags;
> > > > > > int err = 0;
> > > > > >
> > > > > > @@ -1062,8 +1081,8 @@ int cscfg_csdev_enable_active_config(struct coresight_device *csdev,
> > > > > > raw_spin_lock_irqsave(&csdev->cscfg_csdev_lock, flags);
> > > > > > list_for_each_entry(config_csdev_item, &csdev->config_csdev_list, node) {
> > > > > > config_desc = config_csdev_item->config_desc;
> > > > > > - if ((atomic_read(&config_desc->active_cnt)) &&
> > > > > > - ((unsigned long)config_desc->event_ea->var == cfg_hash)) {
> > > > > > + if (((unsigned long)config_desc->event_ea->var == cfg_hash) &&
> > > > > > + cscfg_config_desc_get(config_desc, true)) {
> > > > > >
> > > > > This obfuscates the logic of the comparisons without good reason. With
> > > > > the true parameter, the function does no "get" operation but just
> > > > > replicates the logic being replaced - checking the active_cnt is
> > > > > non-zero.
> > > > >
> > > > > Restore this to the original logic to make it readable again
> > > >
> > > > It's not a replication of the comparison logic; if true,
> > >
> > > sorry - missed that point .
> > >
> > > > it takes a reference on active_cnt but does not take the module reference.
> > > > The fundamental cause of the UAF is that only "atomic_read()" is used,
> > > > so a reference should be held here.
> > > >
> > > > So, if you think the parameter of cscfg_config_desc_get() obfuscates the code,
> > > > I think there are two ways to modify it.
> > > >
> > > > 1. cscfg_config_desc_get()/put() always grab/drop the module count.
> > > > 2. remove cscfg_config_desc_get()/put() but just use atomic_XXX(&active_cnt) only
> > > > with cscfg_owner_get()/put()
> > > >
> > > > Any thoughts?
> > > >
> > > > Thanks!
> > > >
> > > >
> > >
> > > The get and put functions are asymmetrical w.r.t. owner.
> > >
> > > The put will put owner if active count decrements to 0,
> > > The get if not on enable path will put owner unconditionally.
> > >
> > > This means that the caller has to work out the correct input conditions.
> > >
> > > Might be better if:-
> > >
> > > get_desc()
> > > {
> > > if (! desc->refcnt) {
> > > if (!get_owner())
> > > return false;
> > > }
> > > desc->refcnt++;
> > > return true;
> > > }
> >
> > I think this creates another problem when
> > it races with _cscfg_deactivate_config().
> >
> > CPU0                                     CPU1
> > (sysfs enable)                           load module
> >                                          cscfg_load_config_sets()
> >                                          activate config. // sysfs
> >                                          (sys_active_cnt == 1)
> >
> >                                          // sysfs
> >                                          _cscfg_deactivate_config()
> >                                          (sys_active_cnt == 0)
> >                                          (config->active_cnt = 0)
> > ...
> > cscfg_csdev_enable_active_config()
> >   lock(csdev->cscfg_csdev_lock)
> >   // here get module reference??
> >   // even sys_active_cnt == 0 and
> >   // config->active_cnt == 1.
> >   get_desc()
> >   unlock(csdev->cscfg_csdev_lock)
> >
> >
> > // access to config_desc which is freed
> > // while unloading module.
> > cscfg_csdev_enable_config
> >
> >
> > This is because the meaning of desc->refcnt being zero depends on the context:
> > - while activating, it should get the module reference if zero.
> > - while enabling an active configuration, it should fail if zero.
> >
> > That means, to prevent this race, the key point is:
> > when config->active_cnt == 0, cscfg_csdev_enable_active_config() should fail
> >
>
> This is not a failure case, it simply means that this config should
> not be activated for this device.
> It is possible for a configuration to be active on the system, without
> it being active for a particular coresight device.
>
> Having a get/put interface for the config descriptor - which prevents
> the config from being unloaded is fine, the key logic here is that we
> are searching a list of possible enabled configurations for this
> device and taking the necessary action to enable it if we find one -
> and there can only ever be a single configuration enabled for a trace
> session.
>
> Therefore when the list of loaded configs for a device is > 1, then
> at most one is allowed to be active - so the search code will validly
> find instances where config->active_cnt == 0.
>
> My objection to the original interface was not the get/put operations
> to protect the module from unload, but the fact that the logic
> deciding if a config needed to be enabled on the device was hidden
> inside the get() operation.
> My suggestion is to restore the logic so that the decision about whether
> there is a config to enable on the device is clear in the enable function
> itself, then use get/put as appropriate to prevent module unload.
>
Makes sense. Then I'll respin with your suggestion :)
Thanks for the clarification!
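
For reference, here is a minimal userspace sketch of the shape discussed above. It is only an illustration under stated assumptions, not the respin itself and not kernel code: C11 atomics stand in for atomic_t, and struct config_desc, inc_not_zero(), config_desc_get()/config_desc_put() and find_and_get() are hypothetical names. The idea is that the hash comparison stays visible in the search loop, the reference is taken only if active_cnt is already non-zero, and an inactive entry is skipped rather than treated as an error.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cscfg_config_desc (illustration only). */
struct config_desc {
	unsigned long hash;     /* models event_ea->var */
	atomic_int active_cnt;  /* models the atomic_t active_cnt */
};

/* Userspace model of "increment unless zero": succeeds only if count != 0. */
static bool inc_not_zero(atomic_int *cnt)
{
	int old = atomic_load(cnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(cnt, &old, old + 1))
			return true;
	}
	return false;
}

/* get/put only manage the reference; they make no enable decision. */
static bool config_desc_get(struct config_desc *desc)
{
	return inc_not_zero(&desc->active_cnt);
}

static void config_desc_put(struct config_desc *desc)
{
	atomic_fetch_sub(&desc->active_cnt, 1);
}

/*
 * The decision about which config to enable is visible here, in the caller:
 * match the hash first, then try to hold a reference. An entry with
 * active_cnt == 0 is simply not active, so the search moves on.
 */
static struct config_desc *find_and_get(struct config_desc *list, size_t n,
					unsigned long hash)
{
	for (size_t i = 0; i < n; i++) {
		if (list[i].hash != hash)
			continue;
		if (!config_desc_get(&list[i]))
			continue;	/* not active: not an error */
		return &list[i];
	}
	return NULL;
}
```

A caller would pair a successful find_and_get() with config_desc_put() once the config is no longer in use. The module owner reference is deliberately left out of this sketch; how it is tied to the count is exactly what the respin has to settle.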