Re: [PATCH] x86/resctrl: Fix potential lockdep warning

From: Xiaochen Shen
Date: Sat Nov 16 2019 - 11:13:30 EST


Hi Boris,

Thank you for your kind code review. Please find my comments inline.

On 11/13/2019 19:44, Borislav Petkov wrote:
On Thu, Nov 07, 2019 at 06:36:36AM +0800, Xiaochen Shen wrote:
rdtgroup_cpus_write() and mkdir_rdt_prepare() call
rdtgroup_kn_lock_live() -> kernfs_to_rdtgroup() to get 'rdtgrp', and
then call rdt_last_cmd_xxx() functions which will check if

Write those names like this:

rdt_last_cmd_{clear,puts,...} but not with an "xxx" which confuses
people unfamiliar with the code.

OK. I got it. rdt_last_cmd_{clear,puts,printf}().


rdtgroup_mutex is held/requires its caller to hold rdtgroup_mutex.
But if 'rdtgrp' returned from kernfs_to_rdtgroup() is NULL,
rdtgroup_mutex is not held and calling rdt_last_cmd_xxx() will result
in a lockdep warning.

That's more of a self-incurred lockdep warning. You can't be calling
lockdep_assert_held() after a function which doesn't always grab the
mutex. Looks like the design needs changing here...

Actually this fix covers all the cases of an audit of the calling paths
of rdt_last_cmd_{clear,puts,printf}(), to make sure we only have the
lockdep_assert_held() in places where we are sure that it must be held.
Please find more background details as below.


Remove rdt_last_cmd_xxx() in these two paths. Just returning error
should be sufficient to report to the user that the entry doesn't exist
any more.

... or that.

In any case, you should consider fixing such patterns in the code as it
looks sub-optimal from where I'm standing.

I would like to provide more of the background details in the commit
comment in v2 patch:

-------------------
x86/resctrl: Fix potential lockdep warning

rdt_last_cmd_{clear,puts,printf}() call lockdep_assert_held() to assert
that rdtgroup_mutex is held.

During internal review of some other changes we found that there are
code paths that call rdt_last_cmd_{clear,puts}() when the rdtgroup_mutex
is not held.

An audit of calling sequences identified two different cases in
rdtgroup_kn_lock_live() which both returning NULL:
1.'rdtgrp' returned from kernfs_to_rdtgroup() is NULL, rdtgroup_mutex
is not held.
2.'rdtgrp' is being deleted, rdtgroup_mutex is held.

Checking all call sites of rdt_last_cmd_{clear,puts,printf}() found two
code paths where rdtgroup_mutex is not held: rdtgroup_cpus_write() and
mkdir_rdt_prepare().

Fix by removing rdt_last_cmd_{clear,puts}() in these two paths. Just
returning error should be sufficient to report to the user that the
entry doesn't exist any more.

Fixes: 94457b36e8a5 ("x86/intel_rdt: Add diagnostics when writing the cpus file")
Fixes: cfd0f34e4cd5 ("x86/intel_rdt: Add diagnostics when making directories")
Signed-off-by: Xiaochen Shen <xiaochen.shen@xxxxxxxxx>
Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>
Reviewed-by: Fenghua Yu <fenghua.yu@xxxxxxxxx>
Reviewed-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
-------------------
Updated commit comment to provide additional context on how these were
found.


Thx.


--
Best regards,
Xiaochen