[PATCH v4 0/8] timers/migration: Fix three possible races and some improvements

From: Anna-Maria Behnsen
Date: Tue Jul 16 2024 - 10:20:11 EST


Borislav reported a warning in the timer migration deactivate path:

https://lore.kernel.org/r/20240612090347.GBZmlkc5PwlVpOG6vT@fat_crate.local

Unfortunately, it does not reproduce directly. But with a change in timing
(adding a trace_printk() before the warning), the warning can be triggered
reliably, at least in my test setup. The problem is a racy check against
the group->parent pointer. The same check is also used in other places in
the code; fixing this racy usage is addressed by the first patch.
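To illustrate the general class of bug (a sketch, not the code from patch 1):
when a pointer such as group->parent can be published concurrently by
another CPU, a decision derived from one read and a later action based on a
second read can disagree. A common remedy is to read the pointer once and
derive every decision from that snapshot. struct group, struct step,
take_step() and publish_parent() below are hypothetical userspace stand-ins
using C11 atomics, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a hierarchy group (not tmigr_group). */
struct group {
	_Atomic(struct group *) parent; /* NULL until a new top level is published */
};

/* Result of one step of a hypothetical walk up the hierarchy. */
struct step {
	struct group *parent; /* snapshot of g->parent, loaded exactly once */
	bool is_top;          /* derived from that same snapshot */
};

/*
 * Load the parent pointer once and derive both fields from the snapshot.
 * Re-reading g->parent later could observe a different value if another
 * thread publishes a new top level in between, so "is_top" and the parent
 * actually used could disagree.
 */
static struct step take_step(struct group *g)
{
	struct group *p = atomic_load_explicit(&g->parent, memory_order_acquire);
	struct step s = { .parent = p, .is_top = (p == NULL) };
	return s;
}

/* Writer side: publish a new parent with release ordering. */
static void publish_parent(struct group *g, struct group *p)
{
	assert(p != NULL); /* publishing NULL would not add a level */
	atomic_store_explicit(&g->parent, p, memory_order_release);
}
```

The acquire/release pair only orders the pointer publication against the
reader; the snapshot is what keeps the reader's decisions consistent.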

Frederic reported two other races in the setup path:

https://lore.kernel.org/r/ZnWOswTMML6ShzYO@localhost.localdomain

https://lore.kernel.org/r/ZnoIlO22habOyQRe@lothringen

Both races are addressed by the change in patch 2.

Patches 3-8 provide some updates and cleanups ("timers/migration:
Improve tracing" and "timers/migration: Spare write when nothing changed"
are the same as in v2).

Patches are available here:

https://git.kernel.org/pub/scm/linux/kernel/git/anna-maria/linux-devel.git timers/misc

---
Changes in v4:
- Update Patch 2: Fix broken cpuhp_setup_state() call for prepare
- Update Patch 2: Activate child during setup only when it is an already
existing group
- Update Patch 2: Change init into early_initcall() to make use of
preparation by an already active CPU.
- Update Patch 2: Move initialization of tmc in tmigr_cpu_prepare() before
using data of tmc (e.g. by a tracepoint)
- Update Patch 5: Use proper childmask for tmigr_walk in __walk_groups()
- Update Patch 6: Fix missing s/childmask/groupmask update in the
connect_[cpu|child]_parent tracepoints and adapt to the change in Patch 5
- Link to v3: https://lore.kernel.org/r/20240701-tmigr-fixes-v3-0-25cd5de318fb@xxxxxxxxxxxxx

Changes in v3:
- Address the newly reported possible race (childmask and parent pointer)
together with the existing race (both reported by Frederic).
- New cleanup: Two patches to access childmask and parent pointer only in
one place
- New cleanup: Rename childmask to parentmask, as the naming caused some
confusion during discussions
- New cleanup: Fix typo
- Fix prefix in all patches (s$timer_migration$timers/migration$)
- Link to v2: https://lore.kernel.org/r/20240624-tmigr-fixes-v2-0-3eb4c0604790@xxxxxxxxxxxxx

Changes in v2:
- Address another possible race in setup code (reported by Frederic) and
recycle one improvement patch for this
- Change order and move the already existing improvement patch to the end
of the queue
- Existing patches didn't change
- Link to v1: https://lore.kernel.org/r/20240621-tmigr-fixes-v1-0-8c8a2d8e8d77@xxxxxxxxxxxxx

Thanks,

Anna-Maria

---
Anna-Maria Behnsen (8):
timers/migration: Do not rely always on group->parent
timers/migration: Move hierarchy setup into cpuhotplug prepare callback
timers/migration: Improve tracing
timers/migration: Use a single struct for hierarchy walk data
timers/migration: Read childmask and parent pointer in a single place
timers/migration: Rename childmask by groupmask to make naming more obvious
timers/migration: Spare write when nothing changed
timers/migration: Fix grammar in comment

include/linux/cpuhotplug.h | 1 +
include/trace/events/timer_migration.h | 16 +-
kernel/time/timer_migration.c | 383 ++++++++++++++++-----------------
kernel/time/timer_migration.h | 27 ++-
4 files changed, 214 insertions(+), 213 deletions(-)