Re: [PATCH 1/1] perf: Add CPU hotplug support for events

From: Raghavendra Rao Ananta
Date: Fri Feb 16 2018 - 20:48:33 EST




On 02/16/2018 12:39 PM, Peter Zijlstra wrote:
On Fri, Feb 16, 2018 at 10:06:29AM -0800, Raghavendra Rao Ananta wrote:
No this is absolutely disguisting. You can simply keep the events in the
dead CPU's context. It's really not that hard.
Keeping the events in the dead CPU's context was also an idea that we had.
However, detaching that event from the PMU when the CPU is offline would be
a pain. Consider the scenario in which an event is about to be destroyed
when the CPU is offline (yet still attached to the CPU). During it's
destruction, a cross-cpu call is made (from perf_remove_from_context()) to
the offlined CPU to detach the event from the CPU's PMU. As the CPU is
offline, that would not be possible, and again a separate logic has to be
written for cleaning up the events whose CPUs are offlined.

That is actually really simple to deal with. The real problems are with
semantics, is an event enabled when the CPU is dead? Can you
disable/enable an event on a dead CPU.

The below patch (_completely_ untested) should do most of it, but needs
help with the details. I suspect we want to allow enable/disable on
events that are on a dead CPU, and equally I think we want to account
the time an enabled event spends on a dead CPU to go towards the
'enabled' bucket.
I've gone through your diff, and it gave me a hint of similar texture what we are trying to do (except for maintaining an offline event list). Nevertheless, I tried to test your patch. I created an hw event, and tried to offline the CPU in parallel, and I immediately hit a watchdog soft lockup bug! Tried the same this by first switching off the CPU (without any event created), and I hit into similar issue. I am sure we can fix it, but apart from the "why we are doing hotplug?" question, was was there specifically any issue with our patch?


Also, you _still_ don't explain why you care about dead CPUs.

I wanted to understand, if we no longer care about hotplugging of CPUs, then why do we still have exported symbols such as cpu_up() and cpu_down()? Moreover, we also have the hotplug interface exposed to users-space as well (through sysfs). As long as these interfaces exist, there's always a potential chance of bringing the CPU up/down. Can you please clear this thing up for me?

-- Raghavendra

--
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project