Re: [PATCH RESEND] cpu/hotplug: Wait for cpu_hotplug to be enabled in cpu_up/down
From: Matija Glavinic Pecotic
Date: Tue Feb 04 2020 - 02:44:07 EST
Hello Thomas,
On 02/03/2020 07:08 PM, Thomas Gleixner wrote:
> So what? User space has to handle -EBUSY properly and it was possible
> even before that PCI commit that the online/offline operation request
> returned -EBUSY.
> What's confusing about an EBUSY return code? It's pretty universally used
> in situations where a facility is temporarily busy. If it's not
> sufficiently documented why EBUSY can be returned and what that means,
> then this needs to be improved.
It is true that this was happening before your work in the PCI
subsystem. I should have referenced the original commit that made
cpu_up/down return EBUSY. I agree there is nothing to fix in your patch.
EBUSY existing and being commonly used doesn't justify it in every
situation. The problem is not only in userspace but in the kernel as
well: no user of cpu_up/down takes possible temporary unavailability
into account. Taken to the extreme, we could start returning EBUSY
whenever a resource or facility is taken, which would make every
interface a candidate for returning it. As I see it, EBUSY has its place
in nonblocking APIs. Others should try (hard) not to return it. Handling
it is a further topic of its own. How large should the timeout be before
giving up? Let's say that we know that for cpu hotplug it is the 10
seconds which I proposed. Passing the responsibility for selecting the
timeout to the users spreads that policy out to each subsystem on its
own, leading to situations where it works for some and not for others,
depending on the timeout chosen.
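To make the caller-side burden concrete, here is a minimal userspace sketch of retrying on EBUSY until a deadline. It is an illustration only, not part of the patch: the helper names retry_on_ebusy and set_cpu_online and the 0.1 second polling interval are assumptions, and the 10 second default simply mirrors the value proposed above. The sysfs file /sys/devices/system/cpu/cpuN/online is the usual interface for onlining and offlining a CPU.

```python
import errno
import time


def retry_on_ebusy(operation, timeout_s=10.0, interval_s=0.1,
                   clock=time.monotonic, sleep=time.sleep):
    """Call operation() until it stops failing with EBUSY or the timeout expires.

    Hypothetical helper: every caller that handles EBUSY itself has to pick
    timeout_s and interval_s, which is the policy spread discussed above.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return operation()
        except OSError as exc:
            # Only EBUSY is treated as temporary; other errors propagate,
            # and so does EBUSY once the deadline has passed.
            if exc.errno != errno.EBUSY or clock() >= deadline:
                raise
        sleep(interval_s)


def set_cpu_online(cpu, online, timeout_s=10.0):
    """Write 1 or 0 to the CPU's sysfs 'online' file, retrying on EBUSY."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/online"

    def write_state():
        # The write error, if any, surfaces when the file is flushed/closed.
        with open(path, "w") as f:
            f.write("1" if online else "0")

    retry_on_ebusy(write_state, timeout_s=timeout_s)
```

A caller would use set_cpu_online(3, False) to offline CPU 3 (root required). Every tool that does this has to choose its own timeout and interval.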
I do not like these kinds of waits, but I wasn't able to think of
anything better to improve this situation. I still believe it should be
improved, and once/if cpu hotplug is able to drop
cpu_hotplug_enable/disable, the wait can be removed.
> I have no idea why you need to offline/online CPUs to partition a
> system. There are surely more sensible ways to do that, but that's not
> part of this discussion.
I'd be happy to make it part.
We are using partrt from
https://github.com/OpenEneaLinux/rt-tools/tree/master/partrt.
cpu_up/down is part of it. AFAIK it is there to force timer migration
and doesn't have any other usage (known to me). Since we started with
core isolation, we have changed how we treat isolated cores. We now boot
with isolcpus=cpu-list nohz_full=cpu-list rcu_nocbs=cpu-list, and we are
atm at Linux 4.19. Earlier we had a different setup in which we wanted
to use the cores during startup and partition later; however, that
proved to be problematic and not in line with how things are going in
the area.
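As a rough way to confirm that the three boot parameters name the same CPUs, the sketch below parses a kernel command line string. It is an illustration under stated assumptions: the function names are hypothetical, and the parser handles only plain cpu-lists such as "1-3,5", not flag prefixes or stride syntax that the kernel also accepts.

```python
def parse_cpu_list(text):
    """Expand a plain cpu-list such as '1-3,5' into a set of CPU numbers.

    Assumption: only 'N' and 'N-M' entries appear in the list.
    """
    cpus = set()
    for part in text.split(","):
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def isolation_params(cmdline):
    """Return {parameter: set of CPUs} for the isolation parameters found."""
    wanted = ("isolcpus", "nohz_full", "rcu_nocbs")
    found = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if sep and name in wanted:
            found[name] = parse_cpu_list(value)
    return found
```

On a running system the input would come from reading /proc/cmdline; comparing the three resulting sets shows whether the isolation setup is consistent.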
Do you think we do not need to toggle them under these conditions?
Thanks,
Matija