Re: [PATCH RESEND] cpu/hotplug: Wait for cpu_hotplug to be enabled in cpu_up/down

From: Matija Glavinic Pecotic
Date: Tue Feb 04 2020 - 02:44:07 EST


Hello Thomas,

On 02/03/2020 07:08 PM, Thomas Gleixner wrote:
> So what?. User space has to handle -EBUSY properly and it was possible
> even before that PCI commit that the online/offline operation request
> returned -EBUSY.
>
> What's confusing about a EBUSY return code? It's pretty universaly used
> in situations where a facility is temporarily busy. If it's not
> sufficiently documented, why EBUSY can be returned and what that means,
> then this needs to be improved.

It is true this was happening before your work in the PCI subsystem. I should have referenced the original commit that made cpu_up/down return EBUSY. I agree there is nothing to fix in your patch.

That EBUSY exists and is commonly used doesn't justify it in every situation. The problem is not only in userspace but in the kernel as well: no user of cpu_up/down takes the possible temporary unavailability into account. Taken to the extreme, we could start returning EBUSY whenever a resource or facility is taken, which would make every interface a candidate for returning it. As I see it, EBUSY has its place in nonblocking APIs. Others should try (hard) not to return it. Handling it is a further topic of its own. How long should the timeout be before giving up? Let's say we know that for cpu it is the 10 seconds I proposed. Passing the responsibility for selecting the timeout to the users spreads that policy out to each subsystem on its own, leading to situations where it works for some and not for others, depending on the timeout chosen.
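To illustrate what "handling EBUSY" means on the userspace side, here is a minimal sketch of the retry loop each caller would have to carry. It is not taken from partrt or from the patch. The helper names retry_on_ebusy and set_cpu_online are hypothetical, the 0.1 second polling interval is an assumption, and the 10 second default only mirrors the value proposed above.

```python
import errno
import time


def retry_on_ebusy(op, timeout_s=10.0, interval_s=0.1,
                   clock=time.monotonic, sleep=time.sleep):
    """Call op() until it stops failing with EBUSY or timeout_s elapses.

    Returns the number of attempts made. Any error other than EBUSY, and
    the last EBUSY once the deadline has passed, is re-raised.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            op()
            return attempts
        except OSError as exc:
            # Only EBUSY is treated as "temporarily unavailable".
            if exc.errno != errno.EBUSY or clock() >= deadline:
                raise
        sleep(interval_s)


def set_cpu_online(cpu, online, timeout_s=10.0):
    """Write 1/0 to the CPU's sysfs 'online' file, retrying on EBUSY."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/online"

    def write_once():
        # The buffered write is flushed when the file is closed, so an
        # EBUSY from the kernel surfaces inside this function.
        with open(path, "w") as f:
            f.write("1" if online else "0")

    return retry_on_ebusy(write_once, timeout_s=timeout_s)
```

Every tool that toggles CPUs would need its own copy of this loop and its own choice of timeout_s and interval_s, which is the policy spread described above.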

I do not like these kinds of waits, but I wasn't able to think of anything better to improve this situation. I still believe it should be improved, and the wait should be removed once/if cpu hotplug is able to drop cpu_hotplug_enable/disable.

> I have no idea why you need to offline/online CPUs to partition a
> system. There are surely more sensible ways to do that, but that's not
> part of this discussion.

I'd be happy to make it part of it.

We are using partrt from https://github.com/OpenEneaLinux/rt-tools/tree/master/partrt. cpu_up/down is part of it; AFAIK it is there to force timer migration and doesn't have any other usage (known to me). Since we started with core isolation, we have changed how we treat isolated cores. We now boot with isolcpus=cpu-list nohz_full=cpu-list rcu_nocbs=cpu-list, and we are atm at Linux 4.19. Earlier we had a different setup where we wanted to use the cores during startup and partition later; however, that proved to be problematic and not in line with how things are going in this area.

Do you think we do not need to toggle them under these conditions?

Thanks,

Matija