On Thu, Apr 02, 2015 at 10:15:27AM -0400, Don Zickus wrote:
> On Thu, Apr 02, 2015 at 09:49:45AM -0400, Chris Metcalf wrote:
> > > Ah, I am starting to understand your approach in the original patch better.
> > >
> > > Can I ask how the NO_HZ_FULL technology works from userspace? Is there a
> > > system command that has to be sent? How does the kernel know to turn off
> > > ticks and trust userspace to do the right thing?
> >
> > The NO_HZ_FULL option, when configured into the kernel, lets
> > you boot with "nohz_full=1-15" (or whatever cpumask you like),
> > typically in conjunction with "isolcpus=1-15". At this point no tasks
> > will run on those cores until explicitly placed there by affinity, and
> > once there and running in userspace, the kernel will automatically
> > get out of their way and not interrupt at all. This lets those tasks
> > run with 100.000% of the cpu, which is a requirement for many
> > user-space device drivers running high throughput devices.
> > (This is typically the use case for the tile architecture customers.)
> >
> > So, other than a boot flag, there are no system commands or
> > other APIs to deal with.
> >
> > Part of the requirement, though, is that there can be only one task
> > bound and runnable on that cpu, otherwise the kernel has to be
> > involved to do the context-switching off of the scheduler tick.
> > This is why having the standard watchdog kernel thread doesn't
> > work in this context.
>
> So, there is no preemption happening, which means the softlockup is rather
> pointless.

Still useful actually because nohz full only takes effect when a single task runs
on the CPU. But there can still be more than 1 task running, just nohz full will
be disabled. It all happens dynamically.

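For illustration, the "explicitly placed there by affinity" step quoted above
can be sketched from userspace. This is only a minimal sketch and not part of
any patch: pin_to_cpu is a hypothetical helper name, and it assumes a Linux
system where Python's os.sched_setaffinity() is available.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Bind a process to exactly one CPU and return its resulting affinity.

    pid=0 means the calling process. pin_to_cpu is a hypothetical helper
    name used for illustration only.
    """
    # Restrict the allowed-CPU mask to the single requested CPU.
    os.sched_setaffinity(pid, {cpu})
    # Read the mask back so the caller can confirm where the task may run.
    return os.sched_getaffinity(pid)
```

On a machine booted with "nohz_full=1-15", a task calling pin_to_cpu(3) would
be moved to CPU 3. The tick would only stop while that task is the single
runnable task there, since the switch happens dynamically.
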
> Can interrupts be disabled or handled on that cpu? I am trying
> to see if the hardlockup detector becomes rather silly on those cpus too.

No, interrupts aren't disabled on these CPUs. Now the goal is to avoid them:
migrate irqs, nohz full, etc...

But there can be irqs. And actually there is at least 1 tick every second in
order to keep the scheduler stats moving forward. We plan to get rid of it but
anyway the point is that IRQs can happen on nohz full CPUs.

> > I continue to suspect that the right model here is to disable the
> > watchdog specifically on the cores that the user has tagged with
> > the nohz_full boot argument. I agree that there might be a case
> > to be made for leaving the watchdog conditionally (as suggested
> > by Ingo) but it should be possible to have the watchdogs on
> > the nohz_full cores be turned off completely if desired.
>
> I think I might be slowly coming around to your thoughts. I might request a
> different patch though based on the answers above. Maybe even create a
> subset of the online cpus for the watchdog to work off of. The watchdog
> would copy the online cpu mask, mask off the nohz cpus and just function
> that way. It would print loud messages for each nohz cpu it was masking
> off.

All agreed with that! We should at least keep the watchdog running on
non-nohz-full CPUs. And also allow to re-enable it everywhere when needed,
in case we have a lockup to chase on nohz full CPUs.

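The mask logic proposed above (copy the online mask, mask off the nohz cpus,
report each cpu that gets dropped) can be sketched as follows. This is a
userspace illustration of the idea only, not kernel code: parse_cpulist and
watchdog_cpus are hypothetical names, and the cpu lists are assumed to use the
same "1-15" style as the nohz_full boot argument.

```python
def parse_cpulist(text):
    """Parse a cpu list such as "0,2-4" into a set of cpu numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A single cpu ("3") has no upper bound, so reuse the lower one.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def watchdog_cpus(online, nohz_full):
    """Return the online cpus the watchdog would keep covering.

    Hypothetical helper: prints one message per nohz_full cpu that is
    masked off, mirroring the "loud messages" suggested in the thread.
    """
    for cpu in sorted(online & nohz_full):
        print(f"watchdog: disabled on nohz_full cpu {cpu}")
    return online - nohz_full
```

With online cpus "0-15" and "nohz_full=1-15", only cpu 0 would remain in the
watchdog's working set.
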
> Then perhaps as a debug aid, expose a /proc/sys/kernel/watchdog_cpumask for
> folks to modify in case they want to enable the watchdog on the nohz cpus.

That sounds like a good idea.
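
To make the debug knob concrete, here is a rough sketch of how a user might
re-enable the watchdog on chosen cpus through such a file. It rests on
assumptions: the file is only a proposal at this point in the thread, it is
assumed to accept a cpu list such as "0,2-4", and format_cpulist and
write_watchdog_cpumask are hypothetical helper names.

```python
def format_cpulist(cpus):
    """Format a set of cpu numbers as a compact list, e.g. {0, 2, 3, 4} -> "0,2-4"."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            # Extend the current run of consecutive cpus.
            ranges[-1][1] = cpu
        else:
            # Start a new run.
            ranges.append([cpu, cpu])
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def write_watchdog_cpumask(cpus, path="/proc/sys/kernel/watchdog_cpumask"):
    """Write the cpu list to the (proposed) sysctl file; needs root.

    The path and the accepted format are assumptions taken from the
    proposal in this thread, not a confirmed interface.
    """
    with open(path, "w") as f:
        f.write(format_cpulist(cpus) + "\n")
```

Chasing a lockup on the nohz full cpus would then amount to writing the full
online set back, for example write_watchdog_cpumask(set(range(16))).
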