Re: [PATCH v5 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state

From: Thomas Gleixner
Date: Sun Jun 23 2019 - 23:01:50 EST


On Wed, 19 Jun 2019, Fenghua Yu wrote:
> C0.2 state in umwait and tpause instructions can be enabled or disabled
> on a processor through IA32_UMWAIT_CONTROL MSR register.

through the IA32....CONTROL MSR.

MSR register, IOW: Machine Specific Register register.

>
> Andy Lutomirski suggests to turn off local irqs before writing
> MSR_TEST_CTL to ensure msr_test_ctl_cached is not changed by sysfs write

What has MSR_TEST_CTL to do with this?

> +/*
> + * Serialize access to umwait_control_cached and IA32_UMWAIT_CONTROL MSR
> + * in writing sysfs to ensure all CPUs have the same MSR value.
> + */
> +static DEFINE_MUTEX(umwait_lock);
> +
> +static void update_this_cpu_umwait_control_msr(void)

Why is this not following the umwait_ namespace as everything else?

> +{
> + unsigned long flags;
> +
> + /*
> + * We need to prevent umwait_control_cached from being changed *and*
> + * completing its WRMSR between our read and our WRMSR. By turning

Huch? How does umwait_control_cached complete its WRMSR?

> + * IRQs off here, ensure that no sysfs write happens on this CPU

How would that happen? If this is called from a hotplugged CPU then that
CPU cannot handle sysfs writes even if the hotplug thread on which this
runs is preempted. The CPU is not marked active yet and cannot schedule
user space tasks. Doing the wrmsr(MSR, READ()) preemptible would just make
the race window larger.

> + * and we also make sure that any concurrent sysfs write from a
> + * different CPU will not finish updating us via IPI until we're done.

will not finish updating us?

> + local_irq_save(flags);

That local_irq_save() belongs into the cpu hotplug callback and there it
wants to be a local_irq_disable(). The IPI runs already with interrupts
disabled. And that comment wants to be in the hotplug function as well.

With that you can also spare the indirection via that extra IPI function
below and just add the void *unused argument to this and invoke it from the
hotplug callback with NULL.

> +
> + wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0);
> +
> + local_irq_restore(flags);
> +}

> +static u32 get_umwait_ctrl_c02(void)
> +{
> + return READ_ONCE(umwait_control_cached) & MSR_IA32_UMWAIT_CONTROL_C02_DISABLED;
> +}
> +
> +static u32 get_umwait_ctrl_max_time(void)
> +{
> + return READ_ONCE(umwait_control_cached) & MSR_IA32_UMWAIT_CONTROL_MAX_TIME;
> +}
> +
> +static ssize_t
> +enable_c02_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> + /*
> + * When bit 0 in IA32_UMWAIT_CONTROL MSR is 1, C0.2 is disabled.
> + * Otherwise, C0.2 is enabled. Show the opposite of bit 0.
> + */
> + return sprintf(buf, "%d\n", !(bool)get_umwait_ctrl_c02());

Eeew. !(bool)....

> +}
> +
> +static ssize_t enable_c02_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + u32 umwait_c02;
> + bool c02_enabled;

Your naming conventions are inconsistent. The file is named 'enable', which
is fine, but here you read the written value into a variable named 'enabled'

> + int ret;
> +
> + ret = kstrtobool(buf, &c02_enabled);
> + if (ret)
> + return ret;
> +
> + mutex_lock(&umwait_lock);
> +
> + /*
> + * The value of bit 0 in IA32_UMWAIT_CONTROL MSR is opposite of
> + * c02_enabled.

How many of those comments do we need?

> + */
> + umwait_c02 = (u32)!c02_enabled;

Umpf.

> + if (umwait_c02 == get_umwait_ctrl_c02())
> + goto out_unlock;
> +
> + WRITE_ONCE(umwait_control_cached,
> + UMWAIT_CTRL_VAL(get_umwait_ctrl_max_time(), umwait_c02));

And how often do we need to read that cached value? Why not doing the
obvious, read the ctrl value once and then work from there.

ctrl = READ_ONCE(umwait_control_cached);

if (c02_enable == umwait_c02_enabled(ctrl))
goto out_unlock;

ctrl = umwait_max_time(ctrl);
if (!c02_enable)
ctrl |= MSR_IA32_UMWAIT_CONTROL_C02_DISABLE;

WRITE_ONCE(umwait_control_cached, ctrl);

That does not need any comment about the inverted bit and whatever. It's
just clear what it does. The only place where this comment is required is
in umwait_c02_enabled() which returns the enabled state from the control
value argument as boolean. That makes also the weird type casts go away.

> static int __init umwait_init(void)
> {
> + struct device *dev;
> int ret;
>
> if (!boot_cpu_has(X86_FEATURE_WAITPKG))
> return -ENODEV;
>
> + /* Add umwait control interface. */
> + dev = cpu_subsys.dev_root;
> + ret = sysfs_create_group(&dev->kobj, &umwait_attr_group);
> + if (ret)
> + return ret;

When that fails then all CPUs have some random possibly nonsensical value
in the control MSR, whatever the firmware or reset default set it to.

Not that it matters much, because boot will probably not work at all, but
you can spare the removal below, when you move that sysfs thing to the very
end. And it makes more sense that way.

Thanks,

tglx