Re: [PATCH v2 1/3] x86/cpufeatures: Enumerate user wait instructions

From: Andy Lutomirski
Date: Thu Feb 21 2019 - 01:37:45 EST


On Wed, Feb 20, 2019 at 7:44 PM Tao Xu <tao3.xu@xxxxxxxxx> wrote:
>
> From: Fenghua Yu <fenghua.yu@xxxxxxxxx>

>
> From patchwork Wed Jan 16 21:18:41 2019
> Content-Type: text/plain; charset="utf-8"

[snipped more stuff like this]

What happened here?

> +/* Return value that will be used to set umwait control MSR */
> +static inline u32 umwait_control_val(void)
> +{
> + /*
> + * Enable or disable C0.2 (bit 0) based on global setting on all CPUs.
> + * When bit 0 is 1, C0.2 is disabled. Otherwise, C0.2 is enabled.
> + * So value in bit 0 is opposite of umwait_enable_c0_2.
> + */
> + return ~umwait_enable_c0_2 & UMWAIT_CONTROL_C02_MASK;
> +}

This function is horribly named. How about something like
umwait_compute_msr_value() or something liek that? Also, what
happened to the maximum wait time?

> +
> +static ssize_t umwait_enable_c0_2_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + return sprintf(buf, "%d\n", umwait_enable_c0_2);

I realize that it's traditional to totally ignore races in sysfs and
such, but it's a bad tradition. Please either READ_ONCE it with a
comment or take the mutex.

> +static ssize_t umwait_enable_c0_2_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + int enable_c0_2, cpu, ret;
> + u32 msr_val;
> +
> + ret = kstrtou32(buf, 10, &enable_c0_2);
> + if (ret)
> + return ret;
> +
> + if (enable_c0_2 != 1 && enable_c0_2 != 0)
> + return -EINVAL;

How about if (enable_c0_2 > 1)?

> +
> + mutex_lock(&umwait_lock);
> +
> + umwait_enable_c0_2 = enable_c0_2;
> + msr_val = umwait_control_val();
> + get_online_cpus();
> + /* All CPUs have same umwait control setting */
> + for_each_online_cpu(cpu)
> + wrmsr_on_cpu(cpu, MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
> + put_online_cpus();
> +
> + mutex_unlock(&umwait_lock);

Please factor this thing out into a helper like
umwait_update_all_cpus(). That helper can assert that the lock is
held.

> +/* Set up umwait control MSR on this CPU using the current global setting. */
> +static int umwait_cpu_online(unsigned int cpu)
> +{
> + u32 msr_val;
> +
> + mutex_lock(&umwait_lock);
> +
> + msr_val = umwait_control_val();
> + wrmsr(MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
> +
> + mutex_unlock(&umwait_lock);
> +
> + return 0;
> +}
> +
> +static int __init umwait_init(void)
> +{
> + struct device *dev;
> + int ret;
> +
> + if (!boot_cpu_has(X86_FEATURE_WAITPKG))
> + return -ENODEV;
> +
> + /* Add CPU global user wait interface to control umwait. */
> + dev = cpu_subsys.dev_root;
> + ret = sysfs_create_group(&dev->kobj, &umwait_attr_group);
> + if (ret)
> + return ret;
> +
> + ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
> + umwait_cpu_online, NULL);

This hotplug notifier thing is awful. Thomas, do we have a function
that gets called every time a CPU is brought up (via BSP boot, AP
boot, hotplug, hibernation resume, etc) where we can just put all
these things? cpu_init() is almost appropriate, except that it's
called at somewhat erratic times (quite different for BSP and AP IIRC)
and it's not called AFAICT during hibernation restore. I suppose we
could add a new thing that is called by cpu_init() and
restore_processor_state().

Also, surely you should actually write the MSR in this function, too.

>
> static int umwait_enable_c0_2 = 1; /* 0: disable C0.2. 1: enable C0.2. */
> +static u32 umwait_max_time; /* In TSC-quanta. Only bits [31:2] are used. */

I still think the default should be some reasonable nonzero value. It
should be long enough that we get decent C0.2 residency and short
enough that UMWAIT never gives the impression that it is anything
other than a fancy way to save a bit of power and SMT resources when
spinning. I don't want to see a situation where some library uses
UMWAIT under the expectation that it genuinely waits for an event,
appears to work well on bare metal on an otherwise idle system, and
falls apart when it's run in a VM guest or with other software
running. IOW, programs more or less must be written to expect many
spurious wakeups, so I think we should pick a default value so that
there are essentially always many spurious wakeups.

As a guess, I think that the default wait time should be well under 1
ms but at least 20x the C0.2 entry+exit latency.

--Andy
> static ssize_t umwait_enable_c0_2_show(struct device *dev,
> @@ -61,8 +63,46 @@ static ssize_t umwait_enable_c0_2_store(struct device *dev,
>
> static DEVICE_ATTR_RW(umwait_enable_c0_2);
>
> +static ssize_t umwait_max_time_show(struct device *kobj,
> + struct device_attribute *attr, char *buf)
> +{
> + return sprintf(buf, "%u\n", umwait_max_time);
> +}
> +
> +static ssize_t umwait_max_time_store(struct device *kobj,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + u32 msr_val, max_time;
> + int cpu, ret;
> +
> + ret = kstrtou32(buf, 10, &max_time);
> + if (ret)
> + return ret;
> +
> + mutex_lock(&umwait_lock);
> +
> + /* Only get max time value from bits [31:2] */
> + max_time &= UMWAIT_CONTROL_MAX_TIME_MASK;
> + /* Update the max time value in memory */
> + umwait_max_time = max_time;
> + msr_val = umwait_control_val();
> + get_online_cpus();
> + /* All CPUs have same umwait max time */
> + for_each_online_cpu(cpu)
> + wrmsr_on_cpu(cpu, MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
> + put_online_cpus();
> +
> + mutex_unlock(&umwait_lock);
> +
> + return count;
> +}
> +
> +static DEVICE_ATTR_RW(umwait_max_time);
> +
> static struct attribute *umwait_attrs[] = {
> &dev_attr_umwait_enable_c0_2.attr,
> + &dev_attr_umwait_max_time.attr,
> NULL
> };
>