[PATCH] arch/x86/kernel/tsc.c : set X86_FEATURE_ART for TSC on CPUs like i7-4910MQ : bug #194609

From: Jason Vas Dias
Date: Sun Feb 19 2017 - 10:36:08 EST


Patch to make tsc.c set X86_FEATURE_ART and setup the TSC_ADJUST MSR
correctly on my "i7-4910MQ" CPU, which reports
( boot_cpu_data.cpuid_level==0x13 &&
boot_cpu_data.extended_cpuid_level==0x80000008
), so the code didn't think it supported CPUID:15h, but it does .

Patch:
<quote><code><pre><patch>
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 46b2f41..f76cca8 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1030,6 +1030,7 @@ core_initcall(cpufreq_register_tsc_scaling);
#endif /* CONFIG_CPU_FREQ */

#define ART_CPUID_LEAF (0x15)
+#define MINIMUM_CPUID_EXTENDED_LEAF_THAT_MUST_HAVE_ART (0x80000008)
#define ART_MIN_DENOMINATOR (1)


@@ -1038,24 +1039,43 @@ core_initcall(cpufreq_register_tsc_scaling);
*/
static void detect_art(void)
{
- unsigned int unused[2];
-
- if (boot_cpu_data.cpuid_level < ART_CPUID_LEAF)
- return;
-
- cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
- &art_to_tsc_numerator, unused, unused+1);
-
+ unsigned int v[2];
+
+ if(boot_cpu_data.cpuid_level < ART_CPUID_LEAF)
+ {
+ if(boot_cpu_data.extended_cpuid_level >=
MINIMUM_CPUID_EXTENDED_LEAF_THAT_MUST_HAVE_ART)
+ {
+ pr_info("Would normally not use ART -
cpuid_level:%d < %d - but extended_cpuid_level is: %x, so probing for
ART support.\n",
+ boot_cpu_data.cpuid_level, ART_CPUID_LEAF,
boot_cpu_data.extended_cpuid_level);
+ }else
+ return;
+ }
+
+ cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
+ &art_to_tsc_numerator, v, v+1);
+
/* Don't enable ART in a VM, non-stop TSC required */
if (boot_cpu_has(X86_FEATURE_HYPERVISOR) ||
- !boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ||
- art_to_tsc_denominator < ART_MIN_DENOMINATOR)
- return;
-
- if (rdmsrl_safe(MSR_IA32_TSC_ADJUST, &art_to_tsc_offset))
- return;
-
+ !boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ||
+ art_to_tsc_denominator < ART_MIN_DENOMINATOR)
+ {
+ pr_info("Not using Intel ART for TSC - HYPERVISOR:%d
NO NONSTOP_TSC:%d bad TSC/Crystal ratio denominator: %d.",
boot_cpu_has(X86_FEATURE_HYPERVISOR),
!boot_cpu_has(X86_FEATURE_NONSTOP_TSC), art_to_tsc_denominator );
+ return;
+ }
+ if ( (v[0]=rdmsrl_safe(MSR_IA32_TSC_ADJUST,
&art_to_tsc_offset))!=0) /* will get fault on first read if nothing
written yet */
+ {
+ if((v[1]=wrmsrl_safe(MSR_IA32_TSC_ADJUST, 0))!=0)
+ {
+ pr_info("Not using Intel ART for TSC - failed
to initialize TSC_ADJUST: %d %d.\n", v[0], v[1] );
+ return;
+ }else
+ {
+ art_to_tsc_offset = 0; /* perhaps initalize
to -1 * current rdtsc value ? */
+ pr_info("Using Intel ART for TSC - TSC_ADJUST
initialized to %llu.\n",art_to_tsc_offset);
+ }
+ }
/* Make this sticky over multiple CPU init calls */
+ pr_info("Using Intel Always Running Timer (ART) feature %x
for TSC on all CPUs - TSC/CCC: %d/%d offset: %llu.\n",
X86_FEATURE_ART, art_to_tsc_numerator, art_to_tsc_denominator,
art_to_tsc_offset );
setup_force_cpu_cap(X86_FEATURE_ART);
}
</patch></quote></code></pre>

I originally reported this issue on bugzilla.kernel.org : bug # 194609 :
https://bugzilla.kernel.org/show_bug.cgi?id=194609
, but it was not posted to the list , & then I posted it to the list, but
Julia Lawell <julia.lawall@xxxxxxx> kindly suggested I should re-post with
patch inline, & include extra recipients, which includes the last
person to modify tsc.c (Prarit), so am doing so.

My CPU reports 'model name' as
"Intel(R) Core(TM) i7-4910MQ CPU @ 2.90GHz" ,
has 4 physical & 8 hyperthreading cores with a frequency scalable from 800000
to 3900000 (/sys/devices/system/cpu/cpu0/cpufreq/scaling_{min,max}_freq) , and
flags :
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1
sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase
tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm xsaveopt dtherm
ida arat pln pts

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
$

CPUID:15H is available in user-space, returning the integers : ( 7,
832, 832 ) in EAX:EBX:ECX , yet boot_cpu_data.cpuid_level is 13 , so
in detect_art() in tsc.c,
Linux does not think ART is enabled, and does not set the synthesized CPUID +
((3*32)+10) bit, so a program looking at /dev/cpu/0/cpuid would not
see this bit set .
if an e1000 NIC card had been installed, PTP would not be available.
Also, if the MSR TSC_ADJUST has not yet been written, as it seems to be
nowhere else in Linux, the code will always think X86_FEATURE_ART is 0
because the CPU will always get a fault reading the MSR since it has
never been written.

So the attached patch makes tsc.c set X86_FEATURE_ART correctly in tsc.c ,
and set the TSC_ADJUST to 0 if the rdmsr gets an error .
Please consider applying it to a future linux version.

It would be nice for user-space programs that want to use the TSC with
rdtsc / rdtscp instructions, such as the demo program attached to the
bug report,
could have confidence that Linux is actually generating the results of
clock_gettime(CLOCK_MONOTONIC_RAW, &timespec)
in a predictable way from the TSC by looking at the
/dev/cpu/0/cpuid[bit(((3*32)+10)] value before enabling user-space
use of TSC values, so that they can correlate TSC values with linux
clock_gettime() values.

The patch applies to linux kernels v4.8 & v4.9.10 GIT tags and the
kernels build
and run & the demo program produces results like :
$ ./ttsc1
has tsc: 1 constant: 1
832 / 7 = 118 : 832 - 9.888914286E+04hz : OK:1
Hooray! TSC is enabled with KHz: 2893300
ts2 - ts1: 261 ts3 - ts2: 211 ns1: 0.000000146 ns2: 0.000001629
ts3 - ts2: 27 ns1: 0.000000168
ts3 - ts2: 20 ns1: 0.000000147
ts3 - ts2: 14 ns1: 0.000000152
ts3 - ts2: 15 ns1: 0.000000151
ts3 - ts2: 15 ns1: 0.000000153
ts3 - ts2: 15 ns1: 0.000000150
ts3 - ts2: 20 ns1: 0.000000148
ts3 - ts2: 19 ns1: 0.000000164
ts3 - ts2: 19 ns1: 0.000000164
ts3 - ts2: 19 ns1: 0.000000160
t1 - t0: 52901 - ns2: 0.000053951

The value 'ts3 - ts2' is the number of nanoseconds measured by
successive calls to
'rdtscp'; the 'ns1' value is the number of nanoseconds (shown as
decimal seconds)
measured by
clock_gettime(CLOCK_MONOTONIC_RAW, &ts2) -
clock_gettime(CLOCK_MONOTONIC_RAW, &ts1)
when casting each {ts.tv_sec, ts.tv_nsec} to a 128 bit long long integer .
It shows a user-space program can read the TSC with a latency of @20ns
but can only measure times >= @ 140ns using Linux clock_gettime() on this CPU.

Please make Linux provide some help to programs that want to use the
TSC from user-space by applying the patch so they can determine with
confidence that Linux
is supplying TSC values with a predictable conversion function in response to
clock_gettime(CLOCK_MONOTONIC_RAW,&ts) system calls. It would also
be nice if it actually exported the 'Refined TSC calibrated frequency'
(tsc_khz)
in sysfs , but that's another story.

Best Regards,
Jason

Attachment: x86_kernel_tsc-bz194609.patch
Description: Binary data