Re: [patch] a simple hardware detector for latency as well asthroughput ver. 0.1.0

From: Luming Yu
Date: Mon Jun 25 2012 - 09:20:53 EST


On Wed, Jun 13, 2012 at 11:07 PM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> Hi Luming,
>
> On Monday 28 May 2012, Luming Yu wrote:
>> The patch is the fist step to test some basic hardware functions like
>> TSC to help people understand if there is any hardware latency as well as
>> throughput problem exposed on bare metal or left behind by BIOS or
>> interfered by SM. Currently the patch tests hardware features
>> (tsc,freq, and rdrandom whiich is new instruction to get random
>> number) in stop_machine context. I will add more after the first step
>> get merged for those guys who want to directly play with new hardware
>> functions.
>>
>> I suppose I can add your signed-off-by as the code is derived from your
>> hwlat_dector.
>>
>> I'm also reuqesting if you are going to queue it up somewhere that can
>> be pulled into 3.5.
>>
>> Of cause, I will update the patch based upon any comments that you
>> think must be fixed for 3.5 merge.
>>
>> Signed-off-by Luming Yu <luming.yu@xxxxxxxxx>
>>
>>
>> ÂKconfig  |  Â7
>> ÂMakefile Â| Â Â2
>> Âhw_test.c | Â954
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Â3 files changed, 963 insertions(+)
>
> Please write the text in the email so that it can be used as a permanent
> changelog entry for the kernel git history. Everything that you do not
> want to have in that history can go under the "---" line that separates
> the Signed-off-by from the diffstat.

Done in patch update.

>
>> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
>> index c779509..f66ad56 100644
>> --- a/drivers/misc/Kconfig
>> +++ b/drivers/misc/Kconfig
>> @@ -123,6 +123,13 @@ config IBM_ASM
>> Â Â Â Â for information on the specific driver level and support statement
>> Â Â Â Â for your IBM server.
>>
>> +config HW_TEST
>> + Â Â tristate "Testing module to detect hardware lattency and throughput"
>> + Â Â depends on DEBUG_FS
>> + Â Â depends on RING_BUFFER
>> + Â Â depends on X86
>> + Â Â default m
>> +
>> Âconfig PHANTOM
>> Â Â Â tristate "Sensable PHANToM (PCI)"
>> Â Â Â depends on PCI
>
> Any reason why this depends on X86?
>
> I think the name "HW_TEST" is too generic. Maybe "HW_LATENCY_TEST"?

Done in update

>
>> +static struct ring_buffer *ring_buffer;
>> +static DEFINE_MUTEX(ring_buffer_mutex);
>> +static unsigned long buf_size = 262144UL;
>> +static struct task_struct *kthread;
>> +
>> +struct sample;
>> +struct data;
>> +
>> +struct sample_function {
>> + Â Â const char *name;
>> + Â Â struct list_head list;
>> + Â Â int (*get_sample)(void *unused);
>> +};
>
> why the "unused" argument?
It's a placeholder for future use although it's not used right now in
current tests.

>
>> +static struct sample_function *current_sample_func = NULL;
>> +static LIST_HEAD(sample_function_list);
>> +static DEFINE_MUTEX(sample_function_mutex);
>> +static int sample_function_register(struct sample_function *sf);
>> +
>> +static struct dentry *debug_dir;
>> +static struct dentry *debug_available;
>> +static struct dentry *debug_current;
>> +static struct dentry *debug_max;
>> +static struct dentry *debug_count;
>> +static struct dentry *debug_sample_width;
>> +static struct dentry *debug_sample_window;
>> +static struct dentry *debug_sample;
>> +static struct dentry *debug_threshold;
>> +static struct dentry *debug_enable;
>
> I think it would help here to put the local variables into a data structure
> that gets allocated at module load time.

I've reduced the code redundancy in update.

>
>> +static int __buffer_add_sample(struct sample *sample);
>> +static struct sample *buffer_get_sample(struct sample *sample);
>> +
> ...
>> +static ssize_t debug_enable_fwrite(struct file *filp,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â const char __user *user_buffer,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â size_t user_size, loff_t *offset);
>> +
>> +static int init_debugfs(void);
>> +static void free_debugfs(void);
>> +static int hw_test_init(void);
>> +static void hw_test_exit(void);
>
> Please reorder all the functions so that you no longer need forward
> declarations.

Done

>
>> +static int get_random_bytes_sample(void *unused)
>> +{
>> + Â Â u32 *buffer;
>> + Â Â ktime_t start, t1, t2;
>> + Â Â s64 Â Â diff, total = 0;
>> + Â Â u64 Â Â sample = 0;
>> +   int   ret = 1;
>> +
>> + Â Â buffer = kzalloc(1024, GFP_KERNEL);
>> +
>> + Â Â start = ktime_get();
>> + Â Â do {
>> +
>> + Â Â Â Â Â Â t1 = ktime_get();
>> + Â Â Â Â Â Â get_random_bytes(buffer, 1024);
>> + Â Â Â Â Â Â t2 = ktime_get();
>> + Â Â Â Â Â Â total = ktime_to_us(ktime_sub(t2, start));
>> + Â Â Â Â Â Â diff = ktime_to_us(ktime_sub(t2, t1));
>
> You need more comments to explain why you are doing things. For instance
> here: why is it a useful thing to test how long get_random_bytes()
> takes?

It is for testing new instruction rdrand for new cpus like Intel Ivy Bridge.

>
>> +static int get_freq_sample(void *unused)
>> +{
>> + Â Â ktime_t start, t1, t2;
>> + Â Â s64 Â Â diff, total = 0;
>> + Â Â u32 Â Â sample = 0;
>> +   int   ret = 1;
>> + Â Â unsigned int cpu_tsc_freq;
>> + Â Â static DEFINE_MUTEX(freq_pit_mutex);
>> +
>> + Â Â start = ktime_get();
>> + Â Â do {
>> + Â Â Â Â Â Â t1 = ktime_get();
>> + Â Â Â Â Â Â mutex_lock(&freq_pit_mutex);
>> + Â Â Â Â Â Â cpu_tsc_freq = x86_platform.calibrate_tsc();
>> + Â Â Â Â Â Â mutex_unlock(&freq_pit_mutex);
>
> Or the calibrate_tsc() function.
>
> It's also not clear what function the freq_pit_mutex has. I don't
> see how the code could ever get run twice at the same time.

The purpose is to sequentially test all cpu's frequency.

>
>> +static int kthread_fn(void *unused)
>> +{
>> + Â Â int err = 0;
>> + Â Â u64 interval = 0;
>> + Â Â int (*get_sample)(void *unused);
>> +
>> + Â Â mutex_lock(&sample_function_mutex);
>> + Â Â if (current_sample_func)
>> +static int hw_test_init(void)
>> +{
>> + Â Â int ret = -ENOMEM;
>> +
>> + Â Â printk(KERN_INFO BANNER "version %s\n", VERSION);
>> +
>> + Â Â sample_function_register(&tsc_sample);
>> + Â Â sample_function_register(&tsc_freq_sample);
>> + Â Â sample_function_register(&random_bytes_sample);
>> +
>> + Â Â ret = init_stats();
>> + Â Â if (0 != ret)
>> + Â Â Â Â Â Â goto out;
>> + Â Â ret = init_debugfs();
>> + Â Â if (0 != ret)
>> + Â Â Â Â Â Â goto err_stats;
>> + Â Â if (enabled)
>> + Â Â Â Â Â Â ret = start_kthread();
>> + Â Â goto out;
>> +
>> +err_stats:
>> + Â Â ring_buffer_free(ring_buffer);
>> +out:
>> + Â Â return ret;
>> +}
>> +
>> +static void hw_test_exit(void)
>> +{
>> + Â Â int err;
>> +
>> + Â Â if (enabled) {
>> + Â Â Â Â Â Â enabled = 0;
>> + Â Â Â Â Â Â err = stop_kthread();
>> + Â Â Â Â Â Â if (err)
>> + Â Â Â Â Â Â Â Â Â Â printk(KERN_ERR BANNER "cannot stop kthread\n");
>> + Â Â }
>> +
>> + Â Â free_debugfs();
>> + Â Â ring_buffer_free(ring_buffer);
>> +}
>> +
>> +module_init(hw_test_init);
>> +module_exit(hw_test_exit);
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at Âhttp://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at Âhttp://www.tux.org/lkml/
>>
>
>> + Â Â Â Â Â Â get_sample = current_sample_func->get_sample;
>> + Â Â else
>> + Â Â Â Â Â Â goto out;
>> +
>> + Â Â while (!kthread_should_stop()) {
>> + Â Â Â Â Â Â mutex_lock(&data.lock);
>> +
>> + Â Â Â Â Â Â err = stop_machine(get_sample, unused, cpu_online_mask);
>> + Â Â Â Â Â Â if (err) {
>> + Â Â Â Â Â Â Â Â Â Â mutex_unlock(&data.lock);
>> + Â Â Â Â Â Â Â Â Â Â goto err_out;
>> + Â Â Â Â Â Â }
>> +
>> + Â Â Â Â Â Â wake_up(&data.wq);
>
> Using stop_machine() sounds like a very expensive thing to do here.
> Is it necessary?

Yes, the only purpose of the tool is to measure various hardware
operation's latency
as well as bandwidth. We need the test done with least invasion from others.

>
>> + Â Â Â Â Â Â interval = data.sample_window - data.sample_width;
>> + Â Â Â Â Â Â do_div(interval, USEC_PER_MSEC);
>
> Have you tried avoiding the need for do_div? You could use an "unsigned long"
> to count the microseconds here, or you could do everything using miliseconds.

It's out the testing loops. I didn't notice it hurts testing results
so far. I will change it
in 0.1-> 0.2

>
>> +static int start_kthread(void)
>> +{
>> + Â Â kthread = kthread_run(kthread_fn, NULL, DRVNAME);
>> + Â Â if (IS_ERR(kthread)) {
>> + Â Â Â Â Â Â printk(KERN_ERR BANNER "could not start sampling thread\n");
>> + Â Â Â Â Â Â enabled = 0;
>> + Â Â Â Â Â Â return -ENOMEM;
>> + Â Â }
>> + Â Â return 0;
>> +}
>> +
>> +static int stop_kthread(void)
>> +{
>> + Â Â int ret;
>> + Â Â ret = kthread_stop(kthread);
>> + Â Â return ret;
>> +}
>
> These helper functions do not appear to help. Just remove them.
There is a data.lock encapsulated in the helper. I can't remove them
otherwise the data protection will get lost.

>
>> +static ssize_t simple_data_read(struct file *filp, char __user *ubuf,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â size_t cnt, loff_t *ppos, const u64 *entry)
>> +{
>> + Â Â char buf[U64STR_SIZE];
>> + Â Â u64 val = 0;
>> + Â Â int len = 0;
>> +
>> + Â Â memset(buf, 0, sizeof(buf));
>> + Â Â if (!entry)
>> + Â Â Â Â Â Â return -EFAULT;
>> + Â Â mutex_lock(&data.lock);
>> + Â Â val = *entry;
>> + Â Â mutex_unlock(&data.lock);
>> + Â Â len = snprintf(buf, sizeof(buf), "%llu\n", (unsigned long long)val);
>> + Â Â return simple_read_from_buffer(ubuf, cnt, ppos, buf, len);
>> +}
>> +
>> +static ssize_t simple_data_write(struct file *filp, const char __user *ubuf,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â size_t cnt, loff_t *ppos, u64 *entry)
>> +{
>> + Â Â char buf[U64STR_SIZE];
>> + Â Â int csize = min(cnt, sizeof(buf));
>> + Â Â u64 val = 0;
>> + Â Â int err = 0;
>> +
>> + Â Â memset(buf, '\0', sizeof(buf));
>> + Â Â if (copy_from_user(buf, ubuf, csize))
>> + Â Â Â Â Â Â return -EFAULT;
>> + Â Â buf[U64STR_SIZE-1] = '\0';
>> + Â Â err = strict_strtoull(buf, 10, &val);
>> + Â Â if (err)
>> + Â Â Â Â Â Â return -EINVAL;
>> + Â Â mutex_lock(&data.lock);
>> + Â Â *entry = val;
>> + Â Â mutex_unlock(&data.lock);
>> + Â Â return csize;
>> +}
>
> These look like you can use DEFINE_SIMPLE_ATTRIBUTE() or debugfs_create_u64()
> to simplify your code.

I've simplified syfs interface code in other ways. Basically use define magic.

>
>> +static int debug_available_fopen(struct inode *inode, struct file *filp)
>> +{
>> + Â Â return 0;
>> +}
>
> Just use simple_open() or no open function in the file_operations, there is
> no need to provide an empty function.

I've used define magic in patch update.
>
>> +static ssize_t debug_available_fwrite(struct file *filp, const char __user *ubuf,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â size_t Âcnt, loff_t *ppos)
>> +{
>> + Â Â return (ssize_t) 0;
>> +}
>
> Same for empty write functions.

All empty functions are gone.

>
>> +static int init_debugfs(void)
>> +{
>> + Â Â int ret = -ENOMEM;
>> +
>> + Â Â debug_dir = debugfs_create_dir(DRVNAME, NULL);
>> + Â Â if (!debug_dir)
>> + Â Â Â Â Â Â goto err_debug_dir;
>> +
>> + Â Â debug_available = debugfs_create_file("available", 0444,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â debug_dir, NULL,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â &available_fops);
>> + Â Â if (!debug_available)
>> + Â Â Â Â Â Â goto err_available;
>> +
>> + Â Â debug_current = debugfs_create_file("current", 0444,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â debug_dir, NULL,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â &current_fops);
>> + Â Â if (!debug_current)
>> + Â Â Â Â Â Â goto err_current;
>> +
>> + Â Â debug_sample = debugfs_create_file("sample", 0444,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â debug_dir, NULL,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â &sample_fops);
>> + Â Â if (!debug_sample)
>> + Â Â Â Â Â Â goto err_sample;
>> +
>> + Â Â debug_count = debugfs_create_file("count", 0444,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â debug_dir, NULL,
>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â &count_fops);
>
> Just use an array of file_operations and create the files using a loop.

Done in update patch.

>
>
> Â Â Â ÂArnd

Thanks for detailed review. An update patch is being prepared right
now. Will send in few minutes.

/l
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/