Re: [patch] voluntary-preempt-2.6.9-rc1-bk4-R1

From: K.R. Foley
Date: Fri Sep 10 2004 - 13:05:08 EST


K.R. Foley wrote:
Ingo Molnar wrote:

* Mark_H_Johnson@xxxxxxxxxxxx <Mark_H_Johnson@xxxxxxxxxxxx> wrote:


If you look at the date / time of the traces, you will notice that
most occur in the latter part of the test. This is during the "disk
copy" and "disk read" parts of the testing. [...]



would it be possible to test with DMA disabled? (hdparm -d0 /dev/hda) It
might take some extra work to filter out the extra latency reports from
the PIO IDE path (which is quite slow), but once that is done you should
be able to see whether these long 0.5 msec delays remain even when all
(or most) DMA activity has been eliminated.


preemption latency trace v1.0.5 on 2.6.9-rc1-VP-R1
--------------------------------------------------
latency: 550 us, entries: 6 (6)
-----------------
| task: cat/6771, uid:0 nice:0 policy:0 rt_prio:0
-----------------
=> started at: kmap_atomic+0x23/0xe0
=> ended at: kunmap_atomic+0x7b/0xa0
=======>
00000001 0.000ms (+0.000ms): kmap_atomic (file_read_actor)
00000001 0.000ms (+0.000ms): page_address (file_read_actor)
00000001 0.000ms (+0.549ms): __copy_to_user_ll (file_read_actor)
00000001 0.550ms (+0.000ms): kunmap_atomic (file_read_actor)
00000001 0.550ms (+0.000ms): sub_preempt_count (kunmap_atomic)
00000001 0.550ms (+0.000ms): update_max_trace (check_preempt_timing)



this is a full page copy, from a kernelspace pagecache page into
userspace. This shouldn't take 500 usecs on any hardware. Since this is
a single instruction (memcpy's rep; movsl), there's nothing that Linux
can do to avoid (or even to cause) such a situation.
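
For context, here is a minimal sketch of the copy path the trace points
at. The kmap_atomic/copy/kunmap_atomic shape matches the trace above,
but the function name copy_page_to_user_sketch is made up and this is
not the exact 2.6.9 file_read_actor() source:

#include <linux/highmem.h>
#include <linux/mm.h>
#include <asm/uaccess.h>

/*
 * Illustrative only. kmap_atomic() disables preemption, so the whole
 * user copy below runs as one non-preemptible section -- which is why
 * the tracer starts at kmap_atomic and ends at kunmap_atomic, and
 * charges the ~550 usecs to __copy_to_user_ll in between. (The real
 * read path first faults in the user buffer, so this atomic copy
 * cannot sleep.)
 */
static unsigned long copy_page_to_user_sketch(char __user *buf,
                                              struct page *page,
                                              unsigned long offset,
                                              unsigned long size)
{
        char *kaddr;
        unsigned long left;

        kaddr = kmap_atomic(page, KM_USER0);    /* preempt_count++ */
        left = __copy_to_user(buf, kaddr + offset, size);
        kunmap_atomic(kaddr, KM_USER0);         /* preempt_count-- */

        return size - left;                     /* bytes actually copied */
}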


I saw this one (or one very similar) on a system that I just started testing today. Not quite as high (~219 usec if I remember correctly). I don't have access to the system from here, so I will forward the trace tomorrow when I'm there. However, I haven't seen this on my slower system running the same stress tests. There are several possible points of interest:

System I saw this on:
P4 2.4GHz
512MB memory
2.6.9-rc1-bk12-S0 built for SMP (even though hyperthreading is currently off)


System I haven't seen this on:
PII 450
256MB memory
2.6.9-rc1-bk12-R6 built for UP

Sorry I don't have more complete data in front of me. I will send the concrete info tomorrow with the trace.


As promised, here is the trace:

http://www.cybsft.com/testresults/2.6.9-rc1-bk12-S0/trace2.txt

I also have some other traces from this system that I have not seen on my slower system. For instance, this one, where we spend ~204 usec in __spin_lock_irqsave:

http://www.cybsft.com/testresults/2.6.9-rc1-bk12-S0/trace1.txt
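
A figure like that is at least plausible on SMP under contention:
spin_lock_irqsave() busy-waits with local interrupts disabled until the
current owner drops the lock, so the tracer charges the entire wait to
the acquiring CPU. A minimal sketch of the pattern (some_lock and
contended_path are made-up names; the locking calls are the standard
2.6 API):

#include <linux/spinlock.h>

static spinlock_t some_lock = SPIN_LOCK_UNLOCKED;       /* hypothetical */

static void contended_path(void)
{
        unsigned long flags;

        /*
         * If another CPU holds some_lock, this call spins with local
         * interrupts disabled for the owner's full hold time -- that
         * spin, not the critical section below, would be the ~204 usec
         * attributed to __spin_lock_irqsave().
         */
        spin_lock_irqsave(&some_lock, flags);
        /* ... short critical section ... */
        spin_unlock_irqrestore(&some_lock, flags);
}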

Or this one, where we spend ~203 usec in sched_clock. That just doesn't seem possible.

http://www.cybsft.com/testresults/2.6.9-rc1-bk12-S0/trace4.txt
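
Indeed, sched_clock() itself can hardly be the cost: on x86 of this era
it is essentially one rdtsc plus a cycles-to-nanoseconds scaling,
roughly as sketched below (simplified from memory, not the exact 2.6.9
source; sched_clock_sketch is a made-up name, cpu_khz is the kernel's
measured CPU clock):

#include <asm/msr.h>            /* rdtscll() */
#include <asm/timex.h>          /* cpu_khz */
#include <asm/div64.h>          /* do_div() */

/*
 * Simplified sketch of the TSC-based sched_clock(): only a few
 * instructions, so it cannot itself account for ~203 usec. The real
 * code uses a precomputed multiply+shift instead of this division,
 * which also avoids the overflow this naive form would have.
 */
unsigned long long sched_clock_sketch(void)
{
        unsigned long long ns;

        rdtscll(ns);                    /* one rdtsc instruction */
        ns *= 1000000ULL;               /* cycles -> ns numerator */
        do_div(ns, cpu_khz);            /* ns = cycles * 10^6 / cpu_khz */
        return ns;
}

So a ~203 usec reading there more likely means the CPU was stolen
between two timestamps (an SMI, for example) than that the function
itself ran that long.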

kr