Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband

From: Li Zefan
Date: Tue May 12 2009 - 04:10:09 EST


CC more people on tracing issue

Ryo Tsuruta wrote:
> Hi Li,
>
> From: Li Zefan <lizf@xxxxxxxxxxxxxx>
> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband
> Date: Tue, 12 May 2009 11:49:07 +0800
>
>> Ryo Tsuruta wrote:
>>> Hi Li,
>>>
>>> From: Ryo Tsuruta <ryov@xxxxxxxxxxxxx>
>>> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband
>>> Date: Thu, 07 May 2009 09:23:22 +0900 (JST)
>>>
>>>> Hi Li,
>>>>
>>>> From: Li Zefan <lizf@xxxxxxxxxxxxxx>
>>>> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband
>>>> Date: Mon, 04 May 2009 11:24:27 +0800
>>>>
>>>>> Ryo Tsuruta wrote:
>>>>>> Hi Alan,
>>>>>>
>>>>>>> Hi Ryo -
>>>>>>>
>>>>>>> I don't know if you are taking in patches, but whilst trying to uncover
>>>>>>> some odd behavior I added some blktrace messages to dm-ioband-ctl.c. If
>>>>>>> you're keeping one code base for old stuff (2.6.18-ish RHEL stuff) and
>>>>>>> upstream you'll have to #if around these (the blktrace message stuff
>>>>>>> came in around 2.6.26 or 27 I think).
>>>>>>>
>>>>>>> My test case was to take a single 400GB storage device, put two 200GB
>>>>>>> partitions on it and then see what the "penalty" or overhead for adding
>>>>>>> dm-ioband on top. To do this I simply created an ext2 FS on each
>>>>>>> partition in parallel (two processes each doing a mkfs to one of the
>>>>>>> partitions). Then I put two dm-ioband devices on top of the two
>>>>>>> partitions (setting the weight to 100 in both cases - thus they should
>>>>>>> have equal access).
>>>>>>>
>>>>>>> Using default values I was seeing /very/ large differences - on the
>>>>>>> order of 3X. When I bumped the number of tokens to a large number
>>>>>>> (10,240) the timings got much closer (<2%). I have found that using
>>>>>>> weight-iosize performs worse than weight (closer to 5% penalty).
>>>>>> I could reproduce similar results. One dm-ioband device seems to stop
>>>>>> issuing I/Os for a few seconds at times. I'll investigate more on that.
>>>>>>
>>>>>>> I'll try to formalize these results as I go forward and report out on
>>>>>>> them. In any event, I thought I'd share this patch with you if you are
>>>>>>> interested...
>>>>>> Thanks. I'll include your patche into the next release.
>>>>>>
>>>>> IMO we should use TRACE_EVENT instead of adding new blk_add_trace_msg().
>>>> Thanks for your suggestion. I'll use TRACE_EVENT instead.
>>> blk_add_trace_msg() supports both blktrace and tracepoints. I can
>>> get messages from dm-ioband through debugfs. Could you expain why
>>> should we use TRACE_EVENT instead?
>>>
>> Actually blk_add_trace_msg() has nothing to do with tracepoints..
>>
>> If we use blk_add_trace_msg() is dm, we can use it in md, various block
>> drivers and even ext4. So the right thing is, if a subsystem wants to add
>> trace facility, it should use tracepoints/TRACE_EVENT.
>>
>> With TRACE_EVENT, you can get output through debugfs too, and it can be used
>> together with blktrace:
>>
>> # echo 1 > /sys/block/dm/trace/enable
>> # echo blk > /debugfs/tracing/current_tracer
>> # echo dm-ioband-foo > /debugfs/tracing/tracing/set_event
>> # cat /deubgfs/tracing/trace_pipe
>>
>> And you can enable dm-ioband-foo while disabling dm-ioband-bar, and you can
>> use filter feature too.
>
> Thanks for explaining.
> The base kernel of current dm tree (2.6.30-rc4) has not supported
> dm-device tracing yet. I'll consider using TRACE_EVENT when the base
> kernel supports dm-device tracing.
>

I think we don't have any plan on dm-device tracing support, do we? And you
don't need that.

I downloaded dm-ioband 1.10.4, and applied it to latest tip-tree, and made
a patch to convert those blktrace msgs to trace events.

# ls /debug/tracing/events/dm-ioband
enable ioband_endio ioband_issue_list ioband_urgent_bio
filter ioband_hold_bio ioband_make_request
# echo 1 /debug/tracing/events/dm-ioband/enable
# echo blk /debug/tracing/current_tracer
...

Here is a sample output:

pdflush-237 [001] 1706.377788: ioband_hold_bio: 8,9 ioband 1 hold_nrm 1
pdflush-237 [001] 1706.377795: 254,0 Q W 188832 + 8 [pdflush]
pdflush-237 [001] 1706.377800: ioband_hold_bio: 8,9 ioband 1 hold_nrm 2
pdflush-237 [001] 1706.377806: 254,0 Q W 197032 + 8 [pdflush]
pdflush-237 [001] 1706.377835: 254,0 C W 82232 + 8 [0]
kioband/1-2569 [001] 1706.377922: ioband_issue_list: 8,9 ioband 1 add_iss 1 1
kioband/1-2569 [001] 1706.377923: ioband_issue_list: 8,9 ioband 1 add_iss 0 0
kioband/1-2569 [001] 1706.377925: ioband_make_request: 8,9 ioband 1 pop_iss
kioband/1-2569 [001] 1706.377940: ioband_make_request: 8,9 ioband 1 pop_iss
konsole-2140 [001] 1706.377969: 254,0 C W 90432 + 8 [0]
konsole-2140 [001] 1706.378087: 254,0 C W 98632 + 8 [0]
konsole-2140 [001] 1706.378232: 254,0 C W 106832 + 8 [0]
konsole-2140 [001] 1706.378350: 254,0 C W 115032 + 8 [0]
konsole-2140 [001] 1706.378466: 254,0 C W 123232 + 8 [0]
konsole-2140 [001] 1706.378581: 254,0 C W 131432 + 8 [0]

---
drivers/md/dm-ioband-ctl.c | 19 +++++-
include/trace/events/dm-ioband.h | 134 ++++++++++++++++++++++++++++++++++++++
2 files changed, 150 insertions(+), 3 deletions(-)
create mode 100644 include/trace/events/dm-ioband.h

diff --git a/drivers/md/dm-ioband-ctl.c b/drivers/md/dm-ioband-ctl.c
index 151b178..4033724 100644
--- a/drivers/md/dm-ioband-ctl.c
+++ b/drivers/md/dm-ioband-ctl.c
@@ -17,6 +17,9 @@
#include "md.h"
#include "dm-ioband.h"

+#define CREATE_TRACE_POINTS
+#include <trace/events/dm-ioband.h>
+
#define POLICY_PARAM_START 6
#define POLICY_PARAM_DELIM "=:,"

@@ -632,9 +635,11 @@ static void hold_bio(struct ioband_group *gp, struct bio *bio)
*/
dp->g_prepare_bio(gp, bio, IOBAND_URGENT);
bio_list_add(&dp->g_urgent_bios, bio);
+ trace_ioband_hold_bio(bio, dp, true);
} else {
gp->c_blocked++;
dp->g_hold_bio(gp, bio);
+ trace_ioband_hold_bio(bio, dp, false);
}
}

@@ -675,14 +680,17 @@ static int make_issue_list(struct ioband_group *gp, struct bio *bio,
clear_group_blocked(gp);
wake_up_all(&gp->c_waitq);
}
- if (should_pushback_bio(gp))
+ if (should_pushback_bio(gp)) {
bio_list_add(pushback_list, bio);
+ trace_ioband_issue_list(bio, dp, gp, true);
+ }
else {
int rw = bio_data_dir(bio);

gp->c_stat[rw].deferred++;
gp->c_stat[rw].sectors += bio_sectors(bio);
bio_list_add(issue_list, bio);
+ trace_ioband_issue_list(bio, dp, gp, false);
}
return prepare_to_issue(gp, bio);
}
@@ -702,6 +710,7 @@ static void release_urgent_bios(struct ioband_device *dp,
dp->g_blocked--;
dp->g_issued[bio_data_dir(bio)]++;
bio_list_add(issue_list, bio);
+ trace_ioband_urgent_bio(bio, dp);
}
}

@@ -915,10 +924,14 @@ static void ioband_conduct(struct work_struct *work)

spin_unlock_irqrestore(&dp->g_lock, flags);

- while ((bio = bio_list_pop(&issue_list)))
+ while ((bio = bio_list_pop(&issue_list))) {
+ trace_ioband_make_request(bio, dp);
generic_make_request(bio);
- while ((bio = bio_list_pop(&pushback_list)))
+ }
+ while ((bio = bio_list_pop(&pushback_list))) {
+ trace_ioband_endio(bio, dp);
bio_endio(bio, -EIO);
+ }
}

static int ioband_end_io(struct dm_target *ti, struct bio *bio,
diff --git a/include/trace/events/dm-ioband.h b/include/trace/events/dm-ioband.h
new file mode 100644
index 0000000..17dd61c
--- /dev/null
+++ b/include/trace/events/dm-ioband.h
@@ -0,0 +1,134 @@
+#if !defined(_TRACE_DM_IOBAND_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_DM_IOBAND_H
+
+#include <linux/tracepoint.h>
+#include <linux/interrupt.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM dm-ioband
+
+TRACE_EVENT(ioband_hold_bio,
+
+ TP_PROTO(struct bio *bio, struct ioband_device *dp, bool urg),
+
+ TP_ARGS(bio, dp, urg),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( bool, urg )
+ __field( int, g_blocked )
+ __string( g_name, dp->g_name )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = bio->bi_bdev->bd_dev;
+ __entry->urg = urg;
+ __entry->g_blocked = dp->g_blocked;
+ __assign_str(g_name, dp->g_name);
+ ),
+
+ TP_printk("%d,%d ioband %s hold_%s %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __get_str(g_name),
+ __entry->urg ? "urg" : "nrm", __entry->g_blocked)
+);
+
+TRACE_EVENT(ioband_issue_list,
+
+ TP_PROTO(struct bio *bio, struct ioband_device *dp,
+ struct ioband_group *gp, bool pushback),
+
+ TP_ARGS(bio, dp, gp, pushback),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( bool, pushback )
+ __field( int, g_blocked )
+ __field( int, c_blocked )
+ __string( g_name, dp->g_name )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = bio->bi_bdev->bd_dev;
+ __entry->pushback = pushback;
+ __entry->g_blocked = dp->g_blocked;
+ __entry->c_blocked = gp->c_blocked;
+ __assign_str(g_name, dp->g_name);
+ ),
+
+ TP_printk("%d,%d ioband %s add_%s %d %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __get_str(g_name),
+ __entry->pushback ? "pback" : "iss",
+ __entry->g_blocked, __entry->c_blocked)
+);
+
+TRACE_EVENT(ioband_urgent_bio,
+
+ TP_PROTO(struct bio *bio, struct ioband_device *dp),
+
+ TP_ARGS(bio, dp),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( int, g_blocked )
+ __string( g_name, dp->g_name )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = bio->bi_bdev->bd_dev;
+ __entry->g_blocked = dp->g_blocked;
+ __assign_str(g_name, dp->g_name);
+ ),
+
+ TP_printk("%d,%d ioband %s urg_add_iss %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __get_str(g_name), __entry->g_blocked)
+);
+
+TRACE_EVENT(ioband_make_request,
+
+ TP_PROTO(struct bio *bio, struct ioband_device *dp),
+
+ TP_ARGS(bio, dp),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __string( g_name, dp->g_name )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = bio->bi_bdev->bd_dev;
+ __assign_str(g_name, dp->g_name);
+ ),
+
+ TP_printk("%d,%d ioband %s pop_iss",
+ MAJOR(__entry->dev), MINOR(__entry->dev), __get_str(g_name))
+);
+
+TRACE_EVENT(ioband_endio,
+
+ TP_PROTO(struct bio *bio, struct ioband_device *dp),
+
+ TP_ARGS(bio, dp),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __string( g_name, dp->g_name )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = bio->bi_bdev->bd_dev;
+ __assign_str(g_name, dp->g_name);
+ ),
+
+ TP_printk("%d,%d ioband %s pop_pback",
+ MAJOR(__entry->dev), MINOR(__entry->dev), __get_str(g_name))
+);
+
+#endif /* _TRACE_DM_IOBAND_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
+
+
--
1.5.4.rc3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/