On Fri, Apr 13, 2018 at 03:07:18PM +0200, Steffen Maier wrote:
Complements v2.6.31 commit 55782138e47d ("tracing/events: convert block
trace points to TRACE_EVENT()") to be equivalent to traditional blktrace
output. Also this allows event filtering to not always get all (un)plug
events.
NB: The NULL pointer check for q->kobj.parent is certainly racy and
I don't have enough experience if it's good enough for a trace event.
The change did work for my cases (block device read/write I/O on
zfcp-attached SCSI disks and dm-mpath on top).
While I haven't seen any prior art using driver core (parent) relations
for trace events, there are other cases using this when no direct pointer
exists between objects, such as:
#define to_scsi_target(d) container_of(d, struct scsi_target, dev)
static inline struct scsi_target *scsi_target(struct scsi_device *sdev)
{
return to_scsi_target(sdev->sdev_gendev.parent);
}
That is because you "know" the parent of a target device is a
scsi_target.
This is the object model we make use of here:
struct gendisk {
struct hd_struct {
struct device { /*container_of*/
struct kobject kobj; <--+
dev_t devt; /*deref*/ |
} __dev; |
} part0; |
struct request_queue *queue; ..+ |
} : |
: |
struct request_queue { <..............+ |
/* queue kobject */ |
struct kobject { |
struct kobject *parent; --------+
Are you sure about this?
} kobj;
}
The difference to blktrace parsed output is that block events don't use the
partition's minor number but the containing block device's minor number:
Why do you want the block device's minor number here? What is wrong
with the partition's minor number? I would think you want that instead.
static void blk_add_trace_plug(void *ignore, struct request_queue *q)^^^^^^^^^^^^
{
struct blk_trace *bt = q->blk_trace;
^^^^^^^^^^^^
if (bt)
__blk_add_trace(bt, 0, 0, 0, 0, BLK_TA_PLUG, 0, 0, NULL, NULL);
}
static void blk_add_trace_unplug(void *ignore, struct request_queue *q,
unsigned int depth, bool explicit)
{
struct blk_trace *bt = q->blk_trace;
if (bt) {
__be64 rpdu = cpu_to_be64(depth);
u32 what;
if (explicit)
what = BLK_TA_UNPLUG_IO;
else
what = BLK_TA_UNPLUG_TIMER;
__blk_add_trace(bt, 0, 0, 0, 0, what, 0, sizeof(rpdu), &rpdu, NULL);
}
}
struct blk_trace {^^^
int trace_state;
struct rchan *rchan;
unsigned long __percpu *sequence;
unsigned char __percpu *msg_data;
u16 act_mask;
u64 start_lba;
u64 end_lba;
u32 pid;
u32 dev;
struct dentry *dir;
struct dentry *dropped_file;
struct dentry *msg_file;
struct list_head running_list;
atomic_t dropped;
};
$ dd if=/dev/sdf1 count=1
$ cat /sys/kernel/debug/tracing/trace
block_bio_remap: 8,80 R 2048 + 32 <- (8,81) 0
block_bio_queue: 8,80 R 2048 + 32 [dd]
block_getrq: 8,80 R 2048 + 32 [dd]
block_plug: 8,80 [dd]
^^^^
block_rq_insert: 8,80 R 16384 () 2048 + 32 [dd]
block_unplug: 8,80 [dd] 1 explicit
^^^^
block_rq_issue: 8,80 R 16384 () 2048 + 32 [dd]
block_rq_complete: 8,80 R () 2048 + 32 [0]
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index a13613d27cee..cffedc26e8a3 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -460,14 +460,18 @@ TRACE_EVENT(block_plug,
TP_ARGS(q),
TP_STRUCT__entry(
+ __field( dev_t, dev )
__array( char, comm, TASK_COMM_LEN )
),
TP_fast_assign(
+ __entry->dev = q->kobj.parent ?
+ container_of(q->kobj.parent, struct device, kobj)->devt : 0;
That really really really scares me. It feels very fragile and messing
with parent pointers is ripe for things breaking in the future in odd
and unexplainable ways.
And how can the parent be NULL?
I don't know the block layer but this feels very wrong to me. Are you
sure there isn't some other way to get this info?