[v6 PATCH] block: introduce block_rq_error tracepoint
From: Yang Shi
Date: Thu Feb 03 2022 - 15:12:14 EST
Currently, rasdaemon uses the existing tracepoint block_rq_complete
and filters out non-error cases in order to capture block disk errors.
But there are a few problems with this approach:
1. Even though the kernel trace filter can do the filtering, enabling
this tracepoint still adds some overhead.
2. The filter is based only on errno, which does not match the checks
the kernel performs before calling blk_print_req_error().
3. block_rq_complete only provides the dev major and minor numbers to
identify the block device, which is not convenient to use in user-space.
So introduce a new tracepoint block_rq_error just for the error case.
With this patch, rasdaemon could switch to block_rq_error.
Cc: Jens Axboe <axboe@xxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Cc: Chaitanya Kulkarni <chaitanyak@xxxxxxxxxx>
Reviewed-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
Signed-off-by: Cong Wang <xiyou.wangcong@xxxxxxxxx>
Signed-off-by: Yang Shi <shy828301@xxxxxxxxx>
---
The v3 patch was submitted in Feb 2020, and Steven reviewed the patch, but
it was not merged upstream. See
https://lore.kernel.org/lkml/20200203053650.8923-1-xiyou.wangcong@xxxxxxxxx/.
The problems fixed by that patch still exist, and we do need it to make
disk error handling in rasdaemon easier. So this patch resurrects it and
continues the version numbering.
v5 --> v6:
* Removed disk name per Christoph and Chaitanya
* Kept errno since I didn't find any other block tracepoints that print the
blk status code, and userspace (i.e. rasdaemon) does expect errno.
v4 --> v5:
* Report the actual block layer status code instead of the errno per
Christoph Hellwig.
v3 --> v4:
* Rebased to v5.17-rc1.
* Collected reviewed-by tag from Steven.
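
For illustration only, here is a minimal user-space sketch of how a consumer
such as rasdaemon might parse the text form of the new event. It assumes the
payload follows the TP_printk format in this patch
("%d,%d %s %llu + %u [%d]") and that each line carries the usual
"block_rq_error: " event tag. The struct blk_err_event type and the
parse_block_rq_error() helper are hypothetical names and are not part of the
patch or of rasdaemon.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record type for illustration; not part of the patch. */
struct blk_err_event {
	unsigned int major;
	unsigned int minor;
	char rwbs[8];			/* RWBS_LEN is 8 in the kernel */
	unsigned long long sector;
	unsigned int nr_sector;
	int error;			/* negative errno, e.g. -5 for -EIO */
};

/*
 * Parse the payload printed by TP_printk("%d,%d %s %llu + %u [%d]").
 * Returns 0 on success, or -1 if the line is not a block_rq_error
 * event or does not match the expected layout.
 */
static int parse_block_rq_error(const char *line, struct blk_err_event *ev)
{
	static const char tag[] = "block_rq_error: ";
	const char *p = strstr(line, tag);

	if (!p)
		return -1;
	p += sizeof(tag) - 1;	/* skip past the event tag */

	/* %7s leaves room for the terminating NUL in rwbs[8]. */
	if (sscanf(p, "%u,%u %7s %llu + %u [%d]",
		   &ev->major, &ev->minor, ev->rwbs,
		   &ev->sector, &ev->nr_sector, &ev->error) != 6)
		return -1;

	return 0;
}
```

A caller would feed this one line at a time from the trace output and skip
lines for which it returns -1. Since the event only fires on the error path,
no errno filter is needed on the consumer side.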
block/blk-mq.c | 4 +++-
include/trace/events/block.h | 39 ++++++++++++++++++++++++++++++++++++
2 files changed, 42 insertions(+), 1 deletion(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f3bf3358a3bb..4ca72ea917d4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -789,8 +789,10 @@ bool blk_update_request(struct request *req, blk_status_t error,
 #endif
 	if (unlikely(error && !blk_rq_is_passthrough(req) &&
-		     !(req->rq_flags & RQF_QUIET)))
+		     !(req->rq_flags & RQF_QUIET))) {
+		trace_block_rq_error(req, error, nr_bytes);
 		blk_print_req_error(req, error);
+	}
 	blk_account_io_completion(req, nr_bytes);
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 27170e40e8c9..8c0bb06e16b8 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -144,6 +144,45 @@ TRACE_EVENT(block_rq_complete,
__entry->nr_sector, __entry->error)
);
+/**
+ * block_rq_error - block IO operation error reported by device driver
+ * @rq: block operations request
+ * @error: status code
+ * @nr_bytes: number of completed bytes
+ *
+ * The block_rq_error tracepoint event indicates that some portion
+ * of operation request has failed as reported by the device driver.
+ */
+TRACE_EVENT(block_rq_error,
+
+ TP_PROTO(struct request *rq, blk_status_t error, unsigned int nr_bytes),
+
+ TP_ARGS(rq, error, nr_bytes),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( sector_t, sector )
+ __field( unsigned int, nr_sector )
+ __field( int, error )
+ __array( char, rwbs, RWBS_LEN )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = rq->q->disk ? disk_devt(rq->q->disk) : 0;
+ __entry->sector = blk_rq_pos(rq);
+ __entry->nr_sector = nr_bytes >> 9;
+ __entry->error = blk_status_to_errno(error);
+
+ blk_fill_rwbs(__entry->rwbs, rq->cmd_flags);
+ ),
+
+ TP_printk("%d,%d %s %llu + %u [%d]",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->rwbs,
+ (unsigned long long)__entry->sector,
+ __entry->nr_sector, __entry->error)
+);
+
DECLARE_EVENT_CLASS(block_rq,
TP_PROTO(struct request *rq),
--
2.26.3