Re: [PATCH 0/8]: blk-mq: use static_rqs to iterate busy tags

From: Bart Van Assche
Date: Fri Mar 15 2019 - 12:19:09 EST


On Fri, 2019-03-15 at 17:44 +-0800, jianchao.wang wrote:
+AD4 On 3/15/19 5:20 PM, Christoph Hellwig wrote:
+AD4 +AD4 On Fri, Mar 15, 2019 at 04:57:36PM +-0800, Jianchao Wang wrote:
+AD4 +AD4 +AD4 Hi Jens
+AD4 +AD4 +AD4
+AD4 +AD4 +AD4 As we know, there is a risk of accesing stale requests when iterate
+AD4 +AD4 +AD4 in-flight requests with tags-+AD4-rqs+AFsAXQ and this has been talked in following
+AD4 +AD4 +AD4 thread,
+AD4 +AD4 +AD4 +AFs-1+AF0 https://urldefense.proofpoint.com/v2/url?u+AD0-https-3A+AF8AXw-marc.info+AF8--3Fl-3Dlinux-2Dscsi-26m-3D154511693912752-26w-3D2+ACY-d+AD0-DwICAg+ACY-c+AD0-RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI+AF8-JnE+ACY-r+AD0-7WdAxUBeiTUTCy8v-7zX
+AD4 +AD4 +AD4 yr4qk7sx26ATvfo6QSTvZyQ+ACY-m+AD0-CydqJPTf4FUrfs7ipUc2chm2jGuNuDVn+AF8-onIetKEehM+ACY-s+AD0-ZQ7RfO6-737-t5kQv7SFlXMhIdpwn+AF8-AxJI93d6c-nj0+ACY-e+AD0
+AD4 +AD4 +AD4 +AFs-2+AF0 https://urldefense.proofpoint.com/v2/url?u+AD0-https-3A+AF8AXw-marc.info+AF8--3Fl-3Dlinux-2Dblock-26m-3D154526189023236-26w-3D2+ACY-d+AD0-DwICAg+ACY-c+AD0-RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI+AF8-JnE+ACY-r+AD0-7WdAxUBeiTUTCy8v-7z
+AD4 +AD4 +AD4 Xyr4qk7sx26ATvfo6QSTvZyQ+ACY-m+AD0-CydqJPTf4FUrfs7ipUc2chm2jGuNuDVn+AF8-onIetKEehM+ACY-s+AD0-EBV1M5p4mE8jZ5ZD1ecU5kMbJ9EtbpVJoc7Tqolrsc8+ACY-e+AD0
+AD4 +AD4
+AD4 +AD4 I'd rather take one step back and figure out why we are iterating
+AD4 +AD4 the busy requests. There really shouldn't be any reason why a driver
+AD4 +AD4 is even doings that (vs some error handling helpers in the core
+AD4 +AD4 block code that can properly synchronize).
+AD4 +AD4
+AD4
+AD4 A typical scene is blk+AF8-mq+AF8-in+AF8-flight,
+AD4
+AD4 blk+AF8-mq+AF8-get+AF8-request blk+AF8-mq+AF8-in+AF8-flight
+AD4 -+AD4 blk+AF8-mq+AF8-get+AF8-tag -+AD4 blk+AF8-mq+AF8-queue+AF8-tag+AF8-busy+AF8-iter
+AD4 -+AD4 bt+AF8-for+AF8-each
+AD4 -+AD4 bt+AF8-iter
+AD4 -+AD4 rq +AD0 taags-+AD4-rqs+AFsAXQ
+AD4 -+AD4 rq-+AD4-q //---+AD4 get a stale request
+AD4 -+AD4 blk+AF8-mq+AF8-rq+AF8-ctx+AF8-init
+AD4 -+AD4 data-+AD4-hctx-+AD4-tags-+AD4-rqs+AFs-rq-+AD4-tag+AF0 +AD0 rq
+AD4
+AD4 This stale request maybe something that has been freed due to io scheduler
+AD4 is detached or a q using a shared tagset is gone.
+AD4
+AD4 And also the blk+AF8-mq+AF8-timeout+AF8-work could use it to pick up the expired request.
+AD4 The driver would also use it to requeue the in-flight requests when the device is dead.
+AD4
+AD4 Compared with adding more synchronization, using static+AF8-rqs+AFsAXQ directly maybe simpler :)

Hi Jianchao,

Although I appreciate your work: I agree with Christoph that we should avoid races
like this rather than modifying the block layer to make sure that such races are
handled safely.

Thanks,

Bart.