Re: [PATCH 2/3] sbitmap: ammortize cost of clearing bits

From: Guenter Roeck
Date: Sun Dec 09 2018 - 00:53:45 EST


Hi,

On Thu, Nov 29, 2018 at 06:12:33PM -0700, Jens Axboe wrote:
> sbitmap maintains a set of words that we use to set and clear bits, with
> each bit representing a tag for blk-mq. Even though we spread the bits
> out and maintain a hint cache, one particular bit allocated will end up
> being cleared in the exact same spot.
>
> This introduces batched clearing of bits. Instead of clearing a given
> bit, the same bit is set in a cleared/free mask instead. If we fail
> allocating a bit from a given word, then we check the free mask, and
> batch move those cleared bits at that time. This trades 64 atomic bitops
> for 2 cmpxchg().
>
> In a threaded poll test case, half the overhead of getting and clearing
> tags is removed with this change. On another poll test case with a
> single thread, performance is unchanged.
>
> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>

This patch results in irq lock inversion warnings when trying to boot
from usb drives. This was observed with qemu boots of aarch64, x86, and
x86_64 images.

Log output as well as bisect results are attached below. I ran the bisect
twice, once with aarch64 and once with x86_64, with the same results.

Unfortunately, reverting the patch results in a compile error, so I was
unable to test if reverting the patch fixes the problem.

Sample qemu command line (x86_64):

qemu-system-x86_64 \
-kernel arch/x86/boot/bzImage -M q35 -cpu SandyBridge \
-no-reboot -snapshot -smp 4 -m 1G \
-usb -device usb-storage,drive=d0 \
-drive file=rootfs.ext2,if=none,id=d0,format=raw \
--append 'root=/dev/sda rw rootwait panic=-1 console=ttyS0 console=tty' \
-nographic

Guenter

---
...
[ 19.035262] Waiting for root device /dev/sda...
[ 19.139135] scsi 6:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
[ 19.161632] sd 6:0:0:0: Attached scsi generic sg1 type 0
[ 19.171661] sd 6:0:0:0: Power-on or device reset occurred
[ 19.186664] sd 6:0:0:0: [sda] 20480 512-byte logical blocks: (10.5 MB/10.0 MiB)
[ 19.197035] sd 6:0:0:0: [sda] Write Protect is off
[ 19.209327] sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 19.330020] sd 6:0:0:0: [sda] Attached SCSI disk
[ 19.363210] EXT4-fs (sda): mounting ext2 file system using the ext4 subsystem
[ 19.392150]
[ 19.394410] ========================================================
[ 19.397051] WARNING: possible irq lock inversion dependency detected
[ 19.399744] 4.20.0-rc5-next-20181207 #1 Not tainted
[ 19.402294] --------------------------------------------------------
[ 19.404981] swapper/3/0 just changed the state of lock:
[ 19.407185] (____ptrval____) (&sbq->ws[i].wait){..-.}, at: __wake_up_common_lock+0x5e/0xb0
[ 19.409500] but this lock took another, SOFTIRQ-unsafe lock in the past:
[ 19.411650] (&(&sb->map[i].swap_lock)->rlock){+.+.}
[ 19.411663]
[ 19.411663]
[ 19.411663] and interrupts could create inverse lock ordering between them.
[ 19.411663]
[ 19.421461]
[ 19.421461] other info that might help us debug this:
[ 19.425187] Chain exists of:
[ 19.425187] &sbq->ws[i].wait --> &(&hctx->dispatch_wait_lock)->rlock --> &(&sb->map[i].swap_lock)->rlock
[ 19.425187]
[ 19.432974] Possible interrupt unsafe locking scenario:
[ 19.432974]
[ 19.436954] CPU0 CPU1
[ 19.439002] ---- ----
[ 19.440897] lock(&(&sb->map[i].swap_lock)->rlock);
[ 19.442828] local_irq_disable();
[ 19.444767] lock(&sbq->ws[i].wait);
[ 19.446767] lock(&(&hctx->dispatch_wait_lock)->rlock);
[ 19.448716] <Interrupt>
[ 19.450574] lock(&sbq->ws[i].wait);
[ 19.452330]
[ 19.452330] *** DEADLOCK ***
[ 19.452330]
[ 19.456951] no locks held by swapper/3/0.
[ 19.458648]
[ 19.458648] the shortest dependencies between 2nd lock and 1st lock:
[ 19.461928] -> (&(&sb->map[i].swap_lock)->rlock){+.+.} ops: 38 {
[ 19.463648] HARDIRQ-ON-W at:
[ 19.465233] _raw_spin_lock+0x27/0x40
[ 19.466837] sbitmap_get+0x8b/0x160
[ 19.468490] __sbitmap_queue_get+0x24/0x90
[ 19.469988] blk_mq_get_tag+0x28a/0x2c0
[ 19.471455] blk_mq_get_driver_tag+0x7a/0x100
[ 19.473051] blk_mq_dispatch_rq_list+0x95/0x550
[ 19.474653] blk_mq_do_dispatch_sched+0x71/0x110
[ 19.476137] blk_mq_sched_dispatch_requests+0x114/0x160
[ 19.477712] __blk_mq_run_hw_queue+0x67/0xf0
[ 19.479282] __blk_mq_delay_run_hw_queue+0x150/0x170
[ 19.480830] blk_mq_run_hw_queue+0x83/0xe0
[ 19.482302] blk_mq_sched_insert_request+0x14a/0x1c0
[ 19.483868] blk_execute_rq+0x64/0x90
[ 19.485396] __scsi_execute+0x104/0x250
[ 19.486779] sd_revalidate_disk+0xe7/0x1cf0
[ 19.488184] sd_probe_async+0xae/0x194
[ 19.489598] async_run_entry_fn+0x34/0x100
[ 19.491025] process_one_work+0x237/0x5d0
[ 19.493611] worker_thread+0x37/0x380
[ 19.496588] kthread+0x118/0x130
[ 19.498871] ret_from_fork+0x3a/0x50
[ 19.501009] SOFTIRQ-ON-W at:
[ 19.502991] _raw_spin_lock+0x27/0x40
[ 19.505302] sbitmap_get+0x8b/0x160
[ 19.507375] __sbitmap_queue_get+0x24/0x90
[ 19.509440] blk_mq_get_tag+0x28a/0x2c0
[ 19.510859] blk_mq_get_driver_tag+0x7a/0x100
[ 19.512334] blk_mq_dispatch_rq_list+0x95/0x550
[ 19.513960] blk_mq_do_dispatch_sched+0x71/0x110
[ 19.515388] blk_mq_sched_dispatch_requests+0x114/0x160
[ 19.516909] __blk_mq_run_hw_queue+0x67/0xf0
[ 19.518433] __blk_mq_delay_run_hw_queue+0x150/0x170
[ 19.519831] blk_mq_run_hw_queue+0x83/0xe0
[ 19.521232] blk_mq_sched_insert_request+0x14a/0x1c0
[ 19.522638] blk_execute_rq+0x64/0x90
[ 19.524015] __scsi_execute+0x104/0x250
[ 19.525393] sd_revalidate_disk+0xe7/0x1cf0
[ 19.526795] sd_probe_async+0xae/0x194
[ 19.528213] async_run_entry_fn+0x34/0x100
[ 19.529810] process_one_work+0x237/0x5d0
[ 19.531178] worker_thread+0x37/0x380
[ 19.532548] kthread+0x118/0x130
[ 19.533892] ret_from_fork+0x3a/0x50
[ 19.535232] INITIAL USE at:
[ 19.536504] _raw_spin_lock+0x27/0x40
[ 19.537939] sbitmap_get+0x8b/0x160
[ 19.539411] __sbitmap_queue_get+0x24/0x90
[ 19.540782] blk_mq_get_tag+0x28a/0x2c0
[ 19.542134] blk_mq_get_driver_tag+0x7a/0x100
[ 19.543495] blk_mq_dispatch_rq_list+0x95/0x550
[ 19.544925] blk_mq_do_dispatch_sched+0x71/0x110
[ 19.546312] blk_mq_sched_dispatch_requests+0x114/0x160
[ 19.547712] __blk_mq_run_hw_queue+0x67/0xf0
[ 19.549097] __blk_mq_delay_run_hw_queue+0x150/0x170
[ 19.550692] blk_mq_run_hw_queue+0x83/0xe0
[ 19.552086] blk_mq_sched_insert_request+0x14a/0x1c0
[ 19.553494] blk_execute_rq+0x64/0x90
[ 19.554862] __scsi_execute+0x104/0x250
[ 19.556233] sd_revalidate_disk+0xe7/0x1cf0
[ 19.557621] sd_probe_async+0xae/0x194
[ 19.558993] async_run_entry_fn+0x34/0x100
[ 19.560383] process_one_work+0x237/0x5d0
[ 19.561751] worker_thread+0x37/0x380
[ 19.563267] kthread+0x118/0x130
[ 19.564715] ret_from_fork+0x3a/0x50
[ 19.566050] }
[ 19.567296] ... key at: [<ffffffffa2f27468>] __key.23877+0x0/0x8
[ 19.568691] ... acquired at:
[ 19.569986] sbitmap_get+0x8b/0x160
[ 19.571279] __sbitmap_queue_get+0x24/0x90
[ 19.572619] blk_mq_get_tag+0x28a/0x2c0
[ 19.573957] blk_mq_get_driver_tag+0xda/0x100
[ 19.575308] blk_mq_dispatch_rq_list+0x2f6/0x550
[ 19.576643] blk_mq_do_dispatch_sched+0x71/0x110
[ 19.577968] blk_mq_sched_dispatch_requests+0x114/0x160
[ 19.579328] __blk_mq_run_hw_queue+0x67/0xf0
[ 19.580671] __blk_mq_delay_run_hw_queue+0x150/0x170
[ 19.582047] blk_mq_run_hw_queue+0x83/0xe0
[ 19.583517] blk_mq_sched_insert_request+0x14a/0x1c0
[ 19.584932] blk_execute_rq+0x64/0x90
[ 19.586256] __scsi_execute+0x104/0x250
[ 19.587617] scsi_probe_and_add_lun+0x1b5/0xc00
[ 19.589020] __scsi_scan_target+0x1f9/0x530
[ 19.590367] scsi_scan_channel.part.10+0x51/0x70
[ 19.591742] scsi_scan_host_selected+0xc9/0x140
[ 19.593200] scsi_scan_host+0x18f/0x1d0
[ 19.594621] usb_stor_scan_dwork+0x1a/0x80
[ 19.595943] process_one_work+0x237/0x5d0
[ 19.597271] worker_thread+0x37/0x380
[ 19.598566] kthread+0x118/0x130
[ 19.599845] ret_from_fork+0x3a/0x50
[ 19.601128]
[ 19.602384] -> (&(&hctx->dispatch_wait_lock)->rlock){....} ops: 1 {
[ 19.603964] INITIAL USE at:
[ 19.605271] _raw_spin_lock+0x27/0x40
[ 19.606670] blk_mq_dispatch_rq_list+0x270/0x550
[ 19.608277] blk_mq_do_dispatch_sched+0x71/0x110
[ 19.609808] blk_mq_sched_dispatch_requests+0x114/0x160
[ 19.611239] __blk_mq_run_hw_queue+0x67/0xf0
[ 19.612650] __blk_mq_delay_run_hw_queue+0x150/0x170
[ 19.614080] blk_mq_run_hw_queue+0x83/0xe0
[ 19.615497] blk_mq_sched_insert_request+0x14a/0x1c0
[ 19.616949] blk_execute_rq+0x64/0x90
[ 19.618374] __scsi_execute+0x104/0x250
[ 19.619787] scsi_probe_and_add_lun+0x1b5/0xc00
[ 19.621226] __scsi_scan_target+0x1f9/0x530
[ 19.622666] scsi_scan_channel.part.10+0x51/0x70
[ 19.624117] scsi_scan_host_selected+0xc9/0x140
[ 19.625557] scsi_scan_host+0x18f/0x1d0
[ 19.626992] usb_stor_scan_dwork+0x1a/0x80
[ 19.628441] process_one_work+0x237/0x5d0
[ 19.629832] worker_thread+0x37/0x380
[ 19.631235] kthread+0x118/0x130
[ 19.632724] ret_from_fork+0x3a/0x50
[ 19.634252] }
[ 19.635499] ... key at: [<ffffffffa2f256a0>] __key.49100+0x0/0x8
[ 19.636925] ... acquired at:
[ 19.638255] blk_mq_dispatch_rq_list+0x270/0x550
[ 19.639655] blk_mq_do_dispatch_sched+0x71/0x110
[ 19.641076] blk_mq_sched_dispatch_requests+0x114/0x160
[ 19.642517] __blk_mq_run_hw_queue+0x67/0xf0
[ 19.643961] __blk_mq_delay_run_hw_queue+0x150/0x170
[ 19.645424] blk_mq_run_hw_queue+0x83/0xe0
[ 19.646831] blk_mq_sched_insert_request+0x14a/0x1c0
[ 19.648265] blk_execute_rq+0x64/0x90
[ 19.649634] __scsi_execute+0x104/0x250
[ 19.650983] scsi_probe_and_add_lun+0x1b5/0xc00
[ 19.652360] __scsi_scan_target+0x1f9/0x530
[ 19.653808] scsi_scan_channel.part.10+0x51/0x70
[ 19.655364] scsi_scan_host_selected+0xc9/0x140
[ 19.656751] scsi_scan_host+0x18f/0x1d0
[ 19.658086] usb_stor_scan_dwork+0x1a/0x80
[ 19.659416] process_one_work+0x237/0x5d0
[ 19.660750] worker_thread+0x37/0x380
[ 19.662057] kthread+0x118/0x130
[ 19.663353] ret_from_fork+0x3a/0x50
[ 19.664649]
[ 19.665860] -> (&sbq->ws[i].wait){..-.} ops: 3 {
[ 19.667186] IN-SOFTIRQ-W at:
[ 19.668474] _raw_spin_lock_irqsave+0x30/0x40
[ 19.669878] __wake_up_common_lock+0x5e/0xb0
[ 19.671278] __sbq_wake_up+0x8d/0xa0
[ 19.672650] sbitmap_queue_clear+0x3b/0x60
[ 19.674036] __blk_mq_free_request+0x64/0x90
[ 19.675438] scsi_end_request+0x109/0x350
[ 19.676911] scsi_io_completion+0x72/0x600
[ 19.678477] blk_done_softirq+0x9a/0xd0
[ 19.679847] __do_softirq+0xc8/0x455
[ 19.681212] irq_exit+0xcc/0xe0
[ 19.682543] call_function_single_interrupt+0xf/0x20
[ 19.683941] default_idle+0x19/0x160
[ 19.685311] do_idle+0x1d3/0x250
[ 19.686643] cpu_startup_entry+0x14/0x20
[ 19.688000] start_secondary+0x18f/0x1c0
[ 19.689443] secondary_startup_64+0xa4/0xb0
[ 19.690983] INITIAL USE at:
[ 19.693258] _raw_spin_lock_irq+0x2d/0x40
[ 19.696185] blk_mq_dispatch_rq_list+0x25c/0x550
[ 19.699217] blk_mq_do_dispatch_sched+0x71/0x110
[ 19.702076] blk_mq_sched_dispatch_requests+0x114/0x160
[ 19.704737] __blk_mq_run_hw_queue+0x67/0xf0
[ 19.707182] __blk_mq_delay_run_hw_queue+0x150/0x170
[ 19.709412] blk_mq_run_hw_queue+0x83/0xe0
[ 19.710880] blk_mq_sched_insert_request+0x14a/0x1c0
[ 19.712307] blk_execute_rq+0x64/0x90
[ 19.713697] __scsi_execute+0x104/0x250
[ 19.715083] scsi_probe_and_add_lun+0x1b5/0xc00
[ 19.716574] __scsi_scan_target+0x1f9/0x530
[ 19.718206] scsi_scan_channel.part.10+0x51/0x70
[ 19.719641] scsi_scan_host_selected+0xc9/0x140
[ 19.721184] scsi_scan_host+0x18f/0x1d0
[ 19.722697] usb_stor_scan_dwork+0x1a/0x80
[ 19.724077] process_one_work+0x237/0x5d0
[ 19.725460] worker_thread+0x37/0x380
[ 19.726820] kthread+0x118/0x130
[ 19.728167] ret_from_fork+0x3a/0x50
[ 19.729508] }
[ 19.730752] ... key at: [<ffffffffa2f27460>] __key.24146+0x0/0x8
[ 19.732169] ... acquired at:
[ 19.733509] __lock_acquire+0x8c6/0x1880
[ 19.734865] lock_acquire+0xaa/0x180
[ 19.736210] _raw_spin_lock_irqsave+0x30/0x40
[ 19.737564] __wake_up_common_lock+0x5e/0xb0
[ 19.738927] __sbq_wake_up+0x8d/0xa0
[ 19.740276] sbitmap_queue_clear+0x3b/0x60
[ 19.741635] __blk_mq_free_request+0x64/0x90
[ 19.743008] scsi_end_request+0x109/0x350
[ 19.744361] scsi_io_completion+0x72/0x600
[ 19.745690] blk_done_softirq+0x9a/0xd0
[ 19.747017] __do_softirq+0xc8/0x455
[ 19.748342] irq_exit+0xcc/0xe0
[ 19.749637] call_function_single_interrupt+0xf/0x20
[ 19.750979] default_idle+0x19/0x160
[ 19.752318] do_idle+0x1d3/0x250
[ 19.753758] cpu_startup_entry+0x14/0x20
[ 19.755129] start_secondary+0x18f/0x1c0
[ 19.756470] secondary_startup_64+0xa4/0xb0
[ 19.757954]
[ 19.759225]
[ 19.759225] stack backtrace:
[ 19.761899] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.20.0-rc5-next-20181207 #1
[ 19.763359] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 19.765024] Call Trace:
[ 19.766589] <IRQ>
[ 19.768074] dump_stack+0x67/0x90
[ 19.769580] check_usage_forwards.cold.54+0x20/0x29
[ 19.771137] mark_lock+0x2ca/0x620
[ 19.772705] ? check_usage_backwards+0x110/0x110
[ 19.774300] __lock_acquire+0x8c6/0x1880
[ 19.775676] ? select_task_rq_fair+0x1ca/0x1110
[ 19.777062] ? __lock_acquire+0x38a/0x1880
[ 19.778424] ? __lock_acquire+0x38a/0x1880
[ 19.779751] lock_acquire+0xaa/0x180
[ 19.781070] ? __wake_up_common_lock+0x5e/0xb0
[ 19.782422] _raw_spin_lock_irqsave+0x30/0x40
[ 19.783766] ? __wake_up_common_lock+0x5e/0xb0
[ 19.785129] __wake_up_common_lock+0x5e/0xb0
[ 19.786483] __sbq_wake_up+0x8d/0xa0
[ 19.787809] sbitmap_queue_clear+0x3b/0x60
[ 19.789143] __blk_mq_free_request+0x64/0x90
[ 19.790631] scsi_end_request+0x109/0x350
[ 19.792150] scsi_io_completion+0x72/0x600
[ 19.793671] blk_done_softirq+0x9a/0xd0
[ 19.795121] __do_softirq+0xc8/0x455
[ 19.796441] irq_exit+0xcc/0xe0
[ 19.797730] call_function_single_interrupt+0xf/0x20
[ 19.799153] </IRQ>
[ 19.800515] RIP: 0010:default_idle+0x19/0x160
[ 19.801932] Code: ff 48 89 ea 48 89 df 31 f6 5b 5d e9 31 73 91 ff 90 41 55 41 54 55 53 65 8b 2d a3 dd e4 5e 66 66 66 66 90 e8 b9 1f 59 ff fb f4 <65> 8b 2d 90 dd e4 5e 66 66 66 66 90 5b 5d 41 5c 41 5d c3 65 8b 05
[ 19.806839] RSP: 0000:ffffb6f74020bec0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff04
[ 19.808531] RAX: ffff97563f2bc080 RBX: 0000000000000003 RCX: 0000000000000000
[ 19.810312] RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffff97563f2bc080
[ 19.811925] RBP: 0000000000000003 R08: 0000000000000001 R09: 0000000000000000
[ 19.813555] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 19.815169] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 19.817002] do_idle+0x1d3/0x250
[ 19.818604] cpu_startup_entry+0x14/0x20
[ 19.820125] start_secondary+0x18f/0x1c0
[ 19.821635] secondary_startup_64+0xa4/0xb0
[ 19.856824] EXT4-fs (sda): mounted filesystem without journal. Opts: (null)
[ 19.860450] VFS: Mounted root (ext2 filesystem) on device 8:0.
[ 19.870456] devtmpfs: mounted
[ 19.975778] Freeing unused kernel image memory: 1348K
[ 19.987597] Write protecting the kernel read-only data: 22528k
[ 19.999208] Freeing unused kernel image memory: 2008K
[ 20.007277] Freeing unused kernel image memory: 1824K
[ 20.010030] rodata_test: all tests were successful
[ 20.012948] Run /sbin/init as init process
[ 20.640528] EXT4-fs (sda): re-mounted. Opts: block_validity,barrier,user_xattr,acl
[ 21.570798] random: dd: uninitialized urandom read (512 bytes read)
[ 21.621870] dd (1296) used greatest stack depth: 12728 bytes left

---
# bad: [74c4a24df7cac1f9213a811d79558ecde23be9a2] Add linux-next specific files for 20181207
# good: [2595646791c319cadfdbf271563aac97d0843dc7] Linux 4.20-rc5
git bisect start 'HEAD' 'v4.20-rc5'
# good: [8d285caa38663cc22f7166f7cfb9ceb9d06d3c7e] Merge remote-tracking branch 'ipsec-next/master'
git bisect good 8d285caa38663cc22f7166f7cfb9ceb9d06d3c7e
# bad: [a8147fd1777cecc0a0cc57fcd742a2b06660bd12] Merge remote-tracking branch 'backlight/for-backlight-next'
git bisect bad a8147fd1777cecc0a0cc57fcd742a2b06660bd12
# good: [467e8a516dcf922d1ea343cebb0e751f81f0dca3] Merge tag 'drm-intel-next-2018-12-04' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
git bisect good 467e8a516dcf922d1ea343cebb0e751f81f0dca3
# good: [25f68bda094fd6a808d1284abc23b194b0668a47] Merge remote-tracking branch 'sound/for-next'
git bisect good 25f68bda094fd6a808d1284abc23b194b0668a47
# bad: [81b6aa39f339c7ea9fe318580c168e55c5a7f593] Merge branch 'for-4.21/block' into for-next
git bisect bad 81b6aa39f339c7ea9fe318580c168e55c5a7f593
# good: [b6676f653f13f83582985bc713525a48d735b2a3] block: remove a few unused exports
git bisect good b6676f653f13f83582985bc713525a48d735b2a3
# good: [80ff2040ac3dd6d9d68d59159b0a6dffd8adc5ff] ataflop: implement mq_ops->commit_rqs() hook
git bisect good 80ff2040ac3dd6d9d68d59159b0a6dffd8adc5ff
# bad: [79eb7650f7f33f4e63a2380f00fc2fac651da7c1] Merge branch 'for-4.21/block' into for-next
git bisect bad 79eb7650f7f33f4e63a2380f00fc2fac651da7c1
# bad: [6322307809649cba6f545640563f95d686ecf404] nvme-pci: cleanup SQ allocation a bit
git bisect bad 6322307809649cba6f545640563f95d686ecf404
# bad: [2149da0748fc236b9916c53e26b3b0c9ab20a5dd] block: add cmd_flags to print_req_error
git bisect bad 2149da0748fc236b9916c53e26b3b0c9ab20a5dd
# good: [27fae429acee1e9418059e7fa545438075af5256] sbitmap: don't loop for find_next_zero_bit() for !round_robin
git bisect good 27fae429acee1e9418059e7fa545438075af5256
# bad: [ea86ea2cdced20057da4d2c32965c1219c238197] sbitmap: ammortize cost of clearing bits
git bisect bad ea86ea2cdced20057da4d2c32965c1219c238197
# good: [531724abc3bfb556c1dd68086cf9cb51f76464e3] block: avoid extra bio reference for async O_DIRECT
git bisect good 531724abc3bfb556c1dd68086cf9cb51f76464e3
# first bad commit: [ea86ea2cdced20057da4d2c32965c1219c238197] sbitmap: ammortize cost of clearing bits