Re: [PATCH 3/3] wait: avoid spurious calls to prepare_to_wait_event() in ___wait_event()
From: Mateusz Guzik
Date: Tue Mar 04 2025 - 10:45:48 EST
On Tue, Mar 4, 2025 at 3:19 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Mar 04, 2025 at 12:04:09AM +0100, Mateusz Guzik wrote:
> > In vast majority of cases the condition determining whether the thread
> > can proceed is true after the first wake up.
> >
> > However, even in that case the thread ends up calling into
> > prepare_to_wait_event() again, suffering a spurious irq + lock trip.
> >
> > Then it calls into finish_wait() to unlink itself.
> >
> > Note that in case of a pending signal the work done by
> > prepare_to_wait_event() gets ignored even without the change.
> >
> > pre-check the condition after waking up instead.
> >
> > Stats gathared during a kernel build:
> > bpftrace -e 'kprobe:prepare_to_wait_event,kprobe:finish_wait \
> > { @[probe] = count(); }'
> >
> > @[kprobe:finish_wait]: 392483
> > @[kprobe:prepare_to_wait_event]: 778690
> >
> > As in calls to prepare_to_wait_event() almost double calls to
> > finish_wait(). This evens out with the patch.
> >
> > Signed-off-by: Mateusz Guzik <mjguzik@xxxxxxxxx>
> > ---
> >
> > One may worry about using "condition" twice. However, macros leading up
> > to this one already do it, so it should be fine.
> >
> > Also one may wonder about fences -- to my understanding going off and on
> > CPU guarantees a full fence, so the now avoided lock trip changes
> > nothing.
>
> so it always helps if you provide actual numbers. Supposedly this makes
> it go faster?
>
> Also, how much bytes does it add to the image?
>
> Anyway, no real objection, but it would be good to have better
> substantiation etc.
>
I did not bother benchmarking this per se, but I did demonstrate with
bpftrace above that this does avoid the irq and lock usage, which is
normally the desired outcome.
It is a fair point that in this case this is a tradeoff with i-cache footprint.
bloat-o-meter claims:
add/remove: 2/2 grow/shrink: 159/5 up/down: 3865/-250 (3615)
[..]
Total: Before=29946099, After=29949714, chg +0.01%
The biggest growth, by a large margin:
will_read_block.part - 183 +183
fuse_get_req 800 924 +124
load_module 9137 9255 +118
fuse_do_readfolio 415 528 +113
fuse_page_mkwrite 157 259 +102
On the other hand the shrinkage:
port_fops_read 504 503 -1
__ps2_command 1402 1401 -1
fcntl_setlk 961 947 -14
__pfx_fuse_wait_on_folio_writeback 16 - -16
load_compressed_image 4007 3974 -33
uprobe_notify_resume 3384 3323 -61
fuse_wait_on_folio_writeback 124 - -124
The entire thing is attached.
--
Mateusz Guzik <mjguzik gmail.com>
add/remove: 2/2 grow/shrink: 159/5 up/down: 3865/-250 (3615)
Function old new delta
will_read_block.part - 183 +183
fuse_get_req 800 924 +124
load_module 9137 9255 +118
fuse_do_readfolio 415 528 +113
fuse_page_mkwrite 157 259 +102
wait_port_writable 545 642 +97
ksm_scan_thread 8189 8270 +81
sg_ioctl 2839 2918 +79
fuse_wait_on_folio_writeback.part 202 280 +78
ring_buffer_wait 431 500 +69
__bio_queue_enter 551 617 +66
drm_wait_vblank_ioctl 1692 1756 +64
blk_queue_enter 490 554 +64
add_transaction_credits 700 762 +62
khugepaged 2197 2256 +59
sg_open 1663 1721 +58
sg_read 1523 1579 +56
fuse_dev_do_read 2405 2459 +54
joydev_read 1106 1158 +52
btrfs_wait_block_group_cache_progress 370 420 +50
syslog_print 709 757 +48
usb_wait_anchor_empty_timeout 235 280 +45
mmu_interval_notifier_remove 412 456 +44
ec_guard 510 554 +44
kcompactd 961 1004 +43
wait_for_tpm_stat 669 711 +42
psi_rtpoll_worker 906 948 +42
anon_pipe_write 1581 1621 +40
ksgxd 357 394 +37
btrfs_commit_transaction 3598 3634 +36
scsi_block_when_processing_errors 200 234 +34
save_compressed_image 2695 2728 +33
virtio_gpu_init 2249 2281 +32
regmap_async_complete.part 413 444 +31
fuse_wait_on_page_writeback 185 216 +31
fanotify_handle_event 6056 6087 +31
crtc_crc_read 718 749 +31
acpi_ec_stop 418 448 +30
fuse_readahead 930 959 +29
pipe_wait_writable 234 262 +28
evdev_read 643 671 +28
ps2_drain 322 349 +27
__tpm_tis_request_locality 466 492 +26
throttle_direct_reclaim 626 651 +25
ps2_do_sendbyte 539 563 +24
drm_atomic_helper_wait_for_vblanks.part 538 562 +24
devkmsg_read 467 491 +24
set_termios.part 637 660 +23
fuse_wait_aborted 252 275 +23
btrfs_caching_ctl_wait_done 177 200 +23
fuse_file_cached_io_open 377 399 +22
btrfs_scrub_pause 233 255 +22
virtballoon_free_page_report 219 240 +21
locks_lock_inode_wait 357 378 +21
usbhid_wait_io 238 258 +20
serio_raw_read 521 541 +20
request_wait_answer 519 539 +20
btrfs_pause_balance 264 284 +20
blk_mq_freeze_queue_wait_timeout 213 233 +20
anon_pipe_read 970 990 +20
shpc_write_cmd 656 675 +19
mousedev_read 457 476 +19
btrfs_commit_transaction_async 253 272 +19
tell_host 226 244 +18
oom_killer_disable 281 299 +18
virtgpu_virtio_get_uuid 200 217 +17
pipe_wait_readable 228 245 +17
async_synchronize_cookie_domain 285 302 +17
jbd2_journal_destroy 768 784 +16
decompress_threadfn 323 339 +16
crc32_threadfn 317 333 +16
compress_threadfn 344 360 +16
__pfx_will_read_block.part - 16 +16
wait_current_trans 296 311 +15
pciehp_sysfs_enable_slot 403 418 +15
pciehp_sysfs_disable_slot 392 407 +15
module_patient_check_exists.isra 313 328 +15
btrfs_balance_delayed_items 436 450 +14
genl_unregister_family 480 493 +13
btrfs_rm_dev_replace_blocked 159 172 +13
pci_dpc_recovered 212 224 +12
jbd2_log_wait_commit 269 281 +12
fuse_sync_fs 569 581 +12
fuse_set_nowrite 212 224 +12
flush_scrub_stripes 615 627 +12
drm_vblank_work_flush_all 217 229 +12
drm_read 678 690 +12
blk_mq_freeze_queue_wait 150 162 +12
rmw_rbio 2673 2684 +11
pseudo_lock_measure_trigger 740 751 +11
percpu_ref_switch_to_atomic_sync 186 197 +11
flush_space 1395 1406 +11
btrfs_bio_counter_inc_blocked 256 267 +11
__usermodehelper_disable 308 319 +11
wait_scrub_stripe_io 138 148 +10
wait_for_device_probe 148 158 +10
usbdev_ioctl 5231 5241 +10
synchronize_rcu_expedited 459 469 +10
serial_core_unregister_port 739 749 +10
pci_doe_wait.constprop 193 203 +10
oom_reaper 973 983 +10
kernfs_drain 305 315 +10
kauditd_thread 677 687 +10
jbd2_journal_lock_updates 210 220 +10
jbd2_journal_load 891 901 +10
iosf_mbi_punit_acquire 184 194 +10
iosf_mbi_block_punit_i2c_access 818 828 +10
fsnotify_destroy_group 213 223 +10
expand_files 667 677 +10
exp_funnel_lock 503 513 +10
dm_wait_event 157 167 +10
cpuset_handle_hotplug 3241 3251 +10
btrfs_wait_on_delayed_iputs 160 170 +10
btrfs_scrub_cancel_dev 255 265 +10
btrfs_scrub_cancel 252 262 +10
btrfs_remount_begin 221 231 +10
btrfs_drew_write_lock 199 209 +10
__synchronize_irq 148 158 +10
__scrub_blocked_if_needed 185 195 +10
virtio_gpu_vram_mmap 484 493 +9
virtio_gpu_get_caps_ioctl 555 564 +9
try_flush_qgroup 275 284 +9
start_this_handle 1243 1252 +9
scrub_raid56_parity_stripe 1328 1337 +9
pcie_wait_cmd 568 577 +9
pci_wait_cfg 170 179 +9
ns_revision_read 394 403 +9
mmu_interval_read_begin 190 199 +9
drm_vblank_work_flush 195 204 +9
dquot_disable 1596 1605 +9
dm_kcopyd_client_destroy 353 362 +9
cpufreq_freq_transition_begin 333 342 +9
cppc_set_perf 885 894 +9
btrfs_cancel_balance 339 348 +9
wake_up_and_wait_for_irq_thread_ready 156 164 +8
squashfs_cache_get 799 807 +8
scrub_rbio_work_locked 2250 2258 +8
rdtgroup_pseudo_lock_create 1725 1733 +8
drm_wait_one_vblank 473 481 +8
close_ctree 1258 1266 +8
btrfs_write_dirty_block_groups 910 918 +8
btrfs_start_ordered_extent_nowriteback 379 387 +8
btrfs_cleanup_transaction.isra 1373 1381 +8
autofs_wait 1763 1771 +8
wait_for_random_bytes 235 242 +7
virtio_gpu_cursor_ping 647 654 +7
usb_poison_urb 176 183 +7
usb_kill_urb.part 149 156 +7
rcu_sync_enter 251 258 +7
io_sqd_handle_event 305 312 +7
btrfs_drew_read_lock 136 143 +7
__vt_event_wait.part 122 129 +7
__percpu_ref_switch_mode 458 465 +7
wb_wait_for_completion 130 136 +6
hib_wait_io 131 137 +6
do_coredump 6667 6673 +6
xfs_pwork_poll 186 191 +5
virtio_gpu_queue_ctrl_sgs 648 653 +5
tty_wait_until_sent 429 434 +5
submit_read_wait_bio_list 373 377 +4
wait_for_commit 401 402 +1
port_fops_read 504 503 -1
__ps2_command 1402 1401 -1
fcntl_setlk 961 947 -14
__pfx_fuse_wait_on_folio_writeback 16 - -16
load_compressed_image 4007 3974 -33
uprobe_notify_resume 3384 3323 -61
fuse_wait_on_folio_writeback 124 - -124
Total: Before=29946099, After=29949714, chg +0.01%