[GIT PULL] Btrfs updates for 4.20, part 1

From: David Sterba
Date: Mon Oct 22 2018 - 13:22:26 EST


this is the first batch with fixes and some nice performance improvements.

Preliminary results show eg. more files/sec in fsmark, better perf on
multi-threaded workloads (filebench, dbench), fewer context switches and
overall better memory allocation characteristics (multiple benchmarks).

Apart from general performance, there's an improvement for qgroups +
balance workload that's been troubling our users.

Note for stable: there are 20+ patches tagged for stable, out of 90. Not
all of them apply cleanly on all stable versions but the conflicts are
mostly due to simple cleanups and resolving should be obvious. The fixes
are otherwise independent.

No merge conflicts expected. Please pull, thanks.

Performance improvements:

* blocking mode of path is gone, which means that only the spinning mode is
used; the blocking resulted in more unnecessary wakeups and updates to the path
locks, the effects are measurable and improve latency and scalability

* qgroups: first batch of changes that should speed up balancing with qgroups
on, skip quota accounting on unchanged subtrees, overall gain is 30+% in
runtime

* use rb-tree with cached first node for several structures, small improvement
to avoid pointer chasing
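
To illustrate the cached first node idea from the last item, here is a minimal
user-space sketch. It is not the btrfs or kernel rb-tree code: the tree below
is a plain unbalanced binary search tree, and all names (struct node,
struct tree_cached, tree_insert, tree_first_walk, tree_first_cached) are
hypothetical. It only shows how tracking "did we ever go right during the
descent" lets an insert keep a leftmost pointer up to date, so that finding
the first node no longer needs a walk down the left spine.

```c
#include <stddef.h>

/* Hypothetical illustration only; not the kernel's rb-tree structures. */
struct node {
	unsigned long key;
	struct node *left;
	struct node *right;
};

struct tree_cached {
	struct node *root;
	struct node *leftmost;	/* cached smallest node, NULL when empty */
};

/* Uncached variant: follows left pointers from the root, O(height). */
static struct node *tree_first_walk(const struct tree_cached *t)
{
	struct node *n = t->root;

	if (!n)
		return NULL;
	while (n->left)
		n = n->left;
	return n;
}

/* Cached variant: a single pointer read, no pointer chasing. */
static struct node *tree_first_cached(const struct tree_cached *t)
{
	return t->leftmost;
}

/*
 * Insert without rebalancing (a real rb-tree would recolor and rotate
 * after linking the node). The new node becomes the cached leftmost
 * only if the descent never turned right.
 */
static void tree_insert(struct tree_cached *t, struct node *ins)
{
	struct node **link = &t->root;
	int leftmost = 1;

	ins->left = NULL;
	ins->right = NULL;
	while (*link) {
		if (ins->key < (*link)->key) {
			link = &(*link)->left;
		} else {
			link = &(*link)->right;
			leftmost = 0;
		}
	}
	*link = ins;
	if (leftmost)
		t->leftmost = ins;
}
```

Removal is left out of the sketch; a complete version would have to advance
the cached pointer to the next node whenever the leftmost node is erased.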


Fixes:

* trim
  - fix: some block groups could have been missed if their logical address was
    past the total filesystem size (ie. after a lot of balancing)
  - better error reporting, after processing block groups and whole device
  - fix: continue trimming block groups after an error is encountered
  - check for trim support of the device earlier and avoid some unnecessary work
  - less interaction with transaction commit that improves latency on slower
    storage (eg. image files over NFS)

* fsync
  - fix warning when replaying log after fsync of an O_TMPFILE
  - fix wrong dentries after fsync of file that got its parent replaced

* qgroups: fix rescan that might miss some dirty groups

* don't clean dirty pages during buffered writes, this could lead to lost
updates in some corner cases

* some block groups could have been delayed in creation, if the allocation
triggered another one

* error handling improvements
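
The trim items above ("continue trimming block groups after an error is
encountered" and "better error reporting") describe a common pattern: do not
stop at the first failure, remember the first error and count the failures,
then report once at the end. The following is a rough user-space sketch of
that pattern only, not the actual btrfs_trim_fs code. struct group,
struct trim_summary and trim_all are hypothetical names, and the per-group
result is pre-filled instead of coming from a real discard request.

```c
#include <stddef.h>

/* Hypothetical stand-in for a block group; not a btrfs structure. */
struct group {
	int trim_result;		/* 0 on success, negative errno-style value on failure */
	unsigned long long bytes;	/* bytes trimmed when the group succeeds */
};

struct trim_summary {
	unsigned long long trimmed;	/* total bytes trimmed in groups that succeeded */
	size_t failed;			/* number of groups that reported an error */
	int first_error;		/* first error seen, 0 if none */
};

/*
 * Process every group even if an earlier one failed. Only the first
 * error code is kept, the rest are just counted.
 */
static struct trim_summary trim_all(const struct group *groups, size_t count)
{
	struct trim_summary s = { 0, 0, 0 };
	size_t i;

	for (i = 0; i < count; i++) {
		int ret = groups[i].trim_result;

		if (ret) {
			s.failed++;
			if (!s.first_error)
				s.first_error = ret;
			continue;	/* keep going instead of bailing out */
		}
		s.trimmed += groups[i].bytes;
	}
	return s;
}
```

A caller can then print a single warning with the failure count and return
first_error, while the trimmed byte count still reflects everything that did
succeed.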


Cleanups:

* removed unused struct members and variables
* function return type cleanups
* delayed refs code refactoring

* protect against deadlock that could be caused by crafted image that tries to
allocate from a tree that's locked already

The following changes since commit 35a7f35ad1b150ddf59a41dcac7b2fa32982be0e:

Linux 4.19-rc8 (2018-10-15 07:20:24 +0200)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-4.20-part1-tag

for you to fetch changes up to d9352794dad9f28535439d85a815978878c141ab:

btrfs: switch return_bigger to bool in find_ref_head (2018-10-15 17:23:41 +0200)

Anand Jain (2):
btrfs: add assertions where number of devices could go below 0
btrfs: add helper to obtain number of devices with ongoing dev-replace

Chris Mason (1):
Btrfs: don't clean dirty pages during buffered writes

Colin Ian King (2):
btrfs: remove unused pointer inode in relink_file_extents
btrfs: remove unused pointer 'tree' in btrfs_submit_compressed_read

David Sterba (12):
btrfs: tests: add separate stub for find_lock_delalloc_range
btrfs: tests: move testing members of struct btrfs_root to the end
btrfs: tests: group declarations of self-test helpers
btrfs: tests: polish ifdefs around testing helper
btrfs: use common helper instead of open coding a bit test
btrfs: remove btrfs_dev_replace::read_locks
btrfs: open code btrfs_dev_replace_clear_lock_blocking
btrfs: open code btrfs_dev_replace_stats_inc
btrfs: open code btrfs_after_dev_replace_commit
btrfs: dev-replace: avoid useless lock on error handling path
btrfs: dev-replace: move replace members out of fs_info
btrfs: dev-replace: remove pointless assert in write unlock

Filipe Manana (2):
Btrfs: fix warning when replaying log after fsync of a tmpfile
Btrfs: fix wrong dentries after fsync of file that got its parent replaced

Jeff Mahoney (5):
btrfs: fix error handling in free_log_tree
btrfs: fix error handling in btrfs_dev_replace_start
btrfs: iterate all devices during trim, instead of fs_devices::alloc_list
btrfs: don't attempt to trim devices that don't support it
btrfs: keep trim from interfering with transaction commits

Josef Bacik (7):
btrfs: wait on caching when putting the bg cache
btrfs: release metadata before running delayed refs
btrfs: protect space cache inode alloc with GFP_NOFS
btrfs: reset max_extent_size on clear in a bitmap
btrfs: make sure we create all new block groups
btrfs: assert on non-empty delayed iputs
btrfs: drop min_size from evict_refill_and_join

Liu Bo (19):
Btrfs: do not unnecessarily pass write_lock_level when processing leaf
Btrfs: remove always true if branch in btrfs_get_extent
Btrfs: use next_state in find_first_extent_bit
btrfs: free path at an earlier point in btrfs_get_extent
Btrfs: remove confusing tracepoint in btrfs_add_reserved_bytes
Btrfs: fix alignment in declaration and prototype of btrfs_get_extent
Btrfs: set leave_spinning in btrfs_get_extent
Btrfs: use args in the correct order for kcalloc in btrfsic_read_block
Btrfs: unify error handling of btrfs_lookup_dir_item
Btrfs: remove unnecessary level check in balance_level
Btrfs: assert page dirty bit on extent buffer pages
Btrfs: skip set_page_dirty if eb pages are already dirty
Btrfs: remove wait_ordered_range in btrfs_evict_inode
Btrfs: delayed-refs: use rb_first_cached for href_root
Btrfs: delayed-refs: use rb_first_cached for ref_tree
Btrfs: delayed-inode: use rb_first_cached for ins_root and del_root
Btrfs: extent_map: use rb_first_cached
Btrfs: preftree: use rb_first_cached
Btrfs: kill btrfs_clear_path_blocking

Lu Fengqi (10):
btrfs: simplify the send_in_progress check in btrfs_delete_subvolume
btrfs: switch update_size to bool in btrfs_block_rsv_migrate and btrfs_rsv_add_bytes
btrfs: Remove root parameter from btrfs_insert_dir_item
btrfs: remove a useless return statement in btrfs_block_rsv_add
btrfs: qgroup: move the qgroup->members check out from (!qgroup)'s else branch
btrfs: delayed-ref: pass delayed_refs directly to btrfs_select_ref_head
btrfs: delayed-ref: pass delayed_refs directly to btrfs_delayed_ref_lock
btrfs: remove fs_info from btrfs_check_space_for_delayed_refs
btrfs: remove fs_info from btrfs_should_throttle_delayed_refs
btrfs: switch return_bigger to bool in find_ref_head

Misono Tomohiro (2):
btrfs: Remove 'objectid' member from struct btrfs_root
btrfs: remove redundant variable from btrfs_cross_ref_exist

Nikolay Borisov (8):
btrfs: Make btrfs_find_device_by_path return struct btrfs_device
btrfs: Make btrfs_find_device_missing_or_by_path return directly a device
btrfs: Make btrfs_find_device_by_devspec return btrfs_device directly
btrfs: Remove logically dead code from btrfs_orphan_cleanup
btrfs: handle error of get_old_root
btrfs: Factor out ref head locking code in __btrfs_run_delayed_refs
btrfs: Factor out loop processing all refs of a head
btrfs: refactor __btrfs_run_delayed_refs loop

Omar Sandoval (2):
Btrfs: clean up scrub is_dev_replace parameter
Btrfs: get rid of btrfs_symlink_aops

Qu Wenruo (16):
btrfs: qgroup: Dirty all qgroups before rescan
btrfs: Handle owner mismatch gracefully when walking up tree
btrfs: locking: Add extra check in btrfs_init_new_buffer() to avoid deadlock
btrfs: Enhance btrfs_trim_fs function to handle error better
btrfs: Ensure btrfs_trim_fs can trim the whole filesystem
btrfs: relocation: Add basic extent backref related comments for build_backref_tree
btrfs: qgroup: Introduce trace event to analyse the number of dirty extents accounted
btrfs: qgroup: Introduce function to trace two swaped extents
btrfs: qgroup: Introduce function to find all new tree blocks of reloc tree
btrfs: qgroup: Use generation-aware subtree swap to mark dirty extents
btrfs: qgroup: Don't trace subtree if we're dropping reloc tree
btrfs: qgroup: Only trace data extents in leaves if we're relocating data block group
btrfs: tree-checker: Check level for leaves and nodes
btrfs: qgroup: Avoid calling qgroup functions if qgroup is not enabled
btrfs: relocation: Cleanup while loop using rbtree_postorder_for_each_entry_safe
btrfs: relocation: Remove redundant tree level check

Su Yue (1):
btrfs: defrag: use btrfs_mod_outstanding_extents in cluster_pages_for_defrag

zhong jiang (4):
btrfs: remove unneeded NULL checks before kfree
btrfs: change btrfs_free_reserved_bytes to return void
btrfs: change btrfs_pin_log_trans to return void
btrfs: change remove_extent_mapping to return void

 fs/btrfs/backref.c                |  39 ++--
 fs/btrfs/btrfs_inode.h            |   8 +-
 fs/btrfs/check-integrity.c        |   6 +-
 fs/btrfs/compression.c            |   2 -
 fs/btrfs/ctree.c                  |  68 +-----
 fs/btrfs/ctree.h                  |  56 ++---
 fs/btrfs/delayed-inode.c          |  41 ++--
 fs/btrfs/delayed-inode.h          |   4 +-
 fs/btrfs/delayed-ref.c            |  69 +++---
 fs/btrfs/delayed-ref.h            |  10 +-
 fs/btrfs/dev-replace.c            |  64 ++----
 fs/btrfs/dev-replace.h            |   8 -
 fs/btrfs/dir-item.c               |   8 +-
 fs/btrfs/disk-io.c                |  24 +-
 fs/btrfs/export.c                 |   4 +-
 fs/btrfs/extent-tree.c            | 424 +++++++++++++++++++++--------------
 fs/btrfs/extent_io.c              |  33 ++-
 fs/btrfs/extent_io.h              |   4 +-
 fs/btrfs/extent_map.c             |  32 +--
 fs/btrfs/extent_map.h             |   4 +-
 fs/btrfs/file.c                   |  33 ++-
 fs/btrfs/free-space-cache.c       |  16 +-
 fs/btrfs/inode.c                  | 120 ++++------
 fs/btrfs/ioctl.c                  |  18 +-
 fs/btrfs/qgroup.c                 | 455 ++++++++++++++++++++++++++++++++++++--
 fs/btrfs/qgroup.h                 |   8 +
 fs/btrfs/ref-verify.c             |   8 +-
 fs/btrfs/relocation.c             |  74 +++----
 fs/btrfs/scrub.c                  |  34 ++-
 fs/btrfs/send.c                   |  24 +-
 fs/btrfs/super.c                  |   6 +-
 fs/btrfs/tests/extent-io-tests.c  |  10 +-
 fs/btrfs/tests/extent-map-tests.c |   4 +-
 fs/btrfs/transaction.c            |  31 +--
 fs/btrfs/tree-checker.c           |  14 ++
 fs/btrfs/tree-log.c               |  86 +++++--
 fs/btrfs/tree-log.h               |   2 +-
 fs/btrfs/volumes.c                | 117 +++++-----
 fs/btrfs/volumes.h                |   9 +-
 include/trace/events/btrfs.h      |  36 ++-
 40 files changed, 1268 insertions(+), 745 deletions(-)