Re: [PATCH v2 5/5] btrfs: enable swap file support

From: David Sterba
Date: Mon Dec 01 2014 - 13:18:36 EST

On Mon, Nov 24, 2014 at 02:03:02PM -0800, Omar Sandoval wrote:
> Alright, I took a look at this. My understanding is that a PREALLOC extent
> represents a region on disk that has already been allocated but isn't in use
> yet, but please correct me if I'm wrong. Judging by this comment in
> btrfs_get_blocks_direct, we don't have to worry about PREALLOC extents in
> general:
> /*
> * We don't allocate a new extent in the following cases
> *
> * 1) The inode is marked as NODATACOW. In this case we'll just use the
> * existing extent.
> * 2) The extent is marked as PREALLOC. We're good to go here and can
> * just use the extent.
> *
> */

Ok, thanks for checking.

> A couple of other considerations that cropped up:
> - btrfs_get_extent does a bunch of extra work if the extent is not cached in
> the extent map tree that would be nice to avoid when swapping
> - We might still have to do a COW if the swap file is in a snapshot
> We can avoid the btrfs_get_extent by pinning the extents in memory one way or
> another in btrfs_swap_activate.

That's preferrable, the overhead should be small.

> The snapshot issue is a little tricker to resolve. I see a few options:
> 1. Just do the COW and hope for the best

Snapshotting of NODATACOW files work, the extents are COWed on the first
write, the original owner sees no change.

> 2. As part of btrfs_swap_activate, COW any shared extents. If a snapshot
> happens while a swap file is active, we'll fall back to 1.

So we should make sure that any write to the swapfile will not lead to
the first-COW, this would require to scan all the extents and see if
we're the owner and COW eventually.

Doing that automatically is IMO better from the user perspective,
compared to an erroring out and requesting a manual "dd" over the file.

Possibly, the implied COW could create preallocated extents so we do not
have to rewrite the data.

> 3. Clobber any swap file extents which are in a snapshot, i.e., always use the
> existing extent.

Easy to implement but could lead to bad suprises.

More thoughts:

There are some guards in the generic code to prevent unwanted
modifications to the swapfiles (eg. no truncate, no fallocate, no
deletion). Further we should audit btrfs ioctls that may interfere with
swap and forbid any action (notably the cloning and deduplication ones).
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at