Re: After unlinking a large file on ext4, the process stalls for a long time

From: John Stoffel
Date: Wed Jul 16 2014 - 11:16:23 EST



Mason> I'm using Linux (3.1.10 at the moment) on a embedded system
Mason> similar in spec to a desktop PC from 15 years ago (256 MB RAM,
Mason> 800-MHz CPU, USB).

Sounds like a Raspberry Pi... And have you investigated using
something like XFS as your filesystem instead?

Mason> I need to be able to create large files (50-1000 GB) "as fast
Mason> as possible". These files are created on an external hard disk
Mason> drive, connected over Hi-Speed USB (typical throughput 30
Mason> MB/s).

Really... so you just need to create allocations of space as quickly
as possible, which will then be filled in later with actuall data? So
basically someone will say "give me 600G of space reservation" and
then will eventually fill it up, otherwise you say "Nope, can't do
it!"

Mason> Sparse files were not an acceptable solution (because the space
Mason> must be reserved, and the operation must fail if the space is
Mason> unavailable). And filling the file with zeros was too slow
Mason> (typically 35 s/GB).

Mason> Someone mentioned fallocate on an ext4 partition.

Mason> So I create an ext4 partition with
Mason> $ mkfs.ext4 -m 0 -i 1024000 -O ^has_journal,^huge_file /dev/sda1
Mason> (Using e2fsprogs-1.42.10 if it matters)

Mason> And mount with "typical" mount options
Mason> $ mount -t ext4 /dev/sda1 /mnt/hdd -o noexec,noatime
Mason> /dev/sda1 on /mnt/hdd type ext4 (rw,noexec,noatime,barrier=1)

Mason> I wrote a small test program to create a large file, then immediately
Mason> unlink it.

Mason> My problem is that, while file creation is "fast enough" (4 seconds
Mason> for a 300 GB file) and unlink is "immediate", the process hangs
Mason> while it waits (I suppose) for the OS to actually complete the
Mason> operation (almost two minutes for a 300 GB file).

Mason> I also note that the (weak) CPU is pegged, so perhaps this problem
Mason> does not occur on a desktop workstation?

Mason> /tmp # time ./foo /mnt/hdd/xxx 5
Mason> posix_fallocate(fd, 0, size_in_GiB << 30): 0 [68 ms]
Mason> unlink(filename): 0 [0 ms]
Mason> 0.00user 1.86system 0:01.92elapsed 97%CPU (0avgtext+0avgdata 528maxresident)k
Mason> 0inputs+0outputs (0major+168minor)pagefaults 0swaps

Mason> /tmp # time ./foo /mnt/hdd/xxx 10
Mason> posix_fallocate(fd, 0, size_in_GiB << 30): 0 [141 ms]
Mason> unlink(filename): 0 [0 ms]
Mason> 0.00user 3.71system 0:03.83elapsed 96%CPU (0avgtext+0avgdata 528maxresident)k
Mason> 0inputs+0outputs (0major+168minor)pagefaults 0swaps

Mason> /tmp # time ./foo /mnt/hdd/xxx 100
Mason> posix_fallocate(fd, 0, size_in_GiB << 30): 0 [1882 ms]
Mason> unlink(filename): 0 [0 ms]
Mason> 0.00user 37.12system 0:38.93elapsed 95%CPU (0avgtext+0avgdata 528maxresident)k
Mason> 0inputs+0outputs (0major+168minor)pagefaults 0swaps

Mason> /tmp # time ./foo /mnt/hdd/xxx 300
Mason> posix_fallocate(fd, 0, size_in_GiB << 30): 0 [3883 ms]
Mason> unlink(filename): 0 [0 ms]
Mason> 0.00user 111.38system 1:55.04elapsed 96%CPU (0avgtext+0avgdata 528maxresident)k
Mason> 0inputs+0outputs (0major+168minor)pagefaults 0swaps


Mason> QUESTIONS:

Mason> 1) Did I provide enough information for someone to reproduce?

Sure, but you didn't give enough information to explain what you're
trying to accomplish here. And what the use case is. Also, since you
know you cannot fill 500Gb in any sorta of reasonable time over USB2,
why are you concerned that the delete takes so long?

I think that maybe using the filesystem for the reservations is the
wrong approach. You should use a simple daemon which listens for
requests, and then checks the filesystem space and decides if it can
honor them or not.

Then you just store the files as they get writen...

Mason> 2) Is this expected behavior?

Sure, unlinking a 1Gb file that's been written too means (on EXT4)
that you need to update all the filesystem structures. Now it should
be quicker honestly, but maybe you're not mounting it with a journal?
And have you tried tuning the filesystem to use larger allocations and
blocks? You're not going to make alot of files on there obviously,
but just a few large ones.

Mason> 3) Are there knobs I can tweak (at FS creation, or at mount
Mason> time) to improve the performance of file unlinking? (Maybe
Mason> there is a safety/performance trade-off?

Sure, there are all kinds of things you can do. For example, how
many of these files are you expecting to store? Will you have to be
able to handle writing of more than one file at a time? Or are they
purely sequential?

If you are creating a small embedded system to manage a bunch of USB2
hard drives and write data to them with a space reservation process,
then you need to make sure you can actually handle the data throughput
requirements. And I'm not sure you can.

John
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/