Re: [PATCH v2] tmpfs: fix race between umount and writepage

From: Konstantin Khlebnikov
Date: Sun May 08 2011 - 08:51:35 EST


Hugh Dickins wrote:
On Sat, 7 May 2011, Konstantin Khlebnikov wrote:
Hugh Dickins wrote:

Here's the patch I was testing last night, but I do want to test it
some more (I've not even tried your unmounting case yet), and I do want
to make some changes to it (some comments, and see if I can move the
mem_cgroup_cache_charge outside of the mutex, making it GFP_KERNEL
rather than GFP_NOFS - at the time that mem_cgroup charging went in,
we did not know here if it was actually a shmem swap page, whereas
nowadays we can be sure, since that's noted in the swap_map).

In shmem_unuse_inode I'm widening the shmem_swaplist_mutex to protect
against shmem_evict_inode; and in shmem_writepage adding to the list
earlier, while holding lock on page still in pagecache to protect it.

But testing last night showed corruption on this laptop (no problem
on other machines): I'm guessing it's unrelated, but I can't be sure
of that without more extended testing.

This patch fixed my problem, I didn't catch any crashes on my test-case:
swapout-unmount.

Thank you, Konstantin, for testing that and reporting back.

I tried using your script on Thursday, but couldn't get the tuning right
for this machine: with numbers too big everything would go OOM, with
numbers too small it wouldn't even go to swap, with numbers on the edge
it would soon settle into a steady state with almost nothing in swap.

Just once, without the patch, I did get to "Self-destruct in 5 seconds",
but that was not reproducible enough for me to test that the patch would
be fixing anything.

I was going to try today on other machines with more cpus and more memory,
though not as much as yours; but now I'll let your report save me the time,
and just add your Tested-by. Big thank you for that!

Besides adding comments, I have changed the patch around since then, at
the shmem_unuse_inode end: to avoid any memory allocation while holding
the mutex (and then we no longer need to drop and retake info->lock,
so it gets a little simpler). It would be dishonest of me to claim your
Tested-by for the changed code (and your mount/write/umount loop would
not have been testing swapoff): since it is an independent fix with a
different justification, I'll split that part off into a 2/3.

Ok, I can test final patch-set on the next week.
Also I can try to add some swapoff test-cases.


3/3 being the fix to the "corruption" I noticed while testing, corruption
being on the filesystem I had on /dev/loop0, over a tmpfs file filling its
filesystem: when I wrote, I'd missed the "I/O" errors in /var/log/messages.

It was another case of a long-standing but largely theoretical race,
now made easily reproducible by recent changes (the preallocation in
between find_lock_page and taking info->lock): when the filesystem is
full, you could get ENOSPC from a race in bringing back a previously
allocated page from swap.

I'll write these three up now and send off to Andrew.

Hugh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/