[PATCH 3.10 30/56] hfs,hfsplus: cache pages correctly between bnode_create and bnode_free

From: Greg Kroah-Hartman
Date: Tue Sep 29 2015 - 09:53:45 EST

3.10-stable review patch. If anyone has any objections, please let me know.


From: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx>

commit 7cb74be6fd827e314f81df3c5889b87e4c87c569 upstream.

Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and
hfs_bnode_find() for finding or creating pages corresponding to an inode)
are immediately kmap()'ed and used (both read and write) and kunmap()'ed,
and should not be page_cache_release()'ed until hfs_bnode_free().

This patch fixes a problem I first saw in July 2012: merely running "du"
on a large hfsplus-mounted directory a few times on a reasonably loaded
system would get the hfsplus driver all confused and complaining about
B-tree inconsistencies, and generates a "BUG: Bad page state". Most
recently, I can generate this problem on up-to-date Fedora 22 with shipped
kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other smaller
mounts) and "du /mnt" simultaneously on two windows, where /mnt is a
lightly-used QEMU VM image of the full Mac OS X 10.9:

$ df -i / /home /mnt
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/fedora-root 3276800 551665 2725135 17% /
/dev/mapper/fedora-home 52879360 716221 52163139 2% /home
/dev/nbd0p2 4294967295 1387818 4293579477 1% /mnt

After applying the patch, I was able to run "du /" (60+ times) and "du
/mnt" (150+ times) continuously and simultaneously for 6+ hours.

There are many reports of the hfsplus driver getting confused under load
and generating "BUG: Bad page state" or other similar issues over the
years. [1]

The unpatched code [2] has always been wrong since it entered the kernel
tree. The only reason why it gets away with it is that the
kmap/memcpy/kunmap follow very quickly after the page_cache_release() so
the kernel has not had a chance to reuse the memory for something else,
most of the time.

The current RW driver appears to have followed the design and development
of the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec
2001) had a B-tree node-centric approach to
read_cache_page()/page_cache_release() per bnode_get()/bnode_put(),
migrating towards version 0.2 (June 2002) of caching and releasing pages
per inode extents. When the current RW code first entered the kernel [2]
in 2005, there was an REF_PAGES conditional (and "//" commented out code)
to switch between B-node centric paging to inode-centric paging. There
was a mistake with the direction of one of the REF_PAGES conditionals in
__hfs_bnode_create(). In a subsequent "remove debug code" commit [4], the
read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were
removed, but a page_cache_release() was mistakenly left in (propagating
the "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out
page_cache_release() in bnode_release() (which should be spanned by
!REF_PAGES) was never enabled.

Michael Fox, Apr 2013
("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'")

Sasha Levin, Feb 2015
http://lkml.org/lkml/2015/2/20/85 ("use after free")


commit d1081202f1d0ee35ab0beb490da4b65d4bc763db
Author: Andrew Morton <akpm@xxxxxxxx>
Date: Wed Feb 25 16:17:36 2004 -0800

[PATCH] HFS rewrite


commit 91556682e0bf004d98a529bf829d339abb98bbbd
Author: Andrew Morton <akpm@xxxxxxxx>
Date: Wed Feb 25 16:17:48 2004 -0800

[PATCH] HFS+ support




Date: Thu Jun 6 09:45:14 2002 +0000
Use buffer cache instead of page cache in bnode.c. Cache inode extents.


commit a5e3985fa014029eb6795664c704953720cc7f7d
Author: Roman Zippel <zippel@xxxxxxxxxxxxxx>
Date: Tue Sep 6 15:18:47 2005 -0700

[PATCH] hfs: remove debug code

Signed-off-by: Hin-Tak Leung <htl10@xxxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Sergei Antonov <saproj@xxxxxxxxx>
Reviewed-by: Anton Altaparmakov <anton@xxxxxxxxxx>
Reported-by: Sasha Levin <sasha.levin@xxxxxxxxxx>
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Cc: Vyacheslav Dubeyko <slava@xxxxxxxxxxx>
Cc: Sougata Santra <sougata@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

fs/hfs/bnode.c | 9 ++++-----
fs/hfsplus/bnode.c | 3 ---
2 files changed, 4 insertions(+), 8 deletions(-)

--- a/fs/hfs/bnode.c
+++ b/fs/hfs/bnode.c
@@ -288,7 +288,6 @@ static struct hfs_bnode *__hfs_bnode_cre
goto fail;
- page_cache_release(page);
node->page[i] = page;

@@ -398,11 +397,11 @@ node_error:

void hfs_bnode_free(struct hfs_bnode *node)
- //int i;
+ int i;

- //for (i = 0; i < node->tree->pages_per_bnode; i++)
- // if (node->page[i])
- // page_cache_release(node->page[i]);
+ for (i = 0; i < node->tree->pages_per_bnode; i++)
+ if (node->page[i])
+ page_cache_release(node->page[i]);

--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -456,7 +456,6 @@ static struct hfs_bnode *__hfs_bnode_cre
goto fail;
- page_cache_release(page);
node->page[i] = page;

@@ -568,13 +567,11 @@ node_error:

void hfs_bnode_free(struct hfs_bnode *node)
-#if 0
int i;

for (i = 0; i < node->tree->pages_per_bnode; i++)
if (node->page[i])

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/