Re: [PATCH 2/2] ocfs2: No need to zero pages past i_size. i_sizev2

From: Tao Ma
Date: Sun Jul 04 2010 - 21:40:52 EST


Hi Joel,

On 07/04/2010 11:13 PM, Tao Ma wrote:
Hi Joel,

On 07/04/2010 05:33 AM, Joel Becker wrote:
Here's the second patch, the one that keeps us from zeroing
pages past i_size. This should keep ocfs2 and Dave's writeback patch
happy.

Joel

-------------------------------------------------------

When ocfs2 fills a hole, it does so by allocating clusters. When a
cluster is larger than the write, ocfs2 must zero the portions of the
cluster outside of the write. If the clustersize is smaller than a
pagecache page, this is handled by the normal pagecache mechanisms, but
when the clustersize is larger than a page, ocfs2's write code will zero
the pages adjacent to the write. This makes sure the entire cluster is
zeroed correctly.

Currently ocfs2 behaves exactly the same when writing past i_size.
However, this means ocfs2 is writing zeroed pages for portions of a new
cluster that are beyond i_size. The page writeback code isn't expecting
this. It treats all pages past the one containing i_size as left behind
due to a previous truncate operation.

Thankfully, ocfs2 calculates the number of pages it will be working on
up front. The rest of the write code merely honors the original
calculation. We can simply trim the number of pages to only cover the
actual file data.

Signed-off-by: Joel Becker<joel.becker@xxxxxxxxxx>
---
fs/ocfs2/aops.c | 15 +++++++++++----
1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 96e6aeb..e90ad74 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
<snip>
@@ -1142,11 +1143,17 @@ static int ocfs2_grab_pages_for_write(struct
address_space *mapping,
/*
* Figure out how many pages we'll be manipulating here. For
* non allocating write, we just change the one
- * page. Otherwise, we'll need a whole clusters worth.
+ * page. Otherwise, we'll need a whole clusters worth. If we're
+ * writing past i_size, we only need enough pages to cover the
+ * last page of the write.
The comments for the whole function before the function name also needs
this change accordingly?
*/
if (new) {
wc->w_num_pages = ocfs2_pages_per_cluster(inode->i_sb);
start = ocfs2_align_clusters_to_page_index(inode->i_sb, cpos);
+ /* This is the index *past* the write */
+ end_index = ((user_pos + user_len)>> PAGE_CACHE_SHIFT) + 1;
should it be
end_index = ((user_pos + user_len - 1) >> PAGE_CACHE_SHIFT) + 1?


+ if ((start + wc->w_num_pages)> end_index)
+ wc->w_num_pages = end_index - start;
I just noticed that the below loop in ocfs2_grab_pages_for_write is
for (i = 0; i < wc->w_num_pages; i++)

I guess w_num_pages should be set to end_index -
start_page_of_the_cluster so that we can make sure we grab all the pages
in this cluster until i_size?
oh, start is set to that value, sorry for this bit.
btw, do we ever have a chance that start + wc->w_num_pages > end_index? I can't find it.

Regards,
Tao
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/