On Thu, Apr 25, 2013 at 08:16:45PM +0400, Maxim V. Patlasov wrote:As Mel Gorman pointed out, fuse daemon diving intoI strongly dislike the above.
balance_dirty_pages should not kick flusher judging on
NR_WRITEBACK_TEMP. Essentially, all we need in balance_dirty_pages
is:
if (I'm not fuse daemon)
nr_dirty += global_page_state(NR_WRITEBACK_TEMP);
04/25/2013 07:49 PM, Miklos Szeredi ÐÐÑÐÑ:On Thu, Apr 25, 2013 at 4:29 PM, Maxim V. PatlasovAs Mel Gorman pointed out, fuse daemon diving into balance_dirty_pages
<mpatlasov@xxxxxxxxxxxxx> wrote:
Please elaborate. From a technical perspective "fuse daemon" is verydiff --git a/mm/page-writeback.c b/mm/page-writeback.cPlease drop this patch. As we discussed in LSF/MM, the fix above is correct,
index 0713bfb..c47bcd4 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1235,7 +1235,8 @@ static void balance_dirty_pages(struct address_space
*mapping,
*/
nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
global_page_state(NR_UNSTABLE_NFS);
- nr_dirty = nr_reclaimable +
global_page_state(NR_WRITEBACK);
+ nr_dirty = nr_reclaimable +
global_page_state(NR_WRITEBACK) +
+ global_page_state(NR_WRITEBACK_TEMP);
global_dirty_limits(&background_thresh, &dirty_thresh);
but it's not enough: we also need to ensure disregard of NR_WRITEBACK_TEMP
when balance_dirty_pages() is called from fuse daemon. I'll send a separate
patch-set soon.
hard to define, so anything that relies on whether something came from
the fuse daemon or not is conceptually broken.
should not kick flusher judging on NR_WRITEBACK_TEMP. Essentially, all
we need in balance_dirty_pages is:
if (I'm not fuse daemon)
nr_dirty += global_page_state(NR_WRITEBACK_TEMP);
The way how to identify fuse daemon was not thoroughly scrutinized
during LSF/MM. Firstly, I thought it would be enough to set a
per-process flag handling fuse device open. But now I understand that
fuse daemon may be quite a complicated multi-threaded multi-process
construction. I'm going to add new FUSE_NOTIFY to allow fuse daemon
decide when it works on behalf of draining writeout-s. Having in mind
that fuse-lib is multi-threaded, I'm also going to inherit the flag on
copy_process(). Does it make sense for you?
Also, another patch will put this ad-hoc FUSE_NOTIFY under fusermount
control. This will prevent malicious unprivileged fuse mounts from
setting the flag for malicious purposes.
What about something like the following untested patch?
The idea is that fuse filesystems should not go over the bdi limit even if the
global limit hasn't been reached.
Thanks,
Miklos
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 137185c..195ee45 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -291,6 +291,7 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
inode->i_flags |= S_NOATIME|S_NOCMTIME;
inode->i_generation = generation;
inode->i_data.backing_dev_info = &fc->bdi;
+ set_bit(AS_STRICTLIMIT, &inode->i_data.flags);
fuse_init_inode(inode, attr);
unlock_new_inode(inode);
} else if ((inode->i_mode ^ attr->mode) & S_IFMT) {
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 0e38e13..97f6a0c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -25,6 +25,7 @@ enum mapping_flags {
AS_MM_ALL_LOCKS = __GFP_BITS_SHIFT + 2, /* under mm_take_all_locks() */
AS_UNEVICTABLE = __GFP_BITS_SHIFT + 3, /* e.g., ramdisk, SHM_LOCK */
AS_BALLOON_MAP = __GFP_BITS_SHIFT + 4, /* balloon page special map */
+ AS_STRICTLIMIT = __GFP_BITS_SHIFT + 5, /* strict dirty limit */
};
static inline void mapping_set_error(struct address_space *mapping, int error)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index efe6814..91a9e6e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1226,6 +1226,7 @@ static void balance_dirty_pages(struct address_space *mapping,
unsigned long dirty_ratelimit;
unsigned long pos_ratio;
struct backing_dev_info *bdi = mapping->backing_dev_info;
+ int strictlimit = test_bit(AS_STRICTLIMIT, &mapping->flags);
unsigned long start_time = jiffies;
for (;;) {
@@ -1250,7 +1251,7 @@ static void balance_dirty_pages(struct address_space *mapping,
*/
freerun = dirty_freerun_ceiling(dirty_thresh,
background_thresh);
- if (nr_dirty <= freerun) {
+ if (nr_dirty <= freerun && !strictlimit) {
current->dirty_paused_when = now;
current->nr_dirtied = 0;
current->nr_dirtied_pause =
@@ -1297,7 +1298,7 @@ static void balance_dirty_pages(struct address_space *mapping,
}
dirty_exceeded = (bdi_dirty > bdi_thresh) &&
- (nr_dirty > dirty_thresh);
+ ((nr_dirty > dirty_thresh) || strictlimit);
if (dirty_exceeded && !bdi->dirty_exceeded)
bdi->dirty_exceeded = 1;