Re: [PATCH] cfq: improve fsync performance for small files

From: Jeff Moyer
Date: Thu Sep 16 2010 - 10:41:32 EST


Corrado Zoccolo <czoccolo@xxxxxxxxx> writes:

> Hi Jens,
> any comment on this patch, or the problem we are trying to solve?

Bueller? Bueller?

>
> On Tue, Jul 20, 2010 at 10:09 PM, Corrado Zoccolo <czoccolo@xxxxxxxxx> wrote:
>> Fsync performance for small files achieved by cfq on high-end disks is
>> lower than what deadline can achieve, due to idling introduced between
>> the sync write happening in process context and the journal commit.
>>
>> Moreover, when competing with a sequential reader, a process writing
>> small files and fsync-ing them is starved.
>>
>> This patch fixes the two problems by:
>> - marking journal commits as WRITE_SYNC, so that they get the REQ_NOIDLE
>> Âflag set,
>> - force all queues that have REQ_NOIDLE requests to be put in the noidle
>> Âtree.
>>
>> Having the queue associated to the fsync-ing process and the one associated
>> Âto journal commits in the noidle tree allows:
>> - switching between them without idling,
>> - fairness vs. competing idling queues, since they will be serviced only
>> Âafter the noidle tree expires its slice.
>>
>> Acked-by: Vivek Goyal <vgoyal@xxxxxxxxxx>
>> Reviewed-by: Jeff Moyer <jmoyer@xxxxxxxxxx>
>> Tested-by: Jeff Moyer <jmoyer@xxxxxxxxxx>
>> Signed-off-by: Corrado Zoccolo <czoccolo@xxxxxxxxx>
>> ---
>> Âblock/cfq-iosched.c | Â 18 ++++--------------
>> Âfs/jbd/commit.c   |  Â2 +-
>> Âfs/jbd2/commit.c  Â|  Â2 +-
>> Â3 files changed, 6 insertions(+), 16 deletions(-)
>>
>> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
>> index eb4086f..49dada4 100644
>> --- a/block/cfq-iosched.c
>> +++ b/block/cfq-iosched.c
>> @@ -216,7 +216,6 @@ struct cfq_data {
>> Â Â Â Âenum wl_type_t serving_type;
>> Â Â Â Âunsigned long workload_expires;
>> Â Â Â Âstruct cfq_group *serving_group;
>> - Â Â Â bool noidle_tree_requires_idle;
>>
>> Â Â Â Â/*
>> Â Â Â Â * Each priority tree is sorted by next_request position. ÂThese
>> @@ -2126,7 +2125,6 @@ static void choose_service_tree(struct cfq_data *cfqd, struct cfq_group *cfqg)
>> Â Â Â Âslice = max_t(unsigned, slice, CFQ_MIN_TT);
>> Â Â Â Âcfq_log(cfqd, "workload slice:%d", slice);
>> Â Â Â Âcfqd->workload_expires = jiffies + slice;
>> - Â Â Â cfqd->noidle_tree_requires_idle = false;
>> Â}
>>
>> Âstatic struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd)
>> @@ -3108,7 +3106,9 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>> Â Â Â Âif (cfqq->queued[0] + cfqq->queued[1] >= 4)
>> Â Â Â Â Â Â Â Âcfq_mark_cfqq_deep(cfqq);
>>
>> - Â Â Â if (!atomic_read(&cic->ioc->nr_tasks) || !cfqd->cfq_slice_idle ||
>> + Â Â Â if (cfqq->next_rq && (cfqq->next_rq->cmd_flags & REQ_NOIDLE))
>> + Â Â Â Â Â Â Â enable_idle = 0;
>> + Â Â Â else if (!atomic_read(&cic->ioc->nr_tasks) || !cfqd->cfq_slice_idle ||
>> Â Â Â Â Â Â(!cfq_cfqq_deep(cfqq) && CFQQ_SEEKY(cfqq)))
>> Â Â Â Â Â Â Â Âenable_idle = 0;
>> Â Â Â Âelse if (sample_valid(cic->ttime_samples)) {
>> @@ -3421,17 +3421,7 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
>> Â Â Â Â Â Â Â Â Â Â Â Âcfq_slice_expired(cfqd, 1);
>> Â Â Â Â Â Â Â Âelse if (sync && cfqq_empty &&
>> Â Â Â Â Â Â Â Â Â Â Â Â !cfq_close_cooperator(cfqd, cfqq)) {
>> - Â Â Â Â Â Â Â Â Â Â Â cfqd->noidle_tree_requires_idle |=
>> - Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â !(rq->cmd_flags & REQ_NOIDLE);
>> - Â Â Â Â Â Â Â Â Â Â Â /*
>> - Â Â Â Â Â Â Â Â Â Â Â Â* Idling is enabled for SYNC_WORKLOAD.
>> - Â Â Â Â Â Â Â Â Â Â Â Â* SYNC_NOIDLE_WORKLOAD idles at the end of the tree
>> - Â Â Â Â Â Â Â Â Â Â Â Â* only if we processed at least one !REQ_NOIDLE request
>> - Â Â Â Â Â Â Â Â Â Â Â Â*/
>> - Â Â Â Â Â Â Â Â Â Â Â if (cfqd->serving_type == SYNC_WORKLOAD
>> - Â Â Â Â Â Â Â Â Â Â Â Â Â || cfqd->noidle_tree_requires_idle
>> - Â Â Â Â Â Â Â Â Â Â Â Â Â || cfqq->cfqg->nr_cfqq == 1)
>> - Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â cfq_arm_slice_timer(cfqd);
>> + Â Â Â Â Â Â Â Â Â Â Â cfq_arm_slice_timer(cfqd);
>> Â Â Â Â Â Â Â Â}
>> Â Â Â Â}
>>
>> diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
>> index 28a9dda..d97a0c6 100644
>> --- a/fs/jbd/commit.c
>> +++ b/fs/jbd/commit.c
>> @@ -317,7 +317,7 @@ void journal_commit_transaction(journal_t *journal)
>> Â Â Â Âint first_tag = 0;
>> Â Â Â Âint tag_flag;
>> Â Â Â Âint i;
>> - Â Â Â int write_op = WRITE;
>> + Â Â Â int write_op = WRITE_SYNC;
>>
>> Â Â Â Â/*
>> Â Â Â Â * First job: lock down the current transaction and wait for
>> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
>> index 75716d3..a078744 100644
>> --- a/fs/jbd2/commit.c
>> +++ b/fs/jbd2/commit.c
>> @@ -369,7 +369,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>> Â Â Â Âint tag_bytes = journal_tag_bytes(journal);
>> Â Â Â Âstruct buffer_head *cbh = NULL; /* For transactional checksums */
>> Â Â Â Â__u32 crc32_sum = ~0;
>> - Â Â Â int write_op = WRITE;
>> + Â Â Â int write_op = WRITE_SYNC;
>>
>> Â Â Â Â/*
>> Â Â Â Â * First job: lock down the current transaction and wait for
>> --
>> 1.6.2.5
>>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/