Re: ext3 journal commit while seek & write to file

From: Jan Kara
Date: Thu Dec 13 2012 - 12:15:23 EST


Hello,

On Sat 08-12-12 18:04:25, Keith Chew wrote:
> There is a thread in the sqlite mailing list that was started by me,
> but it did not finish because it appears that my findings are more
> related to the kernel instead of sqlite. I really hope someone here
> can give me some guidance.
>
> The summary of my system is:
> - kernel 2.6.39.4 (also tested with 3.6.9)
> - ext3 with data=ordered,commit=5
> - disk has write-cache off
> - sqlite does an insert to the DB every second
>
> I have found that it takes 1ms to write to the DB each second, except
> for when the kernel commits its journal (ie every 5 seconds). At those
> times, the write goes up to 160ms.
>
> You can see from the strace below that the write() after the seek does
> take longer (in this case 148ms) compared to the usual 1ms:
> -------------------
> [pid 17913] 17:58:14.390431 _llseek(98, 4826072, [4826072], SEEK_SET) = 0 <0.000013>
> [pid 17913] 17:58:14.390667 write(98, "\0\0\0\5\0\0\0\215\"'\201\230\305\360\331\370G\305\25\3358W\234\336", 24) = 24 <0.000137>
> [pid 17913] 17:58:14.390956 _llseek(98, 4826096, [4826096], SEEK_SET) = 0 <0.000012>
> [pid 17913] 17:58:14.391134 write(98, "\r\0\0\0\1\3<\0\3<\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024 <0.148882>
> -------------------
>
> I have also tried to write a small program which appends 1KB to the
> end of a file every second, and I do not see this latency on that app.
> Profiling mysql when doing a write every second also does not suffer
> from this problem. I have looked into the sqlite code, but cannot find
> anything unusual.
>
> Is there anything I can do to improve this situation?
Hum, and does this happen only if you overwrite the same block (block
has 4096 bytes) or does it happen even when writing to distinct blocks?
It might be a contention on j_list_lock or buffer lock and this would
differentiate those.
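
As a rough aid for answering that question, the sketch below maps a write's
offset and length to the block numbers it touches. It assumes the 4096-byte
block size mentioned above; the helper name blocks_touched is made up for
illustration. The two (offset, length) pairs are taken from the strace
output quoted above.

```python
BLOCK_SIZE = 4096  # assumed filesystem block size, as mentioned above


def blocks_touched(offset, length, block_size=BLOCK_SIZE):
    """Return the block numbers covered by a write of `length` bytes at `offset`."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


# (offset, length) pairs from the quoted strace: the 24-byte and 1024-byte writes
writes = [(4826072, 24), (4826096, 1024)]
for offset, length in writes:
    print(offset, length, blocks_touched(offset, length))
```

Under that assumption both writes in the excerpt land in block 1178. Whether
the writes of successive inserts keep hitting the same block cannot be seen
from this one excerpt; running the same calculation over a longer strace
would show it.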

Maybe you could create a simple program simulating the writes and gather
'perf record' of the writer with commit=1 (so that we always hit the
problematic case) and commit=30 (so that it's rarely hit). And then we could
compare the reports...
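
A minimal sketch of such a test program could look like the following. It is
only an illustration, not what sqlite actually does: it repeats the seek and
write pattern from the strace (a 24-byte write followed by a 1024-byte
write) once per interval and times each write. The function name
simulate_writes, the same_block switch and the 4096-byte block size are
assumptions made for this example.

```python
import os
import sys
import time

BLOCK_SIZE = 4096  # assumed filesystem block size


def timed_write(fd, offset, data):
    """Seek to `offset`, write `data`, and return the write() time in seconds."""
    os.lseek(fd, offset, os.SEEK_SET)
    start = time.monotonic()
    os.write(fd, data)  # small writes to a regular file are expected to complete fully
    return time.monotonic() - start


def simulate_writes(path, iterations, same_block, interval=1.0):
    """Issue a 24-byte and a 1024-byte write per iteration.

    With same_block=True every iteration overwrites the start of block 0;
    otherwise iteration i writes at the start of block i.
    Returns a list of (base_offset, t_small, t_large) tuples.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    results = []
    try:
        for i in range(iterations):
            base = 0 if same_block else i * BLOCK_SIZE
            t_small = timed_write(fd, base, b"\0" * 24)
            t_large = timed_write(fd, base + 24, b"\r" * 1024)
            results.append((base, t_small, t_large))
            time.sleep(interval)
    finally:
        os.close(fd)
    return results


# Usage: python3 writer.py FILE ITERATIONS same|distinct
if __name__ == "__main__" and len(sys.argv) == 4:
    rows = simulate_writes(sys.argv[1], int(sys.argv[2]), sys.argv[3] == "same")
    for base, t_small, t_large in rows:
        print(f"offset={base} small={t_small * 1000:.3f}ms large={t_large * 1000:.3f}ms")
```

Running it once in "same" mode and once in "distinct" mode, with the file on
the ext3 filesystem in question, would show whether the slow writes depend
on overwriting the same block. The same program could then be run under
'perf record' with commit=1 and with commit=30.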

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR