Re: [PATCH 1/2] vfs: make sure struct filename->iname is word-aligned
From: Rasmus Villemoes
Date: Mon Feb 22 2016 - 19:00:09 EST
On Fri, Feb 19 2016, Theodore Ts'o <tytso@xxxxxxx> wrote:
> On Thu, Feb 18, 2016 at 09:10:21PM +0100, Rasmus Villemoes wrote:
>>
>> Sure, that would work as well. I don't really care how ->iname is pushed
>> out to offset 32, but I'd like to know if it's worth it.
>
> Do you have access to one of these platforms where unaligned access is
> really painful?
No. But FWIW, I did a microbenchmark on my aging Core2, doing nothing
but lstat() on the same "aaaa..." string in a loop. 'before' is 4.4.2
with a few unrelated patches, 'after' is that plus 1/2 and 2/2. In
perf_x_y, x is length of "aaa..." string and y is alignment mod 8 in
userspace.
$ grep strncpy_from_user *.report
perf_30_0_after.report: 5.47% s_f_u [k] strncpy_from_user
perf_30_0_before.report: 7.40% s_f_u [k] strncpy_from_user
perf_30_3_after.report: 5.05% s_f_u [k] strncpy_from_user
perf_30_3_before.report: 7.29% s_f_u [k] strncpy_from_user
perf_30_4_after.report: 4.88% s_f_u [k] strncpy_from_user
perf_30_4_before.report: 7.28% s_f_u [k] strncpy_from_user
perf_30_6_after.report: 5.43% s_f_u [k] strncpy_from_user
perf_30_6_before.report: 6.74% s_f_u [k] strncpy_from_user
perf_40_0_after.report: 5.68% s_f_u [k] strncpy_from_user
perf_40_0_before.report: 10.99% s_f_u [k] strncpy_from_user
perf_40_3_after.report: 5.37% s_f_u [k] strncpy_from_user
perf_40_3_before.report: 10.62% s_f_u [k] strncpy_from_user
perf_40_4_after.report: 5.61% s_f_u [k] strncpy_from_user
perf_40_4_before.report: 10.91% s_f_u [k] strncpy_from_user
perf_40_6_after.report: 5.81% s_f_u [k] strncpy_from_user
perf_40_6_before.report: 10.84% s_f_u [k] strncpy_from_user
perf_50_0_after.report: 6.29% s_f_u [k] strncpy_from_user
perf_50_0_before.report: 12.46% s_f_u [k] strncpy_from_user
perf_50_3_after.report: 7.15% s_f_u [k] strncpy_from_user
perf_50_3_before.report: 14.09% s_f_u [k] strncpy_from_user
perf_50_4_after.report: 7.64% s_f_u [k] strncpy_from_user
perf_50_4_before.report: 14.10% s_f_u [k] strncpy_from_user
perf_50_6_after.report: 7.30% s_f_u [k] strncpy_from_user
perf_50_6_before.report: 14.10% s_f_u [k] strncpy_from_user
perf_60_0_after.report: 6.81% s_f_u [k] strncpy_from_user
perf_60_0_before.report: 13.25% s_f_u [k] strncpy_from_user
perf_60_3_after.report: 9.48% s_f_u [k] strncpy_from_user
perf_60_3_before.report: 13.26% s_f_u [k] strncpy_from_user
perf_60_4_after.report: 9.90% s_f_u [k] strncpy_from_user
perf_60_4_before.report: 15.09% s_f_u [k] strncpy_from_user
perf_60_6_after.report: 9.91% s_f_u [k] strncpy_from_user
perf_60_6_before.report: 13.85% s_f_u [k] strncpy_from_user
So the numbers vary and it's a bit odd that some of the
userspace-unaligned cases seem faster than the corresponding aligned
ones, but overall I think it's ok to conclude there's a measurable
difference.
Note the huge jump from 30_y_before to 40_y_before. I suppose that's
because we do an unaligned store crossing a cache line boundary when the
string is > 32 bytes.
I suppose 2/2 is also responsible for some of the above, since it not
only aligns the kernel-side stores, but also means we stay within a
single cacheline for strings up to 56 bytes. I should measure the effect
of 1/2 by itself, but compiling a kernel takes forever for me, so I
won't get to that tonight.
[It turns out that 32 is the median length from 'git ls-files' in the
kernel tree, with 33.2 being the mean, so even though I used relatively
long paths above to get strncpy_from_user to stand out, such path
lengths are not totally uncommon.]
> The usual thing is to benchmark something like "git
> stat" which has to stat every single file in a repository's working
> directory.
I tried that as well; strncpy_from_user was around 0.5% both before and
after.
Rasmus