Re: [PATCH] x86: make use of inc/dec conditional

From: Tejun Heo
Date: Thu Aug 27 2009 - 03:34:28 EST


Hello, Jan.

Jan Beulich wrote:
>>>> "H. Peter Anvin" <hpa@xxxxxxxxx> 19.08.09 18:48 >>>
>> On 08/19/2009 12:48 AM, Jan Beulich wrote:
>>> According to gcc's instruction selection, inc/dec can be used without
>>> penalty on most CPU models, but should be avoided on others. Hence we
>>> should have a config option controlling the use of inc/dec, and
>>> respective abstraction macros to avoid making the resulting code too
>>> ugly. There are a few instances of inc/dec that must be retained in
>>> assembly code, due to that code's dependency on the instruction not
>>> changing the carry flag.
>> One thing: I doubt it matters one measurable iota when it comes to
>> locked operations.
>
> Okay, I think I agree to this point.
>
>> Furthermore:
>>
>> - "decl %2 ;\n"
>> + _ASM_DECL "%2 ;\n"
>> "jne 1b ;\n"
>> "adcl $0, %0 ;\n"
>>
>> It looks to me that the carry flag is live across the dec there. The
>
> Indeed, I overlooked that when going through and checking for the
> CF-is-live instances.
>
>> other csum code look scary to me too.
>>
>> The rest of them look technically okay, but you're bloating them by two
>> bytes (one byte in 64-bit mode) for every instance. You may want to
>> consider if any particular instance is more icache-critical than
>> stall-critical. This is probably more of a concern for inlines than for
>> regular single-instance code like the string operations.
>
> So the background really is that I wanted to introduce a percpu_inc()
> operation subsequently (here with the goal to reduce code size by one
> byte in a couple of places - initially just for inc_irq_stat(), didn't look
> for other potential users), but then realized that it wouldn't be nice
> to unconditionally introduce a possible stall here. Hence I went and
> first created said config option, and then also went through and
> identified the uses of inc/dec that could be replaced based on that
> config option.

Given that we're already sprinkling inc/dec's via atomic ops, I think
this part can proceed independently. Also, if the only affected
machine is the hot p4, I don't think it would worth any amount of
code. :-)

For the percpu part, wouldn't it be better to have
__builtin_contant_p() on the add/sub parameter, use inc/dec if the
param is constant and 1 and make simple wrapper for inc/dec if still
necessary?

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/