Re: [RFC V2] test_bit before clear files_struct bits

From: Linus Torvalds
Date: Tue Feb 10 2015 - 15:49:52 EST

On Tue, Feb 10, 2015 at 12:22 PM, Andrew Morton
<akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> The patch is good but I'm still wondering if any CPUs can do this
> speedup for us. The CPU has to pull in the target word to modify the
> bit and what it *could* do is to avoid dirtying the cacheline if it
> sees that the bit is already in the desired state.

Sadly, no CPU I know of actually does this. Probably because it would
take a bit more core resources, and conditional writes to memory are
not normally part of an x86 core (it might be more natural for
something like old-style ARM that has conditional writes).

Also, even if the store were to be conditional, the cacheline would
have been acquired in exclusive state, and in many cache protocols the
state machine is from exclusive to dirty (since normally the only
reason to get a cacheline for exclusive use is in order to write to
it). So a "read, test, conditional write" ends up actually being more
complicated than you'd think - because you *want* that
exclusive->dirty state for the case where you really are going to
change the bit, and to avoid extra cache protocol stages you don't
generally want to read the cacheline into a shared read mode first
(only to then have to turn it into exclusive/dirty as a second state).
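
The cost argument above can be sketched with a toy state machine. This is only an illustration, not a model of any real CPU: the state names follow the usual MESI convention, every identifier is made up for the example, and it assumes another core also holds the line, so that a plain read lands in the shared state rather than exclusive.

```c
#include <stdbool.h>

/* Toy MESI-style states for one cache line in one core (hypothetical model). */
enum line_state { INVALID, SHARED, EXCLUSIVE, MODIFIED };

struct cache_line {
    enum line_state state;
    unsigned bus_transactions;   /* coherence requests issued so far */
};

/* Plain load: assumes another core has the line, so we end up SHARED. */
static void read_shared(struct cache_line *l)
{
    if (l->state == INVALID) {
        l->state = SHARED;
        l->bus_transactions++;
    }
}

/* Acquire the line for writing: one request from INVALID or from SHARED. */
static void read_for_ownership(struct cache_line *l)
{
    if (l->state == INVALID || l->state == SHARED) {
        l->state = EXCLUSIVE;
        l->bus_transactions++;
    }
}

/* A store needs ownership first; exclusive -> modified is then local. */
static void store(struct cache_line *l)
{
    read_for_ownership(l);
    l->state = MODIFIED;
}

/* Usual read-modify-write: take the line exclusive up front, then write. */
static void rmw_unconditional(struct cache_line *l)
{
    read_for_ownership(l);
    store(l);
}

/* "Read, test, conditional write": read shared first, upgrade only if needed. */
static void rmw_conditional(struct cache_line *l, bool needs_change)
{
    read_shared(l);
    if (needs_change)
        store(l);
}
```

In this toy model the unconditional path costs one request and always ends up modified. The conditional path costs two requests when the bit really changes, and one request (leaving the line shared and clean) when it does not.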

So at least on current x86 (and for reasons above, likely in the
future, including other architectures with read-modify-write memory
access models), the default assumption is that the bit operations will
actually change the bit, and unlikely bit setting/clearing for
cachelines that are very likely to otherwise stay clean should
probably be conditionalized in software. Like in this patch.
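
A minimal user-space sketch of what "conditionalized in software" means here. It is not the patch itself: the helper names below are hypothetical, the operations are non-atomic, and the kernel's own test_bit()/clear_bit() helpers are not used.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Read-only probe: loads the word but never stores to it. */
static bool bitmap_test(const unsigned long *map, size_t nr)
{
    return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

/* Unconditional clear: written as a read-modify-write of the whole word. */
static void bitmap_clear_always(unsigned long *map, size_t nr)
{
    map[nr / BITS_PER_WORD] &= ~(1UL << (nr % BITS_PER_WORD));
}

/*
 * Conditional clear: only write when the bit is actually set.
 * Returns true if a store was issued, false if the word was left alone.
 */
static bool bitmap_clear_if_set(unsigned long *map, size_t nr)
{
    if (!bitmap_test(map, nr))
        return false;            /* common case: no write, line stays clean */
    bitmap_clear_always(map, nr);
    return true;
}
```

The extra test only pays off when the bit is usually already clear. If the bit is set most of the time, the test is just an added load and branch in front of the same write.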
