Milton Miller wrote:... [ fs POOF scenerio ]On Jul 4, 2006, at 2:47 AM, Jes Sorensen wrote:The idea I wrote the code under was that we are really only concerned to
make sure all bh's related to the device causing the invalidate are hit.
It doesn't matter that there are bh's in the lru from other devices.
And that is fine as far as it goes. The problem is that an unrelated
device might be be hit by this operation.
Hrmpf, maybe it's back to putting the mask into the bdev then.
8 pointers * NR_CPUS - that can be a lot of cachelines you have to pull
in from remote :(
8 pointers in one array, hmm... 8*8 = 64 consecutive bytes, that is
half a cache line in my arch. (And on 32 bit archs, 8*4 = 32 bytes,
typically 1 full cache line). Add an alignment constraint if you like.
I was comparing to fetching a per-cpu variable saying we had any
entries; I'd say the difference is the time to do 8 loads once you have
the line vs the chance the other cpu is actively storing and stealing it
back before you finish. I'd be really surprised if it was stolen twice.
The problem is that if you throw the IPI onto multiple CPUs they will
all try and hit this array at the same time. Besides, on some platforms
this would be 64 * 8 * 8 (or even bigger) - scalability quickly goes
out the window, especially as it's to be fetched from another node :( I
know these are the extreme in today's world, but it's becoming more of
an issue with all the multi-core stuff we're going to see.
If you want to cut down on the cache line passing, then putting
the cpu mask in the bdev (for "I have cached a bh on this bdev
sometime") might be useful. You could even do a bdev specific
lru kill, but then we get into the next topic.
That could be an interesting way out.
Another approach is to age the cache. Only clear the bit during the IPI
call if you had nothing there on the last N calls (N being some
combination of absolute time and number of invalidate calls).
I guess if there's nothing in the LRU you don't get the call and the
number of cross node clears are limited, but it seems to get worse
exponentially with the size of the machine, especially if it's busy
doing IO, which is what worries me :(
Next, I'll look for my asbestos suit and take a peak at the hald source.