Re: [pagevec] resize pagevec to O(lg(NR_CPUS))

From: William Lee Irwin III
Date: Sun Sep 12 2004 - 00:27:44 EST


Marcelo Tosatti wrote:
>> For me Bill's patch (with the recursive thingie) is very cryptic. It's
>> just doing log2(n); it took me an hour to figure it out with his help.

On Sun, Sep 12, 2004 at 10:29:56AM +1000, Nick Piggin wrote:
> Having it depend on NR_CPUS should be avoided if possible.
> But yeah in this case I guess you can't easily make it work
> at runtime.

With some work it could be tuned at boot-time.
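
For reference, the recursive thingie amounts to a compile-time log2. A
minimal sketch in plain C of how that can be done (my own illustration,
not the macro from the patch; the NR_CPUS value and the PAGEVEC_SIZE
scaling below are made up):

/*
 * Compile-time integer log2 by repeated halving of the shift width;
 * illustrative only, not the macro from the actual patch.
 */
#define __LOG2_1(n)  (((n) >= 1UL <<  1) ?  1 : 0)
#define __LOG2_2(n)  (((n) >= 1UL <<  2) ?  2 + __LOG2_1((n) >>  2) : __LOG2_1(n))
#define __LOG2_4(n)  (((n) >= 1UL <<  4) ?  4 + __LOG2_2((n) >>  4) : __LOG2_2(n))
#define __LOG2_8(n)  (((n) >= 1UL <<  8) ?  8 + __LOG2_4((n) >>  8) : __LOG2_4(n))
#define ILOG2(n)     (((n) >= 1UL << 16) ? 16 + __LOG2_8((n) >> 16) : __LOG2_8(n))

#define NR_CPUS 64                              /* example value */

/* size the pagevec as some multiple of lg(NR_CPUS); scaling is made up */
#define PAGEVEC_SIZE (4 * (ILOG2(NR_CPUS) + 1))

The whole expression folds to a constant at compile time, so it can size
an array; making it boot-time tunable instead is what would take the
extra work mentioned above.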


Marcelo Tosatti wrote:
>> Oops, right. wli's patch is borked for NUMA. Clamping it at 64 should do
>> fine.

On Sun, Sep 12, 2004 at 10:29:56AM +1000, Nick Piggin wrote:
> Is 16 any good? ;)

There are nontrivial differences in the optimal batching factor
depending on the distribution of hold times and interarrival times. The
strongest dependencies of all are on the ratio of the lock transfer time
to the interarrival time and on the lock transfer time itself. These
appear routinely in odd places in the expressions for expected response
time, the latter often as a constant of proportionality.


Marcelo Tosatti wrote:
>> What's the L1 cache size of Itanium 2? Each page is huge compared to the
>> pagevec structure (you need a 64-item pagevec array on 64-bit to
>> occupy the space of one 4KB page). So I think you won't blow up the
>> cache even with a really big pagevec.

On Sun, Sep 12, 2004 at 10:29:56AM +1000, Nick Piggin wrote:
> I think it is 16K data cache. It is not the pagevec structure that you
> are worried about, but all the cachelines from all the pages you put
> into it. If you put 64 pages in it, that's 8K with a 128-byte cacheline
> size (the structure will be ~512 bytes on a 64-bit arch).
> And if you touch one other cacheline per page, there's 16K.
> So I'm just making up numbers, but the point is you obviously want to
> keep it as small as possible unless you can demonstrate improvements.

It's unclear what you're estimating the size of. A PAGEVEC_SIZE of 62
yields a 512B pagevec, for 4 cachelines exclusive to the cpu (or, if
stack-allocated, the task). The pagevecs themselves are not shared,
so the TLB entries for per-cpu pagevecs span surrounding per-cpu data,
not other cpus' pagevecs, and the TLB entries for stack-allocated
pagevecs are in turn shared with other stack-allocated data.
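
Spelling out the arithmetic behind those numbers (a standalone sketch,
assuming the 2.6-era struct pagevec layout and the 128B line size used
in the estimate above):

#include <stdio.h>

#define PAGEVEC_SIZE 62
#define CACHELINE    128        /* line size assumed in the estimate above */

struct page;                    /* opaque for this sketch */

struct pagevec {                /* 2.6-era layout, reproduced for illustration */
        unsigned long nr;
        unsigned long cold;
        struct page *pages[PAGEVEC_SIZE];
};

int main(void)
{
        /* 2*8 + 62*8 = 512 bytes on a 64-bit arch: 4 lines of 128B */
        printf("pagevec: %zu bytes, %zu cachelines\n",
               sizeof(struct pagevec),
               (sizeof(struct pagevec) + CACHELINE - 1) / CACHELINE);
        /* touching one extra line per page costs ~62 * 128B ~= 8KB of cache */
        printf("worst-case per-page footprint for a full pagevec: %u bytes\n",
               PAGEVEC_SIZE * CACHELINE);
        return 0;
}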


Marcelo Tosatti wrote:
>> Not very noticeable on reaim. I want to do more tests (different
>> workloads, nr CPUs, etc).

On Sun, Sep 12, 2004 at 10:29:56AM +1000, Nick Piggin wrote:
> Would be good.

On Sun, Sep 12, 2004 at 10:29:56AM +1000, Nick Piggin wrote:
> To get a best-case argument for increasing the size of the structure, I
> guess you'll want to set up tests to put the maximum contention on the
> lru_lock. That would mean big non-NUMA machines (e.g. OSDL's stp8 systems),
> lots of page reclaim so you'll have to fill up the caches, and lots
> of read()'ing.

mapping->tree_lock is affected as well as zone->lru_lock. The workload
obviously has to touch the relevant locks for pagevecs to be relevant;
however, the primary factor in the effectiveness of pagevecs is the
lock transfer time, which is not likely to vary significantly on boxen
such as the OSDL STP machines. You should use a workload that stresses
mapping->tree_lock via codepaths using radix_tree_gang_lookup(), and
get runtime on OSDL's NUMA-Q or else ask SGI to test its effects;
otherwise you're dorking around with boxen whose characteristics are
identical as far as batched locking is concerned.
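
For reference, the kind of tree_lock-protected gang lookup such a
workload would hammer looks roughly like the 2.6-era find_get_pages()
(paraphrased from memory, so treat the details as approximate; it is
kernel code, not a standalone snippet):

/* Batched page-cache lookup under mapping->tree_lock: the gang lookup
 * and the per-page refcount bumps all happen inside one lock hold,
 * which is exactly why the lock transfer time is the quantity that
 * batching amortizes. */
unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
                        unsigned int nr_pages, struct page **pages)
{
        unsigned int i, ret;

        spin_lock_irq(&mapping->tree_lock);
        ret = radix_tree_gang_lookup(&mapping->page_tree,
                                     (void **)pages, start, nr_pages);
        for (i = 0; i < ret; i++)
                page_cache_get(pages[i]);
        spin_unlock_irq(&mapping->tree_lock);
        return ret;
}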


-- wli