[RFC] Alternative raceless page free

From: Daniel Phillips (phillips@arcor.de)
Date: Wed Sep 04 2002 - 23:42:12 EST


For completeness, I implemented the atomic_dec_and_lock version of raceless
page freeing suggested by Manfred Spraul. The atomic_dec_and_lock approach
eliminates the free race by ensuring that when a page's count drops to zero
the lru list lock is taken atomically, leaving no window where the page can
also be found and manipulated on the lru list.[1] Both this and the
extra-lru-count version are supported in the linked patch:

   http://people.nl.linux.org/~phillips/patches/lru.race-2.4.19-2
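
As a rough illustration of the idea (a userspace sketch, not code from the
patch), the release path can be modelled as below. The spinlock, struct page
fields, and the names dec_and_lock_simple, put_page_sketch and lru_get_page
are all hypothetical stand-ins. The decrement-and-lock helper shown here is
the simple form that always takes the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal spinlock so the sketch stands alone (hypothetical, userspace). */
typedef struct { atomic_flag held; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
    atomic_int count;   /* reference count */
    bool on_lru;        /* protected by lru_lock */
    bool freed;         /* set where the real code would free the page */
};

static spinlock_t lru_lock = { ATOMIC_FLAG_INIT };

/*
 * Simple decrement-and-lock: take the lock, then decrement.
 * Returns true with the lock HELD if the count reached zero;
 * otherwise drops the lock and returns false.
 */
static bool dec_and_lock_simple(atomic_int *count, spinlock_t *lock)
{
    spin_lock(lock);
    if (atomic_fetch_sub(count, 1) == 1)
        return true;
    spin_unlock(lock);
    return false;
}

/* Drop one reference; the final drop unlinks the page under the lru lock. */
static void put_page_sketch(struct page *page)
{
    if (!dec_and_lock_simple(&page->count, &lru_lock))
        return;
    page->on_lru = false;       /* list_del() in a real implementation */
    spin_unlock(&lru_lock);
    page->freed = true;         /* stands in for recover_pages() */
}

/* An lru scanner taking a reference to a page it found on the list. */
static bool lru_get_page(struct page *page)
{
    bool found;

    spin_lock(&lru_lock);
    found = page->on_lru;
    if (found) {
        /* Per footnote [1]: a zero count on the lru would be a bug. */
        assert(atomic_load(&page->count) > 0);
        atomic_fetch_add(&page->count, 1);
    }
    spin_unlock(&lru_lock);
    return found;
}
```

Because the count can only reach zero with the lru lock held, a scanner that
holds the lock either sees the page with a nonzero count or does not see it
on the list at all.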

The atomic_dec_and_lock version is slightly simpler, but was actually more
work to implement because of the need to locate and eliminate all uses of
page_cache_release where the lru lock is known to be held, as these would
deadlock when the final release tries to take the lock again. That had the
side effect of eliminating a number of ifdefs vs the lru count version, and
rooting out some hidden redundancy.

The patch exposes __free_pages_ok, which must be called directly by the
atomic_dec_and_lock variant. In the process it got a less confusing name -
recover_pages. (The incumbent name is confusing because all other 'free'
variants in addition manipulate the page count.)

It's a close call which version is faster. I suspect the atomic_dec_and_lock
version will not scale quite as well because of the bus-locked cmpxchg on the
page count (optimized version; unoptimized version always takes the spinlock)
but neither version really lacks in the speed department.
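
For reference, the optimized form mentioned above can be sketched roughly as
follows, using C11 atomics in userspace. This is an assumption about the
general shape, not the kernel's actual implementation; spinlock_t and
dec_and_lock_sketch are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal spinlock so the sketch stands alone (hypothetical, userspace). */
typedef struct { atomic_flag held; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Fast path: while the count is above one, decrement it with a
 * compare-and-exchange and never touch the lock.
 * Slow path: take the lock first, then decrement; if that reaches
 * zero, return true with the lock still held.
 */
static bool dec_and_lock_sketch(atomic_int *count, spinlock_t *lock)
{
    int old = atomic_load(count);

    while (old > 1) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return false;
    }

    spin_lock(lock);
    if (atomic_fetch_sub(count, 1) == 1)
        return true;            /* caller must unlock */
    spin_unlock(lock);
    return false;
}
```

The compare-and-exchange in the fast path is the bus-locked operation in
question: every release pays for it, even when the count is nowhere near
zero.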

I have a slight preference for the extra-lru-count version, because of the
trylock in page_cache_release. This means that nobody will have to spin when
shrink_cache is active. Instead, freed pages that collide with the lru lock
can just be left on the lru list to be picked up efficiently later. The
trylock also allows the lru lock to be acquired speculatively from interrupt
context, without a requirement that lru lock holders disable interrupts.
Both versions are provably correct, modulo implementation gaffes.
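
A rough userspace sketch of the trylock idea follows. It assumes that lru
membership itself holds one count on the page, so a count of one means only
the lru still refers to it, and that such a page can only be found through
the lru list. The struct fields and the _sketch function names are
hypothetical, not taken from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal spinlock so the sketch stands alone (hypothetical, userspace). */
typedef struct { atomic_flag held; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static bool spin_trylock(spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct page {
    atomic_int count;   /* includes one count for lru membership */
    bool on_lru;        /* protected by lru_lock */
    bool freed;         /* set where the real code would free the page */
};

static spinlock_t lru_lock = { ATOMIC_FLAG_INIT };

/* Caller holds lru_lock. Frees the page if only the lru count remains. */
static bool lru_reap_locked(struct page *page)
{
    if (!page->on_lru || atomic_load(&page->count) != 1)
        return false;
    page->on_lru = false;           /* list_del() in a real implementation */
    atomic_store(&page->count, 0);  /* drop the lru's own count */
    page->freed = true;             /* stands in for the actual free */
    return true;
}

static void page_cache_release_sketch(struct page *page)
{
    int old = atomic_fetch_sub(&page->count, 1);

    if (old == 1) {                 /* page was never on the lru */
        page->freed = true;
        return;
    }
    if (old != 2)                   /* other users remain */
        return;
    if (!spin_trylock(&lru_lock))
        return;                     /* collision: leave it on the lru */
    lru_reap_locked(page);
    spin_unlock(&lru_lock);
}

/* Stand-in for a later shrink_cache pass picking up a deferred page. */
static bool shrink_cache_sketch(struct page *page)
{
    bool reaped;

    spin_lock(&lru_lock);
    reaped = lru_reap_locked(page);
    spin_unlock(&lru_lock);
    return reaped;
}
```

The release path never spins: if the lock is busy, the page simply stays on
the lru with a count of one until a later pass reaps it.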

The linked patch defaults to the atomic_dec_and_lock version. To change to
the extra-lru-count version, define LRU_PLUS_CACHE as 2 instead of 1.

Christian, can you please run this one through your race detector?

[1] As a corollary, pages with zero count can never be found on the lru list,
so finding one there is treated as a bug.

-- 
Daniel