The fast path uses the per-CPU caches. It takes no locks and does not
disable interrupts. Concurrency protection comes from a cmpxchg on the
per-CPU data, which this comment explains best:
/*
* The cmpxchg will only match if there was no additional
* operation and if we are on the right processor.
*
* The cmpxchg does the following atomically (without lock
* semantics!)
* 1. Relocate first pointer to the current per cpu area.
* 2. Verify that tid and freelist have not been changed
* 3. If they were not changed replace tid and freelist
*
* Since this is without lock semantics the protection is only
* against code executing on this cpu *not* from access by
* other cpus.
*/
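As a rough user-space analogue of the check the comment describes, the C11 sketch below packs a freelist head and a transaction id (tid) into one 64-bit word, so a single compare-exchange both verifies that neither has changed and replaces both. The cache layout, the index-based freelist, and the names `cpu_state`, `alloc_fast` and `END` are hypothetical. The sketch models only steps 2 and 3: user space has no equivalent of relocating to the current per-CPU area. It also differs in one important way. A C11 compare-exchange is atomic across all CPUs, while the kernel's per-CPU cmpxchg deliberately runs without lock semantics. It assumes a 64-bit machine where `_Atomic uint64_t` is lock-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NOBJ 4
#define END  UINT32_MAX   /* hypothetical "freelist empty" marker */

/* Hypothetical cache: free objects form a linked list of indices. */
static uint32_t next_free[NOBJ];

/* Freelist head in the low 32 bits, tid in the high 32 bits, packed so
 * that one compare-exchange checks and updates both together. */
static _Atomic uint64_t cpu_state;

static uint64_t pack(uint32_t freelist, uint32_t tid)
{
    return ((uint64_t)tid << 32) | freelist;
}

/* Chain objects 0 -> 1 -> ... -> NOBJ-1 -> END and reset the tid. */
static void cache_init(void)
{
    for (uint32_t i = 0; i < NOBJ; i++)
        next_free[i] = (i + 1 < NOBJ) ? i + 1 : END;
    atomic_store(&cpu_state, pack(0, 0));
}

/* Fast path: returns an object index, or END if the slow path is needed. */
static uint32_t alloc_fast(void)
{
    uint64_t old = atomic_load(&cpu_state);
    for (;;) {
        uint32_t head = (uint32_t)old;
        uint32_t tid = (uint32_t)(old >> 32);
        if (head == END)
            return END;   /* freelist empty: caller takes the slow path */
        assert(head < NOBJ);
        uint64_t desired = pack(next_free[head], tid + 1);
        /* Succeeds only if freelist and tid are unchanged since the load;
         * on failure 'old' is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(&cpu_state, &old, desired))
            return head;
    }
}
```

Bumping the tid on every successful allocation means a racing operation that happened to leave the same freelist head in place still makes the compare-exchange fail. That protects against ABA, which comparing the freelist pointer alone would not.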
In the slow path, at a minimum, interrupts must be disabled and locks
taken. The debug options prevent the per-CPU caches from ever being
loaded, so when debugging is enabled every allocation falls back to the
slow path.
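The fallback can be modeled the same way in a hypothetical single-threaded sketch. A spinlock built on `atomic_flag` stands in for the locks the slow path takes; user space has no equivalent of disabling IRQs. The `debug_enabled` flag stands in for the debug options. When it is set, the slow path hands out one object at a time and never refills the per-CPU cache, so every allocation goes through the lock. All names (`alloc_obj`, `alloc_slow`, `cache_reset`, `slow_path_calls`) are illustrative and are not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NOBJ 8

/* Hypothetical shared pool, standing in for the lock-protected lists. */
static int shared_free[NOBJ];
static int shared_count;
static atomic_flag shared_lock = ATOMIC_FLAG_INIT;

/* Hypothetical per-CPU cache: touched only by its owner, so no lock. */
static int cpu_cache[NOBJ];
static int cpu_count;

static bool debug_enabled;          /* stands in for the debug options */
static unsigned long slow_path_calls;

static void lock(void)
{
    while (atomic_flag_test_and_set_explicit(&shared_lock, memory_order_acquire)) {
        /* spin until the holder releases the lock */
    }
}

static void unlock(void)
{
    atomic_flag_clear_explicit(&shared_lock, memory_order_release);
}

static void cache_reset(bool debug)
{
    for (int i = 0; i < NOBJ; i++)
        shared_free[i] = i;
    shared_count = NOBJ;
    cpu_count = 0;
    slow_path_calls = 0;
    debug_enabled = debug;
}

/* Slow path: takes the lock (the kernel would also disable IRQs). */
static int alloc_slow(void)
{
    int obj = -1;
    lock();
    slow_path_calls++;
    if (shared_count > 0) {
        obj = shared_free[--shared_count];
        if (debug_enabled) {
            /* Debug mode: check the object, never load the per-CPU cache. */
            assert(obj >= 0 && obj < NOBJ);
        } else {
            /* Normal mode: move the remaining objects into the cache. */
            while (shared_count > 0)
                cpu_cache[cpu_count++] = shared_free[--shared_count];
        }
    }
    unlock();
    return obj;
}

/* Entry point: use the per-CPU cache when it has objects, else go slow. */
static int alloc_obj(void)
{
    if (cpu_count > 0)
        return cpu_cache[--cpu_count];
    return alloc_slow();
}
```

With debugging off, only the first allocation and the one after the pool runs dry take the lock. With it on, every allocation does, which is the cost the text above describes.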
Per-CPU lists could also be used in the slow paths to increase
performance, with the debugging options then woven in.
However, fast-path performance is critical to the performance of the
kernel as a whole, since many subsystems use this code path heavily.