Re: [ltt-dev] [PATCH] Fix dirty page accounting inredirty_page_for_writepage()
From: Mathieu Desnoyers
Date: Sat May 02 2009 - 03:01:25 EST
* Christoph Lameter (cl@xxxxxxxxx) wrote:
> On Fri, 1 May 2009, Mathieu Desnoyers wrote:
>
> > Then, if I understand you correctly, you propose :
> >
> > void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
> > {
> > ... assuming p references percpu "u8" counters ...
> > u8 p_new;
> >
> > p_new = percpu_add_return(p, 1);
> >
> > if (unlikely(!(p_new & pcp->stat_threshold_mask)))
> > zone_page_state_add(pcp->stat_threshold, zone, item);
> > }
> >
> > void inc_zone_state(struct zone *zone, enum zone_stat_item item)
> > {
> > ... assuming p references percpu "u8" counters ...
> > u8 p_new;
> >
> > p_new = percpu_add_return_irqsafe(p, 1);
> >
> > if (unlikely(!(p_new & pcp->stat_threshold_mask)))
> > zone_page_state_add(pcp->stat_threshold, zone, item);
> > }
> >
> > (therefore opting for code duplication)
> >
> > Am I correct ?
>
> Well __inc_zone_state is fine by itself. inc_zone_state will currently
> disable irqs. But we can do it your way and duplicate the code.
>
OK, I think I see the source of disagreement here. Leaving
local_irq_save in place in inc_zone_state as you propose will degrade
performance significantly. x86 is not the only architecture not that
fast at disabling interrupts.
Let met get my hands on the numbers I've prepared for my upcoming paper
about tracer locking... here they are. These are in cycles.
Architecture Speedup CAS Interrupts
(cli+sti) / local CAS local sync sti cli
AMD Athlon(tm)64 X2 4.57 7 17 17 15
Intel Core2 6.33 6 30 20 18
Intel Xeon E5405 5.25 8 20 20 22
PowerPC G5 4.00 1 2 3 1
PowerPC POWER6 4.2 GHz 1.77 9 17 14 2
Itanium 2 (single and dual-core, 1.6GHz)
1.33 3 3 2 2
UltraSPARC-IIIi (1.2GHz)(a) 0.64 0.394 0.394 0.094 0.159
(a) (in system bus clock cycles)
I've also got numbers for atomic add return.. it's usually a bit faster
than local CAS, except on architectures where atomic add return must be
emulated by a CAS (it's then very slightly slower). But in any case,
disabling/enabling interrupts is _WAY_ slower than local CAS or local
add return.
> > void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
> > {
> > ... assuming p references percpu "u8" counters ...
> > u8 p_new;
> >
> > p_new = __percpu_add_return_irqsafe(p, 1);
> >
> > if (unlikely(!(p_new & pcp->stat_threshold_mask)))
> > zone_page_state_add(pcp->stat_threshold, zone, item);
> > }
> >
> > void inc_zone_state(struct zone *zone, enum zone_stat_item item)
> > {
> > unsigned long flags;
> >
> > percpu_local_irqsave(flags);
> > __inc_zone_state(zone, item);
> > percpu_local_irqrestore(flags);
> > }
> >
> > Which is more compact and does not duplicate code.
>
> This is almost like the current code. But lets avoid percpu_local_irqs
> etc if we can.
If we want to get good performance out of those percpu ops, we have to
leave interrupts enabled. It's a factor ~5 slowdown otherwise on x86. So
the choice really comes down to : either we duplicate code, or we create
those percpu_local_irqsave-like primitives. BTW if the name is bad, we
may have to just find something better.
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/