Re: [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch

From: Suparna Bhattacharya
Date: Wed Dec 03 2003 - 23:37:36 EST


Interesting :)
Could it be the extra atomic_dec in dio_bio_end_io in the current code
that makes the difference? Or are spin_lock_irq and atomic_inc/dec
really that close? It would be good to hear from other archs, just to
keep in mind for the future.

As for the fallback case, performance doesn't even matter - it's not a
typical situation, so we don't need to bother with that.

Unless someone thinks otherwise, we should go with your fix.

Regards
Suparna

On Wed, Dec 03, 2003 at 03:14:06PM -0800, Daniel McNeil wrote:
> Suparna,
>
> I did a quick test of your patch and my patch by running my aiocp
> program to write zeros to a file and to read a file. I used
> a 50MB file on an ext2 file system on a ramdisk. The machine
> is a 2-proc IBM box with:
>
> model name : Intel(R) Xeon(TM) CPU 1700MHz
> stepping : 10
> cpu MHz : 1686.033
> cache size : 256 KB
>
> The write test was:
>
> time aiocp -n 32 -b 1k -s 50m -z -f DIRECT file
>
> The read test was:
>
> time aiocp -n 32 -b 1k -s 50m -w -f DIRECT file
>
> I ran each test more than 10 times and here are the averages:
>
>                   my patch          your patch
>
> aiocp write  real 0.7328       real 0.7756
>              user 0.01425      user 0.01221
>              sys  0.716        sys  0.76157
>
> aiocp read   real 0.7250       real 0.7456
>              user 0.0144       user 0.0130
>              sys  0.07149      sys  0.7307
>
> It looks like using the spin_lock instead of the atomic inc/dec
> is very close performance-wise, with the spin_lock averaging a bit
> faster. This does not test the fallback case, but both patches
> would perform very similarly there.
>
> I don't have any non-intel hardware to test with.
>
> Daniel
>
>
>
> On Tue, 2003-12-02 at 07:25, Suparna Bhattacharya wrote:
> > I suspect the degree to which the spin_lock_irq is costlier
> > than atomic_inc/dec would vary across architectures - cli/sti
> > is probably more expensive on certain archs than others.
> >
> > The patch I sent just kept things the way they were in terms of
> > locking costs, assuming that those choices were thought through
> > at that time (should check with akpm). Yours changes it by
> > switching to spin_lock(unlock)_irq instead of atomic_dec in
> > the normal (common) path for finished_one_bio, for both sync
> > and async i/o. At the same time, for the sync i/o case, as
> > you observe it takes away one atomic_dec from dio_bio_end_io.
> >
> > Since these probably aren't really very hot paths ... possibly
> > the difference doesn't matter that much. I do agree that your
> > patch makes the locking easier to follow.
> >
> > Regards
> > Suparna
> >
> > On Mon, Dec 01, 2003 at 05:35:18PM -0800, Daniel McNeil wrote:
> > > Suparna,
> > >
> > > Sorry I did not respond sooner, I was on vacation.
> > >
> > > Your patch should also fix the problem. I like mine better for
> > > its cleaner locking.
> > >
> > > I am not sure your approach has less overhead. At least
> > > on x86, cli/sti are fairly inexpensive; the locked xchg or locked
> > > inc/dec is what is expensive (from what I understand).
> > >
> > > So comparing:
> > >
> > > dio_bio_submit()
> > >
> > >     my patch:                  your patch:
> > >     spin_lock()                atomic_inc(bio_count);
> > >     bio_count++                atomic_inc(bios_in_flight);
> > >     bios_in_flight++
> > >     spin_unlock()
> > >
> > > My guess is that the spin_lock/spin_unlock is faster than 2 atomic_inc's,
> > > since it is only 1 locked operation (spin_lock) versus 2 (atomic_inc's).
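> > >
> > > Spelled out with the actual primitives (just a sketch of how I read
> > > the two patches, using the irq-disabling lock variants since bios
> > > complete from interrupt context; the dio field names are assumed
> > > from fs/direct-io.c):
> > >
> > >     /* my patch: one locked op (the lock acquire) plus cli/sti */
> > >     spin_lock_irq(&dio->bio_lock);
> > >     dio->bio_count++;          /* plain ints, protected by bio_lock */
> > >     dio->bios_in_flight++;
> > >     spin_unlock_irq(&dio->bio_lock);
> > >
> > >     /* your patch: two locked ops, no cli/sti */
> > >     atomic_inc(&dio->bio_count);
> > >     atomic_inc(&dio->bios_in_flight);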
> > >
> > > finished_one_bio() (normal case)
> > >
> > >     my patch:                  your patch:
> > >     spin_lock()                atomic_dec_and_test(bio_count)
> > >     bio_count--
> > >     spin_unlock()
> > >
> > > 1 locked instruction each, so very close -- atomic_dec_and_test() does
> > > not disable interrupts, so it is probably a little bit faster.
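> > >
> > > Again in sketch form (same assumed names):
> > >
> > >     /* your patch: a single locked op, no interrupt disabling */
> > >     if (atomic_dec_and_test(&dio->bio_count)) {
> > >         /* last bio: complete the i/o / wake the waiter here */
> > >     }
> > >
> > >     /* my patch: also one locked op, but with cli/sti around it */
> > >     spin_lock_irq(&dio->bio_lock);
> > >     dio->bio_count--;
> > >     spin_unlock_irq(&dio->bio_lock);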
> > >
> > > finished_one_bio() (fallback case)
> > >
> > >     my patch:                  your patch:
> > >     spin_lock()                spin_lock()
> > >     bio_count--                dio->waiter = NULL
> > >     spin_unlock()              spin_unlock()
> > >
> > > Both approaches are the same.
> > >
> > > dio_bio_complete()
> > >
> > >     my patch:                  your patch:
> > >     spin_lock()                spin_lock()
> > >     bios_in_flight--           atomic_dec()
> > >     spin_unlock()              spin_unlock()
> > >
> > > My patch is faster since it removes 1 locked instruction.
> > >
> > > Conclusion:
> > >
> > > My guess would be that both approaches are close, but my patch
> > > has fewer locked instructions while disabling interrupts more often.
> > > My preference is for the cleaner locking approach that is easier
> > > to understand and modify in the future.
> > >
> > > Daniel
> > >
> > > On Tue, 2003-11-25 at 23:55, Suparna Bhattacharya wrote:
> > > > On Tue, Nov 25, 2003 at 03:49:31PM -0800, Daniel McNeil wrote:
> > > > > Suparna,
> > > > >
> > > > > Yes your patch did help. I originally had CONFIG_DEBUG_SLAB=y which
> > > > > was helping me see problems because the freed dio was getting
> > > > > poisoned. I also tested with CONFIG_DEBUG_PAGEALLOC=y which is
> > > > > very good at catching these.
> > > >
> > > > Ah I see - perhaps that explains why neither Janet nor I could
> > > > recreate the problem that you were hitting so easily. So we
> > > > should probably try running with CONFIG_DEBUG_SLAB and
> > > > CONFIG_DEBUG_PAGEALLOC as well.
> > > >
> > > > >
> > > > > I updated your AIO fallback patch, added your AIO race fix, and
> > > > > fixed the bio_count decrement race. This patch has all three
> > > > > fixes and it is working for me.
> > > > >
> > > > > I fixed the bio_count race by changing bio_list_lock into bio_lock
> > > > > and using that for all the bio fields. I changed bio_count and
> > > > > bios_in_flight from atomics into plain ints; they are now protected
> > > > > by the bio_lock. I fixed the race in finished_one_bio() by leaving
> > > > > the bio_count at 1 until after the dio_complete() and then doing
> > > > > the bio_count decrement and wakeup while holding the bio_lock.
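> > > > >
> > > > > In outline, the finished_one_bio() end of it looks something like
> > > > > this (a sketch of the idea, not the literal patch text -- the
> > > > > async/sync split and dio_complete()'s real arguments are
> > > > > simplified here):
> > > > >
> > > > >     static void finished_one_bio(struct dio *dio)
> > > > >     {
> > > > >         unsigned long flags;
> > > > >
> > > > >         spin_lock_irqsave(&dio->bio_lock, flags);
> > > > >         if (dio->bio_count == 1) {
> > > > >             /*
> > > > >              * Last bio: run dio_complete() while bio_count
> > > > >              * is still 1, so nobody can free the dio
> > > > >              * underneath us.
> > > > >              */
> > > > >             spin_unlock_irqrestore(&dio->bio_lock, flags);
> > > > >             dio_complete(dio, dio->result, 0);  /* simplified */
> > > > >             spin_lock_irqsave(&dio->bio_lock, flags);
> > > > >         }
> > > > >         /* the decrement and the wakeup happen under bio_lock */
> > > > >         if (--dio->bio_count == 0 && dio->waiter)
> > > > >             wake_up_process(dio->waiter);
> > > > >         spin_unlock_irqrestore(&dio->bio_lock, flags);
> > > > >     }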
> > > > >
> > > > > Take a look, give it a try, and let me know what you think.
> > > >
> > > > I had been trying a slightly different kind of fix -- appended is
> > > > the updated version of the patch I last posted. It uses the bio_list_lock
> > > > to protect the dio->waiter field, which finished_one_bio sets back
> > > > to NULL after it has issued the wakeup; and the code that waits for
> > > > i/o to drain out checks the dio->waiter field instead of bio_count.
> > > > This might not seem very obvious given the name bio_list_lock,
> > > > so I was holding back, wondering if it could be improved.
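> > > >
> > > > Roughly, the two sides of that scheme look like this (an
> > > > illustrative sketch only, not the appended patch itself):
> > > >
> > > >     /* completion side: the wakeup and the waiter reset are
> > > >      * atomic with respect to each other, under bio_list_lock */
> > > >     spin_lock_irqsave(&dio->bio_list_lock, flags);
> > > >     if (dio->waiter) {
> > > >         wake_up_process(dio->waiter);
> > > >         dio->waiter = NULL;    /* wakeup has been fully issued */
> > > >     }
> > > >     spin_unlock_irqrestore(&dio->bio_list_lock, flags);
> > > >
> > > >     /* sync submitter (dio->waiter was set to current earlier):
> > > >      * wait on dio->waiter instead of bio_count */
> > > >     spin_lock_irq(&dio->bio_list_lock);
> > > >     while (dio->waiter != NULL) {
> > > >         set_current_state(TASK_UNINTERRUPTIBLE);
> > > >         spin_unlock_irq(&dio->bio_list_lock);
> > > >         io_schedule();
> > > >         spin_lock_irq(&dio->bio_list_lock);
> > > >     }
> > > >     spin_unlock_irq(&dio->bio_list_lock);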
> > > >
> > > > Your approach looks clearer in that sense -- it's pretty unambiguous
> > > > about what lock protects what fields. The only thing that bothers me
> > > > (and this is what I was trying to avoid in my patch) is the increased
> > > > use of spin_lock_irq's (the overhead of turning interrupts off and on)
> > > > instead of simple atomic inc/dec in most places.
> > > >
> > > > Thoughts ?
> > > >
> > > > Regards
> > > > Suparna

--
Suparna Bhattacharya (suparna@xxxxxxxxxx)
Linux Technology Center
IBM Software Lab, India
