Re: 2.6.12-rc6-mm1 & 2K lun testing

From: Dave Chinner
Date: Wed Jun 15 2005 - 18:25:48 EST


On Thu, Jun 16, 2005 at 04:30:25AM +1000, Nick Piggin wrote:
> Badari Pulavarty wrote:
>
> > ------------------------------------------------------------------------
> >
> > elm3b29 login: dd: page allocation failure. order:0, mode:0x20
> >
> > Call Trace: <IRQ> <ffffffff801632ae>{__alloc_pages+990} <ffffffff801668da>{cache_grow+314}
> > <ffffffff80166d7f>{cache_alloc_refill+543} <ffffffff80166e86>{kmem_cache_alloc+54}
> > <ffffffff8033d021>{scsi_get_command+81} <ffffffff8034181d>{scsi_prep_fn+301}
>
> They look like they're all in scsi_get_command.

I've seen this before on a system with lots of LUNs, lots of memory
and lots of dd write I/O. Basically, all the memory being flushed
was being pushed into the elevator queues before block congestion
was triggered (58GiB of RAM sitting in the elevator queues waiting
for I/O to complete!). That caused OOM problems whenever something
like a slab allocation was needed, and the trace above was a common
place for those failures to show up.
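
As a rough illustration of how it adds up (my numbers, not yours - a
back-of-envelope sketch assuming the 2.6 default of 128 requests per
queue and an assumed 512KiB worst-case request payload):

#!/usr/bin/env python
# Back-of-envelope estimate of how much dirty data can sit in the
# elevator queues before any single queue reports congestion.
# All three numbers are assumptions for illustration, not measurements.
num_luns = 2048          # the "2K lun" setup
nr_requests = 128        # 2.6 default per-queue request limit
max_request_kib = 512    # assumed worst-case request payload

queued_gib = num_luns * nr_requests * max_request_kib / (1024.0 * 1024.0)
print("up to ~%d GiB of dirty data parked in elevator queues" % queued_gib)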

If you've got command tag queueing turned on, then you need a
scsi command structure for every I/O in flight. Assuming the default
queue depth for Linux (32, IIRC), that's 2048 luns x 32 = 64k command
structures. Hence you're doing quite a few allocations here.
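
If you want to see what your setup actually allows, something like
this will add up the worst case (a sketch only, assuming the usual
/sys/block/*/device/queue_depth attribute is present for your SCSI
devices):

#!/usr/bin/env python
# Sum the per-device SCSI queue depths to estimate the worst-case
# number of scsi command structures that can be in flight at once.
# Assumes /sys/block/<dev>/device/queue_depth exists (SCSI devices only).
import glob

total = 0
for path in glob.glob("/sys/block/*/device/queue_depth"):
    try:
        with open(path) as f:
            total += int(f.read().strip())
    except (IOError, ValueError):
        continue        # not a SCSI device, or unreadable attribute

print("worst-case in-flight scsi commands:", total)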

However, when you oversubscribe the system like this, you can run
out of memory by the time the I/O reaches the SCSI layer, because
there are so many block devices that none of them gets enough I/O
queued to trigger congestion and throttle the incoming writers.

You can work around this by reducing /sys/block/*/queue/nr_requests
to a small number (say 4 or 8). This should cause the system to
throttle writers at /proc/sys/vm/dirty_ratio percent of memory
dirtied and prevent these failures. System responsiveness should be
far better as well.
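
Something along these lines will do it across all your queues (a
sketch only - pick the value to taste, run it as root, and note the
setting doesn't persist across reboots):

#!/usr/bin/env python
# Drop nr_requests on every block queue so that writers hit queue
# congestion (and hence dirty throttling) much earlier.
import glob

NR_REQUESTS = 8          # small per-queue limit, e.g. 4 or 8

for path in glob.glob("/sys/block/*/queue/nr_requests"):
    try:
        with open(path, "w") as f:
            f.write(str(NR_REQUESTS))
    except IOError as e:
        print("failed to set", path, ":", e)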

HTH.

Cheers,

Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group