[PATCH] mm: teach mm by current context info to not do I/O during memory allocation

From: Ming Lei
Date: Fri Feb 22 2013 - 19:34:08 EST


This patch introduces the PF_MEMALLOC_NOIO process flag (in the 'flags'
field of 'struct task_struct'). A task can set this flag to avoid doing
I/O inside memory allocations made in its context.
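
Below is a minimal user-space sketch of the mechanism described above. It is modeled loosely on the kernel helpers memalloc_noio_save(), memalloc_noio_restore() and memalloc_noio_flags(). The bit values, the simplified gfp_t and the single global 'current' task are assumptions made for illustration and are not the kernel's definitions. The sketch shows the intended semantics: a task marks a section as NOIO, and every allocation inside that section has __GFP_IO stripped, whatever flags the caller passed.

```c
/*
 * User-space model of a per-task "no I/O during allocation" context.
 * All flag values are illustrative only.
 */
typedef unsigned int gfp_t;

#define __GFP_WAIT 0x10u   /* allocation may sleep and reclaim */
#define __GFP_IO   0x40u   /* reclaim may start physical I/O */
#define __GFP_FS   0x80u   /* reclaim may call into the filesystem */

#define GFP_NOIO   (__GFP_WAIT)
#define GFP_NOFS   (__GFP_WAIT | __GFP_IO)
#define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS)

#define PF_MEMALLOC_NOIO 0x00080000u  /* task must not do I/O while allocating */

struct task_struct {
	unsigned int flags;
};

/* Stand-in for the kernel's 'current' task pointer. */
static struct task_struct init_task;
static struct task_struct *current = &init_task;

/* Enter a NOIO section and return the previous state so sections can nest. */
static unsigned int memalloc_noio_save(void)
{
	unsigned int old = current->flags & PF_MEMALLOC_NOIO;

	current->flags |= PF_MEMALLOC_NOIO;
	return old;
}

/* Leave a NOIO section and restore whatever state was saved on entry. */
static void memalloc_noio_restore(unsigned int old)
{
	current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | old;
}

/*
 * The allocator would apply this to the caller's mask: inside a NOIO
 * section, a GFP_KERNEL request loses __GFP_IO before any reclaim.
 */
static gfp_t memalloc_noio_flags(gfp_t gfp)
{
	if (current->flags & PF_MEMALLOC_NOIO)
		gfp &= ~__GFP_IO;
	return gfp;
}
```

Saving and restoring the previous state, instead of clearing the flag unconditionally on exit, is what lets NOIO sections nest without an inner section re-enabling I/O for an outer one.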

The patch tries to solve a deadlock problem caused by block devices. The
problem may happen in at least the following situations:

- during block device runtime resume: if a memory allocation with
GFP_KERNEL is made inside the runtime resume callback of any of the
device's ancestors (or of the block device itself), a deadlock may be
triggered inside the allocation, since the allocation might not complete
until the block device becomes active and the page I/O involved finishes.
This situation was first pointed out by Alan Stern. Converting every
GFP_KERNEL[1] in the path into GFP_NOIO is not a good approach, because
several subsystems may be involved (for example, PCI, USB and SCSI may
all be involved for a USB mass storage device, and network devices are
involved too in the iSCSI case).

- during block device runtime suspend, because a runtime resume has to
wait for any concurrent runtime suspend to complete.

- during error handling of a USB mass storage device, a USB bus reset is
issued to the device, so there must not be any memory allocation with
GFP_KERNEL during the bus reset; otherwise a deadlock similar to the one
above may be triggered. Unfortunately, in theory any USB device may
include a mass storage interface, so all USB interface drivers would have
to handle this situation. In fact, most USB drivers don't know how to
handle a bus reset on the device and don't provide .pre_reset() and
.post_reset() callbacks at all, so the USB core has to unbind and rebind
the driver for these devices. So it is still not practical to resort to
GFP_NOIO to solve the problem.
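
What the three situations above have in common is that the NOIO decision can be made once, by whoever invokes the callback, instead of in every driver on the path. The following sketch shows that pattern in a small user-space model. struct device, its memalloc_noio field, model_alloc(), driver_resume() and run_resume_callback() are hypothetical names chosen for illustration; this is not the runtime PM or USB core code, and the flag values are illustrative.

```c
typedef unsigned int gfp_t;

/* Illustrative flag values only. */
#define __GFP_WAIT 0x10u
#define __GFP_IO   0x40u
#define __GFP_FS   0x80u
#define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS)
#define PF_MEMALLOC_NOIO 0x00080000u

struct task_struct {
	unsigned int flags;
};

static struct task_struct init_task;
static struct task_struct *current = &init_task;

/* Hypothetical: set when the device sits on a block device's I/O path. */
struct device {
	int memalloc_noio;
};

/* Records the mask that the most recent model allocation actually used. */
static gfp_t last_effective_gfp;

/* Model allocator: applies the task's NOIO context to the caller's mask. */
static void model_alloc(gfp_t gfp)
{
	if (current->flags & PF_MEMALLOC_NOIO)
		gfp &= ~__GFP_IO;
	last_effective_gfp = gfp;
}

/* A driver callback that uses plain GFP_KERNEL and knows nothing of NOIO. */
static int driver_resume(struct device *dev)
{
	(void)dev;
	model_alloc(GFP_KERNEL);
	return 0;
}

/*
 * Hypothetical core-side wrapper: only the caller decides about NOIO,
 * so the PCI/USB/SCSI drivers underneath can keep using GFP_KERNEL.
 */
static int run_resume_callback(struct device *dev,
			       int (*cb)(struct device *))
{
	unsigned int old;
	int ret;

	if (!dev->memalloc_noio)
		return cb(dev);

	old = current->flags & PF_MEMALLOC_NOIO;
	current->flags |= PF_MEMALLOC_NOIO;
	ret = cb(dev);
	current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | old;
	return ret;
}
```

For a device flagged as being on the block I/O path, driver_resume()'s GFP_KERNEL request runs without __GFP_IO. For any other device it is unchanged, and the driver source is identical in both cases.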

Thanks,
Junxiao.
>
>>>
>>> And the superblock shrinker is a good example of why this shouldn't be
>>> the case. The main thing that code does is to reclaim clean fs objects
>>> without performing IO. AFAICT the proposed patch will significantly
>>> weaken PF_MEMALLOC_NOIO allocation attempts by needlessly preventing
>>> the kernel from reclaiming such objects?
>> Even if the fs doesn't do I/O in the superblock shrinker, isn't it
>> possible for a fs process that can't conveniently use GFP_NOFS to hold
>> some fs lock and then call back into the fs again?
>>
>> PF_MEMALLOC_NOIO is only set for some special processes. I don't think
>> it will have much effect.
>
> Maybe not now. But once we add hacks like this, people say "goody" and
> go and use them rather than exerting the effort to sort out their
> deadlocks properly :( There will be more PF_MEMALLOC_NOIO users in
> 2019.
>
> Dunno, I'd like to hear David's thoughts but perhaps it would be better
> to find some way to continue to permit PF_MEMALLOC_NOIO to shrink VFS
> caches for most filesystems and find some fs-specific fix for ocfs2.
> That would mean testing PF_MEMALLOC_NOIO directly I guess.
>
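
The trade-off in the quoted discussion can be shown with a toy shrinker. This sketch assumes, as the thread implies, that a superblock-style shrinker frees only clean objects (so it needs no I/O) but skips reclaim entirely when __GFP_FS is clear. noio_clear_io(), noio_clear_io_fs() and model_sb_shrink() are hypothetical names, and the flag values are illustrative.

```c
typedef unsigned int gfp_t;

/* Illustrative flag values only. */
#define __GFP_WAIT 0x10u
#define __GFP_IO   0x40u
#define __GFP_FS   0x80u
#define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS)

/* The two masking policies for PF_MEMALLOC_NOIO debated in the thread. */
static gfp_t noio_clear_io(gfp_t gfp)
{
	return gfp & ~__GFP_IO;
}

static gfp_t noio_clear_io_fs(gfp_t gfp)
{
	return gfp & ~(__GFP_IO | __GFP_FS);
}

/*
 * Model of a superblock-style shrinker: it frees clean cached objects
 * without doing I/O, but refuses to run unless __GFP_FS is set, because
 * the caller may already hold filesystem locks. Returns objects freed.
 */
static unsigned long model_sb_shrink(gfp_t gfp, unsigned long *clean_objects)
{
	unsigned long freed;

	if (!(gfp & __GFP_FS))
		return 0;	/* must not re-enter the fs: reclaim nothing */

	freed = *clean_objects;
	*clean_objects = 0;
	return freed;
}
```

Clearing only __GFP_IO leaves this clean-object reclaim available to NOIO allocations. Also clearing __GFP_FS closes the fs-recursion hole Junxiao describes, but it gives up that reclaim, which is the cost Andrew is pointing at.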

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/