Re: [PATCH] mm: ratelimit end_swap_bio_write() error

From: Sergey Senozhatsky
Date: Mon Jan 08 2018 - 05:22:47 EST


On (01/08/18 09:37), Michal Hocko wrote:
[..]
> > the lockup is not the main problem and I'm not really trying to
> > address it here. we simply can fill up the entire kernel logbuf
> > with the same "Write-error on swap-device" errors.
>
> Your changelog is rather modest on the information.

fair point!

> Could you be more specific on how the problem actually happens and how
> likely it is?

ok. so what we have is

slow_path / swap-out page
  __zram_bvec_write(page)
    compressed_page = zcomp_compress(page)
    zs_malloc(compressed_page)
      // no available zspage found, need to allocate new
      alloc_zspage()
      {
        for (i = 0; i < class->pages_per_zspage; i++)
          page = alloc_page(gfp);
          if (!page)
            return NULL
      }

    return -ENOMEM
  ...
  printk("Write-error on swap-device...");


a zspage can consist of up to ->pages_per_zspage normal pages.
if alloc_page() fails then we can't allocate the entire zspage,
so we can't store the swapped-out page; it remains in RAM
and we don't make any progress. so we try to swap out another page
and maybe do the whole zs_malloc()->alloc_zspage() again, maybe
not. depending on how bad the OOM situation is, there can be
few or many "Write-error on swap-device" errors.
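
a rough userspace sketch of that all-or-nothing behaviour, in case it
helps. this is not the zsmalloc code: the _sketch names, the page
budget used to fake memory pressure and the cap of 4 pages are all
made up for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_PAGES_PER_ZSPAGE 4    /* illustrative cap, an assumption */
#define SKETCH_PAGE_SIZE     4096 /* illustrative page size */

/* hypothetical stand-in for a zspage: a group of normal pages */
struct zspage_sketch {
	void *pages[MAX_PAGES_PER_ZSPAGE];
	int nr_pages;
};

/* hypothetical page allocator: fails once the budget is used up */
static void *alloc_page_sketch(int *budget)
{
	void *page;

	if (*budget <= 0)
		return NULL;
	page = malloc(SKETCH_PAGE_SIZE);
	if (page)
		(*budget)--;
	return page;
}

/* give a page back and return it to the budget */
static void free_page_sketch(void *page, int *budget)
{
	free(page);
	(*budget)++;
}

/*
 * all-or-nothing: if any single page allocation fails, release the
 * pages we already got and report failure for the whole zspage.
 */
static struct zspage_sketch *alloc_zspage_sketch(int pages_per_zspage,
						 int *budget)
{
	struct zspage_sketch *zspage;
	int i;

	if (pages_per_zspage < 1 || pages_per_zspage > MAX_PAGES_PER_ZSPAGE)
		return NULL;

	zspage = calloc(1, sizeof(*zspage));
	if (!zspage)
		return NULL;

	for (i = 0; i < pages_per_zspage; i++) {
		void *page = alloc_page_sketch(budget);

		if (!page) {
			/* unwind the partial allocation */
			while (--i >= 0)
				free_page_sketch(zspage->pages[i], budget);
			free(zspage);
			return NULL;
		}
		zspage->pages[i] = page;
	}
	zspage->nr_pages = pages_per_zspage;
	return zspage;
}

/* release every page of a fully allocated zspage */
static void free_zspage_sketch(struct zspage_sketch *zspage, int *budget)
{
	for (int i = 0; i < zspage->nr_pages; i++)
		free_page_sketch(zspage->pages[i], budget);
	free(zspage);
}
```

with a budget of 3 pages, a request for a 4-page zspage fails and leaves
the budget untouched, so the next swap-out attempt is likely to hit the
same failure again.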

> And again, I do not think the throttling is an appropriate counter
> measure. We do want to print those messages when a critical situation
> happens. If we have a fallback then simply do not print at all.

sure, but with the ratelimited printk we still print those messages.
we just don't print one for every single page we failed to write
to the device. the existing error messages can (*sometimes*) be noisy
and not very informative - "Write-error on swap-device (%u:%u:%llu)\n";
it's not like 1000 of those tell us more than 1 or 10.
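
to illustrate what I mean by "we still print", here is a small userspace
model of interval/burst rate limiting. it is only a sketch: rl_state,
rl_allow() and log_write_errors() are hypothetical names, time is passed
in as a plain tick counter, and the numbers are arbitrary. it is not the
kernel's ratelimit implementation.

```c
#include <stdbool.h>
#include <stdio.h>

/* hypothetical model state; not the kernel's struct ratelimit_state */
struct rl_state {
	long interval; /* window length, in arbitrary ticks */
	int burst;     /* max messages let through per window */
	long begin;    /* tick at which the current window started */
	int printed;   /* messages let through in this window */
	int missed;    /* messages suppressed in this window */
};

/* returns true if the caller may print a message at tick 'now' */
static bool rl_allow(struct rl_state *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* window expired: report what was dropped, start over */
		if (rs->missed)
			fprintf(stderr, "%d messages suppressed\n",
				rs->missed);
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}

/* simulate 'nr_failures' failed page writes at tick 'now';
 * returns how many of them actually produced a log line */
static int log_write_errors(struct rl_state *rs, long now, int nr_failures)
{
	int logged = 0;

	for (int i = 0; i < nr_failures; i++) {
		if (rl_allow(rs, now)) {
			fprintf(stderr, "Write-error on swap-device (sketch)\n");
			logged++;
		}
	}
	return logged;
}
```

with interval = 5 and burst = 10, a storm of 1000 failures within one
window logs 10 lines and counts 990 as suppressed; the error is still
reported, just not once per page.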

-ss