Re: [PATCH] mm: ratelimit end_swap_bio_write() error

From: Michal Hocko
Date: Mon Jan 08 2018 - 03:37:50 EST


On Mon 08-01-18 10:58:18, Sergey Senozhatsky wrote:
> On (01/06/18 14:34), Michal Hocko wrote:
> > > zsmalloc allocation is just one possibility; an error in
> > > compressing algorithm is another one, yet is rather unlikely.
> > > most likely it's OOM which can cause problems. but in any case
> > > it's sort of unclear what should be done. an error can be a
> > > temporary one or a fatal one, just like in __swap_writepage()
> > > case. so may be both write error printk()-s can be dropped.
> >
> > Then I would suggest starting with sorting out which of those errors are
> > critical and which are not and report the error accordingly. I am sorry
> > to be fuzzy here but I am not familiar with the code to be more
> > specific. Anyway ratelimiting sounds more like a paper over than a real
> > solution. Also it sounds quite scary that you can see so many failures
> > to actually lock up the system just by printing a message...
>
> the lockup is not the main problem and I'm not really trying to
> address it here. we simply can fill up the entire kernel logbuf
> with the same "Write-error on swap-device" errors.

Your changelog is rather light on information. Could you be more
specific about how the problem actually happens and how likely it is?

And again, I do not think throttling is an appropriate countermeasure.
We do want to print those messages when a critical situation happens.
If we have a fallback then simply do not print at all.
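
To illustrate the "classify first, then decide whether to print" idea,
here is a minimal userspace sketch. It is not the real mm/page_io.c
code. The enum, both function names and the choice of which errno
values count as transient are assumptions made purely for
illustration, and the real bio completion path reports a block layer
status rather than a plain errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical classification; not taken from the kernel sources. */
enum swap_write_severity {
	SWAP_WRITE_TRANSIENT,	/* assumed recoverable, the write can be retried */
	SWAP_WRITE_FATAL,	/* assumed unrecoverable, worth reporting */
};

/* Map a negative errno to a severity. The mapping is an assumption. */
static enum swap_write_severity classify_swap_write_error(int err)
{
	assert(err < 0);

	switch (err) {
	case -ENOMEM:
	case -EAGAIN:
		return SWAP_WRITE_TRANSIENT;
	default:
		return SWAP_WRITE_FATAL;
	}
}

/*
 * Print only when there is no fallback. Returns true if a message
 * was printed, false if the error was silently left to the retry path.
 */
static bool report_swap_write_error(int err)
{
	if (classify_swap_write_error(err) == SWAP_WRITE_TRANSIENT)
		return false;

	fprintf(stderr, "Write-error on swap-device (err %d)\n", err);
	return true;
}
```

With this split, a burst of -ENOMEM failures prints nothing, while a
single -EIO is still reported unconditionally and needs no ratelimit.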
--
Michal Hocko
SUSE Labs