Re: [PATCH] mm: ratelimit end_swap_bio_write() error

From: Michal Hocko
Date: Sat Jan 06 2018 - 08:34:25 EST


On Sat 06-01-18 19:03:13, Sergey Senozhatsky wrote:
> Hello,
>
> On (01/06/18 10:41), Michal Hocko wrote:
> > On Sat 06-01-18 13:34:07, Sergey Senozhatsky wrote:
> > > Use the ratelimited printk() version for swap-device write error
> > > reporting. We can use ZRAM as a swap-device, and the tricky part
> > > here is that zsmalloc() stores compressed objects in memory, thus
> > > it has to allocate pages during swap-out. If the system is short
> > > on memory, then we begin to flood printk() log buffer with the
> > > same "Write-error on swap-device XXX" error messages and sometimes
> > > simply lock up the system.
> >
> > Should we print an error in such a situation at all? Write-error
> > certainly sounds scary and it suggests something went really wrong.
> > My understanding is that zram failed swap-out is not critical and
> > therefore the error message is not really useful.
>
> I don't mind getting rid of it. up to you :)

I do not think we can get rid of it for all swap backends.

> > Or what should an admin do when seeing it?
>
> zsmalloc allocation is just one possibility; an error in
> the compression algorithm is another one, yet is rather unlikely.
> most likely it's OOM which can cause problems. but in any case
> it's sort of unclear what should be done. an error can be a
> temporary one or a fatal one, just like in __swap_writepage()
> case. so maybe both write error printk()-s can be dropped.

Then I would suggest starting by sorting out which of those errors are
critical and which are not, and reporting each accordingly. I am sorry
to be fuzzy here, but I am not familiar enough with the code to be more
specific. Anyway, ratelimiting sounds more like papering over the
problem than a real solution. It is also quite scary that you can see
so many failures that merely printing a message locks up the system...
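
For reference, a minimal userspace sketch of what interval/burst
ratelimiting of such a message amounts to. This is an illustration
only, not the kernel implementation: struct ratelimit, ratelimit_init(),
ratelimit_ok() and report_write_error() are hypothetical names, time is
passed in as a plain tick counter, and the message text is abbreviated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a rate-limit state. */
struct ratelimit {
	long interval;	/* window length, in caller-defined ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* messages emitted in the current window */
	int missed;	/* messages suppressed in the current window */
};

static void ratelimit_init(struct ratelimit *rs, long interval, int burst)
{
	assert(interval > 0 && burst > 0);
	rs->interval = interval;
	rs->burst = burst;
	rs->begin = 0;
	rs->printed = 0;
	rs->missed = 0;
}

/* Returns true if the caller may print at tick `now`. */
static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	/* Window expired: report what was dropped and start a new one. */
	if (now - rs->begin >= rs->interval) {
		if (rs->missed)
			fprintf(stderr, "%d messages suppressed\n", rs->missed);
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}

/* Report a write error subject to the limit; true if it was printed. */
static bool report_write_error(struct ratelimit *rs, long now, const char *dev)
{
	if (!ratelimit_ok(rs, now))
		return false;

	fprintf(stderr, "Write-error on swap-device %s\n", dev);
	return true;
}
```

With an interval of 5 ticks and a burst of 2, the third error inside a
window is dropped and only counted. That keeps the log quiet, but it
does nothing to distinguish a temporary failure from a fatal one.
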
--
Michal Hocko
SUSE Labs