Re: swap_info_get: Bad swap offset entry 0200f8a7

From: Minchan Kim
Date: Thu Oct 19 2017 - 20:33:58 EST


Hello,

On Sun, Oct 15, 2017 at 05:17:36PM -0700, Christian Kujau wrote:
> Hi,
>
> every now and then (and more frequently now) I receive the following
> message on this Atom N270 netbook:
>
> swap_info_get: Bad swap offset entry 0200f8a7
>
> This started to show up a few months ago but appears to happen more
> frequently now:
>
> 4 May < Linux version 4.11.2-1-ARCH
> 4 Jun < Linux version 4.11.3-1-ARCH
> 7 Jul < Linux version 4.11.9-1-ARCH
> 4 Aug < Linux version 4.12.8-2-ARCH
> 24 Sep < Linux version 4.12.13-1-ARCH
> 158 Oct < Linux version 4.13.5-1-ARCH
>
> I've only found (very) old reports for this[0][2] with either no
> solution[1] or some hinting that this may be caused by hardware errors.

Since 4.11 there have been a lot of optimization changes in the swap
subsystem, so this might be related to one of them, but I'm not sure.
It's worth Cc'ing Huang, who may know more about what has changed since then.

Thanks.

>
> In my case, however, no kernel BUG messages or oopses are involved and no
> PTE errors are logged. The machine appears to be very stable, although
> memory usage is quite high on that machine (but no OOM situations so
> far either). As the machine is only equipped with 1GB of RAM, I'm
> using ZRAM on this system, which usually looks something like this:
>
> $ zramctl
> NAME       ALGORITHM DISKSIZE   DATA COMPR  TOTAL STREAMS MOUNTPOINT
> /dev/zram0 lz4         248.7M 195.7M   74M  78.7M       2 [SWAP]
>
> I suspect that, when memory pressure is high, zram may not be quick enough
> to decompress a page leading to these messages, but then I'd have expected
> a zram error message too.
>
> Can anybody comment on these messages? If they're really indicating a
> hardware error, shouldn't there be other messages too? So far, rasdaemon
> has not logged any errors.
>
> Thanks,
> Christian.
>
> [0] http://lkml.iu.edu/hypermail/linux/kernel/0204.3/0165.html
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=432337
> [2] https://access.redhat.com/solutions/218733
> --
> BOFH excuse #323:
>
> Your processor has processed too many instructions. Turn it off immediately, do not type any commands!!