Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc

From: Mason
Date: Tue Jul 05 2016 - 16:26:51 EST


On 05/07/2016 18:20, Florian Fainelli wrote:
> On 07/05/2016 08:56 AM, Mason wrote:
>> On 05/07/2016 17:28, Florian Fainelli wrote:
>>
>>> nb8800.c does not currently show suspend/resume hooks implemented, are
>>> you positive that when you suspend, you properly tear down all HW, stop
>>> transmit queues, etc. and do the opposite upon resumption?
>>
>> I am currently testing the error path for my suspend routine.
>> Firmware is, in fact, denying the suspend request, and immediately
>> returns control to Linux, without having powered anything down.
>>
>> I expected not having to save any context in that situation.
>> Am I mistaken?
>
> It depends on what power state you are going to and resuming from, and
> how much of this is platform dependent. On the platforms I work with, S2
> preserves register states for our On/Off domain, while S3 only keeps an
> always-on power island and shuts off the On/Off domain. You therefore
> need to have your drivers in the On/Off domain suspend any activity and
> preserve important register states, or re-initialize them from scratch,
> whichever is the most convenient.

Thanks for bringing these details to my attention, they will
definitely prove useful when I test an actual suspend/resume
sequence. However, I must stress that the platform did NOT
power down in my test case, because the firmware currently
denies all suspend requests.

Therefore, loss of context cannot possibly explain the
warning I am seeing.

>> You mention "stop transmit queues". Can you say more about this?
>
> See drivers/net/ethernet/broadcom/genet/bcmgenet.c, which is a driver
> that takes care of that, for instance. Look for bcmgenet_{suspend,resume}.

Thanks. I will look into it.

If I understand correctly, something is missing in the
network interface code? (My system is using an NFS root
filesystem, so network is an important subsystem.)

>>> Is your system clocksource also correctly saved/restored, or if you go
>>> through a firmware in-between, could it be changing the counter values
>>> and making Linux think that more time has elapsed than really did?
>>
>> Thanks for pointing this out, I was not aware I was supposed to save
>> and restore the tick counter on suspend/resume. (This is not an issue
>> in this specific situation, as the platform is NOT suspended.)
>
> You don't have to save and restore the clocksource counter, although if
> you want proper time accounting to be done across suspend states, you
> would want to use a clocksource which is persistent across these suspend
> states.

The clocksource is a 32-bit tick counter running at 27 MHz. In other
words, the counter wraps around roughly every 159 seconds (2^32 ticks
at 27 MHz). If Linux suspends for several hours, how can it determine
how much time went by?
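
To make the arithmetic concrete, here is a minimal user-space sketch (not
kernel code). The helper names wrap_period_seconds(), counter_after() and
observed_delta() are made up for illustration. It shows the wrap period of
a 32-bit counter at 27 MHz, and why a before/after reading of such a
counter can only recover the elapsed ticks modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a free-running counter of the given width wraps around. */
static double wrap_period_seconds(unsigned bits, uint64_t hz)
{
	assert(bits > 0 && bits < 64 && hz > 0);
	return (double)(UINT64_C(1) << bits) / (double)hz;
}

/* Value a 32-bit counter shows after true_ticks real ticks have elapsed. */
static uint32_t counter_after(uint32_t start, uint64_t true_ticks)
{
	return (uint32_t)(start + true_ticks);	/* truncates modulo 2^32 */
}

/*
 * Elapsed ticks as computed from two raw readings. Unsigned subtraction
 * handles a single wrap correctly, but any whole multiples of 2^32 that
 * went by between the two readings are lost.
 */
static uint32_t observed_delta(uint32_t before, uint32_t after)
{
	return after - before;
}
```

With a two-hour gap (2 * 3600 * 27000000 ticks), observed_delta() returns
only the low 32 bits of the true tick count, so the number of complete
wraps has to come from somewhere else, such as a time source that keeps
running across the suspend state.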

Regards.