BUG: at drivers/pci/msi.c:611 pci_enable_msi()
I would poke Eric Biederman(sp?) about this one. Maybe it's even solved by the MSI-enable-related patch he posted in the past 24-48 hours.
* Kok, Auke <auke-jan.h.kok@xxxxxxxxx> wrote:
I tried the 3-patch series "[PATCH 0/3] Basic msi bug fixes.." and they fix this problem for me. Were you expecting the OOPS in the first place? [...]
Ingo Molnar wrote:
the bug was the warning message (a WARN_ON()) above - not an oops. So that warning message is gone in your testing?
"Kok, Auke" <auke-jan.h.kok@xxxxxxxxx> writes:
yes.
Eric W. Biederman wrote:
Sorry for the delay. I was out of town for my brother's wedding the last few days.
I wasn't exactly expecting the WARN_ON to trigger. What I fixed was
an inconsistency in handling our state bits. Fixing that
inconsistency appears to have fixed the e1000 usage scenario mostly by
accident.
The basic issue is that pci_save_state saves the current msi state
along with other registers, and then the e1000 driver goes and
disables the msi irq after we have saved the irq state as on.
My code notices that the msi irq was disabled before restore time, so
it skips the restore. However we now have a leak of the msi saved cap
because we are not freeing it.
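The sequence described above can be sketched as a toy user-space model. This is not kernel code: every name here (toy_dev, toy_save_state, toy_restore_state_leaky, toy_leak_scenario) is hypothetical and only stands in for the save, disable, and skipped-restore steps, assuming the saved msi state is a separately allocated object.

```c
/* Toy user-space model of the leak. NOT kernel code; all names are hypothetical. */
#include <stdbool.h>
#include <stdlib.h>

struct toy_dev {
    bool msi_enabled;   /* live msi enable bit of the device */
    bool *saved_msi;    /* snapshot made at save time, NULL if none */
};

static int toy_live_allocs; /* snapshots allocated and not yet freed */

/* Stands in for the save step: snapshot the msi enable bit. */
static void toy_save_state(struct toy_dev *d)
{
    d->saved_msi = malloc(sizeof(*d->saved_msi));
    if (!d->saved_msi)
        return;
    *d->saved_msi = d->msi_enabled;
    toy_live_allocs++;
}

/* Stands in for the driver disabling msi after the state was saved as on. */
static void toy_disable_msi(struct toy_dev *d)
{
    d->msi_enabled = false;
}

/* Restore that skips when msi is off, and so never frees the snapshot. */
static void toy_restore_state_leaky(struct toy_dev *d)
{
    if (!d->msi_enabled || !d->saved_msi)
        return; /* skipped: the snapshot stays allocated */
    d->msi_enabled = *d->saved_msi;
    free(d->saved_msi);
    d->saved_msi = NULL;
    toy_live_allocs--;
}

/* Runs save, disable, restore; returns how many snapshots were left behind. */
int toy_leak_scenario(void)
{
    struct toy_dev d = { .msi_enabled = true, .saved_msi = NULL };
    int before = toy_live_allocs;
    int leaked;

    toy_save_state(&d);
    toy_disable_msi(&d);
    toy_restore_state_leaky(&d);
    leaked = toy_live_allocs - before;

    /* Tidy up so the demonstration itself does not leak. */
    if (d.saved_msi) {
        free(d.saved_msi);
        toy_live_allocs--;
    }
    return leaked;
}
```

In this model, toy_leak_scenario() reports one snapshot left allocated after the skipped restore, as long as the allocation succeeds.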
This leaves us with some basic questions.
- Does it make sense for suspend/resume methods to request/free irqs?
- Does it make sense for suspend/resume methods to allocate/free msi irqs?
- Do we want pci_save/restore_cap to save/restore msi state?
The path of least resistance is to just free the extra state and we
are good. I'm just not quite certain that is sane and it has been a
long day.
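A minimal sketch of what "just free the extra state" could look like, again as a toy user-space model and not the actual patch. All names (toy_dev, toy_save_state, toy_restore_state, toy_fixed_scenario) are hypothetical; the only point is that the restore step drops the saved snapshot even when it decides to skip the restore.

```c
/* Toy user-space model of freeing the stale snapshot. NOT kernel code;
 * all names are hypothetical. */
#include <stdbool.h>
#include <stdlib.h>

struct toy_dev {
    bool msi_enabled;   /* live msi enable bit of the device */
    bool *saved_msi;    /* snapshot made at save time, NULL if none */
};

static int toy_live_allocs; /* snapshots allocated and not yet freed */

/* Stands in for the save step: snapshot the msi enable bit. */
static void toy_save_state(struct toy_dev *d)
{
    d->saved_msi = malloc(sizeof(*d->saved_msi));
    if (!d->saved_msi)
        return;
    *d->saved_msi = d->msi_enabled;
    toy_live_allocs++;
}

/* Restore that always releases the snapshot, whether or not it is applied. */
static void toy_restore_state(struct toy_dev *d)
{
    if (!d->saved_msi)
        return;
    if (d->msi_enabled)
        d->msi_enabled = *d->saved_msi; /* normal restore */
    /* else: msi was disabled after the save, so the restore is skipped */
    free(d->saved_msi);
    d->saved_msi = NULL;
    toy_live_allocs--;
}

/* Runs save, disable, restore; returns how many snapshots were left behind. */
int toy_fixed_scenario(void)
{
    struct toy_dev d = { .msi_enabled = true, .saved_msi = NULL };
    int before = toy_live_allocs;

    toy_save_state(&d);
    d.msi_enabled = false; /* driver disables msi after the save */
    toy_restore_state(&d);
    return toy_live_allocs - before;
}
```

With the snapshot released on both paths, toy_fixed_scenario() reports nothing left behind. Whether that is the sane answer to the questions above is a separate matter; the sketch only shows the mechanics.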
We used to have a lengthy e1000_pci_save|restore_state in our code, which is now gone, so I'm all for that. A separate pci_save_pxie|msi(x)_state for every driver seems completely unnecessary. I can't think of a use case where saving+restoring everything hurts. That's what you want, I presume.
We currently free all irqs and msi before going into suspend in e1000, and I think that is probably a good thing. I can imagine bad things happening if we don't, but I admit that I haven't tried it without alloc/free. We do this in e100 as well and it works.
Another motivation would be to leave this up to the driver: if the driver chooses to free/alloc interrupts because it makes sense, you probably would want to keep that choice available. Devices that don't need this can skip the alloc/free, but leave the choice open for others.