Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared

From: Alexander Huemer
Date: Mon Oct 26 2009 - 11:02:17 EST


Jean Delvare wrote:
> On Wednesday 21 October 2009, Alexander Huemer wrote:
>
>> Jean Delvare wrote:
>>
>>> OK, here I am, sorry for the delay. I've read the discussion thread.
>>> Here are the few data points I can offer, in the hope it will help:
>>>
>>> * While the i2c-i801 driver received some changes in kernel 2.6.30,
>>> none of these are related to PCI nor interrupts. So as the problem
>>> is new in kernel 2.6.30, the i2c-i801 driver alone is unlikely to
>>> cause it. This may, however, be a combination of something i2c-i801
>>> does and something the pci subsystem does since kernel 2.6.30. For
>>> this reason, I would still recommend a bisection if the problem can
>>> be reliably reproduced. I know it takes time, but it is always
>>> easier to fix a bug when we know which commit introduced it.
>>>
>>> * The i2c-i801 driver does _not_ make use of interrupts. It is
>>> poll-based (I am not exactly proud of that, but that's the way it
>>> is.)
>>>
>>> #define ENABLE_INT9 0 /* set to 0x01 to enable - untested */
>>>
>>> So I am very surprised to read that this driver would cause an IRQ
>>> storm.
>>>
>>> * One thing the i2c-i801 driver does on the PCI device is:
>>>
>>> err = pci_enable_device(dev);
>>>
>>> I presume this is what causes the following message in dmesg:
>>>
>>> i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23
>>>
>>> Basically, even though the driver doesn't make use of interrupts,
>>> the IRQ is still registered because this is how the hardware is
>>> set up.
>>>
>>> In conclusion, I suspect that one of 2 things may be happening: either
>>> the SMBus is triggering interrupts when told not to. The ICH6 is a
>>> bit different from all the other supported chips, I'll double check
>>>
>
> My bad, it's an 63xxESB-based board, not ICH6. I must have been
> mixing data from a different bug.
>
>
>>> if we may have missed something. Or, something else is triggering
>>> SMBus transactions. SMI and ACPI come to mind. If this is the case
>>> then you do not want to use i2c-i801 on this motherboard.
>>>
>>> Questions to Alexander:
>>>
>>> * Can I please see the output of "sensors" on your system?
>>> * What are the brand and model of your motherboard?
>>> * Can we get an acpidump for your system?
>>>
>>>
>>>
>> many thanks for your response. i appreciate that.
>> first, the data you requested:
>>
>> sensors: http://xx.vu/~ahuemer/sensors-ahuemer-20091021.txt
>> acpidump: http://xx.vu/~ahuemer/acpidump-ahuemer-20091021.txt
>>
>
> The good news is that I can't see any access to the SMBus in the
> ACPI tables. Nothing can be said about the SMIs though, without an
> intimate knowledge of the BIOS.
>
>
>> motherboard: tyan tempest i5400pw/s5397 with one intel xeon e5420.
>>
>> the output of sensors was made _without_ i801_smbus in the kernel.
>>
>
> Then please once again with it. My whole point was to know whether
> there was any hardware monitoring chip connected to the SMBus. Your
> initial kernel configuration suggests that you have a W83793G chip
> there.
>
>
>> i noticed that the data of w83627hf-isa-0290 is quite weird. i do not
>> have an explanation for that.
>>
>
> I do. This happens when the manufacturer decides that the hardware
> monitoring features of the Super-I/O are insufficient for their
> needs. They add a dedicated chip for the hardware monitoring. This
> is particularly frequent on server boards from Tyan and SuperMicro.
> Ideally they would _also_ disable the feature on the Super-I/O side,
> but often they do not, so the driver still loads, but outputs
> garbage.
>
> You can see the following messages in your log:
> [ 3.878703] w83627hf w83627hf.656: Enabling temp2, readings might not make sense
> [ 3.881708] w83627hf w83627hf.656: Enabling temp3, readings might not make sense
> This is a good hint that this is the case (if the nonsensical data
> displayed by "sensors" wasn't enough to convince you.)
>
> So you should stop loading/including kernel module w83627hf.
>
>
>> if a bisection is what will bring light into this, i am willing to take
>> the time.
>> so that would be a bisection between 2.6.29 and 2.6.30 ?
>> a quicker test case would be good for that, but i don't have one yet,
>> just the compilation of gcc, which takes time, even on this machine with
>> tmpfs and ccache.
>>
>
>
here is the output you requested:
http://xx.vu/~ahuemer/sensors_ahuemer_with_i801_20091026.txt
i am currently in the middle of a bisection between 2.6.29 and 2.6.30, 8
steps left.
many thanks for the info on hardware monitoring.
i'll report back when bisection is finished.

regards
-alex
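
For the "quicker test case" wished for above, one possible approach is to check the kernel log of each bisection candidate for the message from the subject line. The following is only a rough sketch, not something taken from this thread: the function name classify_boot and the constants GOOD, BAD and SKIP are made up for illustration. The return values follow the exit code convention of "git bisect run" (0 = good, 125 = skip, other codes from 1 to 127 = bad). It assumes the reproducing workload has already been run on the booted kernel, and a clean log after a short run does not prove that a commit is good.

```python
import re

# Exit codes understood by "git bisect run": 0 = good, 125 = skip,
# any other code from 1 to 127 = bad.
GOOD, BAD, SKIP = 0, 1, 125


def classify_boot(dmesg_text, irq=23):
    """Classify one boot of a candidate kernel from its kernel log text.

    Hypothetical helper: returns BAD if the log contains
    "irq <irq>: nobody cared", SKIP if the log is empty (nothing to
    judge), and GOOD otherwise.
    """
    if not dmesg_text.strip():
        return SKIP
    # Match the exact IRQ number; the trailing colon keeps irq=2 from
    # matching a line about irq 23.
    pattern = re.compile(r"\birq %d: nobody cared\b" % irq)
    return BAD if pattern.search(dmesg_text) else GOOD
```

The text passed in would typically be the saved output of "dmesg" from the kernel under test, and the result can be noted by hand with "git bisect good" or "git bisect bad" after each boot.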