RE: [RFC][PATCH] pstore: Skip spinlock when just one cpu is online

From: Seiji Aguchi
Date: Fri Dec 07 2012 - 18:43:12 EST


> Can all these things really happen (did you run into this problem on a real system?). Or is this just a theoretical problem. Ugly (but
> practical) hacks might be OK to solve real problems.

It is a theoretical problem for now.
But it is a timing issue, so it could actually happen.

> But do we really want them to fix problems that actually never happen?

Once we have found a problem (even a theoretical one), we can't say "it never actually happens."

I have several reasons for submitting this patch before actually reproducing the problem.

1)
When Linux, including pstore, is deployed on mission-critical systems, fixing a problem only after it
has actually happened is too late, because a failure of those systems has a great impact on society as a whole.
Customers in this area ask us to fix problems as soon as possible.
On the other hand, this kind of timing issue is hard to reproduce,
so our support engineers often work all night trying to reproduce it.
It is a nightmare for us.

If we can fix it in advance with a small patch, that is really helpful for us.

2)
In the long term, I plan to add a kmsg_dump call to the kexec path, because kdump may fail in the real world.
In that case, we need another source of troubleshooting data, such as pstore, to find the root cause of the failure.

In fact, the reliability of kdump was criticized at LinuxCon Europe 2012:
http://events.linuxfoundation.org/images/stories/pdf/lceu2012_holzheu.pdf

To convince the kexec maintainer to add a kmsg_dump call, I need to show that nothing in the pstore code
can cause kdump to fail.

Seiji
