James Sutherland writes:
> On Tue, 25 Jul 2000, Malcolm Beattie wrote:
> > Me:
>
> > > Upgrading firmware would typically be an unusual enough event the reboot
> > > wouldn't be an issue. In the environments where a reboot is unacceptable,
> > > a firmware change on-the-fly would probably be unacceptable too...
> >
> > On the contrary, on important 24x7 systems you don't want to have to
> > arrange for downtime just to update firmware/microcode.
>
> I didn't say you would. My point was that changing firmware live on a
> production system where any downtime is critical would not be very safe.

Your point is wrong. You are wrong. Changing firmware on a live
production S/390 system is safe, supported and has been done for
years if not decades (I don't know when CMLIC was introduced).

> > S/390 already supports this (running Linux under VM and maybe running
> > it in an LPAR or raw, but I'm not sure) under the name "Concurrent
> > Maintenance of LIC" (LIC being Licensed Internal Code, the equivalent
> > of microcode or firmware). This includes processors. As other
> > architectures begin to play in the five-nines arena, we don't want
> > Linux being left behind on those architectures just because of some
> > "oh, you can always recompile the kernel or reboot" mentality.
>
> In the case of hard drives, they should be hot-swappable in this
> environment - in which case, you can remove them, upgrade them off-line on
> a spare workstation, then restore them once you know they work.

Stop thinking only about small systems. Linux runs on large systems
too. We are talking about systems with maybe hundreds of disks. Not
only would it be time consuming to hot-swap them all one at a time
but, in the case of S/390, you're going to have difficulty finding a
separate system in which to hot-plug the disks. Your "spare
workstation" would have to be a P/390 with real channel hardware
attached. This is not the way that things are done.

> What would you have done in your production system if the firmware upgrade
> had, say, upset the SCSI ID selection so it conflicted with another drive
> in the RAID array? You just lost two drives at once. Or, worse still, if
> it starts flooding the bus with crap, because the firmware upgrade went
> wrong, or the image had a bug in??

Stop thinking about small systems. You have RAID across multiple busses
(not SCSI) with channel multipathing and hot-boxing. If firmware
upgrades caused such problems then IBM and the other PCMs would have
had much egg on their faces and fixed it. Trying to physically remove
every disk, channel and PU just to do a firmware upgrade on it is far
worse. It would be similar to requiring disk removal just to do mke2fs.

And why are you expecting firmware upgrades to go wrong? Do you really
think a large system vendor would supply untested or broken firmware
upgrades for a high-reliability system? Once again, stop thinking about
small systems. (Oh, and even if a channel is flooded with crap, the
system will automatically hot-box it and choose another path to the
necessary devices transparently, assuming you've multipathed them as
you should).
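
To make the failover idea concrete, here is a rough sketch of
"fence the bad path and retry on another one". It is only an
illustration of the general technique, not how the S/390 channel
subsystem is actually implemented: the Channel and MultipathDevice
classes, the "chp0"/"chp1" names and the healthy flag are all
hypothetical.

```python
class NoPathAvailable(Exception):
    """Raised when every path to the device has been fenced off."""


class Channel:
    """Hypothetical I/O path; 'healthy' stands in for real error detection."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.fenced = False

    def send(self, request):
        if not self.healthy:
            raise OSError(f"channel {self.name} is misbehaving")
        return f"{request} via {self.name}"


class MultipathDevice:
    """Hypothetical device reachable over several channels."""

    def __init__(self, channels):
        self.channels = list(channels)

    def submit(self, request):
        for channel in self.channels:
            if channel.fenced:
                continue  # skip paths already taken out of service
            try:
                return channel.send(request)
            except OSError:
                # "Hot-box" the faulty path and fall through to the next one.
                channel.fenced = True
        raise NoPathAvailable("no usable path left to the device")


device = MultipathDevice([Channel("chp0", healthy=False), Channel("chp1")])
result = device.submit("read block 7")
print(result)
```

The caller only ever sees the result of submit(); the bad path is
taken out of service and later requests skip it, which is the sense
in which the rerouting is transparent.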
--Malcolm
-- 
Malcolm Beattie <mbeattie@sable.ox.ac.uk>
Unix Systems Programmer
Oxford University Computing Services