[Cialug] Suse upgrades and grub issues
Ken MacLeod
ken at bitsko.slc.ut.us
Thu Jul 24 11:34:41 CDT 2008
On Thu, Jul 24, 2008 at 10:58 AM, Matt Patterson <matt at usrlocal.com> wrote:
> I'm not sure how many people run SuSE on here but I seem to have a recurring
> issue that is really starting to irk me.
I don't run Open/SuSE so take this as more general info.
> The issue happens when installing the kernel patches. I would say that I
> have a 90% failure rate with this. Not with the installation of the kernel
> itself, but the grub updates that happen in relation to the kernel updates.
> I have had issues where the menu.lst file is corrupted or the bigger, more
> popular error is that the stage2 file becomes corrupt and when the server
> reboots, it will go into a reboot loop.
The biggest outstanding problem with package updates is the
interaction between different packages during an update. In this case
the interaction is between the kernel, grub, and the stage2 builder,
and it's made worse because a broken bootloader keeps the system from
booting at all.
> I've opened a ticket before with Novell for one of the SLES servers and
> provided a b0rked stage2 file for them to look at. Nothing really ever came
> out of this incident since I was able to get the system up on my own. For
> the most part, I have been able to restore the servers by booting a CD in
> rescue mode and untar a backup of the /boot directory and everything is fine
> in the world.
Yes, you shouldn't let Novell slip out of this because you've found a
workaround, and don't let them get away with only giving you a manual
workaround as a final solution. Package updates should "just work".
> Anyone have any ideas here? I'm honestly thinking about removing grub from
> all the servers and installing lilo and hoping that it resolves this issue.
If your short term solution is to continue manually maintaining the
bootloader, and you're more comfortable with lilo, then that might be
a good solution for you. I doubt YaST/package updates will be better
with lilo since as I recall, Open/SuSE now uses grub by default so
lilo wouldn't be as well maintained either.
> Though I think that the real underlying issue is just that YaST is not
> handling an error condition correctly and moving on like nothing has
> happened.
> But this is rather annoying when you are trying to patch
> servers that are hundreds of miles away.
The solution I've found best is to keep all the systems as consistent
with each other as possible and to have a test server you can easily
revert to a known state at any time. That way I can hit these problems
locally on the test server, debug, revert to the earlier state, and
re-test, so I'm more confident it'll work when I roll the change out
to more servers.
In this case, you're probably looking at overwriting the menu.lst and
stage2 with known good copies after you've done the update but before
you reboot.
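
As a rough sketch of that save-then-restore step (not anything YaST or
Novell provides), something like the following could be run around the
update. The paths /boot/grub/menu.lst and /boot/grub/stage2 and the
backup directory /root/boot-known-good are assumptions, as are the
helper names save_known_good and restore_changed; adjust them for your
systems.

```python
import filecmp
import shutil
from pathlib import Path

# Assumed locations, not taken from the original report; adjust as needed.
BOOT_FILES = [Path("/boot/grub/menu.lst"), Path("/boot/grub/stage2")]
BACKUP_DIR = Path("/root/boot-known-good")


def save_known_good(files, backup_dir):
    """Copy each boot file into backup_dir; run this before the update."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.copy2(f, backup_dir / Path(f).name)


def restore_changed(files, backup_dir):
    """Overwrite any file that is missing or differs from its saved copy.

    Returns the list of files that were restored.
    """
    restored = []
    for f in files:
        f = Path(f)
        good = Path(backup_dir) / f.name
        # shallow=False forces a byte-for-byte comparison.
        if not f.exists() or not filecmp.cmp(f, good, shallow=False):
            shutil.copy2(good, f)
            restored.append(f)
    return restored
```

You'd call save_known_good(BOOT_FILES, BACKUP_DIR) before patching and
restore_changed(BOOT_FILES, BACKUP_DIR) after the update but before
the reboot. Keep in mind that putting back the old menu.lst means the
box boots whatever kernel that file lists, so the old kernel has to
still be installed.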
Are you using, or can you use, AutoYaST on the remote systems?
Re-autoinstalling is one of the simplest ways I've found to maintain
lots of systems, along with change management tools like
cfengine/puppet to handle small ongoing changes.
http://c2.com/cgi/wiki?SystemsAdministrationPractices
-- Ken