[Cialug] Crashing with errors in mcelog
David Champion
dave at dchamp.net
Tue Mar 3 10:16:11 CST 2009
Daniel A. Ramaley wrote:
> On 2009-03-03 at 09:32:52, Matthew Nuzum wrote:
>
>> Regarding crashes coming in pairs, is it possible the reason for the
>> second crash is a warm-boot vs. cold-boot problem? For example, I've
>> seen in several instances where a computer will not properly reset
>> itself on warm-boot (reboot command or ctrl+alt+del or etc) and crash
>> very shortly after boot. However if you hit the power button and give
>> the computer 30s of rest then it works.
>>
>
> I've seen situations similar to what you describe, where warm and cold
> boots differ in their result. In my case the machine crashes too
> totally for a warm boot to be possible, so i reboot by hitting the
> reset button on the front of the case. I didn't power cycle it. But i
> do let the BIOS RAM checks run to completion (does that zero out the
> RAM?). Hitting the reset button in most cases *should* be equivalent to
> a power outage, but i know it isn't *entirely* identical. The hard
> drives keep spinning for one thing, and i'm guessing miscellaneous
> device memory (such as drive controllers, graphics card, sound card
> buffer) might not be reset the same.
>
> After the second boot the machine seems to run fine for awhile. Several
> months ago it had this double crash problem, and then it was fine until
> this weekend. I figured i'd have a few more months again, but then it
> did it this morning. Arrrgh.
>
> I hope the problem turns out to be something relatively cheap and easy
> to fix, like RAM. All the components in the machine are name-brand and
> i've had it running 24/7 for about a year and a half though, so i'm not
> sure why it would start having trouble now.
>
A warm boot and cold boot are pretty different. I've had situations
where a device's firmware had to be initialized by booting into Windows
to load it, then doing a warm boot into Linux, because the Linux driver
wouldn't load the firmware. I don't recall offhand what that was, maybe a
SCSI controller or a modem, and that was probably more than 10 years
ago. :)

There have been security notices about viruses that can survive a warm
boot, by loading into a higher memory location in RAM.

There's probably a utility to look through RAM for interesting
things. When I was taking mainframe programming classes at DMACC, it was
always interesting when you crashed a program and got a hex dump of your
memory space, and it contained uninitialized memory from someone
else's stuff. Sometimes you'd get some or all of another student's code
or data. Of course, you'd get into a lot of trouble if they caught you
doing this on purpose...
-dc