[Cialug] Crashing with errors in mcelog
Daniel A. Ramaley
daniel.ramaley at drake.edu
Sun Mar 1 13:19:03 CST 2009
My main machine at home had 2 crashes this morning. A few months ago it
also had 2 crashes in one day. Normally, of course, the machine doesn't
crash at all. When the crashes happen, the screen freezes, the keyboard
lights start blinking, and the network dies. So the machine is totally
unresponsive to anything other than the reset button. I opened it up
and all fans are working properly. But I did notice that
/var/log/mcelog contains a number of hardware error records. The last
~100 lines are below. Any idea what these messages actually mean? Some
of them mention the North Bridge; others look like a problem with one
of the DIMMs. Do I have a bad DIMM, a bad North Bridge, or a bad
something else? The
RAM in the computer is 4 sticks of 2 GB ECC (8 GB total). The machine
is running Debian Testing amd64 with kernel 2.6.26.
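
One way to check whether the records point at a single DIMM or a single address is to tally them. The following is a minimal Python sketch, assuming the log keeps the text layout shown in the excerpt below ("CPU <n> <bank> <unit> TSC ...", "ADDR <hex>", "Device Locator: <name>"). The function name tally_mcelog is hypothetical, and other mcelog versions may format records differently.

```python
import re
from collections import Counter


def tally_mcelog(text):
    """Count machine-check records by unit, physical address, and DIMM locator.

    Assumes the plain-text mcelog layout seen in this post; this is an
    illustration, not a general mcelog parser.
    """
    units, addrs, dimms = Counter(), Counter(), Counter()
    for raw in text.splitlines():
        line = raw.strip()

        # e.g. "CPU 0 4 northbridge TSC c38ea3305a" -> unit "northbridge"
        m = re.match(r"CPU \d+ \d+ (.+?) TSC ", line)
        if m:
            units[m.group(1)] += 1

        # ADDR can stand alone or follow MISC on the same line
        m = re.search(r"\bADDR ([0-9a-f]+)", line)
        if m:
            addrs[m.group(1)] += 1

        # SMBIOS lookup result that mcelog prints for the faulting address
        m = re.match(r"Device Locator: (\S+)", line)
        if m:
            dimms[m.group(1)] += 1
    return units, addrs, dimms
```

Calling tally_mcelog on the contents of /var/log/mcelog and printing the three counters shows how often each unit, address, and DIMM locator appears. Keep in mind the log's own warning that the SMBIOS data behind the DIMM locator is often unreliable.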
DDR2 DIMM 333 Mhz Synchronous Width 64 Data Width 72 Size 2 GB
Device Locator: DIMM3
Bank Locator: BANK3
Manufacturer: Manufacturer3
Serial Number: SerNum3
Asset Tag: AssetTagNum3
Part Number: PartNum3
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 0 data cache TSC c38ea32751
ADDR 1d7e62740
Data cache ECC error (syndrome 15)
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
data read mem transaction
memory access, level generic'
STATUS d40ac00000000833 MCGSTATUS 0
WARNING: SMBIOS data is often unreliable. Take with a grain of salt!
DDR2 DIMM 333 Mhz Synchronous Width 64 Data Width 72 Size 2 GB
Device Locator: DIMM3
Bank Locator: BANK3
Manufacturer: Manufacturer3
Serial Number: SerNum3
Asset Tag: AssetTagNum3
Part Number: PartNum3
MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 2 bus unit TSC c38ea32c2d
L2 cache ECC error
Bus or cache array error
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
prefetch mem transaction
memory access, level generic'
STATUS d000400000000863 MCGSTATUS 0
MCE 2
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC c38ea3305a
MISC c008000f00000000 ADDR 1d7e63358
Northbridge RAM ECC error
ECC syndrome = 15
bit33 = err cpu1
bit46 = corrected ecc error
bit59 = misc error valid
bus error 'local node origin, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS 9c0ac00200000813 MCGSTATUS 0
DDR2 DIMM 333 Mhz Synchronous Width 64 Data Width 72 Size 2 GB
Device Locator: DIMM3
Bank Locator: BANK3
Manufacturer: Manufacturer3
Serial Number: SerNum3
Asset Tag: AssetTagNum3
Part Number: PartNum3
MCE 3
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 2 bus unit TSC c38ea19306
L2 cache ECC error
Bus or cache array error
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
prefetch mem transaction
memory access, level generic'
STATUS d000400000000863 MCGSTATUS 0
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge TSC 2d7050dbd77
MISC c008001100000000 ADDR 1d7e63358
Northbridge RAM ECC error
ECC syndrome = 15
bit32 = err cpu0
bit46 = corrected ecc error
bit59 = misc error valid
bus error 'local node origin, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS 9c0ac00100000813 MCGSTATUS 0
WARNING: SMBIOS data is often unreliable. Take with a grain of salt!
DDR2 DIMM 333 Mhz Synchronous Width 64 Data Width 72 Size 2 GB
Device Locator: DIMM3
Bank Locator: BANK3
Manufacturer: Manufacturer3
Serial Number: SerNum3
Asset Tag: AssetTagNum3
Part Number: PartNum3
MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 2 bus unit TSC 2d704fc7e14
L2 cache ECC error
Bus or cache array error
bit46 = corrected ecc error
bus error 'local node origin, request didn't time out
prefetch mem transaction
memory access, level generic'
STATUS 9000400000000863 MCGSTATUS 0
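
The "bitNN = ..." lines above are decoded from the 64-bit STATUS value printed at the end of each record. The following is a minimal Python sketch of that decoding. It covers only the bits that this log itself names (32, 33, 46, 59, 62), and it assumes that bits 32 and 33 apply only to bank 4, which the log labels "northbridge". The function name decode_status and the tables are hypothetical and are not taken from mcelog's source.

```python
# Bit labels copied from the mcelog output in this post.
COMMON_BITS = {
    46: "corrected ecc error",
    59: "misc error valid",
    62: "error overflow (multiple errors)",
}

# Assumption: these two only apply to the northbridge bank (bank 4 in this log).
NORTHBRIDGE_BITS = {
    32: "err cpu0",
    33: "err cpu1",
}
NORTHBRIDGE_BANK = 4


def decode_status(status_hex, bank):
    """Return the 'bitNN = label' lines for the flags set in a STATUS value."""
    status = int(status_hex, 16)
    labels = dict(COMMON_BITS)
    if bank == NORTHBRIDGE_BANK:
        labels.update(NORTHBRIDGE_BITS)
    return [
        f"bit{bit} = {label}"
        for bit, label in sorted(labels.items())
        if (status >> bit) & 1  # test one bit of the 64-bit register
    ]


if __name__ == "__main__":
    # Bank number and STATUS value taken from the first northbridge record above.
    for line in decode_status("9c0ac00200000813", 4):
        print(line)
```

Every STATUS value in this excerpt has bit 46 set, which is why each record reports a corrected ECC error.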
--
------------------------------------------------------------------------
Dan Ramaley Dial Center 118, Drake University
Network Programmer/Analyst 2407 Carpenter Ave
+1 515 271-4540 Des Moines IA 50311 USA