Memory disabled with scrub failures in the system log - IBM Servers



Source

RETAIN tip: H204935

Symptom

When a failure is detected on a Dual In-Line Memory Module (DIMM) during the early-boot memory initialization phase, the Integrated Management Module (IMM) logs a memory disabled message and a scrub failure message for that DIMM.

Note: This event is not a true Memory Scrub failure. This event is generated by the Unified Extensible Firmware Interface (UEFI) from a Power On Self Test (POST) Memory Failure, such as Failed Training or Failed MRC.

This feature was added across all IMM systems in the Third Quarter 2011 life cycle release and will be incorporated in IMM2. For example, when the system detects a memory error on slot 8, the log contains a scrub failure entry in addition to the memory disabled entry:

 

10 I 01/01/1970; 01:00:59 0x806f040c2008ffff "Memory Device 8" disabled on subsystem "System Memory"

11 E 01/01/1970; 01:00:59 0x806f030c2008ffff Scrub Failure for "Memory Device 8" on subsystem "System Memory"

Runtime memory failures that cannot be corrected will be logged with the following error:

Uncorrectable error detected for "Memory Device n" on subsystem "System Memory". (n = DIMM number)
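
The distinction between the POST and runtime messages above can be applied to an exported log with a short script. The following is a minimal sketch, not an IBM tool: the function name classify_dimm_events is hypothetical, and the patterns assume that log lines contain the message text exactly as quoted in this tip.

```python
import re
from collections import defaultdict

# Message patterns copied from the sample entries in this tip.
# Assumption: a real exported log uses the same wording.
PATTERNS = {
    "disabled": re.compile(
        r'"Memory Device (\d+)" disabled on subsystem "System Memory"'),
    "scrub": re.compile(
        r'Scrub Failure for "Memory Device (\d+)" on subsystem "System Memory"'),
    "uncorrectable": re.compile(
        r'Uncorrectable error detected for "Memory Device (\d+)" '
        r'on subsystem "System Memory"'),
}


def classify_dimm_events(lines):
    """Return {dimm_number: "POST" or "runtime"} for DIMMs with failures.

    A DIMM that has both a "disabled" and a "Scrub Failure" entry is
    treated as a POST memory failure; a DIMM with an "Uncorrectable
    error" entry is treated as a runtime failure.
    """
    seen = defaultdict(set)
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                seen[int(match.group(1))].add(kind)

    result = {}
    for dimm, kinds in seen.items():
        if {"disabled", "scrub"} <= kinds:
            result[dimm] = "POST"
        elif "uncorrectable" in kinds:
            result[dimm] = "runtime"
    return result
```

Given the two sample entries for "Memory Device 8" shown above, this sketch returns {8: "POST"}. A DIMM that has only a disabled entry is left unclassified.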

SPECIAL Note:
See RETAIN Tip H211626, as it relates to reporting DIMM failures during POST time for some of the affected systems.

Affected configurations

The system can be any of the IBM servers listed in the Additional information section.

This tip is not software specific.

This tip is not option specific.

The following system BIOS/UEFI levels are affected:

See the Additional Information section for affected code levels by system model and type.

The system has the symptom described above.

Workaround

The DIMM should be replaced according to the standard action plan for the memory scrub failure documented in the Problem Determination and Service Guide (PDSG) for the affected product.

Additional information

This behavior was changed in the Third Quarter 2011 release of IMM firmware. The change distinguishes POST memory failures from runtime memory failures so that debug efforts can be focused in the right area. No UEFI change was required to enable this enhancement.

UEFI reports the error and notifies the IMM via the 'vgpio' command. The memory initialization code disables the DIMM(s) indicated by this error message. Users should follow the steps documented in the PDSG.

The following IMM firmware level(s) are affected:

  • BladeCenter HS22, HS22V and HX5 1.31 YUOOC7F or later
  • iDataPlex dx360 M2/M3 1.31 YUOOC7F or later
  • iDataPlex dx360 M4 1.25 1AOO26K or later
  • System x3200 M3 and x3250 M3 1.31 YUOOC7F or later
  • System x3400 M2/M3 and x3500 M2/M3 1.31 YUOOC7F or later
  • System x3500 M4 and x3550 M4 1.25 1AOO26K or later
  • System x3550 M2/M3 and x3650 M2/M3 1.31 YUOOC7F or later
  • System x3620 M3 and x3630 M3 1.08 HSE119AUS 1.31 YUOOC7F or later
  • System x3650 M4 1.26 1AOO26N or later
  • System x3530 M4 and x3630 M4 1.60 BEE136C or later
  • System x3690 X5 1.31 YUOOC7F or later
  • System x3850 X5 and x3950 X5 1.31 YUOOC7F or later
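
The "or later" check against the list above can be sketched as a simple lookup. This is an illustration only: the table name MIN_IMM_VERSION and the function has_post_scrub_reporting are hypothetical, build IDs are ignored, and version strings are assumed to compare as major.minor integers. The x3620 M3 and x3630 M3 entry is omitted because the list gives two levels for it.

```python
# Minimum IMM firmware versions copied from the list in this tip.
# Assumption: versions compare numerically as (major, minor).
_GROUPS = [
    (("BladeCenter HS22", "BladeCenter HS22V", "BladeCenter HX5"), "1.31"),
    (("iDataPlex dx360 M2", "iDataPlex dx360 M3"), "1.31"),
    (("iDataPlex dx360 M4",), "1.25"),
    (("System x3200 M3", "System x3250 M3"), "1.31"),
    (("System x3400 M2", "System x3400 M3",
      "System x3500 M2", "System x3500 M3"), "1.31"),
    (("System x3500 M4", "System x3550 M4"), "1.25"),
    (("System x3550 M2", "System x3550 M3",
      "System x3650 M2", "System x3650 M3"), "1.31"),
    (("System x3650 M4",), "1.26"),
    (("System x3530 M4", "System x3630 M4"), "1.60"),
    (("System x3690 X5",), "1.31"),
    (("System x3850 X5", "System x3950 X5"), "1.31"),
]

MIN_IMM_VERSION = {
    model: version for models, version in _GROUPS for model in models
}


def _parse(version):
    """Turn a string such as "1.31" into a comparable tuple (1, 31)."""
    return tuple(int(part) for part in version.split("."))


def has_post_scrub_reporting(model, imm_version):
    """Return True or False for a listed model, None for an unlisted one."""
    minimum = MIN_IMM_VERSION.get(model)
    if minimum is None:
        return None
    return _parse(imm_version) >= _parse(minimum)
```

For example, has_post_scrub_reporting("System x3650 M4", "1.26") returns True, while the same model at "1.25" returns False.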



Document id:  MIGR-5089246
Last modified:  2014-04-09
Copyright © 2014 IBM Corporation