H207952: Potential undetected data corruption when expanding a RAID 1/10 array - IBM System Storage DS Storage Controller



Source

RETAIN tip: H207952

Symptom

A potential undetected data corruption issue was discovered during internal testing of the IBM System Storage DS Storage Controller.

The issue is present only with Redundant Array of Independent Disks (RAID) 1/10 and only if there is a controller restart during a Dynamic Capacity Expansion (DCE) operation.

No external users have reported this issue; however, it will occur if a controller running an affected firmware level restarts during a DCE on a RAID 1/10 array.

Affected configurations

The system can be any of the following IBM storage systems:

  • IBM System Storage DCS3700 Storage Subsystem, type 1818, any model
  • IBM System Storage DS3512, type 1746, any model
  • IBM System Storage DS3524, type 1746, any model
  • IBM System Storage DS3950 Express, type 1814, any model
  • IBM System Storage DS5020 Disk Controller (1814-20A), any model
  • IBM System Storage DS5100 Storage Controller, type 1818, any model
  • IBM System Storage DS5300 Storage Controller, type 1818, any model

This tip is not software specific.

This tip is not option specific.

Controller firmware levels 7.75, 7.77, 7.83, and 7.84 (releases earlier than 7.84.46.00) for the DS Storage Controller are affected.
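
As a rough illustration, the affected levels listed above can be combined with the fixed release named under Solution (7.84.46.00) into a simple version check. This is a minimal sketch, not an IBM tool: the function names are hypothetical, and it assumes the firmware version is reported as four dot-separated numeric fields and that every release within an affected level below 7.84.46.00 is affected.

```python
# Hypothetical helper; not part of any IBM utility.
# Assumption: versions look like "7.84.46.00" (four numeric fields).

AFFECTED_LEVELS = {(7, 75), (7, 77), (7, 83), (7, 84)}  # levels named in this tip
FIXED_VERSION = (7, 84, 46, 0)                          # first corrected release


def parse_version(text):
    """Turn a dotted version string into a tuple of four integers."""
    parts = tuple(int(field) for field in text.strip().split("."))
    if len(parts) != 4:
        raise ValueError("expected four dot-separated fields: %r" % text)
    return parts


def is_affected(version_text):
    """Return True if the version is in an affected level and predates the fix."""
    version = parse_version(version_text)
    return version[:2] in AFFECTED_LEVELS and version < FIXED_VERSION
```

For example, `is_affected("7.84.46.00")` returns False because that release contains the fix, while any 7.83 release returns True. Confirm the actual installed level against the storage subsystem profile rather than relying on a script like this alone.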

Solution

This behavior is corrected in the 7.84.46.00 and later releases of the controller firmware.

The firmware is available from IBM Support's Fix Central web page by selecting the appropriate Product Group, type of System, Product name, Product machine type, and Operating system.

Workaround

The workarounds for this issue are to use a RAID level other than 1/10 or to not perform a DCE on a RAID 1/10 array until the fix has been applied.

However, if these workarounds are not viable in the user's environment and a DCE of a RAID 1/10 array must be performed before installing the indicated firmware fix, then note the following:

If a DCE must be performed, back up the array, perform the DCE and, after the DCE completes, run a media scan with redundancy check on the array.

If the media scan completes without errors, the DCE was successful. If parity errors are detected, the DCE did not complete successfully and the backup must be restored. The backup can be restored over the newly expanded array without issues.

If there are any concerns about whether a prior DCE was successful, perform a media scan with redundancy check of the array. If parity errors are detected, then the prior DCE did not complete successfully and a restore from the backup is required.
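
The order of operations described above for a DCE performed before the fix is installed can be sketched as follows. This is only an outline of the decision logic under stated assumptions: the four callables are hypothetical placeholders for whatever backup, expansion, media scan, and restore procedures are used in a given environment, and the scan is assumed to return the number of redundancy (parity) errors it found. No real Storage Manager command or API is implied.

```python
from typing import Callable


def guarded_dce(back_up: Callable[[], None],
                expand: Callable[[], None],
                scan_with_redundancy_check: Callable[[], int],
                restore: Callable[[], None]) -> str:
    """Run the workaround sequence; all four callables are placeholders."""
    back_up()                              # 1. back up the array first
    expand()                               # 2. perform the DCE
    errors = scan_with_redundancy_check()  # 3. media scan after the DCE completes
    if errors == 0:
        return "dce-ok"                    # scan clean: expansion succeeded
    restore()                              # 4. errors found: restore the backup
    return "restored-from-backup"
```

The point of the structure is that the backup always precedes the expansion and the restore runs only when the scan reports errors, which matches the sequence given in this tip.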

Additional information

In the 7.75 firmware release, a change was made to the way that DCE was implemented for RAID 1/10 arrays. The change was made to preserve tray loss protection (TLP) that the user could have established by deliberate selection of data and mirror drive relationships.

However, if a controller restart occurs during the DCE, the drive mapping for the mirrored pair becomes corrupted, which will eventually result in data corruption.



Document id:  MIGR-5092826
Last modified:  2013-06-04
Copyright © 2014 IBM Corporation