H196538: Controller reboots if multiple CASDs attempted - IBM System Storage



Source

RETAIN tip: H196538

Symptom

A System Storage DS3000, DS4000, or DS5000 storage subsystem will reboot with a Memory Free panic if multiple Collect All Support Data (CASD) bundles are requested too close together. If only one CASD is in process at any one time, then this panic will not occur.

For example, if multiple DS Storage Manager 10.60.x5.17 Clients are used to manage a DS3000, DS4000 or DS5000 controller, and the Support Monitor default collection time of 2am is used to automatically collect a CASD, then a CASD collection request will come from each DS Storage Manager Client at the same time. These multiple requests will cause the storage subsystem to panic with a Memory Free error.

Affected configurations

The system may be any of the following IBM storage products:

  • DS3200, type 1726, any model
  • DS3300, type 1726, any model
  • DS3400, type 1726, any model
  • DS3950 Express, type 1814, any model
  • DS4200 Storage Server, type 1814, any model
  • DS4300 (FAStT600) Dual Controller and Turbo Storage Server, type 1722, any model
  • DS4500 (FAStT900) Storage Server, type 1742, any model
  • DS4700 Storage Server, type 1814, any model
  • DS4800 Storage Server, type 1815, any model
  • DS5020 Disk Controller (1814-20A), any model
  • DS5100 Storage Controller (1818-51A), any model
  • DS5300 Storage Controller (1818-53A), any model

This tip is not software specific.
This tip is not option specific.

Solution

This behavior will be corrected in a future release of controller firmware for the System Storage DS3000, DS4000, and DS5000 products.

The target date for this release is third quarter 2010.

The file will be available from the IBM System Storage Support web site at the following URL:

  http://www.ibm.com/systems/support/

Workaround

In order to avoid this panic, ensure that multiple requests for CASDs are not made at the same time to the same storage subsystem.

If you are using multiple copies of the DS Storage Manager Client 10.60.x5.17 to manage your storage subsystems, then the following options can be used to ensure that the Support Monitor does not cause the storage subsystem to panic:

Workaround 1:

Stagger the times at which the various Support Monitors are scheduled to collect CASDs from the same subsystem. The default time to collect a CASD is 2am. For each distinct Support Monitor, ensure that there is at least a 15-minute gap between the scheduled times. The time needed to complete a CASD varies with system configuration, however, and some systems with large configurations have been seen to take 20 minutes or longer, so allow a wider gap on such systems. To change the time that a CASD is collected, perform the following steps:

  a. On the DS Storage Manager Client workstation, point your Web browser to http://localhost:9000/
  b. Click on the Calendar icon next to each subsystem to schedule the data collection frequencies and time.
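The staggering described in Workaround 1 can be planned ahead of time. The following is a minimal sketch only: the client names and the 30-minute gap are assumptions chosen for illustration (the gap is set above the 20 minutes reported for large configurations), and the resulting times still have to be entered by hand through the Calendar icon on each workstation.

```python
from datetime import datetime, timedelta

def staggered_times(monitors, first="02:00", gap_minutes=30):
    """Return {monitor_name: "HH:MM"}, spacing collections gap_minutes apart."""
    start = datetime.strptime(first, "%H:%M")
    return {
        name: (start + timedelta(minutes=i * gap_minutes)).strftime("%H:%M")
        for i, name in enumerate(monitors)
    }

# Hypothetical names for three Support Monitor instances
# that manage the same storage subsystem.
schedule = staggered_times(["client-a", "client-b", "client-c"])
for name, when in schedule.items():
    print(name, when)
```

Increase gap_minutes if a CASD on your configuration is known to take longer than the chosen gap.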

Workaround 2:

Ensure that any one storage subsystem is managed by only one DS Storage Manager 10.60.x5.17 Client. The Enterprise Management window of each DS Storage Manager Client shows the storage subsystems that the Client manages. To stop managing a subsystem from a particular Client, right-click on the subsystem that you want to remove from the tree view and select Remove Storage Subsystem from the pop-up menu.
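If the subsystems shown in each Client's Enterprise Management window are written down, the overlap can be found mechanically. The following is a sketch under that assumption: the inventory is compiled by hand, and the client and subsystem names are hypothetical.

```python
from collections import defaultdict

def shared_subsystems(inventory):
    """Return {subsystem: [clients]} for subsystems listed under more than one Client."""
    managers = defaultdict(list)
    for client, subsystems in inventory.items():
        for subsystem in subsystems:
            managers[subsystem].append(client)
    return {name: clients for name, clients in managers.items() if len(clients) > 1}

# Hypothetical inventory, one entry per DS Storage Manager Client workstation.
inventory = {
    "client-a": ["DS4700-lab", "DS5100-prod"],
    "client-b": ["DS5100-prod"],
    "client-c": ["DS3400-test"],
}
duplicates = shared_subsystems(inventory)
print(duplicates)
```

Each subsystem reported here should be removed from all but one Client.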

Workaround 3:

Change the frequency at which the Support Monitor collects a CASD to "never". This is not recommended, because it eliminates the benefit of the Support Monitor: the data needed to resolve a problem, should one occur, may not be available. To change the frequency to "never", perform the following steps:

  a. On the DS Storage Manager Client workstation, point your Web browser to http://localhost:9000/
  b. Click on the Calendar icon next to each subsystem and select Never in the Schedule Support Data Collection option.

Additional information

This issue arises due to a static variable being used to save data related to the CASD collection process. As long as there is just one CASD being collected at a time, this method works properly. If multiple collections are performed too close together, then the data gets overwritten and the Memory Free panic occurs.
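The overwrite can be illustrated with a small model. This is not the controller firmware; it is a sketch that assumes a single shared slot stands in for the static variable, with hypothetical function names, to show why one collection at a time is safe and overlapping collections are not.

```python
# Illustrative model only: one shared slot plays the role of the static variable.
_shared_slot = {"owner": None, "buffer": None}

def begin_collection(request_id):
    # Every new request reuses the same slot, replacing whatever was there.
    _shared_slot["owner"] = request_id
    _shared_slot["buffer"] = [f"data for {request_id}"]

def finish_collection(request_id):
    # A request can only release a buffer that it still owns.
    if _shared_slot["owner"] != request_id:
        raise RuntimeError(f"{request_id}: buffer now belongs to {_shared_slot['owner']}")
    _shared_slot["owner"] = None
    _shared_slot["buffer"] = None

def run(events):
    """Apply (action, request_id) events in order; return the first error message, or None."""
    for action, request_id in events:
        try:
            (begin_collection if action == "begin" else finish_collection)(request_id)
        except RuntimeError as exc:
            return str(exc)
    return None

# One collection at a time: no error.
serial = run([("begin", "A"), ("finish", "A"), ("begin", "B"), ("finish", "B")])
# Request B starts before A has finished: A's data has been overwritten.
overlapping = run([("begin", "A"), ("begin", "B"), ("finish", "A")])
print(serial)
print(overlapping)
```

In the model the failure surfaces as an exception; on the storage subsystem the equivalent condition results in the Memory Free panic.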

The new Support Monitor in the DS Storage Manager Client 10.60.x5.17 now automatically schedules CASD collections at 2am. If multiple copies of the Client are used to manage the same storage subsystems and the default collection time is not made unique for each Client, then multiple requests will come to the storage subsystem too close together such that the first CASD will not have time to complete before the other requests come in. This will cause the overwriting of data.

The Storage Manager Client 10.60.x5.17 README file does state the following restriction related to the Support Monitor function of the Storage Manager Profiler:

  1. Monitor each storage subsystem from one Storage Manager Profiler instance. Gathering data from a storage subsystem with multiple Storage Manager Profiler instances can cause problems. No mechanisms exist that prevent multiple Storage Manager Profiler instances from trying to find data from the same storage subsystem.

  2. When multiple Storage Manager instances are installed in the environment, do not define the same storage subsystem twice in the Storage Manager Enterprise Management window.

The Workarounds ensure that multiple requests for CASD bundles will not come too close together to cause the panic.

The Fix will change the method of storing data from using static storage to allocating storage for each unique request.
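The idea behind the fix can be sketched as follows. This is an illustration of per-request allocation in general, not the actual firmware change; the class and function names are hypothetical.

```python
class Collection:
    """State for one CASD request, allocated when that request starts."""
    def __init__(self, request_id):
        self.request_id = request_id
        self.buffer = [f"data for {request_id}"]

active = {}  # request_id -> Collection, one entry per in-flight request

def begin_collection(request_id):
    # Allocate fresh storage for this request instead of reusing a shared slot.
    active[request_id] = Collection(request_id)

def finish_collection(request_id):
    # Release only this request's storage; other requests are untouched.
    return active.pop(request_id).buffer

# Two overlapping requests no longer interfere with each other.
begin_collection("A")
begin_collection("B")
result_a = finish_collection("A")
result_b = finish_collection("B")
print(result_a, result_b)
```

Because each request owns its storage, the order in which overlapping requests start and finish no longer matters.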

 



Document id:  MIGR-5083168
Last modified:  2010-07-07
Copyright © 2014 IBM Corporation