Troubleshooting Video issues - IBM BladeCenter



Source

RETAIN tip: H187971

Symptom

This document provides guidance for troubleshooting local and remote video problems on the BladeCenter. It also covers troubleshooting remote disk, which is a feature of remote video.

Affected configurations

The system may be any of the following IBM servers:

  • BladeCenter Chassis, type 7967, any model
  • BladeCenter Chassis, type 8677, any model
  • BladeCenter H, type 8852, any model
  • BladeCenter H, type 7989, any model
  • BladeCenter HS20, type 1883, any model
  • BladeCenter HS20, type 1884, any model
  • BladeCenter HS20, type 7981, any model
  • BladeCenter HS20, type 8678, any model
  • BladeCenter HS20, type 8832, any model
  • BladeCenter HS20, type 8843, any model
  • BladeCenter HS21, type 8853, any model
  • BladeCenter HS40, type 8839, any model
  • BladeCenter JS20, type 8842, any model
  • BladeCenter JS21, type 8844, any model
  • BladeCenter JS21, type 7989, any model
  • BladeCenter JS41, type 8844, any model
  • BladeCenter LS20, type 8850, any model
  • BladeCenter LS21, type 7971, any model
  • BladeCenter LS41, type 7972, any model
  • BladeCenter QS20, any model
  • BladeCenter T, type 8720, any model
  • BladeCenter T, type 8730, any model

This tip is not software specific.

This tip is not option specific.

Solution

This document provides guidance for troubleshooting local and remote video problems on the BladeCenter. It also covers troubleshooting remote disk, which is a feature of remote video.

INITIAL STEPS:

  1. Search RETAIN tips with the keywords "BladeCenter video" to find the most recent tips on specific defects that have already been identified by IBM.

  2. Examine Management Module (MM) logs for errors. Be aware that most video problems do not result in log entries on the Management Module.

  3. Determine whether the problem is with local or remote video, or both. If the failure only occurs during remote video, troubleshooting should begin with the Management Module section.

  4. Determine whether the problem occurs in all slots with all blades, or a subset of them.

  5. Switch a few of the blades with working video with a few of the blades with video problems. Note whether the problems follow the blades or stay with the slots.

  6. Troubleshooting local video problems requires a local keyboard and monitor. Make sure you have them available. Except for the BladeCenter JS20 and JS21, all blades have a video controller. The blades send their video signal to the midplane, which in turn sends the signals to the MM. The MM then decides whether to output the video signal to the physical RGB connector on the back of the MM, or to redirect it over TCP/IP to a Java-based remote control session. A logical diagram of video signals looks like this:

    Local video:

    OS or BIOS --> video chip --> midplane --> MM (physical video cable) --> possible RCM --> monitor

    Remote video:

    OS or BIOS --> video chip --> midplane --> MM (TCP/IP redirection) --> network --> JVM/web browser

    As the discussion and diagram above show, if local video works but remote video does not, the problem cannot be in the operating system/BIOS, blade, or midplane, because the first three stages of the video signal path are the same for local and remote video. Use the information below to further isolate the failure:
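
The isolation reasoning above can be sketched in a few lines of Python. This is only an illustration and not an IBM tool: the function name and the component labels are hypothetical and simply mirror the two signal paths in the diagram.

```python
# Stages of each path, in signal order, copied from the diagram above.
SHARED_PATH = ["OS or BIOS", "video chip", "midplane"]
LOCAL_ONLY = ["MM (physical video cable)", "RCM (if present)", "monitor"]
REMOTE_ONLY = ["MM (TCP/IP redirection)", "network", "JVM/web browser"]


def suspect_components(local_ok, remote_ok):
    """Return the stages worth examining first for a given symptom."""
    if local_ok and remote_ok:
        return []
    if local_ok:
        # The working local path proves the shared stages are good.
        return list(REMOTE_ONLY)
    if remote_ok:
        # The working remote path proves the shared stages are good.
        return list(LOCAL_ONLY)
    # Both paths fail: look at the stages they share. The MM sits on
    # both paths as well, so it is added as an assumption of this sketch.
    return SHARED_PATH + ["MM"]
```

For example, suspect_components(True, False) returns only the remote-side stages, which matches the advice to begin with the Management Module section when only remote video fails.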

BLADE RELATED FAILURES

OS and BIOS

Operating systems cause very few problems with either local or remote video on the BladeCenter. There have been a few issues with Linux that are explained in RETAIN tips. The simple way to determine whether the OS is causing the problem is to boot a blade to BIOS and see if local and/or remote video are working properly. The BIOS does not need any extra support to display video locally or remotely.

VIDEO CHIP

The video chip has caused very few problems. Mostly, one would expect video chip problems to occur on just one or a few blades. However, if the customer uses the same software image on multiple blades, a problematic video driver or video configuration could cause local or remote video problems to occur across an entire chassis. Useful steps in problem isolation include:

  1. Booting the blade in VGA mode.

  2. Moving the blade to a chassis not having video problems.

  3. Trying a default, clean installation of the OS being used.

  4. Switching the hard drive between working and non-working blades.

CHASSIS RELATED FAILURES

MIDPLANE

The video components on early versions of the midplane are daisy chained together into one long bus. If one component in that chain fails, all higher slots past the failure will lose video. Symptomatically, a failure on the bus would result in video working on (for example) slots 1-5, and neither local nor remote video working on slots 6-14. That would indicate that the component for slot 6, or the link between slots 5 and 6, has failed.

The video bus on newer versions of the midplane eliminates the daisy chain by employing 4 muxes. This puts blades 1-4 on mux 1, blades 5-8 on mux 2, blades 9-12 on mux 3, and blades 13-14 on mux 4. If a component fails on this midplane, a group of slots will lose video. Symptomatically, a failure on the bus would result in video working on (for example) all the slots except 1-4, or all the slots except 5-8, etc. That symptom indicates that a particular mux has failed.
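
The two midplane designs produce different failure patterns, and matching a list of failed slots against them can be sketched as follows. This is a minimal illustration that assumes a 14-slot chassis, as in the examples above; the function names are hypothetical, and a match is only a working hypothesis, not a diagnosis.

```python
# Slot groups per mux on the newer midplane, as described above.
MUX_GROUPS = {
    1: range(1, 5),    # slots 1-4
    2: range(5, 9),    # slots 5-8
    3: range(9, 13),   # slots 9-12
    4: range(13, 15),  # slots 13-14
}
LAST_SLOT = 14  # assumption: 14-slot chassis


def daisy_chain_hypothesis(failed_slots):
    """Early midplane: return the first failed slot if every slot from
    there to the end of the chain has also failed, otherwise None."""
    failed = set(failed_slots)
    if not failed:
        return None
    first = min(failed)
    if failed == set(range(first, LAST_SLOT + 1)):
        return first
    return None


def mux_hypothesis(failed_slots):
    """Newer midplane: return the muxes whose entire slot group failed."""
    failed = set(failed_slots)
    return [mux for mux, slots in MUX_GROUPS.items() if set(slots) <= failed]
```

With slots 6-14 failing, daisy_chain_hypothesis points at slot 6 (or the link between slots 5 and 6). With only slots 1-4 failing, mux_hypothesis points at mux 1.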

Steps to isolate a failure to a midplane:

  1. Verify that the video fails both locally and remotely. Note which slots fail and which slots work, and combine that information with the design of the midplane to create a working hypothesis of the failed component.

  2. If there are 2 MMs in the chassis, fail over to the redundant MM. If the failure occurs on the same slots with the other MM active, it is nearly impossible for the root cause to be the midplane. If the problem resolves after failing over to the redundant MM, label this MM as the "good" MM and set it aside. If the problem occurs again after the "good" MM has been removed, proceed with the next steps to see if the midplane or MM is the cause of the problem.

  3. Move the MM from the current "bad" MM slot to the other MM slot. If the failure occurs on the same blades with the MM in the other slot, it is nearly impossible for the root cause to be the midplane. Try resetting the MM to the default configuration and see if the problem persists. If the problem persists with the MM in either slot and running the default configuration, replace the MM.

  4. If the failure does not occur when the MM is moved from one slot to the other, reset the MM to the default configuration and move the MM back to the previous slot. If the problem returns as before, replace the midplane.
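
Steps 3 and 4 amount to a small decision table, sketched below. This is a hedged illustration only: the function and argument names are hypothetical, and any combination of observations that the steps above do not cover is reported as "undetermined" rather than guessed.

```python
def midplane_or_mm(fails_in_other_slot, persists_after_reset=None,
                   returns_in_original_slot=None):
    """Map the observations from steps 3 and 4 to the action they name.

    fails_in_other_slot: failure seen on the same blades with the MM moved.
    persists_after_reset: failure still seen after resetting the MM to
        its default configuration (step 3).
    returns_in_original_slot: failure returns when the reset MM is moved
        back to its previous slot (step 4).
    """
    if fails_in_other_slot:
        # The failure followed the MM, so the midplane is very unlikely.
        if persists_after_reset:
            return "replace MM"
        return "undetermined"
    # The failure stayed behind with the original slot.
    if returns_in_original_slot:
        return "replace midplane"
    return "undetermined"
```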

MANAGEMENT MODULE and ADVANCED MANAGEMENT MODULE

Once the Management Module receives the signal, it has to redirect it to either the RGB connector or over the network. Redirecting over the network is done by the Remote Control portion of the MM firmware. Though the Remote Control portion of the MM firmware should always be kept at the same level as the Main Application, it is possible to have them out of sync. This is not a good idea, since the IBM test team does not test this configuration. Flash the Main App and Remote Control firmware to the same level if the customer is experiencing remote control problems, but the local video works.

The following steps address situations in which local video has problems. If local video works, but remote video is experiencing problems, skip to the NETWORK section to continue troubleshooting.

  1. If the chassis has two MMs, fail over to the redundant one and see if the problem continues. If it does continue, you have ruled out the midplane as the culprit. The midplane has two redundant paths, and the odds of seeing the same failure on both paths are extremely remote. Skip to step #4 if the problem continues.

  2. If the problem stops, remove the suspect MM and examine the female connectors for damage. Damage indicates bent pins on the midplane connector for a slot in this chassis, or any other chassis the MM has been in during its life. If you use a flashlight to examine the MM slots, you can often see bent pins if they exist. Bent pins on the midplane require midplane replacement to resolve. MMs with damaged female connectors should also be replaced.

  3. If no damage can be found on the MM or in the I/O slot, lay the MM aside for the time being and proceed to the next step. Remember that this MM is the original primary MM in slot X.

  4. Remove the MM from slot X and examine the female connectors for damage. Damage indicates bent pins on the midplane connector for a slot in this chassis, or any other chassis the MM has been in during its life. If you use a flashlight to examine the MM slots, you can often see bent pins if they exist. Bent pins on the midplane require midplane replacement to resolve.

  5. Assuming you do not see evidence of bent pins in the step above, put the MM in slot Y. If the problem continues, you have ruled out the midplane as the culprit. The midplane has two redundant paths, and the odds of seeing the same failure on both paths are extremely remote. If the problem stops, move the MM back to the other slot. If the problem starts again, the midplane is your suspect and you should focus your action plan on it.

  6. If the problem occurs in both slots with this MM, make a backup copy of the MM config, then reset the MM to defaults. Remember that when you do this, the MM will try to obtain an IP address from a DHCP server. To avoid having the MM get a DHCP address, disconnect the Ethernet cable from the MM as soon as it powers off. After 5 minutes the MM will stop requesting a DHCP address and will take the address 192.168.70.125/255.255.255.0.

If resetting the config resolves the problem, have the customer send the config to the BackOffice for examination. If the problem still occurs after this step, replace the MM.
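
After a reset, a directly attached workstation needs a static address on the same subnet as the MM default address to reach it. The sketch below uses the Python standard library ipaddress module to check a candidate address; the helper name and the example addresses are hypothetical.

```python
import ipaddress

# Default MM address and mask after a reset with no DHCP server,
# as given in step 6 above.
MM_DEFAULT = ipaddress.ip_interface("192.168.70.125/255.255.255.0")


def usable_on_mm_subnet(candidate):
    """True if candidate is a host address on the MM default subnet that
    does not collide with the MM, network, or broadcast address."""
    addr = ipaddress.ip_address(candidate)
    net = MM_DEFAULT.network
    reserved = (MM_DEFAULT.ip, net.network_address, net.broadcast_address)
    return addr in net and addr not in reserved
```

For example, 192.168.70.100 is usable, while 192.168.71.100 is on a different subnet and 192.168.70.125 collides with the MM itself.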

RCM OR KVM

If the customer is using a RCM/KVM, try directly connecting a monitor to eliminate it as a possible cause. If that is not possible, and there are other chassis connected to the same KVM/RCM that are working correctly, swapping cables between the working and non-working chassis will also rule out the KVM/RCM as a potential cause of the issue. Make sure to move only the BladeCenter end of the cables. This will eliminate the cables and RCM/KVM ports at the same time.

NETWORK

If local video works and remote video does not display, the MM could be having a problem sending the data over the network, or the JVM/web browser could be having problems receiving and/or displaying it. To eliminate the network as a source of the problem, connect a laptop directly to the MM ethernet connection with a crossover cable. Make sure the laptop used is known to work when using remote control to other chassis. If this laptop connects correctly, either the network or the previously used workstation is part of the problem.

If the network appears to be the issue, make sure the customer understands that any firewalls between the MM and any remote control workstation must allow traffic to and from the IP address of the MM on the TCP port running remote video. Both the MM and AMM default to 5900. Keyboard and mouse functionality are sent together with video over port 5900. If remote video works, but the remote keyboard or mouse experience problems, this is most likely a MM code defect. If the MM is running the current version of firmware, contact IBM support for additional troubleshooting. Remote disk, which is displayed in the remote control window, uses TCP ports 1044 and 1045 by default on the MM and AMM.
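
A quick way to see whether a firewall is blocking these ports is to attempt a plain TCP connection from the remote control workstation. The sketch below uses only the Python standard library; the function names and dictionary labels are hypothetical, and the port numbers are the defaults named above, which a site may have changed. A failed connection does not by itself prove a firewall problem, since the service may not be listening at that moment.

```python
import socket

# Default TCP ports from the text above.
DEFAULT_PORTS = {
    "remote video, keyboard and mouse": 5900,
    "remote disk (first port)": 1044,
    "remote disk (second port)": 1045,
}


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_mm_ports(host):
    """Try each default port on the MM and report which ones connect."""
    return {name: port_reachable(host, port)
            for name, port in DEFAULT_PORTS.items()}
```

Calling check_mm_ports with the MM address from both the directly attached laptop and the original workstation shows whether the results differ between the two network paths.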

If firewalls are not being used and the network still appears to be the cause, the customer should engage their network support staff to continue troubleshooting the problem. The customer should also engage IBM support to assist with the network troubleshooting if needed.

JVM / WEB BROWSER / REMOTE WORKSTATION

If the Remote Control appears to start, but the video box is solid black, the resolution/refresh rate on the workstation may be incorrect. The supported resolution and refresh rates for the MM are:

Resolution Refresh Rate

  • 640 X 480 60Hz
  • 640 X 480 72Hz
  • 640 X 480 75Hz
  • 640 X 480 85Hz
  • 800 X 600 60Hz
  • 800 X 600 72Hz
  • 800 X 600 75Hz
  • 800 X 600 85Hz
  • 1024 X 768 60Hz
  • 1024 X 768 70Hz
  • 1024 X 768 75Hz

If the workstation is configured with incorrect refresh rates for the AMM, the AMM will display the proper refresh rates in the video window.
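
The list above can be expressed as a lookup table for checking a video mode before starting a session. This is a minimal sketch; the names are hypothetical, and the table holds only the modes listed in this tip.

```python
# Supported refresh rates in Hz for each resolution, from the list above.
SUPPORTED_MODES = {
    (640, 480): {60, 72, 75, 85},
    (800, 600): {60, 72, 75, 85},
    (1024, 768): {60, 70, 75},
}


def is_supported(width, height, refresh_hz):
    """True if the resolution and refresh rate pair appears in the list."""
    return refresh_hz in SUPPORTED_MODES.get((width, height), set())
```

Note that 1024 X 768 supports 70Hz but not 72Hz or 85Hz, so is_supported(1024, 768, 85) is False.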

If a workstation has problems when connecting to any chassis, the problem is either in the network or with the JVM / web browser. If the network has been eliminated as a possible source, remember the following points about the JVM / web browser:

  1. Some workstations have local firewalls installed and could have network problems connecting to the networking ports needed by remote control.

  2. JVM 1.4.2 or higher is required.

  3. Try using different web browsers such as Internet Explorer, Mozilla, and Firefox to see if the problem only occurs with a particular browser.

  4. If remote control works, but remote disk does not work, close remote control, delete all local copies of the file remotedrive.dll and restart remote control.

 



Document id:  MIGR-66890
Last modified:  2011-02-25
Copyright © 2014 IBM Corporation