Chapter 1. Introducing Altix UV System Control Topology

This manual describes controller software commands on SGI Altix UV 100 and SGI Altix UV 1000 systems.


Note: This manual does not apply to SGI Altix UV 10 systems. For information, see the SGI Altix UV 10 System User's Guide.


Altix UV 1000 Overview

The SGI Altix UV 1000 system is a blade-based, cache-coherent non-uniform memory access (ccNUMA) computer system based on the Intel Xeon 7500 series processor. The UV 1000 system scales as follows:

  • From 32 to 2048 threads in a single system image (SSI)

  • A maximum of 2048 processor cores with hyper-threading turned off

  • A maximum of 4096 processor threads (2048 processor cores) with hyper-threading turned on


    Note: Each processor core supports two threads. A processor with hyper-threading enabled is treated by the operating system as two processors instead of one. This means that only one processor is physically present but the operating system sees two logical processors, and shares the workload between them. At initial release, the maximum SSI supported by the Linux operating system is 2048.


The main component is an 18U-high individual rack unit (IRU) shown in Figure 1-1 that supports 16 compute blades and is configurable to support multiple topology options.

The compute blades in the IRU are interconnected using NUMAlink 5 technology. NUMAlink 5 has a peak aggregate bi-directional bandwidth of 15 GB/s. Multiple IRUs are also interconnected with NUMAlink 5 technology.

A maximum of two IRUs can be placed into a custom 42U rack, as shown in Figure 1-2. Each rack supports a maximum of 512 processor cores; therefore, the largest SSI system requires four racks. A maximum of 128 four-rack cells can be interconnected to create a 512-rack system (256K processor cores).
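The rack and core counts above follow from simple arithmetic. The short Python sketch below reproduces them using only the figures quoted in this section; the constant and variable names are illustrative and are not part of any SGI tool.

```python
# Figures quoted in this section for the Altix UV 1000.
IRUS_PER_RACK = 2
BLADES_PER_IRU = 16
CORES_PER_RACK = 512
MAX_SSI_CORES = 2048
RACKS_PER_CELL = 4
MAX_CELLS = 128

# Values derived from the figures above.
blades_per_rack = IRUS_PER_RACK * BLADES_PER_IRU      # 32 blades
cores_per_blade = CORES_PER_RACK // blades_per_rack   # 16 cores
racks_for_max_ssi = MAX_SSI_CORES // CORES_PER_RACK   # 4 racks
max_racks = MAX_CELLS * RACKS_PER_CELL                # 512 racks
max_cores = max_racks * CORES_PER_RACK                # 262144 = 256K cores

print(blades_per_rack, cores_per_blade, racks_for_max_ssi, max_racks, max_cores)
# prints: 32 16 4 512 262144
```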

Figure 1-1. Individual Rack Unit


Figure 1-2. Basic System Building Blocks for Altix UV 1000 Systems


The Altix UV system supports direct-attach I/O on the compute blade. The compute blade is designed to host one of four different I/O riser cards. Various PCI Express-based I/O components are supported. Figure 1-3 shows a full SGI Altix UV system rack.

Figure 1-3. SGI Altix UV System Rack


For a detailed hardware description, see the SGI Altix UV 1000 Systems User's Guide.

The SGI hardware manuals contain detailed descriptions of Altix system architecture. For a list of these manuals, see “Related Publications”.


Note: Online and PostScript versions of SGI documentation are available at the SGI Technical Publications Library at http://docs.sgi.com .


Altix UV 100 Overview

The SGI Altix UV 100 system is a small, blade-based, cache-coherent non-uniform memory access (ccNUMA) computer system based on the Intel Xeon 7500 series processor. The SGI Altix UV 100 system scales as follows:

  • A maximum of 768 processor cores

  • From 16 to 1536 threads in a single system image (SSI)


Note: Each processor core supports two threads.

The main component is a 3U-high IRU, shown in Figure 1-4, that supports two compute blades and is configurable to support multiple topology options.

Figure 1-4. SGI Altix UV 100 IRU Front View


The two compute blades in the IRU are interconnected using NUMAlink 5 technology. NUMAlink 5 has a peak aggregate bi-directional bandwidth of 15 GB/s. Multiple IRUs are also interconnected with NUMAlink 5 technology.

A maximum of twelve IRUs can be placed into a standard 42U 19" custom tall rack. Each rack supports a maximum of 384 processor cores.

The Altix UV system supports direct-attach I/O on the compute blade. The compute blade is designed to host one of four different I/O riser cards. Various PCI Express-based I/O components are supported. For a detailed hardware description, see the SGI Altix UV 100 Systems User's Guide.

System Management

System management provides a single control point for system power up, initialization, booting, and maintenance. System management on an SGI Altix UV 1000 consists of three levels. The first level is the baseboard management controllers (BMCs) on the node boards. The second level is the chassis management controllers (CMCs) in the rear of the IRU. The third level is the system management node (SMN). The SMN is required on SGI Altix UV 1000 series systems. It is not required on SGI Altix UV 100 series systems.


Important: The UV 1000 and UV 100 system control network is a private, closed network. It is not to be reconfigured in any way different from the standard UV installation, nor is it to be directly connected to any other network. The UV system control network does not accommodate additional network traffic, routing, address naming other than its own schema, or DHCP controls other than its own configuration. The system control network is also not security hardened, is not tolerant of heavy network traffic, and is vulnerable to denial-of-service attacks.

The System Management Node acts as a gateway between the UV system control network and any other networks.

SGI Management Center (SMC) software running on the system management node (SMN) provides a robust graphical interface for system configuration, operation, and monitoring. This manual describes commands that can be used on systems without an SMN or not running the SMC. For more information, see the SGI Management Center System Administrator's Guide.

Chassis Management Controller

The chassis management controller (CMC) in the rear of the IRU, as shown in Figure 1-5 and Figure 1-6, supports powering the compute blades up and down and environmental monitoring of all units within the IRU. The CMC sends operational requests to the baseboard management controller (BMC) on each compute node. The CMC provides data collected from the compute nodes within the IRU to the system management node upon request. The CMC blade on the right side of the IRU is the primary CMC. A secondary CMC is currently not supported.

Figure 1-5. Chassis Manager Controller


System Control Network

The chassis management controller (CMC) for SGI Altix UV 1000 systems has seven RJ45 Ethernet ports, as shown in Figure 1-6.

The Ethernet ports are used, as follows:

  • SMN - the system management node port is used to connect to the SMN.

  • SBK - the super block port connects one super block to another super block. A building block is four racks, and a super block is four building blocks (16 racks).

  • CMC0 and CMC1 - these two ports are used to interconnect multiple IRUs within a building block.

  • EXT0, EXT1, EXT2 - these ports connect to external devices such as I/O chassis and smart PDUs.

  • CONSOLE - the console connection supports a serial channel connection directly to the CMC for system maintenance.

Figure 1-6. CMC Ethernet Ports on SGI Altix UV 1000 Systems


For information on finding the CMC IP address and hostname, see “Finding the CMC IP Address” in Chapter 2.

The chassis management controller (CMC) for SGI Altix UV 100 systems is a board assembly integrated into the IRU and has four RJ45 Ethernet ports, as shown in Figure 1-7.

Figure 1-7. CMC Ethernet Ports on SGI Altix UV 100 Systems


The Ethernet ports are used, as follows:

  • ACC - the accessory port is used to connect miscellaneous devices to the CMC network, for example, smart power distribution units (PDUs).

  • SMN - the system management node port is used to connect to the SMN.

  • CMC0 and CMC1 - these two ports are used to interconnect multiple IRUs to form a string topology.

  • CONSOLE - the console connection supports a serial channel connection directly to the CMC for system maintenance.

Determining Rack Numbers

The system controller network has strict requirements for rack numbering. The requirements minimize the amount of information that must be manually configured for each CMC when it is plugged into an IRU. Currently, only the rack and u-position of the IRU must be set. The u-position is the physical location of the IRU in the rack. The rack and u-position values are found in the /etc/sysconfig/module_id file. Besides uniquely identifying the physical location of the CMCs, the values are used to generate several IP addresses for the various VLANs on the CMC and are used by any software interacting with the system controller network to target operations.

For large Altix UV 1000 configurations, a building block consists of four racks with two IRUs in each rack with the CMCs in those IRUs interconnected via their CMC0 and CMC1 jacks. In order for racks to be considered part of the same building block, their rack numbers must be consecutive and satisfy the following equation:

(rack - 1) MOD 4 = 0, 1, 2 or 3

or

(rack - 1) DIV 4 = the same value for all racks in the building block

For example, a system with four racks numbered 1, 2, 3, and 4 has one building block. Similarly, a system with four racks numbered 9, 10, 11, and 12 has one building block.

A system with racks numbered 10, 11, 12, and 13 has two building blocks: racks 10, 11, and 12 are in one building block, and rack 13 is in a second. The system controller network must be cabled appropriately for each configuration.
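The grouping rule can be checked with a few lines of Python. This is only a sketch of the (rack - 1) DIV 4 rule described above; the function names are hypothetical, and it assumes rack numbers start at 1, as in the examples.

```python
from collections import defaultdict

RACKS_PER_BUILDING_BLOCK = 4

def building_block_index(rack: int) -> int:
    """Return (rack - 1) DIV 4, the value shared by all racks in one building block."""
    if rack < 1:
        raise ValueError("rack numbers are assumed to start at 1")
    return (rack - 1) // RACKS_PER_BUILDING_BLOCK

def group_into_building_blocks(racks):
    """Group rack numbers by their building block index."""
    groups = defaultdict(list)
    for rack in sorted(racks):
        groups[building_block_index(rack)].append(rack)
    return dict(groups)

print(group_into_building_blocks([1, 2, 3, 4]))      # {0: [1, 2, 3, 4]}
print(group_into_building_blocks([9, 10, 11, 12]))   # {2: [9, 10, 11, 12]}
print(group_into_building_blocks([10, 11, 12, 13]))  # {2: [10, 11, 12], 3: [13]}
```

A result with more than one key means the racks span more than one building block, as in the last example.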

A super block (SBK) consists of four building blocks. Two primary CMCs in each building block are used to interconnect the building blocks via their SBK jacks. For racks to be considered part of the same SBK, their rack numbers must be consecutive and satisfy the following equation:

(rack - 1) MOD 16 = 0,1,2,... 15

or

(rack - 1) DIV 16 = the same value for all racks in the SBK

In summary, a single SBK can support up to four building blocks, or in other words, 16 racks.
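As an illustration, the following Python sketch combines the DIV 16 rule for super blocks with the DIV 4 rule for building blocks to locate a rack in the hierarchy. The function name and the zero-based indices it returns are assumptions made for this example and are not output of any CMC command.

```python
RACKS_PER_BUILDING_BLOCK = 4
RACKS_PER_SUPER_BLOCK = 16

def locate_rack(rack: int):
    """Return (super block index, building block within the SBK, slot within the building block)."""
    if rack < 1:
        raise ValueError("rack numbers are assumed to start at 1")
    offset = rack - 1
    sbk = offset // RACKS_PER_SUPER_BLOCK                                  # (rack - 1) DIV 16
    block = (offset % RACKS_PER_SUPER_BLOCK) // RACKS_PER_BUILDING_BLOCK   # 0 to 3
    slot = offset % RACKS_PER_BUILDING_BLOCK                               # (rack - 1) MOD 4
    return sbk, block, slot

print(locate_rack(1))   # (0, 0, 0)
print(locate_rack(16))  # (0, 3, 3)
print(locate_rack(17))  # (1, 0, 0)
```

Racks 1 through 16 all return super block index 0, and rack 17 starts the next super block.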

Accessories

Accessories are third-party devices that connect to the system controller network. On the Altix UV 1000 chassis CMC, these accessories are connected to the EXT0, EXT1, and EXT2 jacks. On the Altix UV 100 chassis CMC, these accessories are connected to the ACC jack. If there are more accessories than available jacks, an external switch can be used.

Currently, two accessories are supported, as follows:

  • Magma PCIE Expansion chassis

  • Eaton ePDU

Accessories connected to the CMC's accessory jacks should be configured to get their IP address via Dynamic Host Configuration Protocol (DHCP). The DHCP server on the CMC assigns IP addresses in the 10.<rack>.<upos>.100 to 10.<rack>.<upos>.199 range, where rack and upos are the rack and u-position of the CMC. This is the CMC's VACC VLAN.
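To make the address scheme concrete, here is a minimal Python sketch that derives the accessory DHCP range for a given CMC and tests whether an address falls inside it. The function names are hypothetical, and the sketch assumes that both end addresses are included in the range and that rack and u-position each fit in one octet.

```python
import ipaddress

def vacc_dhcp_range(rack: int, upos: int):
    """Return the first and last accessory DHCP addresses for the CMC at rack/upos."""
    for name, value in (("rack", rack), ("upos", upos)):
        if not 0 <= value <= 255:
            raise ValueError(f"{name} must fit in one IPv4 octet, got {value}")
    first = ipaddress.IPv4Address(f"10.{rack}.{upos}.100")
    last = ipaddress.IPv4Address(f"10.{rack}.{upos}.199")
    return first, last

def in_vacc_range(address: str, rack: int, upos: int) -> bool:
    """Check whether address lies in the accessory range (ends assumed inclusive)."""
    first, last = vacc_dhcp_range(rack, upos)
    return first <= ipaddress.IPv4Address(address) <= last

print(in_vacc_range("10.1.1.101", rack=1, upos=1))  # True
print(in_vacc_range("10.1.1.200", rack=1, upos=1))  # False
```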

The CMC searches for accessories and, using Simple Network Management Protocol (SNMP), attempts to determine the accessory type. Specifically, the CMC queries the SNMP sysName.0 object identifier (OID) and looks for the following values:

  • "Monitored ePDU" - assumed to be an Eaton ePDU

  • "Magma Chassis" - assumed to be a Magma PCIE expansion chassis
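The detection step amounts to a lookup on the sysName.0 value. The Python sketch below illustrates that mapping only; it does not perform any SNMP query, the function name is hypothetical, and it assumes an exact match after trimming surrounding whitespace, which the manual does not specify.

```python
# sysName.0 values the CMC looks for, mapped to the accessory type it assumes.
KNOWN_SYSNAMES = {
    "Monitored ePDU": "Eaton ePDU",
    "Magma Chassis": "Magma PCIE expansion chassis",
}

def classify_accessory(sys_name: str):
    """Return the assumed accessory type for a sysName.0 value, or None if unrecognized."""
    # Assumption: exact match after stripping whitespace.
    return KNOWN_SYSNAMES.get(sys_name.strip())

print(classify_accessory("Monitored ePDU"))  # Eaton ePDU
print(classify_accessory("Magma Chassis"))   # Magma PCIE expansion chassis
print(classify_accessory("Some Switch"))     # None
```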

A physical location can also be configured into each accessory using the SNMP sysLocation.0 OID. Accessories must be configured one at a time, as follows:

  • Connect a cable from the accessory's SNMP port to an open accessory jack on a CMC.

  • Run the CMC config -v command. The accessory initially appears with an undefined location, as follows:

    uv14-cmc CMC:r1i1c>  config -v
      CMCs:            1
          r001i01c UV1000
      BMCs:            4
          r001i01b00 IP93-BASEIO    
          r001i01b01 IP93-DISK      
          r001i01b02 IP93           
          r001i01b03 IP93           
      Partitions:      1
          partition000 BMCs:    4
      Accessories:     1
          undefined      10.1.1.101 (Magma PCIE Expansion)
    

  • Use the CMC config --acc command to set a location, as follows:

    uv14-cmc CMC:r1i1c> config --acc 10.1.1.101@1.30
      ==== r001i01c (PRI) ====
      10.1.1.101 (Magma Chassis) configured as r001u30io
    

    Verify using the config -v command, as follows:

    uv14-cmc CMC:r1i1c>  config -v
      CMCs:            1
          r001i01c UV1000
      BMCs:            4
          r001i01b00 IP93-BASEIO    
          r001i01b01 IP93-DISK      
          r001i01b02 IP93           
          r001i01b03 IP93           
      Partitions:      1
          partition000 BMCs:    4
      Accessories:     1
          r001u30io        10.1.1.101 (Magma PCIE Expansion)
    

The location description formats of the two types of accessories currently supported are as follows:

  • Smart PDUs (Eaton ePDU): location description format is r<rack>pdu[0|1]

  • I/O (Magma PCIE Expansion): location description format is r<rack>u<uposition>io
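The two formats can be built and parsed with a short Python sketch. The function names are hypothetical. The three-digit rack and two-digit u-position padding is inferred from the r001u30io example shown earlier, and applying the same rack padding to PDU locations is an assumption.

```python
import re

def pdu_location(rack: int, index: int) -> str:
    """Build an r<rack>pdu[0|1] location string."""
    if index not in (0, 1):
        raise ValueError("PDU index must be 0 or 1")
    return f"r{rack:03d}pdu{index}"  # rack padding is an assumption

def io_location(rack: int, upos: int) -> str:
    """Build an r<rack>u<uposition>io location string."""
    return f"r{rack:03d}u{upos:02d}io"  # padding inferred from r001u30io

_LOCATION_RE = re.compile(r"^r(?P<rack>\d+)(?:pdu(?P<pdu>[01])|u(?P<upos>\d+)io)$")

def parse_location(text: str) -> dict:
    """Split a location string into its accessory type and numeric fields."""
    match = _LOCATION_RE.match(text)
    if match is None:
        raise ValueError(f"not a recognized location description: {text!r}")
    rack = int(match["rack"])
    if match["pdu"] is not None:
        return {"type": "pdu", "rack": rack, "index": int(match["pdu"])}
    return {"type": "io", "rack": rack, "upos": int(match["upos"])}

print(io_location(1, 30))           # r001u30io
print(parse_location("r001u30io"))  # {'type': 'io', 'rack': 1, 'upos': 30}
print(parse_location("r001pdu0"))   # {'type': 'pdu', 'rack': 1, 'index': 0}
```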

Power control affects only I/O type accessories. When the IRU chassis power is turned on, off, or cycled, the same operation is applied to the I/O accessories.

The CMC power on|off|cycle commands will accept a noio option to exclude the I/O accessories from the power operation.

Altix UV System Controller Software

The controller is designed to manage and monitor the individual blades in SGI Altix UV systems. Depending on your system configuration, you can monitor and operate the system from the system management node (SMN) or, on smaller systems such as the Altix UV 100, from the CMC itself. UV 1000 systems of up to 16 racks (four building blocks, also called one super block) can also be controlled and monitored from a CMC in the system.

The following list summarizes the control and monitoring functions that the CMC performs. Many of the controller functions are common across both IRUs and routers; however, some functions are specific to the type of enclosure.

  • Controls and monitors IRU and router fan speeds

  • Reads system identification (ID) PROMs

  • Monitors voltage levels and reports failures

  • Monitors and controls warning LEDs on the enclosure

  • Provides the ability to create multiple system partitions (each a single system image) running their own operating systems

  • Provides the ability to flash the system BIOS