This chapter describes environment variables that can be used to specify options to the global reference unit (GRU) driver and GRU libraries. For a description of the GRU, see Chapter 1, “Altix UV GRU Direct Access API”.
If an instruction references a virtual address that is not in the GRU translation lookaside buffer (TLB), a TLB miss occurs. TLB misses can be handled in one of two ways:
user_polling
TLB dropins are done as a side effect of users calling gru_wait or gru_check_status on the coherence buffer request (CBR).
interrupt
The GRU sends an interrupt to the CPU. The TLB dropin is done in the GRU interrupt handler.
The default mode is "interrupt", although you can override this default with an option on the gru_create_context() request. The GRU_TLBMISS_MODE environment variable overrides both the default and the gru_create_context() option, as follows:
setenv GRU_TLBMISS_MODE [interrupt|user_polling]
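The setenv commands in this chapter use csh syntax; in sh-compatible shells such as bash, the equivalent is export. The following is a minimal sketch of a wrapper, set_tlbmiss_mode (a hypothetical name, not part of the GRU software), that rejects anything other than the two documented modes before exporting the variable:

```sh
# Hypothetical helper: export GRU_TLBMISS_MODE only if the value is one of
# the two TLB miss modes documented for the GRU library.
set_tlbmiss_mode() {
    case "$1" in
        interrupt|user_polling)
            GRU_TLBMISS_MODE=$1
            export GRU_TLBMISS_MODE
            ;;
        *)
            echo "set_tlbmiss_mode: invalid mode '$1'" >&2
            return 1
            ;;
    esac
}
```

For example, set_tlbmiss_mode user_polling && ./myapp runs a GRU program (./myapp is a placeholder) with GRU_TLBMISS_MODE=user_polling in its environment.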
The GRU execution unit timeslices across all active instructions. By default, the GRU issues four NUMAlink get/put messages for an active instruction, then switches to the next active instruction. You can override the default with GRU_CCH_REQUEST_SLICE, as follows:
setenv GRU_CCH_REQUEST_SLICE [0|1|2|3]
0 - issue 4 requests
1 - issue 8 requests
2 - issue 16 requests
3 - not sliced; all requests are issued
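The four settings follow a simple pattern: values 0 through 2 double the per-slice request count starting at 4 (4 * 2^value), and 3 turns slicing off. A small sketch that encodes this table; slice_requests is a hypothetical function name:

```sh
# Hypothetical helper: print how many NUMAlink get/put requests the GRU
# issues per timeslice for a given GRU_CCH_REQUEST_SLICE value.
# Values 0-2 map to 4 << value (4, 8, 16); value 3 disables slicing.
slice_requests() {
    case "$1" in
        0|1|2) echo $(( 4 << $1 )) ;;
        3)     echo "all" ;;
        *)     echo "slice_requests: invalid value '$1'" >&2; return 1 ;;
    esac
}
```

For example, for v in 0 1 2 3; do echo "$v: $(slice_requests $v)"; done prints the table above.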
The GRU driver can be configured to do anticipatory TLB dropins for GRU BCOPY instructions that take a TLB miss. When a BCOPY instruction takes a TLB miss, the GRU driver drops in multiple TLB entries. To configure the GRU driver to do anticipatory TLB dropins, perform the following:
setenv GRU_EXCEPTION_RETRY <num>
<num> - number of consecutive retries before returning an error
You can collect statistics on a task's use of GRU contexts by specifying a statistics file with this option, as follows:
setenv GRU_STATISTICS_FILE <filename>
Whenever a task exits or a GRU context is destroyed, statistics are written to this file. A sample file follows:
Pid: 23020 Mon Oct 19 20:46:56 2009
Command: ./sgup2
CBRs: 4
DSRs: 24576 bytes
Gseg vaddr: 0x7fe3a1e80000
46740 instructions
23 instruction_wait
0 exceptions
9903 FMM tlb dropin
1 UPM tlb dropin
1040 context stolen
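Each counter line in the sample pairs a count with a name that may contain spaces (for example, FMM tlb dropin). A minimal sketch for pulling one counter out of such a file, assuming every counter line has the form <count> <name> as shown above; gru_stat is a hypothetical helper. If the file holds several records, it prints one count per record:

```sh
# Hypothetical helper: print the count for one named counter in a file
# written via GRU_STATISTICS_FILE. Lines whose first field is not a plain
# number (Pid:, Command:, CBRs:, ...) are ignored.
gru_stat() {
    awk -v name="$2" '
        $1 ~ /^[0-9]+$/ {
            count = $1
            $1 = ""              # drop the count field
            sub(/^ +/, "")       # remove the separator left before the name
            if ($0 == name) { print count; found = 1 }
        }
        END { exit (found ? 0 : 1) }   # nonzero status if the counter is absent
    ' "$1"
}
```

For example, gru_stat stats.out "FMM tlb dropin" prints 9903 for the sample above (stats.out is a placeholder file name).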
You can collect a detailed trace of GRU instructions, written to a file whose name you specify with the GRU_TRACE_FILE option. There are three levels of tracing, as follows:
All GRU instructions
GRU instructions that return error EXCEPTIONS to users
GRU instructions that fail and are automatically retried
To specify the file that receives the trace information, use the following:
setenv GRU_TRACE_FILE <filename>
Setting this option enables tracing of every GRU instruction, as follows:
setenv GRU_TRACE_INSTRUCTIONS
This option enables tracing of GRU instructions that cause exceptions. Note that some exceptions for GRU MESQ instructions are handled automatically by the GRU mesq library routines. These exceptions are not traced if <val> is 1 (or not specified). If you want to see these exceptions (mesq_full, amo_nacked, and so on), set <val> to 2.
This option enables tracing of GRU instructions that fail due to transient errors. The GRU library routines normally retry the instruction, so the failure is hidden from the user. If you want to see failures that are retried successfully, enable this option, as follows:
setenv GRU_TRACE_INSTRUCTION_RETRY
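In csh, setenv NAME with no value sets NAME to the empty string, so in sh-compatible shells the same settings are empty assignments. The sketch below (run_with_gru_trace is a hypothetical wrapper name) sets the trace file plus the every-instruction and retry tracing variables for a single command only, so they do not linger in the interactive shell:

```sh
# Hypothetical wrapper: run one command with GRU instruction tracing enabled.
# Usage: run_with_gru_trace <tracefile> <command> [args...]
run_with_gru_trace() {
    _gru_trace_file=$1
    shift
    # Assignments placed before a command apply only to that command's
    # environment; the empty values mirror csh "setenv NAME" with no value.
    GRU_TRACE_FILE=$_gru_trace_file \
    GRU_TRACE_INSTRUCTIONS= \
    GRU_TRACE_INSTRUCTION_RETRY= \
        "$@"
}
```

For example, run_with_gru_trace /tmp/gru.trace ./myapp traces the placeholder program ./myapp into /tmp/gru.trace.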
A sample trace file follows:
Pid: 25276 - gru_wait
opc: NOP, xtype: BYTE, ima: ImmResp
istatus: IDLE
Pid: 25276 - gru_wait
opc: VLOAD, xtype: DWORD, ima: DelResp, baddr0: 0x604450, tri0: 0x0, nelem: 0x1, stride: 0x1
istatus: IDLE
Pid: 25276 - gru_wait
opc: VSTORE, xtype: DWORD, ima: DelResp, baddr0: 0x604450, tri0: 0x0, nelem: 0x1, stride: 0x1
istatus: IDLE
Pid: 25276 - gru_wait
opc: IVLOAD, xtype: DWORD, ima: DelResp, baddr0: 0x0, tri0: 0x0, tri1: 0x40, nelem: 0x1
istatus: IDLE
Pid: 25276 - gru_wait
opc: IVSTORE, xtype: DWORD, ima: DelResp, baddr0: 0x0, tri0: 0x0, tri1: 0x40, nelem: 0x1
istatus: IDLE
Pid: 25276 - gru_wait
opc: VSET, xtype: DWORD, ima: DelResp, baddr0: 0x604450, value: 0x483966aa127ded1d, nelem: 0x1, stride: 0x1
istatus: IDLE
Pid: 25284, Tid: 25289 - gru_wait
opc: MESQ, xtype: CACHELINE, ima: DelResp, baddr0: 0x606000, tri0: 0x0, nelem: 0x1
istatus: EXCEPTION, isubstatus: QLIMIT, avalue: 0f0000000f
execstatus: EXCEPTION
state: 0x1, exceptdet0: 0x606000, exceptdet1: 0x8
Pid: 25284, Tid: 25288 - gru_wait
opc: MESQ, xtype: CACHELINE, ima: DelResp, baddr0: 0x606000, tri0: 0x0, nelem: 0x1
istatus: EXCEPTION, isubstatus: AMO_NACKED, avalue: 00
execstatus: EXCEPTION
state: 0x1, exceptdet0: 0x606000, exceptdet1: 0x8
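In the sample, each traced instruction has an opc: line followed by an istatus: line, and exceptions appear as istatus: EXCEPTION with an isubstatus field. A sketch that tallies exceptions by opcode and substatus, assuming that layout holds for the whole trace file (gru_trace_exceptions is a hypothetical name):

```sh
# Hypothetical helper: count traced instructions that ended in an
# exception, grouped by opcode and isubstatus, sorted by text.
gru_trace_exceptions() {
    awk '
        $1 == "opc:" { opc = $2; sub(/,$/, "", opc) }   # remember the opcode
        $1 == "istatus:" && $2 ~ /^EXCEPTION,?$/ {
            why = "unknown"
            for (i = 3; i < NF; i++)
                if ($i == "isubstatus:") { why = $(i + 1); sub(/,$/, "", why) }
            count[opc " " why]++
        }
        END { for (k in count) print count[k], k }
    ' "$1" | LC_ALL=C sort
}
```

For the two MESQ records above, it prints 1 MESQ AMO_NACKED and 1 MESQ QLIMIT.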
The /proc/sgi_uv/gru directory contains several files that have information about GRU state, as follows:
gru_options
Bit-field that can be used to enable or disable options
cch_status
List of tasks using GRU contexts
gru_status
List of available GRU resources
statistics
Detailed GRU driver statistics (if enabled)
mcs_status
Timing information for kernel GRU commands
Some examples of the files in /proc/sgi_uv/gru follow.
Example 2-1. gru_status - Available Resources
The file shows the busy and free resources in each GRU chiplet, as follows:
% cat gru_status
# gid  nid   ctx   cbr    dsr   ctx   cbr    dsr
#           busy  busy   busy  free  free   free
    0    0     8    36  32768     8    92      0
    1    0     1     4   4096    15   124  28672
    2    1     7    56  28672     9    72   4096
    3    1     7    28  28672     9   100   4096
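The busy and free columns can be totaled across chiplets with awk. A minimal sketch, assuming the column order shown in the header (gid, nid, then busy ctx/cbr/dsr, then free ctx/cbr/dsr); gru_free_totals is a hypothetical name and takes an optional path so it can also read a saved copy of the file:

```sh
# Hypothetical helper: total the free contexts, CBRs, and DSR values across
# all chiplets listed in gru_status. Header lines start with "#".
gru_free_totals() {
    awk '
        $1 !~ /^#/ && NF >= 8 { ctx += $6; cbr += $7; dsr += $8 }
        END { printf "free ctx=%d cbr=%d dsr=%d\n", ctx, cbr, dsr }
    ' "${1:-/proc/sgi_uv/gru/gru_status}"
}
```

For the sample output above, it prints free ctx=41 cbr=388 dsr=36864.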
Example 2-2. gru_options - Enable or Disable Driver Features
Various GRU options (mostly for debugging) can be enabled or disabled by writing values to the /proc/sgi_uv/gru/gru_options file. Use the cat command to view the current settings or a description of the various options.
% cat debug_options
# bitmask: 1=trace, 2=statistics, 0x10=No_4k_dsr_AU_war
# bitmask: 0x20=no_iabort_war, 0x40=no_chiplet_affinity
# bitmask: 0x80=no_tlb_war, 0x100=no_mesq_war
0x0001 - enable VERY verbose driver trace information to /var/log/messages
0x0002 - enable statistics (they are not free)
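The legend printed by the file assigns one bit to each option. A sketch that turns a numeric mask back into option names using that legend (gru_options_decode is a hypothetical helper; the option names are copied verbatim from the legend):

```sh
# Hypothetical helper: list the option names whose bits are set in a
# gru_options mask. Accepts decimal or 0x-prefixed hexadecimal input.
gru_options_decode() {
    mask=$(( $1 ))
    names=""
    for pair in 0x1:trace 0x2:statistics 0x10:No_4k_dsr_AU_war \
                0x20:no_iabort_war 0x40:no_chiplet_affinity \
                0x80:no_tlb_war 0x100:no_mesq_war; do
        bit=${pair%%:*}                      # part before the colon
        if [ $(( $mask & $bit )) -ne 0 ]; then
            names="$names ${pair#*:}"        # part after the colon
        fi
    done
    printf '%s\n' "${names# }"
}
```

For example, gru_options_decode 3 prints trace statistics.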
Example 2-3. statistics - Very Detailed Driver Statistics
You can collect detailed driver statistics, as follows:
% echo 2 > /proc/sgi_uv/gru/gru_options
With this option enabled, detailed statistics are collected in numerous places in the driver. This collection adds system overhead, especially on large systems.
% cat /proc/sgi_uv/gru/statistics
45806 vdata_alloc
45771 vdata_free
195712 gts_alloc
195668 gts_free
34351 gms_alloc
34333 gms_free
149398 gts_double_allocate
... (lots more)
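The counters appear to be running totals, so comparing snapshots taken before and after a run can isolate that run's activity. A minimal sketch that compares two saved copies of the statistics file, assuming each line has the form <count> <name> as shown above (gru_stats_delta is a hypothetical name):

```sh
# Hypothetical helper: print "<delta> <name>" for every counter whose value
# differs between two saved copies of /proc/sgi_uv/gru/statistics.
# Counters missing from the first snapshot are skipped.
gru_stats_delta() {
    awk '
        NR == FNR { before[$2] = $1; next }    # first file: remember values
        ($2 in before) && $1 != before[$2] { print $1 - before[$2], $2 }
    ' "$1" "$2"
}
```

A typical sequence is cat /proc/sgi_uv/gru/statistics > before, then run the program, then cat /proc/sgi_uv/gru/statistics > after and gru_stats_delta before after (the file names are placeholders).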
You can use the grustats command to view GRU statistics. You will see output similar to the following:
uv15-sys TOTAL GRU STATISTICS SINCE COMMAND START
0 vdata_alloc 0 copy_gpa
0 vdata_open 0 read_gpa
0 vdata_free 0 mesq_receive
0 gts_alloc 0 mesq_receive_none
0 gts_free 0 mesq_send
0 gms_alloc 0 mesq_send_failed
0 gms_free 0 mesq_noop
0 gts_double_allocate 0 mesq_send_unexpected_error
0 assign_context 0 mesq_send_lb_overflow
0 assign_context_failed 0 mesq_send_qlimit_reached
0 free_context 0 mesq_send_amo_nacked
0 load_user_context 0 mesq_send_put_nacked
0 load_kcontext 0 mesq_qf_locked
0 load_kcontext_assign 0 mesq_qf_noop_not_full
0 load_kcontext_steal 0 mesq_qf_switch_head_failed
0 lock_kcontext 0 mesq_qf_unexpected_error
0 unlock_kcontext 0 mesq_noop_unexpected_error
0 get_kcontext_cbr 0 mesq_noop_lb_overflow
0 get_kcontext_cbr_busy 0 mesq_noop_qlimit_reached
0 lock_async_resource 0 mesq_noop_amo_nacked
0 unlock_async_resource 0 mesq_noop_put_nacked
0 steal_user_context 0 mesq_noop_page_overflow
0 steal_kernel_context 0 implicit_abort
0 steal_context_failed 0 implicit_abort_retried
... and much more
To display a usage statement while the grustats command is running, enter the letter h. A usage statement appears, as follows:
Intstats help:
h - help (this screen)
q - quit
r - reset command-start statistics
t or <TAB> - toggle between total and incremental mode
CTL-L - redraw screen
CR - to return to display