Posted by jkula on Sep 02, 2017 in Cray-2, Marketing Material
CRAY-2 Architecture and Design
In addition to the cooling technology, the CRAY-2's extremely high processing rates were achieved by a balanced integration of scalar and vector capabilities and a large Common Memory in a multiprocessing environment.
The significant architectural components of the CRAY-2 Computer System included four identical Background Processors, 256 million 64-bit words of Common Memory, a Foreground Processor and a maintenance control console.
Each of the four identical Background Processors contained registers and functional units to perform both vector and scalar operations. The single Foreground Processor supervised the four Background Processors, while the large Common Memory complemented the processors and provided architectural balance, thus assuring extremely high throughput rates.
Onsite maintenance was possible via the maintenance control console.
Background Processors
Each Background Processor consisted of a computation section, a control section, and a high-speed Local Memory. The computation section performed arithmetic and logical calculations. These operations and the other functions of a Background Processor were coordinated through the control section. Local Memory was used to temporarily store scalar and vector data during computations. Each Local Memory held 16,384 64-bit words.
Control and data paths for one Background Processor are shown in the block diagram below.
Computation Section
The computation section contained registers and functional units that operated together to execute a program of instructions stored in memory.
Computation Section Characteristics
Two's complement integer and signed-magnitude floating-point arithmetic
Address and Arithmetic Registers
- Eight 32-bit address (A) registers
- Eight 64-bit scalar (S) registers
- Eight 64-element vector (V) registers; 64 bits per element
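Because each V register holds 64 elements, an array longer than 64 elements has to be processed in register-sized segments. The sketch below illustrates that segmentation in Python. The names `vector_segments` and `segmented_add` are hypothetical, and the code only models the register length listed above; it is not CRAY-2 code.

```python
VECTOR_LENGTH = 64  # elements per V register, from the register list above


def vector_segments(n, vl=VECTOR_LENGTH):
    """Yield (start, length) pairs that cover n elements in register-sized pieces."""
    start = 0
    while start < n:
        length = min(vl, n - start)
        yield start, length
        start += length


def segmented_add(a, b):
    """Add two equal-length lists one register-sized segment at a time."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    result = []
    for start, length in vector_segments(len(a)):
        # Each slice stands in for one load into a V register.
        va = a[start:start + length]
        vb = b[start:start + length]
        result.extend(x + y for x, y in zip(va, vb))
    return result
```

A 150-element operand, for example, splits into two full 64-element segments and one 22-element remainder.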
Address Functional Units
- Add/Subtract
- Multiply
Scalar Functional Units
- Add/Subtract
- Shift
- Logical
- Population/Parity
- Leading Zero Count
Vector Functional Units
- Logical
- Integer
- Shift
- Add/Subtract
- Population/Parity
- Leading Zero Count
- Compressed IOTA
Floating-point Functional Units
- Add/Subtract
- Multiply/Reciprocal/Square Root
- Scatter and Gather vector operations to and from Common Memory
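Several of the operations named in these lists are easy to state in software terms. The Python sketch below models population count, parity, leading zero count, and gather/scatter on 64-bit words. The source does not define Compressed IOTA, so the `compressed_iota` function here rests on an assumption: that it produces the indices of the set entries in a mask. All function names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # operands are treated as 64-bit words


def population_count(word):
    """Number of 1 bits in a 64-bit word."""
    return bin(word & MASK64).count("1")


def parity(word):
    """1 if the word has an odd number of 1 bits, else 0."""
    return population_count(word) & 1


def leading_zero_count(word):
    """Number of 0 bits above the highest 1 bit; 64 for a zero word."""
    return 64 - (word & MASK64).bit_length()


def compressed_iota(mask_bits):
    """Indices of the true entries in a mask (assumed meaning, see lead-in)."""
    return [i for i, bit in enumerate(mask_bits) if bit]


def gather(memory, indices):
    """Read memory[i] for each index into a dense vector."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values):
    """Write each value back to memory at its matching index."""
    for i, v in zip(indices, values):
        memory[i] = v
```

Under that assumption, an index vector from `compressed_iota` can feed `gather` directly, which is one way sparse data could be packed into a dense vector before arithmetic and unpacked afterwards with `scatter`.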
Local Memory
Each Background Processor contained 16,384 64-bit words of Local Memory. Local Memory was treated as a register file to hold scalar operands during computation. It could also be used for temporary storage of vector segments where these segments were used more than once in a computation in the vector registers. The access time for Local Memory was four clock periods, and accesses could overlap accesses to Common Memory. This Local Memory replaced the B and T registers on the CRAY-1 and was readily available for user jobs. One application was for small matrices.
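To give a feel for what "small matrices" means here, the following Python sketch works out the capacity of one Local Memory from the figures in the text. The helper name `fits_in_local_memory` is hypothetical, and the calculation assumes one 64-bit word per matrix element.

```python
LOCAL_MEMORY_WORDS = 16_384  # 64-bit words per Background Processor
WORD_BYTES = 8               # 64 bits
VECTOR_LENGTH = 64           # elements per V register


def fits_in_local_memory(rows, cols, words=LOCAL_MEMORY_WORDS):
    """True if a rows x cols matrix of 64-bit words fits in one Local Memory."""
    return rows * cols <= words


local_memory_bytes = LOCAL_MEMORY_WORDS * WORD_BYTES        # 131072 bytes (128 KiB)
full_vector_segments = LOCAL_MEMORY_WORDS // VECTOR_LENGTH  # 256 segments
```

On these numbers a 128 x 128 matrix fills Local Memory exactly, and the memory can hold up to 256 full 64-element vector segments.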
Local Memory Characteristics
- 16,384 64-bit words
- Holds scalar and vector operands during computation
- Temporary storage of vector segments
- Four-clock-period access time; accesses can overlap Common Memory accesses
- Replaces CRAY-1 B and T registers
Control Section
Each Background Processor contained an identical independent control section of registers and instruction buffers for instruction issue and control. Each Background Processor had a 64-bit real-time clock. These clocks and the Foreground Processor real-time clock were synchronized at system start-up and were advanced by one count in each clock period.
Control Section Characteristics
- Eight instruction buffers, each holding 64 16-bit instruction parcels
- 128 basic instruction codes
- 32-bit Program Address register
- 32-bit Base Address register
- 32-bit Limit Address register
- 64-bit real-time clock
- Eight Semaphore flags to provide interlocks for Common Memory access
- 32-bit Status register
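The figures in this list can be combined into a couple of quick checks. The Python sketch below computes the total instruction buffer capacity and shows a generic base-and-limit address check. The source lists the Base Address and Limit Address registers but does not describe their semantics, so the `relocate` function is an assumption: it adds the base to a program-relative address and treats the limit as an exclusive absolute upper bound.

```python
PARCEL_BITS = 16
PARCELS_PER_BUFFER = 64
INSTRUCTION_BUFFERS = 8
ADDRESS_MASK = (1 << 32) - 1  # Base, Limit and Program Address registers are 32 bits

total_parcels = INSTRUCTION_BUFFERS * PARCELS_PER_BUFFER  # 512 parcels
total_buffer_bits = total_parcels * PARCEL_BITS           # 8192 bits
buffer_words = total_buffer_bits // 64                    # 128 64-bit words


def relocate(relative_address, base, limit):
    """Map a program-relative address to an absolute one, or raise on a range error.

    Assumption: absolute = base + relative, valid while base <= absolute < limit.
    """
    absolute = (base + relative_address) & ADDRESS_MASK
    if absolute < base or absolute >= limit:
        raise IndexError("address outside the job's memory field")
    return absolute
```

With `base=1000` and `limit=2000`, relative address 10 maps to absolute address 1010, while relative address 1000 falls outside the field and is rejected.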
Background Processor Intercommunication
Synchronization of two or more Background Processors cooperating on a single job was achieved through one of the eight Semaphore flags shared by the Background Processors. These flags were one-bit registers that provided interlocks for common access to shared memory fields. A Background Processor was assigned access to one Semaphore flag by a field in the Status register. The Background Processor had instructions to test and branch on, set, and clear a Semaphore flag.
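The interlock described above can be modeled in software. The Python sketch below uses threads as stand-ins for Background Processors and a one-bit flag to guard a shared counter. It assumes that testing and setting the flag happen as one atomic step, which the source does not state; a `threading.Lock` stands in for that hardware atomicity. The class and function names are hypothetical.

```python
import threading
import time


class SemaphoreFlag:
    """One-bit flag with an atomic test-and-set, modeled in software."""

    def __init__(self):
        self._bit = 0
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def test_and_set(self):
        """Set the flag and return its previous value (0 means we acquired it)."""
        with self._guard:
            previous = self._bit
            self._bit = 1
            return previous

    def clear(self):
        """Release the flag so another worker can acquire it."""
        with self._guard:
            self._bit = 0


def run_workers(n_workers=4, increments=1000):
    """Have n_workers threads update one shared counter under a SemaphoreFlag."""
    flag = SemaphoreFlag()
    shared = {"count": 0}

    def worker():
        for _ in range(increments):
            while flag.test_and_set():  # test and branch: retry while already set
                time.sleep(0)           # yield so the holder can finish
            shared["count"] += 1        # critical section on the shared field
            flag.clear()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]
```

Because every update happens between acquiring and clearing the flag, `run_workers(4, 1000)` returns 4000 with no lost updates.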
Common Memory
One of the primary technological advantages of the CRAY-2 Computer System was its extremely large, directly addressable Common Memory. Featuring 268,435,456 words, this Common Memory was significantly larger than that offered by any other commercially available computer system. It allowed the individual user to run programs that would be impossible to run on any other system. It also enhanced multiprogramming by allowing an exponential increase in the number of jobs that could reside concurrently in memory (that is, that could be multiprogrammed).
Common Memory was arranged in four quadrants of 32 banks each, for a total of 128 banks. A word of memory consisted of 64 data bits and 8 error correction bits (SECDED). This memory was shared by the Foreground Processor, Background Processors, and peripheral equipment controllers. Each bank of memory had an independent data path to each of the four Common Memory ports. Each bidirectional Common Memory port connected to a Background Processor and a foreground communications channel. Total memory bandwidth was 64 gigabits per second, or 1 billion words per second.
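The figures in this section are mutually consistent, as the short Python check below shows. It derives the bank count, words per bank, stored bits per word, and bandwidth in bits from the numbers quoted in the text. The `bank_of` function is hypothetical: the source does not say how addresses map to banks, so a simple low-order interleave is assumed for illustration only.

```python
QUADRANTS = 4
BANKS_PER_QUADRANT = 32
TOTAL_WORDS = 268_435_456         # 2**28 words of Common Memory
DATA_BITS = 64
CHECK_BITS = 8                    # SECDED bits stored with each word
WORDS_PER_SECOND = 1_000_000_000  # quoted total bandwidth in words

banks = QUADRANTS * BANKS_PER_QUADRANT         # 128 banks
words_per_bank = TOTAL_WORDS // banks          # 2097152 words
stored_bits_per_word = DATA_BITS + CHECK_BITS  # 72 bits
bandwidth_bits = WORDS_PER_SECOND * DATA_BITS  # 64 gigabits per second


def bank_of(address, n_banks=banks):
    """Assumed low-order interleave: consecutive words fall in consecutive banks."""
    return address % n_banks
```

Note that "256 million words" is 256 x 2^20 = 268,435,456 words, and "2 million words per bank" is 2 x 2^20 = 2,097,152 words.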
Common Memory Characteristics
- 256 million words
- 64 data bits, 8 error correction bits per word
- 128 banks; 2 million words per bank
- Dynamic MOS memory technology
Foreground Processor and I/O Section
The Foreground Processor supervised overall system activity among the Foreground Processor, Background Processors, Common Memory and peripheral controllers. System communication occurred through four high-speed synchronous data channels.
Firmware control programs for normal system operation and a set of diagnostic routines for system maintenance were integral to the Foreground Processor.
Control circuitry for external devices was also located within the CRAY-2 mainframe.
Foreground Communication Channels
The Foreground Processor was connected to four 4-gigabit communication channels. These channels linked the Background Processors, Foreground Processor, peripheral controllers and Common Memory. Each channel connected one Background Processor, one group of peripheral controllers, one Common Memory port, and the Foreground Processor. Data traffic traveled directly between controllers and Common Memory.