ti asc

8/19/2019 TI ASC


TI ASC

TI ASC stands for Advanced Scientific Computer, a machine designed by Texas Instruments. It was capable of up to 80 million floating-point computations per second using a 50-nanosecond clock. This pipelined vector processor demonstrated the viability of the design and set the standard for subsequent vector processors. The main feature of the TI ASC is its memory structure: it has a single high-speed shared memory that is accessed by a number of processors. The major subsystems of the TI ASC were the central memory, the central processor, the peripheral processor, on-line bulk storage, a digital communications interface, and a selection of standard peripherals.

The system was built so that only the memory control unit (MCU) can access the memory, and it could support up to eight processors. The peripheral processor was designed to execute the operating system. The central processor was designed expressly to provide high computing power for large arrays of data, and it operates as a slave to the peripheral processor. This design approach was chosen to maximize the overlap of system overhead tasks with the execution of user programs. In operation, the job stream is analyzed by the peripheral processor. The language processors, plus user object code, are executed by the central processor. System control and I/O tasks are processed by the peripheral processor. I/O is routed through high-speed, head-per-track disc storage. A data communications interface for the common carriers is provided to support remote batch and interactive terminals.

Standard types of peripherals are also provided. The central memory (CM) serves as the common communications and storage medium for these subsystems. The MCU is designed to operate asynchronously, independent of cable delays, processor clock rates, and memory unit access and cycle times. This allows a great deal of flexibility to accommodate future improvements in memory or processor technologies. The MCU can handle a maximum data transfer rate of 80 million words per second per port, giving a total transfer capacity of 640 million words per second. A significant capacity beyond today's memory and processor speeds is therefore available in the MCU. The MCU provides the facilities for controlling access from the eight processor ports to a CM having a 24-bit address space (16 million words). A port expander can be used to increase the number of processor ports. The semiconductor high-speed central memory modules have a cycle time of 160 ns and a read time of 140 ns.
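The capacity figures for the memory control unit follow from simple arithmetic: eight ports at 80 million words per second each, and a 24-bit word address. The short Python check below is only a sanity check of those figures; the constant names are illustrative and do not come from any TI documentation.

```python
# Figures taken from the description of the memory control unit.
# The constant names are illustrative only.
PORTS = 8
WORDS_PER_SECOND_PER_PORT = 80_000_000
ADDRESS_BITS = 24

# Aggregate transfer capacity if every port runs at its maximum rate.
total_words_per_second = PORTS * WORDS_PER_SECOND_PER_PORT

# Number of distinct words a 24-bit address can select.
addressable_words = 2 ** ADDRESS_BITS

print(f"{total_words_per_second:,} words per second")
print(f"{addressable_words:,} addressable words")
```

The second figure is 16,777,216, which the text rounds to 16 million words.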



All central memory transfers are 256 bits wide (eight 32-bit words), with a Hamming code providing single-bit error correction and double-bit error detection for each 32-bit word. High-speed central memory is typically divided into eight equal-sized modules, which permits eight-way interleaving. A patch board within the MCU controls the memory address decoding and sets the interleaving pattern.

The TI ASC provided a unique design for multifunction arithmetic pipelines with simplified control circuitry. Up to four pipelined arithmetic units can be built into a TI ASC system. The instruction processing unit handles the fetching and decoding of instructions. It contains a large number of working registers and also controls the operation of the memory buffer unit and the arithmetic units.

Central memory management and access control of memory ports are achieved through two facilities: map registers and protect registers. Each user program has its own unique page address map. Page addresses not required by the program are mapped into absolute page zero, which is not accessible to the CP. When a program is loaded into memory, it will likely be loaded into discontiguous memory pages. During program execution, program-developed page addresses are converted, without execution-time penalty, to actual page addresses by the map registers. Because a reference to page zero is denied and the relevant processor notified, the map registers provide inter-user memory protection. Figure 3 shows the mapping scheme.

Desired page sizes depend on the amount of central memory and the problem mix of a particular installation. Four different page sizes may be specified for an ASC system, varying from 4K to 256K words. A program may use any one of the available page sizes. In the memory area, capacity, performance, connectivity, protection, and mapping are all variable over wide bounds. The central processor can be tailored to provide a wide range of processing power by using one, two, three, or four pipes. The peripheral processor provides for dynamically matching the execution rates of up to eight independent instruction streams with the task requirements.
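The map-register scheme described above (program pages translated to absolute pages, with unused pages pointing at a forbidden absolute page zero) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ASC hardware logic: the page size of 4096 words, the list-based map, and the names `translate` and `PageZeroFault` are all hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in words (hypothetical choice)


class PageZeroFault(Exception):
    """Raised when a reference resolves to the forbidden absolute page zero."""


def translate(map_registers, program_address, page_size=PAGE_SIZE):
    """Convert a program-developed word address to an absolute word address.

    map_registers[i] holds the absolute page assigned to program page i.
    Program pages the job does not own are mapped to absolute page 0.
    """
    program_page, offset = divmod(program_address, page_size)
    if 0 <= program_page < len(map_registers):
        absolute_page = map_registers[program_page]
    else:
        absolute_page = 0  # outside the map: treat as an unowned page
    if absolute_page == 0:
        # The reference is denied; in the text the processor is notified.
        raise PageZeroFault(f"program page {program_page} is not mapped")
    return absolute_page * page_size + offset


# A job loaded into discontiguous absolute pages 7, 2 and 5.
# Program page 2 is unused, so it points at absolute page zero.
user_map = [7, 2, 0, 5]

print(translate(user_map, 5))               # falls in program page 0
print(translate(user_map, PAGE_SIZE + 10))  # falls in program page 1
try:
    translate(user_map, 2 * PAGE_SIZE)      # falls in unused program page 2
except PageZeroFault as fault:
    print("denied:", fault)
```

Because every job sees only its own map, an address that strays outside the job's pages resolves to page zero and is rejected, which is the inter-user protection the text describes.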



STAR 100

The STAR-100 was a vector supercomputer designed, manufactured, and marketed by Control Data Corporation (CDC). It was one of the first machines to use a vector processor to improve performance on appropriate scientific applications. In general organization, the STAR was similar to CDC's earlier supercomputers, in which a simple CPU was supported by a number of peripheral processors that offloaded housekeeping tasks and allowed the CPU to crunch numbers as quickly as possible. In the STAR, both the CPU and the peripheral processors were deliberately simplified further, to lower the cost and complexity of implementation.

The main innovation in the STAR was the inclusion of instructions for vector processing. These new and more complex instructions approximated what was available to users of the APL programming language and operated on huge vectors stored in consecutive locations in main memory. The CPU was designed to use these instructions to set up additional hardware that fed in data from main memory as quickly as possible. For instance, a program could use a single instruction with a few parameters to add all the elements of two vectors, each of which could be as long as 65,535 elements. The CPU only had to decode a single instruction, set up the memory hardware, and start feeding the data into the math units. As with instruction pipelines in general, the time needed to complete any one instruction was no better than before, but since the CPU was working on a number of instructions at once (or in this case, data points), overall performance improved dramatically due to the assembly-line nature of the task.

The STAR-100 has two pipelines in which arithmetic is performed. The first pipeline contains a floating-point adder and multiplier, whereas the second pipeline is multifunctional and capable of executing all scalar instructions. It also contains a floating-point adder, multiplier, and divider. Both pipelines are 64 bits wide for floating-point operations and are controlled by microcode. The STAR-100 can split its floating-point pipelines into four 32-bit pipelines, doubling the peak performance of the system to 100 MFLOPS at the expense of half the precision.

The limitations on store size have not gone away with the improvements in storage density and access time provided by semiconductor technology, because logic densities and processing speeds have also improved and have led to higher user expectations; transmission delays still account for roughly the same percentage of the total system delay in the CYBER 205 as they did in the original STAR-100. Memory bandwidth is obtained in the STAR-100 by combining eight-way interleaving of memory banks with wide store words (512 bits accessed per store cycle) and the pipelined transmission of each of these superwords to the processor in four sequential groups of 128 bits.
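The assembly-line argument made earlier in this section can be put in numbers with a simple timing model. The sketch below is an idealized illustration, not a model of the actual STAR-100 pipelines: the stage count, the cycle time, and the `startup_cycles` parameter are assumptions chosen only to show the shape of the result.

```python
def sequential_time(n_elements, stages, cycle_ns):
    """Time when each element passes through every stage before the next starts."""
    return n_elements * stages * cycle_ns


def pipelined_time(n_elements, stages, cycle_ns, startup_cycles=0):
    """Time when a new element enters the pipeline on every cycle.

    The first result appears after `stages` cycles (plus any fixed setup
    cost), and one further result appears on each cycle after that.
    """
    if n_elements == 0:
        return 0
    return (startup_cycles + stages + n_elements - 1) * cycle_ns


# Assumed, illustrative values: a 4-stage unit with a 40 ns cycle.
STAGES = 4
CYCLE_NS = 40

for n in (1, 16, 65_535):
    seq = sequential_time(n, STAGES, CYCLE_NS)
    pipe = pipelined_time(n, STAGES, CYCLE_NS)
    print(f"n={n:>6}: sequential {seq:>10} ns, pipelined {pipe:>10} ns, "
          f"speedup {seq / pipe:.2f}x")
```

For a single element the two times are identical, which matches the remark that no individual operation completes faster. For long vectors the speedup approaches the number of stages, and a nonzero `startup_cycles` shows how a fixed setup cost penalizes short vectors.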

In order to handle this data rate in the processor, however, the designers were faced with choices ranging, in the extreme, between using a multiplicity of 6600-like arithmetic units and a single pipelined arithmetic unit capable of executing 32-bit operations at a rate of one every 10 ns. In practice the solution chosen for the STAR-100 uses two pipelined arithmetic units, each of which can act as a single 64-bit or a twin 32-bit unit.

This preliminary study has shown that a new explicit method for solving the transonic small disturbance potential equation on the STAR-100 computer can almost halve the computer time required for this type of computation when compared to successive line over-relaxation on the CDC CYBER 175 computer. These results are limited to a relatively simple problem with a uniform Cartesian grid. Although the speedup is not as great as desired, it is enough to justify further study of this method. The effects of lift and of grid stretching on the convergence rates of the schemes should be investigated. Also, the new explicit scheme should be applied to the full potential equation.

There are several possibilities for obtaining further reductions in computer time. Improvements might be made in the convergence rate of explicit algorithms, or other vectorizable algorithms might be developed. Another possibility is to use programming techniques to get successive line over-relaxation to run as efficiently as possible on the STAR-100 computer. Although that method is semi-implicit, there are portions of it which can be written in vector instructions of short length.
