Adding custom instructions to Simplescalar/GCC architecture
Somasundaram

Agenda
• Motivation
• GCC overall architecture
• Simplescalar architecture
• Adding a custom instruction
• Conclusion

Motivation
• Extensible processors
  • What regular ISA instructions can be combined?
  • Which regular ISA instructions are to be combined into a CFU instruction?
  • Retarget the compiler to produce optimised code with CFU instructions
  • Simulate the extended processor with CFU instructions

GNU Compiler Collection
• Many front-ends
  • C
  • Fortran
  • C++/Java/Ada
• Backend targeted at many processors
  • x86, Alpha, Sparc
  • ARC, ARM, MIPS . . .

GCC Compiler Flow
[Figure: Program → Language Front-end → GIMPLE IR → High-level IR Optimisations → RTL IR → Low-level IR Optimisations → Scheduled assembly code. Machine dependent files [.c, .h, .md] feed the RTL stages.]
• RTL?
• Are we interested in everything?
• Combine small RISC-ISA-like patterns into bigger CISC-ISA-like patterns

GCC – Low Level Optimisation
• Uses Lisp-like RTL as IR
• Tip: use the -da compiler option to get the IR output
• Example:

    (insn 48 47 50 (set (reg/v:SI 36)
            (mult:SI (reg:SI 42)
                (reg:SI 41))) 41 {mulsi3} (nil)
        (nil))

    (call_insn 94 93 97 (parallel[
                (set (reg:SI 0 r0)
                    (call (mem:SI (symbol_ref:SI ("printf")) 0)
                        (const_int 0 [0x0])))
                (clobber (reg:SI 14 lr))
            ] ) -1 (nil)
        (nil)
        (expr_list (use (reg:SI 1 r1))
            (expr_list (use (reg:SI 0 r0))
                (nil))))

GCC - Target Machine Description
• Use a similar language in the md [machine description] file

    (define_insn "mulsi3"
      [(set (match_operand:SI 0 "s_register_operand" "=&r,&r")
            (mult:SI (match_operand:SI 2 "s_register_operand" "r,r")
                     (match_operand:SI 1 "s_register_operand" "%?r,0")))]
      ""
      "mul%?\\t%0, %2, %1"
      [(set_attr "type" "mult")])

GCC Combine Phase
• Combines several standard IR patterns into a single user-defined IR pattern
• User-defined IR patterns are defined in the target.md file
• Operand constraints should be satisfied
• Example: MAC (Multiply-Accumulate)
  • Merge mulsi3 and addsi3 into mulsi3addsi

GCC Combine Phase
• How is it done?
• Let us assume that the following patterns are defined in the machine description

    addsi3        Matches C=A+B      (all 32-bit regs)
    mulsi3        Matches C=A*B      (all 32-bit regs)
    mulsi3addsi   Matches D=A*B+C    (all 32-bit regs)
    mulsi4addsi   Matches E=A*B+C*D  (all 32-bit regs)
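The four patterns above can be pictured as expression-tree shapes. The following is a minimal Python sketch under that assumption: expressions are encoded as toy tuples, and the names `PATTERNS`, `shape` and `recognize` are hypothetical helpers for illustration, not GCC's recognizer.

```python
# Toy model of the pattern table: each pattern is an expression-tree shape.
# REG stands for "any 32-bit register operand". Illustration only; this is
# not how GCC's own pattern recognizer is implemented.
REG = "reg"

PATTERNS = {
    "addsi3":      ("plus", REG, REG),                                # C = A + B
    "mulsi3":      ("mult", REG, REG),                                # C = A * B
    "mulsi3addsi": ("plus", ("mult", REG, REG), REG),                 # D = A*B + C
    "mulsi4addsi": ("plus", ("mult", REG, REG), ("mult", REG, REG)),  # E = A*B + C*D
}


def shape(expr):
    """Replace every leaf (a register name) with REG, keeping the operators."""
    if isinstance(expr, tuple):
        op, *args = expr
        return (op, *[shape(a) for a in args])
    return REG


def recognize(expr):
    """Return the name of the pattern whose shape equals expr's shape, or None."""
    s = shape(expr)
    for name, pattern in PATTERNS.items():
        if pattern == s:
            return name
    return None
```

For instance, `recognize(("plus", ("mult", "a", "b"), "c"))` names `mulsi3addsi`. The sketch compares shapes literally, so it does not treat `plus` as commutative.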

GCC Combine Phase
[Figure: DDG sub-graph. Insn 55 (addsi3) adds the results of insn 48 (mulsi3 of insns 47 and 45, both mem) and insn 53 (mulsi3 of insns 52 and 50, both mem).]
• Assume this DDG sub-graph
• Try to combine 48, 55 and see if a pattern which multiplies two operands and adds a third operand to the result exists
[Figure: after combining 48 and 55. Insn 55 is now mulsi3addsi with inputs 47 and 45 (mem) and 53 (mulsi3 of insns 52 and 50, both mem).]
• Try 55,45: No matching pattern
• Try 55,47: No matching pattern
• Try 55,53: We have a match

GCC Combine phase
[Figure: after combining 55 and 53. Insn 55 is now mulsi4addsi with inputs 47, 45, 52 and 50 (all mem).]
• Try 55,52: No matching pattern
• Try 55,50: No matching pattern
• Try 55,45: No matching pattern
• Try 55,47: No matching pattern
• Try 55,52,50: No matching pattern
• Try 55,52,45: No matching pattern
• Try 55,52,47: No matching pattern
• Try 55,50,45: No matching pattern
• Try 55,50,47: No matching pattern
• Try 55,47,45: No matching pattern
• Cannot try to combine more than 3 patterns! Hence, stop!
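The walk-through above can be replayed with a small simulation. This is a toy Python sketch, not GCC's combiner: insns are stored in a dictionary, a trial inlines one or two producers into the consumer and keeps the result only if its shape matches a known pattern. All function names are hypothetical, and the order in which candidates are tried differs slightly from the slides.

```python
from itertools import combinations

REG = "reg"
PATTERNS = {
    "addsi3":      ("plus", REG, REG),
    "mulsi3":      ("mult", REG, REG),
    "mulsi3addsi": ("plus", ("mult", REG, REG), REG),
    "mulsi4addsi": ("plus", ("mult", REG, REG), ("mult", REG, REG)),
}


def shape(expr):
    """An int leaf means 'the register set by that insn'."""
    if isinstance(expr, tuple):
        return (expr[0], *[shape(a) for a in expr[1:]])
    return REG


def recognize(expr):
    s = shape(expr)
    return next((n for n, p in PATTERNS.items() if p == s), None)


def substitute(expr, producers, insns):
    """Inline the expressions of the chosen producer insns into expr."""
    if isinstance(expr, tuple):
        return (expr[0], *[substitute(a, producers, insns) for a in expr[1:]])
    return insns[expr] if expr in producers else expr


def leaves(expr):
    if isinstance(expr, tuple):
        return [leaf for a in expr[1:] for leaf in leaves(a)]
    return [expr]


def combine(insns, consumer, log):
    """Greedily merge producers into `consumer`, at most 3 insns per trial."""
    changed = True
    while changed:
        changed = False
        inputs = list(dict.fromkeys(leaves(insns[consumer])))
        for k in (1, 2):  # consumer + 1 producer, then consumer + 2 producers
            for group in combinations(inputs, k):
                merged = substitute(insns[consumer], set(group), insns)
                name = recognize(merged)
                log.append((consumer, *group, name))
                if name:
                    insns[consumer] = merged
                    for g in group:
                        del insns[g]  # assumes the producer has no other user
                    changed = True
                    break
            if changed:
                break
    return recognize(insns[consumer])


def example_ddg():
    """The DDG sub-graph from the slides: 55 = 48 + 53, with mem loads below."""
    return {
        45: ("mem",), 47: ("mem",), 50: ("mem",), 52: ("mem",),
        48: ("mult", 47, 45),
        53: ("mult", 52, 50),
        55: ("plus", 48, 53),
    }
```

Calling `combine(example_ddg(), 55, [])` first merges 48 into 55 (mulsi3addsi), then 53 (mulsi4addsi), and then fails on every remaining pair and triple, because a load cannot stand in for a register operand in any of the four patterns.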

GCC Combine phase: Summary
• Can combine up to 3 instructions together
• Can recursively combine more instructions
• Deletes a smaller instruction once combined
• Always works within a single function

Retargeting GCC for CFU
• Build a better Combiner phase
  • Write a new combiner with a better pattern merger which works on inputs from RTL
  • Replace the existing combiner with this combiner
• New patterns for the CFU instruction in the target.md file
• Changes in GAS (included in the binutils package) to generate the instruction word
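To make the last point concrete, here is a minimal Python sketch of what "generating the instruction word" means for a five-register instruction. The bit layout is entirely hypothetical (a made-up class field `0xE` and sub-opcode `0x1`); the slides do not give the real encoding. The only ARM fact relied on is that condition code `0xE` means "always".

```python
# Hypothetical 32-bit layout, for illustration only:
#   [31:28] cond | [27:24] class | [23:20] sub-opcode |
#   [19:16] Rd   | [15:12] Ra    | [11:8]  Rb | [7:4] Rc | [3:0] Re
CFU_CLASS = 0xE   # made-up value
ML2A_OP = 0x1     # made-up value
COND_ALWAYS = 0xE


def encode_cfu(rd, ra, rb, rc, re_, cond=COND_ALWAYS):
    """Pack five 4-bit register numbers into one instruction word."""
    for r in (rd, ra, rb, rc, re_):
        if not 0 <= r <= 15:
            raise ValueError(f"register number out of range: {r}")
    return (
        (cond << 28) | (CFU_CLASS << 24) | (ML2A_OP << 20)
        | (rd << 16) | (ra << 12) | (rb << 8) | (rc << 4) | re_
    )
```

With this layout, `encode_cfu(0, 2, 1, 4, 3)` yields `0xEE102143`. The point of the sketch is only that five register fields plus a condition field still fit in a 32-bit word.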

SimpleScalar is
• Instruction Set simulator
• Profiles programs
• Simulates micro-architectural features
• Different levels of simulation speed vs accuracy trade-off
• Written in C
• Easily retargetable

Simplescalar: CFU issues
• More arguments than used by RISC instructions
  • Out-of-order execution needs to take care of the increase in dependencies
• New instructions in decode tree
  • Easy to add new instructions to the decode tree (machine.def)

Let us add a new instruction
• Achieve the operation E=A*B+C*D using one instruction
• 4 input operands and 1 output operand
• Extension to ARM ISA
• Provide
  • Compiler
  • Assembler
  • Simulator
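The intended behaviour of the new instruction can be written down as a reference model. This is a small Python sketch, assuming all operands are 32-bit registers and the result keeps only the low 32 bits; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def ml2a(a, b, c, d):
    """E = A*B + C*D on 32-bit registers; the result wraps modulo 2**32."""
    return ((a & MASK32) * (b & MASK32) + (c & MASK32) * (d & MASK32)) & MASK32


def mul_then_add(a, b, c, d):
    """The same value computed as three separate RISC-style operations."""
    t1 = (a * b) & MASK32      # first multiply
    t2 = (c * d) & MASK32      # second multiply
    return (t1 + t2) & MASK32  # add
```

A model like this is handy when checking the simulator: the single custom instruction and the three-instruction sequence it replaces should always produce the same register value.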

Pattern for the instruction
• gcc/config/arm/arm.md

    (define_insn "*mulsi4addsi"
      [(set (match_operand:SI 0 "s_register_operand" "=r")
            (plus:SI
              (mult:SI (match_operand:SI 2 "s_register_operand" "r")
                       (match_operand:SI 1 "s_register_operand" "r"))
              (mult:SI (match_operand:SI 4 "s_register_operand" "r")
                       (match_operand:SI 3 "s_register_operand" "r"))))]
      ""
      "ml2a%?\\t%0, %2, %1, %4, %3"
      [(set_attr "type" "mult")])
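To see how the output template turns into an assembly line, here is a simplified Python sketch of the substitution. It assumes `%N` is replaced by operand N and `%?` by the optional ARM condition suffix (empty when the instruction is unconditional); `expand_template` is a hypothetical helper, and GCC's real output routine handles many more escapes.

```python
import re


def expand_template(template, operands, cond=""):
    """Expand a simplified md-style output template.

    %N -> operands[N]; %? -> condition suffix (empty when unconditional).
    """
    def repl(match):
        token = match.group(1)
        return cond if token == "?" else operands[int(token)]

    return re.sub(r"%(\?|\d)", repl, template)


# Register assignment chosen only for the example.
line = expand_template("ml2a%?\t%0, %2, %1, %4, %3",
                       ["r0", "r1", "r2", "r3", "r4"])
```

Here `line` is `"ml2a\tr0, r2, r1, r4, r3"`, which shows that the template lists the operands in the order 0, 2, 1, 4, 3 rather than in numeric order.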

Simplescalar changes
• Instruction Decode Tree
  • Chain of decoders: Each looking at a set of bits
• target-arm/arm.def
  • New chain of decoder macros for CFU class of instructions
  • Increase the number of input dependencies in all the instruction macros from 5 to 6 (predication in ARM)
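The "chain of decoders" idea can be sketched in a few lines of Python. Each node looks at one bit field and either names an instruction or hands over to the next decoder. The field positions and opcode values are hypothetical and do not come from arm.def, which expresses the same structure with C macros.

```python
def bits(word, hi, lo):
    """Extract bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


# Hypothetical decode tree: a first decoder on bits 27..24, and a second
# decoder on bits 23..20 reached only for the made-up "CFU class" value 0xE.
DECODE_TREE = {
    "field": (27, 24),
    "table": {
        0x0: "DATA_PROCESSING",
        0xE: {
            "field": (23, 20),
            "table": {0x1: "ML2A"},
        },
    },
}


def decode(word, node=DECODE_TREE):
    """Follow the chain of decoders until an instruction name is reached."""
    while isinstance(node, dict):
        hi, lo = node["field"]
        node = node["table"].get(bits(word, hi, lo), "UNKNOWN")
    return node
```

Adding a new class of instructions then amounts to adding one more sub-table, which is why extending the decode tree is described as easy.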

Simplescalar changes
• sim-outorder.c
  • Increase the number of input dependencies to be monitored in the reservation unit
  • Both macros and code have to be changed
• Other files need to be changed for the same purpose
• Compile ‘test program’ and verify!
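Why the number of monitored input dependencies matters can be shown with a toy model of one reservation-unit entry. This Python sketch is not sim-outorder.c code: the class name, its methods and the `max_inputs` parameter are hypothetical, with 6 used as the default only because that is the limit the slides move to.

```python
class ReservationEntry:
    """One instruction waiting in the reservation unit (toy model)."""

    def __init__(self, op, sources, max_inputs=6):
        # A fixed-size dependency list cannot hold more inputs than its limit,
        # which is why the limit has to grow for instructions with more operands.
        if len(sources) > max_inputs:
            raise ValueError(f"{op}: {len(sources)} inputs exceed limit {max_inputs}")
        self.op = op
        self.pending = set(sources)  # inputs whose producers have not finished

    def wakeup(self, reg):
        """Called when the instruction producing `reg` completes."""
        self.pending.discard(reg)

    def ready(self):
        """The instruction can issue once no input is outstanding."""
        return not self.pending
```

An entry for the four-input instruction stays blocked until all four source registers have been produced, so every one of them has to be tracked.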

Summary
• Identify the ways to add new instructions to Simplescalar and GCC
• Determine the capabilities of the current combiner in GCC
• Demonstrate the addition of a new custom instruction
• Understand GCC to some extent!