Performance
COE 308 Computer Architecture
Prof. Muhamed Mudawar
Computer Engineering Department
King Fahd University of Petroleum and Minerals

What is Performance?
  How can we make intelligent choices about computers?
  Why does some computer hardware perform better on some programs, but worse on others?
  How do we measure the performance of a computer?
  Which factors are hardware related? Which are software related?
  How does a machine's instruction set affect performance?
  Understanding performance is key to understanding the underlying organizational motivation

Response Time and Throughput
  Response Time
    Time between start and completion of a task, as observed by the end user
    Response Time = CPU Time + Waiting Time (I/O, OS scheduling, etc.)
  Throughput
    Number of tasks the machine can run in a given period of time
  Decreasing execution time improves throughput
    Example: using a faster version of a processor
    Less time to run a task means more tasks can be executed
  Increasing throughput can also improve response time
    Example: increasing the number of processors in a multiprocessor
    More tasks can be executed in parallel
    Execution time of individual sequential tasks is not changed
    But less waiting time in the scheduling queue reduces response time

Book's Definition of Performance
  For some program running on machine X
  X is n times faster than Y

What do we mean by Execution Time?
  Real Elapsed Time
    Counts everything: waiting time, input/output, disk access, OS scheduling, etc.
    Useful number, but often not good for comparison purposes
  Our Focus: CPU Execution Time
    Time spent while executing the program instructions
    Doesn't count the waiting time for I/O or OS scheduling
    Can be measured in seconds, or
    Can be related to the number of CPU clock cycles

Clock Cycles
  Clock cycle = Clock period = 1 / Clock rate
  Clock rate = Clock frequency = Cycles per second
    1 Hz = 1 cycle/sec
    1 kHz = 10^3 cycles/sec
    1 MHz = 10^6 cycles/sec
    1 GHz = 10^9 cycles/sec
    A 2 GHz clock has a cycle time = 1 / (2 × 10^9) = 0.5 nanosecond (ns)
  We often use clock cycles to report CPU execution time

Improving Performance
  To improve performance, we need to
    Reduce the number of clock cycles required by a program, or
    Reduce the clock cycle time (increase the clock rate)
  Example: A program runs in 10 seconds on computer X with a 2 GHz clock
    What is the number of CPU cycles on computer X?
    We want to design computer Y to run the same program in 6 seconds
    But computer Y requires 10% more cycles to execute the program
    What is the clock rate for computer Y?
  Solution:
    CPU cycles on computer X = 10 sec × 2 × 10^9 cycles/sec = 20 × 10^9 cycles
    CPU cycles on computer Y = 1.1 × 20 × 10^9 = 22 × 10^9 cycles
    Clock rate for computer Y = 22 × 10^9 cycles / 6 sec = 3.67 GHz

Clock Cycles per Instruction (CPI)
  Instructions take different numbers of cycles to execute
    Multiplication takes more time than addition
    Floating-point operations take longer than integer ones
    Accessing memory takes more time than accessing registers
  CPI is the average number of clock cycles per instruction
  Important point
    Changing the cycle time often changes the number of cycles required for various instructions (more later)

Performance Equation
  To execute, a given program will require ...
    Some number of machine instructions
    Some number of clock cycles
    Some number of seconds
  We can relate CPU clock cycles to instruction count
  Performance Equation: (related to instruction count)

Understanding Performance Equation

Using the Performance Equation
  Suppose we have two implementations of the same ISA
  For a given program
    Machine A has a clock cycle time of 250 ps and a CPI of 2.0
    Machine B has a clock cycle time of 500 ps and a CPI of 1.2
    Which machine is faster for this program, and by how much?
  Solution:
    Both computers execute the same count of instructions = I
    CPU execution time (A) = I × 2.0 × 250 ps = 500 × I ps
    CPU execution time (B) = I × 1.2 × 500 ps = 600 × I ps
    Computer A is faster than B by a factor = (600 × I) / (500 × I) = 1.2

Determining the CPI
  Different types of instructions have different CPI
    Let CPI_i = clocks per instruction for class i of instructions
    Let C_i = instruction count for class i of instructions
  Designers often obtain CPI by a detailed simulation
  Hardware counters are also used for operational CPUs

Example on Determining the CPI
  Problem
    A compiler designer is trying to decide between two code sequences for a particular machine.
    Based on the hardware implementation, there are three different classes of instructions: class A, class B, and class C, which require one, two, and three cycles per instruction, respectively.
    The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C
    The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C
    Compute the CPU cycles for each sequence. Which sequence is faster?
    What is the CPI for each sequence?
  Solution
    CPU cycles (1st sequence) = (2×1) + (1×2) + (2×3) = 2+2+6 = 10 cycles
    CPU cycles (2nd sequence) = (4×1) + (1×2) + (1×3) = 4+2+3 = 9 cycles
    Second sequence is faster, even though it executes one extra instruction
    CPI (1st sequence) = 10/5 = 2
    CPI (2nd sequence) = 9/6 = 1.5

Second Example on CPI

MIPS as a Performance Measure
  MIPS: Millions of Instructions Per Second
  Sometimes used as a performance metric
    Faster machine means larger MIPS
  MIPS specifies the instruction execution rate
  We can also relate execution time to MIPS

Drawbacks of MIPS
  Three problems with using MIPS as a performance metric
  Does not take into account the capability of instructions
    Cannot use MIPS to compare computers with different instruction sets, because the instruction count will differ
  MIPS varies between programs on the same computer
    A computer cannot have a single MIPS rating for all programs
  MIPS can vary inversely with performance
    A higher MIPS rating does not always mean better performance
    The example on the next slide shows this anomalous behavior

MIPS example
  Two different compilers are being tested on the same program for a 4 GHz machine with three different classes of instructions: Class A, Class B, and Class C, which require 1, 2, and 3 cycles, respectively.
  The instruction count produced by the first compiler is 5 billion Class A instructions, 1 billion Class B instructions, and 1 billion Class C instructions.
  The second compiler produces 10 billion Class A instructions, 1 billion Class B instructions, and 1 billion Class C instructions.
  Which compiler produces a higher MIPS?
  Which compiler produces a better execution time?

Solution to MIPS Example
  First, we find the CPU cycles for both compilers
    CPU cycles (compiler 1) = (5×1 + 1×2 + 1×3) × 10^9 = 10 × 10^9
    CPU cycles (compiler 2) = (10×1 + 1×2 + 1×3) × 10^9 = 15 × 10^9
  Next, we find the execution time for both compilers
    Execution time (compiler 1) = 10 × 10^9 cycles / (4 × 10^9 Hz) = 2.5 sec
    Execution time (compiler 2) = 15 × 10^9 cycles / (4 × 10^9 Hz) = 3.75 sec
  Compiler 1 generates the faster program (less execution time)
  Now, we compute the MIPS rate for both compilers
    MIPS = Instruction Count / (Execution Time × 10^6)
    MIPS (compiler 1) = (5+1+1) × 10^9 / (2.5 × 10^6) = 2800
    MIPS (compiler 2) = (10+1+1) × 10^9 / (3.75 × 10^6) = 3200
  So, code from compiler 2 has a higher MIPS rating, even though it runs slower!

Amdahl's Law
  Amdahl's Law is a measure of speedup
    How a computer performs after an enhancement E
    Relative to how it performed previously
  Enhancement improves a fraction f of execution time by a factor s, and the remaining time is unaffected

Example on Amdahl's Law
  Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?
  Solution: suppose we improve multiplication by a factor s
    25 sec (4 times faster) = 80 sec / s + 20 sec
    s = 80 / (25 - 20) = 80 / 5 = 16
    Improve the speed of multiplication by s = 16 times
  How about making the program 5 times faster?
    20 sec (5 times faster) = 80 sec / s + 20 sec
    s = 80 / (20 - 20) = ∞
    Impossible to make 5 times faster!

Benchmarks
  Performance is best obtained by running a real application
    Use programs typical of the expected workload
    Representative of the expected classes of applications
    Examples: compilers, editors, scientific applications, graphics, ...
  SPEC (Standard Performance Evaluation Corporation)
    Funded and supported by a number of computer vendors
    Companies have agreed on a set of real programs and inputs
    Various benchmarks for ...
      CPU performance, graphics, high-performance computing, client-server models, file systems, Web servers, etc.
    Valuable indicator of performance (and compiler technology)

The SPEC CPU2000 Benchmarks

SPEC 2000 Ratings (Pentium III & 4)

Performance and Power
  Power is a key limitation
    Battery capacity has improved only slightly over time
  Need to design power-efficient processors
  Reduce power by
    Reducing frequency
    Reducing voltage
    Putting components to sleep
  Energy efficiency
    Important metric for power-limited applications
    Defined as performance divided by power consumption

Energy Efficiency
  Energy efficiency of the Pentium M is highest for the SPEC2000 benchmarks

Things to Remember
  Performance is specific to a particular program
    Any measure of performance should reflect execution time
    Total execution time is a consistent summary of performance
  For a given ISA, performance improvements come from
    Increases in clock rate (without increasing the CPI)
    Improvements in processor organization that lower CPI
    Compiler enhancements that lower CPI and/or instruction count
    Algorithm/Language choices that affect instruction count
  Pitfalls (things you should avoid)
    Using a subset of the performance equation as a metric
    Expecting improvement of one aspect of a computer to increase performance proportional to the size of improvement

Book's Definition of Performance (slide equations)
  For some program running on machine X:
  $$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$
  X is n times faster than Y:
  $$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

Clock Cycles (slide figure and equation)
  Figure: clock signal with consecutive periods labeled Cycle 1, Cycle 2, Cycle 3
  $$\text{CPU Execution Time} = \text{CPU cycles} \times \text{cycle time} = \frac{\text{CPU cycles}}{\text{Clock rate}}$$
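
As a minimal sketch, the relation CPU Execution Time = CPU cycles × cycle time can be checked against the Improving Performance example from the slides (10 seconds at 2 GHz on computer X; 6 seconds and 10% more cycles on computer Y). The function and variable names are illustrative and do not come from the slides.

```python
def cpu_cycles(exec_time_s: float, clock_rate_hz: float) -> float:
    """CPU cycles = execution time x clock rate."""
    return exec_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_time_s: float) -> float:
    """Clock rate needed to finish `cycles` within `target_time_s` seconds."""
    return cycles / target_time_s


# Numbers from the slide example: computer X runs 10 s at 2 GHz.
cycles_x = cpu_cycles(10.0, 2e9)
# Computer Y needs 10% more cycles and must finish in 6 s.
cycles_y = 1.1 * cycles_x
rate_y = required_clock_rate(cycles_y, 6.0)

print(f"cycles on X: {cycles_x:.3g}")
print(f"cycles on Y: {cycles_y:.3g}")
print(f"clock rate for Y: {rate_y / 1e9:.2f} GHz")
```

The last line reports a clock rate of about 3.67 GHz, which agrees with the slide solution.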

Clock Cycles per Instruction (slide figure)
  Figure: seven instructions, I1 to I7, laid out over clock cycles numbered 1 to 14
  CPI = 14/7 = 2

Performance Equation (slide equations)
  $$\text{CPU cycles} = \text{Instruction Count} \times \text{CPI}$$
  $$\text{Time} = \text{Instruction Count} \times \text{CPI} \times \text{cycle time}$$

Understanding Performance Equation (slide table)
  An X marks each factor of the performance equation that the row affects

                  I-Count   CPI   Cycle
  Program            X
  Compiler           X       X
  ISA                X       X      X
  Organization               X      X
  Technology                        X
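
The performance equation can be exercised on the two-machine comparison from the slides (Machine A: 250 ps cycle, CPI 2.0; Machine B: 500 ps cycle, CPI 1.2). This is a sketch only: the function name is illustrative, and the instruction count is an arbitrary placeholder because it cancels in the ratio.

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """Time = Instruction Count x CPI x cycle time."""
    return instruction_count * cpi * cycle_time_s


# Both machines run the same instruction count I, so any positive value
# works here; 1e9 is an arbitrary choice for illustration.
I = 1e9
time_a = cpu_time(I, cpi=2.0, cycle_time_s=250e-12)  # Machine A
time_b = cpu_time(I, cpi=1.2, cycle_time_s=500e-12)  # Machine B

speedup = time_b / time_a
print(f"A: {time_a:.3f} s, B: {time_b:.3f} s, A is {speedup:.2f}x faster")
```

Changing `I` scales both times equally and leaves the factor of 1.2 unchanged, which is why the slide solution can carry I symbolically.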

Determining the CPI (slide equations)
  $$\text{CPU cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)$$
  $$\text{CPI} = \frac{\sum_{i=1}^{n} (\text{CPI}_i \times C_i)}{\sum_{i=1}^{n} C_i}$$
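
The two sums can be sketched directly in code, using the class cycle counts and the two code sequences from the Example on Determining the CPI slide. The dictionary layout and function names are assumptions made for illustration.

```python
def total_cycles(cpi_by_class: dict, count_by_class: dict) -> float:
    """CPU cycles = sum over classes of CPI_i x C_i."""
    return sum(cpi_by_class[c] * n for c, n in count_by_class.items())


def average_cpi(cpi_by_class: dict, count_by_class: dict) -> float:
    """CPI = CPU cycles divided by the total instruction count."""
    return total_cycles(cpi_by_class, count_by_class) / sum(count_by_class.values())


cpi = {"A": 1, "B": 2, "C": 3}    # cycles per instruction for each class
seq1 = {"A": 2, "B": 1, "C": 2}   # first code sequence (5 instructions)
seq2 = {"A": 4, "B": 1, "C": 1}   # second code sequence (6 instructions)

for name, seq in (("sequence 1", seq1), ("sequence 2", seq2)):
    print(name, total_cycles(cpi, seq), average_cpi(cpi, seq))
```

The second sequence needs fewer cycles (9 versus 10) despite having one more instruction, as in the slide solution.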

Second Example on CPI (slide content)
  Given: instruction mix of a program on a RISC processor
  What is the average CPI?
  What is the percent of time used by each instruction class?

  Class_i   Freq_i   CPI_i   CPI_i × Freq_i    %Time
  ALU        50%      1      0.5 × 1 = 0.5     0.5/2.2 = 23%
  Load       20%      5      0.2 × 5 = 1.0     1.0/2.2 = 45%
  Store      10%      3      0.1 × 3 = 0.3     0.3/2.2 = 14%
  Branch     20%      2      0.2 × 2 = 0.4     0.4/2.2 = 18%

  Average CPI = 0.5 + 1.0 + 0.3 + 0.4 = 2.2

  How much faster would the machine be if load time is 2 cycles?
  What if two ALU instructions could be executed at once?
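
The table and the two closing questions can be explored with a short sketch. It assumes that instruction count and clock rate stay fixed, so speedup is the ratio of average CPIs, and it models "two ALU instructions at once" as an effective ALU CPI of 0.5. Both are assumptions of this sketch; the slides leave the questions open.

```python
def mix_cpi(mix: dict) -> float:
    """Average CPI = sum of Freq_i x CPI_i over (frequency, CPI) pairs."""
    return sum(freq * cpi for freq, cpi in mix.values())


def time_share(mix: dict) -> dict:
    """Fraction of execution time spent in each instruction class."""
    total = mix_cpi(mix)
    return {name: freq * cpi / total for name, (freq, cpi) in mix.items()}


base = {"ALU": (0.5, 1), "Load": (0.2, 5), "Store": (0.1, 3), "Branch": (0.2, 2)}

# What-if variants; instruction count and clock rate are assumed unchanged.
fast_load = {**base, "Load": (0.2, 2)}   # loads take 2 cycles
dual_alu = {**base, "ALU": (0.5, 0.5)}   # assumed effective ALU CPI of 0.5

print(f"base CPI = {mix_cpi(base):.2f}")
for name, share in time_share(base).items():
    print(f"  {name}: {share:.0%}")
print(f"speedup with 2-cycle loads: {mix_cpi(base) / mix_cpi(fast_load):.3f}")
print(f"speedup with paired ALU ops: {mix_cpi(base) / mix_cpi(dual_alu):.3f}")
```

Under these assumptions the faster load gives a larger gain than pairing ALU instructions, because loads account for the largest share of execution time in this mix.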

MIPS as a Performance Measure (slide equations)
  $$\text{MIPS} = \frac{\text{Instruction Count}}{\text{Execution Time} \times 10^6} = \frac{\text{Clock Rate}}{\text{CPI} \times 10^6}$$
  $$\text{Execution Time} = \frac{\text{Inst Count}}{\text{MIPS} \times 10^6} = \frac{\text{Inst Count} \times \text{CPI}}{\text{Clock Rate}}$$
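
The MIPS definition can be applied to the two-compiler example from the slides (4 GHz machine; classes A, B, C taking 1, 2, and 3 cycles). This is a sketch; the constant and function names are illustrative.

```python
CLOCK_RATE_HZ = 4e9
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def evaluate(counts: dict) -> tuple:
    """Return (execution time in seconds, MIPS rating) for instruction counts."""
    instructions = sum(counts.values())
    cycles = sum(CPI_BY_CLASS[c] * n for c, n in counts.items())
    exec_time = cycles / CLOCK_RATE_HZ
    mips = instructions / (exec_time * 1e6)
    return exec_time, mips


compiler1 = {"A": 5e9, "B": 1e9, "C": 1e9}
compiler2 = {"A": 10e9, "B": 1e9, "C": 1e9}

for name, counts in (("compiler 1", compiler1), ("compiler 2", compiler2)):
    t, mips = evaluate(counts)
    print(f"{name}: {t:.2f} s, {mips:.0f} MIPS")
```

The run shows the anomaly described under Drawbacks of MIPS: the code from compiler 2 has the higher MIPS rating but the longer execution time.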

Amdahl's Law (slide equations)
  $$\text{Speedup}(E) = \frac{\text{Performance with } E}{\text{Performance before}} = \frac{\text{ExTime before}}{\text{ExTime with } E}$$
  $$\text{ExTime with } E = \text{ExTime before} \times \left(\frac{f}{s} + (1 - f)\right)$$
  $$\text{Speedup}(E) = \frac{1}{\frac{f}{s} + (1 - f)}$$

The SPEC CPU2000 Benchmarks (slide table)
  12 Integer benchmarks (C and C++)
    Name       Description
    gzip       Compression
    vpr        FPGA placement and routing
    gcc        GNU C compiler
    mcf        Combinatorial optimization
    crafty     Chess program
    parser     Word processing program
    eon        Computer visualization
    perlbmk    Perl application
    gap        Group theory, interpreter
    vortex     Object-oriented database
    bzip2      Compression
    twolf      Place and route simulator

  14 FP benchmarks (Fortran 77, 90, and C)
    Name       Description
    wupwise    Quantum chromodynamics
    swim       Shallow water model
    mgrid      Multigrid solver in 3D potential field
    applu      Partial differential equation
    mesa       Three-dimensional graphics library
    galgel     Computational fluid dynamics
    art        Neural networks image recognition
    equake     Seismic wave propagation simulation
    facerec    Image recognition of faces
    ammp       Computational chemistry
    lucas      Primality testing
    fma3d      Crash simulation using finite elements
    sixtrack   High-energy nuclear physics
    apsi       Meteorology: pollutant distribution

  Wall clock time is used as the metric
  Benchmarks measure CPU time, because of little I/O

SPEC 2000 Ratings (slide definitions)
  SPEC ratio: execution time is normalized relative to a Sun Ultra 5 (300 MHz)
  SPEC rating = geometric mean of the SPEC ratios
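
A small sketch of how a rating could be formed from individual ratios. The slide's ratio formula did not survive extraction, so this assumes the usual normalization, reference-machine time divided by measured time. The timing values are hypothetical placeholders, not SPEC results.

```python
import math


def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Assumed normalization: reference time / measured time (larger is faster)."""
    return reference_time_s / measured_time_s


def geometric_mean(values: list) -> float:
    """n-th root of the product of the values, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (reference time, measured time) pairs in seconds; not SPEC data.
runs = [(1400.0, 700.0), (1800.0, 225.0)]
ratios = [spec_ratio(ref, t) for ref, t in runs]
rating = geometric_mean(ratios)
print(ratios, round(rating, 3))
```

A geometric mean is used because the ratios are normalized quantities; unlike an arithmetic mean, the relative ranking of two machines does not depend on which reference machine is chosen.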
  Figure (SPEC 2000 Ratings slide): chart with four curves labeled Pentium III CINT2000, Pentium 4 CINT2000, Pentium III CFP2000, and Pentium 4 CFP2000; axis values not recoverable
  Note the relative positions of the CINT and CFP 2000 curves for the Pentium III & 4
  Pentium III does better at the integer benchmarks, while Pentium 4 does better at the floating-point benchmarks due to its advanced SSE2 instructions
[Figure: Relative Performance. Axis ticks from 0.0 to 1.6 in steps of 0.2; groups labeled SPECINT 2000 and SPECFP 2000; legend includes Pentium M @ 1.6/0.6 GHz. Plotted values not recoverable.]
[Figure: Relative Energy Efficiency. Axis label: Benchmark and power mode. Groups: SPECINT 2000 and SPECFP 2000 under each of three power modes (Always on / maximum clock, Laptop mode / adaptive clock, Minimum power / min clock). Series: Pentium M @ 1.6/0.6 GHz, Pentium 4-M @ 2.4/1.2 GHz, Pentium III-M @ 1.2/0.8 GHz. Plotted values not recoverable.]
Speaker notes
Have them raise their hands when answering questions.
Book page 61 has an example showing that a machine with a higher MIPS rating can perform worse than a machine with a lower MIPS rating.

Slide Titles
Performance
What is Performance?
Response Time and Throughput
Book's Definition of Performance
What do we mean by Execution Time?
Clock Cycles
Improving Performance
Clock Cycles per Instruction (CPI)
Performance Equation
Understanding Performance Equation
Using the Performance Equation
Determining the CPI
Example on Determining the CPI
Second Example on CPI
MIPS as a Performance Measure
Drawbacks of MIPS
MIPS example
Solution to MIPS Example
Amdahl's Law
Example on Amdahl's Law
Benchmarks
The SPEC CPU2000 Benchmarks
SPEC 2000 Ratings (Pentium III & 4)
Performance and Power
Performance and Power
Energy Efficiency
Things to Remember
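The speaker note on MIPS (a machine with a higher MIPS rating can still run a program more slowly) can be checked with a few lines of arithmetic. The sketch below is only an illustration: the instruction counts, CPI values, and the 1 GHz clock are hypothetical numbers chosen for this example, not the figures from the book's example, and the function names are made up here. It uses CPU time = instruction count × CPI / clock rate and MIPS = instruction count / (CPU time × 10^6).

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU execution time in seconds: instructions x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count, cpu_time_s):
    """Millions of instructions executed per second."""
    return instruction_count / (cpu_time_s * 1e6)


# Hypothetical numbers (assumptions for illustration only).
# The same program is compiled for two machines with a 1 GHz clock.
CLOCK_HZ = 1e9

# Machine A: many simple instructions, low CPI.
count_a, cpi_a = 10e9, 1.0
# Machine B: fewer, more complex instructions, higher CPI.
count_b, cpi_b = 4e9, 2.0

time_a = cpu_time(count_a, cpi_a, CLOCK_HZ)
time_b = cpu_time(count_b, cpi_b, CLOCK_HZ)
mips_a = mips(count_a, time_a)
mips_b = mips(count_b, time_b)

print(f"A: {time_a:.1f} s, {mips_a:.0f} MIPS")
print(f"B: {time_b:.1f} s, {mips_b:.0f} MIPS")
# Machine A has the higher MIPS rating, yet machine B finishes first.
print("A has higher MIPS:", mips_a > mips_b)
print("B is faster:", time_b < time_a)
```

With these assumed inputs, machine A reports twice the MIPS of machine B but takes longer to finish, because MIPS ignores how much work each instruction does.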