# Single Cycle Processor Design

**COE 308** 

Computer Architecture
Prof. Muhamed Mudawar

Computer Engineering Department
King Fahd University of Petroleum and Minerals

#### Presentation Outline

- Designing a Processor: Step-by-Step
- Datapath Components and Clocking
- Assembling an Adequate Datapath
- Controlling the Execution of Instructions
- The Main Controller and ALU Controller
- Drawback of the single-cycle processor design


#### The Performance Perspective

- Recall, performance is determined by:
  - ♦ Instruction count
  - ♦ Clock cycles per instruction (CPI)
  - ♦ Clock cycle time



- Processor design will affect
  - ♦ Clock cycles per instruction
  - ♦ Clock cycle time
- Single cycle datapath and control design:
  - ♦ Advantage: One clock cycle per instruction
  - ♦ Disadvantage: long cycle time
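The three factors listed above combine in the standard CPU time relation, written out here as a reminder (the symbol names are spelled out rather than taken from the slide):

$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$

For a single-cycle design CPI = 1, so for a given program the execution time depends only on the clock cycle time.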

Single Cycle Processor Design

COE 308 – Computer Architecture

© Muhamed Mudawar

# Designing a Processor: Step-by-Step

- ❖ Analyze instruction set => datapath requirements
  - ♦ The meaning of each instruction is given by the register transfers
  - ♦ Datapath must include storage elements for ISA registers
  - ♦ Datapath must support each register transfer
- Select datapath components and clocking methodology
- Assemble datapath meeting the requirements
- Analyze implementation of each instruction
  - ♦ Determine the setting of control signals for register transfer
- Assemble the control logic


#### Review of MIPS Instruction Formats

- ❖ All instructions are 32-bit wide
- ❖ Three instruction formats: R-type, I-type, and J-type



- ♦ Op6: 6-bit opcode of the instruction
- ♦ Rs<sup>5</sup>, Rt<sup>5</sup>, Rd<sup>5</sup>: 5-bit source and destination register numbers
- ♦ sa<sup>5</sup>: 5-bit shift amount used by shift instructions
- ♦ immediate<sup>16</sup>: 16-bit immediate value or address offset
- ♦ immediate<sup>26</sup>: 26-bit target address of the jump instruction
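The field layout above can be illustrated with a small Python sketch that slices a 32-bit instruction word into its fields. The function name `decode` and the dictionary keys are illustrative choices, not part of the slides; the bit positions follow the standard MIPS R-type, I-type, and J-type layouts.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its format fields."""
    return {
        "op":    (instr >> 26) & 0x3F,   # bits 31..26: 6-bit opcode
        "rs":    (instr >> 21) & 0x1F,   # bits 25..21: first source register
        "rt":    (instr >> 16) & 0x1F,   # bits 20..16: second source / I-type destination
        "rd":    (instr >> 11) & 0x1F,   # bits 15..11: R-type destination
        "sa":    (instr >> 6) & 0x1F,    # bits 10..6:  shift amount
        "funct": instr & 0x3F,           # bits 5..0:   R-type function code
        "imm16": instr & 0xFFFF,         # bits 15..0:  I-type immediate
        "imm26": instr & 0x3FFFFFF,      # bits 25..0:  J-type target
    }

# Encode "add rd=3, rs=1, rt=2" by hand (op = 0, funct = 0x20), then decode it.
word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x20
fields = decode(word)
print(hex(word), fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))
```

All fields are extracted unconditionally; which ones are meaningful depends on the format selected by the opcode.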


#### MIPS Subset of Instructions

- Only a subset of the MIPS instructions is considered
  - ♦ ALU instructions (R-type): add, sub, and, or, xor, slt
  - ♦ Immediate instructions (I-type): addi, slti, andi, ori, xori
  - ♦ Load and Store (I-type): lw, sw
  - ♦ Branch (I-type): beq, bne
  - ♦ Jump (J-type): j
- This subset does not include all the integer instructions
- But it is sufficient to illustrate the design of datapath and control
- Concepts used to implement the MIPS subset are used to construct a broad spectrum of computers


#### Details of the MIPS Subset

| Instruction | Operands                  | Meaning          | op<sup>6</sup> | rs<sup>5</sup>  | rt<sup>5</sup>  | rd<sup>5</sup> / im<sup>16</sup> / im<sup>26</sup> | sa<sup>5</sup> | funct<sup>6</sup> |
|-------------|---------------------------|------------------|----------------|-----------------|-----------------|------------------|---|------|
| add         | rd, rs, rt                | addition         | 0              | rs<sup>5</sup>  | rt<sup>5</sup>  | rd<sup>5</sup>   | 0 | 0x20 |
| sub         | rd, rs, rt                | subtraction      | 0              | rs<sup>5</sup>  | rt<sup>5</sup>  | rd<sup>5</sup>   | 0 | 0x22 |
| and         | rd, rs, rt                | bitwise and      | 0              | rs<sup>5</sup>  | rt<sup>5</sup>  | rd<sup>5</sup>   | 0 | 0x24 |
| or          | rd, rs, rt                | bitwise or       | 0              | rs<sup>5</sup>  | rt<sup>5</sup>  | rd<sup>5</sup>   | 0 | 0x25 |
| xor         | rd, rs, rt                | exclusive or     | 0              | rs<sup>5</sup>  | rt<sup>5</sup>  | rd<sup>5</sup>   | 0 | 0x26 |
| slt         | rd, rs, rt                | set on less than | 0              | rs<sup>5</sup>  | rt<sup>5</sup>  | rd<sup>5</sup>   | 0 | 0x2a |
| addi        | rt, rs, im<sup>16</sup>   | add immediate    | 0x08           | rs<sup>5</sup>  | rt<sup>5</sup>  | im<sup>16</sup>  |   |      |
| slti        | rt, rs, im<sup>16</sup>   | slt immediate    | 0x0a           | rs<sup>5</sup>  | rt<sup>5</sup>  | im<sup>16</sup>  |   |      |
| andi        | rt, rs, im<sup>16</sup>   | and immediate    | 0x0c           | rs<sup>5</sup>  | rt<sup>5</sup>  | im<sup>16</sup>  |   |      |
| ori         | rt, rs, im<sup>16</sup>   | or immediate     | 0x0d           | rs<sup>5</sup>  | rt<sup>5</sup>  | im<sup>16</sup>  |   |      |
| xori        | rt, rs, im<sup>16</sup>   | xor immediate    | 0x0e           | rs<sup>5</sup>  | rt<sup>5</sup>  | im<sup>16</sup>  |   |      |
| lw          | rt, im<sup>16</sup>(rs)   | load word        | 0x23           | rs<sup>5</sup>  | rt<sup>5</sup>  | im<sup>16</sup>  |   |      |
| sw          | rt, im<sup>16</sup>(rs)   | store word       | 0x2b           | rs<sup>5</sup>  | rt<sup>5</sup>  | im<sup>16</sup>  |   |      |
| beq         | rs, rt, im<sup>16</sup>   | branch if equal  | 0x04           | rs<sup>5</sup>  | rt<sup>5</sup>  | im<sup>16</sup>  |   |      |
| bne         | rs, rt, im<sup>16</sup>   | branch not equal | 0x05           | rs<sup>5</sup>  | rt<sup>5</sup>  | im<sup>16</sup>  |   |      |
| j           | im<sup>26</sup>           | jump             | 0x02           |                 |                 | im<sup>26</sup>  |   |      |


# Register Transfer Level (RTL)

- RTL is a description of data flow between registers
- RTL gives a meaning to the instructions
- ❖ All instructions are fetched from memory at address PC

#### Instruction RTL Description

```
ADD   Reg(Rd) ← Reg(Rs) + Reg(Rt);               PC ← PC + 4
SUB   Reg(Rd) ← Reg(Rs) – Reg(Rt);               PC ← PC + 4
ORI   Reg(Rt) ← Reg(Rs) | zero_ext(Im16);        PC ← PC + 4
LW    Reg(Rt) ← MEM[Reg(Rs) + sign_ext(Im16)];   PC ← PC + 4
SW    MEM[Reg(Rs) + sign_ext(Im16)] ← Reg(Rt);   PC ← PC + 4
BEQ   if (Reg(Rs) == Reg(Rt))
           PC ← PC + 4 + 4 × sign_ext(Im16)
      else PC ← PC + 4
```
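The register transfers above can be tried out with a minimal Python sketch. It is not the hardware design itself: the function `step`, the list `reg`, and the dictionary `mem` (word values keyed by byte address) are hypothetical names chosen for illustration, and results are truncated to 32 bits as an assumption about register width.

```python
MASK32 = 0xFFFFFFFF

def sign_ext(imm16: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def zero_ext(imm16: int) -> int:
    """Interpret a 16-bit field as an unsigned value."""
    return imm16 & 0xFFFF

def step(name, rs, rt, rd, imm16, reg, mem, pc):
    """Apply the register transfers of one instruction and return the next PC."""
    next_pc = pc + 4
    if name == "ADD":
        reg[rd] = (reg[rs] + reg[rt]) & MASK32
    elif name == "SUB":
        reg[rd] = (reg[rs] - reg[rt]) & MASK32
    elif name == "ORI":
        reg[rt] = reg[rs] | zero_ext(imm16)
    elif name == "LW":
        reg[rt] = mem.get((reg[rs] + sign_ext(imm16)) & MASK32, 0)
    elif name == "SW":
        mem[(reg[rs] + sign_ext(imm16)) & MASK32] = reg[rt]
    elif name == "BEQ":
        if reg[rs] == reg[rt]:
            next_pc = pc + 4 + 4 * sign_ext(imm16)
    else:
        raise ValueError(f"unsupported instruction: {name}")
    reg[0] = 0  # register 0 always reads as zero
    return next_pc & MASK32

reg, mem, pc = [0] * 32, {}, 0
pc = step("ORI", rs=0, rt=1, rd=0, imm16=5, reg=reg, mem=mem, pc=pc)   # R1 = 5
pc = step("ADD", rs=1, rt=1, rd=2, imm16=0, reg=reg, mem=mem, pc=pc)   # R2 = R1 + R1
pc = step("SW", rs=0, rt=2, rd=0, imm16=16, reg=reg, mem=mem, pc=pc)   # MEM[16] = R2
pc = step("LW", rs=0, rt=3, rd=0, imm16=16, reg=reg, mem=mem, pc=pc)   # R3 = MEM[16]
pc = step("BEQ", rs=2, rt=3, rd=0, imm16=0xFFFF, reg=reg, mem=mem, pc=pc)
print(reg[1], reg[2], reg[3], mem[16], pc)
```

The final BEQ has an offset of -1 words, so the taken branch lands back on the BEQ itself (PC + 4 - 4).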


## Instructions are Executed in Steps

- ❖ R-type
  - Fetch instruction: Instruction ← MEM[PC]
  - Fetch operands: data1 ← Reg(Rs), data2 ← Reg(Rt)
  - Execute operation: ALU\_result ← func(data1, data2)
  - Write ALU result: Reg(Rd) ← ALU\_result
  - Next PC address: PC ← PC + 4
- ❖ I-type
  - Fetch instruction: Instruction ← MEM[PC]
  - Fetch operands: data1 ← Reg(Rs), data2 ← Extend(imm16)
  - Execute operation: ALU\_result ← op(data1, data2)
  - Write ALU result: Reg(Rt) ← ALU\_result
  - Next PC address: PC ← PC + 4
- ❖ BEQ
  - Fetch instruction: Instruction ← MEM[PC]
  - Fetch operands: data1 ← Reg(Rs), data2 ← Reg(Rt)
  - Equality: zero ← subtract(data1, data2)
  - Branch: if (zero) PC ← PC + 4 + 4 × sign\_ext(imm16) else PC ← PC + 4


# Instruction Execution - cont'd

- ❖ LW
  - Fetch instruction: Instruction ← MEM[PC]
  - Fetch base register: base ← Reg(Rs)
  - Calculate address: address ← base + sign\_extend(imm16)
  - Read memory: data ← MEM[address]
  - Write register Rt: Reg(Rt) ← data
  - Next PC address: PC ← PC + 4
- ❖ SW
  - Fetch instruction: Instruction ← MEM[PC]
  - Fetch registers: base ← Reg(Rs), data ← Reg(Rt)
  - Calculate address: address ← base + sign\_extend(imm16)
  - Write memory: MEM[address] ← data
  - Next PC address: PC ← PC + 4
- ❖ Jump
  - Fetch instruction: Instruction ← MEM[PC]
  - Target PC address: target ← PC[31:28], Imm26, '00' (the commas denote concatenation)
  - Jump: PC ← target
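The concatenation used for the jump target can be checked with a few lines of Python. This is only a sketch of the bit manipulation; `jump_target` is a hypothetical helper name, and the upper four bits are taken from PC exactly as written in the step above.

```python
def jump_target(pc: int, imm26: int) -> int:
    """Concatenate PC[31:28], the 26-bit immediate, and two zero bits."""
    upper4 = pc & 0xF0000000            # keep PC[31:28] in place
    middle = (imm26 & 0x3FFFFFF) << 2   # Imm26 followed by '00'
    return upper4 | middle

target = jump_target(0x10400008, 0x0100004)
print(hex(target))
```

Because the low two bits are always zero, the target is word aligned, and a jump can only reach addresses that share the upper four bits of the current PC.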


# Requirements of the Instruction Set

- Memory
  - ♦ Instruction memory where instructions are stored
  - ♦ Data memory where data is stored
- Registers
  - ♦ 32 × 32-bit general purpose registers, R0 is always zero
  - ♦ Read source register Rs
  - ♦ Read source register Rt
  - ♦ Write destination register Rt or Rd
- Program counter PC register and Adder to increment PC
- Sign and Zero extender for immediate constant
- ALU for executing instructions


#### Next ...

- Designing a Processor: Step-by-Step
- Datapath Components and Clocking
- Assembling an Adequate Datapath
- Controlling the Execution of Instructions
- The Main Controller and ALU Controller
- Drawback of the single-cycle processor design






# MIPS Register File

- ❖ Register File consists of 32 × 32-bit registers
  - ♦ BusA and BusB: 32-bit output buses for reading 2 registers
  - ♦ BusW: 32-bit input bus for writing a register when RegWrite is 1
  - ♦ Two registers read and one written in a cycle
- Registers are selected by:
  - ♦ RA selects register to be read on BusA
  - ♦ RB selects register to be read on BusB
  - ♦ RW selects the register to be written
- Clock input
  - ♦ The clock input is used ONLY during write operation
  - ♦ During read, register file behaves as a combinational logic block
    - RA or RB valid => BusA or BusB valid after access time
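The read and write behavior described above can be modeled in a few lines of Python. This is a behavioral sketch only: the class and method names are hypothetical, access time is not modeled, and ignoring writes to register 0 is an assumption about how "R0 is always zero" is enforced.

```python
class RegisterFile:
    """Behavioral model: combinational read, clocked write."""

    def __init__(self):
        self.regs = [0] * 32  # 32 registers of 32 bits each

    def read(self, ra: int, rb: int):
        """BusA and BusB simply follow the registers selected by RA and RB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw: int, bus_w: int, reg_write: int) -> None:
        """On the clock edge, store BusW into register RW only if RegWrite is 1."""
        if reg_write and rw != 0:  # assumption: writes to R0 are ignored
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=123, reg_write=1)   # written
rf.clock_edge(rw=6, bus_w=456, reg_write=0)   # RegWrite = 0, so no change
rf.clock_edge(rw=0, bus_w=789, reg_write=1)   # R0 stays zero
bus_a, bus_b = rf.read(ra=5, rb=6)
print(bus_a, bus_b, rf.read(0, 0))
```

Separating `read` from `clock_edge` mirrors the point that the clock matters only for writes.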


#### Tri-State Buffers

- Allow multiple sources to drive a single bus
- ❖ Two inputs:
  - ♦ Data signal (data\_in)
  - ♦ Output enable
- ❖ One output (data\_out):
  - ♦ If (Enable) Data\_out = Data\_in
  - ♦ else Data\_out = High Impedance state (output is disconnected)
- Tri-state buffers can be used to build multiplexors



# Instruction and Data Memories

- Instruction memory needs to provide only read access
  - ♦ Because datapath does not write instructions
  - ♦ Behaves as combinational logic for read
  - ♦ Address selects Instruction after access time
- Data Memory is used for load and store
  - ♦ MemRead: enables output on Data\_out
    - Address selects the word to put on Data\_out
  - ♦ MemWrite: enables writing of Data\_in
    - Address selects the memory word to be written
    - The Clock synchronizes the write operation
- Separate instruction and data memories
  - ♦ Later, we will replace them with caches


# Clocking Methodology

- Clocks are needed in sequential logic to decide when a state element (register) should be updated
- To ensure correctness, a clocking methodology defines when data can be written and read
- We assume edge-triggered clocking
- All state changes occur on the same clock edge
- Data must be valid and stable before arrival of clock edge
- Edge-triggered clocking allows a register to be read and written during the same clock cycle

# Determining the Clock Cycle

With edge-triggered clocking, the clock cycle must be long enough to accommodate the path from one register through the combinational logic to another register



- T<sub>clk-q</sub>: clock to output delay through register
- T<sub>max comb</sub>: longest delay through combinational logic
- T<sub>s</sub>: setup time that input to a register must be stable before arrival of clock edge
- T<sub>h</sub>: hold time that input to a register must hold after arrival of clock edge
- ❖ Hold time (T<sub>h</sub>) is normally satisfied since  $T_{clk-q} > T_h$


### Clock Skew

- Clock skew arises because the clock signal uses different paths with slightly different delays to reach state elements
- Clock skew is the difference in absolute time between when two storage elements see a clock edge
- With a clock skew, the clock cycle time is increased

$$T_{\text{cycle}} \ge T_{\text{clk-q}} + T_{\text{max\_combinational}} + T_{\text{setup}} + T_{\text{skew}}$$

Clock skew is reduced by balancing the clock delays
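The cycle time bound can be evaluated numerically. The sketch below uses made-up delay values purely for illustration (none of them come from the slides), and the function names are hypothetical.

```python
def min_cycle_time(t_clk_q, t_max_comb, t_setup, t_skew=0):
    """Smallest clock period that satisfies the setup constraint (all times in ps)."""
    return t_clk_q + t_max_comb + t_setup + t_skew

def hold_ok(t_clk_q, t_hold):
    """Hold check in the simplified form T_clk-q > T_h."""
    return t_clk_q > t_hold

# Illustrative numbers only (assumptions, not measured values).
no_skew = min_cycle_time(t_clk_q=30, t_max_comb=800, t_setup=20)
with_skew = min_cycle_time(t_clk_q=30, t_max_comb=800, t_setup=20, t_skew=10)
print(no_skew, with_skew, hold_ok(t_clk_q=30, t_hold=15))
```

With these numbers the skew adds directly to the minimum period, which is why balancing the clock delays matters.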


### Next...

- Designing a Processor: Step-by-Step
- Datapath Components and Clocking
- Assembling an Adequate Datapath
- Controlling the Execution of Instructions
- The Main Controller and ALU Controller
- Drawback of the single-cycle processor design


#### Instruction Fetching Datapath

- ❖ We can now assemble the datapath from its components
- ❖ For instruction fetching, we need ...
  - ♦ Program Counter (PC) register
  - ♦ Instruction Memory
  - ♦ Adder for incrementing PC
- The least significant 2 bits of the PC are '00' since PC is a multiple of 4
- Improved datapath increments upper 30 bits of PC by 1
- Datapath does not handle branch or jump instructions
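The improved datapath's trick of incrementing only the upper 30 bits of the PC can be sketched in Python as follows. The function name `next_pc` is a hypothetical choice, and wrapping around at the top of the address space is an assumption about the adder width.

```python
def next_pc(pc: int) -> int:
    """Increment the upper 30 bits of PC by 1; the low 2 bits stay '00'."""
    pc30 = pc >> 2                      # drop the two constant zero bits
    pc30 = (pc30 + 1) & 0x3FFFFFFF      # 30-bit increment
    return pc30 << 2                    # re-attach '00'

pc = 0x00400000
after_one = next_pc(pc)
print(hex(after_one))
```

For word-aligned addresses this gives the same result as PC + 4 while needing only a 30-bit incrementer.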





#### Combining R-type & I-type Datapaths

- A mux selects RW as either Rt or Rd
- Another mux selects 2nd ALU input as either source register Rt data on BusB or the extended immediate
- ❖ Control signals
  - ♦ ALUCtrl is derived from either the Op or the funct field
  - ♦ RegWrite enables the writing of the ALU result
  - ♦ ExtOp controls the extension of the 16-bit immediate
  - ♦ RegDst selects the register destination as either Rt or Rd
  - ♦ ALUSrc selects the 2nd ALU source as BusB or extended immediate



# Details of the Extender

- Two types of extensions
  - ♦ Zero-extension for unsigned constants
  - ♦ Sign-extension for signed constants
- Control signal ExtOp indicates type of extension
- Extender Implementation: wiring and one AND gate
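One way to read "wiring and one AND gate" is that the gate combines ExtOp with the sign bit of the immediate and its output is wired to all 16 upper bits. The Python sketch below follows that reading, which is an assumption since the extender figure is not reproduced here; `extend` is a hypothetical name.

```python
def extend(imm16: int, ext_op: int) -> int:
    """Extend a 16-bit immediate to 32 bits (ExtOp: 0 = zero, 1 = sign)."""
    sign_bit = (imm16 >> 15) & 1
    upper_bit = ext_op & sign_bit               # the single AND gate
    upper16 = 0xFFFF if upper_bit else 0x0000   # same bit wired to all 16 upper bits
    return (upper16 << 16) | (imm16 & 0xFFFF)   # lower 16 bits pass straight through

print(hex(extend(0x8001, 1)), hex(extend(0x8001, 0)), hex(extend(0x7FFF, 1)))
```

Only a negative immediate with ExtOp = 1 produces ones in the upper half; every other combination yields zeros there.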

















#### Next...

- Designing a Processor: Step-by-Step
- Datapath Components and Clocking
- Assembling an Adequate Datapath
- Controlling the Execution of Instructions
- The Main Controller and ALU Controller
- Drawback of the single-cycle processor design






#### Main Control Signals

| Signal   | Effect when '0'                                                      | Effect when '1'                                             |
|----------|----------------------------------------------------------------------|-------------------------------------------------------------|
| RegDst   | Destination register = Rt                                            | Destination register = Rd                                   |
| RegWrite | None                                                                 | Destination register is written with the data value on BusW |
| ExtOp    | 16-bit immediate is zero-extended                                    | 16-bit immediate is sign-extended                           |
| ALUSrc   | Second ALU operand comes from the second register file output (BusB) | Second ALU operand comes from the extended 16-bit immediate |
| MemRead  | None                                                                 | Data memory is read<br>Data_out ← Memory[address]           |
| MemWrite | None                                                                 | Data memory is written<br>Memory[address] ← Data_in         |
| MemtoReg | BusW = ALU result                                                    | BusW = Data_out from Memory                                 |
| Beq, Bne | PC ← PC + 4                                                          | PC ← Branch target address<br>if branch is taken            |
| J        | PC ← PC + 4                                                          | PC ← Jump target address                                    |

ALUOp: this multi-bit signal specifies the ALU operation as a function of the opcode.

# Main Control Signal Values

| Op     | RegDst | RegWrite | ExtOp  | ALUSrc | ALUOp  | Beq | Bne | J | MemRead | MemWrite | MemtoReg |
|--------|--------|----------|--------|--------|--------|-----|-----|---|---------|----------|----------|
| R-type | 1 = Rd | 1        | X      | 0=BusB | R-type | 0   | 0   | 0 | 0       | 0        | 0        |
| addi   | 0 = Rt | 1        | 1=sign | 1=Imm  | ADD    | 0   | 0   | 0 | 0       | 0        | 0        |
| slti   | 0 = Rt | 1        | 1=sign | 1=Imm  | SLT    | 0   | 0   | 0 | 0       | 0        | 0        |
| andi   | 0 = Rt | 1        | 0=zero | 1=Imm  | AND    | 0   | 0   | 0 | 0       | 0        | 0        |
| ori    | 0 = Rt | 1        | 0=zero | 1=Imm  | OR     | 0   | 0   | 0 | 0       | 0        | 0        |
| xori   | 0 = Rt | 1        | 0=zero | 1=Imm  | XOR    | 0   | 0   | 0 | 0       | 0        | 0        |
| lw     | 0 = Rt | 1        | 1=sign | 1=Imm  | ADD    | 0   | 0   | 0 | 1       | 0        | 1        |
| sw     | X      | 0        | 1=sign | 1=Imm  | ADD    | 0   | 0   | 0 | 0       | 1        | X        |
| beq    | X      | 0        | X      | 0=BusB | SUB    | 1   | 0   | 0 | 0       | 0        | X        |
| bne    | X      | 0        | X      | 0=BusB | SUB    | 0   | 1   | 0 | 0       | 0        | X        |
| j      | X      | 0        | X      | X      | X      | 0   | 0   | 1 | 0       | 0        | X        |

❖ X is a don't care (can be 0 or 1), used to minimize logic




RegDst <= R-type

RegWrite <= $\overline{(sw + beq + bne + j)}$

ExtOp <= $\overline{(andi + ori + xori)}$

ALUSrc <= $\overline{(R\text{-}type + beq + bne)}$

MemRead <= lw

MemWrite <= sw

MemtoReg <= lw
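The control equations can be sketched as a Python function and checked against the Main Control Signal Values table. In this sketch RegWrite, ExtOp, and ALUSrc are computed as complements of the listed instruction sets, which is what the table values require; don't-care entries resolve to whatever the equations produce. The function name and the use of mnemonic strings instead of decoded opcodes are illustrative assumptions.

```python
def main_control(op: str) -> dict:
    """Return main control signals for a mnemonic such as 'R-type', 'lw', or 'j'."""
    def is_one_of(*names) -> int:
        return int(op in names)

    return {
        "RegDst":   is_one_of("R-type"),
        "RegWrite": 1 - is_one_of("sw", "beq", "bne", "j"),
        "ExtOp":    1 - is_one_of("andi", "ori", "xori"),
        "ALUSrc":   1 - is_one_of("R-type", "beq", "bne"),
        "MemRead":  is_one_of("lw"),
        "MemWrite": is_one_of("sw"),
        "MemtoReg": is_one_of("lw"),
        "Beq":      is_one_of("beq"),
        "Bne":      is_one_of("bne"),
        "J":        is_one_of("j"),
    }

lw_signals = main_control("lw")
print(lw_signals)
```

For `lw` this yields RegDst = 0, RegWrite = 1, ExtOp = 1, ALUSrc = 1, MemRead = 1, MemWrite = 0, and MemtoReg = 1, matching the `lw` row of the table.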




#### ALU Control Truth Table

| Op<sup>6</sup> (input) | ALUOp (input) | funct<sup>6</sup> (input) | ALUCtrl (output) | 4-bit Encoding |
|--------|--------|-------|---------|----------|
| R-type | R-type | add   | ADD     | 0000     |
| R-type | R-type | sub   | SUB     | 0010     |
| R-type | R-type | and   | AND     | 0100     |
| R-type | R-type | or    | OR      | 0101     |
| R-type | R-type | xor   | XOR     | 0110     |
| R-type | R-type | slt   | SLT     | 1010     |
| addi   | ADD    | X     | ADD     | 0000     |
| slti   | SLT    | X     | SLT     | 1010     |
| andi   | AND    | X     | AND     | 0100     |
| ori    | OR     | X     | OR      | 0101     |
| xori   | XOR    | X     | XOR     | 0110     |
| lw     | ADD    | X     | ADD     | 0000     |
| sw     | ADD    | X     | ADD     | 0000     |
| beq    | SUB    | X     | SUB     | 0010     |
| bne    | SUB    | X     | SUB     | 0010     |
| j      | X      | X     | X       | X        |

The 4-bit encoding for ALUCtrl is chosen here to be equal to the last 4 bits of the function field.

Other binary encodings are also possible. The idea is to choose a binary encoding that will minimize the logic for ALU Control.
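Because the encoding equals the last 4 bits of the function field, the ALU controller reduces to a pass-through for R-type instructions and a small lookup otherwise. The Python sketch below illustrates this; the names `ALUCTRL` and `alu_control`, and representing ALUOp as a string, are assumptions made for the example.

```python
# 4-bit ALUCtrl encodings from the truth table.
ALUCTRL = {
    "ADD": 0b0000, "SUB": 0b0010, "AND": 0b0100,
    "OR":  0b0101, "XOR": 0b0110, "SLT": 0b1010,
}

def alu_control(alu_op: str, funct: int = 0) -> int:
    """Derive ALUCtrl from ALUOp and, for R-type instructions, the funct field."""
    if alu_op == "R-type":
        return funct & 0xF      # last 4 bits of the function field
    return ALUCTRL[alu_op]      # operation fixed by the main controller

print(bin(alu_control("R-type", 0x2A)), bin(alu_control("SUB")))
```

The R-type path does not validate the function code, so a funct value outside the supported subset would pass through as an unsupported ALUCtrl value.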


#### Next ...

- Designing a Processor: Step-by-Step
- Datapath Components and Clocking
- Assembling an Adequate Datapath
- Controlling the Execution of Instructions
- The Main Controller and ALU Controller
- Drawback of the single-cycle processor design




# Multicycle Implementation

- Break instruction execution into five steps
  - ♦ Instruction fetch
  - ♦ Instruction decode and register read
  - ♦ Execution, memory address calculation, or branch completion
  - ♦ Memory access or ALU instruction completion
  - ♦ Load instruction completion
- One step = One clock cycle (clock cycle is reduced)
  - ♦ First 2 steps are the same for all instructions

| Instruction | # cycles | Instruction | # cycles |
|-------------|----------|-------------|----------|
| ALU & Store | 4        | Branch      | 3        |
| Load        | 5        | Jump        | 2        |


### Performance Example

- Assume the following operation times for components:
  - ♦ Instruction and data memories: 200 ps
  - ♦ ALU and adders: 180 ps
  - ♦ Decode and Register file access (read or write): 150 ps
  - ♦ Ignore the delays in PC, mux, extender, and wires
- Which of the following would be faster and by how much?
  - ♦ Single-cycle implementation for all instructions
  - ♦ Multicycle implementation optimized for every class of instructions
- ❖ Assume the following instruction mix:
  - ♦ 40% ALU, 20% Loads, 10% stores, 20% branches, & 10% jumps


#### Solution

| Instruction Class | Instruction Memory | Register Read                    | ALU Operation | Data Memory | Register Write | Total  |
|-------------------|--------------------|----------------------------------|---------------|-------------|----------------|--------|
| ALU               | 200                | 150                              | 180           |             | 150            | 680 ps |
| Load              | 200                | 150                              | 180           | 200         | 150            | 880 ps |
| Store             | 200                | 150                              | 180           | 200         |                | 730 ps |
| Branch            | 200                | 150                              | 180           |             |                | 530 ps |
| Jump              | 200                | 150 (decode and update PC)       |               |             |                | 300 ps |

- For fixed single-cycle implementation:
  - ♦ Clock cycle = 880 ps determined by longest delay (load instruction)
- For multi-cycle implementation:
  - ♦ Clock cycle = max (200, 150, 180) = 200 ps (maximum delay at any step)
  - ♦ Average CPI = 0.4×4 + 0.2×5 + 0.1×4 + 0.2×3 + 0.1×2 = 3.8
- ❖ Speedup = 880 ps / (3.8 × 200 ps) = 880 / 760 = 1.16
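The arithmetic in this solution can be reproduced with a short Python script. The stage delays, cycle counts, and instruction mix are the ones stated in the example; the variable names are illustrative, and the jump class is left out of the single-cycle maximum because the load path is the longest regardless.

```python
# Stage delays in ps per instruction class, taken from the solution table.
stage_delays = {
    "ALU":    [200, 150, 180, 150],
    "Load":   [200, 150, 180, 200, 150],
    "Store":  [200, 150, 180, 200],
    "Branch": [200, 150, 180],
}
# Single-cycle clock is set by the slowest instruction class.
single_cycle_clock = max(sum(delays) for delays in stage_delays.values())

# Multicycle clock is set by the slowest single step.
multicycle_clock = max(200, 150, 180)

mix_percent = {"ALU": 40, "Load": 20, "Store": 10, "Branch": 20, "Jump": 10}
cycles = {"ALU": 4, "Load": 5, "Store": 4, "Branch": 3, "Jump": 2}
average_cpi = sum(mix_percent[c] * cycles[c] for c in mix_percent) / 100

# Time per instruction: single-cycle uses CPI = 1.
speedup = single_cycle_clock / (average_cpi * multicycle_clock)
print(single_cycle_clock, multicycle_clock, average_cpi, round(speedup, 2))
```

Changing `mix_percent` shows how sensitive the result is to the share of loads, the only class that needs all five steps.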




# Worst Case Timing - Cont'd

- Long cycle time: must be long enough for Load operation
  - PC's Clk-to-Q
  - + Instruction Memory's Access Time
  - + Maximum of (Register File's Access Time, Delay through control logic + extender + ALU mux)
  - + ALU to Perform a 32-bit Add
  - + Data Memory Access Time
  - + Delay through MemtoReg Mux
  - + Setup Time for Register File Write + Clock Skew
- Cycle time is longer than needed for other instructions
  - ♦ Therefore, single cycle processor design is not used in practice


# Summary

- 5 steps to design a processor
  - ♦ Analyze instruction set => datapath requirements
  - ♦ Select datapath components & establish clocking methodology
  - ♦ Assemble datapath meeting the requirements
  - ♦ Analyze implementation of each instruction to determine control signals
  - ♦ Assemble the control logic
- MIPS makes Control easier
  - ♦ Instructions are of same size
  - ♦ Source registers always in same place
  - ♦ Immediates are of same size and same location
  - ♦ Operations are always on registers/immediates
- ❖ Single cycle datapath => CPI=1, but Long Clock Cycle
