Multiple Cycle CPU
As its name implies, the multiple cycle CPU requires multiple cycles to execute a single instruction. This means that our CPI (cycles per instruction) will be greater than 1.
The big advantage of the multi-cycle design is that we can use more or fewer cycles to execute each instruction, depending on the complexity of the instruction. For example, we can take five cycles to execute a load instruction, but just three cycles to execute a branch instruction. The big disadvantage of the multi-cycle design is increased complexity: control is now a finite state machine, where before it was just combinational logic.
Another important difference between the single-cycle design and the multi-cycle design is the cycle time. In the single-cycle processor, the cycle time was determined by the slowest instruction. In the multi-cycle design, the cycle time is determined by the slowest functional unit (memory, registers, ALU). This greatly reduces our cycle time.
This outline describes what happens on each cycle in our multi-cycle CPU. All the events described in each numbered item take place in one clock cycle.
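To see how per-instruction cycle counts turn into an average CPI, here is a minimal sketch in Python. The load and branch counts come from the text; the store and arithmetic counts (four cycles each) follow from the step outline in these notes; the instruction mix is a made-up assumption used only for illustration.

```python
# Cycles per instruction class in the multi-cycle design.
# load = 5 and branch = 3 are from the text; store and arithmetic = 4
# follow from the step outline (both finish in the memory step).
CYCLES = {"load": 5, "store": 4, "arithmetic": 4, "branch": 3}

# Hypothetical instruction mix (fraction of executed instructions).
# These numbers are an assumption, not measurements.
MIX = {"load": 0.25, "store": 0.10, "arithmetic": 0.50, "branch": 0.15}


def average_cpi(cycles, mix):
    """Return the weighted average number of cycles per instruction."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "mix must sum to 1"
    return sum(cycles[kind] * fraction for kind, fraction in mix.items())


cpi = average_cpi(CYCLES, MIX)
print(f"average CPI = {cpi:.2f}")
```

With a different mix the average moves between 3 and 5, which is why the CPI of this design is always greater than 1.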
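The cycle-time difference can be made concrete with a small Python sketch. The latencies below are assumed values chosen for illustration, and the sketch also assumes that the load is the slowest instruction in the single-cycle design.

```python
# Hypothetical functional-unit latencies in picoseconds (assumed values).
LATENCY_PS = {"memory": 200, "registers": 100, "alu": 200}

# Units a load passes through in a single-cycle datapath:
# instruction fetch, register read, ALU, data memory, register write.
LOAD_PATH = ["memory", "registers", "alu", "memory", "registers"]

# Single-cycle: the clock must cover the slowest instruction end to end.
single_cycle_ps = sum(LATENCY_PS[unit] for unit in LOAD_PATH)

# Multi-cycle: the clock only has to cover the slowest functional unit.
multi_cycle_ps = max(LATENCY_PS.values())

print(f"single-cycle clock period: {single_cycle_ps} ps")
print(f"multi-cycle clock period:  {multi_cycle_ps} ps")

# Time per instruction = cycles used * clock period.
for name, cycles in [("load", 5), ("branch", 3)]:
    print(f"{name}: {cycles * multi_cycle_ps} ps multi-cycle, "
          f"{single_cycle_ps} ps single-cycle")
```

Under these assumed numbers the branch gets faster, while the load actually takes longer than in the single-cycle design, because every step is rounded up to a full cycle. Whether the multi-cycle design wins overall depends on the instruction mix.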
1) Instruction fetch: load IR with the instruction at PC, and load PC with PC + 4. If it's not a branch instruction, PC + 4 will be used to fetch the next instruction. If it's a branch instruction, taken or not taken is resolved two clock cycles later (after the ALU). In the third cycle, either PC + 4 or the branch address is selected as the address of the next instruction fetch.
2) Instruction decode, read registers: parse the instruction, load registers A and B with values from the register file, and load ALUOut with the target address of the branch.
3) Execute: if executing a load or store, perform the effective address computation and put the result in ALUOut. For arithmetic instructions, load ALUOut with the result of the appropriate computation. For a beq instruction, compare registers A and B; if they are equal (A - B is zero), load PC with the value in ALUOut. If executing a beq, we are done: return to step 1.
4) Memory: if executing a load, load MDR with the data at address ALUOut. If executing a store, write the data in register B into memory at address ALUOut. If executing an arithmetic instruction, write the value in ALUOut into the register file. If executing a store or an arithmetic instruction, we are done: return to step 1.
5) Writeback: if we are here, we are executing a load instruction. Write the value in MDR into the register file, and return to step 1.
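The steps above can be sketched as a small finite state machine in Python. This only models which step follows which for each instruction class, not the datapath itself; the state names and the class labels ("load", "store", "arithmetic", "branch") are names chosen for this sketch.

```python
def next_state(state, kind):
    """Return the control state that follows `state` for instruction class `kind`."""
    if state == "fetch":
        return "decode"
    if state == "decode":
        return "execute"
    if state == "execute":
        # A branch is finished after the execute step.
        return "fetch" if kind == "branch" else "memory"
    if state == "memory":
        # Only a load needs the writeback step.
        return "writeback" if kind == "load" else "fetch"
    if state == "writeback":
        return "fetch"
    raise ValueError(f"unknown state: {state}")


def cycles_for(kind):
    """Count the clock cycles one instruction takes before control returns to fetch."""
    state, cycles = "fetch", 0
    while True:
        cycles += 1  # each state occupies exactly one clock cycle
        state = next_state(state, kind)
        if state == "fetch":
            return cycles


for kind in ("load", "store", "arithmetic", "branch"):
    print(kind, cycles_for(kind))
```

Walking the state machine reproduces the cycle counts from the outline: five for a load, four for a store or arithmetic instruction, and three for a branch.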
Ideally, an n-stage pipeline improves performance by a factor of n over a non-pipelined design. In practice this is rarely achieved, because:
1) Uneven pipeline stages: we can hardly split the task into stages with the same execution time, and the clock must fit the slowest stage.
2) Hazards: stalls mean we can't always keep the pipeline full.
3) Latch overhead: the pipeline registers between stages add delay to every cycle.
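A rough way to see how these three effects eat into the ideal factor of n is the following Python sketch. It is a simplified steady-state model, not a simulator; the stage times, latch overhead, and stall rate are assumed values, and the function name `pipeline_speedup` is made up for this example.

```python
def pipeline_speedup(stage_times_ps, latch_overhead_ps=0.0, stalls_per_instr=0.0):
    """Estimate the steady-state speedup of a pipeline over the unpipelined design.

    Unpipelined time per instruction is the sum of the stage times.
    The pipelined clock period is the slowest stage plus the latch overhead,
    and each instruction costs (1 + average stall cycles) clock periods.
    """
    unpipelined = sum(stage_times_ps)
    period = max(stage_times_ps) + latch_overhead_ps
    pipelined = period * (1.0 + stalls_per_instr)
    return unpipelined / pipelined


# All numbers below are hypothetical.
ideal = pipeline_speedup([200, 200, 200, 200, 200])
uneven = pipeline_speedup([200, 100, 200, 200, 100])
with_latch = pipeline_speedup([200, 100, 200, 200, 100], latch_overhead_ps=20)
with_stalls = pipeline_speedup([200, 100, 200, 200, 100],
                               latch_overhead_ps=20, stalls_per_instr=0.25)

print(f"ideal 5-stage:   {ideal:.2f}x")
print(f"uneven stages:   {uneven:.2f}x")
print(f"+ latch overhead: {with_latch:.2f}x")
print(f"+ hazard stalls:  {with_stalls:.2f}x")
```

Each effect lowers the speedup a little further below the ideal value of 5 for a five-stage pipeline.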
http://www.cs.umass.edu/~weems/CmpSci535/Discussion14.html — instruction fetch
http://cseweb.ucsd.edu/~j2lau/cs141/week4.html — pipeline & multiple cycle