Instruction Sets
Schematic diagram of a simple computer
Execution Cycle
FetchAn instruction, stored in the memory, is fetched into the control unit by supplying the memory with the address of the instruction.
Decode
The control unit decodes the instruction in order to find the sequence of operation necessary to execute it.
Memory
Any data necessary for an instruction is fetched from the memory by the control unit and stored in the datapath.
Execute
The operation is performed within the datapath.
Write
The result of the operation is possibly written to the memory.
Example Instruction
ADD [0],[1],[2] ; Add the contents of memory location 1 to location 2 and put the result in location 0.Execution Sequence
1. Address of the instruction is sent to memory with a READ control signal2. Instruction is fetched into control unit
3. Instruction is decoded.
4. Address 1 sent to memory with READ signal
5. Data fetched from memory location 1 into the datapath
6. Address 2 sent to memory with READ signal
7. Data fetched from memory location 2 into the datapath
8. ADD operation is sent to ALU in datapath
9. Address 0 sent to memory with WRITE signal
10. Result sent from datapath to memory and written to location 0.
What is an instruction set
Machine Language is very specific to a certain type of CPU. Each CPU will have a certain repertoire of instructions that it can decode and execute. This is called the instruction set of the CPU or the Instruction Set Architecture [ISA].Information which must be present in an instruction
Operatione.g. ADD, SUBTRACT etc.
Where to get the data
Memory addresses, or the data may be in
the datapath already, or no data may be needed. e.g. HALT instruction.
Where to put the result
A memory address or perhaps temporarily within the datapath.
Where to find the next instruction
An address or, to allow decisions to be made, a condition and a choice of addresses.
Parts of an Instruction
OpcodeThe operation itself is usually represented by a code called the opcode [for OPeration CODE]
Operands
All the
other parts of an instruction are called operands.
Addresses
Some of the operands may be the actual addresses of data in memory.
Example
ADD AX,[102]
- The opcode is ADD
- There are two operands, AX and [102]
- This instruction contains one address - 102
Format used to represent an instruction in Assembly Language
There is no standard format for the description of instructions, sometimes even for the same CPU. The opcode will almost certainly come first; however some Assembly Languages have the destination operand first and some have it last.Types of instruction
Machines with a single type of instructionIn order to put all the information necessary into a single instruction it must have the following format
Opcode | Destination | Source1 | Source2 | Condition | Next/True | Next/False |
Code | Address | Address | Address | Code | Address | Address |
Example
SUB [0],[1],[2],NZ,4,5
Means
Subtract the value stored in location 2 from that in location 1 and store it in location 0. If the result is not zero [NZ] then get the next instruction from location 4 otherwise get it from location 5.
Disadvantages of using a single type of instruction
In practice the codes in an instruction [opcode and condition] may be fairly small e.g. 2..8
bits. However, if the instruction is to be able to reference large quantities of data then the addresses must be large e.g. 16..32 bits. If the above instruction were to use 6 bits for the opcode, 4 bits for the condition code and 16 bits for each address then it would have to be 90 bits long.
Some very early computers had instructions such as this, but not modern machines.
Three address machines
Not all instructions need to change the order of instruction execution, so Next/False can be a default:Assume that Next/False is the next sequential instruction
We now need a special storage location inside the CPU to store the default address of the next instruction. This is called the program counter [PC], the instruction pointer [IP], or the instruction address register [IAR]. We will use PC as the name for this register.
Assume most instructions will be sequential
Split the instruction into two types
Type 1 will be a three address instruction [ALU operations], type 2 will be a one address instruction [Control instructions].
Type 1 Instructions: ALU operations
Opcode | Destination | Source1 | Source2 |
Code | Address | Address | Address |
Opcode | Condition | Next/True |
Code | Code | Address |
Add Flags to the Datapath
There must now be a way of holding information about the previous operation. This is usually done by having a special set of single bit storage locations inside the datapath called the flags. The flags are set by certain instructions, e.g. the SUBTRACT operation may set a flag called the Zero Flag if the result of the subtraction is
zero. This can be used to find out if two numbers are equal [If A=B then A-B=0].
Example
In order to perform the same operation we need two instructions:
SUB [0],[1],[2]
JNZ 4
Instruction which would have been at address 5 goes here
Advantages and Disadvantages of 3 address machines
Now to perform the same operation we need two instructions, so the machine may be slower, however such operations are rare and so the speed decrease will be offset
by the fact that programs are now shorter.
Two Address Machines
Three address instructions are still very long, but they may be made shorter:Assume that the destination is the same as one of the sources
This is often the case, but where it is not, an extra operation is necessary -MOV.
Define an instruction called MOV [Transfer instruction]
If an opcode called MOV is defined, which moves its source to its destination, then any operation may be
performed.
All the operations are between operands which reference memory, these are called Memory-Memory Instructions
Type 1 Instructions: Memory-Memory ALU operations [including MOV]
Opcode | Destination/Source1 | Source2 |
Code | Address | Address |
Opcode | Condition | Next/True |
Code | Code | Address |
Now in order to perform the same operation we need three instructions:
MOV [0],[1]
SUB [0],[2]
JNZ 4
Advantages and Disadvantages of 2 address machines
The machine may be slower for this example but not in all cases. Programs are now even shorter.
One Address Machines
Two address instructions may be made even shorter:Allow an operand to be a code which represents a temporary location within the datapath.
These temporary locations are called registers. If there is only one it is usually called the accumulator. Registers are usually referred to by a symbolic name e.g. A,B,AX,R1,R2 etc.
Operations may now be from memory to a register [Memory-Register operations] or from a register to memory [Register-Memory operations].
Type 1 Instructions: Register-Memory ALU operations
Opcode | Destination/Source1 | Source2 |
Code | Address | Register |
Opcode | Condition | Next/True |
Code | Code | Address |
Opcode | Destination/Source1 | Source2 |
Code | Register | Address |
Opcode | Destination/Source1 | Source2 |
Code | Address | Register |
Now, in order to perform the same operation we need four instructions:
MOV A,[1]
SUB A,[2]
MOV [0],A
JNZ 4
Advantages and Disadvantages of 1 address machines
The machine will be slower in this case but not in all cases. Programs are now even shorter. The registers may be used for temporary results which are not needed immediately or for holding frequently used operands e.g. the end count in a "for" loop.
Zero Address Machines
No machine can have only instructions with no addresses. However it is possible have all operations except those which get data from memory and those which put data into the memory as zero address instructions. There are two common ways of achieving this.1. Allow all operands to be registers.
Register-Register operations and transfers are now possible.
Type 1 Instructions: Register-Register ALU operations
Opcode | Destination/Source1 | Source2 |
Code | Register | Register |
Opcode | Condition | Next/True |
Code | Code | Address |
LOAD | Destination | Source |
Code | Register | Address |
STORE | Destination | Source |
Code | Address | Register |
Now in order to perform the same operation we need five instructions:
Often this type of machine is referred to as a load/store machine because the only instructions with addresses are the load and store instructions.
Some machines allow three operand zero address instructions, this reduces the number of MOV instructions.
2. Assume all operations implicitly use a stack.
Define two instructions PUSH and POP which can be used to move the data on the top of the stack to and from the memory. ALU operations will always use the top two words on the stack for sources and put the result on the top of the stack.
Type 1 Instructions: Stack ALU operations
Type 2 Instructions: Control instructionsOpcode | Condition | Next/True |
Code | Code | Address |
Example
Now in order to perform the same operation we need five instructions:
PUSH [1]
PUSH [2]
SUB
POP [0]
JNZ 4
This is sometimes called a stack machine, and such machines do exist e.g. INMOS TRANSPUTER, but they are rare.
Advantages and Disadvantages of zero address machines
Programs may be shorter, but for the stack machine there
may be extra overhead incurred by having to manipulate the stack.
The use of a stack allows the storage of a large number of temporary results which may not fit in the registers of a machine.
A stack allows recursive functions to be defined.
Comparison of three, two, one and zero address machines.
A good way of looking at the effectiveness of these types of instruction, is to look at the number of memory accesses necessary for the evaluation of a simple expression. The memory accesses will be split into instruction fetches and data accesses [since data accesses are often slower].Example Expression
X=Y*[Y+Z]
Where X,Y and Z are stored in memory locations 0,1 and 2
Instructions | Instruction Fetches | Data Accesses |
Three address: ADD [0],[1],[2] MULT [0],[0],[1] | 2 | 6 |
Two Address: MOV [0],[1] ADD [0],[2] MULT [0],[1] | 3 | 6 |
One Address: MOV A,[1] ADD A,[2] MULT A,[2] MOV [0],A | 4 | 4 |
Load/Store: MOV A,[1] MOV B,[2] ADD A,B MULT A,B MOV [0],A | 5 | 3 |
Stack: LD [1] DUP LD [2] ADD MULT ST [0] | 6 | 3 |
In practice machines may allow a variety of different types of instruction. For example the Pentium is a one address machine, but it also has zero address register-register instructions, and some stack operations.
Summary
Instructions may be classified by the number of operands and the number of addresses which they use.Instructions are executed sequentially using a Program Counter to hold the address of the next instruction. Control instructions are used to change the program counter based on flags set by a previous instruction.
Transfer instructions may be used to move data without performing any operation on it.
Registers may be used to hold temporary results or frequently used operands.
Sometimes stack operations are useful for storing temporary data which may not fit in registers. A stack is also useful for implementing recursion.
Addressing modes
Direct AddressingSo far we have assumed that all references to memory are to explicitly stated memory locations. This way of referencing memory is called Direct addressing. It is possible to write programs which only use direct addressing, but such programs have a serious drawback - a program must be allowed to change its own code.
Example program using only
Direct Addressing
Imagine a machine with two byte instructions, where the first byte is the opcode and the second is an 8 bit address.
Machine Instruction Set
Operation | Code |
MOV A,[address] | 00 |
MOV [address],A | 01 |
ADD A,[address] | 02 |
SUB A,[address] | 03 |
JMPNZ address | 04 |
HALT | 05 |
Add one to each number in a table starting at a given address, with a given length.
Pseudocode
The following pseudo PASCAL program would perform the task [assume M is the memory and start and num are the start address in memory and the number of entries in the table]
procedure add_one[start:byte,num:byte];
var ctr,address:byte;
begin
ctr:=num;
address:=start;
repeat
M[address]:=M[address]+1;
address:=address+1;
ctr:=ctr-1
until ctr=0
end;
Assembly Language Program:
Assume that start and num have been previously stored in two memory locations.
The following program will perform add_one on the Direct Addressing machine.
Address | Opcode | Operand | Assembly Language | Pseudo Code |
00 | 00 | 22 | MOV A,[num] | |
02 | 01 | 23 | MOV [ctr],A | ctr:=num |
04 | 00 | 24 | MOV A,[start] | |
06 | 01 | 0B | MOV [get+1],A | address:=start |
08 | 01 | 0F | MOV [put+1],A | repeat |
0A | 00 | 00 | get: MOV A,[0] | temp:=M[address] |
0C | 02 | 25 | ADD A,[one] | temp:=temp+1 |
0E | 01 | 00 | put: MOV [0],A | M[address]:=temp |
10 | 00 | 0B | MOV A,[get+1] | temp:=address |
12 | 02 | 25 | ADD A,[one] | temp:=temp+1 |
14 | 01 | 0B | MOV [get+1],A | |
16 | 01 | 0F | MOV [put+1],A | address:=temp |
18 | 00 | 23 | MOV A,[ctr] | |
1A | 03 | 25 | SUB A,[one] | ctr:=ctr-1 |
1C | 01 | 23 | MOV [ctr],A | |
1E | 04 | 0A | JMP NZ,get | until ctr=0 |
20 | 05 | 00 | HALT | stop |
22 | 0A | 00 | num: DB 10 ctr: DB 0 | byte num byte ctr |
24 | 64 | 01 | start: DB 100 one: DB 1 | byte start |
Comments on this program
The temporary variable 'address' is used to refer to the location in memory which is to be changed. This variable must be within the program because the only way to reference memory is through a direct address and so it is stored at get+1. Unfortunately our program uses 'address' twice, once at get+1 and once at put+1. Thus if 'address' is changed then both get+1 and put+1 must also be changed. This creates a confusing program.
Often in the program a constant is needed, e.g. for ctr:=ctr-1 the constant 1 is necessary. This constant must be stored in a memory location, such a constant is called a literal.
New addressing modes
Changes in the addressing of memory which would improve the program.If a register were allowed to hold an address, then the program would not need to change itself. This type of addressing is called Register Indirect addressing.
If a constant could be part of an instruction then literals would not be necessary. This is called Immediate Addressing.
New Instruction Set With Register Indirect and Immediate Addressing.
Operation | Code | Addressing Mode |
MOV A,[address] | 00 | Direct |
MOV [address],A | 01 | Direct |
ADD A,[address] | 02 | Direct |
SUB A,[address] | 03 | Direct |
JMPNZ address | 04 | |
HALT | 05 | |
MOV B,[address] | 06 | Direct |
MOV A,[B] | 07 | Register Indirect |
ADD A,n | 08 | Immediate |
ADD B,n | 09 | Immediate |
MOV [B},A | 0A | Register Indirect |
SUB A,n | 0B | Immediate |
New Assembly Language Program
Address | Opcode | Operand | Assembly Language | Pseudo Code |
00 | 00 | 18 | MOV A,[num] | |
02 | 01 | 19 | MOV [ctr],A | ctr:=num |
04 | 06 | 1A | MOV B,[start] | address:=start |
06 | 07 | 00 | get: MOV A,[B] | repeat temp:=M[address] |
08 | 08 | 01 | ADD A,1 | temp:=temp+1 |
0A | 0A | 00 | MOV [B],A | M[address]:=temp |
0C | 09 | 01 | ADD B,1 | address:=address+1 |
0E | 00 | 19 | MOV A,[ctr] | |
10 | 0B | 01 | SUB A,1 | ctr:=ctr-1 |
12 | 01 | 19 | MOV [ctr],A | |
14 | 04 | 06 | JMP NZ,get | until ctr=0 |
16 | 05 | 00 | HALT | stop |
18 | 0A | 00 | num: DB 10 ctr: DB 0 | byte num byte ctr |
1A | 64 | start: DB 100 | byte start |
Description of Common Addressing Modes
Register Addressinge.g. ADD A,B Adds the value in register B to the value in register A and puts the result in register A.
Both operands in this instruction use Register addressing, the value referenced is held in a register within the datapath.
This addressing mode is used for access to temporary or commonly used variables.
e.g. ADD A,23 Adds 23 on to the value in A register.
The value 23 is part of the instruction and is called an immediate.
This addressing mode is used for constants.
* 75% to 80% fit within 16 bits
Displacement Addressing
e.g. ADD A,[B+16] Uses the value in the B register+16 as the address of a value which is added to the A register.
This mode has two common uses. [note that the value may be a signed integer]
1. Accessing local variables
- Upon entry to a procedure a section of memory may be allocated for local variables. If
the register holds the address of the start of this memory, then the constant may be used to access different local variables.
- If the constant in the instruction is the address of a table which has a fixed location in memory, then the value in the register may be used as an index into the table.
Average of 5 programs from SPECint92 and Average of 5 programs from SPECfp92
X-axis is in powers of 2
1% of addresses > 16-bits Register Indirect Addressing
e.g. ADD A,[B] Add the value at the address held the B register to the value in the A register.
This mode is used to allow pointers to data held in memory.
e.g. ADD A,[B+C] Use B+C as the address of a value to add to the A register.
This mode is often used to access the elements of a table or array where B is the address of the start of the table and C is an index into the table.
e.g. ADD A,[200] Add the value at address 200 to the A register.
Used for access to global variables which will have a fixed location in memory.
Memory Indirect Addressing
e.g. ADD A,@[200] Use the value at memory address 200 as the address of a value to add to the A register.
This mode has the same uses as Register Indirect and may be found mainly on older machines.
This is a combination of indexed and displacement, it may be used for accessing array elements.
Auto-increment and Auto-decrement Addressing
Some machines have addressing modes which automatically add or subtract a constant from the register specified in Register Indirect Addressing. This may occur before or after the register is used as an address.
These modes may be used for stepping through the elements of a table or array, and for performing stack operations where none are defined.
Scaled
Sometimes the elements of an array are not single machine words. In this case an index into the array must be multiplied by the size of an element. Some machines have an addressing mode which will perform this scaling.
Relative [or PC relative]
This mode is the same as indexed or displacement where the register used is the Program Counter. It is often used to specify the destination of certain JUMP instructions. The destination of a JUMP instruction is most likely to be
near the current instruction, so if relative addressing is used then the size in memory of the destination address may be reduced.
Summary of Addressing Modes
Name | Example | Operation | Usage |
Register | ADD R4,R3 | R4:=R4+R3 | Temporary Variables |
Immediate | ADD R4,3 | R4:=R4+3 | Constants |
Displacement [Based] | ADD R4,[R1+100] | R4:=R4+M[R1+100] | Local Variables |
Register Indirect | ADD R4,[R1] | R4:=R4+M[R1] | Pointers |
Indexed | ADD R4,[R1+R2] | R4:=R4+M[R1+R2] | Arrays |
Indexed with displacement | ADD R4,[R1+R2+8] | R4:=R4+M[R1+R2+8] | Arrays |
Direct | ADD R4,[100] | R4:=R4+M[100] | Global Variables |
Memory Indirect | ADD R4,@[100] | R4:=R4+M[M[100]] | Pointers |
Auto-increment | ADD R4,[R2]+ | R4:=R4+M[R2] R2:=R2+d | Stepping through an Array or Stack ops. |
Auto-decrement | ADD R4,-[R2] | R2:=R2-d R4:=R4+M[R2] | Stack ops |
Scaled | ADD R1,[R2+R3*4] | R4=:R4+M[R2+R3*4] | Arrays of structures |
Relative | ADD R1,[R2+PC] | R4:=R4+M[R2+PC] | Static Local Variables |
Data Size
So far we have assumed that there is only one type of data, in reality machines have to deal with bytes, halfwords, words and double words.
Some Instruction sets allow every instruction to work with every data size, others only allow word accesses and perhaps byte accesses.
Instruction Coding
Instructions may be encoded so that they are variable in size [e.g. pentium] or fixed [PowerPC]. Variable sized instructions allow a program to take up less memory but the CPU must be more complex to deal with the variable length. Hybrid coding forces an instruction to be one of a few fixed lengths.