Instructions
Computer components
The system bus connects the CPU, memory, and I/O devices.
The key design concepts of Von Neumann architecture are:
- Data and instructions are stored in a single read-write memory.
- The contents of memory are addressable by location.
- Execution occurs in sequential fashion.
Common terms:
- Byte: 8 bits.
- Word: the unit of memory organization, e.g. a 32-bit word.
- Registers: a small amount of fast storage inside the CPU.
- Buffer: temporary storage for data.
Short | Name | Holds |
---|---|---|
PC | Program Counter | Address of next instruction |
IR | Instruction Register | Current instruction |
MAR | Memory Address Register | Address of memory location |
MBR | Memory Buffer Register | Data to be written to memory or data read from memory |
I/O AR | I/O Address Register | Address of I/O device |
I/O BR | I/O Buffer Register | Data to be written to/read from I/O device |
Instruction set
Instructions:
- Are encoded as 32-bit (4-byte) binary words. An instruction can be one word or multiple words long.
- Can be written in hexadecimal (when hand-assembled), or more readably in assembly language.
- Are made up of an operation code (opcode) and some parameters (operands).
Instructions fall into the following types:
- Data transfer
- Arithmetic
- Logical
- Control flow
- Input / Output
- Data conversion
Arithmetic operations treat operands as numbers and must take the sign of the operands into account.
Logical operations treat operands as bit patterns.
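This short Python sketch applies both kinds of operation to the same 32-bit patterns. Python serves only as a calculator here and is not part of the course ISA. The 32-bit width comes from the instruction set. The helper name `as_signed` is made up for illustration.

```python
MASK = 0xFFFFFFFF  # 32-bit words, as in the instruction set


def as_signed(word: int) -> int:
    """Interpret a 32-bit pattern as a two's complement number."""
    return word - (1 << 32) if word & 0x80000000 else word


a = 0xFFFFFFFF
b = 0x00000001

# Arithmetic view: the same bits mean -1 (signed) or 4294967295 (unsigned),
# so a comparison such as "a < b" depends on how the sign is treated.
signed_less = as_signed(a) < as_signed(b)   # -1 < 1
unsigned_less = a < b                       # 4294967295 < 1
add_result = (a + b) & MASK                 # the sum wraps around in 32 bits

# Logical view: the operands are just bit patterns, with no sign involved.
and_result = a & 0x0000FF00
not_result = ~b & MASK
```

Only the arithmetic results depend on the signed or unsigned reading. `and_result` and `not_result` are the same bits either way.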
The following is our list of instructions, sorted by opcode:
Mnemonic | Assembled | Assembly Syntax | Description | Type |
---|---|---|---|---|
ADD | 00###### | ADD src1, src2, dst | Add src1 and src2, store in dst | Arithmetic |
SUB | 01###### | SUB src1, src2, dst | Subtract src2 from src1, store in dst | Arithmetic |
AND | 02###### | AND src1, src2, dst | Bitwise AND src1 and src2, store in dst | Logical |
OR | 03###### | OR src1, src2, dst | Bitwise OR src1 and src2, store in dst | Logical |
NOT | 04##00## | NOT src1, dst | Bitwise NOT src1, store in dst | Logical |
MOV | 05##00## | MOV src1, dst | Copy src1 to dst | Data transfer |
LD | 0600ff## 000000$$ | LD adr, dst | Load value from adr to dst | Data transfer |
ST | 07##ff00 000000$$ | ST src1, adr | Store value from src1 to adr | Data transfer |
BR | 0800ff00 000000$$ | BR label | Branch to label unconditionally | Control flow |
BZ | 0801ff00 000000$$ | BZ label | Branch if Zero | Control flow |
BNZ | 0802ff00 000000$$ | BNZ label | Branch if Not Zero | Control flow |
HALT | 09000000 | HLT | Stops the execution | Control flow |
PUSH | 0A##0000 | PUSH r | Push r to stack (temp store value) | Data transfer |
POP | 0B0000## | POP r | Pop from stack to r (restore value) | Data transfer |
CALL | 0C00ff00 000000$$ | CALL label | Call function at label | Control flow |
RET | 0D000000 | RET | Return from function | Control flow |
This table is useful during assembly programming and manual assembly.
This section involves hand-assembling an assembly program.
A simplified overview for completing the assignments:
- `src`, `r` and `dst` are registers. In assembly syntax they are written `R1`, `R2`, etc. The assembled equivalent is the register number (e.g. assembly syntax `R1` = assembled `01`, which replaces `##`).
- `adr` is the address of a memory location. In assembly syntax it is written `P1`, `P2`, etc. The assembled equivalent is the address of that memory location (e.g. for assembly syntax `P1`, replace `$$` with its address).
- `label` is also an address, but in assembly syntax it is written as a label.
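This is a minimal Python sketch of the rules above, for checking hand-assembled answers. The templates are copied from the Assembled column of the table.

It assumes the following:
- The `##` fields are filled left to right in the order the register operands appear in the assembly syntax. This puts `dst` in the last byte, as the LD and POP templates show.
- The `$$` field takes the address looked up in a `symbols` dictionary that you supply.

The function name `assemble` and the `symbols` parameter are hypothetical. This field order follows the table, so it differs from the textbook-style ADD example below, which reads the first operand field as the destination.

```python
import re

# Encoding templates from the instruction table.
# "##" is a register number and "$$" an address, each two hex digits.
TEMPLATES = {
    "ADD": "00######", "SUB": "01######", "AND": "02######", "OR": "03######",
    "NOT": "04##00##", "MOV": "05##00##",
    "LD": "0600ff## 000000$$", "ST": "07##ff00 000000$$",
    "BR": "0800ff00 000000$$", "BZ": "0801ff00 000000$$",
    "BNZ": "0802ff00 000000$$",
    "HLT": "09000000", "HALT": "09000000",
    "PUSH": "0A##0000", "POP": "0B0000##",
    "CALL": "0C00ff00 000000$$", "RET": "0D000000",
}


def assemble(line: str, symbols: dict[str, int]) -> list[str]:
    """Hand-assemble one instruction into a list of hex words.

    symbols maps address names (P1, P2, ...) and labels to addresses.
    """
    mnemonic, _, rest = line.strip().partition(" ")
    out = TEMPLATES[mnemonic.upper()]  # instructions are case insensitive
    for operand in (op.strip() for op in rest.split(",") if op.strip()):
        reg = re.fullmatch(r"[Rr](\d+)", operand)
        if reg:
            # Register operand: fill the next "##" field with its number.
            field, value = "##", f"{int(reg.group(1)):02X}"
        else:
            # Anything else is an address name or label: fill "$$".
            field, value = "$$", f"{symbols[operand]:02X}"
        if field not in out:
            raise ValueError(f"too many operands in {line!r}")
        out = out.replace(field, value, 1)
    if "#" in out or "$" in out:
        raise ValueError(f"missing operands in {line!r}")
    return out.split()
```

For example, `assemble("LD P1, R4", {"P1": 0x20})` gives the two words `0600ff04` and `00000020`.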
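In a program, each instruction word is 4 bytes long and addresses start at 0000H. So, counting lines from 0, the word on line n is at address 4n, or 4(n - 1) when counting lines from 1, as in the BNZ example below. A multi-word instruction takes one line per word. Refer to the examples below.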
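The address rule can be sketched in Python as the first pass of an assembler. It walks the program once, records where each instruction starts, and notes the address of every label. The sketch makes these assumptions:
- Addresses are byte addresses starting at 0, with 4 bytes per word.
- LD, ST, BR, BZ, BNZ and CALL take two words, since their Assembled column shows two words.
- Every other instruction takes one word.

The name `instruction_addresses` and the `(label, instruction)` input format are hypothetical.

```python
from typing import Optional

# Instructions whose "Assembled" column has two words.
TWO_WORD = {"LD", "ST", "BR", "BZ", "BNZ", "CALL"}
WORD_BYTES = 4


def instruction_addresses(
    program: list[tuple[Optional[str], str]],
) -> tuple[list[int], dict[str, int]]:
    """Return the start address of each instruction and the address of each label."""
    starts: list[int] = []
    labels: dict[str, int] = {}
    address = 0
    for label, instruction in program:
        if label is not None:
            labels[label] = address
        starts.append(address)
        mnemonic = instruction.split()[0].upper()
        words = 2 if mnemonic in TWO_WORD else 1
        address += words * WORD_BYTES  # the next instruction starts after this one
    return starts, labels


program = [
    ("loop", "SUB R1, R2, R1"),  # line 1: address 0
    (None, "BNZ loop"),          # lines 2-3: addresses 4 and 8
    (None, "HLT"),               # line 4: address 12
]
starts, labels = instruction_addresses(program)
```

Here `labels["loop"]` is 0, which is the value a hand assembler would put in the `$$` field of the BNZ.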
There are other ways to specify values for `src`, `r`, `dst` and `adr` in assembly; these are described under addressing modes.
Example ADD
0000 0001 0010 0011
represents an addition (0000) of the numbers in memory locations 2 (0010) and 3 (0011), storing the result in memory location 1 (0001).
The instruction may also be written in hex, with each field widened to one byte: 0x00 01 02 03.
Example BNZ
Consider the following set of instructions:
The `BNZ` instruction on lines 2 and 3 (recall that `BNZ` takes two words) loops back to line 1, at address 0000H.
Addressing modes are the ways an instruction can specify where its operand is. Using them can reduce the size of program code, because addresses do not have to be calculated explicitly, but they make the hardware more complicated. Which addressing modes a program can use depends on the hardware.
Mode | Notation | Explanation | Advantages | Disadvantages |
---|---|---|---|---|
Immediate | MOV #5, | Value 5 specified directly in the instruction | No memory reference | Limited operand magnitude |
Direct | MOV 10, | Value at address 10 | Large operand magnitude | Limited address space |
Indirect | MOV (10), | Value at the address stored at address 10 | Large address space | Multiple memory ref. |
Register | MOV R, | Value in register R | No memory reference | Limited address space |
Register Indirect | MOV (R), | Value at the address held in R | Large address space | Extra memory ref. |
Displacement | MOV 2(R), | Value at the address held in R plus an offset of 2 | Flexibility | Complexity |
Stack | PUSH R1, <> and POP <>, R1 | <> is the implicit operand at the top of the stack | No memory reference | Limited applicability |
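The rows of the table can be compared by writing out how each mode finds its operand value. This Python sketch does that for every mode except stack. It models memory as a dictionary from address to value and registers as a dictionary from name to value; both models are simplifications for illustration. The function name `operand_value` is hypothetical. The number of nested `memory[...]` lookups in each branch matches the memory references the table lists.

```python
def operand_value(mode: str, field, memory: dict[int, int],
                  regs: dict[str, int], disp: int = 0) -> int:
    """Return the operand value for a given addressing mode."""
    if mode == "immediate":          # MOV #5, ...   no memory reference
        return field
    if mode == "direct":             # MOV 10, ...   one memory reference
        return memory[field]
    if mode == "indirect":           # MOV (10), ... two memory references
        return memory[memory[field]]
    if mode == "register":           # MOV R, ...    no memory reference
        return regs[field]
    if mode == "register_indirect":  # MOV (R), ...  one memory reference
        return memory[regs[field]]
    if mode == "displacement":       # MOV 2(R), ... address = R + offset
        return memory[regs[field] + disp]
    raise ValueError(f"unknown addressing mode: {mode}")


memory = {7: 5, 9: 42, 10: 20, 20: 99}
regs = {"R": 7}
```

With this data, `operand_value("indirect", 10, memory, regs)` reads address 10, finds 20, and then returns the 99 stored at address 20.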
More on instruction set
Instructions typically have three, two, one, or zero operands (addresses). Symbolically, they are represented as:
No. of operands | Assembly representation | Interpretation |
---|---|---|
3 | OP A,B,C | A ← B OP C |
2 | OP A,B | A ← A OP B |
1 | OP A | AC ← AC OP A |
0 | OP | T ← (T-1) OP T |
where AC is the accumulator, T is the top of the stack, and T-1 is the element just below the top of the stack.
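The 1-address and 0-address forms are the least familiar, so here is a sketch of both in Python, each evaluating X = (A + B) × C. The accumulator machine applies AC ← AC OP A. The stack machine applies T ← (T-1) OP T. The mnemonics LOAD, STORE and MUL, and the function names, are hypothetical and are not in the course's instruction table.

```python
import operator

OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def run_zero_address(program, memory):
    """0-address (stack) machine: T <- (T-1) OP T."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(memory[args[0]])
        elif op == "POP":
            memory[args[0]] = stack.pop()
        else:
            t = stack.pop()      # top of stack (T)
            t_1 = stack.pop()    # element below it (T-1)
            stack.append(OPS[op](t_1, t))
    return memory


def run_one_address(program, memory):
    """1-address (accumulator) machine: AC <- AC OP A."""
    ac = 0
    for op, addr in program:
        if op == "LOAD":
            ac = memory[addr]
        elif op == "STORE":
            memory[addr] = ac
        else:
            ac = OPS[op](ac, memory[addr])
    return memory


stack_program = [("PUSH", "A"), ("PUSH", "B"), ("ADD",),
                 ("PUSH", "C"), ("MUL",), ("POP", "X")]
accumulator_program = [("LOAD", "A"), ("ADD", "B"), ("MUL", "C"), ("STORE", "X")]
```

The stack version needs more instructions, but its arithmetic instructions carry no address at all. The accumulator version names one address per instruction.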
A procedure consists of multiple instructions that are executed in sequence, and within a procedure an instruction can call another procedure. For the CPU to know where to return once the called procedure is done, the return addresses must be stored, and a stack is used for this. The latest return address is at the top of the stack, and the CPU pops it when it reaches a return instruction.
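This Python sketch traces nested calls to show why a stack fits this job: the most recent return address is always the first one needed. It assumes one instruction per program slot (indices rather than byte addresses). CALL pushes the index of the next slot and RET pops it. NOP is a hypothetical filler instruction that just advances the PC.

```python
def trace_calls(program: list[tuple[str, int]]) -> list[int]:
    """Return the sequence of program slots executed until HLT."""
    pc = 0
    return_stack: list[int] = []
    trace: list[int] = []
    while True:
        op, target = program[pc]
        trace.append(pc)
        if op == "CALL":
            return_stack.append(pc + 1)  # remember where to come back to
            pc = target
        elif op == "RET":
            pc = return_stack.pop()      # latest return address is on top
        elif op == "HLT":
            return trace
        else:
            pc += 1                      # any other instruction runs in sequence


program = [
    ("CALL", 3), ("NOP", 0), ("HLT", 0),  # main program
    ("CALL", 5), ("RET", 0),              # procedure at 3 calls procedure at 5
    ("RET", 0),                           # procedure at 5
]
```

The trace is 0, 3, 5, 4, 1, 2. The inner RET at slot 5 returns to slot 4 first, and only then does the outer RET return to slot 1.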
There are two kinds of data types: (1) numeric (integer, floating point) and (2) non-numeric (character, binary data). Their lengths are typically 8, 16, 32, or 64 bits.
The MIPS architecture, a reduced instruction set computer (RISC) architecture, has 9 basic data types: signed and unsigned bytes, signed and unsigned half-words, signed and unsigned words, double words, single-precision floating point (32 bits), and double-precision floating point (64 bits).
The ARM architecture supports data types of (1) byte (8 bits), (2) half-word (16 bits), and (3) word (32 bits). Each can be interpreted as an unsigned (nonnegative) integer or as a two's complement signed integer. Many ARM implementations do not provide floating-point hardware; on those, floating-point arithmetic must be emulated in software.
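To see what the unsigned and two's complement readings mean in practice, this Python sketch computes the value range of each ARM integer size. It also reads one stored byte both ways using the standard `int.from_bytes`. The helper name `ranges` is made up for illustration.

```python
def ranges(bits: int) -> dict[str, tuple[int, int]]:
    """Smallest and largest value for an integer of the given width."""
    return {
        "unsigned": (0, 2**bits - 1),
        "twos_complement": (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1),
    }


SIZES = {"byte": 8, "half-word": 16, "word": 32}

for name, bits in SIZES.items():
    r = ranges(bits)
    print(f"{name:9} unsigned {r['unsigned']}  two's complement {r['twos_complement']}")

# The same stored byte 0xFF read both ways:
raw = bytes([0xFF])
print(int.from_bytes(raw, "little", signed=False))  # 255
print(int.from_bytes(raw, "little", signed=True))   # -1
```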
Assembly language programming
Assembly language is a low-level programming language that is specific to an instruction set architecture (ISA). Our course focuses on an ISA with the following conventions:
- Comments are preceded by `#`
- Destination operands come last (rightmost) in the operand list
- Instructions are case insensitive
Each line of assembly language consists of:
- `label`: an optional label that can be used to refer to the instruction later
- `mnemonic`: either an instruction mnemonic or an assembler directive
- `operandx`: the operand(s) for the operation
- `comment`: an optional comment that describes the instruction
You can find the list of mnemonics in the instruction set.
An assembler directive is a command to the assembler, not an instruction to the CPU. Directives start with a `.` and are not executed by the CPU:
Directive | Description |
---|---|
.data | Adds the subsequent data to the data segment |
.text | Adds the subsequent code to the text (code) segment |
.global NAME | Makes the label NAME visible to other modules |
.space <EXPRESSION> | Reserves <EXPRESSION> bytes of space, filled with 0s |
.word value1 [, value2, ...] | Put the values in successive memory locations, each occupying 4 bytes |
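Hand-assembled programs often need the address of each data label, so here is a small Python sketch that computes offsets from `.word` and `.space` directives. It assumes the data segment starts at offset 0 and that no alignment padding is inserted between items. The `(label, directive, arguments)` input format and the function name `layout_data` are hypothetical.

```python
from typing import Optional


def layout_data(directives: list[tuple[Optional[str], str, list[int]]]) -> tuple[dict[str, int], int]:
    """Return each data label's offset and the total segment size in bytes."""
    offsets: dict[str, int] = {}
    offset = 0
    for label, directive, args in directives:
        if label is not None:
            offsets[label] = offset
        if directive == ".word":
            offset += 4 * len(args)  # each value occupies 4 bytes
        elif directive == ".space":
            offset += args[0]        # reserve that many zero bytes
        else:
            raise ValueError(f"directive not handled in this sketch: {directive}")
    return offsets, offset


data = [
    ("arr", ".word", [1, 2, 3]),  # 12 bytes at offset 0
    ("buf", ".space", [8]),       # 8 bytes at offset 12
    ("n", ".word", [5]),          # 4 bytes at offset 20
]
offsets, size = layout_data(data)
```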
Flow control
Use `call` to call a function and `ret` to return from it. Unlike in high-level languages, you must manage the parameters and result of the function yourself:
- Specify the input and output parameter registers in the function's comment
- Use `push` and `pop` to save and restore registers temporarily
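This Python sketch shows the save/restore convention. The hypothetical function `double_r0` takes its input and returns its output in R0 and uses R1 as scratch. It saves and restores R1 so the caller's value survives. Registers are modelled as a dictionary and the stack as a list. Each comment shows the equivalent line in the course's syntax (destination last).

```python
def double_r0(regs: dict[str, int], stack: list[int]) -> None:
    """Input: R0. Output: R0 doubled. Other registers are preserved."""
    stack.append(regs["R1"])              # PUSH R1      save the caller's R1
    regs["R1"] = regs["R0"]               # MOV R0, R1   use R1 as scratch
    regs["R0"] = regs["R0"] + regs["R1"]  # ADD R0, R1, R0
    regs["R1"] = stack.pop()              # POP R1       restore the caller's R1


regs = {"R0": 21, "R1": 7}
stack: list[int] = []
double_r0(regs, stack)
```

After the call, R0 holds 42, R1 is back to 7, and the stack is empty again. Leaving the stack balanced matters: a RET would otherwise pop the wrong value as its return address.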
Execution cycle
The cycle is as follows:
Instruction fetch: the address of the next instruction is held in the PC register, which is incremented automatically during execution.
For a two-word instruction, the first word is fetched first, then PC is incremented by one word to point to the second word. The second word is then fetched, and PC is incremented again to point to the next instruction.
Instruction decode: the control unit handles:
- Decoding the instruction
- Setting up CPU components such as the ALU at the right time
Operand fetch: if the operands are in registers, data are moved from the registers to the ALU.
If the operands are in memory, the instruction is a two-word instruction. Note that after the second word is fetched, MBR holds the contents of that word, which is the operand's address, not the operand itself. Therefore, another memory read is needed to fetch the operand: the address is copied from MBR to MAR (MAR ← MBR), and memory is read again (MBR ← memory[MAR]).
Execute: the ALU performs the operation specified by the instruction, and the result is stored in a temporary register.
Operand store: as with operand fetch, if the destination is a register, a register file (RF) write is performed. If the destination is in memory, the operand address is calculated first, and then the data is written to memory.
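This is a minimal Python sketch of the cycle for four instructions from the table: LD, ADD, ST and HLT. Each step is labelled with the registers the description uses (PC, MAR, MBR, IR). The sketch makes these assumptions:
- Addresses are byte addresses, with one 32-bit word per 4 bytes.
- Fields are laid out as read off the Assembled column: opcode first, then src1, src2 and dst one byte each.
- A Python list stands in for the register file.

Branches, flags and the remaining opcodes are left out.

```python
WORD = 4  # bytes per word


def run(memory: dict[int, int], pc: int = 0) -> list[int]:
    """Execute until HLT; memory maps byte addresses to 32-bit words."""
    regs = [0] * 256  # register numbers are one byte in the encoding
    while True:
        # Instruction fetch: MAR <- PC, MBR <- memory[MAR], IR <- MBR
        mar = pc
        mbr = memory[mar]
        ir = mbr
        pc += WORD
        # Decode: opcode plus three one-byte fields
        opcode = (ir >> 24) & 0xFF
        f1, f2, f3 = (ir >> 16) & 0xFF, (ir >> 8) & 0xFF, ir & 0xFF
        if opcode == 0x09:  # HLT
            return regs
        if opcode == 0x00:  # ADD src1, src2, dst: operands are in registers
            regs[f3] = (regs[f1] + regs[f2]) & 0xFFFFFFFF  # execute + RF write
        elif opcode in (0x06, 0x07):  # LD adr, dst / ST src1, adr (two words)
            mar = pc
            mbr = memory[mar]  # MBR now holds the operand's address
            pc += WORD
            mar = mbr          # MAR <- MBR for another memory access
            if opcode == 0x06:
                mbr = memory[mar]  # extra read fetches the operand
                regs[f3] = mbr     # register file write
            else:
                mbr = regs[f1]
                memory[mar] = mbr  # memory write
        else:
            raise NotImplementedError(f"opcode {opcode:02X} not modelled in this sketch")


# LD P1, R1 / LD P2, R2 / ADD R1, R2, R3 / ST R3, P3 / HLT, with P1-P3 at 0x20-0x28
memory = {
    0x00: 0x0600FF01, 0x04: 0x00000020,
    0x08: 0x0600FF02, 0x0C: 0x00000024,
    0x10: 0x00010203,
    0x14: 0x0703FF00, 0x18: 0x00000028,
    0x1C: 0x09000000,
    0x20: 7, 0x24: 5, 0x28: 0,
}
regs = run(memory)
```

After the run, R3 and memory location 0x28 both hold 12. Each LD and ST advances PC by two words, which shows the two-word fetch described above.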
Interrupts are important because:
- They improve efficiency: the CPU can do other work instead of waiting for slow I/O devices.
- When I/O data arrives, it may need immediate attention or it may be lost, e.g. incoming data from a network.
- Other programs may also need the CPU's attention, e.g. on a time-sharing system.
When an interrupt is required, the I/O device sends a signal to the CPU. The CPU saves the current state of the program (such as the PC), then jumps to the interrupt handler to serve the interrupt. Afterwards, it returns to the original program and continues execution as if nothing had happened.
Interrupt handlers can be either hardware or software.
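The save, jump and return sequence can be traced with a small Python sketch that checks for a pending interrupt at the end of every instruction cycle. The sketch makes these assumptions:
- There is one instruction per program slot.
- NOP and IRET (return from interrupt) are hypothetical mnemonics that are not in the table.
- The saved state is just the PC.
- A second interrupt is ignored while the handler is running.

```python
def run_with_interrupt(program: list[str], handler: int, interrupt_after: int) -> list[int]:
    """Return the sequence of program slots executed until HLT."""
    pc = 0
    saved: list[int] = []  # saved program state (here only the PC)
    trace: list[int] = []
    pending = False        # interrupt signal from the I/O device
    in_handler = False     # further interrupts are ignored while handling one
    steps = 0
    while True:
        op = program[pc]   # fetch
        trace.append(pc)
        pc += 1
        if op == "HLT":
            return trace
        if op == "IRET":
            pc = saved.pop()  # restore state and resume the interrupted program
            in_handler = False
        steps += 1
        if steps == interrupt_after:
            pending = True    # simulate the device raising its signal
        # Interrupt check at the end of each instruction cycle
        if pending and not in_handler:
            saved.append(pc)  # remember where the program was
            pc = handler      # jump to the interrupt handler
            pending = False
            in_handler = True


program = ["NOP", "NOP", "NOP", "HLT",  # main program in slots 0-3
           "NOP", "IRET"]               # handler in slots 4-5
```

With `handler=4` and `interrupt_after=2`, the trace is 0, 1, 4, 5, 2, 3. The main program is interrupted after slot 1, the handler runs, and execution resumes at slot 2 as if nothing had happened.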