Memory
Memory hierarchy
Memory comes with varying attributes: capacity, cost, access time, and access frequency. The memory hierarchy is a way to organize memory in a computer system to optimize these attributes.
Type of memory | Usage | Category |
---|---|---|
Registers | Instructions | Inbound memory |
On-chip cache | Inside CPU | Inbound memory |
Cache memory | Motherboard | Inbound memory |
Main memory | RAM | Inbound memory |
Secondary storage | Hard disk, SSD | Outbound / Off-line storage |
Going from top to bottom, the memory hierarchy has the following trends:
- Capacity increases
- Cost per bit decreases
- Access time increases
- Access frequency (by the CPU) decreases
Programs do not access all memory locations uniformly. Some memory locations have a higher tendency to be accessed than others.
There are two types of locality:
- Temporal locality: If a memory location is accessed, it is likely to be accessed again in the near future.
- Spatial locality: If a memory location is accessed, it is likely that nearby memory locations will be accessed in the near future.
Example of locality
Consider the following program:
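A minimal C sketch of such a program (assuming, as in the description below, that `s` accumulates a sum over an array `arr`):

```c
#include <stdio.h>

#define N 1000

int main(void) {
    int arr[N];
    int s = 0;                 /* s is touched on every iteration (temporal locality) */

    for (int i = 0; i < N; i++)
        arr[i] = i;

    for (int i = 0; i < N; i++)
        s += arr[i];           /* arr is walked sequentially (spatial locality) */

    printf("%d\n", s);
    return 0;
}
```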
The memory at `s` is accessed in every loop iteration (temporal), while each element of `arr` is accessed sequentially (spatial).
Assuming memory is addressed in units of 8 bits (1 byte), multi-byte data is stored differently in memory on different architectures.
- Big endian: Stored from most to least significant byte (left to right).
- Little endian: Stored from least to most significant byte (right to left).
Example storage
To store `0x11223344` in memory:
- Big endian: `0x11 0x22 0x33 0x44`
- Little endian: `0x44 0x33 0x22 0x11`
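As an illustration, here is a small C sketch (assuming `uint32_t` is 4 bytes on the target machine) that reveals the byte order by inspecting how `0x11223344` is laid out in memory:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t value = 0x11223344;
    const unsigned char *bytes = (const unsigned char *)&value;  /* view the raw bytes in memory */

    /* Prints "11 22 33 44" on a big-endian machine and "44 33 22 11" on a little-endian one. */
    for (int i = 0; i < 4; i++)
        printf("%02x ", bytes[i]);
    printf("\n");
    return 0;
}
```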
In modern systems, the CPU reads from cache memory. The CPU reads **word by word**, while data is transferred in blocks (usually 4 KB) containing multiple words.
Access method | How data is accessed |
---|---|
Sequential | In order, from beginning to end |
Random | In any order, with constant latency |
Associative | Data is accessed by content, not by address. Used in cache memory & Content-Addressable Memory (CAM) |
Internal memory
The following table compares the different types of internal memory:
Memory Type | Category | Erasure | Write Mechanism | Volatility |
---|---|---|---|---|
RAM | Read-write | Electrically, byte-level | Electrically | Volatile |
ROM | Read-only | Not possible | Masks | Nonvolatile |
PROM | Read-only | Not possible | Electrically | Nonvolatile |
EPROM | Read-mostly | UV light, chip-level | Electrically | Nonvolatile |
EEPROM | Read-mostly | Electrically, byte-level | Electrically | Nonvolatile |
Flash | Read-mostly | Electrically, block-level | Electrically | Nonvolatile |
Flash memory is primarily used in USB drives and SSDs. It has faster write times than EEPROM, but a limited number of write cycles.
Characteristic | Dynamic RAM | Static RAM |
---|---|---|
Storage Technology | Stores electric charge on capacitors (one access transistor per cell). | Uses logic gates (flip-flop latches). |
Refreshing | Required (every few ms due to leaking charge) | Not required |
Speed | Slower (delay due to capacitance) | Faster |
Usage | Main memory | Cache memory |
Cost | Cheap | Very expensive |
We are interested in the following key metrics:
- Access time: Time to read/write data at an addressed location
- Transfer time: Time to transfer the data in or out of memory
- Memory cycle time: Access time + transfer time
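For example, with hypothetical figures of a 10 ns access time and a 5 ns transfer time, the memory cycle time would be $10 + 5 = 15$ ns.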
Errors may arise from various sources such as voltage spikes (lightning), electromagnetic interference (cosmic rays), power supply problems, etc.
Extra bits are required to detect errors. (These are usually added in secondary storage devices, but not in main memory, as it is volatile and less likely to have errors.)
Common error detection and correction methods include:
- Parity Bit: Extra bit added to data to make the number of 1s even/odd. (Detects only an odd number of bit errors; cannot be used to correct errors. See the sketch after this list.)
- Hamming Code
- Repetition Code
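As a sketch of the parity-bit idea (the helper below is hypothetical, using even parity over a single byte), the extra bit is chosen so that the total number of 1s, parity bit included, is even:

```c
#include <stdio.h>
#include <stdint.h>

/* Even parity: return the parity bit that makes the total number of 1s even. */
static int even_parity_bit(uint8_t data) {
    int ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (data >> i) & 1;   /* count the 1 bits in the data byte */
    return ones % 2;               /* 1 if the count is odd, so the total becomes even */
}

int main(void) {
    uint8_t data = 0xB5;           /* 1011 0101 -> five 1s */
    printf("parity bit for 0x%02X: %d\n", data, even_parity_bit(data));  /* prints 1 */
    return 0;
}
```

If an odd number of bits flips, recomputing the parity exposes the error, but the check cannot tell which bit flipped.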
Cache memory
Cache memory is hidden from software and managed by hardware. It stores copies of frequently accessed data to speed up future accesses to them.
Consider a word-addressable main memory, with each address of length $n$ bits (giving $2^n$ addressable words). By organizing the memory into blocks (groups) of $k$ words, we have a total of $2^n / k$ blocks.
Generally, a cache will contain $m$ lines (groups), each holding one of the blocks, where $m \ll 2^n / k$.
Each cache line contains a tag (identifier) and the $k$ words of a block.
Tag | Block | | |
---|---|---|---|
0 | $2^n/k$ | ⋯ | $2^n - 2^n/k$ |
1 | $2^n/k + 1$ | ⋯ | $2^n - 2^n/k + 1$ |
⋮ | ⋮ | ⋱ | ⋮ |
$2^n/k - 1$ | $2 \times 2^n/k - 1$ | ⋯ | $2^n - 1$ |
←Block length ($k$ words)→ | | | |

Tag | Word | | |
---|---|---|---|
0 | $m$ | ⋯ | $(k-1) \cdot m$ |
1 | $m + 1$ | ⋯ | $(k-1) \cdot m + 1$ |
⋮ | ⋮ | ⋱ | ⋮ |
$m - 1$ | $2m - 1$ | ⋯ | $k \cdot m - 1$ |
←Word length ($k$ words)→ | | | |
The following are common replacement algorithms:
- Random replacement: Randomly select a block to replace.
- Least Recently Used (LRU): Replace the block that has not been used for the longest time (see the sketch after this list).
- Least Frequently Used (LFU): Replace the block that has been used the least number of times.
- First In First Out (FIFO): Replace the block that has been in the cache the longest.
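As an illustration of LRU replacement, here is a minimal C sketch (a hypothetical software model, not tied to any real cache hardware) that keeps a last-used timestamp per line and evicts the least recently used one on a miss:

```c
#include <stdio.h>

#define NUM_LINES 4

typedef struct {
    int valid;
    int tag;
    unsigned long last_used;   /* logical timestamp of the most recent access */
} CacheLine;

static CacheLine cache[NUM_LINES];
static unsigned long now;

/* Returns 1 on a hit, 0 on a miss (after placing the new tag in the LRU line). */
static int access_cache(int tag) {
    now++;

    for (int i = 0; i < NUM_LINES; i++) {
        if (cache[i].valid && cache[i].tag == tag) {
            cache[i].last_used = now;          /* hit: refresh the timestamp */
            return 1;
        }
    }

    /* Miss: use an empty line if one exists, otherwise evict the least recently used. */
    int victim = 0;
    for (int i = 0; i < NUM_LINES; i++) {
        if (!cache[i].valid) { victim = i; break; }
        if (cache[i].last_used < cache[victim].last_used)
            victim = i;
    }
    cache[victim] = (CacheLine){ .valid = 1, .tag = tag, .last_used = now };
    return 0;
}

int main(void) {
    int refs[] = {1, 2, 3, 4, 1, 5};           /* the access to 5 evicts tag 2, the LRU entry */
    for (int i = 0; i < 6; i++)
        printf("tag %d: %s\n", refs[i], access_cache(refs[i]) ? "hit" : "miss");
    return 0;
}
```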
Address mapping
Address mapping interprets the address of a word in memory and maps it to a cache block.
The cache memory splits the given memory address into 3 parts: Tag, Number ($s$ bits), and Offset.
The specific number of bits $s$ used for Number differs for each cache mapping method.
Offset is used to identify, from left to right, the index of the word within the block.
Mapping Type | Value of $s$ | Usage of Number | Usage of Tag | Adv. | Disadv. |
---|---|---|---|---|---|
Direct | $2^s$ = no. of cache lines | Determines the cache line directly | Ensures consistency (confirms the right block is cached) | Fast | Multiple memory blocks can map to the same cache line |
Fully Associative | – | – | Matched against all cache lines in parallel | Flexible: a block can be placed in any line | Heavy processing (parallel comparison) |
Set Associative | $2^s$ = no. of cache sets | Determines the cache set | Matched within the set in parallel | Compromise between the above two | |
For set associative mapping, an $n$-way set associative cache divides the cache into (no. of cache lines) / $n$ sets, where each set has $n$ blocks.
Example set associative
Given address in binary: `1100 1001 0001 0010 1111`, cache size: 1024 words, block size: 128 words, word size: 4 bytes. For a two-way set associative mapping, what is the tag, set number and offset?
Block size $= 128 \times 4 = 512$ bytes, so the offset takes $\log_2 512 = 9$ bits.
Number of blocks in cache $= 1024 / 128 = 8$, number of sets $= 8 / 2 = 4$, so the set number takes $\log_2 4 = 2$ bits.
The rest is the tag. Therefore, from left to right, the tag is `1100 1001 0`, the set number `00`, and the offset `1 0010 1111`.
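A small C sketch of the same breakdown (the constants mirror the example above and are otherwise arbitrary), extracting the offset, set number and tag with shifts and masks:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    const int offset_bits = 9;   /* 512-byte blocks -> 9-bit offset     */
    const int set_bits    = 2;   /* 4 sets          -> 2-bit set number */

    uint32_t address = 0xC912F;  /* 1100 1001 0001 0010 1111 in binary  */

    uint32_t offset = address & ((1u << offset_bits) - 1);
    uint32_t set    = (address >> offset_bits) & ((1u << set_bits) - 1);
    uint32_t tag    = address >> (offset_bits + set_bits);

    /* Prints tag = 0x192 (1100 1001 0), set = 0 (00), offset = 0x12f (1 0010 1111). */
    printf("tag = 0x%x, set = %u, offset = 0x%x\n", tag, set, offset);
    return 0;
}
```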
Performance
Cache miss penalty ($T_m$): Time to transfer the value from main memory to the cache (in nanoseconds)
Cache hit time ($T_h$): Time to access the cache
Cache hit rate ($h$): Fraction of memory accesses that hit in the cache
Average memory access time: $T_a = h \times T_h + (1 - h)(T_h + T_m) = T_h + (1 - h) \times T_m$
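For example, with hypothetical figures of $T_h = 1$ ns, $T_m = 50$ ns and $h = 0.95$, the average memory access time is $1 + (1 - 0.95) \times 50 = 3.5$ ns.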
- Split Cache: The cache is divided into an instruction cache and a data cache. The size of each cache is fixed. No pipeline hazard between instruction fetches and data accesses. The main trend is to use split caches.
- Unified Cache: A single cache is used for both instructions and data. Instructions and data are automatically balanced. Has a contention problem under parallel and pipelined execution that imposes a bottleneck on performance.