Data Structures
The purpose of a program is to process data efficiently. A data structure is a way to group and store data in a computer. Each data structure has pros and cons which make it suitable for different scenarios.
We use Big-O notation to describe the best, average and worst case time complexity of an algorithm.
Overview of data structures
A Linear Data Structure is one in which each item relates only to the item in front of it and the item behind it.
A Non-linear Data Structure is one in which each item can relate to multiple other items.
An abstract data type (ADT) defines the operations that can be performed on a data structure, without specifying the implementation details.
A concrete data type (CDT) is an implementation of an ADT.
The following is an overview of the ADTs that are discussed in the course:
- Core-Implementations
- Arrays
- Nodes / linked list
- Structure types
- Linear-Structures
- Linked list
- Stack
- Queues
- Nonlinear-Structures
- Trees
- Graphs
- Hash tables
Core implementations
The following ADTs can be used to implement other data structures that will be introduced in the following sections.
Arrays
A set number of items stored in adjacent memory locations. Identified by the first item.
- Access: O(1), simple access by index.
- Insertion & deletion: O(n), slow, as it involves shifting the surrounding items and creating a new array.
A node is an item which can be connected to other nodes by edges. In order to access an item in a node-based data structure, we have to traverse the nodes by following the edges.
Edges can be directional (one-way access) or non-directional (two-way access), and can have weights (values).
Linked list
Nodes where each node is connected to the next node. Identified by the first node (head).
- Access: O(n), slow, as we have to traverse the list from the start.
- Insertion & deletion: O(1), fast, as we only need to change the pointers of the neighbouring nodes.
We can reverse a linked list in O(n) time by iterating through the list and reversing the direction of the edges using three pointers:
- Make current node point to previous node
- Save previous node
- Move to next node
- Repeat
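The three-pointer reversal above can be sketched in Python; the `Node` class is a minimal assumed representation of a singly linked list node:

```python
class Node:
    """A minimal singly linked list node (assumed representation)."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def reverse(head):
    """Reverse a linked list in O(n) time using three pointers."""
    prev = None
    current = head
    while current is not None:
        nxt = current.next       # save the next node
        current.next = prev      # make current node point to previous node
        prev = current           # save previous node
        current = nxt            # move to next node
    return prev                  # prev is the new head

# Build 1 -> 2 -> 3, then reverse it to 3 -> 2 -> 1
head = Node(1, Node(2, Node(3)))
new_head = reverse(head)
values = []
while new_head:
    values.append(new_head.value)
    new_head = new_head.next
print(values)  # [3, 2, 1]
```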
We can find the middle node of a linked list in O(n) time by using two pointers:
- Move one pointer one node at a time
- Move the other pointer two nodes at a time
- When the fast pointer reaches the end, the slow pointer will be at the middle
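A sketch of the slow/fast pointer technique, again using an assumed minimal `Node` class:

```python
class Node:
    """A minimal singly linked list node (assumed representation)."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def middle(head):
    """Find the middle node in O(n) time with slow/fast pointers."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next         # move one node at a time
        fast = fast.next.next    # move two nodes at a time
    return slow                  # middle node when fast reaches the end

head = Node(1, Node(2, Node(3, Node(4, Node(5)))))
mid = middle(head)
print(mid.value)  # 3
```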
Linear structures
Stacks and queues are concepts that extend from a linear list of items.
They have the following characteristics:
- Only one item can be accessed at a time.
- Items cannot be inserted or removed freely.
In a stack / queue, push / enqueue (in) means adding an item, and pop / dequeue (out) means removing an item. Only the out item can be accessed.
- Stack: Last In, First Out (LIFO).
- Queue: First In, First Out (FIFO).
Accessing the out item is O(1). Accessing / operating on other items is O(n).
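The two disciplines can be sketched with Python's built-in `list` (as a stack) and `collections.deque` (as a queue):

```python
from collections import deque

# Stack (LIFO): push and pop happen at the same end.
stack = []
stack.append('a')        # push
stack.append('b')        # push
top = stack.pop()        # pop: 'b', the last item in

# Queue (FIFO): enqueue at the back, dequeue at the front.
queue = deque()
queue.append('a')        # enqueue
queue.append('b')        # enqueue
front = queue.popleft()  # dequeue: 'a', the first item in

print(top, front)  # b a
```

`deque` is used for the queue because `list.pop(0)` would shift every remaining item, costing O(n) per dequeue.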
Priority queues are a special type of queue; more about them in the Heaps section.
Non-linear structures
Graph
Nodes which can connect to multiple other nodes (loops are allowed).
- Directed: Edges can only be traversed in one direction.
- Undirected: Edges can be traversed in both directions.
- Weighted: Each edge has a weight.
- Unweighted: Each edge has no weight.
- Number of nodes: |V|
- Number of edges: |E|
- Degree of node: Number of edges connected to the node.
- Maximum degree of graph: Maximum degree of all nodes.
Implementations:
Adjacency matrix
A |V| × |V| matrix. Row i lists the nodes that node i is connected to.
- Space: O(|V|²), better for dense graphs (more edges).
- Access (check): O(1) for determining if there is a connection between two nodes.
- Access (process): O(|V|²) for processing all edges of the graph.

Adjacency list
An array with |V| items; each item in the array is a list of nodes that the current node is connected to.
- Space: O(|V| + |E|), better for sparse graphs (fewer edges).
- Access (check): O(deg(v)) for determining if there is a connection between two nodes.
- Access (process): O(|V| + |E|), better for sparse graphs as |E| is much less than |V|² elements.

Traversal:
Breadth-first search (BFS) visits nodes in increasing order of distance from the starting node, in O(|V| + |E|) time:
- Queue the starting node and mark it as queued.
- Repeat until the queue is empty: visit the node at the front of the queue.
- Add unqueued adjacent nodes to the queue and mark them as queued.
Example implementation: Graph Bfs
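A sketch of those steps over an adjacency-list graph; the graph below is a made-up example:

```python
from collections import deque

def bfs(adj, start):
    """Breadth-first search over an adjacency-list graph.
    Visits nodes in increasing order of distance from start, in O(V + E) time."""
    order = []
    queued = {start}            # mark the starting node as queued
    queue = deque([start])
    while queue:
        node = queue.popleft()  # visit the node at the front of the queue
        order.append(node)
        for neighbour in adj[node]:
            if neighbour not in queued:   # enqueue unqueued adjacent nodes
                queued.add(neighbour)
                queue.append(neighbour)
    return order

# Hypothetical undirected graph: edges 0-1, 0-2, 1-3, 2-3
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(bfs(adj, 0))  # [0, 1, 2, 3]
```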
Heaps
A priority queue is a queue where each item has a priority, and the item with the highest priority is accessed / removed first. The most common implementation is the heap.
- Access: O(1) (size, peek)
- Insertion & removal: O(log n) (enqueue, dequeue)
A (max) heap is a binary tree where each node has a value greater than or equal to its children. The root node has the highest value.
- Max heap: The root node has the highest value.
- Min heap: The root node has the lowest value.
Though a tree-based data structure, we usually implement heaps as arrays, which is useful in cases such as heap sort. For a node at index i:
- Root at index 0
- Left child at index 2i + 1
- Right child at index 2i + 2
- Parent at index ⌊(i - 1) / 2⌋
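The index arithmetic (for a 0-indexed array with root at index 0) can be checked directly:

```python
def left(i):
    """Index of the left child of node i."""
    return 2 * i + 1

def right(i):
    """Index of the right child of node i."""
    return 2 * i + 2

def parent(i):
    """Index of the parent of node i (floor division)."""
    return (i - 1) // 2

# Node at index 4: children at indices 9 and 10, parent at index 1.
print(left(4), right(4), parent(4))  # 9 10 1
```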
We can build (max) heaps using heapify, which converts a binary tree into a heap, given that its children are already heaps. This takes O(log n) time.
- Find the largest among the root and its children.
- If the largest node is not the root node, swap the largest node and the root.
- Heapify the affected subtree recursively.
Due to the limitation of heapify, we must build the heap from the bottom up. Heapifying leaf nodes does nothing. Observe that the last non-leaf node is at index ⌊n/2⌋ - 1; we just have to heapify all nodes from that index down to the root to build the complete heap. As items further down the array are deeper in the heap:
- Iterate through the array from the last non-leaf node to the root node.
- Heapify the current node.
Building the complete heap this way takes O(n) time.
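The heapify and bottom-up build steps can be sketched as:

```python
def heapify(a, n, i):
    """Sift a[i] down so the subtree rooted at i becomes a max heap,
    assuming its children are already heaps. O(log n) time."""
    largest = i
    l, r = 2 * i + 1, 2 * i + 2
    if l < n and a[l] > a[largest]:      # find the largest of root and children
        largest = l
    if r < n and a[r] > a[largest]:
        largest = r
    if largest != i:                     # largest node is not the root:
        a[i], a[largest] = a[largest], a[i]  # swap it with the root
        heapify(a, n, largest)           # heapify the affected subtree

def build_heap(a):
    """Build a max heap bottom-up, starting from the last non-leaf
    node at index n // 2 - 1. O(n) time overall."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        heapify(a, n, i)

a = [3, 9, 2, 1, 4, 5]
build_heap(a)
print(a[0])  # 9, the root holds the maximum
```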
Heaps are especially useful for finding the largest and smallest elements. To find the largest k elements:
- Put first k elements in a min-heap.
- For each remaining element, peek the min-heap and check if the current element is greater than the peeked element. If so, remove the peeked element and add the current element to the min-heap.
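A sketch of the largest-k procedure using the standard library's `heapq` (a min-heap), which runs in O(n log k) time:

```python
import heapq

def k_largest(items, k):
    """Find the k largest elements using a size-k min heap. O(n log k)."""
    heap = list(items[:k])
    heapq.heapify(heap)               # put the first k elements in a min heap
    for x in items[k:]:
        if x > heap[0]:               # peek: element beats the current minimum
            heapq.heapreplace(heap, x)  # remove the minimum, add the element
    return sorted(heap, reverse=True)

print(k_largest([5, 1, 9, 3, 7, 6], 3))  # [9, 7, 6]
```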
We can also use heaps to merge sorted arrays.
- Put the first element of each array into a min-heap.
- Pop the min-heap to the result, and add the next element of the array that the popped element came from.
- Repeat until all elements are added.
Time complexity: O(n log k), where n is the total number of elements and k is the number of arrays. We insert n elements into the heap, and each insertion takes O(log k) time.
Space complexity: O(n + k), as the heap has at most k elements, and the result array has n elements.
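The merge procedure can be sketched with `heapq`, tagging each heap entry with its source array so we know where to fetch the next element:

```python
import heapq

def merge_sorted(arrays):
    """Merge k sorted arrays with a min heap of at most k elements.
    O(n log k) time for n total elements."""
    result = []
    # Put the first element of each (non-empty) array into the heap,
    # tagged with (array index, element index) to find its successor.
    heap = [(arr[0], i, 0) for i, arr in enumerate(arrays) if arr]
    heapq.heapify(heap)
    while heap:
        value, i, j = heapq.heappop(heap)  # pop the smallest to the result
        result.append(value)
        if j + 1 < len(arrays[i]):         # add the next element of that array
            heapq.heappush(heap, (arrays[i][j + 1], i, j + 1))
    return result

print(merge_sorted([[1, 4, 7], [2, 5], [3, 6, 8]]))
# [1, 2, 3, 4, 5, 6, 7, 8]
```

The standard library also provides `heapq.merge`, which does the same thing lazily.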
Trees
- Root: The top node of the tree.
- Internal node: A node with children.
- Leaf / external node: A node with no children.
- Siblings: Nodes with the same parent.
- Height: The number of edges from the root to the furthest leaf.
- Depth: The number of edges from the root to the current node.
- Sub tree: A tree that is a child of a node.
- Degree of node: The number of immediate children of the node.
- Degree of tree: The maximum degree of all nodes.
- (Proper) Ancestor: A node on the path from the current node up to the root. If not proper, a node can be an ancestor of itself.
- (Proper) Descendant: A node in the subtree rooted at the current node. If not proper, a node can be a descendant of itself.
Hash tables
Dynamic set of key-value pairs, where items are stored in an array with fixed size m. To access an item, the index is found by a hash function h(k).
A collision occurs when h(k1) = h(k2) for two different keys k1 ≠ k2. We deal with it by different methods.
- Access: O(1) average, O(n) worst case
- Insertion & deletion: O(1) average, O(n) worst case
Where h is the hash function and k is the key.
Note: Hash tables are a specific implementation of a dictionary. Each item in the hash table is called a bucket or a slot.
A hash function is a function that takes a key and returns an index in the range [0, m - 1], where m is the size of the table.
The most basic hash function is the division method: h(k) = k mod m.
A good value of m should be a prime number that is not close to a power of 2.
When h(k1) = h(k2) for k1 ≠ k2, we have a collision. We can resolve it by:
- Chaining: Store items in the same slot by appending them to the linked list in the slot.
- Open addressing: Store items in another slot. For open addressing, we have the following implementations; the slot where the item is stored is:
Method | Slot |
---|---|
Linear probing | (h(k) + i) mod m |
Quadratic probing | (h(k) + c1·i + c2·i²) mod m |
Double hashing | (h1(k) + i·h2(k)) mod m |
Where i is the number of collisions so far, and c1, c2 are constants.
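The probe sequences can be sketched as follows; the table size m = 7, the key 10, and the second hash h2(k) = 1 + (k mod (m - 1)) are illustrative assumptions (c1 = c2 = 1 for quadratic probing):

```python
def probe_sequence(h, k, m, method, h2=None, steps=4):
    """Slots inspected for key k under each collision-resolution method.
    h is the primary hash function; i counts collisions so far."""
    slots = []
    for i in range(steps):
        if method == 'linear':
            slots.append((h(k) + i) % m)
        elif method == 'quadratic':              # with c1 = c2 = 1
            slots.append((h(k) + i + i * i) % m)
        elif method == 'double':
            slots.append((h(k) + i * h2(k)) % m)
    return slots

m = 7                                # prime table size
h = lambda k: k % m                  # division method
h2 = lambda k: 1 + k % (m - 1)       # assumed second hash: never returns 0
print(probe_sequence(h, 10, m, 'linear'))      # [3, 4, 5, 6]
print(probe_sequence(h, 10, m, 'double', h2))  # [3, 1, 6, 4]
```

Note how double hashing steps through the table with a key-dependent stride, which is what breaks up the primary clustering described below.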
Primary clustering occurs in open-addressing hash tables that use linear probing for collision resolution, which can lead to clusters of occupied slots.
This can lead to longer search times for items that are stored in the same cluster.
General steps:
- For each slot, assume a collision and resolve until an empty slot is found, and count the steps (number of slots inspected). We usually count nil slots as well.
- Sum over all slots and divide by m.
Notes:
- For double hashing, we must consider the second hash function. In other resolution methods, we already assumed a collision so the first hash function does not matter. Refer to the example for detailed steps.
General steps:
- For each slot, count the number of items in the slot.
- Sum over all slots and divide by m.
For chaining:
- Unsuccessful search: C = 1 + α
- Successful search: C' = 1 + α/2
- Insertion: O(1)
Space: m·p + n·(e + p)
For open addressing:
- Unsuccessful search: C = 1 / (1 - α)
- Successful search: C' = (1/α) ln(1 / (1 - α))
- Insertion: same as unsuccessful search, 1 / (1 - α)
Space: m·e
Where C is the average number of slots inspected, C' is the average number of slots inspected for successful search, m is the size of the table, n is the number of items, p is the space for a pointer, e is the space for an element, and α = n/m is the load factor.