Data Science Coding Interview Guide

Manralai

21 min readAug 15, 2024

Topic to focus on for data science coding interviews Subscribe

Arrays
Linked Lists
Stacks
Queues
Trees
Graphs
Hashing
Heaps
Trie
Sets and Maps
Dynamic Programming
Disjoint Set (Union-Find)

For all new updates: Subscribe

1. Arrays

An array is a data structure that stores a collection of elements, each identified by an index or a key. Elements in an array are typically of the same data type and are stored in contiguous memory locations. The index allows for direct access to any element in the array. Arrays can be one-dimensional (a list), two-dimensional (a matrix), or multidimensional.

Space and Time Complexity

Space Complexity: The space complexity of an array is O(n), where n is the number of elements in the array. Each element occupies a constant amount of space, but the total space required is proportional to the number of elements.
Time Complexity: Accessing an element in an array by index has a time complexity of O(1), as it involves direct addressing. Insertion and deletion operations may have a time complexity of O(n) in the worst case, as they may require shifting elements to maintain contiguous storage.

↪ Interview Questions and Answers

1: Find the Maximum Sub-array Sum.

def max_subarray_sum(nums):
    max_sum = current_sum = nums[0]
    for num in nums[1:]:
        current_sum = max(num, current_sum + num)
        max_sum = max(max_sum, current_sum)
    return max_sum

The time complexity is O(n), where n is the length of the array.

2: Rotate an Array

def rotate_array(nums, k):
    n = len(nums)
    k = k % n
    nums[:] = nums[-k:] + nums[:-k]

The time complexity O(n), where n is the length of the array.

3: Implement an Array from Scratch

class MyArray:
  def __init__(self):
    self.length = 0
    self.data = {}

  def get(self, index):
    return self.data[index]

  def push(self, item):
    self.data[self.length] = item
    self.length += 1

  def pop(self):
    if self.length == 0:
      return None
    popped_item = self.data[self.length - 1]
    
    del self.data[self.length - 1]
    self.length -= 1
    return popped_item

The time complexity is O(1) for get, push, and pop operations.

4: Find the Single Number in an Array

def single_number(nums):
    result = 0
    for num in nums:
        result ^= num
    return result

The time complexity is O(n), where n is the length of the array.

For all new updates: Subscribe

2. Linked Lists

A linked list is a linear data structure consisting of a sequence of elements, where each element points to the next element in the sequence through a reference or a “link.” Unlike arrays, where elements are stored in contiguous memory locations, linked lists allow elements to be scattered in memory, connected by these links.

Each element, known as a “node,” comprises two parts: the actual data or payload, and a reference (pointer) to the next node in the sequence. In a doubly linked list, each node also contains a reference to the previous node.

This structure enables dynamic memory allocation, efficient insertions, and deletions at any position within the list, and facilitates the creation of various types of linked lists, including singly linked lists, doubly linked lists, and circular linked lists.

The flexibility of linked lists comes with a trade-off in terms of random access, as accessing an element at a specific index requires traversing the list from the beginning or end, resulting in a time complexity of O(n) for access operations.

However, the dynamic nature and ease of manipulation make linked lists valuable in scenarios where frequent insertions and deletions are common, and the size of the data set is not known in advance.

Types

Singly Linked List: Each element points to the next.
Doubly Linked List: Each element points to the next and the previous.
Circular Linked List: The last element points to the first.

Time Complexity

Access: O(n) — Traversal is necessary.
Insertion/Deletion (at an arbitrary position): O(1) — If the node is given; otherwise, O(n).
Insertion/Deletion (at the beginning): O(1) — Constant time as no shifting is required.

↪ Interview Questions & Answers

1: Reverse a Linked List

def reverse_linked_list(head):
    prev, current = None, head
    while current:
        next_node = current.next
        current.next = prev
        prev = current
        current = next_node
    return prev

Time Complexity is O(n) since the algorithm traverses the linked list once, visiting each node exactly once.

2: Find the Middle of a Linked List

def find_middle(head):
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
    return slow.value

Time Complexity is O(n) as the algorithm uses two pointers, one advancing twice as fast as the other. The slower pointer reaches the middle in one pass.

3: Merge Two Sorted Arrays

def merge_sorted_arrays(arr1, arr2):
    result = []
    i = j = 0
    while i < len(arr1) and j < len(arr2):
        if arr1[i] < arr2[j]:
            result.append(arr1[i])
            i += 1
        else:
            result.append(arr2[j])
            j += 1
    result.extend(arr1[i:])
    result.extend(arr2[j:])
    return result

Time Complexity is O(m + n) where m and n are the lengths of the two input arrays. The algorithm linearly merges the two arrays.

For all new updates: Subscribe

3. Stacks

A stack is a linear data structure that follows the Last In, First Out (LIFO) principle. It means that the last element added to the stack is the first one to be removed. A stack can be visualized as a collection of elements with two main operations: push, which adds an element to the top of the stack, and pop, which removes the top element.

Space and Time Complexity

Space Complexity: The space complexity of a stack is O(n), where n is the number of elements in the stack. In the worst case, the stack may need to store all elements.
Time Complexity: Both push and pop operations have a time complexity of O(1), assuming a well-implemented stack.

↪ Interview Questions and Answers

1: Implement a Stack Using an Array

class Stack:
    def __init__(self):
        self.items = []
    def push(self, item):
        self.items.append(item)
    def pop(self):
        if not self.is_empty():
            return self.items.pop()
        else:
            return None
    def is_empty(self):
        return len(self.items) == 0
# Example usage:
# stack = Stack()
# stack.push(1)
# stack.push(2)
# popped_item = stack.pop()
# Output: 2

Time Complexity: O(1) for both push and pop operations.

2: Implement a Min Stack

class MinStack:
    def __init__(self):
        self.stack = []
        self.min_stack = []
    def push(self, value):
        self.stack.append(value)
        if not self.min_stack or value <= self.min_stack[-1]:
            self.min_stack.append(value)
    def pop(self):
        if self.stack:
            popped_value = self.stack.pop()
            if popped_value == self.min_stack[-1]:
                self.min_stack.pop()
            return popped_value
        else:
            return None
    def get_min(self):
        return self.min_stack[-1] if self.min_stack else None
# Example usage:
# min_stack = MinStack()
# min_stack.push(3)
# min_stack.push(1)
# min_value = min_stack.get_min()
# Output: 1

Time Complexity: O(1) for push, pop, and get_min operations.

3: Check for Balanced Parentheses

def is_balanced_parentheses(s):
    stack = []
    brackets = {'(': ')', '[': ']', '{': '}'}
    
    for char in s:
        if char in brackets.keys():
            stack.append(char)
        elif char in brackets.values():
            if not stack or brackets[stack.pop()] != char:
                return False
    return len(stack) == 0
# Example usage:
# is_balanced = is_balanced_parentheses("{[()]}")
# Output: True

Time Complexity: O(n) where n is the length of the input string.

For all new updates: Subscribe

4. Queues

A queue is a linear data structure that follows the First In, First Out (FIFO) principle. It means that the first element added to the queue is the first one to be removed. A queue can be visualized as a collection of elements with two main operations: enqueue, which adds an element to the rear of the queue, and dequeue, which removes the front element.

Space and Time Complexity:

Space Complexity: The space complexity of a queue is O(n), where n is the number of elements in the queue. In the worst case, the queue may need to store all elements.
Time Complexity: Both enqueue and dequeue operations have a time complexity of O(1), assuming a well-implemented queue.

↪ Interview Questions & Answers

1: Implement a Queue Using an Array

class Queue:
    def __init__(self):
        self.items = []
    def enqueue(self, item):
        self.items.append(item)
    def dequeue(self):
        if not self.is_empty():
            return self.items.pop(0)
        else:
            return None
    def is_empty(self):
        return len(self.items) == 0
# Example usage:
# queue = Queue()
# queue.enqueue(1)
# queue.enqueue(2)
# dequeued_item = queue.dequeue()
# Output: 1

Time complexity is O(1) for both enqueue and dequeue operations.

2: Implement a Circular Queue

class MyCircularQueue:
    def __init__(self, k):
        self.capacity = k
        self.queue = [None] * k
        self.front = self.rear = -1
    def enqueue(self, value):
        if self.is_empty():
            self.front = self.rear = 0
        elif not self.is_full():
            self.rear = (self.rear + 1) % self.capacity
        else:
            return False
        self.queue[self.rear] = value
        return True
    def dequeue(self):
        if not self.is_empty():
            if self.front == self.rear:
                self.front = self.rear = -1
            else:
                self.front = (self.front + 1) % self.capacity
            return True
        else:
            return False
    def front(self):
        return self.queue[self.front] if not self.is_empty() else -1
    def rear(self):
        return self.queue[self.rear] if not self.is_empty() else -1
    def is_empty(self):
        return self.front == -1
    def is_full(self):
        return (self.rear + 1) % self.capacity == self.front
# Example usage:
# circular_queue = MyCircularQueue(3)
# circular_queue.enqueue(1)
# circular_queue.enqueue(2)
# is_full = circular_queue.is_full()
# Output: False

The time complexity O(1) for enqueue, dequeue, front, rear, is_empty, and is_full operations.

3: Design a Blocking Queue

from queue import Queue
import threading


class BlockingQueue:
    def __init__(self, max_size):
        self.queue = Queue(max_size)
    def enqueue(self, item):
        self.queue.put(item)
    def dequeue(self):
        return self.queue.get()
# Example usage:
# blocking_queue = BlockingQueue(5)
# blocking_queue.enqueue(1)
# item = blocking_queue.dequeue()
# Output: 1

The time complexity O(1) for both enqueue and dequeue operations.

4: Reverse the First K Elements of a Queue

from queue import Queue

def reverse_k_elements(queue, k):
    stack = []
    for _ in range(k):
        stack.append(queue.get())
    while stack:
        queue.put(stack.pop())
    for _ in range(queue.qsize() - k):
        queue.put(queue.get())
# Example usage:
# my_queue = Queue()
# for i in range(1, 6):
#     my_queue.put(i)
# reverse_k_elements(my_queue, 3)
# Result: Queue contents after reversal - 3, 2, 1, 4, 5

Time complexity O(n), where n is the size of the queue.

For all new updates: Subscribe

5. Trees

A tree is a hierarchical data structure composed of nodes connected by edges. It is a widely used abstract data type that emulates a hierarchical tree structure with a root node and sub-trees of child nodes, where each node can have a parent and zero or more child nodes. In a binary tree, each node has at most two children, referred to as the left child and the right child.

Space and Time Complexity:

Space Complexity: The space complexity of a tree is O(n), where n is the number of nodes in the tree. In a balanced binary tree, the height of the tree is log(n), leading to a balanced relationship between the number of nodes and the height.
Time Complexity: The time complexity varies based on the specific operation and the type of tree. For binary search trees (BSTs), the time complexity for search, insertion, and deletion is O(log n) on average, where n is the number of nodes. In the case of unbalanced trees, the time complexity can degrade to O(n).

↪ Interview Questions and Answers

1: Implement a Binary Tree Node

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

2: Binary Tree Traversal (In-order, Pre-order, Post-order)

def inorder_traversal(root):
    result = []
    if root:
        result.extend(inorder_traversal(root.left))
        result.append(root.value)
        result.extend(inorder_traversal(root.right))
    return result

# Similar functions for preorder and postorder traversal.

The time complexity is O(n) for each traversal, where n is the number of nodes in the tree.

3: Check if a Binary Tree is Balanced

def is_balanced(root):
    def height(node):
        if not node:
            return 0
        left_height = height(node.left)
        right_height = height(node.right)
        if left_height == -1 or right_height == -1 or abs(left_height - right_height) > 1:
            return -1
        return max(left_height, right_height) + 1
  
    return height(root) != -1

The time complexity is O(n), where n is the number of nodes in the tree.

4: Lowest Common Ancestor in a Binary Search Tree (BST)

def lowest_common_ancestor(root, p, q):
    if not root:
        return None
    
    if root.value > p.value and root.value > q.value:
        return lowest_common_ancestor(root.left, p, q)
    elif root.value < p.value and root.value < q.value:
        return lowest_common_ancestor(root.right, p, q)
    else:
        return root

The time complexity is O(log n) on average for a balanced BST; O(n) in the worst case.

For all new updates: Subscribe

6. Graphs

A graph is a collection of nodes (vertices) and edges connecting these nodes. Graphs can be directed or undirected, and edges may have weights. Nodes represent entities, and edges represent relationships between these entities. Graphs can be cyclic or acyclic, and they may contain multiple connected components. Graphs are versatile data structures used to model relationships and connections between various entities.

Space and Time Complexity

Space Complexity: The space complexity of a graph is O(V + E), where V is the number of vertices (nodes) and E is the number of edges. This accounts for the storage of vertices and edges in the graph.
Time Complexity: The time complexity varies based on the specific graph operation. Common operations include traversing the graph (Breadth-First Search, Depth-First Search), finding the shortest path (Dijkstra’s algorithm, Bellman-Ford algorithm), and detecting cycles. The time complexity can range from O(V + E) for simple traversals to more complex algorithms with higher time complexities.

↪ Interview Questions and Answers

1: Implement a Graph Using an Adjacency List

class Graph:
    def __init__(self):
        self.graph = {}
    def add_edge(self, u, v):
        if u in self.graph:
            self.graph[u].append(v)
        else:
            self.graph[u] = [v]
# Example usage:
# my_graph = Graph()
# my_graph.add_edge(1, 2)
# my_graph.add_edge(1, 3)
# my_graph.add_edge(2, 3)

Time Complexity:

Adding an edge (add_edge): O(1) on average. In most cases, adding an edge involves appending the destination vertex to the adjacency list of the source vertex, which is a constant-time operation.
Displaying the graph (display): O(V + E), where V is the number of vertices and E is the number of edges. This is because it iterates through all vertices and their corresponding edges

The Space Complexity is O(V + E) for storing vertices and edges

2: Perform Depth-First Search (DFS) on a Graph

def dfs(graph, node, visited):
    if node not in visited:
        print(node)
        visited.add(node)
        for neighbor in graph.get(node, []):
            dfs(graph, neighbor, visited)

# Example usage:
# my_graph = {1: [2, 3], 2: [3]}
# dfs(my_graph, 1, set())

The time Complexity: O(V + E), where V is the number of vertices and E is the number of edges.

3: Find the Shortest Path in a Weighted Graph (Dijkstra’s Algorithm)

import heapq

def dijkstra(graph, start):
    distances = {vertex: float('infinity') for vertex in graph}
    distances[start] = 0
    priority_queue = [(0, start)]
    while priority_queue:
        current_distance, current_vertex = heapq.heappop(priority_queue)
        if current_distance > distances[current_vertex]:
            continue
        for neighbor, weight in graph[current_vertex].items():
            distance = current_distance + weight
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(priority_queue, (distance, neighbor))
# Example usage:
# my_weighted_graph = {1: {2: 1, 3: 4}, 2: {3: 2}, 3: {}}
# dijkstra(my_weighted_graph, 1)

Time Complexity: O((V + E) log V), where V is the number of vertices and E is the number of edges.

For all new updates: Subscribe

7. Hashing

Hashing is a technique that transforms input data (keys) into a fixed-size array index through a hash function. This process enables efficient data retrieval, as the hashed index directly maps to the location where the associated value is stored. Hashing is widely used for its ability to provide constant-time average-case complexity for key-based operations.

Space and Time Complexity

Space Complexity: The space complexity of hashing depends on the implementation and the underlying data structure. In a hash table, the space complexity is typically O(n), where n is the number of elements in the hash table.
Time Complexity: On average, hashing operations such as insertion, retrieval, and deletion have a time complexity of O(1). This assumes a well-designed hash function and proper handling of collisions. However, in the worst case (when collisions are frequent), the time complexity may degrade to O(n), where n is the number of elements.

↪ Interview Questions and Answers

1: Implement a Hash Table

class HashTable:
    def __init__(self, size):
        self.size = size
        self.table = [None] * size

    def hash_function(self, key):
          return hash(key) % self.size
      def insert(self, key, value):
          index = self.hash_function(key)
          if self.table[index] is None:
              self.table[index] = [(key, value)]
          else:
              for i, (existing_key, _) in enumerate(self.table[index]):
                  if existing_key == key:
                      self.table[index][i] = (key, value)
                      break
              else:
                  self.table[index].append((key, value))

      def get(self, key):
          index = self.hash_function(key)
          if self.table[index] is not None:
              for existing_key, value in self.table[index]:
                  if existing_key == key:
                      return value
          return None

Time Complexity: O(1) on average for insertion and retrieval, assuming a good hash function and minimal collisions.

2: Handling Collisions — Implement Separate Chaining

class HashTableSeparateChaining:
    def __init__(self, size):
        self.size = size
        self.table = [[] for _ in range(size)]

    def hash_function(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        index = self.hash_function(key)
        self.table[index].append((key, value))

    def get(self, key):
        index = self.hash_function(key)
        for existing_key, value in self.table[index]:
            if existing_key == key:
                return value
        return None

Time Complexity: O(1) on average for insertion and retrieval, assuming a good hash function and minimal collisions.

3: Open Addressing — Linear Probing

class HashTableLinearProbing:
    def __init__(self, size):
        self.size = size
        self.table = [None] * size

    def hash_function(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        index = self.hash_function(key)
        while self.table[index] is not None:
            index = (index + 1) % self.size
        self.table[index] = (key, value)

      def get(self, key):
        index = self.hash_function(key)
        initial_index = index
        while self.table[index] is not None:
            existing_key, value = self.table[index]
            if existing_key == key:
                return value
            index = (index + 1) % self.size
            if index == initial_index:
                break
        return None

Time Complexity: O(1) on average for insertion and retrieval, but may degrade to O(n) in the worst case due to clustering.

For all new updates: Subscribe

8. Heaps

A heap is a specialized tree-based data structure that satisfies the heap property. In a max heap, for any given node, the value of the node is greater than or equal to the values of its children. In a min heap, the value of each node is less than or equal to the values of its children. Heaps are commonly implemented as arrays, where the parent-child relationships are defined by the indices.

Space and Time Complexity:

Space Complexity: The space complexity of a heap is O(n), where n is the number of elements in the heap.
Time Complexity: The time complexity for basic heap operations is as follows: 1. Insertion (heapify-up): O(log n) 2. Deletion (heapify-down): O(log n) 3. Peek (accessing the root): O(1) 4. Building a heap from an array: O(n)

↪ Interview Questions and Answers:

1: Implement a Max Heap

class MaxHeap:
    def __init__(self):
        self.heap = []

    def insert(self, value):
        self.heap.append(value)
        self._heapify_up(len(self.heap) - 1)

    def extract_max(self):
        if len(self.heap) == 0:
            return None

        if len(self.heap) == 1:
            return self.heap.pop()

        max_value = self.heap[0]
        self.heap[0] = self.heap.pop()
        self._heapify_down(0)
        return max_value

    def _heapify_up(self, index):
        parent_index = (index - 1) // 2
        while index > 0 and self.heap[index] > self.heap[parent_index]:
            self.heap[index], self.heap[parent_index] = self.heap[parent_index], self.heap[index]
            index = parent_index
            parent_index = (index - 1) // 2

    def _heapify_down(self, index):
        left_child_index = 2 * index + 1
        right_child_index = 2 * index + 2
        largest = index

        if left_child_index < len(self.heap) and self.heap[left_child_index] > self.heap[largest]:
            largest = left_child_index

        if right_child_index < len(self.heap) and self.heap[right_child_index] > self.heap[largest]:
            largest = right_child_index

        if largest != index:
            self.heap[index], self.heap[largest] = self.heap[largest], self.heap[index]
            self._heapify_down(largest)

Time Complexity: O(log n) for insertion and extraction of the maximum element.

2: Implement a Min Heap

import heapq


class MinHeap:
    def __init__(self):
        self.heap = []

    def insert(self, value):
        heapq.heappush(self.heap, value)

    def extract_min(self):
        if len(self.heap) == 0:
            return None
        return heapq.heappop(self.heap)

Time Complexity: O(log n) for insertion and extraction of the minimum element using the built-in heapq module.

3: Merge k Sorted Lists Using a Min Heap

import heapq

def merge_k_sorted_lists(lists):
    min_heap = []
    for i, lst in enumerate(lists):
        if lst:
            heapq.heappush(min_heap, (lst.val, i, lst))

    dummy = ListNode()
    current = dummy

    while min_heap:
        val, i, node = heapq.heappop(min_heap)
        current.next = node
        current = current.next
        if node.next:
            heapq.heappush(min_heap, (node.next.val, i, node.next))

    return dummy.next

Time Complexity: O(N log k), where N is the total number of nodes across all lists and k is the number of lists.

For all new updates: Subscribe

9. Trie

A trie, also known as a prefix tree, is a tree-like data structure used to store a dynamic set of strings where the keys usually represent words. In a trie, each node represents a character, and the path from the root to a particular node forms the key associated with that node. Tries are efficient for tasks such as prefix matching and searching for words with common prefixes.

Space and Time Complexity:

Space Complexity: The space complexity of a trie is O(N*M), where N is the number of words and M is the average length of a word. Tries can be more memory-efficient than other data structures, especially when words share common prefixes.
Time Complexity: The time complexity of basic trie operations (insertion, deletion, search) is O(M), where M is the length of the key. This makes tries suitable for tasks like autocomplete and dictionary search.

↪ Interview Questions and Answers

1: Implement a Trie

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end_of_word = True

    def search(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        return node.is_end_of_word

    def starts_with_prefix(self, prefix):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False
            node = node.children[char]
        return True

Time Complexity: O(M) for insertion, search, and prefix matching.

2: Autocomplete Using a Trie

def autocomplete(trie, prefix):
    node = trie.root
    for char in prefix:
        if char not in node.children:
            return []
        node = node.children[char]
    
    suggestions = []
    find_words_with_prefix(node, prefix, suggestions)
    return suggestions

def find_words_with_prefix(node, current_prefix, suggestions):
    if node.is_end_of_word:
        suggestions.append(current_prefix)
    
    for char, child_node in node.children.items():
        find_words_with_prefix(child_node, current_prefix + char, suggestions)

Time Complexity: O(K) for autocomplete, where K is the number of words with the given prefix.

3: Longest Word in a Dictionary

def longest_word(words):
    trie = Trie()
    for word in sorted(words):
        trie.insert(word)
    
    longest = ""
    for word in words:
        if trie.search(word):
            current = trie.longest_word_from_node(trie.root, word)
            if len(current) > len(longest) or (len(current) == len(longest) and current < longest):
                longest = current
    return longest


# Extend Trie class with the following method
def longest_word_from_node(self, node, current_word):
    longest = current_word
    for char, child_node in node.children.items():
        if child_node.is_end_of_word:
            child_longest = self.longest_word_from_node(child_node, current_word + char)
            if len(child_longest) > len(longest) or (len(child_longest) == len(longest) and child_longest < longest):
                longest = child_longest
    return longest# Extend Trie class with the following method


def longest_word_from_node(self, node, current_word):
    longest = current_word
    for char, child_node in node.children.items():
        if child_node.is_end_of_word:
            child_longest = self.longest_word_from_node(child_node, current_word + char)
            if len(child_longest) > len(longest) or (len(child_longest) == len(longest) and child_longest < longest):
                longest = child_longest
    return longest

Time Complexity: O(NM), where N is the number of words and M is the average length of a word, for building the trie. The search for the longest word has a time complexity of O(NM).

def longest_word(words):
    trie = Trie()
    for word in sorted(words):
        trie.insert(word)
    
    longest = ""
    for word in words:
        if trie.search(word):
            current = trie.longest_word_from_node(trie.root, word)
            if len(current) > len(longest) or (len(current) == len(longest) and current < longest):
                longest = current
    
    return longest

# Extend Trie class with the following method
def longest_word_from_node(self, node, current_word):
    longest = current_word
    for char, child_node in node.children.items():
        if child_node.is_end_of_word:
            child_longest = self.longest_word_from_node(child_node, current_word + char)
            if len(child_longest) > len(longest) or (len(child_longest) == len(longest) and child_longest < longest):
                longest = child_longest
    return longest

For all new updates: Subscribe

10. Sets and Maps

A set is a collection of distinct elements with no specific order. It is designed to provide efficient methods for membership tests, insertion, and deletion of elements. Sets are commonly used when the presence or absence of elements is more important than the order.

While a map, also known as a dictionary, is a collection of key-value pairs where each key is unique. It allows efficient retrieval, insertion, and deletion of values based on their associated keys. Maps are versatile data structures used for tasks like indexing and representing relationships between entities.

Space and Time Complexity:

Sets

Space Complexity: O(n), where n is the number of elements in the set.
Time Complexity:
Membership Test: O(1) on average for hash-based sets, O(n) for list-based sets.
Insertion and Deletion: O(1) on average for hash-based sets, O(n) for list-based sets.

Maps

Space Complexity: O(n), where n is the number of key-value pairs in the map.
Time Complexity:
Search (by key): O(1) on average for hash-based maps, O(n) for list-based maps.
Insertion and Deletion: O(1) on average for hash-based maps, O(n) for list-based maps.

↪ Interview Questions and Answers

1: Find the Intersection of Two Sets

def intersection(set1, set2):
    return set1.intersection(set2)

Time Complexity: O(min(len(set1), len(set2)))

2: Check if a String is an Anagram

def is_anagram(str1, str2):
    return set(str1) == set(str2)

Time Complexity: O(len(str1) + len(str2))

3: Implement a Frequency Counter Using a Map

def frequency_counter(arr):
    frequency_map = {}
    for element in arr:
        if element in frequency_map:
            frequency_map[element] += 1
        else:
            frequency_map[element] = 1
    return frequency_map

Time Complexity: O(n), where n is the length of the input array.

4: Find the First Non-Repeating Character in a String

def first_non_repeating_char(s):
    char_count = {}
    for char in s:
        char_count[char] = char_count.get(char, 0) + 1

    for char in s:
        if char_count[char] == 1:
            return char
    return None

Time Complexity: O(n), where n is the length of the input string.

For all new updates: Subscribe

11. Dynamic Programming

Dynamic Programming (DP) is a problem-solving paradigm that involves breaking down a problem into smaller sub-problems and solving each sub-problem only once, storing the solutions to sub-problems in a table to avoid redundant computations. It is particularly useful for optimization problems where the goal is to find the best solution among a set of feasible solutions.

Space and Time Complexity

Space Complexity: The space complexity of dynamic programming algorithms varies. It depends on whether a solution is stored in a 1D array, 2D array, or some other data structure. In many cases, it is O(n) or O(n²), where n is the size of the input.
Time Complexity: The time complexity of dynamic programming algorithms also varies. It depends on the number of sub-problems and the time required to solve each sub-problem. Common dynamic programming problems have time complexities ranging from O(n²) to O(n³), but optimisations may reduce it to O(n) or even O(1) in some cases.

↪ Interview Questions and Answers

1: Fibonacci Sequence

def fibonacci(n):
    if n <= 1:
        return n
    
    memo = {}
    if n not in memo:
        memo[n] = fibonacci(n-1) + fibonacci(n-2)
    
    return memo[n]

Time Complexity: O(n) with memorisation.

2: Longest Increasing Sub-sequence

def length_of_lis(nums):
    if not nums:
        return 0

    dp = [1] * len(nums)

    for i in range(1, len(nums)):
        for j in range(i):
            if nums[i] > nums[j]:
                dp[i] = max(dp[i], dp[j] + 1)

    return max(dp)

Time Complexity: O(n²)

3: Coin Change Problem

def coin_change(coins, amount):
    dp = [float('inf')] * (amount + 1)
    dp[0] = 0

    for coin in coins:
        for i in range(coin, amount + 1):
            dp[i] = min(dp[i], dp[i - coin] + 1)

    return dp[amount] if dp[amount] != float('inf') else -1

Time Complexity: O(n * m), where n is the amount and m is the number of coin types.

4: Maximum Sub-array Sum

def max_subarray_sum(nums):
    current_sum = max_sum = nums[0]
    
    for num in nums[1:]:
        current_sum = max(num, current_sum + num)
        max_sum = max(max_sum, current_sum)

    return max_sum

Time Complexity: O(n).

For all new updates: Subscribe

12. Disjoint Set (Union-Find)

A Disjoint Set, or Union-Find or Merge-Find, is a data structure that efficiently keeps track of a set partition into disjoint subsets. It provides operations to determine if two elements are in the same set and to unite two sets. The data structure is commonly used in algorithms for efficiently managing connected components in a graph, cycle detection, and dynamic connectivity.

Space and Time Complexity:

Space Complexity: The space complexity of a disjoint set is O(n), where n is the number of elements in the set. This accounts for storing the parent array and, optionally, the rank or size array.
Time Complexity: The time complexity of the basic operations in disjoint sets is typically close to O(1). Specifically: 1. Find (or find_set): O(log n) in the worst case due to path compression. 2. Union (or union_sets): O(1) on average if using union by rank or union by size.

↪ Interview Questions and Answers:

1: Implementing a Disjoint Set

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find_set(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find_set(self.parent[x])  # Path compression
        return self.parent[x]

    def union_sets(self, x, y):
        root_x = self.find_set(x)
        root_y = self.find_set(y)

        if root_x != root_y:
            if self.rank[root_x] < self.rank[root_y]:
                self.parent[root_x] = root_y
            elif self.rank[root_x] > self.rank[root_y]:
                self.parent[root_y] = root_x
            else:
                self.parent[root_x] = root_y
                self.rank[root_y] += 1

Time Complexity: O(log n) for find_set with path compression and approximately O(1) for union_sets with the union by rank.

2: Detecting a Cycle in an Undirected Graph

def has_cycle(graph):
    dsu = DisjointSet(len(graph))

    for edge in graph:
        node1, node2 = edge

        if dsu.find_set(node1) == dsu.find_set(node2):
            return True

        dsu.union_sets(node1, node2)

    return False

Time Complexity: O(V + E), where V is the number of vertices and E is the number of edges, assuming the disjoint set operations are approximately O(1).

3: Kruskal’s Minimum Spanning Tree Algorithm

def kruskal(graph):
    graph.sort(key=lambda edge: edge[2])  # Sort edges by weight
    dsu = DisjointSet(len(graph))
    minimum_spanning_tree = []

    for edge in graph:
        node1, node2, weight = edge

        if dsu.find_set(node1) != dsu.find_set(node2):
            minimum_spanning_tree.append(edge)
            dsu.union_sets(node1, node2)

    return minimum_spanning_tree

Time Complexity: O(E log E), where E is the number of edges, due to sorting the edges.

— — — — — — — — — —

If you like the article and would like to support me, make sure to:

👏 Clap for the story (1000 + 1 clap) to help this featured — I will add more to it
📰 For all new updates: Subscribe
🔔 Find Me: LinkedIn | Youtube | GitHub

— — — — — — — — — —

Data Science Coding Interview Guide

Table of Contents

1. Arrays

Space and Time Complexity

↪ Interview Questions and Answers

2. Linked Lists

↪ Interview Questions & Answers

3. Stacks

Space and Time Complexity

↪ Interview Questions and Answers

4. Queues

Space and Time Complexity:

↪ Interview Questions & Answers

5. Trees

Space and Time Complexity:

↪ Interview Questions and Answers

6. Graphs

Space and Time Complexity

↪ Interview Questions and Answers

7. Hashing

Space and Time Complexity

↪ Interview Questions and Answers

8. Heaps

Space and Time Complexity:

↪ Interview Questions and Answers:

9. Trie

Space and Time Complexity:

↪ Interview Questions and Answers

10. Sets and Maps

Space and Time Complexity:

↪ Interview Questions and Answers

11. Dynamic Programming

Space and Time Complexity

↪ Interview Questions and Answers

12. Disjoint Set (Union-Find)

Space and Time Complexity:

↪ Interview Questions and Answers:

Written by Manralai