What is a hash map (dictionary) and why is it useful?
A hash map stores key-value pairs and hashes each key to locate its value, so lookup, insertion, and deletion take O(1) time on average.
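As a minimal sketch of the operations named above, Python's built-in dict can stand in for a hash map. The inventory data is hypothetical and only for illustration.

```python
# A Python dict is a hash map: keys are hashed to locate their values.
inventory = {"apple": 3, "banana": 5}   # hypothetical example data

inventory["cherry"] = 7             # insertion, O(1) on average
apple_count = inventory["apple"]    # lookup by key, O(1) on average
del inventory["banana"]             # deletion, O(1) on average

# .get() returns a default instead of raising KeyError for a missing key
missing = inventory.get("durian", 0)
```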
What is the difference between an array and a linked list?
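Arrays store elements contiguously, which gives O(1) access by index, and in many languages they have a fixed size. Linked lists store elements in nodes connected by pointers, so they grow and shrink dynamically, but reaching the k-th element takes O(k) time.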
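A rough sketch of a singly linked list, to make the contrast concrete. Node and to_list are hypothetical helper names. Note that Python's own list type is a dynamic array, not a linked list, so the linked structure has to be built by hand here.

```python
class Node:
    """One element of a singly linked list (hypothetical helper)."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def to_list(head):
    """Walk the chain from head and collect the values."""
    values = []
    node = head
    while node is not None:
        values.append(node.value)
        node = node.next
    return values

# Build 1 -> 2 -> 3
head = Node(1, Node(2, Node(3)))

# Inserting at the front only rewires one reference: O(1)
head = Node(0, head)

# Reaching the k-th element means following k links: O(k)
third = head.next.next

linked_values = to_list(head)
```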
What is the difference between a for loop and a while loop?
For loops iterate a fixed number of times or over a sequence; while loops continue until a condition becomes false.
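A small illustration of the two loop styles in Python; the values are arbitrary and chosen only for the example.

```python
# for: the number of iterations is known from the sequence
squares = []
for n in range(5):
    squares.append(n * n)

# while: repeat until a condition becomes false
# (here: halve a value until it drops below 1)
value = 100.0
halvings = 0
while value >= 1:
    value /= 2
    halvings += 1
```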
What’s the time complexity for searching for an element in an unsorted array?
O(n), since in the worst case every element must be checked.
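A linear search is one way to picture this bound. The function name linear_search and the sample data are assumptions made for the sketch.

```python
def linear_search(items, target):
    """Return the index of target in items, or -1 if absent."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

data = [7, 3, 9, 1, 4]   # hypothetical unsorted data
found_at = linear_search(data, 9)
not_found = linear_search(data, 8)   # worst case: all n elements are checked
```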
What’s the difference between mutable and immutable data types in Python?
Mutable types (e.g., lists, dicts) can be changed in place; immutable types (e.g., strings, tuples) cannot, so any "modification" produces a new object.
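A short sketch of the difference, using a list, a string, and a tuple. The variable names are illustrative only.

```python
nums = [1, 2, 3]
alias = nums          # both names refer to the same list object
alias.append(4)       # in-place change is visible through both names

name = "data"
upper = name.upper()  # returns a new string; `name` is unchanged

point = (1, 2)
try:
    point[0] = 10     # tuples do not support item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False
```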
What is sampling?
Selecting a subset of a population in order to make inferences about the whole population.
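One way to sketch simple random sampling is with the standard library's random.sample. The population of 1..1000 and the sample size of 50 are arbitrary assumptions.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable
population = list(range(1, 1001))        # hypothetical population: 1..1000
sample = random.sample(population, 50)   # simple random sample, no replacement

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)  # an estimate of population_mean
```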
What is the central limit theorem (CLT)?
The CLT states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, provided the samples are independent and the population has finite variance.
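The theorem can be checked informally by simulation. This sketch assumes an exponential population with mean 1 (clearly non-normal), a sample size of 50, and 2000 repetitions; all three choices are arbitrary.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential distribution with mean 1 (skewed)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# Sampling distribution of the mean for n = 50, estimated from 2000 repetitions
means = [sample_mean(50) for _ in range(2000)]

center = statistics.fmean(means)   # should be close to the population mean, 1.0
spread = statistics.stdev(means)   # should be close to 1 / sqrt(50), about 0.141
```

A histogram of means would look roughly bell-shaped even though the underlying exponential distribution is strongly skewed.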
What is hypothesis testing?
A method for testing a claim about a population using sample data. It involves a null hypothesis (H_0), an alternative hypothesis (H_1), and a p-value that is compared against a chosen significance level.
What is an A/B test?
A controlled experiment comparing two versions (A and B) to determine which performs better on a specific metric.
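One common way to analyze an A/B test on conversion rates is a two-proportion z-test; it is sketched below as an assumption, not as the only valid analysis. The function name and the conversion counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # rate under H_0: no difference
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 200/2000 conversions for A, 260/2000 for B
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
```

A small p_value here would suggest the difference between A and B is unlikely to be due to chance alone.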
What are Type I and Type II errors?
Type I (false positive): rejecting a true H_0
Type II (false negative): failing to reject a false H_0
What is a p-value?
The probability of observing data as extreme as, or more extreme than, the sample actually observed, assuming H_0 is true.
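To make the definition concrete, here is a sketch of an exact one-sided p-value for a coin-flip experiment. The function name binomial_p_value and the data (60 heads in 100 flips) are hypothetical.

```python
from math import comb

def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided p-value: P(X >= successes) when X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical data: 60 heads in 100 flips, H_0: the coin is fair
p_value = binomial_p_value(60, 100)
```

The sum covers every outcome at least as extreme as the one observed, which is exactly the "as extreme as, or more extreme than" part of the definition.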
What is the difference between descriptive and inferential statistics?
Descriptive statistics summarize the data at hand (mean, median, standard deviation). Inferential statistics use sample data to draw conclusions about a population (hypothesis tests, confidence intervals).
What are overfitting and underfitting in ML?
Overfitting: the model fits the training data too closely, including its noise, leading to poor generalization.
Underfitting: the model is too simple to capture the underlying patterns.
What is the bias-variance tradeoff?
Increasing model complexity generally reduces bias but increases variance; the goal is to find the balance that gives the best generalization.
What are measures of central tendency and variability?
Central tendency: mean, median, mode
Variability: range, variance, standard deviation
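These measures can all be computed with Python's standard statistics module. The scores list is hypothetical data chosen so the results are easy to verify by hand.

```python
import statistics

scores = [2, 4, 4, 5, 7, 9, 11]   # hypothetical data

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Variability
value_range = max(scores) - min(scores)
variance = statistics.variance(scores)   # sample variance (n - 1 denominator)
std_dev = statistics.stdev(scores)       # square root of the sample variance
```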
What is data normalization?
In the database sense, the process of organizing data to reduce redundancy and improve integrity, often by splitting data into related tables linked by foreign keys.
What are the normal forms in database normalization?
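1NF - every column holds atomic (indivisible) values
2NF - 1NF, plus no partial dependency (no non-key column depends on only part of a composite key)
3NF - 2NF, plus no transitive dependency (no non-key column depends on another non-key column)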
What is a primary key vs. a foreign key?
A primary key uniquely identifies each row in a table. A foreign key is a column that references a primary key in another table to link related data.
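A minimal sketch using Python's built-in sqlite3 module with an in-memory database. The customers and orders tables are hypothetical. SQLite only enforces foreign keys when the foreign_keys pragma is switched on, which the sketch does explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

# Hypothetical schema: each order points at one customer
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL)"
)

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 25.0)")

# An order that references a non-existent customer is rejected
try:
    conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (11, 99, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```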
What is a taxonomy in data modeling?
A hierarchical classification system (e.g., product categories -> subcategories -> items)
What does “combining datasets” mean?
Merging or joining data from multiple sources to create a unified dataset. Common operations: inner, outer, left, right joins.
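The difference between an inner and a left join can be sketched with sqlite3 and two small hypothetical tables. Right and full outer joins are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept_id INTEGER);
    CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
    INSERT INTO employees VALUES ('Ada', 1), ('Grace', 2), ('Linus', NULL);
    INSERT INTO departments VALUES (1, 'Research'), (2, 'Systems'), (3, 'Sales');
""")

# Inner join: only rows with a match in both tables
inner_rows = conn.execute(
    "SELECT e.name, d.dept_name FROM employees e"
    " JOIN departments d ON e.dept_id = d.dept_id ORDER BY e.name"
).fetchall()

# Left join: every employee, with NULL (None) where no department matches
left_rows = conn.execute(
    "SELECT e.name, d.dept_name FROM employees e"
    " LEFT JOIN departments d ON e.dept_id = d.dept_id ORDER BY e.name"
).fetchall()
```

Linus has no department, so he is dropped by the inner join but kept by the left join.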
What is data optimization?
The process of improving data storage, retrieval, and query performance using techniques such as indexing, denormalization, caching, or partitioning.
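Indexing is the easiest of these techniques to demonstrate. The sketch below uses sqlite3 with a hypothetical events table and asks SQLite for its query plan before and after adding an index. The exact wording of the plan varies between SQLite versions, so it is only inspected for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO events (user_id, kind) VALUES (?, ?)",
    [(i % 100, "click") for i in range(1000)],
)

query = "SELECT COUNT(*) FROM events WHERE user_id = 42"

def plan(sql):
    """Return SQLite's query-plan description for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = plan(query)   # without an index, the whole table is scanned
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = plan(query)    # with the index, SQLite can search it directly
matching = conn.execute(query).fetchone()[0]
```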
How can you iterate over both index and value in Python?
Use enumerate(), e.g., for i, val in enumerate(my_list):
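A complete version of the loop above, plus the optional start argument. The contents of my_list are hypothetical.

```python
my_list = ["a", "b", "c"]

pairs = []
for i, val in enumerate(my_list):
    pairs.append((i, val))

# start= shifts the counter without changing the values
numbered = list(enumerate(my_list, start=1))
```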
How can you count occurrences efficiently in Python?
Use collections.Counter, a dict subclass that maps elements to their counts.
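A brief sketch of counting words with Counter; the sentence is made up for the example.

```python
from collections import Counter

words = "the quick brown fox jumps over the lazy dog the end".split()
counts = Counter(words)       # one pass over the data

the_count = counts["the"]
absent = counts["cat"]        # missing keys count as 0, no KeyError
most_frequent = counts.most_common(1)   # list of (element, count) pairs
```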
What is the difference between pass, continue, and break in loops?
pass: does nothing; used as a placeholder where a statement is required
continue: skips the rest of the current iteration and moves to the next one
break: exits the loop entirely
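All three statements can be seen in one loop. The conditions are arbitrary and chosen only to trigger each statement.

```python
visited = []
for n in range(10):
    if n == 0:
        pass          # placeholder: nothing happens, execution falls through
    if n % 2 == 1:
        continue      # skip odd numbers, go to the next iteration
    if n > 6:
        break         # leave the loop entirely
    visited.append(n)
```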