Remove Duplicates
Remove duplicate values from a list.
Duplicate removal is essential for data cleaning and normalization. This exercise teaches you about hash-based lookups, order preservation, and the trade-offs between different data structures.
📚 Concepts & Theory
Removing duplicates from a list while preserving order is a common data cleaning task. Python offers multiple approaches, each with different trade-offs regarding performance and order preservation.
Understanding Uniqueness in Lists:
Method 1: Using set() (Fastest, Order Not Preserved)
numbers = [1, 2, 2, 3, 4, 3, 5]
unique = list(set(numbers)) # [1, 2, 3, 4, 5] - order may vary
Method 2: dict.fromkeys() (Order Preserved in Python 3.7+)
numbers = [1, 2, 2, 3, 4, 3, 5]
unique = list(dict.fromkeys(numbers)) # [1, 2, 3, 4, 5]
Method 3: Manual Loop with Tracking (Order Preserved)
numbers = [1, 2, 2, 3, 4, 3, 5]
unique = []
seen = set()
for num in numbers:
if num not in seen:
unique.append(num)
seen.add(num)
Method 4: List Comprehension with Tracking
seen = set()
unique = [x for x in numbers if not (x in seen or seen.add(x))]
Performance Comparison:
- set(): O(n) time, but doesn't preserve order in older Python
- dict.fromkeys(): O(n) time, preserves insertion order (Python 3.7+)
- Manual with set tracking: O(n) time, O(n) space, explicit control
- List comprehension: O(n) time, more compact but less readable
- Order doesn't matter →
list(set(data)) - Order matters →
list(dict.fromkeys(data)) - Need custom logic → manual loop with set tracking
🎯 Your Challenge
Write remove_duplicates that preserves order.
📝 Starter Code
def remove_duplicates(items):
pass
print(remove_duplicates([1, 2, 2, 3]))
- Use a set to track elements you've already seen
- Sets have O(1) lookup time for membership testing
- Build a new list containing only first occurrences
- Check each element against your seen set
- Add elements to both the result list and seen set
Solution
def remove_duplicates(items):
seen = set()
result = []
for item in items:
if item not in seen:
seen.add(item)
result.append(item)
return result
print(remove_duplicates([1, 2, 2, 3]))
Explanation
This solution uses a set to track elements we've already seen (O(1) lookup time), and builds a new list containing only unique elements in their original order. For each number, we check if it's in the seen set—if not, we add it to both the unique list and the seen set. Using a set for membership testing is crucial for O(n) overall complexity; using a list would make it O(n²). The final result preserves the order of first occurrence. Time complexity: O(n). Space complexity: O(n) for both unique list and seen set.
⚠️ Common Mistakes to Avoid
- Using list instead of set for tracking (O(n²) complexity)
- Not preserving original order when required
- Forgetting that set() doesn't preserve order in Python < 3.7
- Modifying list while iterating over it
- Not handling unhashable types (like lists of lists)