Difficulty: intermediate

Remove Duplicates

Remove duplicate values from a list.

Duplicate removal is essential for data cleaning and normalization. This exercise teaches you about hash-based lookups, order preservation, and the trade-offs between different data structures.

📚 Concepts & Theory

Removing duplicates from a list while preserving order is a common data cleaning task. Python offers multiple approaches, each with different trade-offs regarding performance and order preservation.

Understanding Uniqueness in Lists:

Method 1: Using set() (Fastest, Order Not Preserved)

numbers = [1, 2, 2, 3, 4, 3, 5]
unique = list(set(numbers)) # [1, 2, 3, 4, 5] - order may vary

Method 2: dict.fromkeys() (Order Preserved in Python 3.7+)

numbers = [1, 2, 2, 3, 4, 3, 5]
unique = list(dict.fromkeys(numbers)) # [1, 2, 3, 4, 5]

Method 3: Manual Loop with Tracking (Order Preserved)

numbers = [1, 2, 2, 3, 4, 3, 5]
unique = []
seen = set()
for num in numbers:
    if num not in seen:
        unique.append(num)
        seen.add(num)

Method 4: List Comprehension with Tracking

numbers = [1, 2, 2, 3, 4, 3, 5]
seen = set()
# seen.add(x) returns None (falsy), so the condition is True only for
# the first occurrence of x; the add happens as a side effect.
unique = [x for x in numbers if not (x in seen or seen.add(x))]

Performance Comparison:

  • set(): O(n) time, but never guarantees order (sets are unordered in every Python version)

  • dict.fromkeys(): O(n) time, preserves insertion order (Python 3.7+)

  • Manual with set tracking: O(n) time, O(n) space, explicit control

  • List comprehension: O(n) time, more compact but less readable

When to Use Each:

  • Order doesn't matter → list(set(data))

  • Order matters → list(dict.fromkeys(data))

  • Need custom logic → manual loop with set tracking
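The trade-offs above can be checked empirically. Here is a rough timing sketch using the standard-library timeit module (the data size and iteration count are arbitrary choices; absolute numbers will vary by machine):

```python
import timeit

data = list(range(1000)) * 10  # 10,000 items, 1,000 unique

def with_set():
    return list(set(data))

def with_fromkeys():
    return list(dict.fromkeys(data))

def with_loop():
    seen = set()
    result = []
    for x in data:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result

for fn in (with_set, with_fromkeys, with_loop):
    # Run each approach many times and report the total elapsed time
    t = timeit.timeit(fn, number=200)
    print(f"{fn.__name__}: {t:.3f}s")
```

Typically set() is fastest, dict.fromkeys() is close behind, and the manual loop is slowest, though all three are O(n).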

🎯 Your Challenge

Write a function remove_duplicates(items) that removes duplicate values while preserving the order of first occurrence.

📝 Starter Code

Python
def remove_duplicates(items):
    pass

print(remove_duplicates([1, 2, 2, 3]))

💡 Hints

  • Use a set to track elements you've already seen
  • Sets have O(1) lookup time for membership testing
  • Build a new list containing only first occurrences
  • Check each element against your seen set
  • Add elements to both the result list and seen set

Solution

Python
def remove_duplicates(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(remove_duplicates([1, 2, 2, 3]))

Explanation

This solution uses a set to track elements we've already seen (O(1) average lookup time) and builds a new list containing only the first occurrence of each element, in original order. For each element, we check whether it is in the seen set; if not, we add it to both the result list and the seen set. Using a set for membership testing is crucial for O(n) overall complexity; using a list would make it O(n²), because each membership test on a list scans it linearly. Time complexity: O(n). Space complexity: O(n) for the result list and the seen set.
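To make the complexity difference concrete, here is a sketch contrasting a list-based seen check (O(n) per lookup, O(n²) overall) with the set-based version from the solution:

```python
def remove_duplicates_slow(items):
    # 'seen' is a list: each 'in' check scans it linearly -> O(n^2) total
    seen = []
    result = []
    for item in items:
        if item not in seen:  # O(n) membership test on a list
            seen.append(item)
            result.append(item)
    return result

def remove_duplicates_fast(items):
    # 'seen' is a set: each 'in' check is O(1) on average -> O(n) total
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # O(1) average membership test on a set
            seen.add(item)
            result.append(item)
    return result

data = [1, 2, 2, 3, 4, 3, 5]
print(remove_duplicates_slow(data))  # [1, 2, 3, 4, 5]
print(remove_duplicates_fast(data))  # [1, 2, 3, 4, 5]
```

Both produce identical output; the difference only shows up as input size grows.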

⚠️ Common Mistakes to Avoid

  • Using list instead of set for tracking (O(n²) complexity)
  • Not preserving original order when required
  • Assuming set() preserves order (it never guarantees order; it is dict, not set, that preserves insertion order in Python 3.7+)
  • Modifying list while iterating over it
  • Not handling unhashable types (like lists of lists)
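The last pitfall deserves a sketch. Inner lists are unhashable and cannot go into a set, so one workaround (assuming the inner lists are flat and should compare by value) is to track a hashable tuple snapshot of each one:

```python
def remove_duplicate_lists(items):
    # Inner lists can't be set members; use a tuple as a hashable stand-in.
    seen = set()
    result = []
    for item in items:
        key = tuple(item)  # fails if the inner list is itself nested
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result

data = [[1, 2], [3, 4], [1, 2]]
print(remove_duplicate_lists(data))  # [[1, 2], [3, 4]]
```

Note this only handles one level of nesting; a deeply nested structure would need a recursive conversion to tuples.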

❓ Frequently Asked Questions

Q: Does dict.fromkeys() always preserve order?
A: In Python 3.7+, insertion order is preserved. In earlier versions, order is lost.

Q: What if my list contains unhashable elements?
A: You'll need a different approach, possibly converting to tuples or using a custom comparison.

Q: Which method is fastest?
A: list(set(data)) is fastest but may not preserve order. dict.fromkeys() is fast and preserves order.

Q: How do I remove duplicate dictionaries from a list?
A: You need to define what makes dictionaries 'equal' and use custom logic, possibly with tuple conversion.
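The dictionary case from the last answer can be sketched like this, assuming two dicts count as equal when they have the same key-value pairs and all values are hashable:

```python
def remove_duplicate_dicts(dicts):
    # frozenset(d.items()) is hashable and ignores key order,
    # so it serves as an identity key for each dict.
    seen = set()
    result = []
    for d in dicts:
        key = frozenset(d.items())
        if key not in seen:
            seen.add(key)
            result.append(d)
    return result

records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 1, "name": "a"}]
print(remove_duplicate_dicts(records))
```

If the dicts contain unhashable values (lists, other dicts), the key would need a deeper conversion, e.g. serializing each dict to a canonical string.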

🔗 Related Exercises