Mergesort Algorithm
Welcome to Day 27 of our 60 Days of Coding Algorithm Challenge! Today, we’ll dive deep into the Mergesort algorithm, another efficient and widely used sorting algorithm based on the divide-and-conquer paradigm.
Introduction to Mergesort
Mergesort is a comparison-based sorting algorithm that follows the divide-and-conquer approach. It divides the input array into two halves, recursively sorts them, and then merges the two sorted halves.
How Mergesort Works
- Divide the unsorted list into n sublists, each containing one element (a list of one element is considered sorted).
- Repeatedly merge sublists to produce new sorted sublists until there is only one sublist remaining. This will be the sorted list.
Basic Implementation of Mergesort
Here’s a basic implementation of the Mergesort algorithm:
1def merge_sort(arr):
2 if len(arr) <= 1:
3 return arr
4
5 mid = len(arr) // 2
6 left = merge_sort(arr[:mid])
7 right = merge_sort(arr[mid:])
8
9 return merge(left, right)
10
11def merge(left, right):
12 result = []
13 i, j = 0, 0
14
15 while i < len(left) and j < len( …
Mergesort Algorithm
Welcome to Day 27 of our 60 Days of Coding Algorithm Challenge! Today, we’ll dive deep into the Mergesort algorithm, another efficient and widely used sorting algorithm based on the divide-and-conquer paradigm.
Introduction to Mergesort
Mergesort is a comparison-based sorting algorithm that follows the divide-and-conquer approach. It divides the input array into two halves, recursively sorts them, and then merges the two sorted halves.
How Mergesort Works
- Divide the unsorted list into n sublists, each containing one element (a list of one element is considered sorted).
- Repeatedly merge sublists to produce new sorted sublists until there is only one sublist remaining. This will be the sorted list.
Basic Implementation of Mergesort
Here’s a basic implementation of the Mergesort algorithm:
1def merge_sort(arr):
2 if len(arr) <= 1:
3 return arr
4
5 mid = len(arr) // 2
6 left = merge_sort(arr[:mid])
7 right = merge_sort(arr[mid:])
8
9 return merge(left, right)
10
11def merge(left, right):
12 result = []
13 i, j = 0, 0
14
15 while i < len(left) and j < len(right):
16 if left[i] <= right[j]:
17 result.append(left[i])
18 i += 1
19 else:
20 result.append(right[j])
21 j += 1
22
23 result.extend(left[i:])
24 result.extend(right[j:])
25 return result
26
27# Example usage
28arr = [38, 27, 43, 3, 9, 82, 10]
29sorted_arr = merge_sort(arr)
30print(f"Sorted array: {sorted_arr}")
In-place Mergesort Implementation
While the above implementation is easy to understand, it uses additional space for creating new lists. Here’s an in-place implementation:
1def merge_sort_inplace(arr, left, right):
2 if left < right:
3 mid = (left + right) // 2
4 merge_sort_inplace(arr, left, mid)
5 merge_sort_inplace(arr, mid + 1, right)
6 merge_inplace(arr, left, mid, right)
7
8def merge_inplace(arr, left, mid, right):
9 left_arr = arr[left:mid+1]
10 right_arr = arr[mid+1:right+1]
11
12 i, j, k = 0, 0, left
13
14 while i < len(left_arr) and j < len(right_arr):
15 if left_arr[i] <= right_arr[j]:
16 arr[k] = left_arr[i]
17 i += 1
18 else:
19 arr[k] = right_arr[j]
20 j += 1
21 k += 1
22
23 while i < len(left_arr):
24 arr[k] = left_arr[i]
25 i += 1
26 k += 1
27
28 while j < len(right_arr):
29 arr[k] = right_arr[j]
30 j += 1
31 k += 1
32
33# Example usage
34arr = [38, 27, 43, 3, 9, 82, 10]
35merge_sort_inplace(arr, 0, len(arr) - 1)
36print(f"Sorted array: {arr}")
Time Complexity
- Best Case: O(n log n)
- Average Case: O(n log n)
- Worst Case: O(n log n)
Mergesort has a consistent time complexity of O(n log n) for all cases, which is one of its main advantages.
Space Complexity
- O(n) for the basic implementation
- O(n) for the in-place implementation (due to the temporary arrays in the merge step)
Advantages of Mergesort
- Stable: Preserves the relative order of equal elements
- Guaranteed O(n log n) time complexity for all cases
- Well-suited for external sorting (sorting large datasets that don’t fit in memory)
Disadvantages of Mergesort
- Requires extra space O(n) for merging
- For small datasets, it may be slower than simpler algorithms like insertion sort
Optimizations for Mergesort
1. Using Insertion Sort for Small Subarrays
For small subarrays, it’s more efficient to use insertion sort:
1def insertion_sort(arr, left, right):
2 for i in range(left + 1, right + 1):
3 key = arr[i]
4 j = i - 1
5 while j >= left and arr[j] > key:
6 arr[j + 1] = arr[j]
7 j -= 1
8 arr[j + 1] = key
9
10def merge_sort_optimized(arr, left, right):
11 if right - left <= 10: # Use insertion sort for small subarrays
12 insertion_sort(arr, left, right)
13 elif left < right:
14 mid = (left + right) // 2
15 merge_sort_optimized(arr, left, mid)
16 merge_sort_optimized(arr, mid + 1, right)
17 merge_inplace(arr, left, mid, right)
18
19# Example usage
20arr = [38, 27, 43, 3, 9, 82, 10]
21merge_sort_optimized(arr, 0, len(arr) - 1)
22print(f"Sorted array: {arr}")
2. Avoiding Unnecessary Copying
We can avoid copying elements in the merge step if the last element of the left subarray is smaller than or equal to the first element of the right subarray:
1def merge_optimized(arr, left, mid, right):
2 if arr[mid] <= arr[mid + 1]:
3 return # Arrays are already sorted
4
5 left_arr = arr[left:mid+1]
6 right_arr = arr[mid+1:right+1]
7
8 i, j, k = 0, 0, left
9
10 while i < len(left_arr) and j < len(right_arr):
11 if left_arr[i] <= right_arr[j]:
12 arr[k] = left_arr[i]
13 i += 1
14 else:
15 arr[k] = right_arr[j]
16 j += 1
17 k += 1
18
19 while i < len(left_arr):
20 arr[k] = left_arr[i]
21 i += 1
22 k += 1
Applications of Mergesort
- External sorting: Sorting large files that don’t fit in memory
- Sorting linked lists: Mergesort is particularly efficient for linked lists
- Counting inversions in an array
- Used in various external sorting algorithms in database systems
Comparison with Quicksort
Aspect | Mergesort | Quicksort |
---|
Time Complexity | Always O(n log n) | Average: O(n log n), Worst: O(n^2) |
Space Complexity | O(n) | O(log n) |
Stability | Stable | Not stable |
In-place | Not in-place | In-place |
Adaptivity | Not adaptive | Can be made adaptive |
Locality of reference | Good | Excellent |
Exercise
- Implement a bottom-up (iterative) version of Mergesort.
- Modify the Mergesort algorithm to count the number of inversions in an array.
- Implement Mergesort for a linked list.
Summary
Today, we explored the Mergesort algorithm, an efficient and stable sorting algorithm based on the divide-and-conquer paradigm. We implemented basic and optimized versions of Mergesort, discussed its time and space complexity, and looked at various optimization techniques.
Mergesort’s consistent performance and stability make it a crucial algorithm to understand for any programmer, especially when dealing with large datasets or when stability is important.
Tomorrow, we’ll dive into heap sort, another efficient comparison-based sorting algorithm. We’ll explore its unique characteristics and compare it with Quicksort and Mergesort. Stay tuned!