Sorting Algorithms
Biostatistics 615/815
Lecture 6

Last Lecture …
• Recursive Functions
    • Natural expression for many algorithms
• Dynamic Programming
    • Automatic strategy for generating efficient versions of recursive algorithms

Today …
• Properties of Sorting Algorithms
• Elementary Sorting Algorithms
    • Selection Sort
    • Insertion Sort
    • Bubble Sort

Homework 1
• Limits of floating point
• Important concepts …
    • Precision is limited and relative
    • Errors can accumulate and lead to wrong results
    • Mathematical soundness may not be enough

Floating Point Precision
• Smallest value that can be added to 1
    • $2^{-52}$ or $2.2 \times 10^{-16}$ (double)
    • $2^{-23}$ or $1.2 \times 10^{-7}$ (float)
• Smallest value that can be subtracted from 1
    • $2^{-53}$ or $1.1 \times 10^{-16}$ (double)
    • $2^{-24}$ or $6.0 \times 10^{-8}$ (float)
• Smallest value that is distinct from zero
    • $2^{-1074}$ or $4.9 \times 10^{-324}$ (double)
    • $2^{-149}$ or $1.4 \times 10^{-45}$ (float)

Calculating Powers of Φ
• Two possibilities
    • Product formula: $\phi^{n} = \phi \cdot \phi^{n-1}$
    • Difference formula: $\phi^{n} = \phi^{n-2} - \phi^{n-1}$

Results Using Product Formula
[Figure: Calculation Using Product (0.0 to 1.0) plotted against Exponent (0 to 80)]

Results Using Difference Formula
[Figure: Calculation Using Difference (-0.5 to 1.0) plotted against Exponent (0 to 80)]

Relative Error on Log Scale
[Figure: Logarithm of Relative Error, log(abs(product - difference)/product), from -15 to 15, plotted against Exponent (0 to 80)]

Applications of Sorting
• Facilitate searching
    • Building indices
• Identify quantiles of a distribution
• Identify unique values
• Browsing data

Elementary Methods
• Suitable for
    • Small datasets
    • Specialized applications
• Prelude to more complex methods
    • Illustrate ideas
    • Introduce terminology
    • Sometimes useful complement

... but beware!
• Elementary sorts are very inefficient
    • Typically, time requirements are $O(N^2)$
• Probably the most common source of inefficiency in scientific computing
    • Makes programs “break” with large datasets

Aim
• Rearrange a set of keys
    • Using some predefined order
    • Integers
    • Doubles
    • Indices for records in a database
• Keys stored as array in memory
    • More complex sorts when we can only load part of the data

Basic Building Blocks
• A type for each element

    #define Item int

• Compare two elements
• Exchange two elements
• Compare and exchange two elements

Comparing Two Elements
• Define a function to compare two elements

    bool isLess(Item a, Item b)
    {
        return a < b;
    }

• Alternative is to use macros, but I don’t recommend it

    #define isLess(a,b) ((a)<(b))

Exchanging Two Elements
• The best way is to use a C++ function

    void Exchange(Item & a, Item & b)
    {
        Item temp = a;
        a = b;
        b = temp;
    }

• But using a macro is still an alternative

    #define Exchange(a,b) \
    { \
        Item tmp = (a); \
        (a) = (b); \
        (b) = tmp; \
    }

Compare and Exchange
• Using a C++ function

    // Put the smaller of the two elements in a, the larger in b
    void CompExch(Item & a, Item & b)
    {
        if (isLess(b, a))
            Exchange(a, b);
    }

• Using a macro

    #define CompExch(a,b) \
        if (isLess((b),(a))) Exchange((a),(b));

A Simple Sort
• Gradually sort the array by:
    • Sorting the first 2 elements
    • Sorting the first 3 elements
    • …
    • Sorting all N elements

A Simple Sort Routine

    // Sorts a[start..stop]; stop is the index of the last element
    void sort(Item a[], int start, int stop)
    {
        int i, j;

        for (i = start + 1; i <= stop; i++)
            for (j = i; j > start; j--)
                CompExch(a[j-1], a[j]);
    }

Properties of this Simple Sort
• Non-adaptive
    • Comparisons do not depend on data
• Stable
    • Preserves relative order for duplicates
• Requires $O(N^2)$ running time

Sorts We Will Examine Today
• Selection Sort
• Insertion Sort
• Bubble Sort

Recipe: Selection Sort
• Find the smallest element
    • Place it at beginning of array
• Find the next smallest element
    • Place it in the second slot
• …

C Code: Selection Sort

    void sort(Item a[], int start, int stop)
    {
        int i, j;

        for (i = start; i < stop; i++)
        {
            int min = i;

            // Search a[i+1..stop] for the smallest remaining element
            for (j = i + 1; j <= stop; j++)
                if (isLess(a[j], a[min]))
                    min = j;

            Exchange(a[i], a[min]);
        }
    }

Selection Sort
Notice: Each exchange moves an element into its final position. Right portion of array looks random.

Properties of Selection Sort
• Running time does not depend on input
    • Random data
    • Sorted data
    • Reverse ordered data…
• Performs exactly N-1 exchanges
• Most time spent on comparisons

Recipe: Insertion Sort
• The “Simple Sort” we first considered
• Consider one element at a time
    • Place it among previously considered elements
    • Must move several elements to “make room”
• Can be improved by “adapting to data”

Improvement I
• Decide when further comparisons are futile
• Stop comparisons when we reach a smaller element
• What speed improvement do you expect?

Insertion Sort (I)

    void sort(Item a[], int start, int stop)
    {
        int i, j;

        for (i = start + 1; i <= stop; i++)
            for (j = i; j > start; j--)
                if (isLess(a[j], a[j-1]))
                    Exchange(a[j-1], a[j]);
                else
                    break;
    }

Improvement II
• Notice that inner loop continues until:
    • First element reached, or
    • Smaller element reached
• If smallest element is at the beginning…
    • Only one condition to check

Insertion Sort (II)

    void sort(Item a[], int start, int stop)
    {
        int i;

        // This ensures that smallest element is at the beginning
        for (i = stop; i > start; i--)
            CompExch(a[i-1], a[i]);

        // Now, we don't need to check that j > start
        for (i = start + 2; i <= stop; i++)
        {
            int j = i;

            while (isLess(a[j], a[j-1]))
            {
                Exchange(a[j], a[j-1]);
                j--;
            }
        }
    }

Improvement III
• The basic approach requires many exchanges involving each element
• Instead of carrying out many exchanges …
• Find the position for the new element and shift elements to the right to make room

Insertion Sort (III)

    void sort(Item a[], int start, int stop)
    {
        int i;

        // Move the smallest element to the beginning
        for (i = stop; i > start; i--)
            CompExch(a[i-1], a[i]);

        for (i = start + 2; i <= stop; i++)
        {
            int j = i;
            Item val = a[j];             // Store the value of new element

            while (isLess(val, a[j-1]))  // Proceed through larger elements
            {
                a[j] = a[j-1];           // Shifting things to the right …
                j--;
            }

            a[j] = val;                  // Finally, insert new element in place
        }
    }

Insertion Sort
Notice: Elements in left portion of array can still change position. Right portion remains untouched.

Properties of Insertion Sort
• Running time of the adaptive version depends on input
    • About 2x faster on random data
    • Improvement even greater on sorted data
    • Similar speed on reverse ordered data
• Stable sort

Recipe: Bubble Sort
• Pass through the array
    • Exchange elements that are out of order
• Repeat until done…
• Very “popular”
    • Very inefficient too!

C Code: Bubble Sort

    void sort(Item a[], int start, int stop)
    {
        int i, j;

        for (i = start; i <= stop; i++)
            for (j = stop; j > i; j--)
                CompExch(a[j-1], a[j]);
    }

Bubble Sort
Notice: Each pass moves one element into position. Right portion of array is partially sorted.

Shaker Sort
Notice: Things improve slightly if bubble sort alternates directions…

Notes on Bubble Sort
• Similar to non-adaptive Insertion Sort
    • Moves through unsorted portion of array
• Similar to Selection Sort
    • Does more exchanges per element
• Stop when no exchanges performed
    • Adaptive, but not as effective as Insertion Sort

[Figure panels: Selection, Insertion, Bubble]

Performance Characteristics
• Selection, Insertion, Bubble Sorts
• All quadratic
    • Running time differs by a constant
• Which sorts do you think are stable?
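
One way to explore the stability question is to sort records that carry both a key and a marker of their input order, then check whether records with equal keys kept that order. The sketch below is only an illustration under assumptions: the `Record` type, its `tag` field, the function names and `isStablySorted` are hypothetical and not from the lecture, and `std::swap` stands in for `Exchange`. A single passing input does not prove a sort is stable, but a single failing input is enough to show that it is not.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical record: sorted on `key` only; `tag` remembers input order.
struct Record {
    int key;
    int tag;
};

static bool isLess(const Record& a, const Record& b) { return a.key < b.key; }

// Selection sort on a[start..stop] (stop is the last index).
void selectionSort(std::vector<Record>& a, int start, int stop) {
    for (int i = start; i < stop; i++) {
        int min = i;
        for (int j = i + 1; j <= stop; j++)
            if (isLess(a[j], a[min])) min = j;
        std::swap(a[i], a[min]);
    }
}

// Adaptive insertion sort: stop as soon as no smaller element remains.
void insertionSort(std::vector<Record>& a, int start, int stop) {
    for (int i = start + 1; i <= stop; i++)
        for (int j = i; j > start && isLess(a[j], a[j - 1]); j--)
            std::swap(a[j - 1], a[j]);
}

// Bubble sort: each pass carries the smallest remaining element leftwards.
void bubbleSort(std::vector<Record>& a, int start, int stop) {
    for (int i = start; i < stop; i++)
        for (int j = stop; j > i; j--)
            if (isLess(a[j], a[j - 1])) std::swap(a[j - 1], a[j]);
}

// True when keys are non-decreasing and equal keys appear in input order.
bool isStablySorted(const std::vector<Record>& a) {
    for (std::size_t i = 1; i < a.size(); i++) {
        if (a[i].key < a[i - 1].key) return false;
        if (a[i].key == a[i - 1].key && a[i].tag < a[i - 1].tag) return false;
    }
    return true;
}
```

With the input keys 2, 2, 1 (tags 0, 1, 2), the first exchange in selection sort swaps the leading 2 past its duplicate, so the check fails for selection sort while insertion sort and bubble sort pass on the same input.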
Selection Sort
• Exchanges
    • $N - 1$
• Comparisons
    • $N(N-1)/2$
• Requires about $N^2/2$ operations
• Ignoring updates to min variable

Adaptive Insertion Sort
• Half-exchanges
    • About $N^2/4$ on average (random data)
    • $N(N-1)/2$ (worst case)
• Comparisons
    • About $N^2/4$ on average (random data)
    • $N(N-1)/2$ (worst case)
• Requires about $N^2/4$ operations
• Requires nearly linear time on sorted data

Bubble Sort
• Exchanges
    • $N(N-1)/2$
• Comparisons
    • $N(N-1)/2$
• Average case and worst case very similar, even for adaptive method

Empirical Comparison
(Running times in seconds)

                           Sorting Strategy
   N    Selection  Insertion  Insertion (adaptive)  Bubble  Shaker
1000            5          7                     4      11       8
2000           21         29                    15      45      34
4000           85        119                    62     182     138

Reading
• Sedgewick, Chapter 6

Goncalo Abecasis
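
The operation counts listed for Selection Sort, Adaptive Insertion Sort and Bubble Sort can be checked by instrumenting each routine. The sketch below is a minimal illustration under assumptions: the `Counts` struct and the `counted...` function names are hypothetical, the element type is fixed to `int`, and `std::swap` stands in for `Exchange`. It counts full exchanges, not the half-exchanges of the shifting version of insertion sort.

```cpp
#include <utility>
#include <vector>

// Hypothetical tally of the basic operations performed by one sort.
struct Counts {
    long comparisons = 0;
    long exchanges = 0;
};

// Selection sort on a[start..stop] (stop is the last index).
Counts countedSelectionSort(std::vector<int>& a, int start, int stop) {
    Counts c;
    for (int i = start; i < stop; i++) {
        int min = i;
        for (int j = i + 1; j <= stop; j++) {
            c.comparisons++;
            if (a[j] < a[min]) min = j;
        }
        std::swap(a[i], a[min]);
        c.exchanges++;
    }
    return c;
}

// Adaptive insertion sort: the inner loop stops at the first element
// that is not larger than the one being inserted.
Counts countedInsertionSort(std::vector<int>& a, int start, int stop) {
    Counts c;
    for (int i = start + 1; i <= stop; i++)
        for (int j = i; j > start; j--) {
            c.comparisons++;
            if (!(a[j] < a[j - 1])) break;
            std::swap(a[j - 1], a[j]);
            c.exchanges++;
        }
    return c;
}

// Non-adaptive bubble sort: always compares every adjacent pair in range.
Counts countedBubbleSort(std::vector<int>& a, int start, int stop) {
    Counts c;
    for (int i = start; i < stop; i++)
        for (int j = stop; j > i; j--) {
            c.comparisons++;
            if (a[j] < a[j - 1]) {
                std::swap(a[j - 1], a[j]);
                c.exchanges++;
            }
        }
    return c;
}
```

For a reverse-ordered array of N = 6 elements, all three routines perform N(N-1)/2 = 15 comparisons; selection sort performs N - 1 = 5 exchanges, while insertion sort and bubble sort each perform 15. On already sorted input the adaptive insertion sort drops to N - 1 = 5 comparisons and no exchanges, which is the nearly linear behaviour noted above.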