Sorting Algorithms
Biostatistics 615/815
Lecture 6

Last Lecture ...
- Recursive Functions
  - Natural expression for many algorithms
- Dynamic Programming
  - Automatic strategy for generating efficient versions of recursive algorithms

Today ...
- Properties of Sorting Algorithms
- Elementary Sorting Algorithms
  - Selection Sort
  - Insertion Sort
  - Bubble Sort

Homework 1
- Limits of floating point
- Important concepts ...
  - Precision is limited and relative
  - Errors can accumulate and lead to incorrect results
  - Mathematical soundness may not be enough

Floating Point Precision
- Smallest value that can be added to 1
  - $2^{-52}$ or $2.2 \times 10^{-16}$ (double)
  - $2^{-23}$ or $1.2 \times 10^{-7}$ (float)
- Smallest value that can be subtracted from 1
  - $2^{-53}$ or $1.1 \times 10^{-16}$ (double)
  - $2^{-24}$ or $6.0 \times 10^{-8}$ (float)
- Smallest value that is distinct from zero
  - $2^{-1074}$ or $4.9 \times 10^{-324}$ (double)
  - $2^{-149}$ or $1.4 \times 10^{-45}$ (float)

Calculating Powers of $\phi$
- Two possibilities
  - Product formula: $\phi^{n} = \phi^{n-1} \cdot \phi$
  - Difference formula: $\phi^{n} = \phi^{n-2} - \phi^{n-1}$

Results Using Product Formula
[Figure: calculation using the product formula (y-axis, 0.0 to 1.0) against exponent (x-axis, 0 to 80)]

Results Using Difference Formula
[Figure: calculation using the difference formula (y-axis, -0.5 to 1.0) against exponent (x-axis, 0 to 80)]

Relative Error on Log Scale
[Figure: logarithm of relative error, log(abs(product - difference)/product) (y-axis, -15 to 15), against exponent (x-axis, 0 to 80)]

Applications of Sorting
- Facilitate searching
  - Building indices
- Identify quantiles of a distribution
- Identify unique values
- Browsing data

Elementary Methods
- Suitable for
  - Small datasets
  - Specialized applications
- Prelude to more complex methods
  - Illustrate ideas
  - Introduce terminology
  - Sometimes useful complement

... but beware!
- Elementary sorts are very inefficient
  - Typically, time requirements are $O(N^2)$
- Probably the most common inefficiency in scientific computing
  - Make programs "break" with large datasets

Aim
- Rearrange a set of keys
  - Using some predefined order
  - Integers
  - Doubles
  - Indices for records in a database
- Keys stored as array in memory
  - More complex sorts when we can only load part of the data

Basic Building Blocks
- A type for each element

    #define Item int

- Compare two elements
- Exchange two elements
- Compare and exchange two elements

Comparing Two Elements
- Define a function to compare two elements

    bool isLess(Item a, Item b)
    {
        return a < b;
    }

- Alternative is to use macros, but I don't recommend it

    #define isLess(a,b) ((a)<(b))

Exchanging Two Elements
- The best way is to use a C++ function

    void Exchange(Item & a, Item & b)
    {
        Item temp = a;
        a = b;
        b = temp;
    }

- But using a macro is still an alternative

    #define Exchange(a,b) \
    { \
        Item tmp = (a); \
        (a) = (b); \
        (b) = tmp; \
    }

Comparing And Exchanging
- Using a C++ function (it returns nothing, so the return type is void)

    void CompExch(Item & a, Item & b)
    {
        if (isLess(b, a))
            Exchange(a, b);
    }

- Using a macro

    #define CompExch(a,b) \
        if (isLess((b),(a))) Exchange((a),(b));

A Simple Sort
- Gradually sort the array by:
  - Sorting the first 2 elements
  - Sorting the first 3 elements
  - ...
  - Sorting all N elements

A Simple Sort Routine

    // Sorts a[start..stop]; both bounds are inclusive
    void sort(Item a[], int start, int stop)
    {
        int i, j;

        for (i = start + 1; i <= stop; i++)
            for (j = i; j > start; j--)
                CompExch(a[j-1], a[j]);
    }

Properties of this Simple Sort
- Non-adaptive
  - Comparisons do not depend on data
- Stable
  - Preserves relative order for duplicates
- Requires $O(N^2)$ running time

Sorts We Will Examine Today
- Selection Sort
- Insertion Sort
- Bubble Sort

Recipe: Selection Sort
- Find the smallest element
  - Place it at beginning of array
- Find the next smallest element
  - Place it in the second slot
- ...

C Code: Selection Sort

    // Sorts a[start..stop]; both bounds are inclusive,
    // so the inner loop must also examine a[stop]
    void sort(Item a[], int start, int stop)
    {
        int i, j;

        for (i = start; i < stop; i++)
        {
            int min = i;
            for (j = i + 1; j <= stop; j++)
                if (isLess(a[j], a[min]))
                    min = j;
            Exchange(a[i], a[min]);
        }
    }

Selection Sort
Notice: Each exchange moves an element into its final position. Right portion of array looks random.
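The building blocks and the selection sort recipe above can be put together in one small file. The following is a minimal sketch, not the lecture's own listing: it assumes that `stop` is an inclusive bound (as in the simple sort routine), uses a `typedef` in place of `#define Item int`, renames the routine to `selectionSort`, and adds a hypothetical helper `isSorted` for checking the result.

```cpp
#include <cassert>  // available for checks such as assert(isSorted(a, 0, n - 1))

typedef int Item;   // element type; stands in for "#define Item int"

// Return true when a sorts before b.
bool isLess(Item a, Item b) { return a < b; }

// Swap two elements in place.
void Exchange(Item &a, Item &b)
{
    Item temp = a;
    a = b;
    b = temp;
}

// Selection sort of a[start..stop], both bounds inclusive (an assumption).
void selectionSort(Item a[], int start, int stop)
{
    for (int i = start; i < stop; i++)
    {
        int min = i;                         // index of the smallest element seen so far
        for (int j = i + 1; j <= stop; j++)  // scan the unsorted part, including a[stop]
            if (isLess(a[j], a[min]))
                min = j;
        Exchange(a[i], a[min]);              // a[i] is now in its final position
    }
}

// Hypothetical helper: true when a[start..stop] is in non-decreasing order.
bool isSorted(const Item a[], int start, int stop)
{
    for (int i = start + 1; i <= stop; i++)
        if (isLess(a[i], a[i - 1]))
            return false;
    return true;
}
```

For example, calling `selectionSort(data, 0, 5)` on the array `{ 5, 2, 9, 1, 5, 6 }` leaves it as `1 2 5 5 6 9`.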
Properties of Selection Sort
- Running time does not depend on input
  - Random data
  - Sorted data
  - Reverse ordered data ...
- Performs exactly N-1 exchanges
- Most time spent on comparisons

Recipe: Insertion Sort
- The "Simple Sort" we first considered
- Consider one element at a time
  - Place it among previously considered elements
  - Must move several elements to "make room"
- Can be improved, by "adapting to data"

Improvement I
- Decide when further comparisons are futile
- Stop comparisons when we reach a smaller element
- What speed improvement do you expect?

Insertion Sort (I)

    void sort(Item a[], int start, int stop)
    {
        int i, j;

        for (i = start + 1; i <= stop; i++)
            for (j = i; j > start; j--)
                if (isLess(a[j], a[j-1]))
                    Exchange(a[j-1], a[j]);
                else
                    break;
    }

Improvement II
- Notice that inner loop continues until:
  - First element reached, or
  - Smaller element reached
- If smallest element is at the beginning ...
  - Only one condition to check

Insertion Sort (II)

    void sort(Item a[], int start, int stop)
    {
        int i;

        // This ensures that smallest element is at the beginning
        for (i = stop; i > start; i--)
            CompExch(a[i-1], a[i]);

        // Now, we don't need to check that j > start
        for (i = start + 2; i <= stop; i++)
        {
            int j = i;
            while (isLess(a[j], a[j-1]))
            {
                Exchange(a[j], a[j-1]);
                j--;
            }
        }
    }

Improvement III
- The basic approach requires many exchanges involving each element
- Instead of carrying out many exchanges ...
- Find out position for the new element and shift elements to the right to make room

Insertion Sort (III)

    void sort(Item a[], int start, int stop)
    {
        int i;

        // Move the smallest element to the beginning
        for (i = stop; i > start; i--)
            CompExch(a[i-1], a[i]);

        for (i = start + 2; i <= stop; i++)
        {
            int j = i;
            Item val = a[j];             // Store the value of new element
            while (isLess(val, a[j-1]))  // Proceed through larger elements
            {
                a[j] = a[j-1];           // Shifting things to the right ...
                j--;
            }
            a[j] = val;                  // Finally, insert new element in place
        }
    }

Insertion Sort
Notice: Elements in left portion of array can still change position.
Right remains untouched.

Properties of Insertion Sort
- Adaptive version running time depends on input
  - About 2x faster on random data
  - Improvement even greater on sorted data
  - Similar speed on reverse ordered data
- Stable sort

Recipe: Bubble Sort
- Pass through the array
  - Exchange elements that are out of order
- Repeat until done ...
- Very "popular"
  - Very inefficient too!

C Code: Bubble Sort

    void sort(Item a[], int start, int stop)
    {
        int i, j;

        for (i = start; i <= stop; i++)
            for (j = stop; j > i; j--)
                CompExch(a[j-1], a[j]);
    }

Bubble Sort
Notice: Each pass moves one element into position. Right portion of array is partially sorted.

Shaker Sort
Notice: Things improve slightly if bubble sort alternates directions ...

Notes on Bubble Sort
- Similar to non-adaptive Insertion Sort
  - Moves through unsorted portion of array
- Similar to Selection Sort
  - Does more exchanges per element
- Stop when no exchanges performed
  - Adaptive, but not as effective as Insertion Sort

[Figure: side-by-side panels for Selection, Insertion and Bubble sorts]

Performance Characteristics
- Selection, Insertion, Bubble Sorts
- All quadratic
  - Running time differs by a constant
- Which sorts do you think are stable?

Selection Sort
- Exchanges
  - $N - 1$
- Comparisons
  - $N (N - 1) / 2$
- Requires about $N^2 / 2$ operations
- Ignoring updates to min variable

Adaptive Insertion Sort
- Half-exchanges
  - About $N^2 / 4$ on average (random data)
  - $N (N - 1) / 2$ (worst case)
- Comparisons
  - About $N^2 / 4$ on average (random data)
  - $N (N - 1) / 2$ (worst case)
- Requires about $N^2 / 4$ operations
- Requires nearly linear time on sorted data

Bubble Sort
- Exchanges
  - $N (N - 1) / 2$
- Comparisons
  - $N (N - 1) / 2$
- Average case and worst case very similar, even for adaptive method

Empirical Comparison
Sorting Strategy (Running times in seconds)

    N      Selection  Insertion  Insertion (adaptive)  Bubble  Shaker
    1000        5          7              4               11       8
    2000       21         29             15               45      34
    4000       85        119             62              182     138

Reading
- Sedgewick, Chapter 6

Goncalo Abecasis
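The operation counts on the performance slides can be checked with a small counting harness. This is a sketch under stated assumptions rather than code from the lecture: the `Counts` structure, the global `counts` variable and the `countFor` driver are hypothetical names introduced here, `isLess` and `Exchange` are instrumented to increment the counters, and `stop` is treated as an inclusive bound.

```cpp
#include <cassert>

typedef int Item;  // element type; stands in for "#define Item int"

// Hypothetical counters, not part of the original routines.
struct Counts { long comparisons; long exchanges; };
static Counts counts = { 0, 0 };

// Instrumented building blocks: each call is tallied.
bool isLess(Item a, Item b) { counts.comparisons++; return a < b; }
void Exchange(Item &a, Item &b) { counts.exchanges++; Item t = a; a = b; b = t; }

void selectionSort(Item a[], int start, int stop)
{
    for (int i = start; i < stop; i++)
    {
        int min = i;
        for (int j = i + 1; j <= stop; j++)
            if (isLess(a[j], a[min]))
                min = j;
        Exchange(a[i], a[min]);
    }
}

// Adaptive insertion sort (Improvement I): stop at the first smaller element.
void insertionSort(Item a[], int start, int stop)
{
    for (int i = start + 1; i <= stop; i++)
        for (int j = i; j > start; j--)
            if (isLess(a[j], a[j-1]))
                Exchange(a[j-1], a[j]);
            else
                break;
}

// Non-adaptive bubble sort, with the compare-and-exchange step written out.
void bubbleSort(Item a[], int start, int stop)
{
    for (int i = start; i <= stop; i++)
        for (int j = stop; j > i; j--)
            if (isLess(a[j], a[j-1]))
                Exchange(a[j-1], a[j]);
}

// Run one sort on n items (reverse ordered or already sorted) and
// return the number of comparisons and exchanges it performed.
Counts countFor(void (*sort)(Item[], int, int), int n, bool reversed)
{
    Item *a = new Item[n];
    for (int i = 0; i < n; i++)
        a[i] = reversed ? n - i : i;

    counts.comparisons = 0;
    counts.exchanges = 0;
    sort(a, 0, n - 1);

    for (int i = 1; i < n; i++)
        assert(a[i-1] <= a[i]);   // plain comparison, so the tallies are unaffected

    delete [] a;
    return counts;
}
```

With N = 10 and reverse ordered input, all three routines perform N(N-1)/2 = 45 comparisons; selection sort calls `Exchange` N-1 = 9 times, while insertion and bubble sort each perform 45 exchanges. On already sorted input, the adaptive insertion sort needs only 9 comparisons and no exchanges, which is the nearly linear behaviour noted above.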