Measure of presortedness
In computer science, a measure of presortedness of a sequence represents how much work is required to sort the sequence. If the sequence is pre-sorted, sorting the sequence entirely require little work, hence it is expected to have a small measure of presortedness. In particular, the measure of a sorted sequence is 0.
Some sorting algorithms are more efficient on pre-sorted list, as they can use this pre-work into account to avoid duplicate work. The measure of presortedness allows to formalize the notion that an algorithm is optimal for a certain measure.
Definition
Let represents finite sequence of natural numbers. Let a function sending sequence of numbers to non-negative numbers. This function is said to be a measure of presortedness if it satisfies the following axioms:
- if is in ascending order, then , that is, the measure of presortedness of sorted sequences is 0.
- let and . If iff , then . That is, the measure do not depends on the actual numbers in the sequence, but only on their order compared to other elements of the sequences.
- If is a subsequence of then , that is, is monotonic for the subsequence relation
- let and . If for each , then , that is, the measure is subadditive
- for each and , . Intuitively, if you add an element , it can be in the wrong order compared to at most every other element of , and so the pre-sortedness can be increased by at most the length of .
Those axioms are similar to the axioms of measure theory. However, a measure of presortedness is not a measure as in measure theory, since it is not defined over a set but over sequence.
Optimal algorithm
It is well known that an optimal comparison sort requires to compare pairs of elements in a sequence. That is, the decision tree complexity is . However, when more informations is known about how much a sequence is presorted, more optimal bounds can be devised. This notion is now formalized.
Given a measure of pre-sortedness , a sequence , , let be the set of permutations of whose measure is at most . Then let the set of permutations whose measure is at most the measure of .
A sorting algorithm which takes as input and is said to be weakly -optimal if its time complexity is . That is, the number of operations is logarithmic in the number of possible permutations whose measure is at most . This is optimal in the sens that simply selecting an element in a set of size requires bits of information.
Similarly, a sorting algorithm which takes as input is -optimal if its time complexit is . When is computable in linear time, a weakly -optimal algorithm is automatically -optimal. However, if is more complex, it is possible that computing is actully more costly than sorting the sequence in the first place, and thus only an upper bound of can be used.
Similarly, a decision tree is weakly -optimal if its complexity is and it is -optimal if its complexity .
Existence of decision tree
For each measure , there exists a weakly -optimal decision tree.
If the decision tree complexity of is , there exists an -optimal decision tree. This decision tree's root compute . Once is computed in a node, the weakly -optimal decision tree for is appended at this node.
The existence of the tree does not implie that there exists (weakly) -optimal sorting algorithm, as computing the decision tree may be costly.
Examples
Here is a list of measures of presortedness and related (weakly) -optimal algorithms. Let be fixed for the remaining of the section.
Let us introduce a sorting algorithm that will be optimal for multiple measure. Let's call local insertion sort the algorithm that consists in iterating on the sequence and adding each element in a finger tree.
The zero measure
The constant function 0 is the most trivial measure of presortedness. Any sorting function running in -time, such as heapsort and mergesort is -optimal.
The indicator function
The indicator function of sorted sequence is a measure of sortedness. This function sends sorted sequences to 0 and all other sequences to 1. Any algorithm that check whether a list is sorted, and if it is not sorted apply a -optimal sorting algorithm is -optimal.
Number of runs
The number of runs, , is the number of increasing sequences in minus one. It is equivalently defined as the number of such that . is a measure of presortedness.
Natural merge sort, which consists in using existing runs and merging them, is a -optimal algorithm. The local insertion sort is also Runs-optimal.
Number of inversions
The number of inversion in , , is the number of pairs such that . It is a measure of presortedness and the local insertion sort is Inv-optimal.
Number of deletions
The mininimum number of elements that need to be removed from to obtain a sorted sequences. It is a measure of presortedness and the local insertion sort is Rem-optimal.
Radius
The radius of [1] is a measure of presortdness.
The following algorithm is Rad-optimal. It consists in iteratively halving the radius until it reaches 0. Halving the radius can be done by three merges of sub-sequences of length .
Minimum of two measures
Given two measures of presortdness and , the measure is also a measure of presortdness. Given two -optimal sorting algorithms , the algorithm consisting in executing and in parallel and returning the first output is a -optimal algorithm.
References
- ↑ Estivill-Castro, Vladimir; Wood, Derick (October 1989). "A new measure of presortdness". Information andComputation 83 (1): 111–119. doi:10.1016/0890-5401(89)90050-3.
- Mannila, H (1985). "Measures of Presortedness and Optimal Sorting Algorithms". IEEE Trans. Comput (C-34): 318–325.
Template:Data structures and algorithms