Measure of presortedness

From HandWiki
Revision as of 21:22, 27 September 2021 by imported>Steve Marsio (fix)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

In computer science, a measure of presortedness of a sequence represents how much work is required to sort the sequence. If the sequence is pre-sorted, sorting the sequence entirely require little work, hence it is expected to have a small measure of presortedness. In particular, the measure of a sorted sequence is 0.

Some sorting algorithms are more efficient on pre-sorted list, as they can use this pre-work into account to avoid duplicate work. The measure of presortedness allows to formalize the notion that an algorithm is optimal for a certain measure.

Definition

Let * represents finite sequence of natural numbers. Let m:* a function sending sequence of numbers to non-negative numbers. This function is said to be a measure of presortedness if it satisfies the following axioms:

  • if X=x1,,xn is in ascending order, then m(X)=0, that is, the measure of presortedness of sorted sequences is 0.
  • let X=x1,,xn and Y=y1,,yn. If xi<xj iff yi<yj, then m(X)=m(Y). That is, the measure do not depends on the actual numbers in the sequence, but only on their order compared to other elements of the sequences.
  • If Y is a subsequence of X then m(Y)m(X), that is, m is monotonic for the subsequence relation
  • let X=x1,,xn and Y=y1,,yn. If for each i,j, xi<yj then m(XY)m(X)+m(Y), that is, the measure is subadditive
  • for each x and X*, m(xX)|X|+m(X). Intuitively, if you add an element x, it can be in the wrong order compared to at most every other element of X, and so the pre-sortedness can be increased by at most the length of X.

Those axioms are similar to the axioms of measure theory. However, a measure of presortedness is not a measure as in measure theory, since it is not defined over a set but over sequence.

Optimal algorithm

It is well known that an optimal comparison sort requires to compare O(nlogn) pairs of elements in a sequence. That is, the decision tree complexity is O(nlogn). However, when more informations is known about how much a sequence is presorted, more optimal bounds can be devised. This notion is now formalized.

Given a measure of pre-sortedness m, a sequence X=x1,,xn, zm(X), let below(z,X,m) be the set of permutations of X whose measure is at most Z. Then let below(X,m)=below(m(X),X,m) the set of permutations whose measure is at most the measure of X.

A sorting algorithm S which takes as input X and z is said to be weakly m-optimal if its time complexity is O(max(n,log|below(z,X,m)|)). That is, the number of operations is logarithmic in the number of possible permutations whose measure is at most z. This is optimal in the sens that simply selecting an element in a set of size N requires log(N) bits of information.

Similarly, a sorting algorithm which takes as input X is m-optimal if its time complexit is O(max(n,log|below(z,X,m)|)). When m(X) is computable in linear time, a weakly m-optimal algorithm is automatically m-optimal. However, if m is more complex, it is possible that computing m(X) is actully more costly than sorting the sequence in the first place, and thus only an upper bound z of m(X) can be used.

Similarly, a decision tree is weakly m-optimal if its complexity is O(max(n,log|below(z,X,m)|)) and it is m-optimal if its complexity O(max(n,log|below(X,m)|)).

Existence of decision tree

For each measure m, there exists a weakly m-optimal decision tree.

If the decision tree complexity of m is O(max(|X|,log|below(X,m)|)), there exists an m-optimal decision tree. This decision tree's root compute m(X). Once m(X) is computed in a node, the weakly m-optimal decision tree for m(X) is appended at this node.

The existence of the tree does not implie that there exists (weakly) m-optimal sorting algorithm, as computing the decision tree may be costly.

Examples

Here is a list of measures of presortedness m and related (weakly) m-optimal algorithms. Let X=x1,,xn be fixed for the remaining of the section.

Let us introduce a sorting algorithm that will be optimal for multiple measure. Let's call local insertion sort the algorithm that consists in iterating on the sequence and adding each element in a finger tree.

The zero measure

The constant function 0 is the most trivial measure of presortedness. Any sorting function running in O(nlogn)-time, such as heapsort and mergesort is 0-optimal.

The indicator function

The indicator function m01 of sorted sequence is a measure of sortedness. This function sends sorted sequences to 0 and all other sequences to 1. Any algorithm that check whether a list is sorted, and if it is not sorted apply a 0-optimal sorting algorithm is m01-optimal.

Number of runs

The number of runs, Runs(X), is the number of increasing sequences in X minus one. It is equivalently defined as the number of i such that xi+1<xi. is a measure of presortedness.

Natural merge sort, which consists in using existing runs and merging them, is a Runs-optimal algorithm. The local insertion sort is also Runs-optimal.

Number of inversions

The number of inversion in X, Inv(X), is the number of pairs i<j such that xi>xj. It is a measure of presortedness and the local insertion sort is Inv-optimal.

Number of deletions

The mininimum number Rem(X) of elements that need to be removed from X to obtain a sorted sequences. It is a measure of presortedness and the local insertion sort is Rem-optimal.

Radius

The radius Rad(X) of X[1] is a measure of presortdness.

The following algorithm is Rad-optimal. It consists in iteratively halving the radius until it reaches 0. Halving the radius can be done by three merges of sub-sequences of length Rad(X).

Minimum of two measures

Given two measures of presortdness m1 and m2, the measure Xmin(m1(X),m2(X)) is also a measure of presortdness. Given two mi-optimal sorting algorithms Si, the algorithm consisting in executing S1 and S2 in parallel and returning the first output is a min(m1,m2)-optimal algorithm.

References

  1. Estivill-Castro, Vladimir; Wood, Derick (October 1989). "A new measure of presortdness". Information andComputation 83 (1): 111–119. doi:10.1016/0890-5401(89)90050-3. 
  • Mannila, H (1985). "Measures of Presortedness and Optimal Sorting Algorithms". IEEE Trans. Comput (C-34): 318–325. 

Template:Data structures and algorithms