Small-bias sample space


In theoretical computer science, a small-bias sample space (also known as ϵ-biased sample space, ϵ-biased generator, or small-bias probability space) is a probability distribution that fools parity functions. In other words, no parity function can distinguish between a small-bias sample space and the uniform distribution with high probability, and hence, small-bias sample spaces naturally give rise to pseudorandom generators for parity functions.

The main useful property of small-bias sample spaces is that they need far fewer truly random bits than the uniform distribution to fool parities. Efficient constructions of small-bias sample spaces have found many applications in computer science, some of which are derandomization, error-correcting codes, and probabilistically checkable proofs. The connection with error-correcting codes is in fact very strong since ϵ-biased sample spaces are equivalent to ϵ-balanced error-correcting codes.

Definition

Bias

Let $X$ be a probability distribution over $\{0,1\}^n$. The bias of $X$ with respect to a set of indices $I \subseteq \{1,\dots,n\}$ is defined as[1]

$$\mathrm{bias}_I(X) = \left|\Pr_{x \sim X}\Big(\sum_{i \in I} x_i = 0\Big) - \Pr_{x \sim X}\Big(\sum_{i \in I} x_i = 1\Big)\right| = \left|2 \cdot \Pr_{x \sim X}\Big(\sum_{i \in I} x_i = 0\Big) - 1\right|,$$

where the sum is taken over $\mathbb{F}_2$, the finite field with two elements. In other words, the sum $\sum_{i \in I} x_i$ equals $0$ if the number of ones in the sample $x \in \{0,1\}^n$ at the positions defined by $I$ is even, and otherwise, the sum equals $1$. For $I = \emptyset$, the empty sum is defined to be zero, and hence $\mathrm{bias}_\emptyset(X) = 1$.
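For concreteness, the bias can be computed directly from this definition. The sketch below (plain Python, with 0-based coordinate indices instead of the 1-based indices used in the text; the function name `bias` is our own) evaluates it for a distribution given as a dictionary of probabilities:

```python
from itertools import product

def bias(dist, I):
    """Bias of a distribution with respect to an index set I.

    dist: dict mapping bit-strings (tuples of 0/1) to probabilities.
    I: set of (0-based) coordinate indices.
    """
    # Probability that the F_2-sum of the coordinates in I is 0 (even parity).
    p_even = sum(p for x, p in dist.items() if sum(x[i] for i in I) % 2 == 0)
    return abs(2 * p_even - 1)

# The uniform distribution over {0,1}^3 has zero bias for every non-empty I:
n = 3
uniform = {x: 1 / 2**n for x in product([0, 1], repeat=n)}
print(bias(uniform, {0, 2}))  # 0.0
print(bias(uniform, set()))   # 1.0 (the empty sum is always 0)
```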

ϵ-biased sample space

A probability distribution $X$ over $\{0,1\}^n$ is called an ϵ-biased sample space if $\mathrm{bias}_I(X) \le \epsilon$ holds for all non-empty subsets $I \subseteq \{1,2,\dots,n\}$.

ϵ-biased set

An ϵ-biased sample space $X$ that is generated by picking a uniform element from a multiset $X \subseteq \{0,1\}^n$ is called an ϵ-biased set. The size $s$ of an ϵ-biased set $X$ is the size of the multiset that generates the sample space.
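For small $n$, whether a given multiset is ϵ-biased can be checked by brute force over all $2^n - 1$ non-empty index sets. A sketch (the helper name `max_bias` is our own; this exhaustive check is exponential in $n$ and only meant for illustration):

```python
from itertools import combinations, product

def max_bias(S):
    """Largest bias over all non-empty index sets, for the uniform
    distribution on the multiset S of bit-strings (tuples of 0/1)."""
    n = len(S[0])
    worst = 0.0
    for k in range(1, n + 1):
        for I in combinations(range(n), k):
            even = sum(1 for x in S if sum(x[i] for i in I) % 2 == 0)
            worst = max(worst, abs(2 * even / len(S) - 1))
    return worst

# The full cube {0,1}^2 is 0-biased; a single point is maximally biased.
print(max_bias(list(product([0, 1], repeat=2))))  # 0.0
print(max_bias([(1, 0)]))                         # 1.0
```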

ϵ-biased generator

An ϵ-biased generator $G : \{0,1\}^\ell \to \{0,1\}^n$ is a function that maps strings of length $\ell$ to strings of length $n$ such that the multiset $X_G = \{G(y) \mid y \in \{0,1\}^\ell\}$ is an ϵ-biased set. The seed length of the generator is the number $\ell$, and it is related to the size of the ϵ-biased set $X_G$ via the equation $s = 2^\ell$.
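The relation $s = 2^\ell$ between seed length and set size can be made concrete with a toy generator (the function below is a hypothetical example, not a good small-bias generator: its third output bit is the parity of the first two, so the generated set has bias 1 on $I = \{1,2,3\}$):

```python
from itertools import product

def generated_set(G, ell):
    """Multiset X_G = {G(y) : y in {0,1}^ell}; its size is s = 2**ell."""
    return [G(y) for y in product([0, 1], repeat=ell)]

# Toy generator: ell = 2 seed bits -> n = 3 output bits.
G = lambda y: (y[0], y[1], y[0] ^ y[1])
X = generated_set(G, 2)
print(len(X))  # 4  (= 2**2, i.e. s = 2**ell)
```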

Connection with epsilon-balanced error-correcting codes

There is a close connection between ϵ-biased sets and ϵ-balanced linear error-correcting codes. A linear code $C : \{0,1\}^n \to \{0,1\}^s$ of message length $n$ and block length $s$ is ϵ-balanced if the Hamming weight of every nonzero codeword $C(x)$ is between $(\tfrac{1}{2} - \epsilon)s$ and $(\tfrac{1}{2} + \epsilon)s$. Since $C$ is a linear code, its generator matrix is an $(n \times s)$-matrix $A$ over $\mathbb{F}_2$ with $C(x) = x \cdot A$.

Then it holds that a multiset $X \subseteq \{0,1\}^n$ is ϵ-biased if and only if the linear code $C_X$, whose generator matrix has the elements of $X$ as its columns, is ϵ-balanced.[2]
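The mechanism behind this equivalence is that each coordinate of the codeword $C_X(m)$ is the inner product of the message $m$ with one element of $X$, so the codeword's relative Hamming weight determines the bias with respect to the index set that $m$ indicates: $\mathrm{bias}_I(X) = |1 - 2\,\mathrm{wt}(C_X(m))/s|$. A small sketch verifying this identity on a hypothetical example set (0-based indices):

```python
# Small example multiset X over {0,1}^3, viewed as the s columns of a
# generator matrix for the linear code C_X.
X = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
s = len(X)

def codeword(m):
    """C_X(m): one inner product <m, x> over F_2 per element x of X."""
    return [sum(mi * xi for mi, xi in zip(m, x)) % 2 for x in X]

m = (1, 1, 0)          # indicator vector of the index set I = {0, 1}
wt = sum(codeword(m))  # Hamming weight of the codeword
even = sum(1 for x in X if (x[0] + x[1]) % 2 == 0)
# bias_I(X) computed from the definition equals |1 - 2*wt/s|:
assert abs(2 * even / s - 1) == abs(1 - 2 * wt / s)
print(wt, abs(1 - 2 * wt / s))  # 2 0.0
```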

Constructions of small epsilon-biased sets

Usually the goal is to find ϵ-biased sets that have a small size s relative to the parameters n and ϵ. This is because a smaller size s means that the amount of randomness needed to pick a random element from the set is smaller, and so the set can be used to fool parities using few random bits.

Theoretical bounds

The probabilistic method gives a non-explicit construction that achieves size $s = O(n/\epsilon^2)$.[2] The construction is non-explicit in the sense that finding the ϵ-biased set requires a lot of true randomness, which does not help towards the goal of reducing the overall randomness. However, this non-explicit construction is useful because it shows that these efficient codes exist. On the other hand, the best known lower bound for the size of ϵ-biased sets is $s = \Omega\big(n/(\epsilon^2 \log(1/\epsilon))\big)$; that is, in order for a set to be ϵ-biased, it must be at least that big.[2]
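The probabilistic argument can be sanity-checked numerically: drawing $s = \Theta(n/\epsilon^2)$ uniform strings and measuring the worst bias over all non-empty index sets typically lands below ϵ, by a Chernoff bound and a union bound. A sketch (the constant 4 and the parameters are arbitrary choices for the demo, and the outcome is random, not guaranteed):

```python
import random
from itertools import combinations

random.seed(0)
n, eps = 8, 0.5
s = 4 * n * round(1 / eps**2)  # s = O(n / eps^2); constant 4 is arbitrary
S = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(s)]

# Worst bias over all 2^n - 1 non-empty index sets (brute force).
worst = max(abs(2 * sum(1 for x in S if sum(x[i] for i in I) % 2 == 0) / s - 1)
            for k in range(1, n + 1) for I in combinations(range(n), k))
print(worst)  # with high probability this is at most eps = 0.5
```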

Explicit constructions

There are many explicit, i.e., deterministic constructions of ϵ-biased sets with various parameter settings:

  • (Naor Naor) achieve $s = \frac{n}{\mathrm{poly}(\epsilon)}$. The construction makes use of Justesen codes (a concatenation of Reed–Solomon codes with the Wozencraft ensemble) as well as expander walk sampling.
  • (Alon Goldreich) achieve $s = O\left(\frac{n}{\epsilon \log(n/\epsilon)}\right)^2$. One of their constructions is the concatenation of Reed–Solomon codes with the Hadamard code; this concatenation turns out to be an ϵ-balanced code, which gives rise to an ϵ-biased sample space via the connection mentioned above.
  • Concatenating algebraic geometry codes with the Hadamard code gives an ϵ-balanced code with $s = O\left(\frac{n}{\epsilon^3 \log(1/\epsilon)}\right)$.[2]
  • (Ben-Aroya Ta-Shma) achieve $s = O\left(\frac{n}{\epsilon^2 \log(1/\epsilon)}\right)^{5/4}$.
  • (Ta-Shma 2017) achieves $s = O\left(\frac{n}{\epsilon^{2+o(1)}}\right)$, which is almost optimal because of the lower bound.

These bounds are mutually incomparable. In particular, none of these constructions yields the smallest ϵ-biased sets for all settings of ϵ and n.

Application: almost k-wise independence

An important application of small-bias sets lies in the construction of almost k-wise independent sample spaces.

k-wise independent spaces

A random variable $Y$ over $\{0,1\}^n$ is a k-wise independent space if, for all index sets $I \subseteq \{1,\dots,n\}$ of size $k$, the marginal distribution $Y|_I$ is exactly equal to the uniform distribution over $\{0,1\}^k$. That is, for all such $I$ and all strings $z \in \{0,1\}^k$, the distribution $Y$ satisfies $\Pr_Y(Y|_I = z) = 2^{-k}$.
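This definition is easy to check by brute force for a space given as a multiset with the uniform distribution on it. A sketch (the function name is our own; the example is the classic XOR space, which is pairwise independent but not 3-wise independent):

```python
from itertools import combinations, product

def is_k_wise_uniform(S, k):
    """Check that the uniform distribution on multiset S is exactly
    k-wise independent: every k-coordinate marginal is uniform."""
    n = len(S[0])
    for I in combinations(range(n), k):
        for z in product([0, 1], repeat=k):
            hits = sum(1 for x in S if all(x[i] == zi for i, zi in zip(I, z)))
            if hits * 2**k != len(S):  # marginal probability must be 2^-k
                return False
    return True

# XOR space {(a, b, a XOR b)}: 2-wise independent, but not 3-wise.
S = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
print(is_k_wise_uniform(S, 2), is_k_wise_uniform(S, 3))  # True False
```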

Constructions and bounds

k-wise independent spaces are fairly well understood.

  • A simple construction by (Joffe 1974) achieves size $n^k$.
  • (Alon Babai) construct a k-wise independent space whose size is $n^{k/2}$.
  • (Chor Goldreich) prove that no k-wise independent space can be significantly smaller than $n^{k/2}$.

Joffe's construction

(Joffe 1974) constructs a k-wise independent space $Y$ over the finite field with some prime number $n > k$ of elements, i.e., $Y$ is a distribution over $\mathbb{F}_n^n$. The initial $k$ marginals of the distribution are drawn independently and uniformly at random:

$$(Y_0,\dots,Y_{k-1}) \sim \mathbb{F}_n^k.$$

For each $i$ with $k \le i < n$, the marginal distribution of $Y_i$ is then defined as

$$Y_i = Y_0 + Y_1 \cdot i + Y_2 \cdot i^2 + \dots + Y_{k-1} \cdot i^{k-1},$$

where the calculation is done in $\mathbb{F}_n$. (Joffe 1974) proves that the distribution $Y$ constructed in this way is k-wise independent as a distribution over $\mathbb{F}_n^n$. The distribution $Y$ is uniform on its support, and hence, the support of $Y$ forms a k-wise independent set. It contains all $n^k$ strings in $\mathbb{F}_n^k$ that have been extended to strings of length $n$ using the deterministic rule above.
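Joffe's construction is short enough to implement directly. The sketch below (function name `joffe_space` is ours) enumerates the support: each of the $n^k$ seeds is extended by evaluating the degree-$(k-1)$ polynomial it defines at the points $i = k, \dots, n-1$:

```python
from itertools import product

def joffe_space(n, k):
    """Support of Joffe's k-wise independent space over F_n (n prime, n > k):
    a seed (y_0, ..., y_{k-1}) in F_n^k is extended by
    y_i = y_0 + y_1*i + ... + y_{k-1}*i^(k-1)  (mod n)  for k <= i < n."""
    points = []
    for seed in product(range(n), repeat=k):
        point = list(seed)
        for i in range(k, n):
            point.append(sum(c * pow(i, j, n) for j, c in enumerate(seed)) % n)
        points.append(tuple(point))
    return points

S = joffe_space(5, 2)
print(len(S))  # 25  (= n**k support points, each of length n = 5)
```

Pairwise independence shows up concretely: for any two distinct coordinates, each pair of values in $\mathbb{F}_5^2$ is hit by exactly one of the 25 seeds.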

Almost k-wise independent spaces

A random variable $Y$ over $\{0,1\}^n$ is a δ-almost k-wise independent space if, for all index sets $I \subseteq \{1,\dots,n\}$ of size $k$, the restricted distribution $Y|_I$ and the uniform distribution $U_k$ on $\{0,1\}^k$ are δ-close in 1-norm, i.e., $\big\| Y|_I - U_k \big\|_1 \le \delta$.
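The 1-norm distance in this definition can again be computed exhaustively for small spaces. A sketch (our own helper; the XOR space from before has distance 0 for $k = 2$ and the maximal distance 1 for $k = 3$):

```python
from itertools import combinations, product

def worst_l1(S, k):
    """Largest L1 distance between a k-coordinate marginal of the uniform
    distribution on multiset S and the uniform distribution on {0,1}^k."""
    n, s = len(S[0]), len(S)
    worst = 0.0
    for I in combinations(range(n), k):
        d = sum(abs(sum(1 for x in S if tuple(x[i] for i in I) == z) / s
                    - 2**-k)
                for z in product([0, 1], repeat=k))
        worst = max(worst, d)
    return worst

S = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
print(worst_l1(S, 2), worst_l1(S, 3))  # 0.0 1.0
```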

Constructions

(Naor Naor) give a general framework for combining small k-wise independent spaces with small ϵ-biased spaces to obtain δ-almost k-wise independent spaces of even smaller size. In particular, let $G_1 : \{0,1\}^h \to \{0,1\}^n$ be a linear mapping that generates a k-wise independent space and let $G_2 : \{0,1\}^\ell \to \{0,1\}^h$ be a generator of an ϵ-biased set over $\{0,1\}^h$. That is, when given a uniformly random input, the output of $G_1$ is a k-wise independent space, and the output of $G_2$ is ϵ-biased. Then $G : \{0,1\}^\ell \to \{0,1\}^n$ with $G(x) = G_1(G_2(x))$ is a generator of a δ-almost k-wise independent space, where $\delta = 2^{k/2} \epsilon$.[3]

As mentioned above, (Alon Babai) construct a generator $G_1$ with $h = \tfrac{k}{2} \cdot \log n$, and (Naor Naor) construct a generator $G_2$ with $\ell = \log s = \log h + O(\log(\epsilon^{-1}))$. Hence, the concatenation $G$ of $G_1$ and $G_2$ has seed length $\ell = \log k + \log\log n + O(\log(\epsilon^{-1}))$. In order for $G$ to yield a δ-almost k-wise independent space, we need to set $\epsilon = \delta 2^{-k/2}$, which leads to a seed length of $\ell = \log\log n + O(k + \log(\delta^{-1}))$ and a sample space of total size $2^\ell = \log n \cdot \mathrm{poly}(2^k \cdot \delta^{-1})$.
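The composition can be illustrated on a toy instance far below the interesting parameter regime (everything here is a hypothetical example: $G_1$ is the linear pairwise-independent map from earlier, and the multiset `X2` is a $\tfrac{1}{3}$-biased set over $\{0,1\}^2$ standing in for the output of an ϵ-biased generator $G_2$). With $k = 2$ the theorem promises distance at most $\delta = 2^{k/2}\epsilon$:

```python
from itertools import combinations, product

# G1: linear map {0,1}^2 -> {0,1}^3 generating a 2-wise independent space.
G1 = lambda y: (y[0], y[1], y[0] ^ y[1])
# X2: multiset over {0,1}^2 with max bias 1/3 (checkable by hand).
X2 = [(0, 0), (0, 1), (1, 0)]
space = [G1(y) for y in X2]            # the composed sample space G1(G2(.))

def worst_l1(S, k):
    """Largest L1 distance of a k-marginal from uniform on {0,1}^k."""
    n, s = len(S[0]), len(S)
    return max(sum(abs(sum(1 for x in S if tuple(x[i] for i in I) == z) / s
                       - 2**-k) for z in product([0, 1], repeat=k))
               for I in combinations(range(n), k))

delta = 2 ** (2 / 2) * (1 / 3)         # delta = 2^(k/2) * eps, with k = 2
print(worst_l1(space, 2) <= delta)     # True
```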

Notes

  1. cf., e.g., (Goldreich 2001)
  2. cf., e.g., p. 2 of (Ben-Aroya Ta-Shma)
  3. Section 4 in (Naor Naor)