# 如何展示从k-wise独立哈希函数的哈希代码的独立和统一分发？ -- probability-theory 领域 和 hash 领域 cs 相关 的问题

## How to show independence and uniform distribution of hash codes from k-wise independent hash functions?

1 ### 问题

$$k$$哈希函数的独立族族的定义，我遇到了一个romial $$h$$$$d$$$$r$$是k-wise独立的对于所有Distinct $$X_1，X_2， Dots，X_K In D$$$$Y_1，Y_2， Dots，Y_K 在r$$

$$mathbb {p} _ {h 在h}（h（x_1）= y_1，h（x_2）= y_2， dots，h（x_k）= Y_K）= FRAC {1} {| R | ^ K}$$

wikipedia文章在k-wise独立哈希函数（使用上面定义）声明该定义等同于以下两个条件：

（i）对于所有$$x In d$$$$h（x）$$是统一分布在$$r$$鉴于$$h$$是随机选择的 $H$

（ii）对于任何固定的不同键$$x_1，x_2， dots，x_k In d$$，作为$$h$$$$h$$，哈希代码$$h（x_1），h（ x_2）， dots，h（x_k）$$是独立的随机变量。

Most definitions of a $$k$$-wise independent family of hash functions I have encountered state that a family $$H$$ of hash functions from $$D$$ to $$R$$ is k-wise independent if for all distinct $$x_1, x_2,\dots, x_k \in D$$ and $$y_1, y_2,\dots, y_k \in R$$,

$$\mathbb{P}_{h \in H}(h(x_1) = y_1, h(x_2) = y_2, \dots, h(x_k) = y_k) = \frac{1}{|R|^k}$$

The Wikipedia article on k-wise independent hash functions (which uses the above definition) claims that the definition is equivalent to the following two conditions:

(i) For all $$x \in D$$, $$h(x)$$ is uniformly distributed in $$R$$ given that $$h$$ is randomly chosen from $$H$$.

(ii) For any fixed distinct keys $$x_1, x_2,\dots, x_k \in D$$, as $$h$$ is randomly drawn from $$H$$, the hash codes $$h(x_1), h(x_2), \dots, h(x_k)$$ are independent random variables.

It is not obvious to me how one proves (i) from the above definition without explicitly assuming (ii) in the definition as well (and vice-versa). How is the definition sufficient for proving both (i) and (ii)?

## 回答列表

1  Throughout, we assume that $$|D| \geq k$$.

Suppose that $$H$$ satisfies, for all distinct $$x_1,\dots,x_k \in D$$ and all $$y_1,\ldots,y_k \in R$$, $$\Pr_{h \in H}[h(x_1)=y_1,\ldots,h(x_k)=y_k] = \frac{1}{|R|^k}.$$ Now let $$x \in D$$ be arbitrary. Since $$|D| \geq k$$, we can find $$x_2,\ldots,x_k \in D$$ such that $$x,x_2,\ldots,x_k$$ are distinct. For each $$y \in R$$, $$\Pr_{h \in H}[h(x)=y] = \sum_{y_2,\ldots,y_k \in R} \Pr_{h \in H}[h(x)=y,h(x_2)=y_2,\ldots,h(x_k)=y_k] = \\ \sum_{y_2,\ldots,y_k \in R} \frac{1}{|R|^k} = \frac{|R|^{k-1}}{|R|^k} = \frac{1}{|R|}.$$ This proves (i). To see (ii), let $$x_1,\ldots,x_k \in D$$ be distinct. Then for all $$y_1,\ldots,y_k$$, $$\Pr_{h \in H}[h(x_1)=y_1,\ldots,h(x_k)=y_k] = \frac{1}{|R|^k} = \prod_{i=1}^k \frac{1}{|R|} = \prod_{i=1}^k \Pr_{h \in H}[h(x_i) = y_i].$$

In the other direction, suppose that (i) and (ii) hold. Then for all distinct $$x_1,\ldots,x_k \in D$$ and $$y_1,\ldots,y_k \in R$$, $$\Pr_{h \in H}[h(x_1)=y_1,\ldots,h(x_k)=y_k] = \prod_{i=1}^k \Pr_{h \in H}[h(x_i) = y_i] = \prod_{i=1}^k \frac{1}{|R|} = \frac{1}{|R|^k} .$$

## 相关问题

6  Codomain /范围是哈希函数始终$mathbb {z}$或$mathbb {n}$？  ( Is the codomain range of a hash function always mathbbz or mathbbn )

14  传播输入的功能  ( Function that spreads input )

3  使用哈希的二进制搜索树的术语？  ( Term for binary search tree using hashes )

1  解释操作系统中的哈希页面表  ( Explain hashed page tables in operating system )

0  哈希函数订单从原产地最接近最近的船只  ( Hash function orders ships from closest to furthest from the origin )

1  一些固定间隔中MD5哈希的前缀的冲突  ( Collisions of prefixes of md5 hashes in some fixed interval )

1  关于佐色斯散系与碰撞机会的理论问题  ( Theoretical question about zobrist hashing and chances of collision with slight )

2  证明计算Minhash  ( Proving calculating minhash )