Conference
Tung Mai, Anup Rao, Matt Kapilevich, Ryan A. Rossi, Yasin Abbasi-Yadkori, Ritwik Sinha.

UAI, 2019.

One Permutation Hashing (OPH) is a significantly more efficient alternative to the popular minwise hashing. To produce a sketch of size (More...)padding-bottom: 0.277em;">k$k$, OPH requires just one hash function whereas the classical minwise hashing requires k$k$ hash functions. However, OPH does not have the desirable locality sensitive hashing (LSH) property that is important for indexing. [Srivastava and Li 2014, ICML] proposed the novel idea of densification to produce LSH sketches from OPH sketches, and gave the first densification routine.
In this paper, we give a necessary and sufficient condition for a densification routine to result in LSH sketches when applied to OPH sketches. Furthermore, we give a novel densification routine that for every input, takes O(klogk)$O(k\mathrm{log}k)$ time in expectation and achieves better variance than the previous best bound obtained by [Srivastava 2017, ICML]. The running time of the densification routine given in [Srivastava 2017, ICML] for worst case inputs is O(k2)$O({k}^{2})$ in expectation.

@inproceedings{mai-uai19,
author={Tung Mai and Anup Rao and Matt Kapilevich and Ryan A. Rossi and Yasin Abbasi-Yadkori and Ritwik Sinha},
title={On Densification for Minwise Hashing},
booktitle={UAI},
year={2019},
}