Hindi
- Identifier:: hi
- Language:: Hindi
- Physical Size:: 10.2GB
- Number of Rows:: 1982933
- Unique URLs:: 1982279
- SimHash Tokenization:: character 6-gram
- SimHash Parameters:: \((4,6)\)
- SimHash Match Distribution:: {4: 4735, 3: 1855, 2: 633, 1: 185}
- SimHash Results:: 7408 matches/914 clusters/3191 hashes
- Substring Length Threshold:: \(100\)
- Total Text Size:: 10327894620
- Substring Duplicate Size:: 3087503973 (29.89%)
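
The figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and it assumes that the values of the match distribution are match counts and that both size fields use the same unit (the listing does not state one). It makes no assumption about what the distribution keys mean.

```python
# Values copied from the Hindi entry; all variable names are hypothetical.
match_distribution = {4: 4735, 3: 1855, 2: 633, 1: 185}
reported_matches = 7408

total_text_size = 10_327_894_620
substring_duplicate_size = 3_087_503_973

rows = 1_982_933
unique_urls = 1_982_279

# Assumption: the distribution values are match counts, so they should
# add up to the reported number of SimHash matches.
total_matches = sum(match_distribution.values())

# Share of the text flagged as duplicated substrings, as a percentage.
duplicate_pct = 100 * substring_duplicate_size / total_text_size

# Rows whose URL also appears on at least one other row.
repeated_url_rows = rows - unique_urls

print(total_matches == reported_matches)  # True
print(f"{duplicate_pct:.2f}%")            # 29.89%
print(repeated_url_rows)                  # 654
```

Under these assumptions the distribution sums to the reported 7408 matches, the duplicate share reproduces the listed 29.89%, and 654 rows share a URL with another row.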