Urdu
- Identifier:: ur
- Language:: Urdu
- Physical Size:: 1.6GB
- Number of Rows:: 371691
- Unique URLs:: 371628
- SimHash Tokenization:: character 6-gram
- SimHash Parameters:: \((4,6)\)
- SimHash Match Distribution:: {4: 135, 3: 96, 2: 35, 1: 19}
- SimHash Results:: 285 matches/167 clusters/407 hashes
- Substring Length Threshold:: \(100\)
- Total Text Size:: 1565304005
- Substring Duplicate Size:: 231043706 (14.76%)
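
The figures in this entry can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, not part of the original pipeline. The variable names are illustrative, and it assumes that the values in the SimHash match distribution are match counts and that both size fields use the same unit.

```python
# Values copied from the Urdu entry; variable names are illustrative.
rows = 371_691
unique_urls = 371_628
match_distribution = {4: 135, 3: 96, 2: 35, 1: 19}  # assumed: values are match counts
reported_matches = 285
total_text_size = 1_565_304_005
substring_duplicate_size = 231_043_706

# Rows beyond the first occurrence of each URL.
repeated_url_rows = rows - unique_urls

# The distribution should add up to the reported number of SimHash matches.
total_matches = sum(match_distribution.values())

# Share of the text flagged as duplicated substrings, in percent.
duplicate_pct = round(100 * substring_duplicate_size / total_text_size, 2)

print(repeated_url_rows)  # 63
print(total_matches)      # 285
print(duplicate_pct)      # 14.76
```

Under these assumptions the distribution sums to the reported 285 matches, the duplicate share reproduces the listed 14.76%, and 63 rows repeat a URL that already appears elsewhere in the data.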