Bengali
- Identifier:: bn
- Language:: Bengali
- Physical Size:: 5GB
- Number of Rows:: 841724
- Unique URLs:: 841571
- SimHash Tokenization:: character 6-gram
- SimHash Parameters:: \((4,6)\)
- SimHash Match Distribution:: {4: 377, 3: 221, 2: 108, 1: 50}
- SimHash Results:: 756 matches/273 clusters/638 hashes
- Substring Length Threshold:: \(100\)
- Total Text Size:: 5008061572
- Substring Duplicate Size:: 1440245010 (28.76%)
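
A quick consistency check on the figures above, as a minimal sketch: it only re-derives the match total from the SimHash match distribution and the duplicate percentage from the two size fields. The variable names are illustrative and not taken from any pipeline.

```python
# Figures copied from the fields above; variable names are illustrative only.
match_distribution = {4: 377, 3: 221, 2: 108, 1: 50}
reported_matches = 756
total_text_size = 5008061572
substring_duplicate_size = 1440245010

# The per-key counts in the distribution should add up to the reported matches.
total_matches = sum(match_distribution.values())

# Share of the total text size flagged as substring duplicates, in percent.
duplicate_pct = round(100 * substring_duplicate_size / total_text_size, 2)

print(total_matches == reported_matches)
print(f"{duplicate_pct:.2f}%")
```

Both values agree with the note: the distribution sums to the reported match count, and the duplicate share rounds to the stated percentage.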
Examples
[[Pasted image 20220227152418.png]]❌[[Pasted image 20220227152638.png]]❌[[Pasted image 20220227152550.png]]❌