Suffix Array deduplication
Progress
TABLE
Identifier AS "Identifier",
Language AS "Language",
row["Physical Size"] AS "Physical Size",
row["Total Text Size"] AS "Total Text Size (bytes)",
row["Substring Length Threshold"] AS "Substring Length Threshold",
row["Substring Duplicate Size"] AS "Substring Duplicate Size (bytes)"
FROM #deduplication AND #projectnotes
SORT Identifier
^447e08
Goals
- SimHash deduplication ^c61205
  - Finish running deduplication by 2022-03-04
- Suffix Array Substring deduplication ^3fb13c
  - Finish running deduplication by 2022-03-04
- Deduplication report
  - Finish writing the report by 2022-03-05