Deduplicator

Deduplicator is the class for eliminating duplicate instruction samples, which can adversely affect both pre-training stability and the performance of LLMs. Removing duplicates also makes more efficient use of storage space.
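
As a rough illustration of the idea, the sketch below removes exact duplicates from a list of instruction samples by hashing a normalized form of each one. It is not the Deduplicator implementation or its API: the function name `deduplicate_samples`, the `keys` parameter, and the `instruction`/`input`/`output` field names are all assumptions made for this example.

```python
import hashlib
import json


def deduplicate_samples(samples, keys=("instruction", "input", "output")):
    """Return samples with exact duplicates removed, keeping first occurrences.

    Hypothetical helper for illustration only; the field names in `keys`
    are assumed, not taken from the Deduplicator class.
    """
    seen = set()
    unique = []
    for sample in samples:
        # Collapse whitespace and lowercase each field so that trivially
        # different copies of the same sample produce the same fingerprint.
        normalized = [" ".join(str(sample.get(k, "")).split()).lower() for k in keys]
        digest = hashlib.sha256(json.dumps(normalized).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique


samples = [
    {"instruction": "Translate to French", "input": "Hello", "output": "Bonjour"},
    {"instruction": "translate  to french", "input": "Hello", "output": "Bonjour"},
    {"instruction": "Summarize the text", "input": "A long passage.", "output": "A summary."},
]
unique_samples = deduplicate_samples(samples)
```

Storing only a fixed-size digest per sample keeps memory use modest on large datasets. This approach catches exact matches after normalization only; near-duplicates would need a similarity-based method instead.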
