EasyInstruct

Deduplicator

Deduplicator is the class for eliminating duplicate instruction samples, which can adversely affect both pre-training stability and the performance of LLMs. Removing duplicates also makes more efficient use of storage space.
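To make the idea concrete, here is a minimal, library-independent sketch of exact-match deduplication. It does not use or reproduce the EasyInstruct `Deduplicator` API. The function names (`deduplicate_samples`, `_fingerprint`) and the sample fields (`instruction`, `input`, `output`) are assumptions chosen for illustration, and the matching strategy (hashing case- and whitespace-normalized text) is just one simple option.

```python
import hashlib
import json


def _fingerprint(sample, keys):
    # Hypothetical helper: lowercase and collapse whitespace in each field
    # so trivially different copies map to the same hash.
    parts = [" ".join(str(sample.get(k, "")).lower().split()) for k in keys]
    return hashlib.sha256(json.dumps(parts).encode("utf-8")).hexdigest()


def deduplicate_samples(samples, keys=("instruction", "input", "output")):
    """Drop samples whose normalized fields repeat; keep first occurrences."""
    seen = set()
    unique = []
    for sample in samples:
        fp = _fingerprint(sample, keys)
        if fp not in seen:
            seen.add(fp)
            unique.append(sample)
    return unique


# Assumed sample schema; real datasets may use different field names.
samples = [
    {"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"},
    {"instruction": "translate to  french.", "input": "hello", "output": "bonjour"},
    {"instruction": "Summarize the text.", "input": "A long article.", "output": "A summary."},
]

unique_samples = deduplicate_samples(samples)
print(len(samples), "->", len(unique_samples))  # prints "3 -> 2"
```

Hashing normalized text catches only exact and near-exact copies. Detecting paraphrased duplicates would need a fuzzier similarity measure, which this sketch does not attempt.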


Last updated 1 year ago