IEPrompt
IEPrompt is the class for information extraction prompts. It currently supports Named Entity Recognition (ner), Relation Extraction (re), Event Extraction (ee), Relational Triple Extraction (rte), and Data Augmentation (da) for re.
Constructor
__init__(self, task='ner')
Parameters
task
(str): The task name. It should be one of ["ner", "re", "ee", "rte", "da"].
Example
from easyinstruct.prompts import IEPrompt
ie_prompt = IEPrompt("ner")
build_prompt
build_prompt(
self,
prompt: str,
head_entity: str = None,
head_type: str = None,
tail_entity: str = None,
tail_type: str = None,
language: str = 'en',
instruction: str = None,
in_context: bool = False,
domain: str = None,
labels: List = None,
examples: Dict = None
)
In the Named Entity Recognition (ner) task, the prompt parameter is the text to be predicted; domain is the domain of that text, which can be empty; labels is the entity label set, which can also be empty.
In the Relation Extraction (re) task, the prompt parameter is the text; domain indicates the domain to which the text belongs, and it can be empty; labels is the set of relation type labels, which can be empty if there is no custom label set; head_entity and tail_entity are the head entity and tail entity of the relation to be predicted; head_type and tail_type are the types of the head and tail entities of that relation.
In the Event Extraction (ee) task, the prompt parameter is the text to be predicted; domain is the domain of that text, which can also be empty.
In the Relational Triple Extraction (rte) task, the prompt parameter is the text to be predicted; domain is the domain of that text, which can also be empty.
The specific meanings of the other parameters are as follows:
language indicates the language of the task: en for English extraction tasks and ch for Chinese extraction tasks.
in_context indicates whether in-context learning is used: when it is set to False, only the instruction is used to prompt the model for information extraction; when it is set to True, in-context examples are added to the prompt.
instruction specifies a user-defined prompt instruction; the default instruction is used when it is empty.
examples are the in-context examples used for in-context learning. Their formats are defined as follows:
For the ner, re, and ee tasks:
examples = { "input": "The input", "output": "The output" }
For the da task (data augmentation for re):
examples = { "relation": "The relation", "context": "The text to be extracted", "head_entity": "head entity", "head_type": "head entity type", "tail_entity": "tail entity", "tail_type": "tail entity type" }
Examples
(Please see DeepKE-LLM for more details.)
import os
import json
import hydra
from hydra import utils
import logging
from easyinstruct.prompts import IEPrompt
from .preprocess import prepare_examples

logger = logging.getLogger(__name__)


@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg):
    cfg.cwd = utils.get_original_cwd()
    text = cfg.text_input

    if not cfg.api_key:
        raise ValueError("Need an API Key.")
    if cfg.engine not in ["text-davinci-003", "text-curie-001", "text-babbage-001", "text-ada-001"]:
        raise ValueError("The OpenAI model is not supported now.")
    os.environ['OPENAI_API_KEY'] = cfg.api_key

    ie_prompter = IEPrompt(cfg.task)

    # Prepare in-context examples unless running in zero-shot mode.
    examples = None
    if not cfg.zero_shot:
        examples = prepare_examples(cfg.data_path, cfg.task, cfg.language)

    if cfg.task == 're':
        # Relation extraction also needs the head/tail entities and their types.
        ie_prompter.build_prompt(
            prompt=text,
            head_entity=cfg.head_entity,
            head_type=cfg.head_type,
            tail_entity=cfg.tail_entity,
            tail_type=cfg.tail_type,
            language=cfg.language,
            instruction=cfg.instruction,
            in_context=not cfg.zero_shot,
            domain=cfg.domain,
            labels=cfg.labels,
            examples=examples
        )
    else:
        ie_prompter.build_prompt(
            prompt=text,
            language=cfg.language,
            instruction=cfg.instruction,
            in_context=not cfg.zero_shot,
            domain=cfg.domain,
            labels=cfg.labels,
            examples=examples
        )

    result = ie_prompter.get_openai_result()
    logger.info(result)


if __name__ == '__main__':
    main()