BasePrompt
BasePrompt is the base class for all prompts. Currently we support building prompts that instruct an LLM by calling the LLM API services of OpenAI (GPT-3, ChatGPT), Anthropic (Claude), and Cohere (Command), or by requesting a locally deployed LLM such as Llama2 or ChatGLM2. We will support more LLM products in the future.
You can also easily inherit this base class to customize your own prompt class: just override the build_prompt and parse_response methods.
Constructor
__init__(self)
Parameters
None.
Example
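A minimal sketch; the import path is an assumption based on the package name:

```python
from easyinstruct import BasePrompt  # import path assumed

# The constructor takes no arguments.
prompt = BasePrompt()
```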
build_prompt
build_prompt(self, prompt: str)
Description
Build a prompt from a given string input.
Parameters
prompt (str): The prompt string.
Example
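A minimal sketch using the signature documented above (import path assumed):

```python
from easyinstruct import BasePrompt  # import path assumed

prompt = BasePrompt()
prompt.build_prompt("Give me three names for a pet cat.")
```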
get_openai_result
Description
Get the response from OpenAI API.
Parameters
engine (str): The OpenAI engine to use for the API call. Defaults to "gpt-3.5-turbo". Available engines include "text-davinci-003", "text-davinci-002", "gpt-3.5-turbo", "gpt-3.5-turbo-0301", "gpt-3.5-turbo-0613", "gpt-3.5-turbo-16k", "gpt-3.5-turbo-16k-0613", "gpt-4", "gpt-4-0613", "gpt-4-0314".
system_message (str): The system message provided to ChatGPT. Defaults to "You are a helpful assistant.".
temperature (float): What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Defaults to 0.
max_tokens (int): The maximum number of tokens to generate in the completion. Defaults to 1024.
top_p (float): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Defaults to 1.0.
n (int): The number of completions to generate for each prompt. Defaults to 1.
frequency_penalty (float): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. Defaults to 0.0.
presence_penalty (float): Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. Defaults to 0.0.
Example
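A minimal sketch of a full round trip, assuming the import path above and that your OpenAI API key is already configured (e.g. via the OPENAI_API_KEY environment variable); the keyword values shown are the documented defaults:

```python
from easyinstruct import BasePrompt  # import path assumed

prompt = BasePrompt()
prompt.build_prompt("Give me three names for a pet cat.")

# Keyword values shown are the documented defaults.
response = prompt.get_openai_result(
    engine="gpt-3.5-turbo",
    system_message="You are a helpful assistant.",
    temperature=0,
    max_tokens=1024,
    top_p=1.0,
    n=1,
    frequency_penalty=0.0,
    presence_penalty=0.0,
)
print(response)
```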
get_anthropic_result
Description
Get the response from Anthropic API.
Parameters
engine (str): The Anthropic engine to use for the API call. Defaults to "claude-2". Available engines include "claude-2", "claude-2.0", "claude-instant-1", "claude-instant-1.2".
max_tokens_to_sample (int): The maximum number of tokens to generate before stopping. Defaults to 1024.
stop_sequences (List[str]): A list of strings upon which to stop generating. You probably want ["\n\nHuman:"], as that's the cue for the next turn in the dialog agent.
temperature (float): Amount of randomness injected into the response. Ranges from 0 to 1. Use a temperature closer to 0 for analytical / multiple-choice tasks, and closer to 1 for creative and generative tasks. Defaults to 1.
top_k (int): Only sample from the top K options for each subsequent token. Used to remove "long tail" low-probability responses. Defaults to -1, which disables it.
top_p (float): Does nucleus sampling, in which we compute the cumulative distribution over all the options for each subsequent token in decreasing probability order and cut it off once it reaches the probability specified by top_p. Defaults to -1, which disables it. Note that you should alter either temperature or top_p, but not both.
Example
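A minimal sketch, assuming the import path above and that your Anthropic API key is already configured; the keyword values shown are the documented defaults, plus the recommended stop sequence for the dialog format:

```python
from easyinstruct import BasePrompt  # import path assumed

prompt = BasePrompt()
prompt.build_prompt("Give me three names for a pet cat.")

# Keyword values shown are the documented defaults, plus the
# recommended stop sequence for the dialog format.
response = prompt.get_anthropic_result(
    engine="claude-2",
    max_tokens_to_sample=1024,
    stop_sequences=["\n\nHuman:"],
    temperature=1,
    top_k=-1,
    top_p=-1,
)
print(response)
```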
get_cohere_result
Description
Get the response from Cohere API.
Parameters
engine (str): The Cohere engine to use for the API call. Defaults to "command". Available engines include "command", "command-nightly", "command-light", "command-light-nightly".
max_tokens (int): The maximum number of tokens the model will generate as part of the response. Defaults to 1024.
temperature (float): A non-negative float that tunes the degree of randomness in generation. Lower temperatures mean less random generations. See Temperature for more details. Defaults to 0.75.
k (int): Ensures only the top k most likely tokens are considered for generation at each step. Defaults to 0, min value of 0, max value of 500.
p (float): Ensures that only the most likely tokens, with total probability mass of p, are considered for generation at each step. If both k and p are enabled, p acts after k. Defaults to 0.75, min value of 0.01, max value of 0.99.
frequency_penalty (float): Used to reduce repetitiveness of generated tokens. The higher the value, the stronger a penalty is applied to previously present tokens, proportional to how many times they have already appeared in the prompt or prior generation. Defaults to 0.0, min value of 0.0, max value of 1.0.
presence_penalty (float): Can be used to reduce repetitiveness of generated tokens. Similar to frequency_penalty, except that this penalty is applied equally to all tokens that have already appeared, regardless of their exact frequencies. Defaults to 0.0, min value of 0.0, max value of 1.0.
Example
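A minimal sketch, assuming the import path above and that your Cohere API key is already configured; the keyword values shown are the documented defaults:

```python
from easyinstruct import BasePrompt  # import path assumed

prompt = BasePrompt()
prompt.build_prompt("Give me three names for a pet cat.")

# Keyword values shown are the documented defaults.
response = prompt.get_cohere_result(
    engine="command",
    max_tokens=1024,
    temperature=0.75,
    k=0,
    p=0.75,
    frequency_penalty=0.0,
    presence_penalty=0.0,
)
print(response)
```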
get_engine_result
Description
Get the response from locally deployed LLM engine.
Parameters
engine (BaseEngine): An instance of BaseEngine or any of its subclasses. See Engines for more details.
Example
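A minimal sketch; the engine class name and its import path below are illustrative assumptions, see the Engines page for the engine classes the library actually provides:

```python
from easyinstruct import BasePrompt  # import path assumed

# Hypothetical BaseEngine subclass for illustration; see the Engines
# page for the engine classes the library actually provides.
from easyinstruct.engines import Llama2Engine

engine = Llama2Engine()  # wraps a locally deployed Llama2 instance (assumed)
prompt = BasePrompt()
prompt.build_prompt("Give me three names for a pet cat.")
response = prompt.get_engine_result(engine=engine)
print(response)
```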
parse_response
Implemented in subclasses.
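For reference, a minimal sketch of a custom prompt class that overrides build_prompt and parse_response; the template, the stored attribute, and the parsing logic are all illustrative assumptions, not part of the library:

```python
from easyinstruct import BasePrompt  # import path assumed

class ListPrompt(BasePrompt):
    """Illustrative subclass: asks for a list and parses it line by line."""

    def build_prompt(self, prompt: str):
        # Wrap the raw input in a task-specific template
        # (storing it on self.prompt is an assumption).
        self.prompt = f"Answer with one item per line.\n{prompt}"
        return self.prompt

    def parse_response(self, response: str):
        # Split the raw model output into a list of non-empty lines.
        return [line.strip() for line in response.splitlines() if line.strip()]
```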