Is there a particular dataset format required for fine-tuning codellama? My dataset is in the format OpenAI suggests: a JSONL file where each line is a {messages: [{role: 'system', content: '<system prompt>'}, {role: 'user', content: '<user prompt>'}, {role: 'assistant', content: '<assistant reply>'}]} object. Will this format work?
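Before converting anything, it can help to confirm that every line of the file really has the shape described above. A minimal sketch using only the standard library; the function name load_chat_jsonl is hypothetical, and the check only covers the system/user/assistant roles mentioned in the question (extra keys on a message are ignored, not rejected).

```python
import json
from typing import Iterable

# Roles used in the OpenAI-style chat format described in the question.
ALLOWED_ROLES = {"system", "user", "assistant"}


def load_chat_jsonl(lines: Iterable[str]) -> list[list[dict]]:
    """Parse JSONL lines of the form {"messages": [{"role": ..., "content": ...}, ...]}.

    Hypothetical helper: returns one list of messages per non-blank line and
    raises ValueError on a message with an unknown role or non-string content.
    """
    conversations = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        messages = record["messages"]
        for msg in messages:
            if msg.get("role") not in ALLOWED_ROLES or not isinstance(msg.get("content"), str):
                raise ValueError(f"line {line_no}: malformed message {msg!r}")
        conversations.append(messages)
    return conversations
```

In practice you would pass an open file object (which iterates line by line) instead of a list of strings.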
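# Batched formatting function (e.g. usable as a formatting_func for TRL's SFTTrainer):
# `example` maps each column name ('prompt', 'completion') to a list of values.
def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['prompt'])):
        # Instruction text, then the input, then the expected output.
        text = f"An AI tool that corrects and rephrases user text grammar errors delimited by triple backticks to standard English.\n### Input: {example['prompt'][i]}\n### Output: {example['completion'][i]}"
        output_texts.append(text)
    return output_texts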
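The function above expects 'prompt' and 'completion' columns, not a messages list. A minimal sketch of bridging the two, assuming each conversation has exactly one user turn followed by one assistant turn; messages_to_columns is a hypothetical name, and the system message is dropped because formatting_prompts_func hard-codes its own instruction text.

```python
def messages_to_columns(conversations: list[list[dict]]) -> dict[str, list[str]]:
    """Hypothetical helper: turn OpenAI-style conversations into batched columns.

    Assumes one user message and one assistant message per conversation;
    the system message (if any) is discarded.
    """
    prompts, completions = [], []
    for messages in conversations:
        # Take the first user and first assistant message of each conversation.
        user = next(m["content"] for m in messages if m["role"] == "user")
        assistant = next(m["content"] for m in messages if m["role"] == "assistant")
        prompts.append(user)
        completions.append(assistant)
    return {"prompt": prompts, "completion": completions}
```

The resulting dict has the same column-to-list shape that formatting_prompts_func indexes into, so it can also be loaded into a dataset with those two columns.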
To create a training format from this code, wrap each example in the Llama 2 chat markers: [INST] <<SYS>>\n{system prompt}\n<</SYS>>\n\n{text} [/INST]
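A minimal sketch of rendering one OpenAI-style messages list into that [INST] / <<SYS>> layout, so the system prompt from the original JSONL is kept. The function name to_llama2_prompt is hypothetical; it assumes user and assistant messages strictly alternate after the optional system message, and it omits the BOS/EOS tokens that Meta's reference code places around each turn pair (leave those to your tokenizer or add them yourself).

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def to_llama2_prompt(messages: list[dict]) -> str:
    """Hypothetical helper: render a conversation as a Llama 2 style string.

    Assumes alternating user/assistant turns after an optional system message.
    BOS/EOS tokens are NOT included.
    """
    system = next((m["content"] for m in messages if m["role"] == "system"), "")
    turns = [m for m in messages if m["role"] != "system"]
    parts = []
    for i in range(0, len(turns), 2):
        user = turns[i]["content"]
        if i == 0 and system:
            # The system prompt is folded into the first user turn.
            user = B_SYS + system + E_SYS + user
        text = f"{B_INST} {user.strip()} {E_INST}"
        if i + 1 < len(turns):
            # Append the assistant reply that answers this user turn.
            text += f" {turns[i + 1]['content'].strip()}"
        parts.append(text)
    return " ".join(parts)
```

For a single system/user/assistant record this yields "[INST] <<SYS>>\n...\n<</SYS>>\n\n... [/INST] ...", with the assistant reply as the text the model learns to produce.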