Date: 2023-02-03

Large Language Models (LLMs) can complete a variety of tasks without fine-tuning when given few-shot demonstrations, that is, sample input-output pairs for a task. Chain-of-thought prompting, which adds the intermediate reasoning steps to each demonstration, can improve performance further. However, few-shot performance depends heavily on the quality of the demonstrations, particularly for reasoning tasks that call for sophisticated and varied reasoning patterns. Manually creating a large, diverse pool of examples from which to select demonstrations is expensive and time-consuming, while relying on only a handful of demonstrations can prevent the LLM from generalizing and adapting to varied test inputs.
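
To make the distinction between plain few-shot prompting and chain-of-thought prompting concrete, here is a minimal sketch of how such a prompt might be assembled. The `Demo` class, the `build_prompt` helper, the `Q:`/`A:` layout, and the arithmetic questions are all assumptions chosen for illustration; they are not taken from any particular paper or library.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    """One hand-written demonstration (hypothetical structure for illustration)."""
    question: str
    rationale: str  # intermediate reasoning steps, i.e. the chain of thought
    answer: str


def build_prompt(demos, test_question, with_rationale=True):
    """Concatenate the demonstrations, then append the unanswered test question."""
    blocks = []
    for d in demos:
        lines = [f"Q: {d.question}"]
        if with_rationale:
            # Chain-of-thought: show the reasoning before the final answer.
            lines.append(f"A: {d.rationale} The answer is {d.answer}.")
        else:
            # Plain few-shot: show only the input-output pair.
            lines.append(f"A: The answer is {d.answer}.")
        blocks.append("\n".join(lines))
    # The model is expected to continue the text after the final "A:".
    blocks.append(f"Q: {test_question}\nA:")
    return "\n\n".join(blocks)


# Illustrative demonstration, invented for this sketch.
demos = [
    Demo(
        question="Tom has 3 apples and buys 2 more. How many apples does he have?",
        rationale="Tom starts with 3 apples. Buying 2 more gives 3 + 2 = 5.",
        answer="5",
    ),
]

prompt = build_prompt(demos, "A box holds 4 pens. How many pens are in 3 boxes?")
print(prompt)
```

Calling `build_prompt` with `with_rationale=False` yields the plain few-shot variant. Because every test input sees the same fixed `demos` list, the sketch also shows why a small hand-written set can be a poor match for varied test inputs.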

