
Prompt engineering when building LLMs into your product: lessons learned

Tuesday, September 24, 2024
Stefanos Peros
Software engineer

Since the introduction of ChatGPT's API, we have embarked on a journey to incorporate Large Language Models (LLMs) into our products in innovative ways, aiming to enhance user experience and create value for our customers. Just as programming languages enable developers to command their computers, prompts enable users to guide LLMs in accomplishing specific tasks. The parallel doesn't stop there: just as poorly written code results in flawed programs, ineffective prompts lead to inaccurate responses from AI models. A substantial body of research has delved into the intricacies of prompt engineering, unveiling best practices that have since become integral to our interactions with LLMs. Let's dive into the essential do's and don'ts of prompt engineering, and how they can dramatically influence the performance of LLMs in your products.

Lessons learned

Standardise output

In our experience, one of the most common sources of errors has been parsing the model's response. This is why we shifted early on to third-party frameworks, particularly LangChain, which maps the LLM's response to dictionaries and objects and shields the developer from the underlying prompt instructions used to structure the response. While this does not guarantee that the LLM response will always be valid, it has certainly reduced the number of parsing errors we encounter when processing LLM responses.
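
For illustration, here is a minimal sketch of this kind of structured-output parsing with LangChain's PydanticOutputParser. The model name, schema, and field names are placeholders rather than anything from our products, and exact imports may differ slightly depending on your LangChain version.

```python
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


# Illustrative schema: the fields are hypothetical.
class Summary(BaseModel):
    title: str = Field(description="Short title for the text")
    key_points: list[str] = Field(description="Main takeaways, one per item")


parser = PydanticOutputParser(pydantic_object=Summary)

prompt = PromptTemplate(
    template="Summarise the following text.\n{format_instructions}\n\nText:\n{text}",
    input_variables=["text"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# The parser injects formatting instructions into the prompt and converts the
# raw completion into a validated Summary object, so downstream code never
# has to touch the raw JSON string.
chain = prompt | llm | parser
result = chain.invoke({"text": "..."})
print(result.key_points)
```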

Standardise input

For many of our products, we use LangChain's built-in PromptTemplate class to construct our prompts. Prompt templates contain placeholders that are populated at runtime by the corresponding variables declared in the code. This makes prompts largely reusable across the application and much easier to maintain, since changing their contents only needs to happen in a single place rather than everywhere they are used.
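
A minimal sketch of such a template is shown below; the placeholder names and values are hypothetical, but the pattern of declaring the template once and formatting it at runtime is the one we rely on.

```python
from langchain_core.prompts import PromptTemplate

# Declared once and reused across the application; placeholder names are illustrative.
support_prompt = PromptTemplate.from_template(
    "You are a support assistant for {product_name}.\n"
    "Answer the user's question in at most {max_sentences} sentences.\n\n"
    "Question: {question}"
)

# At runtime, only the variables change; the prompt text itself lives in one place.
prompt_value = support_prompt.format(
    product_name="Acme Docs",
    max_sentences=3,
    question="How do I reset my password?",
)
print(prompt_value)
```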

Keep prompts short and specific

Just as functions in code should be concise and well-structured, concise and specific prompts lead to more accurate and relevant AI responses. The parallel extends to specificity: just as well-defined function parameters ensure that a function behaves as expected under defined conditions, specific prompts guide the AI more precisely, reducing the likelihood of irrelevant or inaccurate outputs.

As such, when accurate and consistent LLM responses are crucial, we break down complex tasks into multiple steps, each corresponding to a separate prompt; these prompts are then chained together, as sketched below. While this approach carries a performance and cost penalty due to the larger number of prompts, that penalty is becoming less significant as advancements in language model optimization and infrastructure efficiency continue to reduce the cost and increase the speed of LLMs.
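
As a rough sketch, the snippet below splits a question-answering task into an extraction step and an answering step, each with its own prompt; the prompts, model name, and task are illustrative only, not taken from our products.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Step 1: extract the relevant facts from a document.
extract_prompt = PromptTemplate.from_template(
    "List the key facts from the following document, one per line:\n\n{document}"
)

# Step 2: answer a question using only the extracted facts.
answer_prompt = PromptTemplate.from_template(
    "Using only these facts:\n{facts}\n\nAnswer the question: {question}"
)

extract_chain = extract_prompt | llm | StrOutputParser()
answer_chain = answer_prompt | llm | StrOutputParser()


def answer(document: str, question: str) -> str:
    # Chaining keeps each prompt short and specific, at the cost of two model calls.
    facts = extract_chain.invoke({"document": document})
    return answer_chain.invoke({"facts": facts, "question": question})
```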

Incorporate examples

GPTs are exactly that: models pre-trained on an extensive amount of data. When using GPTs for specialised use cases, their pre-training is often too broad for the domain at hand, which can lead to inaccurate responses. While fine-tuning these models is sometimes possible, it remains costly and requires an extensive dataset. A cheaper way to increase response accuracy and consistency is to provide a handful of input / desired-output examples in the prompt as additional context. By doing so, we observed a substantial improvement in responses when using GPT-4 to match clinical trial criteria to medical questions for one of our use cases.
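
A few-shot prompt along these lines can be assembled with LangChain's FewShotPromptTemplate; the examples below are invented for illustration and are not taken from our clinical-trial dataset.

```python
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

# Invented examples demonstrating the desired input/output format.
examples = [
    {
        "criterion": "Age between 18 and 65",
        "question": "How old is the patient?",
        "relevant": "yes",
    },
    {
        "criterion": "No history of cardiac disease",
        "question": "What is the patient's shoe size?",
        "relevant": "no",
    },
]

example_prompt = PromptTemplate.from_template(
    "Criterion: {criterion}\nQuestion: {question}\nRelevant: {relevant}"
)

few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="Decide whether the question is relevant to the trial criterion.",
    suffix="Criterion: {criterion}\nQuestion: {question}\nRelevant:",
    input_variables=["criterion", "question"],
)

# The examples are rendered ahead of the new case, giving the model a concrete
# demonstration of the expected output format.
print(few_shot_prompt.format(criterion="Non-smoker", question="Does the patient smoke?"))
```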

Closing statement

Writing code typically presents a steeper learning curve than crafting prompts. However, many of the fundamental principles that underpin effective coding also apply to prompt creation, and they make all the difference in the usability of AI model responses. Our journey with LLMs began with the release of OpenAI's inaugural API, and it has been a rich learning experience ever since. Over time, we've significantly expanded our capabilities, continuously discovering new ways to harness this groundbreaking technology and tackle ever more complex use cases for our customers.
