
Large Language Models (LLMs) are one of the most exciting developments to come out of the world of AI in recent years. LLMs are adept at accessing, synthesizing and outputting vast amounts of information almost instantaneously in response to the prompts you feed them. Millions of people use LLMs like ChatGPT, Claude and Google Gemini daily for applications as diverse as:

  • Content generation
  • Virtual assistants
  • Text summarization
  • Content moderation
  • Computer code authoring & troubleshooting
  • Customer support agents
  • Internal knowledge base question answering

For the casual LLM user with a general question, simply typing in a query and awaiting the answer may be more than enough. But as the user’s need for accuracy and domain-specific knowledge deepens, she will want to explore the available methods for significantly improving and customizing the LLM’s output. Those methods are the subject of this post.

Why LLMs Need Our Help to Perform Their Best

There are some very significant drawbacks when using LLMs “out of the box” by simply typing a question or information request to get a desired response, including:

1. They sometimes return incorrect information with a high degree of confidence, a phenomenon known as “hallucination”

Simply put, LLMs are trained to do one thing: return text based upon an input prompt, whether that prompt is text, an image or audio (depending upon the model). In other words, at their core they are simply question-and-answer machines. Because the LLM is designed to return an answer to your question (or “prompt”), it is going to do that even if it doesn’t have enough information to go on. It is not programmed to stay quiet. It’s going to tell you SOMETHING, even if that something is not actually correct.

2. They lack access to domain-specific knowledge about your company, organization or other locally important topic

Large language models are trained on publicly available information – and lots of it. This gives them a remarkable breadth and depth of knowledge. However, what if you want your LLM to output last month’s sales figures for your company as part of its query response? Without some additional configuration, that is just not going to happen.

3. They lack up-to-date information on most topics

The LLM you like to use was trained weeks, months or even years ago. That means its knowledge reflects the state of the world at that point in time. Take scientific facts, for example: if a new type of mammal was discovered by researchers in the Amazon jungle last month, your LLM almost certainly doesn’t have a clue about that discovery, since its training data predates it. So don’t bother asking it questions about the new mammal.

4. They lack the writing tone, style and voice of people you know within your organization

Your LLM was trained on public data written by thousands of other people. Imagine that you want your LLM to ghostwrite a blog for you. Do you typically use longer or shorter sentences when you write? Do you tend toward direct, terse descriptions, or do you spice it up with some creative, flowery language? Do you employ humor in your writing, or do you stick to a “just the facts, ma’am” style? Do you tend toward highly technical, involved explanations, or do you usually prefer to summarize things? In its native state, your LLM has no way of knowing any of these things about your writing style, tone or vibe.

Types of Methods to Help Your LLM Become Smarter

Fortunately, data engineers continue to come up with new methods to help LLM users elicit the kind of content they seek. These methods help LLMs output content that is much better suited to users’ needs: delighting users, increasing productivity, broadening and deepening access to domain knowledge, and expanding opportunities for creativity.

Broadly defined, there are two main types of methods to employ when you want to get more out of your LLMs: fine-tuning and prompt engineering. We say “types of methods” because there are actually multiple specific ways to approach and implement fine-tuning. And the same goes for prompt engineering.

Let’s explore the features of each type of method.

Fine-Tuning: Updating the Model Itself to Better Suit the Output You Need

Fine-tuning a large language model involves retraining an existing LLM by feeding it a relatively small amount of your own data and thereby updating its actual model parameters. Here’s the key feature of this type of method: the result is actually a new model (or at least, a modified version of an existing model). A fine-tuned model will still be able to draw upon the vast knowledge base upon which it was initially trained. But it is also able to incorporate specific information and writing style(s) you desire, based upon the data you fine-tuned the model with.

Model fine-tuning is something that a data engineer spends some time and resources to carry out. It is not performed at the time of prompt input, so it is not prompt engineering (see next section). To fine-tune a model, the user must prepare dozens, hundreds or even thousands of rows of “structured data,” with an input → output pairing per row. For example, you might prepare an FAQ file, with each row containing a carefully crafted question and corresponding answer, and then train the model on that.
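
To make that data-preparation step concrete, here is a minimal sketch in Python. It assumes a simple prompt/completion JSONL layout; the exact schema and field names vary by fine-tuning provider, and the FAQ content shown is purely illustrative.

```python
import json

# Hypothetical FAQ pairs; in practice these would come from your own
# knowledge base, support tickets, style guides, and so on.
faq_pairs = [
    ("What is your return policy?",
     "You can return any product within 30 days for a full refund."),
    ("Do you ship internationally?",
     "Yes, we ship to most countries; delivery takes 7 to 14 business days."),
]

# Write one JSON object per line (JSONL), a format most fine-tuning
# services accept in some variation. The field names here ("prompt",
# "completion") are an assumption; check your provider's documentation.
with open("finetune_data.jsonl", "w", encoding="utf-8") as f:
    for question, answer in faq_pairs:
        f.write(json.dumps({"prompt": question, "completion": answer}) + "\n")
```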

Major advantages of fine-tuning

Fine-tuning does not need to be done each time you enter a new prompt into your LLM. If you are seeking responses that are fine-tuned to your situation (i.e., a domain-specific knowledge base), simply choose a fine-tuned version of your LLM (or create a new one) and enter your prompt there. The model will faithfully return text that is suitable for the task, based upon the fine-tuning parameters. Important note: As with all computing operations, the garbage in, garbage out (GIGO) maxim still applies here. You need to train the model on good data if you want good responses.

Another advantage of fine-tuned models is that you do not need to maintain an up-to-the-minute database of the latest facts and figures. Rather, once the model is trained, it can be used indefinitely without further context being provided to the LLM.

Major disadvantages of fine-tuning

Fine-tuned models are not adept at returning specific facts drawn from locally sourced knowledge, such as dollar values, measurements, counts, dates and percentages. This is especially true when performing complex queries, like database queries (think “SQL query with multiple table joins,” for you database nerds), and expecting a reliable level of accuracy. Another disadvantage is that a fine-tuned model is only going to be as current as the last time you trained it, in terms of information recency.

Pitfalls in the use of fine-tuned model approaches include:

  • Using an insufficient amount of data to fine-tune the model
  • Using incorrect data to fine-tune the model
  • Trying to load complex, layered factual data into the model
  • Expecting the model to return specific, factual knowledge in an accurate manner

Prompt Engineering: Adding Context and Specific Data to the LLM’s Responses

In contrast to fine-tuning, prompt engineering methods for improving LLM output involve adding a step (or steps) to the prompt before it is submitted to the LLM. There are several ways to go about this, including, for example: One-Shot Learning, Chain-of-Thought Prompting and Retrieval Augmented Generation (RAG).

Some popular prompt engineering methods involve simply “seeding” your prompt with additional context. This can be accomplished through several methods, such as:

  1. Assistants: Preparing an assistant profile whose knowledge domain and style you define in detail
  2. One-Shot Learning and Few-Shot Learning: Typing in some contextual data or worked examples before actually writing the prompt, so as to set the LLM off on the right foot (see the prompt-construction sketch after this list)
  3. Chain-of-Thought (CoT) Prompting: Breaking the problem out into multiple logical steps or stages and feeding them into the LLM one by one, in order to lead the LLM by the virtual hand down the logical path you would like it to follow (also illustrated in the sketch after this list)
  4. Retrieval Augmented Generation (RAG): Sending the prompt first to a locally maintained “vector database,” which retrieves data directly relevant to your query and passes those results to the LLM along with your prompt, so the LLM can incorporate them into its response
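
To illustrate the few-shot and chain-of-thought approaches, here is a minimal sketch that builds each prompt as a plain string. The classification and arithmetic tasks are invented for illustration, and call_llm is a hypothetical placeholder for whatever client your LLM provider supplies.

```python
# Minimal sketch of few-shot and chain-of-thought prompt construction.
# call_llm() is a hypothetical placeholder, not a real API.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with your provider's client call.")

# Few-shot: seed the prompt with a handful of worked examples so the
# model picks up the expected input -> output pattern.
few_shot_prompt = (
    "Classify each support ticket as BILLING, TECHNICAL, or OTHER.\n\n"
    "Ticket: I was charged twice this month.\nLabel: BILLING\n\n"
    "Ticket: The app crashes when I upload a file.\nLabel: TECHNICAL\n\n"
    "Ticket: My invoice total looks wrong.\nLabel:"
)

# Chain-of-thought: walk the model through explicit reasoning steps
# before asking for the final answer.
cot_prompt = (
    "A customer ordered 3 units at $24 each and has a 10% discount code.\n"
    "First, compute the subtotal. Then apply the discount. "
    "Finally, state the total the customer owes."
)

if __name__ == "__main__":
    print(few_shot_prompt)
    print(cot_prompt)
    # response = call_llm(few_shot_prompt)  # once wired to a real LLM
```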

Major advantages of prompt engineering

Prompt engineering, depending upon the specific method employed, can return highly accurate data. This is especially true in the case of Retrieval Augmented Generation (RAG). RAG is very useful for grounding the LLM in a locally owned, internal database of relevant and timely facts and figures, which are then incorporated into the LLM’s response.
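
Here is a minimal, self-contained sketch of that RAG pattern, using a toy in-memory “vector database” of word-count vectors and cosine similarity. A production system would use a real embedding model and a dedicated vector store, and the document contents below are invented for illustration.

```python
import math
from collections import Counter

# Toy internal knowledge base. In a real RAG pipeline these passages
# would live in a vector database, embedded with a proper model.
documents = [
    "March sales for the Alpha widget totaled $42,000 across 310 units.",
    "The Beta widget ships within 5 business days in North America.",
    "Installation of the Alpha widget requires a 220V power supply.",
]

def embed(text: str) -> Counter:
    # Stand-in embedding: a simple bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_rag_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    # The assembled prompt would then be sent to your LLM of choice.
    print(build_rag_prompt("What were March sales for the Alpha widget?"))
```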

Major disadvantages of prompt engineering

Prompt engineering requires the user to constantly add context to each prompt when submitting it to the LLM. In the case of RAG, it requires that your team maintain an up-to-date database of information that can be fed into the query results.

Another disadvantage is that prompt engineering can give a false sense of security that the model is providing accurate data, even when the underlying databases have not been properly updated, or not updated recently enough.

Using Fine-Tuning and Prompt Engineering Methods in Tandem

Finding it hard to choose which type of method described above is best for your LLM use case? You don’t have to choose! You can actually chain multiple methods together. (A popular platform for doing so is LangChain).

For example, say you want to build a chatbot for helping your internal sales team choose the best product to recommend to prospective clients they are talking to on the phone, in real time. To build your sales team such a recommender chatbot, you could start by fine-tuning an existing, off-the-shelf LLM with your company’s product information, installation overviews, shipping details and general company information. Then, you could set up a vector embedding database containing product technical specifications, pricing by model or SKU, and other data. Finally, you could use RAG to augment your fine-tuned LLM’s responses with that vectorized data. This way, you could combine more general corporate knowledge and communication styles with up-to-date product information for use by your internal chatbot.
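
As a rough sketch of how those pieces might fit together, consider the outline below. Every function name here (retrieve_product_specs, call_finetuned_model) is a hypothetical placeholder, not a real API; you would substitute your own vector store query and your provider’s client for the fine-tuned model.

```python
# Rough sketch of combining a fine-tuned model with RAG for the sales
# chatbot described above. All functions are hypothetical placeholders.

def retrieve_product_specs(query: str) -> str:
    # Placeholder: query your vector database (specs, pricing, SKUs)
    # and return the most relevant passages as plain text.
    return "Alpha widget: $499, SKU A-100, ships in 5 business days."

def call_finetuned_model(prompt: str) -> str:
    # Placeholder: call the LLM you fine-tuned on company style and
    # general product knowledge.
    return "(model response would appear here)"

def recommend(question: str) -> str:
    context = retrieve_product_specs(question)
    prompt = (
        "You are an internal sales assistant. Use the product data below "
        "to recommend a product.\n\n"
        f"Product data:\n{context}\n\nSales rep question: {question}"
    )
    return call_finetuned_model(prompt)

if __name__ == "__main__":
    print(recommend("Which widget fits a client on a $500 budget?"))
```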

Are you fascinated by the possibilities? Contact us today to discuss how JLytics can build you a customized LLM-based chatbot, recommender engine or virtual assistant for internal or customer-facing use.

Be sure to bookmark this JLytics blog in order to continue to explore LLMs, machine learning and other data-driven topics right here with us.

Start the Conversation

Interested in exploring a relationship with a data partner dedicated to supporting executive decision-making? Start the conversation today with JLytics.