Build a local chatbot using LangChain

3 minute read


Large Language Models (LLMs) have surprised us in many ways, such as talking like humans, processing paperwork, acting as coding copilots, and analyzing data, simply by predicting the next token with a probabilistic model. As LLMs get smarter and lighter, we can run models that approach or even surpass the capabilities of ChatGPT on our own devices, which is going to make our lives much easier. This blog shows how to build a chatbot that runs 100% locally and shares my thoughts on how LLMs can help in our daily lives.

Model

Let’s choose an LLM first. There are a lot of LLMs available on Hugging Face, and we have to pick one that our local device can handle. Thanks to llama.cpp, people who cannot afford a professional graphics card (me included) have a chance to play with different open-source LLMs. For example, I chose vicuna-13b-4bit-ggml, which belongs to the LLaMA model family and is a 4-bit quantized version in GGML format (the native format of llama.cpp). Now it’s time to dive into the technical part.
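
To make this concrete, here is a minimal sketch of loading the model through the LlamaCpp wrapper from LangChain (introduced below). The model path is a placeholder for wherever you saved the downloaded weights, and the parameters are just reasonable starting points:

from langchain.llms import LlamaCpp

# Point model_path at the quantized ggml file downloaded from Hugging Face
# (placeholder path). n_ctx is the context window; size it to what your RAM allows.
llm = LlamaCpp(
    model_path='./models/vicuna-13b-4bit.ggml.bin',
    n_ctx=2048,
    temperature=0.7,
)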

LangChain

To get a chatbot, we need some extra seasoning, since all a raw LLM can do is predict the next token based on your input prompt. LangChain offers modular components and off-the-shelf chains that harness the LLM for higher-level tasks, including adding memory to our chatbot and enabling it to retrieve useful information from different sources. Don’t be intimidated by the complex concepts; I will show you how to build a chatbot step by step.

Prompt Template

A prompt template is a way to provide additional context and instructions to a language model. It lets the user supply specific information that is incorporated into the generated text. The template consists of a prompt containing variables, which are filled in with user input. A template for a chatbot looks like this:

template = '''You are an AI chatbot having a conversation with a human. 
{history}
Human: {input}
Chatbot:'''
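
In LangChain, this raw string is wrapped in a PromptTemplate object so the chain knows which variables to fill in. A sketch, assuming the template string above:

from langchain.prompts import PromptTemplate

# 'history' and 'input' are the variables the chain fills in at runtime.
prompt = PromptTemplate(template=template, input_variables=['history', 'input'])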

Conversation Memory

As mentioned above, the chatbot needs to remember the history as the chat goes on. The simplest way is to save the history in a buffer and combine it with the current prompt to form the whole input to the LLM. LangChain wraps this up and provides multiple memory classes for various uses. Take ConversationBufferMemory for instance: it keeps a buffer of all prior messages in a conversation, which can be extracted as a string or as a list of messages. A more advanced option is ConversationSummaryMemory, which stores condensed information summarized by the LLM from the conversation history, so it captures and uses the important points for a more sophisticated conversation experience.
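
To see what the buffer holds, here is a small sketch using ConversationBufferMemory on its own (the example exchange is made up):

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
# Store one exchange: the user's input and the model's output.
memory.save_context({'input': 'Hi there'}, {'output': 'Hello! How can I help?'})
# The transcript comes back under the 'history' key, matching our template variable.
print(memory.load_memory_variables({}))
# {'history': 'Human: Hi there\nAI: Hello! How can I help?'}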


Now let’s combine the LLM, prompt template, and memory, or ‘chain them up’, using a chain, the core value LangChain provides. There are many chains available, and I encourage you to try them and find the best one for you. Here I use ConversationChain, simply passing in the pieces built above:

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
conversation_chain = ConversationChain(llm=llm, prompt=prompt, memory=ConversationBufferMemory())
response = conversation_chain({'input': user_question})

That’s the backbone of a chatbot: you can put it in a while loop and start your conversation in the terminal. But it’s nice to have an interface like the ChatGPT web page. In case you don’t have experience building websites, Streamlit is an option for quickly building a web app.
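
Before adding a web UI, a bare-bones terminal loop is enough to test everything. A minimal sketch, assuming the conversation_chain defined above (typing ‘exit’ to quit is just my convention here):

# A bare-bones REPL around the chain; type 'exit' to quit.
while True:
    user_question = input('Human: ')
    if user_question.strip().lower() == 'exit':
        break
    response = conversation_chain({'input': user_question})
    print('Chatbot:', response['response'])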

Streamlit Web App

To be continued…