Naive NoSQL Conversational History Retrieval for Dummies

Author: Athos Georgiou

What is persistent memory in Generative AI?

Generative AI, particularly conversational AI, has made significant strides in the last couple of years, thanks to the development of Large Language Models (LLMs) like GPT-3 and its successors. These models can generate human-like text and carry on conversations that are often indistinguishable from those with a human. However, one of the key challenges in creating truly human-like conversations is the ability to remember and recall information from previous interactions. This is where persistent memory comes into play. In this article, we'll explore the concept of persistent memory in Generative AI and its significance in creating more human-like conversations.

Implementing persistent memory in Generative AI means enabling the AI to remember and recall information from previous interactions. To do this effectively, the AI needs to store and retrieve information from a dataset of previous interactions. How that dataset is stored and retrieved depends on the architecture of the implementation, but it will most likely involve a database or some other storage system. Two database families commonly used in AI are NoSQL and Vector databases.

  • NoSQL databases store data in formats like key-value pairs, documents, graphs, or wide-column stores. They're ideal for managing vast data volumes and are commonly used in big data, real-time web, AI, and machine learning applications due to their ability to handle unstructured data and scale flexibly. In Generative AI, a NoSQL database might be used to maintain a dataset of past interactions. This dataset aids the AI in crafting responses to new inputs and gets updated with each new interaction.

  • Vector databases specialize in storing data as vectors, making them efficient for handling high-dimensional data. For Generative AI, using a vector database would mean storing past interactions as vectors. This setup facilitates quick retrieval and use of this data to inform responses to new queries, with the database continually being updated with fresh interactions.

A Typical User Interaction

A typical user interaction with a conversational AI might look like this:

  • User sends a message to the AI.
  • The AI retrieves the conversation history from the NoSQL database.
  • The AI uses the conversation history to generate a response to the user's message.
  • The user message and the AI response are stored in the NoSQL database.

We can iteratively build the dataset in the NoSQL database by storing each user message and the AI response as a document.
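
Here's a minimal sketch of that loop, assuming MongoDB via pymongo, one document per message, and a placeholder llm() function standing in for whichever LLM client you use:

from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
messages = client["chat"]["messages"]


def llm(prompt: str) -> str:
    """Placeholder: call your LLM provider of choice here."""
    raise NotImplementedError


def handle_user_message(conversation_id: str, text: str) -> str:
    # 1. Retrieve the conversation history in chronological order.
    history = list(
        messages.find({"conversation_id": conversation_id}).sort("created_at", 1)
    )
    # 2. Use the history as context when generating a response.
    context = "\n".join(f"{m['sender']}: {m['message']}" for m in history)
    reply = llm(f"HISTORY:\n{context}\n\nUSER: {text}")
    # 3. Store both the user message and the AI response as documents.
    messages.insert_many([
        {"conversation_id": conversation_id, "sender": "user", "message": text,
         "created_at": datetime.now(timezone.utc).isoformat()},
        {"conversation_id": conversation_id, "sender": "ai", "message": reply,
         "created_at": datetime.now(timezone.utc).isoformat()},
    ])
    return reply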

In this article, we'll investigate an implementation of the Naive NoSQL conversational history retrieval strategy: a NoSQL database stores and retrieves the conversational history, which the AI then uses to generate responses to new user inputs. We'll explore how this strategy works, its benefits and limitations, and how it can be optimized for better performance.

Dataset & Prompt

An SQL representation of the sample dataset could look something like this:

| conversation_id | sender | message                                      | created_at           |
|-----------------|--------|----------------------------------------------|----------------------|
| 12345           | user   | Hello, how are you?                          | 2024-02-19T12:00:00Z |
| 12345           | ai     | I'm good, thanks. How can I help you today?  | 2024-02-19T12:01:00Z |

And in a NoSQL database, it could look something like this:

{
  "conversation_id": "12345",
  "messages": [
    {
      "conversation": {
        "id": "12345",
        "conversation_id": "12345",
        "sender": "user",
        "message": "Hello, how are you?",
        "created_at": "2024-02-19T12:00:00Z"
      }
    },
    {
      "conversation": {
        "id": "12346",
        "conversation_id": "12345",
        "sender": "ai",
        "message": "I'm good, thanks. How can I help you today?",
        "created_at": "2024-02-19T12:01:00Z"
      }
    }
  ]
}

In this schema, each exchange is composed of a user message and an AI response. The created_at field is a timestamp indicating when the message was sent. This will be helpful later, when we want to retrieve the messages in chronological order, and it helps the AI pin down information more precisely. To ensure the AI makes good use of the dataset when responding to a user message, we need to build a prompt, which will be included with every message sent to the AI.

Here's an example of what it could look like:

Instruction 1: Below is your conversation History. Draw inspiration from it to respond to the user's message.

HISTORY:

1. Date: 2024-02-19T12:00:00Z. Sender: User. Message: "Hello, my name is Bob."
2. Date: 2024-02-19T12:01:00Z. Sender: AI. Message: "Hi Bob. Nice to meet you."

Instruction 2: Below is the user's latest message. Use the Conversation History above to respond to it.

MESSAGE:

Date: 2024-02-19T12:02:00Z. Sender: User. Message: "Can you recall my name?"

By constructing the prompt above, we ensure that the AI has access to both the conversation history and the user's latest message, which helps it generate a more relevant response. The instructions need to be clear and kept separate from the conversation history and the latest message, so that the AI can easily distinguish between them. Also, LLMs love to see bullet points and numbered lists, so use them as much as possible.
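
As a sketch, the prompt above could be assembled from the retrieved documents with a hypothetical build_prompt helper; the instruction wording mirrors the example and is free to tune:

SENDER_LABELS = {"user": "User", "ai": "AI"}


def build_prompt(history: list, latest: dict) -> str:
    # Mirror the layout shown above: instructions, numbered history,
    # then the user's latest message.
    lines = [
        "Instruction 1: Below is your conversation History. "
        "Draw inspiration from it to respond to the user's message.",
        "",
        "HISTORY:",
        "",
    ]
    for i, m in enumerate(history, start=1):
        sender = SENDER_LABELS.get(m["sender"], m["sender"])
        lines.append(
            f"{i}. Date: {m['created_at']}. Sender: {sender}. "
            f"Message: \"{m['message']}\""
        )
    lines += [
        "",
        "Instruction 2: Below is the user's latest message. "
        "Use the Conversation History above to respond to it.",
        "",
        "MESSAGE:",
        "",
        f"Date: {latest['created_at']}. Sender: User. "
        f"Message: \"{latest['message']}\"",
    ]
    return "\n".join(lines)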

Some Considerations

Naturally, there are several considerations to address. Here are a few of them:

What if there is no conversation history?

  • In this case, the AI should be able to generate a response based on the user's latest message.

What if the conversation history is too long?

  • Since we retrieve the conversation history from a NoSQL database, we can limit the number of messages retrieved. For example, we could fetch only the last 10 messages, or the last 5 minutes of conversation history.
  • We can also cap the total number of characters retrieved. For example, we could keep only the last 1,000 characters of conversation history. Both limits are sketched below.
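
Assuming the pymongo collection from the earlier sketch, the two limits could look like this:

def last_n_messages(conversation_id: str, n: int = 10) -> list:
    # Fetch the newest n messages, then restore chronological order.
    newest_first = (
        messages.find({"conversation_id": conversation_id})
        .sort("created_at", -1)
        .limit(n)
    )
    return list(newest_first)[::-1]


def capped_history(conversation_id: str, max_chars: int = 1000) -> list:
    # Walk backwards from the newest message until the character budget is spent.
    kept, used = [], 0
    for m in messages.find({"conversation_id": conversation_id}).sort("created_at", -1):
        if used + len(m["message"]) > max_chars:
            break
        kept.append(m)
        used += len(m["message"])
    return kept[::-1]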

Is this solution scalable?

  • In terms of the database, yes. NoSQL databases are designed to be scalable and can handle large volumes of data.
  • However, the AI model itself may not handle large volumes of data well: context windows are finite, so a prompt that includes a large number of messages quickly runs up against that limit.

What are the costs like?

  • The costs will depend on the database provider and the amount of data being stored and retrieved. NoSQL databases are designed to handle large volumes of data cost-effectively.
  • LLM costs depend on how much data is sent with every message and how large the response is. So, naturally, if we send a large amount of data with every message, the costs will be higher.

How effective is this solution?

  • Generally speaking, not the most effective: the AI may have to process a large amount of data with every message, which can slow down the response time and make responses less relevant. Since the data is not filtered for relevance to the prompt, precision may also take a hit.

Can we do better?

This strategy is a good starting point, but we can optimize it to be more efficient, more effective, and potentially cheaper. Let's explore a couple of ways to do this:

Summarize the conversation history

  • Since the chat history stored in the NoSQL database can get quite long, we can send a summary of the conversation history instead of the entire history. This reduces the amount of data sent with every message and helps the AI generate more relevant responses, since there is less text to process. We should be cautious with this approach, though, as we don't want to lose important information: instruct the AI not just to summarize, but to retain the most relevant details.
  • Once the summary is generated, we can include it in the prompt so that the AI can use it to generate a response. We can also define a maximum number of characters for the summary.

Step 1: Summarize the history

Instruction 1: Below is your conversation History. Summarize it so it is shorter, include all relevant information, and do not exceed 500 characters.

HISTORY:

1. Date: 2024-02-19T12:00:00Z. Sender: User. Message: "Hello, my name is Bob."
2. Date: 2024-02-19T12:01:00Z. Sender: AI. Message: "Hi Bob. Nice to meet you."

Step 2: Use the summary to get a response

Instruction 1: Below is your Summarized History. Draw inspiration from it to respond to the user's message.

SUMMARY:
On 2024-02-19T12:00, the user introduced himself as Bob. On 2024-02-19T12:01, the AI greeted him warmly.

Instruction 2: Below is the user's latest message. Use the Summary above to respond to it.

MESSAGE:
Date: 2024-02-19T12:02:00Z. Sender: User. Message: "Can you recall my name?"
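
A sketch of the two-step flow, reusing the placeholder llm() function from earlier; when to trigger summarization (e.g. past some message count) is a policy choice left out here:

def summarize_history(history: list) -> str:
    # Step 1: ask the model for a bounded summary of the full history.
    numbered = "\n".join(
        f"{i}. Date: {m['created_at']}. Sender: {m['sender']}. Message: \"{m['message']}\""
        for i, m in enumerate(history, start=1)
    )
    return llm(
        "Instruction 1: Below is your conversation History. Summarize it so it "
        "is shorter, include all relevant information, and do not exceed 500 "
        f"characters.\n\nHISTORY:\n\n{numbered}"
    )


def respond_with_summary(summary: str, latest: dict) -> str:
    # Step 2: answer the latest message against the summary instead of the
    # full history.
    return llm(
        "Instruction 1: Below is your Summarized History. Draw inspiration "
        f"from it to respond to the user's message.\n\nSUMMARY:\n{summary}\n\n"
        "Instruction 2: Below is the user's latest message. Use the Summary "
        f"above to respond to it.\n\nMESSAGE:\nDate: {latest['created_at']}. "
        f"Sender: User. Message: \"{latest['message']}\""
    )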

Positives:

  • The context size is kept small, which can result in an overall improvement in the response time and the relevance of the AI's response.
  • With good prompt instructions, the AI can be more precise in its response.

Negatives:

  • An additional API call is needed to summarize the conversation whenever it grows sufficiently large. Although it's within our control to decide how often to summarize, it can add costs and slow down the response time.
  • Summarizing the conversation history may still lead to loss of important information, as it's very dependent on how the AI interprets the user instructions and conversation history.
  • Memory is still short-term and does not scale as the conversation history grows.

Adding metadata to the conversation

  • We can use libraries that analyze a user/AI message and attach metadata to help the AI better understand the context of the conversation. For example, metadata could indicate the topic of the conversation, its sentiment, or its type (e.g. casual, formal, business). This helps the AI generate more relevant responses, since it has more information about the context; one way to produce such metadata is sketched below.
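
As a dependency-free alternative to a dedicated NLP library, one hedged option is to ask the model itself to tag each message; the tag values here are illustrative, not a fixed taxonomy:

import json


def tag_message(text: str) -> dict:
    # Ask the model to classify the message and return structured tags.
    raw = llm(
        "Classify the message below. Reply with only JSON containing the keys "
        '"topic", "sentiment", and "type" (e.g. casual, formal, business).\n\n'
        f"Message: {text}"
    )
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {}  # no metadata is better than malformed metadata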

In SQL, the dataset would look something like this:

| conversation_id | sender | message                                      | created_at           | topic           | sentiment       | type            |
|-----------------|--------|----------------------------------------------|----------------------|-----------------|-----------------|-----------------|
| 12345           | user   | Hello, how are you?                          | 2024-02-19T12:00:00Z | introduction    | positive        | casual          |
| 12345           | ai     | I'm good, thanks. How can I help you today?  | 2024-02-19T12:01:00Z | introduction    | pleasant        | casual          |

And in JSON format:

{
  "conversation_id": "12345",
  "messages": [
    {
      "conversation": {
        "id": "12345",
        "conversation_id": "12345",
        "sender": "user",
        "message": "Hello, how are you?",
        "created_at": "2024-02-19T12:00:00Z",
        "metadata": {
          "topic": "introduction",
          "sentiment": "positive",
          "type": "casual"
        }
      }
    },
    {
      "conversation": {
        "id": "12346",
        "conversation_id": "12345",
        "sender": "ai",
        "message": "I'm good, thanks. How can I help you today?",
        "created_at": "2024-02-19T12:01:00Z",
        "metadata": {
          "topic": "introduction",
          "sentiment": "pleasant",
          "type": "casual"
        }
      }
    }
  ]
}

Our search would change a little. For example, we could extract keywords from the user's latest message and query the NoSQL database for the last N messages with a matching topic, sentiment, or type, ranked by relevance to that message.
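
As a sketch, such a lookup might filter on the metadata fields, assuming the flat one-document-per-message layout from the first sketch with the metadata object attached to each document:

def relevant_history(conversation_id: str, topic: str, n: int = 10) -> list:
    # Filter by conversation and extracted topic; dot notation reaches into
    # the nested metadata object.
    newest_first = (
        messages.find({
            "conversation_id": conversation_id,
            "metadata.topic": topic,
        })
        .sort("created_at", -1)
        .limit(n)
    )
    return list(newest_first)[::-1]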

With this optimization, the AI can use the metadata to generate more relevant responses, since it will have more information about the context of the conversation. For example, if the conversation is about a casual topic, the AI can generate a more casual response. If the conversation is about a formal topic, the AI can generate a more formal response.

Positives:

  • The AI can use the metadata to generate more relevant responses, since it has more information about the context of the conversation.
  • This optimization can potentially improve the longevity of the memory, as the metadata can be used to better organize and retrieve the conversation history.

Negatives:

  • This solution adds complexity for the user and a dependency on good NLP tooling (or extra LLM calls) to extract the metadata.

Closing Thoughts

Persistent memory in Generative AI is a crucial component that allows the AI to remember and recall information from previous interactions. In this article, we explored the Naive NoSQL conversational history retrieval strategy, in which a NoSQL database stores and retrieves the conversational history the AI uses to generate responses to new user inputs. We also explored its benefits and limitations, and a few ways it can be optimized to improve its memory capacity, relevance, and efficiency while reducing cost.

There are better ways to search conversation history, such as using a vector database, which relies on similarity search to find related conversations and can be more efficient and effective. We can also combine NoSQL and Vector databases to store and retrieve the conversation history, which can outperform a NoSQL database alone. Stay tuned for more on that in future articles.

Interested in collaborating? Got any cool ideas? Feel free to reach out to me on GitHub, LinkedIn, or via email.

Until next time, take care and keep learning!