Thoughts on the Latest OpenAI APIs and Starting a New Project

Athos Georgiou

It's been a bit over a month since OpenAI released their latest API. When I read the announcement of the new features, I was thrilled at first, but then somewhat taken aback. I wasn't sure what to make of it, because some of the new APIs rendered a number of my previous projects obsolete. So, I took my hands off the keyboard for a few days and started thinking about what the new features are and what they could mean for the future of Generative AI, as well as for my own projects. And then I started a new project from scratch to see what the buzz is all about.

The New Features

The gist of the new features (that I will focus on) is:

  • Image generation using DALL·E 3
  • Text-to-Speech to complement the existing Speech-to-Text Whisper API
  • Vision, which allows the model to take in images and answer questions about them
  • The Assistants API, which currently supports
    • Code Interpreter
    • Retrieval
    • Function calling
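
To make these a bit more concrete, here's a rough sketch of what calling a few of the new endpoints looks like with OpenAI's official Node SDK. The prompts, file names, and image URL below are placeholders of mine, purely for illustration:

```ts
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function tryNewEndpoints() {
  // Image generation with DALL·E 3 (prompt is a placeholder)
  const image = await openai.images.generate({
    model: "dall-e-3",
    prompt: "A titanium robot reading the periodic table",
    n: 1,
    size: "1024x1024",
  });
  console.log(image.data[0].url);

  // Text-to-Speech: the response body is the audio itself
  const speech = await openai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: "Hello from the new Text-to-Speech API!",
  });
  fs.writeFileSync("hello.mp3", Buffer.from(await speech.arrayBuffer()));

  // Vision: ask a question about an image URL (URL is a placeholder)
  const answer = await openai.chat.completions.create({
    model: "gpt-4-vision-preview",
    max_tokens: 300,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "What is in this image?" },
          { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } },
        ],
      },
    ],
  });
  console.log(answer.choices[0].message.content);
}

tryNewEndpoints();
```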

You can read more about the new features in the official OpenAI API documentation.

This is cool stuff. But why was I taken aback?

Simply put, the addition of Assistants meant I had to rethink my approach to building AI Assistants. Features such as long-term memory, RAG (Retrieval-Augmented Generation), and code interpretation used to be things we had to implement ourselves. That has now changed: we can use them with a few lines of code.
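
For a sense of scale, here's roughly what that looks like with the Node SDK: one file upload and one create call turn on Code Interpreter and Retrieval for an Assistant. The file name, instructions, and model below are placeholders of mine, not anything prescribed by the API:

```ts
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI();

// Upload a document so Retrieval can search it (file name is a placeholder)
const file = await openai.files.create({
  file: fs.createReadStream("handbook.pdf"),
  purpose: "assistants",
});

// One call enables Code Interpreter and Retrieval and attaches the file
const assistant = await openai.beta.assistants.create({
  name: "Titanium Helper",
  instructions: "Answer questions about the uploaded documents and run code when asked.",
  model: "gpt-4-1106-preview",
  tools: [{ type: "code_interpreter" }, { type: "retrieval" }],
  file_ids: [file.id],
});

console.log(`Created assistant ${assistant.id}`);
```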

But that's cool, right? Well, yes and no.

Yes, because:

  • We can now build more complex applications with less code
  • Assistants can work as abstractions in a way that more contributors can use them, without having to worry about the underlying implementation
  • Furthermore, RAG (Retrieval-Augmented Generation), persistent memory, and code interpretation, among other features, are now baked into the Assistants API and can be used with a few lines of code. So, we can focus more on personalizing the experience rather than implementing the features ourselves

But also no, because:

  • We now have less control over the underlying implementation, and thus less control over our data. Our conversations, uploaded files, and other data are now stored on OpenAI's servers, and we can't really know how they are stored or how they are used
  • We can't really know which RAG approach is used, how the memory is implemented, and so on. Drastic changes to how these features are implemented could break our applications
  • Assistants are still in beta, and although it's unlikely, there's no guarantee they will be supported in the future. For now, that means we can't really rely on them for production applications
  • Streaming is not yet supported for Assistants. On top of that, document queries can take a while to complete (see the polling sketch after this list). To be fair, OpenAI has acknowledged this and is working on introducing streaming support for Assistants in the near future
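
To illustrate that last point: without streaming, a typical Assistants round trip today means creating a run on a thread and polling until it completes, and only then reading the reply in one piece. A rough sketch, assuming the assistant ID comes from somewhere like the snippet above:

```ts
import OpenAI from "openai";

const openai = new OpenAI();

async function ask(assistantId: string, question: string): Promise<string> {
  const thread = await openai.beta.threads.create();
  await openai.beta.threads.messages.create(thread.id, {
    role: "user",
    content: question,
  });

  // Kick off a run, then poll until it leaves the queued/in_progress states
  let run = await openai.beta.threads.runs.create(thread.id, {
    assistant_id: assistantId,
  });
  while (run.status === "queued" || run.status === "in_progress") {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    run = await openai.beta.threads.runs.retrieve(thread.id, run.id);
  }

  // No tokens arrive along the way; the whole answer shows up at the end
  const messages = await openai.beta.threads.messages.list(thread.id);
  return JSON.stringify(messages.data[0].content); // newest message first
}
```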

No problemo, let's see what this new stuff is all about!

So, I thought: "Why not start a new project from scratch and implement some of the new API features?" I've been working on it for the last month or so, and I've called it Titanium, after the element from the periodic table. It's an AI Assistant Template that includes a variety of features, such as:

  • Multi-user Authentication using next-auth ✅ (Including a custom CredentialsProvider for guest accounts — see the sketch after this list)
  • Customizable, Multipurpose Assistants with File Upload support ✅ (Also supports complete deletion of all Assistant related data)
  • Code Interpretation/Generation ✅ (Supported as part of the Assistants implementation)
  • Query/Discussion of uploaded documents ✅ (Supported as part of the Assistants implementation)
  • Image Analysis/Generation 🔄 (TODO)
  • Traditional RAG, using vector DBs 🔄 (TODO)
  • Persistent multi-user memory 🔄 (TODO)
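
On the guest-account point above, the general pattern (not Titanium's exact code, just a minimal sketch assuming next-auth v4) is a CredentialsProvider that skips verification entirely and mints a throwaway user:

```ts
// pages/api/auth/[...nextauth].ts — a minimal sketch, not Titanium's actual config
import NextAuth from "next-auth";
import CredentialsProvider from "next-auth/providers/credentials";
import { randomUUID } from "node:crypto";

export default NextAuth({
  providers: [
    CredentialsProvider({
      id: "guest",
      name: "Continue as guest",
      credentials: {},
      // Nothing to verify: just mint a throwaway guest user for this session
      async authorize() {
        return { id: randomUUID(), name: "Guest" };
      },
    }),
  ],
  // Guests have no database record, so keep sessions in a JWT
  session: { strategy: "jwt" },
});
```

On the client, calling signIn("guest") from next-auth/react would then start a guest session.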

I also took the opportunity to brush up on my React skills and try out Material-UI, and I've had a blast building the app with it.

Overall, I'm pretty happy with what I've learned on this journey. Sometimes we have to take a step back and rethink our approach to things; when we come back, we can look at them from a different perspective and try something new. I'm also pretty happy with the new project, and I'm looking forward to adding more features to it. Got any ideas? Feel free to open an issue or a PR on Titanium.

That's all for now!

I hope you've found this post useful. If you have any questions or comments, feel free to reach out to me on GitHub, LinkedIn, or via email.

See ya around and happy coding!