So, I've been doing stuff...

Author: Athos Georgiou

2023 - The year of GenAI

2023 was quite a year, wasn't it? From the early days when ChatGPT was released to the most recent announcement of Sora, OpenAI's newest diffusion model that can generate video from text, I feel I've been on a journey that keeps on giving. I've learned so much, and I've had the pleasure of meeting some amazing people along the way. I've also had the opportunity to build some cool stuff, and I'd like to share some of it with you. But first, let's take a step back and reflect on the past year.

I've been doing stuff!

What's up?

With the recent breakthroughs in Generative and Conversational AI, I got to become a kid again, but this time around by being part of the journey. So I put my head down and started learning, and before I knew it, I had made over 1,000 contributions on GitHub before the year was over. That was over 1,000 more than I had made in the previous year, or the year before. I had to stop and think: wow, I've been doing stuff!

In reality, such contributions don't mean much, if anything. Most of them were about learning and experimenting with new technologies and ideas. But it was a start, and I was happy to be part of the journey. And when I look back, the best experiences I've had in this journey are the ones involving collaboration, learning, and sharing. I've had the pleasure of joining hackathons and meeting some amazing people, whom I still talk to and collaborate with today. I've built a few things that I'm proud of, however, and I'd like to show them off a little bit.

Well, what kind of things?

Well, what kind of stuff? First, let me tell you a little story. When OpenAI released the Assistants API towards the end of 2023, I was caught by surprise. I thought that concepts I had learned about, such as RAG (Retrieval-Augmented Generation) and long-term memory, were about to become obsolete. I mean, that thing has a lot going for it - built-in RAG, long-term memory, a Code Interpreter, you name it. In a way I was excited, because new toys! But I was also a little bit concerned. I had spent the better part of the year learning about these concepts and building my own implementations, and now it all seemed obsolete already. So I took a week off to reflect and think about what I wanted to do next. And then it hit me - "What if I built something that would implement all the latest APIs from OpenAI, such as the Assistants API, Vision and Speech?" That way, I would really get an insight into how good this Assistant thing really is and whether my initial concerns were valid or not. (Let me save you the trouble - they were not.)

And thus, Titanium was born.

Like, no, I didn't create the element titanium from the periodic table. But since I named all my previous projects after elements, why not?

Titanium - The Super Simple Template

Um, what?

Titanium is a modern web application built with Next.js, leveraging the latest OpenAI APIs to offer an advanced Generative and Conversational AI experience. It's still pretty much a prototype, but I think it's a good start. Here's a list of some of the features:

  • Multi-user Authentication using next-auth, including a custom CredentialProvider for guest accounts.✅
  • Customizable, multipurpose Assistants with file upload support. Also supports complete deletion of all Assistant-related data.✅
  • Vision via 'gpt-4-vision-preview'. Currently supports image analysis for multiple URLs. File uploads may come later, but are not a priority.✅
  • Text to Speech (TTS), supporting tts-1, tts-1-hd and all available voice models.✅
  • Speech to Text (STT), available via a button toggle in the chat input box.✅
  • Retrieval Augmented Generation (RAG), using advanced document parsing via the Unstructured.io API, ada-003 Embeddings by OpenAI and Pinecone Serverless for fast and efficient indexing & retrieval.✅
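
To make the RAG bullet a bit more concrete, here's a minimal sketch of the retrieval step. It is not Titanium's actual code: a plain in-memory array stands in for Pinecone Serverless, tiny hand-written vectors stand in for real OpenAI embeddings, and the names `Chunk`, `cosineSimilarity`, `retrieve` and `buildPrompt` are made up for illustration.

```typescript
// Hypothetical chunk shape. In a real app the embedding would come from an
// embeddings model and live in a vector database, not in memory.
interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction, so treat it as "not similar".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK chunks most similar to the query embedding.
function retrieve(queryEmbedding: number[], chunks: Chunk[], topK: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((entry) => entry.chunk);
}

// Stitch the retrieved chunks into the prompt that would be sent to the model.
function buildPrompt(question: string, context: Chunk[]): string {
  const contextText = context.map((c) => `- ${c.text}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${contextText}\n\nQuestion: ${question}`;
}

// Toy usage: the vectors are invented, not real embeddings.
const sampleChunks: Chunk[] = [
  { id: "a", text: "Titanium is built with Next.js.", embedding: [1, 0, 0] },
  { id: "b", text: "Sourdough needs a starter.", embedding: [0, 1, 0] },
];
console.log(buildPrompt("What is Titanium built with?", retrieve([1, 0, 0], sampleChunks, 1)));
```

The "augmented" part is just that last step: whatever comes back from the index gets pasted into the prompt before the model sees the question.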

Some of the features I'm working on include:

  • Persistent multi-user Memory.🚧
  • Image Generation via DALL-E 3.🚧
  • Text to Video (TTV) - Inspired by OpenAI's recently revealed diffusion model, Sora.🚧

And the obligatory:

  • Bug fixes and performance improvements.🐛
  • Refactor the spaghetti.🍝

I call it super simple because it's as minimal as possible, and it's designed to be a template for anyone who wants to build their own Generative and Conversational AI application. It's also built in TypeScript, which is the best thing since sliced bread. Actually, I'm more of a sourdough guy, but you get the point.

I think I may have just pissed off a few people.

What's next?

My goal is to build a template that is as modular as possible, so that each feature is a separate module that can be easily added or removed - something like a monorepo, but not quite. So in that sense, there's quite a bit to do in order to de-spaghettify the code. I also want to add more features, especially Text to Video, which I believe will be the highlight of 2024. The main goal is to keep learning, experimenting and sharing.
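
To give a rough idea of what "each feature is a separate module" could look like, here's a small sketch. Everything in it is hypothetical (the `FeatureModule` interface, the `FeatureRegistry` class and the `echo` feature are invented for this post and are not how Titanium is structured today), and real handlers would most likely be async since they'd be calling APIs.

```typescript
// Hypothetical contract every pluggable feature would implement.
interface FeatureModule {
  name: string;
  // Takes the user's input and returns a reply.
  handle(input: string): string;
}

// Keeps track of which features are currently plugged in.
class FeatureRegistry {
  private features: Record<string, FeatureModule> = {};

  private has(name: string): boolean {
    return Object.prototype.hasOwnProperty.call(this.features, name);
  }

  // Add a feature; refuse duplicates so two modules can't fight over a name.
  register(feature: FeatureModule): void {
    if (this.has(feature.name)) {
      throw new Error(`Feature "${feature.name}" is already registered`);
    }
    this.features[feature.name] = feature;
  }

  // Remove a feature; returns false if it was never registered.
  unregister(name: string): boolean {
    if (!this.has(name)) return false;
    delete this.features[name];
    return true;
  }

  // Names of all enabled features, in registration order.
  list(): string[] {
    return Object.keys(this.features);
  }

  // Route the input to the named feature, or fail loudly if it's not enabled.
  dispatch(name: string, input: string): string {
    const feature = this.has(name) ? this.features[name] : undefined;
    if (!feature) {
      throw new Error(`Feature "${name}" is not enabled`);
    }
    return feature.handle(input);
  }
}

// Toy usage with a made-up feature.
const registry = new FeatureRegistry();
registry.register({ name: "echo", handle: (input) => `echo: ${input}` });
console.log(registry.dispatch("echo", "hello"));
registry.unregister("echo");
```

The point of the registry is that removing a feature becomes a one-liner instead of a hunt through the spaghetti.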

So, take it, break it, make it your own. Make a million dollars (or euros, or peanuts), and then give me a cut. Or not. I'm not picky. I just hope it can help you save time and effort, and that you can learn something from it. Or, even better - maybe you can fix my bugs and teach me something.

So, yeah - I've been doing stuff. And I'm excited to see what the future holds. I hope you are too!

Interested in collaborating? Got any cool ideas? Feel free to reach out to me on GitHub, LinkedIn, or via email.

Until next time, take care and keep learning!