I built an AI Avatars Generator using Stable Diffusion. Here’s how to build your own. Part 1.
Step-by-step guide with code open-sourced.
I built my own AI Avatars Generator, ParallaxAI (www.parallaxai.app), and open-sourced it. The code is available on GitHub (Repo link). In this post I cover the pre-code work, the high-level architecture, frameworks and tooling, and some reflection on why it went live in less than 5 days. By the end of this post, we'll have a high-level product and architectural overview. I'll explore the code components in future articles, so subscribe for updates.
Earlier this year I released ParallaxAI (www.parallaxai.app), an AI Avatar Generator which creates avatars in my country's traditional (Kazakh) style.
It took me about 5-7 days from idea to first customer.
This was the first side project I shipped that actually brought me some revenue and paying customers, although it definitely wasn't my first try. I was very happy with the final product and want to share what I used to make it happen in such a short timeframe.
AI Technologies used
The original idea came from the viral success of LensaAI. However, after trying various AI image generators, I was disappointed that none of them recognized Kazakh national apparel and style. This wasn't surprising given the limited training images available online and in training datasets. So the idea of a national apparel-style AI avatar generator stuck with me.
Before committing to the project, I researched technologies for producing high-quality avatars and fine-tuning techniques.
Fine-tuning is an AI term for taking an already pre-trained model and adapting it to a specific style or object: in my case, the user's face and the Kazakh style.
There were two main text-to-image generation technologies available on the market.
Midjourney - exceptional quality, but no API, no fine-tuning, developed by a small team, and a low likelihood of ever being open-sourced.
Stable Diffusion - lower quality than Midjourney, but open-sourced from day one, with third-party API providers, since the weights are available on Hugging Face (a GitHub for AI models). I will likely write another post about Hugging Face and its role in the current AI race, so subscribe if you are interested.
It was clear I needed to use Stable Diffusion, but the question of fine-tuning on a user's facial features was still open. After googling "fine tuning Stable Diffusion" in February 2023, one of the first results was the DreamBooth technique, available as an API on the Replicate platform. It worked nicely, but it cost $2.50 to train one model and took more than 20 minutes depending on the training set.
One day on Twitter I saw a tweet about the newly emerged LoRA training method for Stable Diffusion (github), which roughly halved the computational resources needed for fine-tuning compared to the then state-of-the-art DreamBooth. At that point, I knew I had to find a way to apply this emerging technology to my users' photos. The very next day, Replicate announced it had added LoRA as an API to its platform; with LoRA, training a model now cost less than $1 and took less than 8 minutes.
A brief note on the Replicate Platform:
Replicate enables running COG containers (a special subset of Docker containers abstracting away Machine Learning aspects) as an API on machines provided by the platform. Replicate essentially solves three critical problems with running AI models:
Containerization for models
Model deployment + API access
Providing compute resources
With Replicate, accessing the latest state-of-the-art models is as easy as an API call, and deployment is as simple as deploying a Python server application on AWS. If you're an AI enthusiast, definitely check out their platform.
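To make "as easy as an API call" concrete, here is a minimal sketch of kicking off a prediction through Replicate's public REST API with Node's built-in fetch. The version hash and input fields are placeholders I made up for illustration, not the actual LoRA trainer's schema.

```typescript
// Minimal sketch: starting a prediction on Replicate's REST API.
// The version hash and input fields you pass are model-specific;
// the ones used in tests below are illustrative placeholders.

const REPLICATE_API = "https://api.replicate.com/v1/predictions";

interface PredictionRequest {
  version: string;                 // model version hash on Replicate
  input: Record<string, unknown>;  // model-specific inputs
}

// Build the JSON body for a prediction request.
function buildPredictionBody(
  version: string,
  input: Record<string, unknown>,
): PredictionRequest {
  return { version, input };
}

// POST the request; returns the created prediction (including its id).
async function createPrediction(
  token: string,
  body: PredictionRequest,
): Promise<{ id: string; status: string }> {
  const res = await fetch(REPLICATE_API, {
    method: "POST",
    headers: {
      Authorization: `Token ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Replicate returned ${res.status}`);
  return res.json() as Promise<{ id: string; status: string }>;
}
```

The same endpoint shape covers both training and inference runs, since on Replicate a fine-tune is just another model version you invoke.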
After completing the prep work, I was confident about my approach. It was time to outline the MVP architecture and choose the right stack.
After an hour of “intensive” system design, I came up with the following high-level overview:
Here we have 3 main components:
Web Frontend uploads photos to S3 directly from the browser and requests a new job from the backend.
Backend stores the job in the DB.
Cron calls a job-processor endpoint once a minute to move a job to its next state. All in all, we are going to have six states:
PENDING - create a model-creation request to Replicate
MODEL_CREATING - poll Replicate to check whether model creation is done
MODEL_CREATED - create an avatars-generation request to Replicate
INFERENCING - poll to check whether the avatars are done
INFERENCING_COMPLETED - send the user an email with the photos
FINISHED - all done
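The two polling states above (MODEL_CREATING and INFERENCING) boil down to the same pattern: fetch the prediction's status from Replicate and check whether it has settled. A hedged sketch, assuming Replicate's documented status values; the wiring around it is illustrative, not the actual ParallaxAI code.

```typescript
// Sketch of the polling check behind MODEL_CREATING / INFERENCING.
// Status values follow Replicate's API docs; error handling is minimal.

type PredictionStatus =
  | "starting"
  | "processing"
  | "succeeded"
  | "failed"
  | "canceled";

// A prediction has settled once it can no longer change state.
function isSettled(status: PredictionStatus): boolean {
  return status === "succeeded" || status === "failed" || status === "canceled";
}

// Fetch the current status of a prediction by id.
async function getPredictionStatus(
  token: string,
  id: string,
): Promise<PredictionStatus> {
  const res = await fetch(`https://api.replicate.com/v1/predictions/${id}`, {
    headers: { Authorization: `Token ${token}` },
  });
  if (!res.ok) throw new Error(`Replicate returned ${res.status}`);
  const data = (await res.json()) as { status: PredictionStatus };
  return data.status;
}
```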
As you can see, everything is pretty straightforward: it is literally one web form on the frontend and one cron with two API calls on the backend moving things forward.
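The whole cron job processor can be sketched as a transition table over those states, a simplified version of the idea rather than the repo's actual code:

```typescript
// Simplified sketch of the cron job processor's state machine.
// Each tick, the cron performs the state's side effect (omitted here)
// and advances the job to its next state.

type JobState =
  | "PENDING"
  | "MODEL_CREATING"
  | "MODEL_CREATED"
  | "INFERENCING"
  | "INFERENCING_COMPLETED"
  | "FINISHED";

const NEXT_STATE: Record<JobState, JobState> = {
  PENDING: "MODEL_CREATING",            // model-creation request sent
  MODEL_CREATING: "MODEL_CREATED",      // polling reported the model is ready
  MODEL_CREATED: "INFERENCING",         // avatar-generation request sent
  INFERENCING: "INFERENCING_COMPLETED", // polling reported avatars are done
  INFERENCING_COMPLETED: "FINISHED",    // email with photos sent
  FINISHED: "FINISHED",                 // terminal state
};

function advance(state: JobState): JobState {
  return NEXT_STATE[state];
}
```

Keeping the transitions in one table makes the cron handler trivial: load jobs, run the side effect for their current state, call `advance`, save.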
Throughout my software development career I have built web components for online games at Wargaming in Python, for the ride-hailing service Yandex Taxi in C++, and now for another ride-hailing service, Bolt, in TypeScript/Node.js. So choosing a stack for my project was a challenge.
Given the simplicity of the project setup for the sake of the MVP, and to avoid wasting time on deployment management, I decided to try Vercel. It turned out to be the best decision: Vercel offers CI/CD, hosting, and edge functions with almost limitless bandwidth and build minutes for solo developers on its Free Tier. The integration with Next.js works like magic, since Vercel is essentially Next.js's native environment. A production deployment is created on every merge to the main branch on GitHub, making iterations quick. With Vercel, I set up a boilerplate project in less than 10 minutes, and after that I spent 99% of my time coding my app's business logic, as Vercel handled everything else, including deployments, monitoring, and basic analytics.
As of May 1, 2023, they also announced that the platform provides storage options on the Edge, meaning Vercel now offers a full shipment pipeline from code to storage to CI to deployment and hosting.
Also, a quick hack for my fellow coders: abandon commit messages for your solo projects! I made this Makefile and only worked on the main branch.
.PHONY: g
g:
	git add . && git commit -m "lego" && git push
With this, a production deployment was as easy as `make g`, and I did 5-10 production deployments a day.
(Yes, I know it won't work for large projects with mission-critical flows, but at the time I had 1.5 customers, so breaking things was OK.)
Why did it work this time?
Personally, I am extremely bad at bringing projects to life. My last successful side project attempt was in my senior year of college. Although it wasn't that complex, it was finished and actually useful: it boosted the response rate on my resume while I was applying for my first job as a Software Developer.
Since then, I've had two or three other side projects that were abandoned midway through development. So, after some reflection, I outlined three main reasons why I finished this project and not the others, sorted by impact, so you can use it as a sanity checklist:
MVP functionality was clearly outlined.
From the very beginning, I set boundaries defining the functional requirements (what the MVP should do):
Collect user photos for training a model
Collect payment from the user
Train a model and generate 100 avatars (because LensaAI generated 100)
Send the link with generated avatars via email
I also committed not to extend functionality until the minimum was implemented. With clear and outlined app functionality, it's easy to stay focused.
The scope was inspiring and interesting for me.
It was the beginning of 2023, AI was gaining popularity, and LensaAI was the only personalized AI Avatars app. Breakthroughs were happening daily in the field, and it was hugely inspiring to try something new.
I did some prep work.
Researching the available AI opportunities on the market helped me avoid frustration midway through the project due to a lack of viable implementation solutions.
Final thoughts and next steps
I was amazed at how smooth and enjoyable this project was for me, and I hope this blog post is useful to you as well. This was also the first project where I extensively used Copilot, and it wrote almost 60% of all .ts files in the repo. I'll discuss the code part and my case studies on Copilot in future posts.
If you enjoyed this post and want to stay updated with the latest on web development, product engineering, AI, and weekly tech reviews, consider subscribing to my newsletter. You'll get exclusive insights, tips, and discussions sent straight to your inbox. Subscribe now and don't miss out on any exciting updates.
Next posts coming May 2023:
I built an AI Avatars Generator using Stable Diffusion. Here’s how to build your own. Part 2. Code Review.
I’ve got promoted 6 times in 3 years, here is what I learned
Hugging Face - the blood system of modern AI race
Copilot wrote 60% of code for my last project, but I still give up on it
Mojo lang - Python on steroids, future of AI dev