AI SaaS Founders Use This Tech to Make $70k/Month

You’ve probably seen these AI image generators raking in over $70k/month in revenue for SaaS founders like Pieter Levels.

Pieter Levels’ PhotoAI generates $70k/month.

In this post, I’ll break down the tech behind these image generators, which these founders access via API in their SaaS products.

Hi 👋, I’m Vlad, and I run a unique software dev agency called DevSquadSix.

I started helping aspiring founders launch their SaaS by giving away free software templates you can find on vladshostak.com.

Running your own AI image generation model requires a GPU in the cloud (or on your local machine) and the right packages set up to run it correctly.

Two very popular services do that for you and give you access to all of the functionality via an API:

#1 Replicate -> https://replicate.com


#2 fal -> https://fal.ai


#3 (Noteworthy mention) Ideogram 2.0 -> https://ideogram.ai/

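If you go the hosted route, calling one of these services from your SaaS backend boils down to a single authenticated HTTP request. Here is a minimal Python sketch that builds (but does not send) such a request for Replicate using only the standard library. The endpoint shape, the black-forest-labs/flux-schnell model slug, and the input fields are assumptions based on Replicate’s public HTTP API, so confirm them in Replicate’s docs; fal offers a similar API with its own endpoints.

```python
import json
import os
import urllib.request

# Assumption: Replicate accepts POST /v1/models/{owner}/{name}/predictions for
# official models. Confirm the endpoint and model slug in Replicate's docs.
API_ROOT = "https://api.replicate.com/v1/models"


def build_prediction_request(model: str, prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request asking Replicate to run `model` on `prompt`."""
    body = json.dumps({"input": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        f"{API_ROOT}/{model}/predictions",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_prediction_request(
        "black-forest-labs/flux-schnell",  # assumed model slug
        "a cozy cabin in the snow at golden hour",
        os.environ.get("REPLICATE_API_TOKEN", "YOUR_TOKEN"),
    )
    print(request.get_method(), request.full_url)
    # With a real token, sending it is one more call:
    # with urllib.request.urlopen(request) as resp:
    #     print(json.load(resp))
```

Keeping request construction separate from sending makes it easy to unit-test your integration without spending API credits.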

However, if you want to run these models on your own hardware or your own cloud infrastructure (AWS, Google Cloud, Azure), you can fine-tune them yourself with DreamBooth.

The two services above use DreamBooth under the hood.

So let’s break down what DreamBooth, FLUX.1, and all these other terms mean, so we can understand exactly what this tech is and how it works.

What is DreamBooth?

DreamBooth is a technique to personalize text-to-image models like Stable Diffusion using only a few images of a specific subject.

For example, if you have a few pictures of yourself, DreamBooth can fine-tune the model so it can generate new images of you in various scenes or styles, even if it hasn’t seen those exact scenarios before.
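The core trick (from the original DreamBooth paper, and used in the Hugging Face diffusers examples) is to pair your photos with a prompt containing a rare identifier token plus a class noun, e.g. "a photo of sks dog". The Python sketch below only builds those prompt strings; the helper names and the "sks" token are illustrative choices, not a fixed API.

```python
# Hypothetical helpers illustrating DreamBooth's prompt convention: your subject
# is bound to a rare identifier token (here "sks") placed before its class noun.


def dreambooth_prompts(identifier: str, class_noun: str) -> dict:
    return {
        # Paired with your handful of subject photos during fine-tuning.
        "instance_prompt": f"a photo of {identifier} {class_noun}",
        # Generic prompt for optional "prior preservation" images of the same class.
        "class_prompt": f"a photo of {class_noun}",
    }


def scene_prompt(identifier: str, class_noun: str, scene: str) -> str:
    """After fine-tuning, reuse the identifier to place the subject in new scenes."""
    return f"a photo of {identifier} {class_noun} {scene}"


if __name__ == "__main__":
    prompts = dreambooth_prompts("sks", "person")
    print(prompts["instance_prompt"])
    print(scene_prompt("sks", "person", "as an astronaut on the Moon"))
```

A rare token is used so the model attaches your subject to a word it has few existing associations with, while the class noun lets it reuse what it already knows about people, dogs, and so on.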

(Example: Matt Wolfe’s images generated with FLUX and LoRA, via a tweet)

What is FLUX.1?

FLUX.1 is a text-to-image model from Black Forest Labs that can be fine-tuned with DreamBooth.

At the time of writing, it is widely considered the best open-source AI image model available.

(Source: Stablecog article on FLUX.1 by Black Forest Labs)

It’s a powerful tool, but it requires significant computational resources, particularly GPU memory (VRAM).

That is why the hosted services mentioned above are the way to go when you use it in your SaaS product.

(Example: FLUX image generation, via a tweet)

And by the way, Runway is the go-to service for AI video generation.

More examples:

(More FLUX image generation examples, via a tweet)

As you can see, it generates a lot of fine detail, making images look very realistic or artistically complex.

Also, fine-tuning FLUX usually relies on this thing called LoRA…

What is LoRA (Low-Rank Adaptation)?

Big AI models like FLUX.1 have billions of parameters (12 billion in FLUX.1’s case), which makes training or fully fine-tuning them expensive and resource-intensive.

LoRA is a technique for fine-tuning these large models efficiently.

Instead of updating all of the original weights, LoRA freezes them and trains small low-rank matrices added to selected layers (typically the attention layers), which greatly reduces memory and compute requirements.
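Concretely, a LoRA-adapted layer computes y = W x + (alpha / r) * B (A x), where W is the frozen pretrained weight and A, B are the small trainable matrices of rank r. Here is a minimal, dependency-free Python sketch of that formula on toy 2x2 numbers; a real implementation (for example in Hugging Face PEFT) does the same thing with GPU tensors.

```python
# Minimal sketch of a LoRA-adapted linear layer:
#   y = W x + (alpha / r) * B (A x)
# W (d_out x d_in) is frozen; only A (r x d_in) and B (d_out x r) would be trained.


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    base = matvec(W, x)              # frozen pretrained path
    delta = matvec(B, matvec(A, x))  # low-rank trainable path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 "pretrained" weight
    A = [[1.0, 1.0]]              # rank r = 1
    B_init = [[0.0], [0.0]]       # B starts at zero...
    x = [1.0, 2.0]
    print(lora_forward(W, A, B_init, x, alpha=1.0, r=1))  # → [1.0, 2.0], same as W x
    B_trained = [[1.0], [2.0]]
    print(lora_forward(W, A, B_trained, x, alpha=1.0, r=1))  # → [4.0, 8.0]
```

Because B starts at zero, the adapted model initially behaves exactly like the original one, and training only has to learn the small correction B A.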

How Do DreamBooth and LoRA Work Together?

When you use DreamBooth to fine-tune a model like FLUX.1, you can incorporate LoRA to make the process more efficient.

Essentially, DreamBooth handles the personalization of the model (teaching it about your specific subject),

while LoRA ensures that this fine-tuning process doesn’t require excessive computational resources.
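In the original DreamBooth formulation, training minimizes the model’s usual denoising loss on your subject’s images, optionally plus a "prior preservation" loss on generic images of the same class. Adding LoRA doesn’t change that loss; it only changes which weights the gradients update. The toy Python function below shows that structure on plain lists. The names are hypothetical, and the real prediction target depends on the model (for FLUX.1 it is a flow-matching target rather than predicted noise).

```python
# Toy sketch of a DreamBooth training objective with optional prior preservation.
# In a DreamBooth + LoRA run, predictions come from the frozen model plus its LoRA
# adapters, and gradients of this loss update only the adapter weights.


def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_target,
                    class_pred=None, class_target=None, prior_loss_weight=1.0):
    # Loss on your subject's images (the personalization part).
    loss = mse(instance_pred, instance_target)
    # Optional prior-preservation term on generic images of the same class,
    # which discourages the model from forgetting what a generic "dog" looks like.
    if class_pred is not None and class_target is not None:
        loss += prior_loss_weight * mse(class_pred, class_target)
    return loss


if __name__ == "__main__":
    print(dreambooth_loss([1.0, 2.0], [1.0, 0.0]))  # → 2.0
    print(dreambooth_loss([1.0, 2.0], [1.0, 0.0],
                          [0.0, 0.0], [1.0, 1.0], prior_loss_weight=0.5))  # → 2.5
```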

What exactly is DreamBooth, and how does it integrate with FLUX.1?

Remember, DreamBooth is a fine-tuning technique designed to **personalize** text-to-image models like FLUX.1 using just a few images of a specific subject.

The goal is to make the model capable of generating new images of that subject in various contexts and styles,

even if it hasn’t seen those exact scenarios before.

When working with FLUX.1, you start by collecting a small set of images representing the subject you want the model to learn about.

This subject could be anything from a person to an object, like a specific type of dog or a unique landmark.

What makes DreamBooth with FLUX.1 different from other fine-tuning methods?

DreamBooth’s focus on personalization with minimal data (3–5 images), combined with a powerful model like FLUX.1, means you get highly specific and detailed results.

This is especially valuable in creative industries where you need precise control over image generation.

What does it mean that FLUX.1 is a gated model, and why is this important?

A gated model means that access to FLUX.1 is restricted or controlled. Before you can download it, you need to request access on its Hugging Face model page and be logged in to your Hugging Face account.

Why is FLUX.1 gated? Models are often gated to:

Control Usage: Ensure that only approved users can access the model, often due to its powerful capabilities or to manage computational resources.
Compliance: Certain models might have ethical or legal considerations that require controlled access.

What are the practical steps to run DreamBooth with FLUX.1 on my local machine?

Follow the full step-by-step instructions for details, but in general, to run DreamBooth training locally you’ll need to:

Install Dependencies: Use pip to install the libraries listed in the provided requirements file. This ensures you have all the tools needed to run the training scripts.
Configure Your Environment: Run accelerate config (from Hugging Face Accelerate) to set up your training environment. This includes settings such as mixed-precision training (using bf16 instead of full fp32 precision to save memory).
Execute the Training Script: With the environment set up, run the training script (train_dreambooth_flux.py). It takes your images and fine-tunes FLUX.1 according to the parameters you specify (e.g., learning rate, prompt, resolution).
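These steps end in a single accelerate launch command. Here is a minimal Python sketch that assembles one. The flag names are assumed from the Hugging Face diffusers DreamBooth example for FLUX.1, and build_train_command is a hypothetical helper, so check the README of the diffusers version you install before running it.

```python
import shlex

# Hypothetical helper: builds an `accelerate launch` command for the diffusers
# DreamBooth FLUX.1 script. Flag names are assumptions; verify them in the README.


def build_train_command(
    instance_dir: str,
    output_dir: str,
    instance_prompt: str,
    model: str = "black-forest-labs/FLUX.1-dev",
    resolution: int = 1024,
    learning_rate: float = 1e-4,
    max_train_steps: int = 500,
) -> list:
    return [
        "accelerate", "launch", "train_dreambooth_flux.py",
        f"--pretrained_model_name_or_path={model}",  # gated: request access first
        f"--instance_data_dir={instance_dir}",       # folder with your 3-5 photos
        f"--output_dir={output_dir}",
        "--mixed_precision=bf16",                    # halves memory vs fp32
        f"--instance_prompt={instance_prompt}",
        f"--resolution={resolution}",
        "--train_batch_size=1",
        f"--learning_rate={learning_rate}",
        f"--max_train_steps={max_train_steps}",
    ]


if __name__ == "__main__":
    cmd = build_train_command("./my-photos", "./trained-flux", "a photo of sks dog")
    print(shlex.join(cmd))  # copy-paste into a shell, or:
    # import subprocess; subprocess.run(cmd, check=True)
```

Returning the command as a list (rather than one string) avoids shell-quoting problems with prompts that contain spaces.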

BUT

Running DreamBooth locally can be challenging due to the high memory requirements of FLUX.1.

If your hardware is limited, you might encounter out-of-memory errors or slow processing times. Tools like LoRA can help mitigate this by reducing the memory footprint.

Why does FLUX.1 require so much memory, and how does LoRA help?

FLUX.1 is a large model with 12 billion parameters, and all of them need to sit in GPU memory (VRAM). The more parameters, the more memory is needed, and training at high resolutions (e.g., 1024x1024 pixels) adds even more on top of that.
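To see the scale, here is the rough arithmetic as a Python sketch. The byte counts are assumptions (bf16 weights and gradients, two fp32 Adam optimizer states per parameter), and activation memory, which grows with resolution, is ignored, so treat the results as a ballpark rather than a measurement. For comparison, a high-end consumer GPU such as an RTX 4090 has 24 GB of VRAM.

```python
# Back-of-the-envelope VRAM estimate for FLUX.1's 12 billion parameters.
# Assumptions: decimal gigabytes, bf16 = 2 bytes per value, fp32 = 4 bytes,
# and activations are ignored entirely.
FLUX_PARAMS = 12e9


def gigabytes(num_params: float, bytes_per_param: float) -> float:
    return num_params * bytes_per_param / 1e9


# Just holding the weights in bf16:
weights_only = gigabytes(FLUX_PARAMS, 2)

# One rough accounting of full fine-tuning with Adam: bf16 weights (2) + bf16
# gradients (2) + two fp32 optimizer moments (4 + 4) per parameter.
full_finetune = gigabytes(FLUX_PARAMS, 2 + 2 + 4 + 4)

if __name__ == "__main__":
    print(f"weights only:  {weights_only:.0f} GB")   # 24 GB
    print(f"full finetune: {full_finetune:.0f} GB")  # 144 GB
```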

LoRA (Low-Rank Adaptation) helps by reducing the number of parameters that need to be fine-tuned during training.

Instead of adjusting all 12 billion parameters, LoRA keeps them frozen and trains a small set of added low-rank weights, which significantly cuts down the memory requirements.

This makes it possible to fine-tune FLUX.1 on consumer-grade GPUs, though some trade-offs in training speed or output quality might occur.
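A quick way to see the savings is to count trainable parameters for a single weight matrix. In this Python sketch, the 3072 x 3072 shape and rank 16 are illustrative assumptions, not FLUX.1’s exact layer configuration.

```python
# Compare trainable parameters for one weight matrix: full fine-tuning vs LoRA.
# The 3072 x 3072 shape and rank 16 are illustrative assumptions.


def full_params(d_out: int, d_in: int) -> int:
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    # A is (rank x d_in), B is (d_out x rank)
    return rank * d_in + d_out * rank


if __name__ == "__main__":
    full = full_params(3072, 3072)
    lora = lora_params(3072, 3072, rank=16)
    print(full, lora, f"{lora / full:.2%}")  # 9437184 98304 1.04%
```

At rank 16, LoRA trains about 1% of that matrix’s parameters. The frozen base weights still have to fit in memory, so the savings show up mainly in gradients and optimizer state.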

What is a text encoder, and why would I need to train it?

A text encoder in the context of AI models like FLUX.1 is a component that processes the input text (your prompts) and converts it into a form the model can understand and use to generate images.

The better the text encoder, the more accurately the model can interpret and visualize your prompts.

FLUX.1’s Text Encoders:

CLIP: A popular model for processing text and images together. It helps the model understand and match text prompts with image generation.
T5: A second, larger text encoder. In the FLUX.1 DreamBooth training setup, only CLIP is fine-tunable; T5 stays frozen.
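To make the idea concrete, here is a deliberately toy Python sketch of what happens to a prompt before the image model sees it: it is split into tokens, padded or truncated to a fixed length (CLIP uses 77 token positions), and each token is mapped to a vector. The vocabulary, embedding table, and whitespace tokenizer are made up for illustration; real CLIP uses a byte-pair-encoding tokenizer, start/end tokens, and a transformer on top of the lookup.

```python
# Toy illustration of the first steps a text encoder performs.
CLIP_MAX_TOKENS = 77  # CLIP works on fixed-length sequences of 77 positions


def toy_tokenize(prompt, vocab, max_len=CLIP_MAX_TOKENS, pad_id=0, unk_id=1):
    ids = [vocab.get(word, unk_id) for word in prompt.lower().split()]
    ids = ids[:max_len]                           # long prompts get truncated
    return ids + [pad_id] * (max_len - len(ids))  # short prompts get padded


def toy_encode(ids, embedding_table):
    """Map each token id to a vector; the image model is conditioned on these."""
    return [embedding_table[i] for i in ids]


if __name__ == "__main__":
    vocab = {"a": 2, "photo": 3, "of": 4, "dog": 5}
    ids = toy_tokenize("A photo of a dog", vocab)
    table = [[float(i), float(i) * 0.5] for i in range(6)]
    vectors = toy_encode(ids, table)
    print(ids[:6], len(vectors))  # [2, 3, 4, 2, 5, 0] 77
```

Words the encoder has no good representation for (here mapped to the "unknown" id 1) are exactly where fine-tuning the text encoder can help.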

Why would I fine-tune a text encoder?

Fine-tuning the text encoder can improve the model’s ability to understand specific or complex prompts, making the generated images more accurate to what you describe.

This is especially useful if you’re working with specialized vocabulary or unique styles.

How do I fine-tune the text encoder?

To fine-tune the CLIP text encoder, add the --train_text_encoder flag when running your DreamBooth script.

This tells the training script to not only adjust the model’s image-generating capabilities but also improve how it interprets the text you give it.
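One way to picture what the flag changes is a simple map of which FLUX.1 components get updated. This Python sketch is a hypothetical summary, not code from the training script: the component names are illustrative, the frozen T5 encoder reflects the note that only CLIP is fine-tunable, and the frozen VAE (the image autoencoder) is an assumption based on the standard DreamBooth setup.

```python
# Hypothetical summary of which FLUX.1 components a DreamBooth run updates,
# depending on whether --train_text_encoder is passed. Names are illustrative.


def trainable_components(train_text_encoder: bool) -> dict:
    return {
        "transformer": True,                      # image backbone: fully or via LoRA
        "clip_text_encoder": train_text_encoder,  # only with --train_text_encoder
        "t5_text_encoder": False,                 # stays frozen
        "vae": False,                             # image autoencoder: frozen
    }


if __name__ == "__main__":
    for flag in (False, True):
        trained = [name for name, on in trainable_components(flag).items() if on]
        print(f"--train_text_encoder={flag}: trains {trained}")
```

Training the CLIP encoder as well adds memory use, so it is worth enabling only when your prompts rely on unusual vocabulary or styles.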

I want to train my own AI model and build custom software using the latest AI image models

We’ll help you get started with a free call and a free guide at DevSquadSix to clarify your vision and direction.