Comparative Study: SD1.5, SDXL, SD3, Pony, Flux

Flux Comparison

The goal of this project is to improve image generation decision-making by comparing the outputs of Flux models against those of other models. I am working to become a data analyst, and I want to visually analyze the outputs of new generative AI models. The comparison rests on a single prompt chosen to produce large visual differences between models. Testing is in chronological order and ends with Flux. To know how good the technology is today, we must know what it was like before.

Method:

The same prompt is used throughout this test: “There are four individual characters each sitting at a business meeting. They are Pikachu, Darth Vader, Garfield, and Mickey Mouse.” No negative prompt and no LoRA are used. I use grid comparisons with Efficiency Nodes in ComfyUI. For each model, 3 checkpoints are tested with 5 random-seed images each. For Flux I go deeper and compare different CFG values and step counts to show visually which parameters are the most desirable.
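
The test matrix described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the ComfyUI workflow itself: the `build_jobs` helper, the fixed `rng_seed`, the seed range, and drawing fresh seeds for every checkpoint are all assumptions made for the example. The prompt and checkpoint names are the ones used in this study.

```python
import random

PROMPT = (
    "There are four individual characters each sitting at a business meeting. "
    "They are Pikachu, Darth Vader, Garfield, and Mickey Mouse."
)

# Checkpoint names as listed in each section of this study.
CHECKPOINTS = {
    "SD1.5": ["dreamshaper_8", "chilloutmix_NiPrunedFp32Fix", "realisticVisionV60B1_v51VAE"],
    "SDXL": ["dreamshaperXL_v21TurboDPMSDE", "Juggernaut_X_RunDiffusion_Hyper", "realvisxlV40_v40Lightning"],
    "Pony": ["ponyDiffusionV6XL_v6", "ponyRealism_v21MainVAE", "autismmixPony"],
    "SD3": ["stableDiffusion3SD3_sd3MediumInclT5XXL", "realisticFreedom3_auroraV09", "nepotismUnleashed_V2"],
    "Flux": ["FluxFusionDS_v0_Q2_K", "FusionDS_v0_Q4", "FusionDS_v0_Q8"],
}
SEEDS_PER_CHECKPOINT = 5


def build_jobs(rng_seed=0):
    """Return one job per (family, checkpoint, seed) combination."""
    rng = random.Random(rng_seed)  # assumption: fixed only so the sketch is repeatable
    jobs = []
    for family, checkpoints in CHECKPOINTS.items():
        for checkpoint in checkpoints:
            for _ in range(SEEDS_PER_CHECKPOINT):
                jobs.append({
                    "family": family,
                    "checkpoint": checkpoint,
                    "seed": rng.randrange(2**32),  # hypothetical seed range
                    "prompt": PROMPT,
                    "negative_prompt": "",  # no negative prompt is used
                    "loras": [],            # no LoRA is used
                })
    return jobs


jobs = build_jobs()
print(len(jobs))  # 5 families x 3 checkpoints x 5 seeds = 75
```

Laying the runs out this way makes it easy to confirm that every checkpoint sees the identical prompt and that only the seed varies within a checkpoint.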

Limitations:

My ability to use each checkpoint to its full potential is limited. I aim for objectivity by using exactly the same prompt, but I can’t use exactly the same settings across all models. My notes are subjective and open to interpretation. Admittedly, this test is not designed to show the strengths of every model; it was created to demonstrate Flux's capabilities. I do not currently plan to run dedicated speed tests, but I did try to include estimates of how long the tests took. I am not paid to do any of this.

Core Hardware:

CPU: Ryzen 7 7800X3D 8-Core

GPU: RTX 4070 Ti Super Trinity OC

RAM: DDR5 32GB (2x16GB) 7200MHz

Stable Diffusion 1.5:

Checkpoints tested: dreamshaper_8, chilloutmix_NiPrunedFp32Fix, realisticVisionV60B1_v51VAE

SD1.5 Observation Notes:

These 3 tests make it evident that SD1.5 is incapable of adhering to the prompt. Most images don’t even have 4 individuals. It clearly recognizes Pikachu in almost every image, possibly because Pikachu is first in the list and possibly because it is a very popular character globally. Garfield is by far the least represented, even though Mickey Mouse is last in the list. Every image contains some hybrid, with only a few instances of full separation. Generation is very fast at about 11.9it/s at 768x768. SD1.5 was officially released in October 2022. Some of these images are hilarious though, I’ll give it that. I hope the humor from bad AI image generations is never lost.

Stable Diffusion XL 1.0:

Checkpoints tested: dreamshaperXL_v21TurboDPMSDE, Juggernaut_X_RunDiffusion_Hyper, realvisxlV40_v40Lightning

SDXL Observation Notes:

SDXL is better than SD1.5 in detail, resolution, and prompt adherence. Several images contain 4 individuals, and many have a clearly defined Pikachu, but it still did not get the prompt fully correct a single time. I personally like the variety of realvisxlV40 the most. I used the recommended steps, scheduler, and sampler for each checkpoint. realvisxlV40_v40Lightning took 28.02 seconds at about 2.43it/s. SDXL was officially released in July 2023.

Pony V6 XL:

Checkpoints tested: ponyDiffusionV6XL_v6, ponyRealism_v21MainVAE, autismmixPony

Pony Observation Notes:

Was it even worth testing? It is difficult to know, but for science, I think it was. This model clearly has an agenda of its own, to say the least. None of the images adhered to the prompt, but it got Pikachu and Darth Vader (stormtrooper?) almost every time. Garfield and Mickey Mouse aren't represented at all. Settings: 30 steps, 7.0 CFG, euler_ancestral, normal. The base model run took 39.13 seconds at about 4.93it/s. Pony V6 XL was released in January 2024.

Stable Diffusion 3:

Checkpoints tested: stableDiffusion3SD3_sd3MediumInclT5XXL, realisticFreedom3_auroraV09, nepotismUnleashed_V2

SD3 Observation Notes:

Prompt adherence was much better overall with SD3 than with SDXL and SD1.5. It leaned toward a 2D cartoon style in some images. For the base model, 4 of the 5 images visibly show all of the characters, and only 1 replaced Garfield with a person. The second image from the left generated Mickey Mouse terribly, but I still consider that a success. Nepotism had the worst results overall, and realisticFreedom did almost as well as the base model. The run times were 35.39, 46.57, and 46.08 seconds respectively. I don’t think SD3 will be developed much further, especially with the release of Flux. Settings: 25 steps, 5.0 CFG, euler, sgm_uniform for all SD3 generations. The SD3 weights were released in June 2024.

Flux GGUF Quants:

Checkpoints tested: FluxFusionDS_v0_Q2_K, FusionDS_v0_Q4, FusionDS_v0_Q8

Flux GGUF Quants Observation Notes:

Flux got the prompt correct almost every time. It adopted a cartoon drawing art style which, in my opinion, fits the request. The biggest issue is duplicated characters: 3 of the 15 images have duplicate errors. Surprisingly, Q2 (the smallest quant) got the prompt correct 5 of 5 times, although some of the legs are weird and the quality is visibly worse. Q8 provided the highest quality.

6 steps, 1 cfg, euler, normal for all generations.

Run times for the batch of 5:

FluxFusionDS_v0_Q2_K: 51.28 seconds, ~1.4s/it

FusionDS_v0_Q4: 51.03 seconds, ~1.4s/it

FusionDS_v0_Q8: 55.84 seconds, ~1.3s/it

Each step takes significantly longer than SD3, but with only 6 steps, it overall doesn't take much more time, and the success rate is clearly much higher.
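
The trade-off between step cost and step count can be checked with some rough arithmetic. The sketch below uses only the batch times and step counts reported in this study; the assumption that the SD3 time also covers a batch of 5 images, and the helper name `per_image_and_per_step`, are mine. Because the totals are wall-clock times, they include model loading and decoding, so the per-step figures come out higher than the s/it values reported by the sampler.

```python
IMAGES_PER_BATCH = 5  # assumption: every reported time covers a batch of 5 images

# (total seconds, steps per image) as reported in this study.
RUNS = {
    "SD3 base": (35.39, 25),
    "Flux Q2_K": (51.28, 6),
    "Flux Q4": (51.03, 6),
    "Flux Q8": (55.84, 6),
}


def per_image_and_per_step(total_seconds, steps, images=IMAGES_PER_BATCH):
    """Return (seconds per image, wall-clock seconds per step)."""
    per_image = total_seconds / images
    return per_image, per_image / steps


summary = {name: per_image_and_per_step(*run) for name, run in RUNS.items()}
for name, (per_image, per_step) in summary.items():
    print(f"{name}: {per_image:.2f} s/image, {per_step:.2f} s/step")
```

By this rough measure a Flux step costs around six times as much wall-clock time as an SD3 step, yet an image takes only a few seconds longer because far fewer steps are needed.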

FluxFusionDS_v0_Q2_K Steps + CFG Test:

Observation Notes:

This grid makes it pretty evident that a CFG of around 1.0 is the best. Anything lower than 0.8 produces washed-out, undesirable results. I typically just stick to exactly 1.0. As for the steps, 8 steps seem to give slightly better quality than 6, but the difference is negligible. This grid demonstrates why I just stick to 6 steps and 1.0 CFG when using Flux GGUF. It is always weird how a low CFG will produce something completely unexpected, like a person running in a desert with a cloudy blue sky background. From my experience, anything above a 1.4 CFG results in a huge loss of detail and oversaturation.
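
For readers unfamiliar with CFG, the standard classifier-free guidance formula helps show why 1.0 is a special value. The sketch below is a generic illustration using plain Python lists and made-up numbers. It is not taken from ComfyUI or from the Flux code, and the function name `apply_cfg` is hypothetical.

```python
def apply_cfg(cond, uncond, cfg):
    """Standard classifier-free guidance: uncond + cfg * (cond - uncond)."""
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]


# Made-up noise predictions for a tiny 3-value "latent".
cond = [0.5, -1.0, 2.0]    # prediction with the prompt
uncond = [0.25, 0.0, 1.0]  # prediction without the prompt

print(apply_cfg(cond, uncond, 1.0))  # identical to cond
print(apply_cfg(cond, uncond, 0.5))  # pulled halfway back toward uncond
print(apply_cfg(cond, uncond, 1.5))  # pushed past cond, away from uncond
```

At exactly 1.0 the result is just the prompt-conditioned prediction. Values below 1.0 blend in the unconditioned prediction, which may be why low settings drift toward images unrelated to the prompt.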

FluxFusionDS_v0_Q2_K CFG Speed Test:

CFG 0.5: 2.59s/it, 2.51s/it, 2.55s/it.

CFG 1.0: 1.48s/it, 1.46s/it, 1.42s/it. (1.0 takes ~43% less time per iteration)

CFG 1.5: 2.59s/it, 2.57s/it, 2.53s/it.

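A CFG of exactly 1.0 gives a much faster generation time. This is most likely because, at exactly 1.0, classifier-free guidance reduces to the prompt-conditioned prediction alone, so ComfyUI can skip the unconditional (negative prompt) pass on each step.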
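
The size of the difference can be worked out from the nine timings above. This is a small sketch using only those numbers; averaging the 0.5 and 1.5 results into a single "slow" figure is my own simplification.

```python
from statistics import mean

# Seconds per iteration from the three runs at each CFG value above.
TIMINGS = {
    0.5: [2.59, 2.51, 2.55],
    1.0: [1.48, 1.46, 1.42],
    1.5: [2.59, 2.57, 2.53],
}

avg = {cfg: mean(values) for cfg, values in TIMINGS.items()}
slow = mean([avg[0.5], avg[1.5]])  # average of the two non-1.0 settings
fast = avg[1.0]

time_saved_pct = (1 - fast / slow) * 100  # reduction in time per iteration
speedup = slow / fast                     # ratio of iterations per second

print(f"time saved per iteration: {time_saved_pct:.1f}%")
print(f"throughput speedup: {speedup:.2f}x")
```

The two figures describe the same gap in different ways: roughly 43% less time per iteration corresponds to a bit more than 1.75 times the throughput.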

FluxFusionDS_v0_Q2_K Steps Test:

Observation Notes:

There are benefits to having 10 steps rather than 6 steps, but at 14 steps there is no noticeable improvement over 10 steps. Anything below 6 steps degrades quality. I think that going for 10 steps is worth it in this case.
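
To put a rough cost on the extra steps, the sampling time per image can be estimated from the approximate 1.4s/it reported for Q2_K at CFG 1.0. The sketch below assumes the time per iteration stays constant as the step count changes and ignores loading and decoding; `sampling_seconds` is a hypothetical helper.

```python
SECONDS_PER_ITERATION = 1.4  # approximate Q2_K figure reported earlier at CFG 1.0


def sampling_seconds(steps, s_per_it=SECONDS_PER_ITERATION):
    """Estimated sampling time for one image, ignoring loading and decoding."""
    return steps * s_per_it


baseline = sampling_seconds(6)
for steps in (6, 10, 14):
    seconds = sampling_seconds(steps)
    extra_pct = (seconds / baseline - 1) * 100
    print(f"{steps} steps: ~{seconds:.1f} s per image ({extra_pct:+.0f}% vs 6 steps)")
```

Under these assumptions, going from 6 to 10 steps adds about two thirds to the sampling time, while 14 steps more than doubles it for no noticeable gain.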

Direct Model Comparisons:

Images compared: SD1.5 dreamshaper_8, SDXL realvisxlV40_v40Lightning, Pony ponyDiffusionV6XL_v6, SD3 stableDiffusion3SD3_sd3MediumInclT5XXL, Flux FusionDS_v0_Q8, Craiyon (for fun)

Conclusion:

Flux successfully adhered to the prompt. The base SD3 model was also fairly successful. SD1.5, SDXL, and Pony were incapable of adhering to the prompt.
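
The counts stated explicitly in the observation notes can be turned into simple adherence rates. The tally below includes only the cases where the notes give a number; SD1.5 is left out because no exact count was recorded, and the grouping labels are my own. With 5 to 15 images per entry, these rates are rough indications only.

```python
# (fully correct images, images counted), taken only from explicit statements in the notes.
TALLY = {
    "SDXL (3 checkpoints)": (0, 15),
    "Pony (3 checkpoints)": (0, 15),
    "SD3 base": (4, 5),
    "Flux Q2_K": (5, 5),
}

rates = {name: correct / total for name, (correct, total) in TALLY.items()}
for name, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{name}: {rate:.0%}")
```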

For Flux GGUF models, 6 to 10 steps at 1.0 CFG works best. A 1.0 CFG also takes ~43% less time per iteration than any other CFG I've tested.

If you have any suggestions/tips or want to see any specific comparisons, please feel free to comment. This study is now concluded.

Comparative Study: SD1.5, SDXL, SD3, Pony, Flux | Civitai (2024)