Z-Image-Turbo (VideoX-Fun)

This model is an implementation of Z-Image-Turbo based on the VideoX-Fun repository. It uses the Union ControlNet (alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union) for highly controllable image generation from several condition types (Canny, Depth, Pose, HED) and also supports custom LoRA weights.

✨ Features

Turbo Generation: High-quality image generation in fewer inference steps than standard diffusion models (default: 20 steps).
Union ControlNet: Supports multiple control modes in a single model:
- canny: Edge detection.
- depth: Depth map estimation.
- pose: Human pose estimation.
- hed: Soft edge detection.
Custom LoRA Support: Dynamically load LoRA weights (.safetensors) from a URL to stylize your generations.
Smart Resizing: Automatically adjusts output resolution to match the dimensions of your ControlNet input image.
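
The exact resizing rule is not documented here. The sketch below assumes the output size is taken from the ControlNet input image and snapped down to a multiple of 16, a common constraint for latent diffusion transformers. The multiple of 16 and the helper name `match_control_resolution` are illustrative assumptions, not this model's confirmed behavior.

```python
def match_control_resolution(width: int, height: int, multiple: int = 16) -> tuple[int, int]:
    """Snap the control image's size down to the nearest multiple of `multiple`.

    ASSUMPTION: the rounding multiple (16) is illustrative; the model's
    actual resizing rule is not stated in this README.
    """
    if width < multiple or height < multiple:
        raise ValueError("control image is smaller than the minimum tile size")
    return (width // multiple) * multiple, (height // multiple) * multiple


print(match_control_resolution(1023, 769))  # (1008, 768)
```

Rounding down rather than up avoids padding or upscaling the control image, so the reference structure is never stretched.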

🚀 How to use

Basic Parameters

prompt: The text description of the image you want to generate.
num_outputs: Number of images to generate (default: 1).
num_inference_steps: The number of denoising steps. Default is 20.
guidance_scale: The classifier-free guidance (CFG) scale. Turbo models are typically run with a value of 0; adjust as needed.
seed: Random seed for reproducibility. Leave blank or set to -1 for random.
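
The basic parameters can be collected into a single input dictionary before a run. This is a minimal sketch: `build_basic_input` is a hypothetical helper, the `guidance_scale` default of 0.0 is assumed from the advice above (the README does not state a default), and the seed range 0 to 2**31-1 is an assumption.

```python
import random
from typing import Optional


def build_basic_input(prompt: str,
                      num_outputs: int = 1,
                      num_inference_steps: int = 20,
                      guidance_scale: float = 0.0,
                      seed: Optional[int] = -1) -> dict:
    """Collect the basic parameters into an input dict (hypothetical helper).

    Defaults follow this README; guidance_scale=0.0 is an assumption based on
    the "Set to 0 for Turbo models" advice, not a documented default.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if num_outputs < 1 or num_inference_steps < 1:
        raise ValueError("num_outputs and num_inference_steps must be >= 1")
    # Blank or -1 asks the model for a random seed. Drawing it here instead
    # (assumed valid range: 0..2**31-1) lets the seed be logged and reused.
    if seed is None or seed == -1:
        seed = random.randint(0, 2**31 - 1)
    return {
        "prompt": prompt,
        "num_outputs": num_outputs,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "seed": seed,
    }


inp = build_basic_input("a red fox in fresh snow, golden hour", seed=42)
print(inp["seed"], inp["num_inference_steps"])  # 42 20
```

The resulting dict could be passed as the `input` argument of a client call such as the Replicate Python client's `replicate.run(...)`. The model identifier is not given in this README, so it is left out here.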

ControlNet Parameters

To guide the structure of the generated image:

controlnet_1: Select the control type (canny, depth, pose, hed, or none).
controlnet_1_image: URL or file upload of the image to use as the structural reference.
controlnet_1_end: Control strength (Control Context Scale). Default is 1.0. Lower values (e.g., 0.6 to 0.8) give the model more freedom to deviate from the reference structure.
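
The three ControlNet parameters depend on each other: an image is only needed when a control type other than `none` is selected. The sketch below checks that dependency. `build_controlnet_input` is a hypothetical helper, the example URL is a placeholder, and the model's accepted strength range is not documented, so only negative values are rejected.

```python
from typing import Optional

# Control types listed in this README.
CONTROL_TYPES = {"canny", "depth", "pose", "hed", "none"}


def build_controlnet_input(control_type: str,
                           image_url: Optional[str] = None,
                           strength: float = 1.0) -> dict:
    """Build the controlnet_1* inputs (hypothetical helper)."""
    control_type = control_type.lower()
    if control_type not in CONTROL_TYPES:
        raise ValueError(f"unknown control type: {control_type!r}")
    if control_type == "none":
        # No structural guidance: image and strength are not needed.
        return {"controlnet_1": "none"}
    if not image_url:
        raise ValueError("controlnet_1_image is required for this control type")
    if strength < 0:
        raise ValueError("controlnet_1_end (control strength) must be >= 0")
    return {
        "controlnet_1": control_type,
        "controlnet_1_image": image_url,
        # 1.0 is the default; 0.6-0.8 loosens adherence to the reference.
        "controlnet_1_end": strength,
    }


cn = build_controlnet_input("Canny", "https://example.com/edges.png", strength=0.7)
print(cn["controlnet_1"], cn["controlnet_1_end"])  # canny 0.7
```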

LoRA Parameters

To apply a specific style:

lora_weights: URL to a .safetensors (or .tar) file containing the LoRA weights.
lora_scale: Strength of the LoRA application (0.0 to 2.0). Default is 1.0.
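
In a standard LoRA, each adapted weight matrix W receives a low-rank update, W' = W + s · (alpha / rank) · B·A, where s plays the role of `lora_scale`. The sketch below assumes this model applies `lora_scale` that way. The README does not confirm it, and some implementations apply the update at runtime instead of merging it. The helper names are hypothetical, and plain lists stand in for tensors.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(weight, lora_down, lora_up, alpha, lora_scale=1.0):
    """Return W + lora_scale * (alpha / rank) * (up @ down).

    weight: out x in, lora_down (A): rank x in, lora_up (B): out x rank.
    """
    rank = len(lora_down)
    delta = matmul(lora_up, lora_down)
    factor = lora_scale * alpha / rank
    return [[w + factor * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0]]        # rank-1 "down" projection
B = [[1.0], [2.0]]      # rank-1 "up" projection
print(merge_lora(W, A, B, alpha=1.0, lora_scale=0.5))  # [[1.5, 0.0], [1.0, 1.0]]
```

With `lora_scale` at 0.0 the weights are unchanged. At 2.0 the update is twice its trained strength, which is why high values can over-stylize.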

🔧 Technical Details

Base Model: Tongyi-MAI/Z-Image-Turbo
Architecture: S3-DiT (Diffusion Transformer)
ControlNet: Uses a single unified ControlNet model that handles multiple condition types by projecting each one into the model's latent space.
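
The README calls `controlnet_1_end` the Control Context Scale but does not say where it is applied. The sketch below assumes a VACE-style design: a control branch produces a hint for some transformer layers, and each hint is multiplied by the context scale and added to that layer's hidden state. This is an assumption. `run_blocks`, `apply_control_hint`, and the toy `double` block are hypothetical names, and short lists stand in for hidden-state tensors.

```python
def apply_control_hint(hidden, hint, context_scale):
    """Add a scaled control hint to one layer's hidden state (toy vectors)."""
    if len(hidden) != len(hint):
        raise ValueError("hint and hidden state must have the same size")
    return [h + context_scale * c for h, c in zip(hidden, hint)]


def run_blocks(x, blocks, control_hints, context_scale=1.0):
    """Run blocks in order, injecting a hint after each layer that has one."""
    for i, block in enumerate(blocks):
        x = block(x)
        if i in control_hints:
            x = apply_control_hint(x, control_hints[i], context_scale)
    return x


def double(v):
    # Stand-in for a transformer block.
    return [2.0 * t for t in v]


hints = {0: [1.0, 0.0]}  # hint from the control branch for layer 0
print(run_blocks([1.0, 1.0], [double, double], hints, context_scale=0.5))  # [5.0, 4.0]
print(run_blocks([1.0, 1.0], [double, double], hints, context_scale=0.0))  # [4.0, 4.0]
```

Under this assumption, a scale of 0 disables the control signal entirely, and values between 0 and 1 weaken its pull toward the reference structure.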

🔗 Credits

Based on VideoX-Fun by Alibaba PAI & Tongyi-MAI.
Original weights: Hugging Face.
