benjyazoulay/z-image-turbo-lora-controlnet

Z-Image-Turbo with Union ControlNet (Canny, Depth, Pose, HED) and custom LoRA support.

Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Z-Image-Turbo (VideoX-Fun)

This model is an implementation of Z-Image-Turbo based on the VideoX-Fun repository. It uses the Union ControlNet (alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union) to steer image generation with a structural condition (Canny, Depth, Pose, or HED), and it supports custom LoRA weights.

✨ Features

  • Turbo Generation: High-quality image generation in a small number of inference steps (20 by default).
  • Union ControlNet: Supports multiple control modes in a single model:
    • canny: Edge detection.
    • depth: Depth map estimation.
    • pose: Human pose estimation.
    • hed: Soft edge detection.
  • Custom LoRA Support: Dynamically load LoRA weights (.safetensors) from a URL to stylize your generations.
  • Smart Resizing: Automatically adjusts output resolution to match the dimensions of your ControlNet input image.

🚀 How to use

Basic Parameters

  • prompt: The text description of the image you want to generate.
  • num_outputs: Number of images to generate (default: 1).
  • num_inference_steps: The number of denoising steps. Default is 20.
  • guidance_scale: The classifier-free guidance scale. Set to 0 for Turbo models or adjust as needed.
  • seed: Random seed for reproducibility. Leave blank or set to -1 to use a random seed.
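
As a rough illustration of how the basic parameters fit together, here is a minimal Python sketch. It assumes the official `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set. The helper names `build_basic_input` and `run_model` are hypothetical and not part of this model. The `guidance_scale` of 0 follows the suggestion above for Turbo models and is not a documented default. Depending on the client version, the model identifier may need an explicit `:<version>` suffix, which this README does not provide.

```python
from typing import Any, Dict, Optional

# Model identifier as shown on the model page. Some client versions may
# require an explicit ":<version>" suffix (not given in this README).
MODEL = "benjyazoulay/z-image-turbo-lora-controlnet"


def build_basic_input(
    prompt: str,
    num_outputs: int = 1,
    num_inference_steps: int = 20,
    guidance_scale: float = 0.0,  # assumption: 0 as suggested for Turbo models
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble the basic parameters into an input payload (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if num_outputs < 1 or num_inference_steps < 1:
        raise ValueError("num_outputs and num_inference_steps must be >= 1")
    payload: Dict[str, Any] = {
        "prompt": prompt,
        "num_outputs": num_outputs,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
    }
    # Leaving the seed out (or passing -1) requests a random seed.
    if seed is not None and seed != -1:
        payload["seed"] = seed
    return payload


def run_model(payload: Dict[str, Any]):
    """Send the payload to Replicate (network call, needs an API token)."""
    import replicate  # imported lazily so the helper above works without it

    return replicate.run(MODEL, input=payload)


if __name__ == "__main__":
    print(build_basic_input("a lighthouse at dusk, oil painting", seed=42))
```

Keeping payload construction separate from the network call makes it easy to inspect or log the exact inputs before spending GPU time.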

ControlNet Parameters

To guide the generation structure:

  • controlnet_1: Select the control type (canny, depth, pose, hed, or none).
  • controlnet_1_image: URL or file upload of the image to use as the structural reference.
  • controlnet_1_end: Control strength (Control Context Scale). Default is 1.0. Lower values (e.g., 0.6 to 0.8) give the model more freedom to deviate from the reference structure.
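
The ControlNet parameters above can be sketched as a small payload fragment. This is only an illustration: `controlnet_input` is a hypothetical helper, the example.com URL is a placeholder, and the client-side check simply mirrors the control types listed above. It is not how the model itself validates inputs.

```python
from typing import Any, Dict, Optional

# Control types listed in this README.
CONTROL_TYPES = {"canny", "depth", "pose", "hed", "none"}


def controlnet_input(
    control_type: str,
    image: Optional[str] = None,
    strength: float = 1.0,
) -> Dict[str, Any]:
    """Build the ControlNet part of the input payload (hypothetical helper)."""
    if control_type not in CONTROL_TYPES:
        raise ValueError(f"unknown control type: {control_type!r}")
    if control_type == "none":
        # No structural guidance, so no reference image is needed.
        return {"controlnet_1": "none"}
    if not image:
        raise ValueError("a reference image is required for this control type")
    return {
        "controlnet_1": control_type,
        "controlnet_1_image": image,
        "controlnet_1_end": strength,
    }


# Example: canny guidance at reduced strength, merged into a basic payload.
payload: Dict[str, Any] = {"prompt": "a modern house, golden hour"}
payload.update(
    controlnet_input("canny", "https://example.com/reference.png", strength=0.7)
)
print(payload)
```

Because of Smart Resizing, the output dimensions follow the reference image, so no width or height is set here.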

LoRA Parameters

To apply a specific style:

  • lora_weights: URL to a .safetensors (or .tar) file containing the LoRA weights.
  • lora_scale: Strength of the LoRA application (0.0 to 2.0). Default is 1.0.
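
A similar sketch for the LoRA parameters, again under stated assumptions: `lora_input` is a hypothetical helper, the URL is a placeholder, and the extension and range checks are a client-side convenience based on the values documented above. A hosting URL that does not end in the file extension would be rejected by this sketch even if the model accepts it.

```python
from typing import Any, Dict
from urllib.parse import urlparse

# File types named in this README for LoRA weights.
LORA_EXTENSIONS = (".safetensors", ".tar")


def lora_input(weights_url: str, scale: float = 1.0) -> Dict[str, Any]:
    """Build the LoRA part of the input payload (hypothetical helper)."""
    parsed = urlparse(weights_url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("lora_weights must be an http(s) URL")
    # Check the path only, so query strings do not interfere.
    if not parsed.path.endswith(LORA_EXTENSIONS):
        raise ValueError("expected a .safetensors or .tar file")
    if not 0.0 <= scale <= 2.0:
        raise ValueError("lora_scale must be between 0.0 and 2.0")
    return {"lora_weights": weights_url, "lora_scale": scale}


# Example: apply a style LoRA at slightly reduced strength.
payload: Dict[str, Any] = {"prompt": "portrait of a fox, watercolor style"}
payload.update(lora_input("https://example.com/style.safetensors", scale=0.8))
print(payload)
```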

🔧 Technical Details

  • Base Model: Tongyi-MAI/Z-Image-Turbo
  • Architecture: S3-DiT (Diffusion Transformer)
  • ControlNet: Uses a unified ControlNet model capable of handling multiple conditions by projecting them into the correct latent space.

🔗 Credits
