benjyazoulay/z-image-turbo-lora-controlnet

Z-Image-Turbo with Union ControlNet (Canny, Depth, Pose, HED) and custom LoRA support.

Z-Image-Turbo (VideoX-Fun)

This model is an implementation of Z-Image-Turbo based on the VideoX-Fun repository. It uses the Union ControlNet (alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union) to steer image generation with a structural condition (Canny, Depth, Pose, or HED), and it supports custom LoRA weights.

✨ Features

  • Turbo Generation: High-quality image generation in a small number of inference steps (typically 20).
  • Union ControlNet: Supports multiple control modes in a single model:
    • canny: Edge detection.
    • depth: Depth map estimation.
    • pose: Human pose estimation.
    • hed: Soft edge detection.
  • Custom LoRA Support: Dynamically load LoRA weights (.safetensors) from a URL to stylize your generations.
  • Smart Resizing: Automatically adjusts output resolution to match the dimensions of your ControlNet input image.
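
The README does not say how Smart Resizing picks the final size. The sketch below shows one plausible approach, in which the control image's dimensions are capped and then rounded down to a multiple the model can handle. The function name `snap_resolution`, the multiple of 16, and the 1536-pixel cap are all assumptions made for illustration, not values taken from this model.

```python
def snap_resolution(width, height, multiple=16, max_side=1536):
    """Fit (width, height) under max_side, then round each side down to `multiple`.

    `multiple` and `max_side` are illustrative assumptions, not documented values.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # Shrink proportionally (integer math) if the longest side exceeds the cap.
    longest = max(width, height)
    if longest > max_side:
        width = width * max_side // longest
        height = height * max_side // longest

    # Round down to the nearest multiple, never going below one multiple.
    new_w = max(multiple, width // multiple * multiple)
    new_h = max(multiple, height // multiple * multiple)
    return new_w, new_h


print(snap_resolution(1000, 750))   # small image: only rounding applies
print(snap_resolution(3000, 2000))  # large image: capped, then rounded
```

Rounding down rather than up keeps the output inside the bounds of the reference image, so the control map never needs padding.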

🚀 How to use

Basic Parameters

  • prompt: The text description of the image you want to generate.
  • num_outputs: Number of images to generate (default: 1).
  • num_inference_steps: The number of denoising steps. Default is 20.
  • guidance_scale: The classifier-free guidance scale. Set to 0 for Turbo models or adjust as needed.
  • seed: Random seed for reproducibility. Leave blank or set to -1 to use a random seed.
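
As a rough illustration, the basic parameters above could be assembled into an input payload and sent with the Replicate Python client. The helper names `build_basic_input` and `run_model` are hypothetical, `guidance_scale=0.0` is simply the value suggested above for Turbo models, and the model reference may need a `:version` suffix depending on the client version.

```python
MODEL = "benjyazoulay/z-image-turbo-lora-controlnet"


def build_basic_input(prompt, num_outputs=1, num_inference_steps=20,
                      guidance_scale=0.0, seed=None):
    """Build the input dict for the basic parameters (hypothetical helper)."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    payload = {
        "prompt": prompt,
        "num_outputs": num_outputs,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
    }
    # Omitting the seed (or passing -1) asks for a random seed.
    if seed is not None and seed != -1:
        payload["seed"] = seed
    return payload


def run_model(payload):
    """Send the payload to Replicate. Needs `pip install replicate` and
    the REPLICATE_API_TOKEN environment variable."""
    import replicate  # third-party client, imported lazily
    return replicate.run(MODEL, input=payload)


payload = build_basic_input("a lighthouse at dusk, oil painting", seed=42)
print(payload)
```

Calling `run_model(payload)` would then start a prediction. It is left out of the example so that the snippet runs without network access.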

ControlNet Parameters

To guide the generation structure:

  • controlnet_1: Select the control type (canny, depth, pose, hed, or none).
  • controlnet_1_image: URL or file upload of the image to use as the structural reference.
  • controlnet_1_end: Control strength (Control Context Scale). Default is 1.0. Lower values (e.g., 0.6 to 0.8) give the model more freedom to deviate from the reference structure.
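
The three ControlNet fields can be checked before submission with a small helper like the one below. This is a sketch: `controlnet_input` is a hypothetical name, the example URL is a placeholder, and the non-negative check on the strength is an assumption, since no valid range is documented here.

```python
CONTROL_TYPES = {"canny", "depth", "pose", "hed", "none"}


def controlnet_input(control_type, image_url=None, strength=1.0):
    """Return the ControlNet portion of the input dict (hypothetical helper)."""
    if control_type not in CONTROL_TYPES:
        raise ValueError(f"unknown control type: {control_type!r}")
    if control_type == "none":
        # No structural guidance: the image and strength are not needed.
        return {"controlnet_1": "none"}
    if not image_url:
        raise ValueError("a reference image is required for " + control_type)
    if strength < 0:
        # Assumption: negative strengths are not meaningful.
        raise ValueError("strength must be non-negative")
    return {
        "controlnet_1": control_type,
        "controlnet_1_image": image_url,
        "controlnet_1_end": strength,
    }


# Placeholder URL; a looser strength lets the model drift from the edges.
print(controlnet_input("canny", "https://example.com/reference.png", 0.7))
```

The returned dict can be merged into the rest of the model input with `dict.update`.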

LoRA Parameters

To apply a specific style:

  • lora_weights: URL to a .safetensors (or .tar) file containing the LoRA weights.
  • lora_scale: Strength of the LoRA application (0.0 to 2.0). Default is 1.0.
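
A client-side check of the LoRA fields might look like the following. The helper name `lora_input` and the example URL are hypothetical. The accepted extensions and the 0.0 to 2.0 range come from the parameter descriptions above.

```python
from urllib.parse import urlparse

LORA_EXTENSIONS = (".safetensors", ".tar")


def lora_input(weights_url, scale=1.0):
    """Return the LoRA portion of the input dict (hypothetical helper)."""
    parsed = urlparse(weights_url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("lora_weights must be an http(s) URL")
    # Check the extension on the path only, ignoring any query string.
    if not parsed.path.lower().endswith(LORA_EXTENSIONS):
        raise ValueError("lora_weights must point to a .safetensors or .tar file")
    if not 0.0 <= scale <= 2.0:
        raise ValueError("lora_scale must be between 0.0 and 2.0")
    return {"lora_weights": weights_url, "lora_scale": scale}


# Placeholder URL for illustration only.
print(lora_input("https://example.com/style.safetensors?download=true", 0.8))
```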

🔧 Technical Details

  • Base Model: Tongyi-MAI/Z-Image-Turbo
  • Architecture: S3-DiT (Diffusion Transformer)
  • ControlNet: Uses a unified ControlNet model capable of handling multiple conditions by projecting them into the correct latent space.

🔗 Credits
