zsxkib/seedvr2

🔥 SeedVR2: one-step video & image restoration with 3B/7B hot‑swap and optional color fix 🎬✨

SeedVR2 Cog (3B & 7B) 🎥✨

Overview

SeedVR2 Cog packages ByteDance-Seed’s one-step diffusion transformer for both videos and stills. This build hot-swaps between the 3B and 7B checkpoints on a single GPU, adds CDN-friendly weight caching, keeps source audio when returning MP4s, and now exposes an optional wavelet-based colour correction (apply_color_fix) for users who want the same hue preservation as the official Gradio demo.

Give the original research team some love:

  • Project page: SeedVR2

What’s included

  • Dual checkpoints: 3B loads by default; set model_variant="7b" to bring in the larger model when you have the VRAM.

  • Optional colour fix: Flip apply_color_fix=true to blend the model’s high-frequency detail with the original colour field.

  • Audio passthrough: MP4 outputs inherit the source audio stream when ffmpeg is available (Replicate’s image already includes it).

  • Deterministic caching: All large assets download once via Replicate’s CDN with pget, so versioned builds stay reproducible.
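
To make the colour-fix option concrete, here is a simplified sketch of the idea behind wavelet colour correction: keep the model's high-frequency detail but take the low-frequency (colour) field from the input. This is an illustration only; it uses a plain box blur as the low-pass stand-in, and the function name `color_fix` is hypothetical, not the build's actual implementation.

```python
import numpy as np

def box_blur(img, radius):
    # separable box blur used here as a simple low-pass stand-in for the
    # wavelet decomposition; img is float in [0, 1], shape (H, W, C)
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 0, img)
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, out)
    return out

def color_fix(restored, source, radius=8):
    # high frequencies (detail) from the model output,
    # low frequencies (colour/hue) from the original input
    high = restored - box_blur(restored, radius)
    low = box_blur(source, radius)
    return np.clip(high + low, 0.0, 1.0)
```

When `restored` and `source` are the same image, this reconstructs the input exactly, which is why the fix preserves hues without dulling detail.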

Inputs

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| media | file / URL | Video (.mp4, .mov) or image (.png, .jpg, .webp). | (required) |
| model_variant | string | "3b" or "7b". 7B provides higher fidelity if your GPU can keep it resident. | "3b" |
| sample_steps | int | Diffusion steps (1 = one-pass mode as in the paper). | 1 |
| cfg_scale | float | Guidance strength; >1 sharpens, <1 softens. | 1.0 |
| apply_color_fix | bool | Wavelet colour reconstruction that aligns hues with the input. | false |
| sp_size | int | Leave at 1 for single-GPU runs; higher values only adjust padding. | 1 |
| fps | int | Output frame rate for videos. | 24 |
| seed | int? | Optional deterministic seed. | random |
| output_format | string | Image outputs: "png", "webp", "jpg". | "webp" |
| output_quality | int | JPEG/WebP quality when using lossy formats. | 90 |
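
The inputs above map directly onto a payload for the Replicate Python client. A minimal sketch follows; the `build_inputs`/`restore` helpers and `"clip.mp4"` are illustrative, and in production you would pin a specific version hash rather than the bare model slug.

```python
def build_inputs(media_path, variant="3b", steps=1, cfg=1.0,
                 color_fix=False, fps=24):
    """Assemble an input payload matching the inputs table above."""
    if variant not in ("3b", "7b"):
        raise ValueError("model_variant must be '3b' or '7b'")
    return {
        "media": open(media_path, "rb"),
        "model_variant": variant,
        "sample_steps": steps,
        "cfg_scale": cfg,
        "apply_color_fix": color_fix,
        "fps": fps,
    }

def restore(media_path, **overrides):
    """Run the model on Replicate (needs network + REPLICATE_API_TOKEN)."""
    import replicate  # pip install replicate
    return replicate.run("zsxkib/seedvr2", input=build_inputs(media_path, **overrides))

# example: restore("clip.mp4", variant="7b", color_fix=True)
```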

Tips

  • GPU sizing: 3B fits comfortably on 80 GB cards (A100/H100 80G). The auto dual-load feature preloads both checkpoints only when VRAM ≥120 GB (e.g., H200). Otherwise it stages them between GPU and CPU memory.

  • Colour fix: Leave it off for the legacy look; turn it on to keep input hues on skin tones and skies—especially when the model is aggressively sharpening.

  • Long clips: SeedVR2 was trained up to 121 frames. We automatically pad/truncate beyond that so you don’t have to pre-chunk.

  • Audio: MP4 outputs preserve the original soundtrack via an audio copy step, so you keep sync without re-encoding.
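
The audio passthrough in the last tip amounts to an ffmpeg remux: take the video stream from the restored file and copy the original audio stream without re-encoding. A sketch of that kind of command (not this build's exact invocation; `build_mux_cmd` is a hypothetical helper):

```python
import subprocess

def build_mux_cmd(restored_video, source_video, out_path):
    """ffmpeg command that keeps restored video but copies source audio."""
    return [
        "ffmpeg", "-y",
        "-i", restored_video,   # restored frames (no audio)
        "-i", source_video,     # original clip, used as the audio donor
        "-map", "0:v:0",        # video stream from the restored file
        "-map", "1:a:0?",       # audio stream from the source, if present
        "-c", "copy",           # stream copy: no re-encode, sync preserved
        out_path,
    ]

def mux_audio(restored_video, source_video, out_path):
    subprocess.run(build_mux_cmd(restored_video, source_video, out_path),
                   check=True)
```

The `?` on the audio map makes the command succeed even for silent inputs.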

Limitations

  • Heavy motion blur or extreme low light can still stump the model.

  • Over-sharpening can occur on already clean footage—turn cfg_scale down or keep colour fix off if it feels too crunchy.

  • This build is tuned for single-GPU inference; multi-GPU sequence parallel isn’t enabled.

Credits & License

All credit for SeedVR2 goes to the ByteDance-Seed research team; I just made it behave nicely on Replicate.


⭐ Star the repo on GitHub!
🐦 Follow me on X/Twitter: @zsakib_
💻 More projects: github.com/zsxkib
