zsxkib/seedvr2

🔥 SeedVR2: one-step video & image restoration with 3B/7B hot‑swap and optional color fix 🎬✨

SeedVR2 Cog (3B & 7B) 🎥✨

Overview

SeedVR2 Cog packages ByteDance-Seed’s one-step diffusion transformer for both videos and stills. This build hot-swaps between the 3B and 7B checkpoints on a single GPU, adds CDN-friendly weight caching, keeps source audio when returning MP4s, and now exposes an optional wavelet-based colour correction (apply_color_fix) for users who want the same hue preservation as the official Gradio demo.

Give the original research team some love:

  • Project page: SeedVR2

What’s included

  • Dual checkpoints: 3B loads by default; set model_variant="7b" to bring in the larger model when you have the VRAM.

  • Optional colour fix: Flip apply_color_fix=true to blend the model’s high-frequency detail with the original colour field.

  • Audio passthrough: MP4 outputs inherit the source audio stream when ffmpeg is available (Replicate’s image already includes it).

  • Deterministic caching: All large assets download once via Replicate’s CDN with pget, so versioned builds stay reproducible.
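
To make the colour-fix option concrete, here is a simplified sketch of the idea behind wavelet colour correction: keep the model's high-frequency detail but take the low-frequency (colour) field from the input. This is an illustration only; it uses a plain box blur as the low-pass stand-in, and the function name `color_fix` is hypothetical, not the build's actual implementation.

```python
import numpy as np

def box_blur(img, radius):
    # separable box blur used here as a simple low-pass stand-in for the
    # wavelet decomposition; img is float in [0, 1], shape (H, W, C)
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 0, img)
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, out)
    return out

def color_fix(restored, source, radius=8):
    # high frequencies (detail) from the model output,
    # low frequencies (colour/hue) from the original input
    high = restored - box_blur(restored, radius)
    low = box_blur(source, radius)
    return np.clip(high + low, 0.0, 1.0)
```

When `restored` and `source` are the same image, this reconstructs the input exactly, which is why the fix preserves hues without dulling detail.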

Inputs

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| media | file / URL | Video (.mp4, .mov) or image (.png, .jpg, .webp). | (required) |
| model_variant | string | "3b" or "7b". 7B provides higher fidelity if your GPU can keep it resident. | "3b" |
| sample_steps | int | Diffusion steps (1 = one-pass mode as in the paper). | 1 |
| cfg_scale | float | Guidance strength; >1 sharpens, <1 softens. | 1.0 |
| apply_color_fix | bool | Wavelet colour reconstruction that aligns hues with the input. | false |
| sp_size | int | Leave at 1 for single-GPU runs; higher values only adjust padding. | 1 |
| fps | int | Output frame rate for videos. | 24 |
| seed | int? | Optional deterministic seed. | random |
| output_format | string | Image outputs: "png", "webp", "jpg". | "webp" |
| output_quality | int | JPEG/WebP quality when using lossy formats. | 90 |
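
The inputs above map directly onto a payload for the Replicate Python client. A minimal sketch follows; the `build_inputs`/`restore` helpers and `"clip.mp4"` are illustrative, and in production you would pin a specific version hash rather than the bare model slug.

```python
def build_inputs(media_path, variant="3b", steps=1, cfg=1.0,
                 color_fix=False, fps=24):
    """Assemble an input payload matching the inputs table above."""
    if variant not in ("3b", "7b"):
        raise ValueError("model_variant must be '3b' or '7b'")
    return {
        "media": open(media_path, "rb"),
        "model_variant": variant,
        "sample_steps": steps,
        "cfg_scale": cfg,
        "apply_color_fix": color_fix,
        "fps": fps,
    }

def restore(media_path, **overrides):
    """Run the model on Replicate (needs network + REPLICATE_API_TOKEN)."""
    import replicate  # pip install replicate
    return replicate.run("zsxkib/seedvr2", input=build_inputs(media_path, **overrides))

# example: restore("clip.mp4", variant="7b", color_fix=True)
```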

Tips

  • GPU sizing: 3B fits comfortably on 80 GB cards (A100/H100 80G). The auto dual-load feature preloads both checkpoints only when VRAM ≥120 GB (e.g., H200). Otherwise it stages them between GPU and CPU memory.

  • Colour fix: Leave it off for the legacy look; turn it on to keep input hues on skin tones and skies—especially when the model is aggressively sharpening.

  • Long clips: SeedVR2 was trained up to 121 frames. We automatically pad/truncate beyond that so you don’t have to pre-chunk.

  • Audio: MP4 outputs preserve the original soundtrack via an audio copy step, so you keep sync without re-encoding.
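
The audio passthrough in the last tip amounts to an ffmpeg remux: take the video stream from the restored file and copy the original audio stream without re-encoding. A sketch of that kind of command (not this build's exact invocation; `build_mux_cmd` is a hypothetical helper):

```python
import subprocess

def build_mux_cmd(restored_video, source_video, out_path):
    """ffmpeg command that keeps restored video but copies source audio."""
    return [
        "ffmpeg", "-y",
        "-i", restored_video,   # restored frames (no audio)
        "-i", source_video,     # original clip, used as the audio donor
        "-map", "0:v:0",        # video stream from the restored file
        "-map", "1:a:0?",       # audio stream from the source, if present
        "-c", "copy",           # stream copy: no re-encode, sync preserved
        out_path,
    ]

def mux_audio(restored_video, source_video, out_path):
    subprocess.run(build_mux_cmd(restored_video, source_video, out_path),
                   check=True)
```

The `?` on the audio map makes the command succeed even for silent inputs.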

Limitations

  • Heavy motion blur or extreme low light can still stump the model.

  • Over-sharpening can occur on already clean footage—turn cfg_scale down or keep colour fix off if it feels too crunchy.

  • This build is tuned for single-GPU inference; multi-GPU sequence parallel isn’t enabled.

Credits & License

All credit for SeedVR2 goes to the ByteDance-Seed research team; I just made it behave nicely on Replicate.


⭐ Star the repo on GitHub!
🐦 Follow me on X/Twitter: @zsakib_
💻 More projects: github.com/zsxkib
