zsxkib/samurai

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory


Run time and cost

This model costs approximately $0.021 to run on Replicate, or about 47 runs per $1, though the actual cost depends on your inputs. It is also open source, and you can run it on your own computer with Docker.

This model runs on NVIDIA A100 (80GB) GPU hardware. Predictions typically complete within 15 seconds, although prediction time varies significantly with the inputs.

Readme

SAMURAI Object Tracker

An easy-to-use object tracker for videos, built on SAM 2 (Segment Anything Model 2). Mark the object you want to track with a box in the first frame, and the model follows that object through the rest of the video.

How It Works

You provide:
- A video file or a folder of frames
- The starting position (x, y coordinates) and size (width, height) of the object you want to track, i.e. a bounding box in the first frame
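The box prompt is just four numbers. Selection tools often give you two corners instead, so the sketch below converts a corner pair into (x, y, width, height) and clamps it to the frame. It assumes (x, y) is the top-left corner in pixels with the origin at the top-left of the frame; the readme does not say whether x, y is the corner or the center. `TrackBox` and its methods are hypothetical helpers, not part of the model's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackBox:
    """Hypothetical first-frame box: top-left (x, y) plus (width, height).

    Assumption: pixel units, origin at the top-left corner of the frame.
    """
    x: int
    y: int
    width: int
    height: int

    @classmethod
    def from_corners(cls, x1: int, y1: int, x2: int, y2: int) -> "TrackBox":
        # Accept the corners in any order, e.g. from a click-and-drag selection.
        left, right = sorted((x1, x2))
        top, bottom = sorted((y1, y2))
        return cls(left, top, right - left, bottom - top)

    def clamped(self, frame_width: int, frame_height: int) -> "TrackBox":
        # Trim any part of the box that falls outside the frame.
        left = min(max(self.x, 0), frame_width)
        top = min(max(self.y, 0), frame_height)
        right = min(max(self.x + self.width, 0), frame_width)
        bottom = min(max(self.y + self.height, 0), frame_height)
        return TrackBox(left, top, right - left, bottom - top)

    def is_valid(self) -> bool:
        # A usable prompt needs a non-empty area.
        return self.width > 0 and self.height > 0


if __name__ == "__main__":
    box = TrackBox.from_corners(300, 220, 120, 80).clamped(1280, 720)
    print(box, box.is_valid())
```

Clamping before submitting the prompt avoids sending a box that extends past the frame edge, which is an easy mistake when the selection was made on a resized preview.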

The model gives you:
- A video with the tracked object highlighted in red
- Frame-by-frame masks in COCO run-length encoding (RLE), a compact, standard format for storing binary masks

Output Format

The tracking data is a dictionary that maps each frame number to a list of tracked objects:

{
    frame_number: [{
        "size": [height, width],     # size of the video frame
        "counts": "encoded_string",  # mask data in COCO RLE format
        "object_id": 0               # ID of the tracked object
    }]
}
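To use the masks you need to decode them. The sketch below is a pure-Python illustration, assuming the `counts` string uses the compressed, character-encoded RLE produced by pycocotools (the readme only says "COCO RLE"); if pycocotools is installed, `pycocotools.mask.decode` does the same job and returns a NumPy array. It also assumes that, if the output arrives as JSON, frame numbers are string keys. The function names are hypothetical helpers, not part of the model's API.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]


def counts_from_string(s: str) -> List[int]:
    """Decode a compressed COCO RLE counts string into run lengths.

    Follows the pycocotools scheme: each character carries 5 payload bits
    (offset by 48), bit 0x20 means another character follows, bit 0x10 of
    the last character is the sign bit, and every count after the third is
    stored as a delta from the count two positions earlier.
    """
    counts: List[int] = []
    p = 0
    while p < len(s):
        x, k, more = 0, 0, True
        while more:
            c = ord(s[p]) - 48
            x |= (c & 0x1F) << (5 * k)
            more = bool(c & 0x20)
            p += 1
            k += 1
            if not more and (c & 0x10):
                x |= -1 << (5 * k)  # sign-extend a negative delta
        if len(counts) > 2:
            x += counts[-2]  # undo the delta coding
        counts.append(x)
    return counts


def decode_rle(rle: Dict) -> List[List[int]]:
    """Turn {"size": [h, w], "counts": str} into h rows of w 0/1 values."""
    h, w = rle["size"]
    flat: List[int] = []
    value = 0  # runs alternate, starting with background (0)
    for run in counts_from_string(rle["counts"]):
        flat.extend([value] * run)
        value ^= 1
    # COCO RLE walks the mask column by column (Fortran order).
    return [[flat[col * h + row] for col in range(w)] for row in range(h)]


def mask_bbox(mask: List[List[int]]) -> Optional[Box]:
    """Return (x, y, width, height) of the foreground, or None if empty."""
    rows = [r for r, line in enumerate(mask) if any(line)]
    cols = [c for c in range(len(mask[0])) if any(line[c] for line in mask)]
    if not rows:
        return None
    return cols[0], rows[0], cols[-1] - cols[0] + 1, rows[-1] - rows[0] + 1


def boxes_per_frame(tracking: Dict) -> Dict[int, Optional[Box]]:
    """Map each frame number to the box of its first tracked object."""
    result: Dict[int, Optional[Box]] = {}
    for frame, objects in tracking.items():
        # JSON object keys are strings, so normalize them to int.
        result[int(frame)] = mask_bbox(decode_rle(objects[0])) if objects else None
    return result
```

Converting masks back to boxes is handy for comparing the tracker against box-annotated ground truth; keep the decoded masks instead if you need pixel-accurate results.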

Credits

This model is powered by:
- SAMURAI by Yang et al. from the University of Washington’s Information Processing Lab
- SAM 2 (Segment Anything Model 2) by Meta FAIR
- Original paper: “SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory”

License

Apache-2.0
