infiniteTalk/video-to-vedio creates new videos by combining a silent input video with an audio track. It produces precise lip-sync while aligning head, facial, and body movements with the audio. With optional masks and prompts, users can control which areas move and how the scene is presented. The model also preserves the subject's visual identity for natural, consistent results. A ready-to-use REST inference API is available, with no cold starts and affordable pricing.
Usage steps:
1. Upload an audio file.
2. Upload a video to use as the base.
3. Upload a mask image to control which areas can move (optional).
4. Write a prompt to guide style, pose, or expression (optional).
5. Select the output resolution (480p or 720p).
6. For repeatable results, set a seed before submitting.
7. After submission, use the returned ID to retrieve the results via the GET endpoint.
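The steps above can be sketched as a submit-then-poll client. This is a minimal illustration only: the base URL, the `/predictions` paths, the JSON field names (`audio`, `video`, `mask_image`, `prompt`, `resolution`, `seed`), the `id` and `status` response fields, and the bearer-token header are all assumptions made for the example, not the documented API. It also assumes the uploaded files are referenced by URL. Check the provider's API reference for the real names.

```python
import json
import time
import urllib.request

# Hypothetical base URL; replace with the provider's documented endpoint.
BASE_URL = "https://api.example.com/v1"


def build_payload(audio_url, video_url, resolution="480p",
                  mask_image_url=None, prompt=None, seed=None):
    """Assemble the request body. All field names here are assumed."""
    if resolution not in ("480p", "720p"):
        raise ValueError("resolution must be '480p' or '720p'")
    payload = {"audio": audio_url, "video": video_url, "resolution": resolution}
    # Optional inputs are only sent when the caller provides them.
    if mask_image_url is not None:
        payload["mask_image"] = mask_image_url
    if prompt is not None:
        payload["prompt"] = prompt
    if seed is not None:
        payload["seed"] = seed
    return payload


def http_json(method, url, api_key, body=None):
    """Send a JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers={
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit(payload, api_key, request=http_json):
    """POST the job and return its ID (assumes the reply has an 'id' field)."""
    return request("POST", f"{BASE_URL}/predictions", api_key, payload)["id"]


def wait_for_result(job_id, api_key, request=http_json,
                    interval=2.0, max_polls=300):
    """Poll the GET endpoint until the job reports an assumed terminal status."""
    for _ in range(max_polls):
        job = request("GET", f"{BASE_URL}/predictions/{job_id}", api_key)
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

A typical call would be `job_id = submit(build_payload(audio_url, video_url, resolution="720p", seed=42), api_key)` followed by `wait_for_result(job_id, api_key)`. The `request` parameter exists so the HTTP layer can be swapped out, for example with a stub in tests.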
Price:
480p: 0.15 PTC per 5 seconds; maximum length 10 minutes
720p: 0.30 PTC per 5 seconds; maximum length 10 minutes
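To make the pricing concrete, here is a small cost estimator built from the rates listed above. The source does not say how partial 5-second blocks are billed, so rounding up to the next whole block is an assumption of this sketch, as is the helper name `estimate_cost`.

```python
import math
from decimal import Decimal

# Rates taken from the price list: PTC per 5-second block.
PRICE_PER_BLOCK = {"480p": Decimal("0.15"), "720p": Decimal("0.30")}
BLOCK_SECONDS = 5
MAX_SECONDS = 10 * 60  # maximum length of 10 minutes


def estimate_cost(duration_seconds, resolution):
    """Estimate the price in PTC for a clip of the given length."""
    if resolution not in PRICE_PER_BLOCK:
        raise ValueError("resolution must be '480p' or '720p'")
    if not 0 < duration_seconds <= MAX_SECONDS:
        raise ValueError("duration must be between 0 and 600 seconds")
    # Assumption: a partial block is billed as a full block.
    blocks = math.ceil(duration_seconds / BLOCK_SECONDS)
    return PRICE_PER_BLOCK[resolution] * blocks
```

Under that rounding assumption, a 12-second clip at 480p spans three blocks and costs 0.45 PTC, and a full 10-minute clip at 720p spans 120 blocks and costs 36 PTC.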