vc (Audio and video caption generation)
POST
/doubao/vc/submit
Official Documentation: https://www.volcengine.com/docs/6561/80909
Request
Query Params
words_per_line
string
optional
max_lines
string
optional
use_itn
string
optional
If enabled (True), Chinese numerals in the recognition results will be automatically converted to Arabic numerals.
language
string
optional
caption_type
string
optional
You can choose "speech" (recognizes only the speech parts) or "singing" (recognizes only the singing parts).
use_punc
string
optional
If set to True, punctuation marks will be added to the recognition results.
This takes effect only when caption_type = speech.
use_ddc
string
optional
If set to True, silent sentences with empty text will be added to the returned utterances, and their "event" attribute will be marked as "silent". Additionally, words that require smoothing may be annotated in the "words" field, for example:
"extra": { "smoothed": "repeat" }
The value of "smoothed" can be "repeat" (repeated words) or "filler" (filler words).
boosting_table_id
string
optional
Pass either the ID (id) or the name (name); only one is required. You also need to pass asr_appid (which should be the same as the appid value).
boosting_table_name
string
optional
asr_appid
string
optional
Should be the same as the appid value.
with_speaker_info
string
optional
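The use_ddc annotations described above could be consumed roughly as follows. This is a minimal sketch, not the service's documented response schema: only the names utterances, event, words, extra, and smoothed come from the parameter descriptions; the list-of-dicts layout, the text key, and the sample payload are assumptions made for illustration.

```python
# Hypothetical payload. Only "utterances", "event", "words", "extra" and
# "smoothed" are named in the parameter descriptions; the "text" key and
# the overall nesting are assumptions.
SAMPLE_RESULT = {
    "utterances": [
        {"text": "", "event": "silent", "words": []},
        {
            "text": "um I I agree",
            "words": [
                {"text": "um", "extra": {"smoothed": "filler"}},
                {"text": "I", "extra": {"smoothed": "repeat"}},
                {"text": "I"},
                {"text": "agree"},
            ],
        },
    ]
}


def clean_utterances(result: dict) -> list:
    """Drop silent utterances and any word flagged for smoothing."""
    cleaned = []
    for utterance in result.get("utterances", []):
        if utterance.get("event") == "silent":
            continue  # silent sentence with empty text (added by use_ddc)
        kept = [
            word["text"]
            for word in utterance.get("words", [])
            if "smoothed" not in word.get("extra", {})
        ]
        cleaned.append(" ".join(kept))
    return cleaned


print(clean_utterances(SAMPLE_RESULT))
```

Filtering on the presence of "smoothed" rather than on a specific value treats both "repeat" and "filler" words the same way; a caller that wants to keep repeats could compare the value instead.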
Header Params
Authorization
string
optional
Example:
Bearer {{YOUR_API_KEY}}
Body Params application/json
url
string
required
Example
{
"url": "https://file.302.ai/gpt/imgs/20241204/361bca5886e844dfac39fb861ea3f3ac.mp3"
}
Request samples
Shell
curl --location --request POST 'https://api.302.ai/doubao/vc/submit?words_per_line&max_lines&use_itn&language&caption_type&use_punc&use_ddc&boosting_table_id&boosting_table_name&asr_appid&with_speaker_info' \
--header 'Authorization: Bearer {{YOUR_API_KEY}}' \
--header 'Content-Type: application/json' \
--data-raw '{
"url":"https://file.302.ai/gpt/imgs/20241204/361bca5886e844dfac39fb861ea3f3ac.mp3"
}'
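The curl request above could be assembled in Python along these lines. This is a sketch using only the standard library; the helper name build_submit_request and the choice to omit unset query parameters are assumptions, and the option values ("speech", "True") are taken from the parameter descriptions. The request is built but not sent, so the snippet runs without network access or a valid key.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.302.ai/doubao/vc/submit"


def build_submit_request(api_key, media_url, **options):
    """Build (but do not send) the POST request for /doubao/vc/submit."""
    # Include only the query parameters that were actually set.
    query = urllib.parse.urlencode(
        {name: value for name, value in options.items() if value is not None}
    )
    url = f"{BASE_URL}?{query}" if query else BASE_URL
    body = json.dumps({"url": media_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_submit_request(
    "YOUR_API_KEY",  # placeholder, replace with a real key
    "https://file.302.ai/gpt/imgs/20241204/361bca5886e844dfac39fb861ea3f3ac.mp3",
    caption_type="speech",
    use_punc="True",
)
print(request.full_url)
```

To submit the job, pass the returned object to urllib.request.urlopen and decode the JSON response body.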
Responses
🟢 200 Success
application/json
Body
object {0}
Example
{}