vc (Audio and video caption generation)

POST

/doubao/vc/submit

Generates captions for audio and video files using Doubao.
Official Documentation: https://www.volcengine.com/docs/6561/80909

Price: 0.01 PTC/min

Request

Query Params

words_per_line

string

optional

Maximum number of characters displayed per line. Default: 46.

max_lines

string

optional

Maximum number of lines displayed per screen. Default: 1.

use_itn

string

optional

Whether to enable number conversion. Default: False (disabled).
If set to True, Chinese numerals in the recognition results are automatically converted to Arabic numerals.

language

string

optional

Subtitle language type

caption_type

string

optional

Subtitle recognition type. Default: "auto" (recognizes both speech and singing).
Other values: "speech" (recognizes only speech) or "singing" (recognizes only singing).

use_punc

string

optional

Whether to add punctuation. Default: False.
If set to True, punctuation marks are added to the recognition results.
Takes effect only when caption_type is "speech".
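
The dependency between use_punc and caption_type can be checked on the client before a request is sent. The following is a minimal sketch, not part of the official API: the helper name `caption_options` is hypothetical, and sending the flag as the string "True" is an assumption based on the True/False values described above.

```python
CAPTION_TYPES = {"auto", "speech", "singing"}


def caption_options(caption_type="auto", use_punc=False):
    """Build the caption_type/use_punc query parameters (hypothetical helper)."""
    if caption_type not in CAPTION_TYPES:
        raise ValueError(f"unknown caption_type: {caption_type!r}")
    params = {"caption_type": caption_type}
    if use_punc:
        # The docs state that use_punc only takes effect for "speech".
        if caption_type != "speech":
            raise ValueError("use_punc requires caption_type='speech'")
        # Assumption: booleans are sent as the strings "True"/"False".
        params["use_punc"] = "True"
    return params


print(caption_options("speech", use_punc=True))
```

Raising an error here is a design choice of the sketch. The API itself may simply ignore use_punc for other caption types.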

use_ddc

string

optional

Whether to annotate words that need smoothing (repeated and filler words). Default: False.
If set to True, silent sentences with empty text are added to the returned utterances, with their "event" attribute set to "silent". Words that require smoothing may also be annotated in the "words" field, for example:
"extra": { "smoothed": "repeat" }.
The value of "smoothed" is either "repeat" (repeated word) or "filler" (filler word).
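
To illustrate how these annotations might be consumed, here is a minimal sketch that counts silent utterances and collects flagged words. The exact response shape is an assumption: it supposes that "event" is nested under an "attribute" object on each utterance and that each word carries a "text" key. The function name `summarize_smoothing` and the sample data are hypothetical.

```python
def summarize_smoothing(utterances):
    """Count silent utterances and collect words flagged for smoothing."""
    silent = 0
    flagged = []
    for utt in utterances:
        # Assumed shape: {"attribute": {"event": "silent"}} on silent utterances.
        if utt.get("attribute", {}).get("event") == "silent":
            silent += 1
            continue
        for word in utt.get("words", []):
            kind = word.get("extra", {}).get("smoothed")
            if kind in ("repeat", "filler"):
                # Assumed key: "text" holds the word itself.
                flagged.append((word.get("text", ""), kind))
    return {"silent_utterances": silent, "flagged_words": flagged}


# Hand-written sample data, not a real API response.
sample = [
    {"text": "", "attribute": {"event": "silent"}, "words": []},
    {
        "text": "um I I think so",
        "words": [
            {"text": "um", "extra": {"smoothed": "filler"}},
            {"text": "I", "extra": {"smoothed": "repeat"}},
            {"text": "I"},
            {"text": "think"},
            {"text": "so"},
        ],
    },
]
print(summarize_smoothing(sample))
```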

boosting_table_id

string

optional

ID of the hotword table on the self-learning platform. Provide either the ID (boosting_table_id) or the name (boosting_table_name); only one is needed.
You must also pass asr_appid, with the same value as appid.

boosting_table_name

string

optional

File name of the hotword table on the self-learning platform.

asr_appid

string

optional

The APPID passed to ASR. Required when using hotwords from the self-learning platform; must be the same as the appid value.
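
The three hotword parameters depend on one another, so it can help to assemble them in one place. The sketch below is illustrative only: the helper name `hotword_params` is hypothetical, and preferring the ID when both an ID and a name are supplied is a choice made for this example, not documented API behavior.

```python
def hotword_params(asr_appid, table_id=None, table_name=None):
    """Build the hotword query parameters (hypothetical helper)."""
    if table_id is None and table_name is None:
        raise ValueError("provide boosting_table_id or boosting_table_name")
    # asr_appid is required whenever a hotword table is used.
    params = {"asr_appid": asr_appid}
    if table_id is not None:
        # Sketch choice: when both are given, send only the ID.
        params["boosting_table_id"] = table_id
    else:
        params["boosting_table_name"] = table_name
    return params


print(hotword_params("my-appid", table_name="my_hotwords"))
```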

with_speaker_info

string

optional

Whether to return speaker information. Default: False. If set to True, the attribute field of each utterance and word includes speaker information, for example 'attribute': {'speaker': '1'}.
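
As a sketch of how the speaker attribute might be used, the snippet below groups utterance text by speaker. It assumes each utterance is a dictionary with a "text" key alongside the 'attribute' field shown above. The function name `text_by_speaker` and the sample data are hypothetical.

```python
from collections import defaultdict


def text_by_speaker(utterances):
    """Group utterance text by the speaker label in the attribute field."""
    grouped = defaultdict(list)
    for utt in utterances:
        # Utterances without speaker information fall under "unknown".
        speaker = utt.get("attribute", {}).get("speaker", "unknown")
        grouped[speaker].append(utt.get("text", ""))
    return dict(grouped)


# Hand-written sample data, not a real API response.
utterances = [
    {"text": "Hello.", "attribute": {"speaker": "1"}},
    {"text": "Hi there.", "attribute": {"speaker": "2"}},
    {"text": "How are you?", "attribute": {"speaker": "1"}},
]
print(text_by_speaker(utterances))
```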

Header Params

Authorization

string

optional

API Key

Example:

Bearer {{YOUR_API_KEY}}

Body Params application/json

url

string

required

File URL

Example

{
    "url":"https://file.302.ai/gpt/imgs/20241204/361bca5886e844dfac39fb861ea3f3ac.mp3"
}

Request samples

Shell


curl --location --request POST 'https://api.302.ai/doubao/vc/submit?words_per_line&max_lines&use_itn&language&caption_type&use_punc&use_ddc&boosting_table_id&boosting_table_name&asr_appid&with_speaker_info' \
--header 'Authorization: Bearer {{YOUR_API_KEY}}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "url":"https://file.302.ai/gpt/imgs/20241204/361bca5886e844dfac39fb861ea3f3ac.mp3"
}'
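
For comparison with the curl sample, here is a minimal Python sketch that builds the same request using only the standard library. It constructs the request without sending it. The helper name `build_submit_request` is hypothetical, and serializing boolean options as "True"/"False" is an assumption based on the parameter descriptions above.

```python
import json
import urllib.parse
import urllib.request

SUBMIT_URL = "https://api.302.ai/doubao/vc/submit"


def build_submit_request(api_key, file_url, **options):
    """Build (but do not send) a POST request for /doubao/vc/submit."""
    # Drop unset options so only explicit values reach the query string.
    # Assumption: str(True) -> "True" matches the values the API expects.
    query = {k: str(v) for k, v in options.items() if v is not None}
    url = SUBMIT_URL
    if query:
        url += "?" + urllib.parse.urlencode(query)
    body = json.dumps({"url": file_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_submit_request(
    "YOUR_API_KEY",
    "https://file.302.ai/gpt/imgs/20241204/361bca5886e844dfac39fb861ea3f3ac.mp3",
    caption_type="speech",
    use_punc=True,
    words_per_line=46,
)
print(req.full_url)
# To send the request: urllib.request.urlopen(req)
```

Unlike the curl sample, this sketch omits parameters that have no value instead of sending them as empty keys.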

Responses

🟢200 Success

application/json

Body

object {0}

Example

{}

Modified at 2024-12-11 07:53:52
