Alignments (Subtitle Timing)

POST

/v1/audio/alignments

Transcribe audio into the input language.
The API accepts the audio file to transcribe and the desired output format for the transcription. Supported input and output formats are listed under the file and response_format parameters below.

Price: 0.002 PTC/min
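As a worked example of the per-minute price, the sketch below estimates the cost of a request from the audio duration. It assumes simple pro-rata billing by duration; the page does not state how partial minutes are rounded, so treat the result as an estimate. The function name estimate_cost_ptc is hypothetical.

```python
from decimal import Decimal

# Price taken from this page: 0.002 PTC per minute of audio.
PRICE_PER_MINUTE_PTC = Decimal("0.002")


def estimate_cost_ptc(duration_seconds):
    """Estimate the cost in PTC, assuming pro-rata billing (an assumption)."""
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    return PRICE_PER_MINUTE_PTC * minutes


# A 10-minute file (600 seconds) comes to 0.02 PTC under this assumption.
ten_minute_cost = estimate_cost_ptc(600)
```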

Request

Header Params

Accept

string

required

Example:

application/json

Authorization

string

optional

Example:

Bearer {{YOUR_API_KEY}}

Body Params multipart/form-data

file

required

The audio file to transcribe, in one of the following formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm.

text

string

required

The transcript text of the audio.

model

string

required

The ID of the model to use: whisper-v3 or whisper-v3-turbo.

Example:

whisper-v3-turbo

vad_model

string

optional

Example:

silero

preprocessing

string

optional

none
dynamic
soft_dynamic
bass_dynamic

Example:

none

response_format

string

optional

The format of the response. One of: srt, verbose_json, vtt.

Example:

verbose_json

alignment_model

string

optional

tdnn_ffn
mms_fa
gentle

Example:

tdnn_ffn
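The body parameters above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name build_form_fields is hypothetical, and the allowed values are copied from the parameter descriptions on this page. vad_model is passed through unchecked because only an example value (silero) is listed for it.

```python
from pathlib import Path

# File formats listed for the "file" parameter.
AUDIO_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}

# Enumerated values listed for the string parameters.
ALLOWED = {
    "model": {"whisper-v3", "whisper-v3-turbo"},
    "preprocessing": {"none", "dynamic", "soft_dynamic", "bass_dynamic"},
    "response_format": {"srt", "verbose_json", "vtt"},
    "alignment_model": {"tdnn_ffn", "mms_fa", "gentle"},
}


def build_form_fields(file_path, text, model, **optional):
    """Validate the parameters and return the non-file form fields."""
    extension = Path(file_path).suffix.lower().lstrip(".")
    if extension not in AUDIO_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {extension!r}")
    if not text:
        raise ValueError("text is required")

    fields = {"text": text, "model": model}
    # Keep only the optional parameters that were actually supplied.
    fields.update({k: v for k, v in optional.items() if v is not None})

    for name, value in fields.items():
        allowed = ALLOWED.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"invalid {name}: {value!r}")
    return fields
```

For example, build_form_fields("talk.wav", "hello world", "whisper-v3-turbo", response_format="srt") returns the three fields to send alongside the file, while a .flac path or an unknown model raises ValueError.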

Request samples

Shell

curl --location --request POST 'https://api.302.ai/v1/audio/alignments' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer {{YOUR_API_KEY}}' \
--form 'file=@""' \
--form 'text=""' \
--form 'model="whisper-v3-turbo"' \
--form 'vad_model="silero"' \
--form 'preprocessing="none"' \
--form 'response_format="verbose_json"' \
--form 'alignment_model="tdnn_ffn"'
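A rough Python equivalent of the curl request, using only the standard library, might look like the sketch below. It builds the multipart/form-data request without sending it. The function name build_alignment_request is hypothetical, and the application/octet-stream content type for the file part is an assumption; the endpoint URL, headers, and field names are taken from the curl sample.

```python
import urllib.request
import uuid

API_URL = "https://api.302.ai/v1/audio/alignments"


def build_alignment_request(api_key, filename, audio_bytes, fields):
    """Build (but do not send) a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    parts = []

    # One part per plain form field (text, model, response_format, ...).
    for name, value in fields.items():
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(head.encode("utf-8"))

    # The file part; the content type here is an assumption.
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_head.encode("utf-8"))
    parts.append(audio_bytes)
    parts.append(b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# To send the request:
#   with urllib.request.urlopen(request) as response:
#       body = response.read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect the headers and body first.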

Responses

🟢 200 OK

application/json

Body

text

string

required

Example

{
    "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
}
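Reading the documented response body can be sketched as follows. The helper name extract_text is hypothetical. Only the text field is documented in the schema above, so nothing else is assumed about the payload; the shape of srt and vtt responses is not described on this page and is not handled here.

```python
import json


def extract_text(response_body):
    """Return the required "text" field from a JSON response body."""
    payload = json.loads(response_body)
    if "text" not in payload:
        raise KeyError('response is missing the required "text" field')
    return payload["text"]
```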

Modified at 2025-01-15 02:54:11
