Verba transcript file (.vtr) format
VTR file format specification
The VTR file format (file extension .vtr) contains the transcript text and other information from speech analysis. It is a human-readable, JSON-formatted text file compressed with Zip. Non-ASCII characters are encoded as UTF-8.
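As an illustration of the container described above, the following is a minimal Python sketch that unzips a .vtr file and parses the JSON inside. The specification does not say how the member inside the Zip archive is named, so the sketch assumes the archive holds a single JSON document and reads the first member. The helper names `load_vtr` and `missing_keys` and the member name `transcript.json` are hypothetical.

```python
import io
import json
import zipfile

# Top-level keys documented in this specification.
TOP_LEVEL_KEYS = ("provider", "language", "speakers", "topics", "words", "tcus")


def load_vtr(source):
    """Parse a .vtr file given as a path or a binary file-like object."""
    with zipfile.ZipFile(source) as archive:
        # Assumption: the archive holds exactly one JSON document,
        # so the first member is taken to be the transcript.
        member = archive.namelist()[0]
        raw = archive.read(member)
    return json.loads(raw.decode("utf-8"))


def missing_keys(transcript):
    """Return the documented top-level keys that are absent."""
    return [key for key in TOP_LEVEL_KEYS if key not in transcript]


# Demo: build a tiny archive in memory instead of reading a real file.
sample = {"provider": "intelligentvoice", "language": "en", "words": []}
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    # "transcript.json" is a hypothetical member name.
    archive.writestr("transcript.json", json.dumps(sample))
buffer.seek(0)

transcript = load_vtr(buffer)
print(transcript["provider"], missing_keys(transcript))
```

Whether every top-level key is always present is not stated here, so `missing_keys` reports absent keys rather than treating them as errors.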
Objects
property | description |
---|---|
provider | The engine used to create the transcript (e.g. "intelligentvoice", "speechmatics"). |
language | Comma-separated list of languages defined or detected in the transcript. |
speakers | List of speakers with id, name and sentiment values. |
topics | List of topics identified in the transcript with score, positions and other information. |
words | List of words in chronological order with position, time, speaker, alternatives and other information. |
tcus | List of turn construction units with text, speaker, time and sentiment value. |
speakers
The speakers value is an array of speaker objects.
speaker object
example:
{
"id": "Speaker 1",
"iv_id": 1231,
"iv_label": "Channel 1",
"sentiment": {
"positiveAggregatedSentiment": 26.66919087,
"negativeAggregatedSentiment": -75.60949249,
"sentimentGradient": 25.35896311,
"normalisedSentimentGradient": 13.05775275,
"sentimentIntercept": -13.36524594,
"sentimentOutcome": 12.75025956
}
}
property | description |
---|---|
id | String. An identifier that is local to the current conversation. A speaker is referred to by this id throughout the rest of the document. |
iv_id | Number. Specific to Intelligent Voice. A globally unique id within the IV database that identifies the speaker. |
iv_label | String. Specific to Intelligent Voice. A label given to the speaker during the diarisation process. |
sentiment | Object. Specific to Intelligent Voice. If sentiment processing was enabled for the transcript, the sentiment object contains the sentiment values calculated for this speaker in this conversation. See Intelligent Voice - Sentiment scores for an explanation of the sentiment values. Sentiment properties: positiveAggregatedSentiment, negativeAggregatedSentiment, sentimentGradient, normalisedSentimentGradient, sentimentIntercept, sentimentOutcome. |
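Because other parts of the document refer to a speaker by its local id, a lookup table keyed on id is a natural way to consume this array. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the sentiment object is either absent or null when sentiment processing was not enabled, which the specification does not state explicitly.

```python
def index_speakers(speakers):
    """Map each speaker's conversation-local id to its speaker object."""
    return {speaker["id"]: speaker for speaker in speakers}


def sentiment_outcome(speaker):
    """Return sentimentOutcome, or None when no sentiment data is present."""
    # Assumption: a missing or null "sentiment" means processing was disabled.
    sentiment = speaker.get("sentiment") or {}
    return sentiment.get("sentimentOutcome")


speakers = [
    {
        "id": "Speaker 1",
        "iv_id": 1231,
        "iv_label": "Channel 1",
        "sentiment": {"sentimentOutcome": 12.75025956},
    },
    {"id": "Speaker 2"},  # no sentiment data for this speaker
]

by_id = index_speakers(speakers)
print(sentiment_outcome(by_id["Speaker 1"]))
print(sentiment_outcome(by_id["Speaker 2"]))
```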
topics
The topics value is an array of topic objects.
topic object
example:
{
"topic": "Government Press Office",
"score": 0.08300000,
"positionInView": 1,
"length": 0,
"id": 14562,
"rawscore": 1.00000000,
"seektime": 0,
"status": 0,
"tagID": 973,
"position": [{
"order": 1,
"wordIndex": 376,
"timestamp": 107.40000000,
"offset": 0
}
]
}
property | description |
---|---|
topic | String. A word or phrase identified as a topic by the provider, which is relevant to the conversation. |
score | Real number in the range 0..100. Shows how relevant the topic is to the conversation, as determined by the transcript provider. Normalized value; 100.0 is the maximum relevance. |
positionInView | Number. A unique position of the topic within the document. |
length | Number. Always zero. |
id | Number. A global id for this topic in this conversation. |
rawscore | Real number. Shows how relevant the topic is to the conversation, as determined by the transcript provider. |
seektime | Number. Always zero. |
status | Number. Always zero. |
tagID | Number. A global id for this topic across all conversations. |
position | Array of position objects. Properties of a position object: order, wordIndex, timestamp, offset. |
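A common use of the topics array is to rank topics by relevance and jump to where they occur. The following sketch sorts by the normalized score and collects the timestamp of every position entry. The function name `top_topics` and the `limit` parameter are hypothetical, and the second sample topic is invented for the demonstration.

```python
def top_topics(topics, limit=5):
    """Return (topic, score, timestamps) tuples, most relevant first."""
    ranked = sorted(topics, key=lambda item: item["score"], reverse=True)
    result = []
    for item in ranked[:limit]:
        # Each position entry carries the time in seconds where the topic occurs.
        timestamps = [pos["timestamp"] for pos in item.get("position", [])]
        result.append((item["topic"], item["score"], timestamps))
    return result


topics = [
    {
        "topic": "Government Press Office",
        "score": 0.083,
        "position": [{"order": 1, "wordIndex": 376, "timestamp": 107.4, "offset": 0}],
    },
    {
        # Invented sample entry, not taken from a real transcript.
        "topic": "Budget",
        "score": 42.5,
        "position": [
            {"order": 1, "wordIndex": 12, "timestamp": 3.2, "offset": 0},
            {"order": 2, "wordIndex": 90, "timestamp": 25.0, "offset": 0},
        ],
    },
]

for name, score, timestamps in top_topics(topics):
    print(name, score, timestamps)
```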
words
The words value is an array of word objects. It contains the complete transcription of the conversation, split into words in chronological order.
word object
example:
{
"word": "to",
"confidence": 0.55500000,
"speaker": "Speaker 2",
"speakerName": "3",
"speakerId": 1002294,
"time": 12.60000000,
"duration": 0.04000000,
"alternatives": [{
"word": "the",
"confidence": 0.28853413
}, {
"word": "they",
"confidence": 0.15478531
}, {
"word": "<eps>",
"confidence": 0.00215280
}
]
}
property | description |
---|---|
word | String. The word transcribed by the provider with the highest confidence. It usually contains trailing punctuation marks. |
confidence | Real number in the range 0..1. Shows how confident the provider is that the word is the one actually spoken in the conversation. 1.0 is the highest confidence. |
speaker | String. Display name of the speaker if diarisation is enabled. It is the id of the corresponding speaker object. |
speakerName | String. |
speakerId | Number. |
time | Real number. Beginning timestamp, in seconds, where the word is spoken in the conversation. |
duration | Real number. Duration, in seconds, of the spoken word in the conversation. |
alternatives | Array of alternative word objects. Properties of an alternative object: word, confidence. |
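Since the words array is already in chronological order, the plain transcript can be rebuilt by joining the word values, and the confidence and alternatives fields can be used to flag words worth reviewing. The sketch below is a hedged illustration: the function names and the 0.6 threshold are hypothetical, and it assumes that the "<eps>" alternative seen in the example stands for "no word" and can be skipped, which this specification does not define.

```python
def transcript_text(words):
    """Join the words in document order into a single string."""
    return " ".join(entry["word"] for entry in words)


def uncertain_words(words, threshold=0.6):
    """Return (time, word, alternatives) for words below the threshold."""
    flagged = []
    for entry in words:
        if entry["confidence"] < threshold:
            # Assumption: "<eps>" marks an empty alternative and is skipped.
            candidates = [
                alt["word"]
                for alt in entry.get("alternatives", [])
                if alt["word"] != "<eps>"
            ]
            flagged.append((entry["time"], entry["word"], candidates))
    return flagged


words = [
    # Invented sample entry, not taken from a real transcript.
    {"word": "going", "confidence": 0.93, "time": 12.3, "duration": 0.3},
    {
        "word": "to",
        "confidence": 0.555,
        "time": 12.6,
        "duration": 0.04,
        "alternatives": [
            {"word": "the", "confidence": 0.28853413},
            {"word": "they", "confidence": 0.15478531},
            {"word": "<eps>", "confidence": 0.0021528},
        ],
    },
]

print(transcript_text(words))
print(uncertain_words(words))
```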
tcus
The tcus value is an array of tcu objects. A TCU (Turn Construction Unit) is a sentence or similar snippet of a conversation, delimited by punctuation, a speaker change or other means.
tcu object
example:
{
"text": "How'd you get on today? Today",
"speaker": "10",
"startTime": 0.16000000,
"endTime": 2.00000000,
"sentiment": 0.01980386
}
property | description |
---|---|
text | String. Transcription of the tcu. |
speaker | String. |
startTime | Real number. Beginning timestamp, in seconds, where the tcu is spoken in the conversation. |
endTime | Real number. Ending timestamp, in seconds, where the tcu is spoken in the conversation. |
sentiment | Real number. Sentiment value of the tcu if sentiment processing was enabled for the transcription. See Intelligent Voice - TCU sentiment for details. |
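The tcu objects map naturally onto a dialogue-style rendering and onto simple per-speaker summaries. The sketch below formats each tcu as a timestamped line and averages the sentiment value per speaker. The function names are hypothetical, the second and third sample entries are invented, and the code assumes that sentiment is absent or null when sentiment processing was not enabled.

```python
from collections import defaultdict


def format_tcu(tcu):
    """Render one tcu as '[start-end] speaker: text'."""
    return f"[{tcu['startTime']:.2f}-{tcu['endTime']:.2f}] {tcu['speaker']}: {tcu['text']}"


def mean_sentiment_by_speaker(tcus):
    """Average the tcu sentiment values for each speaker."""
    values = defaultdict(list)
    for tcu in tcus:
        # Assumption: tcus without sentiment data are left out of the average.
        if tcu.get("sentiment") is not None:
            values[tcu["speaker"]].append(tcu["sentiment"])
    return {speaker: sum(vals) / len(vals) for speaker, vals in values.items()}


tcus = [
    {"text": "How'd you get on today? Today", "speaker": "10",
     "startTime": 0.16, "endTime": 2.0, "sentiment": 0.5},
    # Invented sample entries, not taken from a real transcript.
    {"text": "Fine, thanks.", "speaker": "11",
     "startTime": 2.1, "endTime": 3.0},
    {"text": "Good.", "speaker": "10",
     "startTime": 3.1, "endTime": 3.5, "sentiment": 0.25},
]

for tcu in tcus:
    print(format_tcu(tcu))
print(mean_sentiment_by_speaker(tcus))
```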