Verba transcript file (.vtr) format

VTR file format specification

The VTR file format (with the .vtr extension) contains the transcript text and other information from speech analysis. It is a human-readable, JSON-formatted text file compressed with Zip. Non-ASCII characters are encoded in UTF-8.
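A minimal sketch of reading such a file in Python, using only the standard library. The specification does not name the member stored inside the Zip archive, so the sketch assumes the archive holds a single JSON member and reads the first one; `load_vtr` and the member name `transcript.json` are hypothetical names used for illustration.

```python
import io
import json
import zipfile


def load_vtr(source):
    """Return the parsed JSON document stored in a .vtr (Zip) archive.

    `source` may be a file path or a binary file-like object.
    Assumption: the archive holds one JSON member, so the first is read.
    """
    with zipfile.ZipFile(source) as archive:
        member = archive.namelist()[0]
        raw = archive.read(member)
    # The format encodes non-ASCII characters in UTF-8.
    return json.loads(raw.decode("utf-8"))


# Build a tiny in-memory archive to demonstrate the round trip.
# The member name is hypothetical; the specification does not define it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr(
        "transcript.json",
        json.dumps({"provider": "intelligentvoice", "words": []}),
    )
buffer.seek(0)

transcript = load_vtr(buffer)
print(transcript["provider"])
```

Passing a path such as `load_vtr("call.vtr")` works the same way, since `zipfile.ZipFile` accepts both paths and file-like objects.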

Objects

"provider" : the engine used to create the transcript (e.g. "intelligentvoice", "speechmatics")
"language" : comma-separated list of languages defined or detected in the transcript
"speakers" : list of speakers with id, name and sentiment values
"topics" : list of topics identified in the transcript with score, positions and other information
"words" : list of words in chronological order with position, time, speaker, alternatives and other information
"tcus" : list of turn construction units with text, speaker, time and sentiment value

speakers

The `speakers` property is an array of speaker objects.

speaker object

example:

{
	"id": "Speaker 1",
	"iv_id": 1231,
	"iv_label": "Channel 1",
	"sentiment": {
		"positiveAggregatedSentiment": 26.66919087,
		"negativeAggregatedSentiment": -75.60949249,
		"sentimentGradient": 25.35896311,
		"normalisedSentimentGradient": 13.05775275,
		"sentimentIntercept": -13.36524594,
		"sentimentOutcome": 12.75025956
	}
}

property	description
`id`	String. An identifier which is local to the current conversation. The speaker is referred to by this id throughout the rest of the document.
`iv_id`	Number. This is specific to Intelligent Voice. This is a unique global id within the IV database which identifies the speaker.
`iv_label`	String. This is specific to Intelligent Voice. This is a label given to the speaker during the diarisation process.
`sentiment`	Object. This is specific to Intelligent Voice. If sentiment processing was enabled for the transcript, the sentiment object contains the sentiment values calculated for this speaker in this conversation. See Intelligent Voice - Sentiment scores for an explanation of the sentiment values. Sentiment properties: `positiveAggregatedSentiment`, `negativeAggregatedSentiment`, `sentimentGradient`, `normalisedSentimentGradient`, `sentimentIntercept`, `sentimentOutcome`.
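Because the rest of the document refers to speakers by `id`, it can be convenient to index the `speakers` array once. The sketch below does that and reads `sentimentOutcome` defensively, since the `sentiment` object is only present when sentiment processing was enabled. The helper names and the second sample speaker are hypothetical.

```python
def index_speakers(speakers):
    """Map each speaker's local `id` to its speaker object."""
    return {speaker["id"]: speaker for speaker in speakers}


def sentiment_outcome(speaker):
    """Return `sentimentOutcome`, or None when no sentiment was computed."""
    return speaker.get("sentiment", {}).get("sentimentOutcome")


speakers = [
    {
        "id": "Speaker 1",
        "iv_id": 1231,
        "iv_label": "Channel 1",
        "sentiment": {"sentimentOutcome": 12.75025956},
    },
    # Hypothetical second speaker without sentiment data.
    {"id": "Speaker 2", "iv_id": 1232, "iv_label": "Channel 2"},
]

by_id = index_speakers(speakers)
for speaker_id, speaker in by_id.items():
    print(speaker_id, speaker["iv_label"], sentiment_outcome(speaker))
```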

topics

The `topics` property is an array of topic objects.

topic object

example:

{
	"topic": "Government Press Office",
	"score": 0.08300000,
	"positionInView": 1,
	"length": 0,
	"id": 14562,
	"rawscore": 1.00000000,
	"seektime": 0,
	"status": 0,
	"tagID": 973,
	"position": [{
			"order": 1,
			"wordIndex": 376,
			"timestamp": 107.40000000,
			"offset": 0
		}
	]
}

property	description
`topic`	String. A word or phrase identified as a topic by the provider, which is relevant to the conversation.
`score`	Real number in the range of 0..100. It shows how relevant the topic is to the conversation, determined by the transcript provider. Normalized value, 100.0 is the maximum relevance.
`positionInView`	Number. A unique position of the topic within the document.
`length`	Number. Always zero.
`id`	Number. A global id for this topic in this conversation.
`rawscore`	Real number. It shows how relevant the topic is to the conversation, determined by the transcript provider.
`seektime`	Number. Always zero.
`status`	Number. Always zero.
`tagID`	Number. A global id for this topic across all conversations.
`position`	Array of `position` objects. A position object represents an occurrence of the topic in the conversation. Properties of the position object: `order`: the order of the position within the list; `wordIndex`: index of the starting word of the topic occurrence in the word list of the transcript; `timestamp`: starting timestamp of the topic occurrence in the transcribed media; `offset`: always 0.
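One way to use the `position` array is to flatten all topic occurrences into a single timeline ordered by `timestamp`. The sketch below assumes the timestamp is expressed in seconds, like the word timestamps, and carries `wordIndex` through unchanged, because the specification does not state whether it is zero-based or one-based. The function name `topic_timeline` and the second sample topic are hypothetical.

```python
def topic_timeline(topics):
    """Return (timestamp, topic, wordIndex) tuples sorted by timestamp."""
    occurrences = []
    for topic in topics:
        for position in topic.get("position", []):
            occurrences.append(
                (position["timestamp"], topic["topic"], position["wordIndex"])
            )
    return sorted(occurrences)


topics = [
    {
        "topic": "Government Press Office",
        "score": 0.083,
        "position": [
            {"order": 1, "wordIndex": 376, "timestamp": 107.4, "offset": 0}
        ],
    },
    # Hypothetical second topic with two occurrences.
    {
        "topic": "Budget",
        "score": 0.05,
        "position": [
            {"order": 1, "wordIndex": 12, "timestamp": 3.5, "offset": 0},
            {"order": 2, "wordIndex": 400, "timestamp": 120.0, "offset": 0},
        ],
    },
]

timeline = topic_timeline(topics)
for timestamp, name, word_index in timeline:
    print(f"{timestamp:8.2f}  {name}  (wordIndex {word_index})")
```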

words

The `words` property is an array of word objects. It contains the complete transcription of the conversation, split into words in chronological order.

word object

example:

{
	"word": "to",
	"confidence": 0.55500000,
	"speaker": "Speaker 2",
	"speakerName": "3",
	"speakerId": 1002294,
	"time": 12.60000000,
	"duration": 0.04000000,
	"alternatives": [{
			"word": "the",
			"confidence": 0.28853413
		}, {
			"word": "they",
			"confidence": 0.15478531
		}, {
			"word": "<eps>",
			"confidence": 0.00215280
		}
	]
}

property	description
`word`	String. A word transcribed by the provider with the highest confidence. It usually contains trailing punctuation marks.
`confidence`	Real number in the range of 0..1. It shows how confident the provider is that the word is actually the one spoken in the conversation. 1.0 is the highest confidence.
`speaker`	String. Display name of the speaker if diarisation is enabled. It is the `iv_label` property of the speaker object in the `speakers` array.
`speakerName`	String. `id` property of the speaker object. This should be used to reference the speaker in the `speakers` array.
`speakerId`	Number. `iv_id` property of the speaker object. Global id of the speaker for this conversation in the IV database.
`time`	Real number. Beginning timestamp in seconds where the word is spoken in the conversation.
`duration`	Real number. Duration in seconds of the spoken word in the conversation.
`alternatives`	Array of `alternative` objects. Alternative transcriptions of the spoken word with lower confidence. Properties of the `alternative` object: `word`: string, alternative transcription of the spoken word; `confidence`: confidence value for this alternative.
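The `confidence` and `alternatives` properties can be combined to flag words that may need review. The following sketch lists every word below a confidence threshold together with its strongest alternative. The threshold of 0.6 is arbitrary, the function name `uncertain_words` is hypothetical, and skipping `<eps>` assumes that it marks an empty (no word) alternative, which the specification does not state.

```python
def uncertain_words(words, threshold=0.6):
    """Return (time, word, best_alternative) for low-confidence words."""
    flagged = []
    for entry in words:
        if entry["confidence"] >= threshold:
            continue
        # Assumption: "<eps>" stands for "no word" and is not a real candidate.
        candidates = [
            alt for alt in entry.get("alternatives", []) if alt["word"] != "<eps>"
        ]
        best = max(candidates, key=lambda alt: alt["confidence"], default=None)
        flagged.append(
            (entry["time"], entry["word"], best["word"] if best else None)
        )
    return flagged


words = [
    {
        "word": "to",
        "confidence": 0.555,
        "time": 12.6,
        "duration": 0.04,
        "alternatives": [
            {"word": "the", "confidence": 0.28853413},
            {"word": "they", "confidence": 0.15478531},
            {"word": "<eps>", "confidence": 0.0021528},
        ],
    },
    # Hypothetical high-confidence word without alternatives.
    {"word": "today?", "confidence": 0.98, "time": 12.7, "duration": 0.3},
]

flagged = uncertain_words(words)
print(flagged)
```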

tcus

The `tcus` property is an array of tcu objects. A TCU, or Turn Construction Unit, is a sentence or similar snippet of a conversation delimited by punctuation, a speaker change or other means.

tcu object

example:

{
	"text": "How'd you get on today? Today",
	"speaker": "10",
	"startTime": 0.16000000,
	"endTime": 2.00000000,
	"sentiment": 0.01980386
}

property	description
`text`	String. Transcription of the tcu.
`speaker`	String. `id` property of the speaker object. This should be used to reference the speaker in the `speakers` array.
`startTime`	Real number. Beginning timestamp in seconds where the tcu is spoken in the conversation.
`endTime`	Real number. Ending timestamp in seconds where the tcu is spoken in the conversation.
`sentiment`	Real number. Sentiment value of the tcu if sentiment processing was enabled for the transcription. See Intelligent Voice - TCU sentiment for details.
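As a sketch of how the tcu properties fit together, the code below renders each tcu as a readable transcript line and sums the speaking time per speaker from `startTime` and `endTime`. The function names and the second sample tcu are hypothetical, and `sentiment` is ignored here because it is only meaningful when sentiment processing was enabled.

```python
from collections import defaultdict


def format_tcu(tcu):
    """Render one tcu as '[start-end] speaker: text'."""
    return (
        f"[{tcu['startTime']:.2f}-{tcu['endTime']:.2f}] "
        f"{tcu['speaker']}: {tcu['text']}"
    )


def speaking_time(tcus):
    """Return total seconds spoken per speaker id."""
    totals = defaultdict(float)
    for tcu in tcus:
        totals[tcu["speaker"]] += tcu["endTime"] - tcu["startTime"]
    return dict(totals)


tcus = [
    {
        "text": "How'd you get on today? Today",
        "speaker": "10",
        "startTime": 0.16,
        "endTime": 2.0,
        "sentiment": 0.01980386,
    },
    # Hypothetical reply from another speaker.
    {"text": "Fine, thanks.", "speaker": "11", "startTime": 2.5, "endTime": 3.5},
]

for tcu in tcus:
    print(format_tcu(tcu))

totals = speaking_time(tcus)
```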

VFC Capture (Verba) 9.9
