Intelligent Voice transcription



The proliferation of voice capture across turrets, unified communications, mobile, and more means that IT and compliance users have more data to manage and less time to do so. To meet these challenges, Verint Financial Compliance's Compliance Application functionality has been enhanced with Intelligent Voice's speech capabilities to automate and simplify Conversation Search.

The following functionality is available within Verint Financial Compliance:  

  • Transcription of audio for voice and video recordings
  • Diarization: the ability to separate/identify speakers
  • Punctuation and capitalization: the ability to identify the beginning and the end of sentences
  • Conversation Search: the ability to search within transcribed text, including keyword and phrase searches
  • Language Search: the ability to search for conversations in a specific (supported and configured) language, as well as conversations where a language switch took place during the conversation
  • Sentiment Search: the ability to search for sentiment gradient swings, where sentiment changes during a conversation (for example, from positive to negative). In addition, users can search for high occurrences of negative or positive sentiment in a given conversation
  • Topic Search: the ability to search for conversations that include IV-derived key topics
  • Search Results: the search grid includes Topics, Sentiment Information, Language Switches, and Spoken Languages
  • Conversation View: includes a new Analytics tab that presents Topics, Sentiment Information, Language Switches, Spoken Languages, and a Summary. In addition, the user can interact with the Topics and use them to navigate and jump to the relevant point in the conversation
  • Support for multiple languages
  • Adaptation/customization of automatic speech recognition (ASR) models. Model adaptation is the process of taking an existing ASR model and adapting it to suit a specific use case by incorporating new words and new patterns of speech. Speech recognition models reflect the patterns of speech in the training dataset they were built with; the general ASR models distributed by Intelligent Voice reflect the patterns of speech in the general population of the region. Improved results can be obtained by tailoring the model to reflect the speech in a given domain
  • Integration via REST APIs
  • The Intelligent Voice solution is deployed separately, either on-premises or in a Verint partner cloud
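
The REST integration can be pictured with a minimal client sketch. The endpoint path, header names, and request fields below are illustrative assumptions for the purpose of the example, not the documented Intelligent Voice API; the actual API Root URL, user, and token come from the data processor configuration described later.

```python
import json

def build_transcription_request(api_root, api_user, api_token, audio_name):
    """Assemble a hypothetical transcription submission.

    Illustrative only: the real Intelligent Voice endpoints and field
    names may differ from this sketch.
    """
    return {
        "url": f"{api_root.rstrip('/')}/transcriptions",  # assumed endpoint
        "headers": {
            "X-Api-User": api_user,    # the group ID defined in Intelligent Voice
            "X-Api-Token": api_token,  # the API token from the processor config
        },
        "body": json.dumps({"audio": audio_name, "diarization": True}),
    }

req = build_transcription_request(
    "https://iv.example.com/api", "group-42", "secret", "call-001.wav"
)
print(req["url"])  # https://iv.example.com/api/transcriptions
```

The sketch only assembles the request; in the product, the Verba Speech Analytics Service performs the actual upload and result retrieval.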

License requirements

The Intelligent Voice transcription engine requires specific licenses. Please contact your Verint sales representative for more information.

After uploading the necessary licenses, they have to be assigned to users through the role configuration. The following permissions are required for Intelligent Voice speech transcription:

| Role Permission | Required License | Transcription | Speaker Diarization | Sentiment Analysis | Language Detection | Transcription Summary |
| --- | --- | --- | --- | --- | --- | --- |
| Transcription (Profiling Speech) | Communications Profiling Speech | Yes | – | – | – | Yes |
| Transcription (Profiling Speech Advanced) | Communications Profiling Speech - Advanced | Yes | Yes | Yes | – | Yes |
| Transcription (Risk Profiling) | Communications Risk Profiling Speech | Yes | Yes | Yes | Yes | Yes |

Deploying and configuring Intelligent Voice transcription

Intelligent Voice transcription is considered a third-party transcription engine, which requires a separate on-premises or cloud-based Intelligent Voice infrastructure to run the transcription service. The Verba Speech Analytics Service connects to the Intelligent Voice platform and sends audio files for transcription. For more information, see Deploying transcription.

Failover and load balancing

Multiple servers can process the transcription policies simultaneously. If 1500 records have already been sent to the Intelligent Voice engine, then no new records are selected for the policy (the records already selected are still sent).
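
The in-flight cap above can be sketched as a simple selection rule; this is an illustrative model of the documented behavior, not the product code:

```python
IN_FLIGHT_LIMIT = 1500  # cap on records already sent to the Intelligent Voice engine

def select_records(pending, in_flight):
    """Pick new records for a transcription policy, honoring the in-flight cap.

    If the cap is already reached, no new records are selected; records
    that were already selected are still sent regardless.
    """
    if in_flight >= IN_FLIGHT_LIMIT:
        return []
    return pending[:IN_FLIGHT_LIMIT - in_flight]

batch = select_records(list(range(2000)), 100)
print(len(batch))                          # 1400
print(select_records([1, 2, 3], 1500))     # []
```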

Failure scenarios

  • Database query or update problem: the Speech Analytics Service retries automatically
  • Communication problem with the Intelligent Voice service: the Speech Analytics Service automatically retries
  • Pending tasks can be monitored in the speech_pending database table
  • Getting stuck
    • The conversation is selected for sending by the service, but it is never sent:
      Such entries are deleted from the speech_pending table after one hour, so the record will be selected for sending again
    • The conversation is selected for receiving the results by the service, but it is never finished:
      Such entries are updated in the speech_pending table so that downloading will be retried by one of the Media Repositories
    • Pending entries in the Verba database are cross-checked with the entries on the Intelligent Voice side every 4 hours by the Speech Analytics Service
  • The Intelligent Voice engine is not properly configured or is missing components: consult IV about the installation and licenses.
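
The stale-entry cleanup described above can be sketched as follows. The entry fields and state names are illustrative assumptions; only the one-hour timeout and the requeue behavior come from the documentation:

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=1)  # stuck "sending" entries are removed after one hour

def purge_stale(pending, now):
    """Drop speech_pending entries stuck in 'sending' for over an hour.

    Purged records become eligible for selection again; everything else
    is kept as-is. Illustrative sketch of the documented behavior.
    """
    kept, requeued = [], []
    for entry in pending:
        if entry["state"] == "sending" and now - entry["selected_at"] > STALE_AFTER:
            requeued.append(entry["id"])  # will be picked up for sending again
        else:
            kept.append(entry)
    return kept, requeued

now = datetime(2024, 1, 1, 12, 0)
pending = [
    {"id": 1, "state": "sending", "selected_at": now - timedelta(hours=2)},
    {"id": 2, "state": "sending", "selected_at": now - timedelta(minutes=5)},
]
kept, requeued = purge_stale(pending, now)
print(requeued)  # [1]
```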

Performance requirements

Regarding performance requirements, the most important factor is that the storage and the network must be capable of reading and sending the expected amount of audio files to the IV cluster. Because all analysis happens on the IV servers, CPU and memory are only used for lightweight database querying, parsing, and storing the results from the IV system.
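
As a rough sizing aid, the sustained storage-read and network-send rate can be estimated from call volume and audio bitrate. All figures below are illustrative assumptions, not product requirements:

```python
# Illustrative sizing estimate: all figures are assumptions, not product requirements.
calls_per_hour = 2000
avg_call_minutes = 5
kbps_per_call = 64  # e.g. audio at roughly G.711-like 64 kbit/s

# 64 kbit/s * 300 s = 19200 kbit -> /8 = 2400 KB -> /1024 ≈ 2.34 MB per call
audio_mb_per_call = kbps_per_call * avg_call_minutes * 60 / 8 / 1024

hourly_gb = calls_per_hour * audio_mb_per_call / 1024
print(f"{hourly_gb:.1f} GB/hour read from storage and sent to the IV cluster")  # 4.6 GB/hour
```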


Add the Intelligent Voice ca-cert.pem certificate to the Verba Java Keystore. If this step is accidentally missed, an HTTP error will be shown under Data > ASR Models in the Verba UI and no ASR models will be shown.

  1. Check with Verba support for the Java keystore password.
  2. The Intelligent Voice ca-cert.pem certificate can be found in /opt/jumpto/ssl.  Copy ca-cert.pem to a convenient location on the Verba Media Repository, for example "C:\IV\ca-cert.pem".
  3. Open an elevated (administrator) command prompt and change the current directory to "C:\Program Files\Eclipse Adoptium\jre-\bin"
  4. Type the following command: keytool -import -trustcacerts -alias iv-ca -file "c:\IV\ca-cert.pem" -keystore  "C:\Program Files\Eclipse Adoptium\jre-\lib\security\cacerts"
  5. Check the timestamp has been updated on the "C:\Program Files\Eclipse Adoptium\jre-\lib\security\cacerts" file and that no new cacerts file has been created in the local directory.
  6. Restart the Verba Conversation UI.
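
If the import needs to be scripted or audited, the same keytool command can be assembled programmatically. This is a sketch using the example paths from the steps above, which may differ per installation; keytool's -list option can afterwards confirm the iv-ca alias is present in the keystore:

```python
import os

# Example paths from the steps above; adjust to the actual installation.
JRE = r"C:\Program Files\Eclipse Adoptium\jre-"
CERT = r"C:\IV\ca-cert.pem"

import_cmd = [
    os.path.join(JRE, "bin", "keytool"),
    "-import", "-trustcacerts",
    "-alias", "iv-ca",
    "-file", CERT,
    "-keystore", os.path.join(JRE, "lib", "security", "cacerts"),
]
# After importing, `keytool -list -alias iv-ca -keystore <cacerts>` verifies the entry.
print(" ".join(import_cmd))
```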

Data processor

Once the Intelligent Voice platform is available, the required Data Processor has to be created to enable the integration with the Intelligent Voice transcription engine. Follow the steps described in Configuring and running transcription to create the processor and select the Intelligent Voice engine. The following table describes the settings available for an Intelligent Voice data processor:

| Configuration item | Description |
| --- | --- |
| Name | Name of the data processor. This name identifies the processor across the system. |
| Type | Select Speech Transcription. |
| Engine | Select Intelligent Voice. |
| API Root URL | URL of the Intelligent Voice API (on-premises server or cloud). |
| API User | API user name; the group ID defined in Intelligent Voice. |
| API Token | API token. |
| Enable Speaker Diarization | Allows separating participants in conversations and producing a dialog-like output. |
| Transcription Summary | Allows the generation of a 100-500 word summary of the conversation. |

Transcription policy

After creating the data processor, you can follow the guidelines at Configuring and running transcription to configure one or more data management policies.

When an Intelligent Voice Data Processor is selected, then the ASR Models must be defined:

The maximum number of selected models is 4, because Intelligent Voice supports up to 3 languages for language detection, and the 4th item should be the special ASR Model that signals the language detection requirement.
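
The selection rule above can be expressed as a small validation check. The model names and the name of the special language-detection marker model are illustrative assumptions; the limits (at most 4 selected models, at most 3 languages) come from the documentation:

```python
MAX_MODELS = 4
LANGUAGE_DETECTION = "Language Detection"  # illustrative name for the special marker model

def validate_asr_models(models):
    """Check a policy's ASR model selection against the documented limits:
    up to 3 language models, plus the optional language-detection marker."""
    if len(models) > MAX_MODELS:
        return False
    languages = [m for m in models if m != LANGUAGE_DETECTION]
    return len(languages) <= 3

print(validate_asr_models(["English", "German", "French", LANGUAGE_DETECTION]))  # True
print(validate_asr_models(["English", "German", "French", "Spanish"]))           # False
```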

Only the Transcription (Risk Profiling) license supports language detection. In the case of Transcription (Profiling Speech) and Transcription (Profiling Speech Advanced), the users' roles define which ASR Model is available for a given user. In that case, the Data Management Policy must be set up to include the users' ASR Model; otherwise, the users' calls will not be processed by the policy.

Environment configuration

The Intelligent Voice Group ID should be set up for each tenant before using the integration.

Managing ASR models

ASR model management

Detailed User Guides

Using analytics search with Intelligent Voice

Using transcription and analytics in player


The Intelligent Voice integration has the following limitations:

  • A single conversation can only be transcribed once; once it is transcribed, there is no way to transcribe it again.
  • Automatic language detection for multi-language calls supports up to 3 languages per call and requires the configuration of each model upfront. A language that is not configured in the transcription policy cannot be detected.
  • Records migrated from WFO cannot be transcribed.