Microsoft Teams chat and channel archiving

Overview

Microsoft provides two sets of Graph APIs for archiving Microsoft Teams chat and channel messages. The Verba system supports both integrations. The following table compares the two integration options.


Feature | Webhook/DLP API | Export API
Capture
Internal chat (peer-to-peer and group) messages and files | Supported | Supported
External chat (peer-to-peer and group) messages and files

Webhook/DLP API: Supported. Files can only be archived if the chat is started by an internal party.

Export API: Supported. Files can only be archived if the chat is started by an internal party.

Internal channel messages and files | Supported | Supported
Internal meeting messages and files | Supported | Supported
External meeting messages and files

Webhook/DLP API: Supported. Files cannot be archived unless the meeting is hosted by an internal party.

Export API: Supported. Files cannot be archived unless the meeting is hosted by an internal party.

Private channel messages and files | Supported | Supported
Channel announcement | Supported | Supported
Replies | Supported | Supported
Reactions

Webhook/DLP API: Supported.

Export API: Supported. Reactions added after the query has returned the message are not archived, because of a known bug in the Export API that Microsoft has confirmed.

Emoticons | Supported | Supported
Animated GIFs, Stickers, Praises, and other rich content | Supported | Supported
Send email to channel | Supported | Supported
Loop components | Not supported | Not supported
OneNote | Not supported | Not supported
Participant join/leave events

Webhook/DLP API: Supported. Initially, the membership information is determined from periodic membership queries and/or message sender information. This information may not be fully accurate at all times, because VFC data is only as accurate as the information returned by the query. Once a conversation is being recorded, accurate participant information can be determined from join/leave system events.

Export API: Supported. Initially, the membership information is determined from periodic membership queries and/or message sender information, which provide the list of members at the time of the query. Once a conversation is being recorded, accurate participant information can be determined by combining the initial membership information received from the periodic membership queries with join/leave system events. Join/leave system events provide historical information and are used to verify changes in participant membership since the last membership information update.

Note: Until the data from the initial membership query is received, only partial participant information is available, determined from the system events and user activity. Accurate participant information can also be delayed by the delay in the ingestion of Teams chat and channel messages and by any throttling limits that apply to the API.

Selective capture

Webhook/DLP API: Supported for both chats and channels, with limitations: participant information is not always fully accurate (see Participant information below).

Export API: Chat: supported. Channel: supported, but the API only offers team-based queries (user-based queries are not available, so teams have to be configured as recorded extensions).

Participant information

Webhook/DLP API: Chat and channel membership information is collected by receiving join/leave system events and by periodically querying Graph API endpoints. The data is cached on the Media Recorders.

Export API: Chat and channel membership information is collected at the point when the Export API is queried. The accuracy of the chat and channel membership information has no effect on selective capture, so there is no data loss. However, because the queries are periodic, the membership information might not be accurately reflected in the database for the chat and channel conversations. See Participant join/leave events for more information.

Chat/channel name and description updates

Webhook/DLP API: Supported. Available through regular polling and system events.

Export API: Supported. Available through regular polling.

Disclaimer notification | Not supported | Not supported
Architecture
Integration with Microsoft Graph APIs

Webhook/DLP API: The Webhook/DLP API is a set of Microsoft Graph APIs that allow subscribing to change notification events for both chat and channel messages in a Teams tenant. The Webhook API based integration provides real-time capture of messages and attachments.

For more information, see https://docs.microsoft.com/en-us/graph/teams-changenotifications-chatmessage

The system utilizes other Graph APIs to collect additional information such as attachments, user information, and group membership.

Export API: The Export API is a set of Microsoft Graph APIs that allow querying both chat and channel messages for specific users and teams in a Teams tenant. The Export API based integration provides non-real-time capture of messages and attachments.

For more information, see https://docs.microsoft.com/en-us/microsoftteams/export-teams-content

The system utilizes other Graph APIs to collect additional information such as attachments, user information, and group membership.

Data segregation, access to regulated users' data only | Not supported. The webhook sends data for every user in the tenant, and this data is filtered only on the Media Recorders. The files in the file queue are encrypted automatically. | Supported
Load balancing for Recording Director

Webhook/DLP API: Supported via load balancers.

Export API: Supported by automatic allocation of archived users and teams to file queues.

Note: Recording Director and Media Recorder roles are always co-located for Export API based deployments.

Load balancing for Media Recorder | Supported via file queues
Failover for Recording Director

Webhook/DLP API: Supported via load balancers.

Export API: Supported by deploying standby servers.

Note: Recording Director and Media Recorder roles are always co-located for Export API based deployments.

Failover for Media Recorder | Supported by deploying standby servers
Scalability for Recording Director

Webhook/DLP API: Scales by adding more servers behind a load balancer.

Export API: Scales by adding more servers.

Note: Recording Director and Media Recorder roles are always co-located for Export API based deployments.

Scalability for Media Recorders | Scales by adding more servers
Possible data loss scenarios

Webhook/DLP API:
  • Microsoft retries sending events only a few times, so data can be lost once the retries are exhausted. The risk can be mitigated by deploying multiple Recording Directors behind a load balancer.
  • Data loss is possible if selective archiving is configured and the participant information is not up to date (see Participant information for more information).

Export API: No data loss, since the queries can be executed at any time. Microsoft stores messages for the defined retention period (see https://docs.microsoft.com/en-us/microsoftteams/retention-policies). Deleted messages are kept for 21 days only.

Multi-tenancy | Supported | Supported

Data duplication

Webhook/DLP API: No duplication. Messages and files are stored only once, even when there are multiple archived users in the same chat or channel.

Export API: Chat: data is duplicated, messages and files are stored once for each archived user in the chat. Channel: no duplication, messages and files are stored only once, even when there are multiple archived users in the same channel.

Export
Export

Webhook/DLP API: SMTP-based export only. User/participant-based or conversation/chat-based export.

Export API: SMTP-based export only. User/participant-based export only.

Licensing
Microsoft licensing

For licensing information, please refer to the following Microsoft knowledge base article, "Graph APIs for Teams Data Loss Prevention (DLP) and for Teams Export" section:

https://learn.microsoft.com/en-us/office365/servicedescriptions/microsoft-365-service-descriptions/microsoft-365-tenantlevel-services-licensing-guidance/microsoft-365-security-compliance-licensing-guidance#microsoft-purview-data-loss-prevention-graph-apis-for-teams-data-loss-prevention-dlp-and-for-teams-export

In addition to the user license requirements above, the owner of the application registration must define the licensing model for the deployment. Model A is required for Security and Compliance (S+C) and general usage scenarios. The licensing model is configurable in the VFC system.

For more information about seeded capacity and consumption fees, see https://docs.microsoft.com/en-us/graph/teams-licenses.

Limitations
Limitations

Webhook/DLP API: -

Export API: The following limitations should be considered when deploying the Export API based solution:

  • The Microsoft Export API returns only the latest version of a message at the time of capture. An edited message appears as (Edited) in the VFC web application, but the API does not return its full edit history.
  • The onBehalfOf attribute is missing for apps such as Forms, so the sender is not recognized.
  • Microsoft has no official SLA for the completeness of the records. We estimate a minimum of 15 minutes, but using the default 1-hour delay for the queries is recommended.
  • Participant join/leave times are not accurate (see above).
  • Reactions added after the query has returned the message are not archived, because of a known bug in the Export API that Microsoft has confirmed (see above).
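
The query delay mentioned in the limitations above can be illustrated with a short sketch. This is not the VFC implementation: the function names, the way the last queried time is tracked, and the one-hour default (taken from the text) are assumptions for illustration only. The URL shape follows the Microsoft Graph getAllMessages endpoint for a user's chats, with a lastModifiedDateTime filter and the model licensing parameter, as described in Microsoft's documentation at the time of writing; verify it against the current Graph reference before relying on it.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"  # public Microsoft Graph endpoint


def export_query_window(last_end, now, delay=timedelta(hours=1)):
    """Return (start, end) for the next query, or None if nothing is due yet.

    The window ends `delay` before `now`, so only messages that have had time
    to become complete on the Microsoft side are requested.
    """
    end = now - delay
    if end <= last_end:
        return None
    return last_end, end


def export_query_url(user_id, start, end, model="A"):
    """Build a chat export URL for one user (hypothetical helper).

    `start` and `end` are expected to be UTC datetimes.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    flt = (f"lastModifiedDateTime gt {start.strftime(fmt)} "
           f"and lastModifiedDateTime lt {end.strftime(fmt)}")
    return (f"{GRAPH_BASE}/users/{quote(user_id)}/chats/getAllMessages"
            f"?model={model}&$filter={quote(flt)}")
```

Keeping the end of each window one hour behind the current time means a message is queried only after the recommended delay has passed, at the cost of the capture never being real-time.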

Deploying Microsoft Teams chat and channel archiving

The following section contains all the necessary steps for setting up a Microsoft Teams chat and channel archiving infrastructure.

Server sizing

The IM recording architecture includes two server roles: Recording Director and Media Recorder. The two roles have different sizing figures, and different factors have to be taken into account for each. Since the Recording Director has a small footprint compared to the Media Recorder, the roles are usually not separated but deployed together as a single recorder.

Rule of thumb for server sizing

The following table shows the expected incoming message rates at different user numbers:


Usage profile | 1K Users | 10K Users | 100K Users
Average during the day* | 1.6 msg/s | 16.6 msg/s | 166.6 msg/s
Low message rate** | 2.7 msg/s | 27.7 msg/s | 277.7 msg/s
Medium message rate** | 4.1 msg/s | 41.6 msg/s | 416.6 msg/s
High message rate** | 6.9 msg/s | 69.4 msg/s | 694.4 msg/s

*Based on Slack usage statistics

**Based on Cisco IM/P sizing

Based on the statistics above, if the daily IM message volume has to be processed within 8 hours, a single recorder core can handle 13K users.
If it is enough to process the messages within 16 hours, a single recorder core can handle 26K users.
For processing closer to real time during peak hours, extra CPU cores can be added:

  • In the case of the real-time processing of the low message rate, a single CPU core can handle 8K users.
  • In the case of a medium message rate, a single CPU core can handle 5K users.
  • In the case of a high message rate, a single CPU core can handle 3K users.
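
The figures above can be reproduced with a short calculation. This is a sketch under stated assumptions, not an official sizing tool. It assumes that the table rates correspond to 6, 10, 15, and 25 messages per user per hour (which matches the 100K column), that the daily volume is the average rate sustained over an 8-hour working day, and that one recorder core processes 22 messages per second, the Media Recorder figure given later in this article. All function and constant names are illustrative.

```python
MEDIA_RECORDER_RATE = 22.0  # msg/s per CPU core, from the Media Recorder section

# Per-user hourly rates implied by the table (assumption: 166.6 msg/s for
# 100K users equals 6 messages per user per hour, and so on).
MSG_PER_USER_HOUR = {"average": 6, "low": 10, "medium": 15, "high": 25}
WORKDAY_HOURS = 8  # assumption: the average rate applies over an 8-hour day


def table_rate(profile, users):
    """Incoming messages per second for a usage profile and user count."""
    return MSG_PER_USER_HOUR[profile] * users / 3600


def users_per_core_batch(processing_hours):
    """Users one core can serve if a day's messages are processed in the given hours."""
    daily_per_user = MSG_PER_USER_HOUR["average"] * WORKDAY_HOURS
    capacity = MEDIA_RECORDER_RATE * processing_hours * 3600
    return capacity / daily_per_user


def users_per_core_realtime(profile):
    """Users one core can serve when keeping up with the profile's rate in real time."""
    return MEDIA_RECORDER_RATE * 3600 / MSG_PER_USER_HOUR[profile]
```

Under these assumptions, users_per_core_batch(8) gives 13,200 and users_per_core_batch(16) gives 26,400, while users_per_core_realtime gives 7,920, 5,280, and 3,168 for the low, medium, and high profiles. These agree with the rounded 13K, 26K, 8K, 5K, and 3K values in the text.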

For the requirements of other components and server roles, see Server sizing and requirements.

For the detailed sizing guidelines of the different Recording Server components, see the paragraphs below:

Recording Director

In the case of the Webhook/DLP API, the Recording Director component has to be sized based on the real-time incoming load. The minimum CPU requirement is 4 CPU cores. It can process 1500 messages every second with a single CPU core, and 6000 messages every second with 4 cores.

In the case of higher incoming loads, the network bandwidth also has to be considered. An incoming load of 1000 messages per second generates 48 Mbps of traffic (or 6 MB/s) between the Teams side and the Recording Director, and 42 Mbps of traffic (or about 5.25 MB/s) between the Recording Director and the file queue storage.
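
As a rough illustration, the Recording Director figures above can be combined into a single estimate. This is a minimal sketch, assuming that processing capacity scales linearly with cores (as the 1500 and 6000 messages per second figures suggest) and that bandwidth scales linearly with the message rate. The function and constant names are hypothetical.

```python
import math

RD_MSG_PER_CORE = 1500          # msg/s one Recording Director core can process
RD_MIN_CORES = 4                # minimum CPU requirement from the text
TEAMS_TO_RD_MBPS_PER_1000 = 48  # Mbps per 1000 msg/s, Teams side to Recording Director
RD_TO_QUEUE_MBPS_PER_1000 = 42  # Mbps per 1000 msg/s, Recording Director to file queue


def recording_director_estimate(peak_msg_per_s):
    """Estimate cores and bandwidth for a peak real-time load (linear scaling assumed)."""
    cores = max(RD_MIN_CORES, math.ceil(peak_msg_per_s / RD_MSG_PER_CORE))
    scale = peak_msg_per_s / 1000
    return {
        "cores": cores,
        "inbound_mbps": TEAMS_TO_RD_MBPS_PER_1000 * scale,
        "queue_mbps": RD_TO_QUEUE_MBPS_PER_1000 * scale,
    }
```

For example, a peak of 9000 messages per second yields 6 cores, 432 Mbps inbound, and 378 Mbps towards the file queue storage, while any load up to 6000 messages per second stays at the 4-core minimum.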

Media Recorder and SQL Server

The Media Recorder component does not have to be sized for real-time processing, since the recorded data is already stored in the file queue storage. Instead, the Media Recorder can be sized based on the total daily message count. If more messages arrive than the Media Recorder(s) can process in real time, the messages are inserted into the database later, so they also become available for search and replay through the web interface later. However, sufficient processing capacity should be provided so that the daily message load can be processed within 16 hours at most.

The minimum CPU requirement is 4 CPU cores. It can process 22 messages every second with a single CPU core. In the case of multiple Media Recorder servers, all servers must have the same number of cores.

The Recording Director and the Media Recorder components can be co-located on the same server. In this case, the resources will be shared between them.

The SQL Server has to be sized based on the fully utilized CPU cores of the Media Recorder server(s). The SQL Server needs 1.5 times as many CPU cores as the Media Recorder server(s). On the SQL Server physical disk, each fully utilized Media Recorder CPU core generates 100 IOPS.
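
The Media Recorder and SQL Server rules above can be sketched as a small calculation. It is illustrative only: it reads "one and a half times" as a factor of 1.5, treats every Media Recorder core as fully utilized (which gives an upper bound for the SQL Server figures), and uses the 16-hour processing target as the default. The names are hypothetical.

```python
import math

MR_MSG_PER_CORE = 22    # msg/s one Media Recorder core can process
MR_MIN_CORES = 4        # minimum CPU requirement from the text
SQL_CORE_FACTOR = 1.5   # assumption: SQL Server cores = 1.5 x Media Recorder cores
IOPS_PER_MR_CORE = 100  # SQL disk IOPS per fully utilized Media Recorder core


def media_recorder_estimate(daily_messages, processing_hours=16):
    """Estimate Media Recorder cores and matching SQL Server resources."""
    required_rate = daily_messages / (processing_hours * 3600)  # msg/s to sustain
    mr_cores = max(MR_MIN_CORES, math.ceil(required_rate / MR_MSG_PER_CORE))
    return {
        "mr_cores": mr_cores,
        "sql_cores": math.ceil(mr_cores * SQL_CORE_FACTOR),
        # Upper bound: assumes all Media Recorder cores are fully utilized.
        "sql_iops": mr_cores * IOPS_PER_MR_CORE,
    }
```

With 10 million messages a day and a 16-hour target, this gives 8 Media Recorder cores, 12 SQL Server cores, and 800 IOPS. Small loads fall back to the 4-core minimum.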

Load-balancing and Failover

Large deployments may require multiple VMs and other Azure components. In the case of the Webhook/DLP API, a load-balancer has to be placed in front of the Recording Servers (Recording Directors).

If the Recording Director and Media Recorder roles are separated, multiple Media Recorders can be deployed behind the Recording Director(s).

In the case of the Webhook/DLP API, only one of the Recording Director components writes to the file queues at a time, depending on which one receives the events from the Application Gateway. The other Recording Director(s) will be on standby.

In the case of the Export API, however, the active Recording Director components divide the user list equally among themselves, and each one only queries the chats of its own portion of the user list. A standby Recording Director becomes active only if an active one goes down. In that case, it takes over the user list portion of the server that went down.

The Media Recorder component works the same way regardless of the API being used. File queues are distributed between the active Media Recorders equally. Standby Media Recorders will become active only if an active Media Recorder goes down. In that case, it takes over the file queues of the server that went down.
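
The equal distribution and standby takeover described above, for user list portions among Recording Directors and for file queues among Media Recorders, can be sketched as follows. The source does not describe the actual allocation algorithm, so the round-robin split and the function names here are assumptions chosen for illustration.

```python
def distribute(items, active_servers):
    """Split items round-robin so that server shares differ by at most one item."""
    allocation = {server: [] for server in active_servers}
    for index, item in enumerate(items):
        allocation[active_servers[index % len(active_servers)]].append(item)
    return allocation


def fail_over(allocation, failed, standby):
    """Move the failed server's whole share to the first available standby server."""
    if failed not in allocation:
        raise KeyError(f"{failed} is not an active server")
    if not standby:
        raise RuntimeError("no standby server available")
    replacement = standby.pop(0)
    allocation[replacement] = allocation.pop(failed)
    return replacement
```

In this sketch the shares of the surviving servers are left untouched on failover, which mirrors the description that the standby takes over the portion of the server that went down rather than triggering a full redistribution.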

[Figure: Highly available setup with separated server roles]

Preparation

Make sure that all the required prerequisites are installed on each server prior to the installation.

For guidance on configuring the necessary firewall ports, see Firewall configuration for Microsoft Teams recording deployments.

Installation

The following articles contain all the steps for installing the various server roles:

Configuration

For chat and channel archiving, see Microsoft Teams chat and channel archiving.