Support real-time access to audio stream
This feature provides application code with real-time access to the audio stream, allowing the application to monitor raw audio frames captured from the microphone and/or sent to the speaker device, and to implement custom processing of these frames (e.g. to add real-time transcription).
OSX
Real-time access to the audio stream is supported through a new method that enables/disables audio monitoring for a particular stream direction, and a notification delegate that receives an audio frame data object holding the details of each frame.
@protocol GSEndpoint <NSObject>
/**
 Enable audio processing support for the requested stream type.
 @param streamType values: 0 - disable support; 1 - mic stream; 2 - speaker stream; 3 - both streams
 @returns result of the operation
 */
- (GSStatus) enableAudioMonitor:(int) streamType;
@end
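As a minimal usage sketch, assuming `endpoint` is an object conforming to GSEndpoint:

// Minimal sketch: `endpoint` is assumed to conform to GSEndpoint.
GSStatus status = [endpoint enableAudioMonitor:3];  // 3 = monitor mic and speaker
// ... frames are then delivered via the notification delegate below ...
[endpoint enableAudioMonitor:0];                    // stop monitoring entirely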
@protocol GSEndpointNotificationDelegate <NSObject>
/**
 Called when an audio frame is received.
 @see GSEndpointEvent
 */
- (void) audioFrameReceivedNotification:(GSAudioFrame*) audioFrame;
@end
/**
 Audio frame data structure supplied with the notification delegate.
 @field direction indicates the collected media stream type: mic or speaker
 @field samples holds an array of the media samples collected in the received frame
 @field length holds the count of samples stored in the array
 @field samplingFrequency holds the sampling frequency in Hz
 @field isStereo indicates whether the received frame contains stereo or mono data
 @see GSEndpointEvent
 */
@interface GSAudioFrame : NSObject {
@private
    int direction;
    NSArray *samples;
    int length;
    int samplingFrequency;
    bool isStereo;
}
@end
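A possible delegate implementation is sketched below. It is illustrative only: it assumes the GSAudioFrame fields above are exposed through matching read-only properties and that samples stores NSNumber-wrapped 16-bit PCM values.

@interface MyAudioObserver : NSObject <GSEndpointNotificationDelegate>
@end

@implementation MyAudioObserver
- (void) audioFrameReceivedNotification:(GSAudioFrame*) audioFrame {
    if (audioFrame.direction == 1) {   // 1 = capture (mic) stream
        int peak = 0;
        for (NSNumber *sample in audioFrame.samples) {
            peak = MAX(peak, ABS(sample.intValue));
        }
        NSLog(@"mic frame: %d samples @ %d Hz, peak %d",
              audioFrame.length, audioFrame.samplingFrequency, peak);
    }
}
@end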
.NET
Real-time access to the audio stream is added to the IExtendedService interface:
// Audio related
// 0 - processing disabled; 1 - mic stream; 2 - speaker stream; 3 - both streams
GsStatus EnableAudioMonitor(int streamType);
event EventHandler<EndpointEventArgs^>^ AudioFrameDelivered;
Audio frame data is incorporated into the endpoint event property dictionary,
EndpointEventArgs^ eventArgs;
IDictionary<String^, Object^>^ properties = eventArgs->Context->Properties;
where the properties dictionary will hold the audio frame data as key/value pairs:
("direction", direction); /* int */
("samples", samples[length]); /* int16_t samples[] */
("length", length); /* int */
("samplingFrequency", samplingFrequency); /* int */
("isStereo", isStereo); /* bool */
Detailed Description
When audio monitoring is enabled for a particular direction, it applies to the current session and all future sessions, until explicitly turned off. The streamType parameter is effectively a bit mask specifying the monitoring state of the capture device (least significant bit) and the playback device (second bit), with bit = 1 enabling monitoring and bit = 0 disabling it.
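For example (the constant names below are illustrative only, not part of the API):

// Illustrative constants for the streamType bit mask.
static const int kMicStream     = 1 << 0;  // bit 0: capture device
static const int kSpeakerStream = 1 << 1;  // bit 1: playback device

[endpoint enableAudioMonitor:kMicStream];                   // mic only (1)
[endpoint enableAudioMonitor:kMicStream | kSpeakerStream];  // both streams (3)
[endpoint enableAudioMonitor:0];                            // disable all monitoring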
A single notification callback is used for both audio streams, with
- the direction property indicating the stream type: 1 for the capture stream, 2 for the playback stream
- the samples array holding the audio data, a total of length 16-bit signed integers representing PCM samples; when isStereo is true, the data is in interleaved stereo format: L0,R0,L1,R1,... (see the de-interleaving sketch below)
- samplingFrequency indicating the number of samples per second, which depends on the codec and device in use (see below)
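Splitting an interleaved stereo frame back into per-channel buffers is straightforward; a plain C sketch (assuming the samples have already been copied into an int16_t buffer):

#include <stdint.h>

// De-interleave a stereo frame (L0,R0,L1,R1,...) into two channel buffers.
// `length` counts total samples, so each channel receives length / 2 of them.
void deinterleave(const int16_t *interleaved, int length,
                  int16_t *left, int16_t *right)
{
    for (int i = 0; i < length / 2; i++) {
        left[i]  = interleaved[2 * i];
        right[i] = interleaved[2 * i + 1];
    }
}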
For narrow-band codecs such as G.711 or G.729, the sampling frequency is always 8000 and stereo = false. For wide-band codecs, the sampling rate depends on the device capabilities and the codec used. Namely:
- the sampling rate of the captured stream is the maximum rate that both the codec and the device support; stereo is used only for the Opus codec, and only if the microphone device supports it
- the sampling rate and stereo status of the playback stream always follow the codec parameters (the rate can be as high as 48000 for the Opus codec, with stereo = true)
