Voices Obscured in Complex Environmental Settings (VOiCES)

VOiCES is a speech corpus recorded in acoustically challenging settings using distant microphones. Speech was recorded in real rooms with a range of acoustic characteristics (reverberation, echo, HVAC systems, outside noise, etc.). Adversarial noise (television, music, or babble) was played concurrently with the clean speech. Audio was captured by multiple microphones placed strategically throughout the room. The corpus includes audio recordings, orthographic transcriptions, and speaker labels.

Overview

Features and programs

Open Data Sponsorship Program

This dataset is part of the Open Data Sponsorship Program, an AWS program that covers the cost of storage for publicly available high-value cloud-optimized datasets.

Pricing

This is a publicly available data set. No subscription is required.

Usage information

Delivery details

AWS Data Exchange (ADX)

AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.

Open data resources

Available with or without an AWS account.

How to use: To access these resources, reference the Amazon Resource Name (ARN) using the AWS Command Line Interface (CLI).

Description: WAV audio files, orthographic transcriptions, and speaker IDs
Resource type: S3 bucket
Amazon Resource Name (ARN): arn:aws:s3:::lab41openaudiocorpus
AWS region: us-east-1
AWS CLI access (No AWS account required): aws s3 ls --no-sign-request s3://lab41openaudiocorpus/
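
The values listed above can also be assembled into the access command programmatically. The following is a minimal sketch that relies only on the ARN and the `--no-sign-request` form shown in this listing. The helper names `bucket_from_arn` and `build_ls_command` are hypothetical, and the optional `prefix` argument is an illustration, not a documented path inside the bucket.

```python
import shlex
from typing import List

# ARN copied from the listing above.
ARN = "arn:aws:s3:::lab41openaudiocorpus"


def bucket_from_arn(arn: str) -> str:
    """Extract the bucket name from an S3 ARN of the form arn:aws:s3:::<bucket>."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3":
        raise ValueError(f"not an S3 ARN: {arn!r}")
    # S3 bucket ARNs leave the region and account fields empty.
    # An object ARN would append "/<key>" after the bucket name.
    bucket = parts[5].split("/", 1)[0]
    if not bucket:
        raise ValueError(f"ARN has no bucket name: {arn!r}")
    return bucket


def build_ls_command(arn: str, prefix: str = "") -> List[str]:
    """Build an anonymous `aws s3 ls` argument list (hypothetical helper)."""
    bucket = bucket_from_arn(arn)
    return ["aws", "s3", "ls", "--no-sign-request", f"s3://{bucket}/{prefix}"]


if __name__ == "__main__":
    # Print the command instead of running it, so no network access is needed.
    print(shlex.join(build_ls_command(ARN)))
```

With the AWS CLI installed, the returned argument list could be passed to `subprocess.run(..., check=True)` to perform the listing. Printing the command first makes it easy to confirm that it matches the one given in the listing.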

Resources

Vendor resources

View this dataset on GitHub

Support

Contact

https://github.com/voices18/utilities/issues

Managed By

In-Q-Tel

How to cite

Voices Obscured in Complex Environmental Settings (VOiCES) was accessed on DATE from https://registry.opendata.aws/lab41-sri-voices .

License

Creative Commons Attribution 4.0 (CC BY 4.0)
