Our Services


Over the years, we have collected three large audio datasets with different characteristics. They can be used for training and improving machine learning and deep learning models, and we provide a pre-trained model for each dataset. Data collection was supported by the Korean government, and the data was double-checked by NIPA, a government agency. Each dataset is available for commercial or academic use at different prices. Detailed descriptions, statistics, and audio samples can be found on each dataset's page below.

Please do not hesitate to contact us with price inquiries at contact@deeplyinc.com.

Datasets Introduction

1. Nonverbal Vocalization Data

Non-verbal vocal data that contains no language. It covers a total of 16 types of nonverbal vocal sounds, including screams, laughter, cries, moans, and tickling sounds. The dataset comprises 57 hours of audio collected from 1,419 people, and its quality was confirmed through double inspection.

2. Parent-Child Vocal Interaction Data

Various conversations between parents and children. It consists of voice interaction data in 8 classes, such as talking, singing, and crying. A total of 282 hours of data was confirmed through double inspection. In particular, each sound was recorded on two types of cell phones (iPhone X, Samsung Galaxy S7) at distances of 0.4 m, 2.0 m, and 4.0 m. In addition, to account for the characteristics of the recording space, each piece of data was recorded in a room, a studio, or an anechoic chamber.

3. Emotional Speech Corpus

Voice data expressing various emotions. Sentences with positive, neutral, or negative meanings were recorded in a neutral tone, and other sentences were spoken with positive, neutral, or negative emotion. A total of 290 hours of data was confirmed through double inspection. As with the parent-child dataset, each sound was recorded on two types of cell phones (iPhone X, Samsung Galaxy S7) at distances of 0.4 m, 2.0 m, and 4.0 m, and each piece of data was recorded in a room, a studio, or an anechoic chamber.
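As a minimal sketch of how recordings annotated with these conditions (device, distance, environment) could be filtered, consider the snippet below. The field names, file paths, and sample entries are illustrative assumptions, not the actual dataset schema.

```python
# Illustrative sample metadata; the keys and values are assumptions,
# not the real dataset format.
samples = [
    {"path": "a.wav", "device": "iPhone X", "distance_m": 0.4, "env": "studio"},
    {"path": "b.wav", "device": "Galaxy S7", "distance_m": 2.0, "env": "room"},
    {"path": "c.wav", "device": "iPhone X", "distance_m": 4.0, "env": "anechoic"},
]

def filter_samples(samples, device=None, distance_m=None, env=None):
    """Return samples matching every condition that is not None."""
    conds = {"device": device, "distance_m": distance_m, "env": env}
    return [
        s for s in samples
        if all(v is None or s[k] == v for k, v in conds.items())
    ]

# Example: close-range iPhone X recordings.
close_iphone = filter_samples(samples, device="iPhone X", distance_m=0.4)
print([s["path"] for s in close_iphone])  # -> ['a.wav']
```

Selecting subsets by recording condition like this makes it easy to test how a model behaves across devices, distances, and room acoustics.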


These datasets are already being used by large companies, research institutes, and universities to improve speech, nonverbal-sound, and emotion AI analysis.

More information can also be found at the following links.

* github link

* blog post
