UnMute: Opening Spoken Language Interaction to the Currently Unheard

Overview

The UnMute Toolkit is a collection of tools, methodologies and pipelines tailored to minority language technology design and development. It contains components to engage community members, collect spoken language recordings, train information retrieval models and deploy those models in community contexts.

Components

All of the technologies developed through the UnMute project are open source and free to download, use and adapt.

SpeechBox

The SpeechBox enables community members with little technological exposure to contribute their experiences about a topic of interest to a wider public audio collection. Located in public spaces, the device requires no prior training or additional devices to be used, and allows community members to share verbal narratives at the touch of a button.

Build a SpeechBox

Data Gathering

Data gathering can be challenging when working across diverse environments and levels of technology experience. We describe how two existing, widely used smartphone-based tools can be used effectively for this purpose.

Read about Data Gathering methods

TranscriptTool

The TranscriptTool is a bespoke mobile application that engages community members directly in annotating community-generated speech data. The goal is to acquire accurate and thorough transcriptions of spoken content.

Use the TranscriptTool

VoiceServer and RetrievalApp

The VoiceServer information retrieval system is designed to allow speech-based search over a collection of user-contributed photos. To work with limited data, it is split into two components: a phone recogniser and a ranker. It is paired with the RetrievalApp – a simple Android application that allows users to interact with stored information using voice search.
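To illustrate the two-stage split described above, here is a minimal Python sketch of how a ranker might order stored photos once a phone recogniser has produced a phone sequence for a spoken query. It is not the VoiceServer implementation: the function names (edit_distance, rank_items), the photo identifiers, the phone symbols and the choice of normalised edit distance as the similarity measure are all assumptions made for illustration, and the recogniser itself is represented only by its hypothetical output.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, phone_a in enumerate(a, start=1):
        curr = [i]
        for j, phone_b in enumerate(b, start=1):
            cost = 0 if phone_a == phone_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def rank_items(query_phones: Sequence[str],
               index: Dict[str, List[str]],
               top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank stored items by similarity to the query phone sequence.

    Similarity is 1 minus the length-normalised edit distance
    (an illustrative choice, not the project's actual scoring).
    """
    scored = []
    for item_id, phones in index.items():
        longest = max(len(query_phones), len(phones), 1)
        similarity = 1.0 - edit_distance(query_phones, phones) / longest
        scored.append((item_id, similarity))
    # Highest similarity first; ties broken by item id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical index: each photo is described by a phone sequence,
# e.g. obtained from a spoken description recorded by a contributor.
index = {
    "photo_market.jpg": ["m", "a", "r", "k", "e", "t"],
    "photo_river.jpg": ["r", "i", "v", "e", "r"],
    "photo_school.jpg": ["s", "k", "u", "l"],
}

# Hypothetical recogniser output for a spoken query, with one phone missed.
query = ["m", "a", "k", "e", "t"]
results = rank_items(query, index, top_k=2)
```

In this sketch the market photo ranks first even though the recogniser dropped a phone, which shows why matching at the phone level with a tolerant ranker can be useful when there is too little data to train a full word-level recogniser.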

Deploy the VoiceServer

Build the RetrievalApp
