The UnMute Toolkit

Open-source tools for minority language technology design and development


The UnMute Toolkit is a collection of tools, methodologies and pipelines tailored to minority language technology design and development. It contains components to engage community members, collect spoken language recordings, train information retrieval models and deploy those models in community contexts.


All of the technologies developed through the UnMute project are open source and freely available to download, use and adapt.


SpeechBox

The SpeechBox enables community members with little technological exposure to contribute their experiences of a topic of interest to a wider public audio collection. Located in public spaces, the device requires no prior training and no additional devices, and allows community members to share verbal narratives at the touch of a button.

Build a SpeechBox

Data Gathering

Data gathering can be challenging when working across diverse environments and levels of technology experience. We describe how two existing, widely used smartphone-based tools can be used effectively for this purpose.

Read about Data Gathering methods


TranscriptTool

The TranscriptTool is a bespoke mobile application that engages community members directly in annotating community-generated speech data. The goal is to acquire accurate and thorough transcriptions of spoken content.

Use the TranscriptTool

VoiceServer and RetrievalApp

The VoiceServer information retrieval system is designed to allow speech-based search over a collection of user-contributed photos. It is split into two components in order to work with limited data: a phone recogniser and a ranker. It is paired with the RetrievalApp – a simple Android application that allows users to interact with stored information using voice search.

Deploy the VoiceServer

Build the RetrievalApp
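To illustrate the split into a phone recogniser and a ranker, here is a minimal Python sketch of the ranking stage only. It is not the VoiceServer implementation. It assumes (hypothetically) that the recogniser has already turned a spoken query into a sequence of phone symbols, that each photo is indexed by a phone sequence derived from its spoken description, and that a normalised edit distance is an acceptable similarity measure. The names `edit_distance`, `rank_photos`, the photo identifiers and the phone symbols are all invented for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, phone_a in enumerate(a, start=1):
        curr = [i]
        for j, phone_b in enumerate(b, start=1):
            cost = 0 if phone_a == phone_b else 1
            curr.append(min(
                prev[j] + 1,         # delete a phone from `a`
                curr[j - 1] + 1,     # insert a phone from `b`
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def rank_photos(
    query_phones: Sequence[str],
    index: Dict[str, Sequence[str]],
) -> List[Tuple[str, float]]:
    """Return (photo_id, score) pairs, best match first.

    The score is the edit distance divided by the longer sequence
    length, so 0.0 is an exact match and 1.0 shares nothing.
    """
    scored = []
    for photo_id, phones in index.items():
        longest = max(len(query_phones), len(phones), 1)
        scored.append((photo_id, edit_distance(query_phones, phones) / longest))
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical index: photo id -> phones from its spoken description.
    index = {
        "photo_river.jpg": ["n", "o", "d", "i"],
        "photo_market.jpg": ["b", "a", "z", "a", "r"],
    }
    # Hypothetical recogniser output for a spoken query.
    for photo_id, score in rank_photos(["n", "o", "d", "e"], index):
        print(photo_id, round(score, 2))
```

Matching on phone sequences rather than words means a sketch like this needs no word-level vocabulary or language model, which is one plausible reason a two-stage design suits settings with limited data.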


A launch event for the UnMute Toolkit was held in collaboration with the Centre for Linguistic Science and Technology at the Indian Institute of Technology Guwahati in Guwahati, India, on Friday, January 5th, 2024. The event brought together interdisciplinary researchers, NGOs, industrial partners, and minority language speakers to demonstrate the toolkit and reflect on the lessons learned in developing it.