UnMute Toolkit

Applications, methodologies, and pipelines for minority language technology design and development

Register your interest

We would like to invite you to the UnMute Toolkit launch event at IIT Guwahati on Friday 5th January 2024. The event has two aims: (1) to demonstrate and disseminate the UnMute Toolkit for spoken language technology development; and (2) to provide opportunities for networking, collaboration, and idea development around spoken language technologies, and to match these ideas to national and international funding.

The Toolkit

The UnMute Toolkit is an open-source collection of tools, methodologies, and pipelines tailored to speakers of unwritten, minority languages. It contains components to engage community members, collect spoken language recordings, train information retrieval models, and deploy those models in community contexts. The toolkit has been designed to involve community members in the co-design of information retrieval use-cases without any existing digital language resources. Interactive information retrieval systems can be developed from scratch using the toolkit. They are seeded with whatever spoken language data is available, however little (e.g. <4h), and require no transcription or transliteration.

The Launch

The UnMute Toolkit will be launched in collaboration with the Centre for Linguistic Science and Technology at the Indian Institute of Technology Guwahati in Guwahati, India, on Friday 5th January 2024.

During this launch, we will bring together interdisciplinary researchers, NGOs, industrial partners, and minority language speakers. We will demonstrate the toolkit and reflect on the lessons learned while developing it. These lessons will serve as a case study to stimulate discussion among attendees: to identify technical, social, and participatory challenges; to form new interdisciplinary partnerships to tackle those challenges; and to match these partnerships to national and international funding opportunities.

Event Details

The event is free to attend, and will include meals and refreshments. Attendees will need to fund their own travel and other costs.

Outline agenda — 5th January 2024

This is a day-long event. The first and last sessions will focus on the toolkit and provide an interactive demonstration of how it can be used to engage communities and develop a rudimentary information retrieval system using only spoken language samples collected on the day. Between these sessions, we will network and collaborate on research agendas for the future of speech technologies in India and how these can better support the needs and functions of diverse minority language communities.

Frequently Asked Questions

What are minority languages?

In India, minority languages are those languages spoken by Linguistic Minorities, which at the State level means any group or groups of people whose mother tongues are different from the principal language of the State, and at the district and taluka/tehsil levels, different from the principal language of the district or taluka/tehsil.

What is the UnMute toolkit?

The UnMute Toolkit is an open-source collection of tools, methodologies, and pipelines tailored to speakers of unwritten, minority languages. It contains components for creating language models, voice search, mobile apps for data collection and analysis, and a methodology handbook.

Is the toolkit an open-source resource?

Yes. The UnMute Toolkit is an open-source collection of tools, methodologies, and pipelines.

How will the toolkit help language community members? How will community members benefit from this event?

Speech recognition technology is often created without much involvement from community members who provide the speech and/or text data that is critical to the process. Our approach puts community members at the centre of speech technology work, making sure they are involved in the design and development process, and are the primary beneficiaries.

Are any follow-up activities planned?

We will be running additional events in mid-2024, and we are always open to discussion and collaboration. Please get in touch at: hi@unmute.tech

What are the ethical considerations with respect to the collected data? How will attribution work? Who would own the data?

Ethical considerations about data collection and use are at the centre of this work. Our fundamental belief is that communities should be the owners of their own data, but we also believe that this topic should be discussed further. There will be a session on this important aspect during the event.

About Your Hosts

The Centre for Linguistic Science and Technology is an interdisciplinary centre at the Indian Institute of Technology Guwahati that focuses on the minority languages of India and on empowering communities in the North East (NE) region.

UnMute is a collaborative EPSRC-funded project between the University of Edinburgh, Swansea University, and Studio Hasi, aiming to address the limitations of today’s speech and voice-based interactions and open up intelligent interfaces to the currently digitally ‘unheard’.


If you have any questions about the event or toolkit, please do not hesitate to get in touch with us at: launch@unmute.tech