With globalisation increasing and more people moving abroad, learning a new language in an affordable and fun way is becoming more and more important. Parlangi is an e-learning provider that connects native speakers and learners of a specific language. They are invited to talk over a video-call platform so that the learner can improve their language skills.
As of now, Parlangi suggests a different conversation topic every 10 minutes to engage the speakers. Although this approach helps keep the conversation going, it has a few potential drawbacks, as it can either be:
The goal of this project is to enhance the conversation topic suggestion feature and make it more dynamic. This can help provide a more fun and satisfying experience for the users of the platform.
To do so, both the frequency and duration of silences in the conversation and the speech frequency of the individual speakers need to be determined. This information can be used to quantify the speakers' overall level of engagement and to suggest a topic switch at an appropriate time.
One way to approach this problem is to apply speaker diarization techniques to the raw audio recording. Speaker diarization aims to answer the question “who spoke when”. This makes it possible to detect both the ‘speech’ moments of the individual speakers and the ‘silence’ segments, as illustrated below.
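As a rough illustration of how diarization output could feed these measurements, the sketch below takes a list of speaker segments and derives the silence gaps and the speaking time per speaker. It is a minimal sketch under stated assumptions, not Parlangi's or ML6's implementation: the `Segment` type, the `silence_gaps` and `speaking_time` helpers, the `min_gap` parameter, and the sample values are all hypothetical, and the segments are assumed to come from some upstream diarization step.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One diarized speech turn (hypothetical structure, times in seconds)."""
    speaker: str
    start: float
    end: float


def silence_gaps(segments, total_duration, min_gap=0.0):
    """Return (start, end) intervals longer than min_gap in which nobody speaks."""
    gaps = []
    cursor = 0.0  # end of the speech seen so far
    for seg in sorted(segments, key=lambda s: s.start):
        if seg.start - cursor > min_gap:
            gaps.append((cursor, seg.start))
        # max() handles overlapping speech from two speakers
        cursor = max(cursor, seg.end)
    if total_duration - cursor > min_gap:
        gaps.append((cursor, total_duration))
    return gaps


def speaking_time(segments):
    """Return the total speaking time in seconds for each speaker."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


# Made-up example: a 12-second recording with two speakers.
segments = [
    Segment("native", 0.0, 4.0),
    Segment("learner", 5.0, 7.0),
    Segment("native", 6.5, 9.0),
]
gaps = silence_gaps(segments, total_duration=12.0, min_gap=0.5)
talk = speaking_time(segments)
silence_ratio = sum(end - start for start, end in gaps) / 12.0
```

A long trailing gap or a high silence ratio could then serve as one possible signal that a new topic is due. The 0.5-second threshold here is an arbitrary assumption that would need tuning on real conversations.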
This internship is not only a great way to apply your skills in audio and edge computing but also a chance to do good. You can round off your internship with a blog post in which you share your learnings and how you helped Parlangi and its users by improving their experience of learning a new language.
You can get a head start on this project, as some work has already been done. Many libraries already implement diarization pipelines for audio recordings, and ML6 has done some initial exploration of them. However, there is still much work to be done to put this tool into practice.
During this internship you will:
The duration of the internship is flexible and depends on the candidate's preference and the project requirements. The estimated duration for this specific project is 6-8 weeks:
Our internships and theses are linked to our chapters. A chapter is a cross-squad team of experts in a specific topic that enables knowledge building and sharing across projects. The chapters build knowledge by performing applied research and gathering learnings from projects. This internship falls under the Speech/Audio working group, which is part of the Natural Language Processing (NLP) chapter.
Thomas Dehaene: Chapter Lead
Lisa Becker: Machine Learning Engineer and Speech Working Group Lead (daily supervisor)