The NLP Chapter is the ML6 special division on all things related to Natural Language Processing.
We tackle the main areas of Natural Language Processing, including information extraction, speech recognition and sequence-to-sequence modelling, under the roof of a single internal powerhouse.
We are machine learning engineers at heart. We build and deploy tools, demos and boilerplates to bootstrap ML6 project work and showcase the value of Natural Language Processing in a wide variety of areas.
BUILD
NLP research is happening at dazzling speed. By exploring trending and relevant AI papers and topics, we keep up to date with the latest and greatest in our field.
No transformer left behind!
SHARE
We love to show and share our work. Head over to our use cases to see what we’re up to or have fun with one of our NLP-powered demos.
Come check us out on Hugging Face and GitHub as well!
Demos
Terms & Conditions Summarizer 📝
Apply state-of-the-art extractive and abstractive summarization to website Terms of Service to get a quick and concise overview of the main points.
With increasing globalisation and the growing number of people moving abroad, learning a new language in an affordable and fun way is becoming more and more important. Parlangi is an e-learning provider that connects native speakers and learners of a specific language. They are invited to talk over video calls on the platform so that the learner can improve their language skills.
Currently, Parlangi suggests a new conversation topic every 10 minutes to keep the speakers engaged. Although this approach helps keep the conversation going, it has a few potential drawbacks, as it can be either:
Intrusive: A disruption of the flow of conversation can occur if both speakers are engaging in the topic of conversation and a topic switch pops up.
Infrequent: It might be preferable to suggest a topic switch early on if the topic of conversation is not engaging both speakers (the call is dominated by silence) or if the topic is engaging only the native speaker (which defeats the purpose of the video call, namely to improve the non-native speaker's speaking skills).
Goal
The goal of this project is to enhance the conversation topic suggestion feature and make it more dynamic. This can help provide a more fun and satisfying experience for the users of the platform.
To do so, both the frequency and duration of silences in the conversation and the speaking frequency of each individual speaker need to be determined. This information can be used to quantify the speakers' overall level of engagement and to suggest a topic switch at an appropriate time.
One method of approaching this problem is by applying speaker diarization techniques on the raw audio recording. Speaker diarization aims at answering the question “who spoke when”. With that, it is feasible to detect both the ‘speech’ moments of the individual speakers as well as the ‘silence’ segments as illustrated below.
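As a rough illustration of how diarization output could be turned into the silence and speech measurements described above, here is a minimal Python sketch. It assumes the diarization step yields a list of (speaker, start, end) segments in seconds; the `Segment` class, the `engagement_stats` function and the returned field names are hypothetical and not part of any specific diarization library or of the Parlangi platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One diarized speech turn (hypothetical structure, times in seconds)."""
    speaker: str
    start: float
    end: float


def engagement_stats(segments, window_start, window_end):
    """Summarise silence and per-speaker talk time inside one time window."""
    if window_end <= window_start:
        raise ValueError("window_end must be greater than window_start")

    # Keep only the parts of each segment that fall inside the window.
    clipped = sorted(
        (max(s.start, window_start), min(s.end, window_end), s.speaker)
        for s in segments
        if s.end > window_start and s.start < window_end
    )

    # Total speaking time per speaker.
    talk_time = {}
    for start, end, speaker in clipped:
        talk_time[speaker] = talk_time.get(speaker, 0.0) + (end - start)

    # Walk the timeline; any gap between speech turns counts as silence.
    silences = []
    cursor = window_start
    for start, end, _ in clipped:
        if start > cursor:
            silences.append(start - cursor)
        cursor = max(cursor, end)  # handles overlapping speech
    if cursor < window_end:
        silences.append(window_end - cursor)

    return {
        "talk_time": talk_time,
        "silence_count": len(silences),
        "silence_total": sum(silences),
        "silence_ratio": sum(silences) / (window_end - window_start),
    }
```

For example, calling `engagement_stats(segments, 0.0, 60.0)` on the segments of the last minute would give the number and total duration of silences plus each speaker's talk time for that minute. How short a pause should still count as silence is left open here and would need tuning on real recordings.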
This internship isn’t only a great way to leverage your skills working with audio and edge computing but also a chance to do good. Your internship can be rounded off with a blog post where you share your learnings and how you helped Parlangi and its users by improving their experience of learning a new language.
Methodology
You can get a head start on this project, as some work has already been done. Many libraries already implement pipelines to diarize audio recordings, and ML6 has done some initial exploration of them. However, there is still much work to be done to put this tool into practice.
An appropriate diarization framework has to be chosen for this task. Trade-offs between accuracy, detection speed and resource usage need to be considered.
An extension tool has to be implemented to access the audio stream from the open source video platform.
The results of the diarization algorithm will be used in a control loop algorithm that proposes conversational topics dynamically.
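One possible shape for the decision step of such a control loop is sketched below in Python. It is only an illustration under stated assumptions: the function name `topic_switch_reason`, its parameters and all threshold values are hypothetical, and the rule that an engaged conversation is never interrupted is a design assumption made here to address the "intrusive" drawback, not a description of Parlangi's behaviour.

```python
from typing import Mapping, Optional


def topic_switch_reason(
    talk_time: Mapping[str, float],
    silence_ratio: float,
    learner: str,
    seconds_on_topic: float,
    *,
    max_silence_ratio: float = 0.5,      # assumed threshold
    min_learner_share: float = 0.3,      # assumed threshold
    min_seconds_on_topic: float = 60.0,  # assumed grace period
) -> Optional[str]:
    """Return why a new topic should be suggested, or None to keep going."""
    # Give every topic a short grace period before judging it.
    if seconds_on_topic < min_seconds_on_topic:
        return None

    # The conversation is dominated by silence.
    if silence_ratio > max_silence_ratio:
        return "silence"

    # Only the native speaker is talking, so the learner is not practising.
    total_talk = sum(talk_time.values())
    learner_share = talk_time.get(learner, 0.0) / total_talk if total_talk > 0 else 0.0
    if learner_share < min_learner_share:
        return "learner_quiet"

    # Both speakers are engaged: do not interrupt the flow.
    return None
```

In a running call, this check would be evaluated periodically on the measurements from the most recent window, and a new topic would be proposed whenever it returns a reason instead of `None`.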
Implement an extension tool to access the raw audio data from the video platform.
Develop an end-to-end solution for dynamic topic suggestion and integrate it with the Parlangi platform.
Write a blog post summarising your work.
Do some good!
Profile / Required skills
Strong analytical abilities, knowledge of different statistical methods and a familiarity with research studies.
Working experience in Java development to build a tool that interfaces with the video platform.
Strong interest in Speech/Audio processing [preferred].
Familiarity with tools like Python.
Excellent verbal and written communication in English.
You are currently pursuing a degree in computer science or a related field.
Internship Duration
The duration of the internship is flexible and depends on the candidate's preference and the project requirements. The estimated duration for this specific project is 6-8 weeks:
Week 1: Getting familiar with SoTA diarization algorithms and the open source video platform.
Week 2-3: Build a tool that interfaces with the video platform to get an audio stream.
Week 4-5: Integrate the diarization algorithm with the audio stream and build the control flow logic for the dynamic topic suggestions.
Week 6: Validate the results of the algorithm and write a blog post.
Chapters
Our internships and theses are linked to our chapters. A chapter is a cross-squad team of experts in a specific topic to enable knowledge building and sharing across projects. The chapters build knowledge by performing applied research and gathering learnings from projects. This internship falls under the Speech/Audio working group which is part of the Natural Language Processing (NLP) chapter.
Supervisors
Thomas Dehaene: Chapter Lead
Lisa Becker: Machine Learning Engineer and Speech Working Group Lead (daily supervisor)