Extractive text summarization

Project Description 

The general trend towards automation means that more and more use cases involving large amounts of data are being automated. The reasons are clear: these tasks are often repetitive, time-consuming, and prone to human error, which makes them natural candidates for automation. One such task remains largely manual, however: reading. We can’t automate reading, but we can make it faster by highlighting the key information in a text.

The goal of this project is to develop an algorithm that highlights the most important information in a document. How we define "important information" requires some creativity (for example, sentences that summarise the text or sentences that are unexpected). Concretely, we see a major use case in legal texts, where we can also exploit the repetitive nature of such documents (e.g., rental contracts are 90% the same because they need a certain legal structure). We are open to suggestions if another field interests you more and the approach could have a big impact there as well.
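To make the idea concrete, here is a minimal sketch of one possible baseline, assuming "important" simply means "contains words that occur often in the document". This is only an illustration, not the method the project prescribes: the function name highlight, the STOPWORDS set, and the parameter k are hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list used only for this sketch.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "was", "are"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def highlight(text, k=2):
    """Return the k highest-scoring sentences, in their original order.

    A sentence's score is the average document-wide frequency of its words,
    which is one simple (assumed) definition of importance among many.
    """
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the top k, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sample = (
        "The tenant pays rent monthly. "
        "The weather was nice. "
        "The tenant must pay rent to the landlord."
    )
    for sentence in highlight(sample, k=2):
        print(sentence)
```

For the "unexpected sentences" notion of importance, the same scoring function could be swapped for one that rewards words that are rare relative to a collection of similar documents, such as other rental contracts; that variant is left out here because it depends on data this description does not specify.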

Methodology / Tasks

During this internship, you will:

  • Gain experience leveraging state-of-the-art NLP techniques.
  • Conduct applied research to solve a concrete real-world problem.
  • Let your creative skills loose on a major problem for many companies in a wide range of domains.

Profile / Required skills

  • Strong analytical abilities, knowledge of different statistical methods, comfort with mathematics, and familiarity with research studies.
  • Strong interest in NLP, Computer Vision, or a related subdomain [preferred].
  • Familiarity with statistical analysis languages and tools such as Python and SQL.
  • Excellent verbal and written communication in English.
  • You are currently pursuing a degree in computer science or a related field.

Internship Duration

The duration of the internship is flexible and depends on the candidate's preference and the project requirements. The typical duration is 6 to 8 weeks. The preferred duration for this specific project is 6 weeks.