Anomaly detection in text

Project Description 

Anomaly detection has recently become a major topic within the field of AI. It has gained particular traction in computer vision, with use cases such as defect detection and predictive maintenance, and in structured data, with use cases such as fraud detection and spam filtering. In NLP, however, comparable techniques remain understudied despite their potential.

The goal of this project is to combine different NLP techniques into an algorithm that accurately highlights anomalous words and/or sentences, i.e. content you would not expect to appear in a given document. Such a system would likely exploit the repetitive nature of certain types of documents (e.g., rental contracts are about 90% the same because they need a certain legal structure). If successful, the algorithm could have high-impact use cases in sectors such as legal services and insurance. The concrete approach would be to develop the system on legal documents, but we are open to suggestions if another field interests you more and could see a similarly big impact.
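To make the idea of exploiting repetitive document structure concrete, here is a minimal sketch in Python. It is not the project's method, only a simple baseline under stated assumptions: each sentence of a new document is compared with every sentence from a set of reference documents of the same type using bag-of-words cosine similarity, and a sentence is flagged when even its closest reference sentence is dissimilar. All function names, the naive sentence splitter, and the 0.5 threshold are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def anomaly_scores(reference_docs, document):
    """Score each sentence as 1 minus its best similarity to any reference sentence."""
    ref_vectors = [
        Counter(tokenize(sentence))
        for doc in reference_docs
        for sentence in split_sentences(doc)
    ]
    scores = []
    for sentence in split_sentences(document):
        vec = Counter(tokenize(sentence))
        best = max((cosine(vec, ref) for ref in ref_vectors), default=0.0)
        scores.append((sentence, 1.0 - best))
    return scores


def flag_anomalies(reference_docs, document, threshold=0.5):
    """Return sentences whose anomaly score exceeds the (assumed) threshold."""
    return [s for s, score in anomaly_scores(reference_docs, document) if score > threshold]


# Toy example: two made-up reference contracts and one contract with an unusual clause.
references = [
    "The tenant shall pay the rent on the first day of each month. "
    "The landlord shall maintain the property in good repair.",
    "The tenant shall pay the rent on the first day of every month. "
    "The landlord shall keep the property in good repair.",
]
contract = (
    "The tenant shall pay the rent on the first day of each month. "
    "The tenant waives all rights to legal recourse. "
    "The landlord shall maintain the property in good repair."
)
flagged = flag_anomalies(references, contract)
print(flagged)
```

In this toy example only the waiver clause is flagged, because the other two sentences appear verbatim in a reference contract. A realistic system would likely replace the word counts with stronger representations (for example, learned sentence embeddings) and would need a principled way to choose the threshold.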

Methodology / Tasks

During this internship, you will:

  • Gain experience leveraging state-of-the-art NLP techniques.
  • Conduct applied research to solve a concrete real-world problem.
  • Let your creative skills loose on a major problem for many companies in a wide range of domains.

Profile / Required skills

  • Strong analytical abilities, knowledge of different statistical methods, comfort with mathematics, and familiarity with research studies.
  • Strong interest in NLP or a related AI subdomain [preferred]
  • Familiarity with statistical analysis languages and tools such as Python and SQL.
  • Excellent verbal and written communication in English.
  • You are currently pursuing a degree in computer science or a related field.

Internship Duration

The duration of the internship is flexible and depends on the candidate's preference and the project requirements. The typical duration is 6 to 8 weeks. The preferred duration for this specific project is 8 weeks.