In recent years, anomaly detection has become a major topic in AI. It has gained particular traction in computer vision, with use cases such as defect detection and predictive maintenance, and in structured data, with use cases such as fraud detection and spam filtering. However, comparable techniques for NLP remain understudied despite their potential.
The goal of this project is to combine different NLP techniques into an algorithm that can accurately highlight anomalous words and/or sentences in a document, meaning content you would not expect to appear in that type of document. Such a system would likely exploit the repetitive nature of certain document types (e.g., rental contracts are about 90% the same because they must follow a certain legal structure). If successful, the algorithm could have high-impact applications in fields such as law and insurance. Concretely, we propose developing the system on legal documents, but we are open to suggestions if another field interests you more and the system could have a big impact there as well.
During this internship, you will:
The duration of the internship is flexible and depends on the candidate's preferences and the project's requirements. The typical duration is 6 to 8 weeks; the preferred duration for this project is 8 weeks.