Audio & NLP Lab – Information Extraction from Unstructured Data – Department of Electronics and Computer Engineering

Project by: Aayush Lammichhane, Aayush Neupane, Ankit Paudel, Ashish Lamsal

This project develops a system that extracts key information from unstructured data such as documents and images. The tool uses an annotation-based pipeline: users annotate documents to prepare training data and train machine learning models on those annotations, then use the trained models to convert new documents into structured data.

It incorporates OCR technology to extract text from images and enables users to define specific fields for data extraction through manual annotations. After completing annotations, users can train models to automatically extract similar information from new documents. The processed data can be reviewed, corrected, and exported in CSV or JSON formats for further use.
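The annotate-then-export flow described above can be sketched as follows. This is a minimal illustration, not EaseAnnotate's actual code. The `Annotation` schema, the `(x0, y0, x1, y1)` bounding-box format, and the helper names `to_record`, `export_csv` and `export_json` are all hypothetical, and OCR and model training are omitted. The sketch shows how OCR text spans labelled with user-defined fields might be collapsed into one structured record per document and then written out as CSV or JSON using only the Python standard library.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass
class Annotation:
    """One user-labelled span of OCR text (hypothetical schema)."""
    field: str                       # user-defined field name, e.g. "invoice_number"
    text: str                        # OCR text covered by the annotation
    bbox: tuple[int, int, int, int]  # assumed (x0, y0, x1, y1) in page pixels


def to_record(annotations: list[Annotation]) -> dict[str, str]:
    """Collapse one document's annotations into a single structured record.

    Spans labelled with the same field are joined with a space,
    in the order the annotations appear in the list.
    """
    parts: dict[str, list[str]] = {}
    for ann in annotations:
        parts.setdefault(ann.field, []).append(ann.text)
    return {field: " ".join(texts) for field, texts in parts.items()}


def export_json(records: list[dict[str, str]]) -> str:
    """Serialize records as a JSON array of objects."""
    return json.dumps(records, indent=2)


def export_csv(records: list[dict[str, str]]) -> str:
    """Serialize records as CSV; missing fields become empty cells."""
    # The union of all field names, in first-seen order, forms the header row.
    fieldnames: list[str] = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    # Example: two OCR spans annotated as "vendor" merge into one value.
    doc = [
        Annotation("invoice_number", "INV-0042", (40, 30, 180, 55)),
        Annotation("vendor", "Acme", (40, 70, 120, 95)),
        Annotation("vendor", "Supplies", (125, 70, 230, 95)),
    ]
    records = [to_record(doc)]
    print(export_csv(records))
    print(export_json(records))
```

Joining spans in annotation order keeps the example short. A real tool would more likely order spans by their position on the page (for example, by the top-left corner of each bounding box) so that multi-word fields read correctly regardless of the order in which they were labelled.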

The goal is to automate and simplify information retrieval from unstructured sources, reducing manual data entry. Applications include invoice processing, form extraction, and digitizing documents, benefiting industries like finance, healthcare, and logistics.

Final Product
EaseAnnotate: an OCR-based tool for annotating documents, training extraction models, and reviewing and exporting the extracted data, designed to reduce the manual effort involved in data extraction.
