Authors: Oshri Naparstek, Roi Pony, Inbar Shapira, Foad Abo Dahood, Ophir Azulai, Yevgeny Yaroker, Nadav Rubinstein, Maksym Lysak, Peter Staar, Ahmed Nassar, Nikolaos Livathinos, Christoph Auer, Elad Amrani, Idan Friedman, Orit Prince, Yevgeny Burshtein, Adi Raz Goldfarb, Udi Barzelay
Published on: May 01, 2024
Impact Score: 7.4
Arxiv code: Arxiv:2405.00505
Summary
- What is new: Introduction of a new dataset, KVP10k, for key-value pair (KVP) extraction without predefined keys, focusing on diverse templates and complex document layouts.
- Why this is important: Current datasets primarily support Key Information Extraction (KIE) with predefined keys, limiting their use in complex, varied document layouts.
- What the research proposes: The creation of a new, richly annotated dataset and benchmark, KVP10k, designed for extracting KVPs from complex documents without relying on predefined keys.
- Results: KVP10k contains 10707 images, offering an unprecedented resource for the development and testing of information extraction technologies.
Technical Details
Technological frameworks used: nan
Models used: nan
Data used: KVP10k dataset
Potential Impact
Businesses in legal, financial, and administrative sectors seeking to automate document processing and information extraction could significantly benefit.
Want to implement this idea in a business?
We have generated a startup concept here: DocExtractAI.
Leave a Reply