Authors: Zichen Zhu, Yang Xu, Lu Chen, Jingkai Yang, Yichuan Ma, Yiming Sun, Hailin Wen, Jiaqi Liu, Jinyu Cai, Yingzi Ma, Situo Zhang, Zihan Zhao, Liangtai Sun, Kai Yu
Published on: February 05, 2024
Impact Score: 8.3
arXiv ID: arXiv:2402.03173
Summary
- What is new: Introduction of Multi, a benchmark for evaluating MLLMs on complex figures, tables, and scientific questions, along with the Multi-Elite and Multi-Extend sets for more demanding testing.
- Why this is important: Existing benchmarks for MLLMs focus on simple natural image understanding and fail to address the complexity of real-life, multimodal inputs.
- What the research proposes: Multi provides a comprehensive dataset that includes over 18,000 questions aimed at evaluating MLLMs’ abilities to understand complex, multimodal inputs similar to real-life school tests.
- Results: GPT-4V achieves 63.7% accuracy on Multi, significantly outperforming other MLLMs, which score between 31.3% and 53.7%; the gap indicates a considerable advance in MLLM capabilities (see the evaluation sketch below).
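To make the accuracy figures above concrete, here is a minimal sketch of how a model could be scored on a Multi-style multiple-choice benchmark. The `Question` fields and the `model_answer` stub are illustrative assumptions, not the paper's actual data schema or evaluation harness.

```python
# Hypothetical scoring loop for a Multi-style benchmark.
# Field names and the model call are assumptions for illustration,
# not the authors' actual evaluation code.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Question:
    prompt: str                # question text, possibly referring to a figure or table
    image_path: Optional[str]  # attached image file, if the question is multimodal
    choices: list[str]         # candidate answers, e.g. ["A", "B", "C", "D"]
    answer: str                # gold label

def model_answer(question: Question) -> str:
    """Placeholder for an MLLM call that returns one of question.choices."""
    raise NotImplementedError("plug in your model API here")

def accuracy(questions: list[Question]) -> float:
    """Fraction of questions on which the model picks the gold answer."""
    correct = sum(model_answer(q) == q.answer for q in questions)
    return correct / len(questions)
```

Under this framing, a reported score such as GPT-4V's 63.7% corresponds to this accuracy computed over the full question set.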
Technical Details
Technological frameworks used: Multi, Multi-Elite, Multi-Extend
Models used: GPT-4V
Data used: Over 18,000 questions covering complex figures, tables, and scientific content in diverse formats
Potential Impact
Educational technology providers, AI research companies, and tech firms focused on natural language processing and multimodal AI development
Want to implement this idea in a business?
We have generated a startup concept here: EduSynth.