Authors: Bufang Yang, Lixing He, Kaiwei Liu, Zhenyu Yan
Published on: April 03, 2024
Impact Score: 7.6
arXiv: 2404.02508
Summary
- What is new: VIAssist leverages multi-modal large language models (MLLMs) to assist visually impaired individuals by identifying undesired images, suggesting how to retake them, and providing reliable answers to visual questions.
- Why this is important: Visually impaired people find it challenging to capture images that effectively communicate their visual questions to MLLMs, since their photos may only partially include, or entirely omit, the target object.
- What the research proposes: The VIAssist tool guides users in taking better images for their queries and uses MLLMs to generate accurate and helpful answers to their visual questions.
- Results: Compared to the baseline, VIAssist achieved +0.21 higher BERTScore and +0.31 higher ROUGE scores.
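To make this flow concrete, here is a minimal sketch of the check-then-answer loop described above. It is an illustration under stated assumptions, not the authors' implementation: the prompts, the `gpt-4o` model choice, and the OpenAI client are placeholders for whichever MLLM backend is used.

```python
# Hypothetical sketch of a VIAssist-style two-stage flow: first ask an MLLM
# whether the photo shows what the question needs; if not, return retaking
# advice instead of a likely-wrong answer. Prompts and model are assumptions.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def _image_payload(path: str) -> dict:
    # Encode a local image as a base64 data URL for the chat API.
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}

def ask_mllm(prompt: str, image_path: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice, not the model from the paper
        messages=[{"role": "user",
                   "content": [{"type": "text", "text": prompt},
                               _image_payload(image_path)]}],
    )
    return resp.choices[0].message.content

def viassist(question: str, image_path: str) -> str:
    # Stage 1: decide whether the image is usable for this question.
    verdict = ask_mllm(
        f"Question: {question}\nIs the object needed to answer this question "
        "fully visible in the image? Reply YES or NO, then explain briefly.",
        image_path,
    )
    if verdict.strip().upper().startswith("NO"):
        # Stage 2a: undesired image -> concrete, spoken-style retaking advice.
        return ask_mllm(
            f"The image cannot answer: '{question}'. Give short directions "
            "(e.g. move the camera left, step back) to retake the photo.",
            image_path,
        )
    # Stage 2b: usable image -> answer the visual question directly.
    return ask_mllm(question, image_path)
```

The key design point is that the suitability check runs before any answer is attempted, so the user gets actionable retaking directions rather than a confidently wrong response.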
Technical Details
Technological frameworks used: VIAssist uses multi-modal large language models for visual understanding and reasoning.
Evaluation metrics: BERTScore and ROUGE, which compare generated answers against reference answers (a scoring sketch follows below).
Data used: Images captured by visually impaired individuals.
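As a hedged sketch of how such an evaluation can be run, the snippet below scores a candidate answer against a reference using the public bert-score and rouge-score packages; the example strings are placeholders, not data from the paper.

```python
# Score a generated answer against a reference with the same metric families
# the paper reports (BERTScore, ROUGE). Example strings are placeholders.
from bert_score import score as bert_score
from rouge_score import rouge_scorer

candidates = ["The bus in the photo is the number 42 downtown line."]
references = ["It is the number 42 bus heading downtown."]

# BERTScore: similarity of contextual token embeddings (F1 usually reported).
_, _, f1 = bert_score(candidates, references, lang="en")
print(f"BERTScore F1: {f1.mean().item():.3f}")

# ROUGE-L: longest-common-subsequence overlap between candidate and reference.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge = scorer.score(references[0], candidates[0])
print(f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")
```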
Potential Impact
Assistive-technology markets and companies specializing in AI and computer vision for accessibility solutions.