Authors: Xinchen Wang, Ruida Hu, Cuiyun Gao, Xin-Cheng Wen, Yujia Chen, Qing Liao
Published on: January 24, 2024
Impact Score: 8.3
arXiv ID: arXiv:2401.13169
Summary
- What is new: The research introduces ReposVul, which the authors present as the first high-quality repository-level vulnerability dataset. It addresses three limitations of existing datasets: tangled patches, missing inter-procedural vulnerabilities, and outdated patches.
- Why this is important: Vulnerabilities in Open-Source Software (OSS) pose a significant risk, and existing datasets for vulnerability detection are limited by tangled patches, outdated patches, and a lack of data on inter-procedural vulnerabilities.
- What the research proposes: An automated data collection framework that untangles patches, captures inter-procedural relationships, and filters out outdated patches to create a high-quality vulnerability dataset.
- Results: The creation of ReposVul, a repository-level dataset that addresses key limitations of existing datasets and supports more effective vulnerability detection.
Technical Details
Technological frameworks used: Automated data collection framework using Large Language Models and static analysis tools
Models used: Large Language Models (LLMs)
Data used: Open-Source Software vulnerability patches
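To make the three steps of the framework concrete (untangling patches, capturing inter-procedural relationships, filtering outdated patches), here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. All names (`FileChange`, `Patch`, `untangle`, `call_context`, `drop_outdated`) are hypothetical. The rule that a change is kept only when both an LLM and a static analysis tool flag it, and the rule that a patch counts as outdated when a later patch for the same vulnerability touches the same file, are simplifying assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileChange:
    path: str
    changed_functions: tuple[str, ...]
    llm_says_relevant: bool       # hypothetical verdict from an LLM
    analyzer_says_relevant: bool  # hypothetical verdict from a static analysis tool


@dataclass(frozen=True)
class Patch:
    commit_id: str
    cve_id: str
    committed_on: date
    changes: tuple[FileChange, ...]


def untangle(patch: Patch) -> Patch:
    """Keep only the file changes that both judges flag as vulnerability-related.

    Requiring agreement is an assumption of this sketch, not a rule from the paper.
    """
    kept = tuple(
        c for c in patch.changes
        if c.llm_says_relevant and c.analyzer_says_relevant
    )
    return Patch(patch.commit_id, patch.cve_id, patch.committed_on, kept)


def call_context(changed: list[str], call_graph: dict[str, list[str]]) -> dict[str, set[str]]:
    """Return the callers and callees of the changed functions.

    call_graph maps each caller name to the functions it calls.
    """
    changed_set = set(changed)
    callees = {callee for fn in changed for callee in call_graph.get(fn, ())}
    callers = {
        caller for caller, targets in call_graph.items()
        if changed_set & set(targets)
    }
    return {"callers": callers - changed_set, "callees": callees - changed_set}


def drop_outdated(patches: list[Patch]) -> list[Patch]:
    """Drop a patch when a later patch for the same CVE touches one of its files.

    This definition of "outdated" is an assumption made for illustration.
    """
    kept = []
    for p in patches:
        paths = {c.path for c in p.changes}
        superseded = any(
            q.cve_id == p.cve_id
            and q.committed_on > p.committed_on
            and paths & {c.path for c in q.changes}
            for q in patches
        )
        if not superseded:
            kept.append(p)
    return kept
```

In this sketch each stage is a pure function over small immutable records, so the stages can be tested and reordered independently. A real pipeline would replace the two boolean verdicts with actual model and analyzer calls and would build the call graph from the repository itself.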
Potential Impact
The work is most relevant to cybersecurity firms, companies that rely on OSS in their products, and enterprises that offer OSS vulnerability detection and patching solutions.