Authors: Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y.K. Li, Y. Wu, Daya Guo
Published on: February 05, 2024
Impact Score: 8.22
arXiv code: arXiv:2402.03300
Summary
- What is new: Introduction of DeepSeekMath 7B, a language model continually pre-trained for mathematical reasoning, which markedly improves on existing open-source models through a large math-focused pre-training corpus and a new reinforcement-learning optimization technique.
- Why this is important: Mathematical reasoning is challenging for language models due to its complexity and structured nature.
- What the research proposes: Training DeepSeekMath 7B on 120B math-related tokens and introducing Group Relative Policy Optimization (GRPO), a PPO variant that replaces the learned value-function baseline with a group-relative one, to strengthen mathematical reasoning (a minimal sketch of the group-relative idea appears after this summary).
- Results: DeepSeekMath 7B achieved 51.7% on the competition-level MATH benchmark without external toolkits or voting; self-consistency over 64 samples raises this to 60.9%, approaching the performance of leading closed models (see the voting sketch below).
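To make the GRPO idea concrete, here is a minimal sketch of the group-relative advantage that replaces PPO's learned critic baseline. It assumes scalar reward-model scores for a group of solutions sampled for the same question; the function name and example values are illustrative, not taken from the paper's code.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Group-relative baseline used by GRPO in place of PPO's learned critic.

    For each question, a group of completions is sampled and scored by a
    reward model; each completion's advantage is its reward standardized
    against the group's mean and standard deviation.
    """
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Example: reward-model scores for 4 sampled solutions to one question
print(group_relative_advantages([0.2, 0.9, 0.1, 0.8]))
```

In the full algorithm these advantages feed a PPO-style clipped objective with a KL penalty toward a reference model; the sketch covers only the baseline computation that removes the need for a separate value network.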
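Self-consistency itself is simply majority voting over the final answers of independently sampled solutions; a toy illustration (hypothetical function name and answers) follows.

```python
from collections import Counter

def self_consistency_vote(final_answers):
    """Majority vote over the final answers extracted from independently
    sampled chain-of-thought solutions to the same problem."""
    return Counter(final_answers).most_common(1)[0][0]

# Toy example: final answers from 5 sampled solutions to one problem
print(self_consistency_vote(["42", "41", "42", "42", "7"]))  # prints 42
```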
Technical Details
Base model used for initialization: DeepSeek-Coder-Base-v1.5 7B
Models used: DeepSeekMath 7B
Data used: 120B math-related tokens sourced from Common Crawl, together with natural language and code data
Potential Impact
E-learning platforms, academic institutions, educational content providers, and tech companies specializing in AI-driven tutoring and problem-solving tools
Want to implement this idea in a business?
We have generated a startup concept here: MathematiqAI.