Authors: Yuda Song, Zehao Sun, Xuanwu Yin
Published on: March 25, 2024
Impact Score: 7.4
arXiv: 2403.16627
Summary
- What is new: A dual approach combining model miniaturization with a reduction in sampling steps for diffusion models, aiming to significantly decrease model latency.
- Why this is important: Diffusion models deliver superior image-generation quality, but their complex architectures and high computational demands lead to significant latency.
- What the research proposes: Leveraging knowledge distillation to streamline the U-Net and image-decoder architectures, coupled with a novel one-step diffusion-model (DM) training technique that uses feature matching and score distillation.
- Results: Inference speeds of roughly 100 FPS (30x faster than SD v1.5) and 30 FPS (60x faster than SDXL) on a single GPU, with promising applications in image-conditioned control for efficient image-to-image translation.
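The two training signals named above, feature matching and score distillation, can be sketched as simple loss terms. This is a minimal illustration in plain NumPy under assumed shapes, not the paper's implementation; the feature lists and epsilon predictions are hypothetical stand-ins for real U-Net activations and noise estimates:

```python
import numpy as np

def feature_matching_loss(student_feats, teacher_feats):
    """MSE between corresponding intermediate feature maps of the
    distilled (student) network and the full (teacher) network,
    averaged over the matched layers."""
    return sum(np.mean((s - t) ** 2)
               for s, t in zip(student_feats, teacher_feats)) / len(student_feats)

def score_distillation_loss(student_eps, teacher_eps, weight=1.0):
    """Score-distillation surrogate: the multi-step teacher's noise
    prediction is treated as a fixed target (a stop-gradient in a real
    autograd framework) that the one-step student is pulled toward."""
    return weight * np.mean((student_eps - teacher_eps) ** 2)
```

In practice both terms would be computed on autograd tensors and summed into a single training objective; the sketch only shows the shape of the computation.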
Technical Details
Architectures used: U-Net, image decoder
Models used: SDXS-512, SDXS-1024
Techniques used: Knowledge distillation, feature matching, score distillation
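The headline speedup comes from collapsing the iterative denoising loop of a conventional diffusion model into a single forward pass. A minimal sketch of the two sampling regimes (the `denoise` callable is a hypothetical stand-in for a U-Net forward pass, not the paper's API):

```python
def multi_step_sample(denoise, x_T, num_steps=50):
    """Conventional DM sampling: num_steps sequential U-Net calls."""
    x = x_T
    for t in reversed(range(num_steps)):
        x = denoise(x, t)  # one full U-Net forward pass per step
    return x

def one_step_sample(denoise, x_T):
    """Distilled one-step sampling: a single U-Net call, which is
    where the reported 30-60x latency reduction comes from."""
    return denoise(x_T, 0)
```

Since each step is a full network forward pass, reducing 50 steps to 1 cuts U-Net compute per image by 50x before any architectural slimming is counted.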
Potential Impact
Image generation, Gaming, Virtual Reality, Augmented Reality, Media production, Autonomous systems
Want to implement this idea in a business?
We have generated a startup concept here: FastFrame.