10 February 2024

Towards Deterministic End-to-end Latency for Medical AI Systems in NVIDIA Holoscan

Written by Startup Idea

Authors: Soham Sinha, Shekhar Dwivedi, Mahdi Azizian

Published on: February 06, 2024

Impact Score: 8.3

arXiv ID: arXiv:2402.04466

Summary

What is new: A system design optimized for heterogeneous GPU workloads in medical devices, leveraging CUDA MPS for spatial partitioning and isolating compute and graphics processing onto separate GPUs.
Why this is important: Running several AI applications concurrently on a single platform makes end-to-end latency in medical devices unpredictable.
What the research proposes: A system design that reduces latency and improves performance predictability for concurrent and heterogeneous GPU workloads on NVIDIA’s Holoscan platform.
Results: For up to five concurrent applications, the design reduces maximum latency by 21-30% and improves latency distribution flatness by 17-25%. For up to six concurrent applications, it decreases maximum latency by 35% and improves GPU utilization by 42%.
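
To make the reported percentages concrete, here is a minimal sketch of how such figures could be computed from measured latency samples. The sample values are invented for illustration and are not taken from the paper. The percentile spread used as a stand-in for "flatness" is an assumption, since this summary does not give the paper's exact definition.

```python
import math


def percent_reduction(baseline, optimized):
    """Relative reduction of `optimized` versus `baseline`, in percent."""
    return 100.0 * (baseline - optimized) / baseline


def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def spread(samples, low=5, high=95):
    """Assumed proxy for flatness: distance between two percentiles.

    A smaller spread means a tighter, more predictable distribution.
    """
    return percentile(samples, high) - percentile(samples, low)


# Hypothetical end-to-end latencies in milliseconds (illustrative only).
baseline_ms = [10, 12, 11, 50]
optimized_ms = [10, 12, 11, 35]

max_latency_gain = percent_reduction(max(baseline_ms), max(optimized_ms))
spread_gain = percent_reduction(spread(baseline_ms), spread(optimized_ms))
```

With these made-up samples, the worst-case latency falls from 50 ms to 35 ms, which is the kind of maximum-latency reduction the results line describes.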

Technological frameworks used: CUDA MPS
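
As a rough illustration of spatial partitioning with CUDA MPS, the sketch below builds one process environment per application, pins all compute work to one GPU, and caps each process at an equal share of that GPU's streaming multiprocessors. CUDA_VISIBLE_DEVICES and CUDA_MPS_ACTIVE_THREAD_PERCENTAGE are standard CUDA environment variables. The equal split, the helper name mps_partition_envs, and the GPU index are assumptions made for this example and may not match the configuration used in the paper. The MPS control daemon (nvidia-cuda-mps-control -d) must already be running, and steering the graphics workload to a second GPU is not shown here.

```python
import os


def mps_partition_envs(num_apps, compute_gpu="0", base_env=None):
    """Return one environment dict per application for launching under MPS.

    Each application sees only `compute_gpu` for CUDA work and is limited
    to an equal percentage of its SMs (an assumed, simple partitioning).
    """
    if num_apps < 1:
        raise ValueError("num_apps must be at least 1")
    share = 100 // num_apps  # equal integer share of the GPU's SMs
    base = dict(os.environ if base_env is None else base_env)
    envs = []
    for _ in range(num_apps):
        env = dict(base)
        env["CUDA_VISIBLE_DEVICES"] = compute_gpu
        env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(share)
        envs.append(env)
    return envs


# Five concurrent applications, each limited to 20% of the compute GPU.
envs = mps_partition_envs(5, compute_gpu="0", base_env={})
```

Each dict can then be passed as the env argument of subprocess.Popen when starting an application process. The percentage is read when the process creates its CUDA context, so it has to be set before launch.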

Models used: Not specified

Data used: Real-world Holoscan medical device applications

Relevant industries: Medical device manufacturing, healthcare diagnostics, edge-computing solutions

We have generated a startup concept here: MediCore AI Solutions.