Authors: Minchen Yu, Ao Wang, Dong Chen, Haoxuan Yu, Xiaonan Luo, Zhuohao Li, Wei Wang, Ruichuan Chen, Dapeng Nie, Haoran Yang
Published on: June 06, 2023
Impact Score: 8.22
arXiv ID: 2306.03622
Summary
- What is new: The introduction of FaaSwap, a GPU-efficient serverless inference platform that keeps models in host memory and dynamically swaps them onto GPUs on demand, letting many functions share a small pool of GPUs while meeting latency SLOs.
- Why this is important: Current serverless platforms lack efficient GPU support, which limits their usefulness for low-latency machine-learning inference.
- What the research proposes: FaaSwap combines asynchronous API redirection, GPU runtime sharing, pipelined model execution, and efficient GPU memory management, coordinated by an interference-aware request scheduling algorithm (a sketch of the swap-and-pipeline idea follows this list).
- Results: FaaSwap can serve hundreds of functions on a single node with 4 V100 GPUs, achieving performance comparable to native execution. On a 6-node testbed, it meets latency SLOs for over 1k functions.
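To make the swap-and-pipeline idea concrete, here is a minimal sketch assuming PyTorch and CUDA streams. The helper name and the two-stream layout are illustrative assumptions, not FaaSwap's actual implementation; the point is that the host-to-GPU copy of layer i+1 overlaps with the computation of layer i, hiding most of the swap latency.

```python
# Minimal sketch of swap-in with pipelined model execution, assuming PyTorch
# and CUDA streams; this is an illustration, not FaaSwap's actual code.
import torch
import torch.nn as nn

def pipelined_swap_in_and_run(layers_cpu: list[nn.Module], x: torch.Tensor) -> torch.Tensor:
    """Run a model whose weights are cached in (ideally pinned) host memory,
    overlapping the host-to-GPU copy of layer i+1 with the compute of layer i."""
    copy_stream = torch.cuda.Stream()
    compute_stream = torch.cuda.Stream()
    x = x.cuda()
    # Make sure the compute stream sees the input transfer issued above.
    compute_stream.wait_stream(torch.cuda.current_stream())

    # Prefetch the first layer's weights on the copy stream.
    with torch.cuda.stream(copy_stream):
        layers_cpu[0].to("cuda", non_blocking=True)

    for i, layer in enumerate(layers_cpu):
        # This layer may only run once its weights have arrived.
        compute_stream.wait_stream(copy_stream)
        with torch.cuda.stream(compute_stream):
            x = layer(x)
        # While layer i computes, start transferring layer i+1.
        if i + 1 < len(layers_cpu):
            with torch.cuda.stream(copy_stream):
                layers_cpu[i + 1].to("cuda", non_blocking=True)

    torch.cuda.synchronize()  # drain both streams before returning the result
    return x
```

For the non-blocking copies to actually overlap with compute, the host-resident weights must live in pinned memory; swapping a model out again is the symmetric `layer.to("cpu")` move once GPU memory is needed elsewhere.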
Technical Details
Technological frameworks used: FaaSwap
Algorithms used: Interference-aware request scheduling (see the sketch below)
Data used: Real-world use cases from a leading commercial serverless platform
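To illustrate what interference-aware scheduling can mean in this setting, the sketch below uses a deliberately simple additive latency model: a request's predicted latency on a GPU is its queued work, plus a swap-in cost if the model is not yet resident, plus a fixed slowdown per co-located model. Every name and cost constant here is a hypothetical stand-in; the paper's actual estimator is more involved.

```python
# Hypothetical sketch of interference-aware scheduling; the constants and the
# additive latency model are illustrative assumptions, not the paper's estimator.
from dataclasses import dataclass, field

@dataclass
class Gpu:
    gpu_id: int
    resident_models: set = field(default_factory=set)  # models currently swapped in
    queued_ms: float = 0.0                             # outstanding work on this GPU

SWAP_IN_MS = 40.0      # assumed host-to-GPU transfer cost for one model
INTERFERENCE_MS = 5.0  # assumed slowdown per co-located resident model

def predict_latency_ms(gpu: Gpu, model_id: str, exec_ms: float) -> float:
    """Predicted completion time: queueing + (possible) swap-in + execution
    + a simple interference penalty that grows with co-located models."""
    swap = 0.0 if model_id in gpu.resident_models else SWAP_IN_MS
    return gpu.queued_ms + swap + exec_ms + INTERFERENCE_MS * len(gpu.resident_models)

def schedule(gpus: list[Gpu], model_id: str, exec_ms: float, slo_ms: float):
    """Place the request on the GPU with the lowest predicted latency,
    or return None if no GPU can meet the SLO (caller may queue or scale out)."""
    best = min(gpus, key=lambda g: predict_latency_ms(g, model_id, exec_ms))
    if predict_latency_ms(best, model_id, exec_ms) > slo_ms:
        return None
    best.queued_ms += exec_ms
    best.resident_models.add(model_id)
    return best.gpu_id

# Example: two idle GPUs; the request lands on GPU 0 and becomes resident there.
gpus = [Gpu(0), Gpu(1)]
print(schedule(gpus, "resnet50", exec_ms=12.0, slo_ms=100.0))  # -> 0
```

A scheduler of this shape naturally prefers GPUs where the model is already resident (no swap cost) and that are lightly loaded, and it can reject or queue requests whose SLO cannot be met.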
Potential Impact
Serverless computing providers and companies that rely on machine-learning inference could benefit significantly; the approach may also disrupt traditional cloud computing models.
Want to implement this idea in a business?
We have generated a startup concept here: InferFlow.