SLICE-TUNE: A System for High Performance DNN Autotuning

Source
Proceedings of the Twenty-Third ACM/IFIP International Middleware Conference, Middleware 2022
Author(s)
Dhakal, Aditya
Ramakrishnan, K. K.
Kulkarni, Sameer G.
Sharma, Puneet
Cho, Junguk
DOI
10.1145/3528535.3565247
Abstract
Autotuning DNN models prior to their deployment is an essential but time-consuming task. Using expensive (and power-hungry) GPU and TPU accelerators efficiently is also key. Since DNNs do not always use a GPU fully, spatial multiplexing of multiple models can provide just the right amount of GPU resources for each DNN. We find that a DNN model tuned with the maximum GPU resources has higher inference latency if fewer GPU resources are available at inference time. We present methods to tune a DNN model with the right amount of accelerator resources, so that the tuned model achieves low inference latency even when a wide range of GPU resources is available at inference time. Further, existing autotuning frameworks take a long time to tune a model due to inefficient utilization of the client- and server-side CPU and GPU. Our system, SLICE-TUNE, improves several autotuning frameworks to efficiently use system resources by rethinking the partitioning of tasks between the client and the server (where models are profiled on the server GPU) in a Kubernetes environment. We increase parallelism during tuning by sharding the tuning model across multiple tuning application instances, enabling concurrent tuning of different operators of a model. We also scale server instances to achieve better GPU multiplexing. SLICE-TUNE reduces DNN autotuning time both on a single GPU and in GPU clusters, decreasing autotuning time by up to 75% and increasing autotuning throughput by a factor of 5 across three different autotuning frameworks (TVM, Ansor, and Chameleon).
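
Illustration
The record contains no code, but the sharding idea in the abstract can be sketched with TVM's public autotvm API. The following is a minimal, hypothetical sketch, not SLICE-TUNE's implementation: it round-robins per-operator tuning tasks over worker processes and caps each worker's share of the GPU through CUDA MPS (it assumes an MPS daemon is running; the shard count, GPU fraction, trial budget, and log file names are illustrative).

    # Hypothetical sketch of sharded, GPU-multiplexed autotuning with TVM's
    # autotvm API. Not the SLICE-TUNE implementation described in the paper.
    import os
    import multiprocessing as mp

    from tvm import autotvm


    def tune_shard(tasks, gpu_fraction, log_file):
        # CUDA MPS reads this variable per client process; it caps the
        # percentage of SMs the process may occupy. Requires an MPS daemon
        # and must be set before the process initializes CUDA.
        os.environ["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(gpu_fraction)

        measure_option = autotvm.measure_option(
            builder=autotvm.LocalBuilder(),
            runner=autotvm.LocalRunner(number=10, repeat=3),
        )
        for task in tasks:
            # One tuner per operator task; trials are logged for later
            # compilation with the best-found schedules.
            tuner = autotvm.tuner.XGBTuner(task)
            tuner.tune(
                n_trial=1000,
                measure_option=measure_option,
                callbacks=[autotvm.callback.log_to_file(log_file)],
            )


    def shard_and_tune(tasks, n_shards=4, gpu_fraction=25):
        # Round-robin the per-operator tasks over n_shards worker processes
        # so different operators of the model tune concurrently on one
        # spatially multiplexed GPU.
        shards = [tasks[i::n_shards] for i in range(n_shards)]
        workers = [
            mp.Process(target=tune_shard, args=(shard, gpu_fraction, f"shard{i}.log"))
            for i, shard in enumerate(shards)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()

In practice the task list would come from autotvm.task.extract_from_program on the model being tuned. The client/server repartitioning, Kubernetes-based scaling of server instances, and support for Ansor and Chameleon that the abstract describes are beyond what this sketch attempts to reproduce.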
Publication link
https://dl.acm.org/doi/pdf/10.1145/3528535.3565247
URI
https://d8.irins.org/handle/IITG2025/19470
Subjects
Computer Science