Transformers have revolutionized modern applications, but their cost grows steeply with model size. This thesis targets efficient training and inference of large Transformer models. We first explore allocating trainable parameters to task-specific positions to boost the performance of parameter-efficient fine-tuning (PEFT). We then tailor PEFT to efficiently generate a range of fine-tuned models that meet diverse hardware constraints via model stitching. Additionally, we propose a novel pruning approach that reconfigures expensive self-attention layers into efficient convolutional layers, yielding compact hybrid models. Finally, for semantic segmentation, we develop efficient cross-attention layers and a dynamic positional query design, achieving state-of-the-art performance at affordable cost.