Transformers have revolutionized modern applications, but their cost grows steeply with model size. This thesis targets efficient training and inference of large Transformer models. We first explore allocating trainable parameters to task-specific positions to boost the performance of parameter-efficient fine-tuning (PEFT). We then tailor PEFT to efficiently generate a range of fine-tuned models that meet diverse hardware constraints via model stitching. Additionally, we propose a novel pruning approach that reconfigures expensive self-attention layers into efficient convolutional layers, yielding compact hybrid models. Finally, for semantic segmentation, we develop efficient cross-attention layers and a dynamic positional query design, achieving state-of-the-art performance at affordable cost.