
4meta_forgetting_gpt

journal contribution
posted on 2025-03-16, 04:42, authored by Zhigao Huang
Meta-Forgetting in Language Models: Learning to Discard for Enhanced Efficiency and Generalization

The method's effectiveness stems from its precision: broad regularization increases training loss by 237% for comparable speed gains, while our targeted approach limits the loss increase to 8.6% through layer-wise sensitivity analysis. This demonstrates that strategic parameter forgetting can enhance efficiency without compromising linguistic capability, offering a pathway to more adaptable and maintainable language models.
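The page gives only this abstract, not the procedure itself, so the following is a minimal sketch of what "layer-wise sensitivity analysis driving targeted forgetting" could look like in PyTorch. The saliency proxy (sum of |w · ∂L/∂w| per parameter tensor), the function names `layer_sensitivity` and `targeted_forgetting`, and the `frac`/`decay` values are all assumptions for illustration, not the paper's actual criterion.

```python
import torch
import torch.nn as nn

def layer_sensitivity(model: nn.Module, loss: torch.Tensor) -> dict:
    """First-order Taylor saliency per parameter tensor: sum |w * dL/dw|.
    A common proxy for how strongly the loss depends on each layer; the
    paper's actual sensitivity measure is not given on this page."""
    named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, [p for _, p in named])
    return {n: (p.detach() * g).abs().sum().item()
            for (n, p), g in zip(named, grads)}

def targeted_forgetting(model: nn.Module, scores: dict,
                        frac: float = 0.3, decay: float = 1e-3) -> None:
    """Shrink only the `frac` least-sensitive parameter tensors toward
    zero, leaving sensitive layers untouched (the "targeted" part, as
    opposed to decaying every weight uniformly)."""
    victims = set(sorted(scores, key=scores.get)[: int(len(scores) * frac)])
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in victims:
                p.mul_(1.0 - decay)

# Toy usage: score a small model on one batch, then forget selectively.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(model(x), y)
targeted_forgetting(model, layer_sensitivity(model, loss))
```

The contrast the abstract draws would correspond here to applying `p.mul_(1.0 - decay)` to every tensor (broad regularization) versus only to the low-saliency subset, which is what keeps the loss penalty small in the targeted variant.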
