
4meta_forgetting_gpt

journal contribution
posted on 2025-03-16, 04:42, authored by Zhigao Huang


Meta-Forgetting in Language Models: Learning to Discard for Enhanced Efficiency and Generalization

The method's effectiveness stems from its precision: broad regularization increases train loss by 237% for comparable speed gains, while our targeted approach limits the loss increase to 8.6% through layer-wise sensitivity analysis. This demonstrates that strategic parameter forgetting can enhance efficiency without compromising linguistic capability, offering a pathway to more adaptable and maintainable language models.
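The abstract contrasts uniform (broad) regularization with forgetting guided by layer-wise sensitivity analysis. The sketch below illustrates one plausible reading of that idea, assuming a PyTorch model: score each layer by a squared-gradient (Fisher-style) proxy, then apply strong decay only to the least sensitive layers. The function names, the sensitivity score, and the quantile cutoff are illustrative assumptions, not the paper's actual implementation.

# Hypothetical sketch of layer-wise sensitivity analysis for targeted
# parameter forgetting; the scoring rule and thresholds are assumptions,
# not the method described in the contribution.
import torch
import torch.nn as nn


def layer_sensitivity(model: nn.Module, loss: torch.Tensor) -> dict:
    """Score each parameter tensor by the mean squared gradient of the
    training loss (a Fisher-information-style proxy for sensitivity)."""
    named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, [p for _, p in named], retain_graph=True)
    return {name: g.pow(2).mean().item() for (name, _), g in zip(named, grads)}


def targeted_decay(scores: dict, base_decay: float = 0.01,
                   strong_decay: float = 0.1, quantile: float = 0.25) -> dict:
    """Assign strong weight decay only to the least-sensitive fraction of
    layers (the parameters being 'forgotten'); keep the base rate elsewhere."""
    cutoff = sorted(scores.values())[int(len(scores) * quantile)]
    return {name: (strong_decay if s <= cutoff else base_decay)
            for name, s in scores.items()}

In this reading, the contrast with broad regularization is simply that the strong decay is confined to low-sensitivity layers rather than applied uniformly, which is consistent with the abstract's claim that the targeted variant preserves train loss far better for comparable speed gains.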
