
4meta_forgetting_gpt

journal contribution
posted on 2025-03-16, 04:42, authored by Zhigao Huang
Meta-Forgetting in Language Models: Learning to Discard for Enhanced Efficiency and Generalization

The method's effectiveness stems from its precision: broad regularization increases training loss by 237% for comparable speed gains, while our targeted approach limits the loss increase to 8.6% through layer-wise sensitivity analysis. This demonstrates that strategic parameter forgetting can enhance efficiency without compromising linguistic capability, offering a pathway to more adaptable and maintainable language models.
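The page gives only this abstract, not the procedure itself, so the following is a minimal sketch of what "layer-wise sensitivity analysis driving targeted forgetting" could look like in PyTorch. The saliency proxy (sum of |w · ∂L/∂w| per parameter tensor), the function names `layer_sensitivity` and `targeted_forgetting`, and the `frac`/`decay` values are all assumptions for illustration, not the paper's actual criterion.

```python
import torch
import torch.nn as nn

def layer_sensitivity(model: nn.Module, loss: torch.Tensor) -> dict:
    """First-order Taylor saliency per parameter tensor: sum |w * dL/dw|.
    A common proxy for how strongly the loss depends on each layer; the
    paper's actual sensitivity measure is not given on this page."""
    named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, [p for _, p in named])
    return {n: (p.detach() * g).abs().sum().item()
            for (n, p), g in zip(named, grads)}

def targeted_forgetting(model: nn.Module, scores: dict,
                        frac: float = 0.3, decay: float = 1e-3) -> None:
    """Shrink only the `frac` least-sensitive parameter tensors toward
    zero, leaving sensitive layers untouched (the "targeted" part, as
    opposed to decaying every weight uniformly)."""
    victims = set(sorted(scores, key=scores.get)[: int(len(scores) * frac)])
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in victims:
                p.mul_(1.0 - decay)

# Toy usage: score a small model on one batch, then forget selectively.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(model(x), y)
targeted_forgetting(model, layer_sensitivity(model, loss))
```

The contrast the abstract draws would correspond here to applying `p.mul_(1.0 - decay)` to every tensor (broad regularization) versus only to the low-saliency subset, which is what keeps the loss penalty small in the targeted variant.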
