MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba
Abstract
An ecosystem of Transformer-based models has been established by building large models with extensive data. Parameter-efficient fine-tuning (PEFT) is a crucial technology for deploying these models to downstream tasks with minimal cost while achieving effective performance. Recently, Mamba, a State Space Model (SSM)-based model, has attracted attention as a potential alternative to Transformers. While many large-scale Mamba-based models have been proposed, efficiently adapting pre-trained Mamba-based models to downstream tasks remains unexplored. In this paper, we conduct an exploratory analysis of PEFT methods for Mamba. We investigate the effectiveness of existing PEFT methods for Transformers when applied to Mamba. We also modify these methods to better align with the Mamba architecture. Additionally, we propose new Mamba-specific PEFT methods that leverage the distinctive structure of Mamba. Our experiments indicate that PEFT performs more effectively for Mamba than for Transformers. Lastly, we demonstrate how to effectively combine multiple PEFT methods and provide a framework that outperforms previous works. The source code is available at: https://github.com/sony/mambapeft.
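As a concrete illustration of the kind of PEFT method the paper investigates, below is a minimal sketch of a LoRA-style low-rank adapter wrapped around a frozen linear projection (Mamba blocks contain such input/output projections). This is not the paper's implementation; the class name `LoRALinear`, the rank and alpha values, and the wiring are illustrative assumptions.

```python
# Minimal sketch of a LoRA-style adapter on a frozen linear layer,
# the sort of Transformer-origin PEFT method the paper applies to Mamba.
# All names and hyperparameters here are hypothetical, for illustration only.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base projection plus a trainable low-rank update: W x + (alpha/r) * B(A x)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # update starts at zero, so fine-tuning begins from the pre-trained model
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))


# Usage: wrap a projection and train only the adapter's low-rank factors.
proj = nn.Linear(256, 512)
peft_proj = LoRALinear(proj, rank=8)
trainable = sum(p.numel() for p in peft_proj.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # (256 + 512) * 8, vs. 256 * 512 in the base layer
```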
- Affiliation
- Sony Group Corporation
- Conference/Journal
- ICLR
- Year
- 2025
