MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba
Abstract
An ecosystem of Transformer-based models has been established by building large models with extensive data. Parameter-efficient fine-tuning (PEFT) is a crucial technology for deploying these models to downstream tasks with minimal cost while achieving effective performance. Recently, Mamba, a State Space Model (SSM)-based model, has attracted attention as a potential alternative to Transformers. While many large-scale Mamba-based models have been proposed, efficiently adapting pre-trained Mamba-based models to downstream tasks remains unexplored. In this paper, we conduct an exploratory analysis of PEFT methods for Mamba. We investigate the effectiveness of existing PEFT methods for Transformers when applied to Mamba. We also modify these methods to better align with the Mamba architecture. Additionally, we propose new Mamba-specific PEFT methods that leverage the distinctive structure of Mamba. Our experiments indicate that PEFT performs more effectively for Mamba than for Transformers. Lastly, we demonstrate how to effectively combine multiple PEFT methods and provide a framework that outperforms previous works. The source code is available at: https://github.com/sony/mambapeft.
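As a concrete illustration of the kind of PEFT method the paper investigates, below is a minimal sketch of a LoRA-style low-rank adapter wrapped around a frozen linear projection (Mamba blocks contain such input/output projections). This is not the paper's implementation; the class name `LoRALinear`, the rank and alpha values, and the wiring are illustrative assumptions.

```python
# Minimal sketch of a LoRA-style adapter on a frozen linear layer,
# the sort of Transformer-origin PEFT method the paper applies to Mamba.
# All names and hyperparameters here are hypothetical, for illustration only.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base projection plus a trainable low-rank update: W x + (alpha/r) * B(A x)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # update starts at zero, so fine-tuning begins from the pre-trained model
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))


# Usage: wrap a projection and train only the adapter's low-rank factors.
proj = nn.Linear(256, 512)
peft_proj = LoRALinear(proj, rank=8)
trainable = sum(p.numel() for p in peft_proj.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # (256 + 512) * 8, vs. 256 * 512 in the base layer
```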
- Affiliation
- Sony Group Corporation
- Conference/Journal
- ICLR
- Year
- 2025
