About the Mamba paper
Ultimately, this yields a complete language model: a deep sequence-model backbone (a stack of repeating Mamba blocks) followed by a language-modeling head. In library implementations, such a model also inherits common functionality such as downloading or saving checkpoints, resizing the input embeddings, and pruning heads. The two main challenges of the underlying recurrence are its sequential nature (each step depends on the previous one, limiting parallelism during training) and the large memory cost of materializing the latent state.
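To make the overall architecture concrete, here is a minimal sketch of such a model in NumPy: a token embedding, a stack of blocks, and a language-modeling head. The block here uses a simplified diagonal linear recurrence with a residual connection as a stand-in for the selective SSM; all layer names, sizes, and the recurrence form are illustrative assumptions, not the paper's exact equations.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, D_MODEL, N_LAYERS, D_STATE = 100, 16, 2, 4

embed = rng.normal(scale=0.02, size=(VOCAB, D_MODEL))   # token embedding table
lm_head = rng.normal(scale=0.02, size=(D_MODEL, VOCAB)) # language-modeling head

# One "block": a diagonal linear recurrence h_t = a * h_{t-1} + B x_t,
# read out as y_t = sum(C * h_t), plus a residual connection.
# This is a toy stand-in for a Mamba block, not the real selective SSM.
def make_block():
    return {
        "A": -np.abs(rng.normal(size=(D_MODEL, D_STATE))),  # negative -> stable decay
        "B": rng.normal(scale=0.1, size=(D_MODEL, D_STATE)),
        "C": rng.normal(scale=0.1, size=(D_MODEL, D_STATE)),
    }

blocks = [make_block() for _ in range(N_LAYERS)]

def block_forward(x, p):
    # x: (seq_len, D_MODEL). The explicit loop over time is the sequential
    # recurrence mentioned above; the per-channel state h is the latent
    # state whose materialization drives the memory cost.
    seq_len = x.shape[0]
    h = np.zeros((D_MODEL, D_STATE))
    out = np.empty_like(x)
    decay = np.exp(p["A"])                       # in (0, 1) since A < 0
    for t in range(seq_len):
        h = decay * h + p["B"] * x[t][:, None]   # state update per channel
        out[t] = (p["C"] * h).sum(axis=-1)       # read out to model dimension
    return x + out                               # residual connection

def forward(token_ids):
    x = embed[token_ids]                         # (seq_len, D_MODEL)
    for p in blocks:                             # deep backbone of blocks
        x = block_forward(x, p)
    return x @ lm_head                           # logits: (seq_len, VOCAB)

logits = forward(np.array([1, 2, 3, 4]))
print(logits.shape)  # (4, 100)
```

During generation this structure is what makes recurrent models attractive: only the fixed-size state `h` must be carried between steps, rather than a growing cache over the whole sequence.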