Papers

[Paper Reading] Lifting the Curse of Multilinguality by Pre-training Modular Transformers

Cross-lingual Modular (X-Mod) is an interesting language model architecture that separates the parameters for different languages into language-specific modular units, allowing the model to train a separate set of parameters when adapting to a new language and thereby largely avoiding the problem of catastrophic forgetting.
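
Below is a minimal, illustrative PyTorch sketch of the idea (not the authors' implementation): a transformer layer whose attention and feed-forward weights are shared across languages, plus one small bottleneck module per language selected by a language ID, so adding a language only adds a new entry to the module dictionary.

```python
import torch
import torch.nn as nn

class XModLayer(nn.Module):
    """One transformer layer with per-language modular units (illustrative only).

    The attention and feed-forward weights are shared by all languages, while
    each language routes through its own small bottleneck module, so supporting
    a new language only means training one more entry in `lang_modules`.
    """

    def __init__(self, d_model, n_heads, bottleneck, languages):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        # One modular unit (bottleneck adapter) per language.
        self.lang_modules = nn.ModuleDict({
            lang: nn.Sequential(
                nn.Linear(d_model, bottleneck), nn.GELU(), nn.Linear(bottleneck, d_model)
            )
            for lang in languages
        })
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, lang):
        h, _ = self.attn(x, x, x)
        x = self.norm1(x + h)
        h = self.ffn(x)
        h = h + self.lang_modules[lang](h)  # language-specific path
        return self.norm2(x + h)

layer = XModLayer(d_model=768, n_heads=12, bottleneck=96, languages=["en", "de", "zh"])
out = layer(torch.randn(2, 16, 768), lang="de")  # -> shape (2, 16, 768)
```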

Read More »[Paper Reading] Lifting the Curse of Multilinguality by Pre-training Modular Transformers

[Paper Reading] RAGAS: Automated Evaluation of Retrieval Augmented Generation

Introduction

The year 2023 witnessed an explosion of generative AI technologies, with a myriad of applications emerging across various domains. In the field of Natural Language Processing (NLP), Large Language Models (LLMs) stand out as one of the most significant advancements. When trained effectively and with hallucinations kept in check, LLMs can significantly reduce human effort across a wide range of tasks.

Read More »[Paper Reading] RAGAS: Automated Evaluation of Retrieval Augmented Generation

[Paper Reading] Mistral 7B

Introduction

Mistral 7B is a large language model (LLM) released on September 27, 2023 by the Mistral AI team, which also open-sourced its weights. Interestingly, it uses the highly permissive Apache 2.0 license, unlike Llama 2, which ships with Meta's own Llama license terms. Therefore, Mistral 7B is truly "open source" (Llama's license requires a separate agreement with Meta AI once a service exceeds 700 million monthly active users).

Read More »[Paper Reading] Mistral 7B

[Paper Reading] Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

Introduction

Kangaroo is an acceleration framework proposed by Huawei Noah's Ark Lab. It replaces the separate small draft model used in original speculative decoding with a shallow sub-network of the large model itself, and pairs it with an additionally trained adapter and the model's own decoding head to generate speculative tokens, which are then verified by the full large model. The remaining steps are largely the same as in the original speculative decoding process.
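
As a rough illustration of the draft-then-verify step (a toy greedy sketch, not Kangaroo's code), the cheap draft path and the full model can be abstracted as two logits functions: `draft_logits_fn` stands in for the shallow sub-network plus adapter and shared decoding head, while `full_logits_fn` is the complete model used once to check the whole draft in parallel.

```python
import torch

@torch.no_grad()
def speculative_step(ids, draft_logits_fn, full_logits_fn, draft_len=4):
    """One greedy speculative-decoding step (toy sketch, batch size 1).

    draft_logits_fn / full_logits_fn both map token ids of shape (1, seq_len)
    to logits of shape (1, seq_len, vocab). With greedy decoding the output
    is exactly what the full model alone would have produced ("lossless").
    """
    prompt_len = ids.shape[1]

    # 1) Draft: propose `draft_len` tokens autoregressively with the cheap path.
    draft_ids = ids
    for _ in range(draft_len):
        nxt = draft_logits_fn(draft_ids)[:, -1].argmax(-1, keepdim=True)
        draft_ids = torch.cat([draft_ids, nxt], dim=1)

    # 2) Verify: a single full-model pass over prompt + draft gives, in parallel,
    #    the token the full model would emit at every draft position (plus one more).
    full_next = full_logits_fn(draft_ids)[:, prompt_len - 1 :].argmax(-1)
    proposed = draft_ids[:, prompt_len:]

    # 3) Accept the longest prefix where draft and full model agree, then append
    #    the full model's own token at the first mismatch (or after the last draft).
    matches = (proposed == full_next[:, :draft_len]).long()
    n_accept = int(matches[0].cumprod(dim=0).sum())
    accepted = proposed[:, :n_accept]
    correction = full_next[:, n_accept : n_accept + 1]
    return torch.cat([ids, accepted, correction], dim=1)
```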

Read More »[Paper Reading] Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

[Paper Reading] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Introduction

The RAG-based LLM is a well-known architecture in current Large Language Model (LLM) applications. It uses retrieval to provide the model with prior knowledge it lacks from training, enabling the model to answer questions grounded in the retrieved context.
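
As a minimal sketch of that retrieve-then-generate flow (the `embed` and `llm` callables below are placeholders for whatever embedding model and LLM endpoint you use, not any specific library's API):

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=3):
    """Return the k documents whose embeddings are most similar to the query."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-8
    )
    return [docs[i] for i in np.argsort(-sims)[:k]]

def rag_answer(question, docs, doc_vecs, embed, llm, k=3):
    """Retrieve relevant passages, then ask the LLM to answer from them."""
    context = retrieve(embed(question), doc_vecs, docs, k=k)
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n".join(f"- {c}" for c in context)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)
```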

Read More »[Paper Reading] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

ImageBind: Experience Notes on a Multimodal Vector Transformation Model

Introduction

Meta AI has been on an incredible run recently, seemingly securing its position as a giant in AI research and development in no time at all, and what's more, it keeps setting the bar high with top-tier open-source contributions: Segment Anything, which can segment arbitrary objects in images; the publicly released foundational large language model LLaMA (yes, the one that spawned the whole llama family of models!); and, more recently, ImageBind, which embeds six modalities into a shared vector space, and the Massively Multilingual Speech (MMS) project... I must say, for an ordinary person like me, it's quite an effort just to keep up with how to use these technologies, let alone trying to chase their technical prowess.

Read More »ImageBind: Experience Notes on a Multimodal Vector Transformation Model