

[Machine Learning] Note of Rotary Position Embedding (RoPE)

Last Updated on 2024-08-17 by Clay

Introduction

(Note: Since this article is imported from my personal HackMD notes, some symbols and formatting might not display properly in WordPress. Sorry for any inconvenience, and thank you for your understanding.)

RoPE is a method that introduces relative position information into the self-attention mechanism by way of an absolute positional encoding: each query and key vector is rotated by an angle proportional to its token's position, so the attention dot product between two tokens ends up depending only on their relative distance.
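To make the rotation concrete, here is a minimal NumPy sketch (the function name, shapes, and the `base=10000` frequency schedule are my own illustrative choices, not code from this post): each consecutive pair of dimensions is treated as a 2-D point and rotated by a position-dependent angle, and the dot product of two rotated vectors then depends only on the positional offset between them.

```python
import numpy as np

def rope(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Apply a rotary position embedding to a single vector x at position `pos`.

    x is split into d/2 two-dimensional pairs; pair i is rotated by the angle
    pos * base**(-2i/d), the usual RoPE frequency schedule.
    """
    d = x.shape[-1]
    assert d % 2 == 0, "dimension must be even"
    freqs = base ** (-np.arange(0, d, 2) / d)   # shape (d/2,)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]                   # even/odd dims form the pairs
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin             # 2-D rotation of each pair
    out[1::2] = x1 * sin + x2 * cos
    return out

# The attention score depends only on the relative offset between positions:
q, k = np.random.randn(2, 64)
s1 = rope(q, pos=3) @ rope(k, pos=7)    # offset 4
s2 = rope(q, pos=10) @ rope(k, pos=14)  # same offset 4
print(np.allclose(s1, s2))              # True
```

In an actual Transformer this rotation is applied to the query and key projections inside every attention head, while the value vectors are left untouched.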


[Paper Reading] Lifting the Curse of Multilinguality by Pre-training Modular Transformers

Last Updated on 2024-08-19 by Clay

Cross-lingual Modular (X-Mod) is an interesting language model architecture that separates language-specific parameters into per-language module units. When fine-tuning for a new language, the model updates only that language's module while the shared weights stay fixed, thereby (comparatively) mitigating the problem of catastrophic forgetting.
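As a rough sketch of the idea (my own minimal PyTorch illustration with assumed names and sizes, not the paper's implementation), a layer can share its attention body across languages while a per-language adapter dictionary routes each forward pass through only the active language's parameters:

```python
import torch
import torch.nn as nn

class XModLayer(nn.Module):
    """Toy Transformer block with per-language modular units (X-Mod idea).

    The attention body is shared across languages; each language owns a small
    adapter that is selected by a language id at forward time. Names and
    sizes here are illustrative only.
    """
    def __init__(self, d_model: int, languages: list[str], bottleneck: int = 64):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        # One adapter (down-project -> ReLU -> up-project) per language.
        self.adapters = nn.ModuleDict({
            lang: nn.Sequential(
                nn.Linear(d_model, bottleneck),
                nn.ReLU(),
                nn.Linear(bottleneck, d_model),
            )
            for lang in languages
        })

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        h, _ = self.attn(x, x, x)
        h = self.norm(x + h)
        # Route through the active language's module only; the other
        # languages' parameters are untouched, which limits interference.
        return h + self.adapters[lang](h)

layer = XModLayer(d_model=128, languages=["en", "de", "zh"])
x = torch.randn(2, 16, 128)
out = layer(x, lang="de")  # only the "de" adapter participates
```

Adding a new language then amounts to registering one fresh adapter and freezing the shared weights, so the parameters serving the existing languages are never overwritten.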
