Skip to content

Blog

Using AutoModel.from_pretrained() In Transformers To Load Customized Model Architecture

To this day, many AI applications and open-source projects are developed based on the HuggingFace transformers package. A large number of models and packages are written to be compatible with the transformers format, and even share the same functions and methods, which makes them more widely accepted.

Under this premise, I came across an open-source training framework that conveniently wraps the automatic reading of Transformer architectures. However, one unavoidable problem is I want to use my custom model for experiments. I tried several solutions, hoping that when using AutoModel.from_pretrained(), by simply providing the local path to my model, I could successfully use my custom model architecture. This article records the method that worked.

Read More »Using AutoModel.from_pretrained() In Transformers To Load Customized Model Architecture

[Solved] RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Problem Description

When building deep learning models in PyTorch, adjusting the shapes of layers and input/output dimensions is something every AI engineer has to deal with. However, there is a small but interesting pitfall in the view() method of PyTorch:

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Read More »[Solved] RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

[Machine Learning] Note of Rotary Position Embedding (RoPE)

Introduction

(Note: Since this article is imported from my personal Hackmd, some symbols and formatting might not display properly in WordPress. I appreciate your understanding, sorry for any inconvenience.)

RoPE is a method for introducing relative position information into the self-attention mechanism through absolute positional encoding.

Read More »[Machine Learning] Note of Rotary Position Embedding (RoPE)