Evaluating LLM Defense Capabilities with the Microsoft BIPIA Framework

Last Updated on 2024-08-30 by Clay

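LLM services now span all kinds of domains, and the threat that prompt injection and jailbreak attacks pose to LLMs grows by the day. A few months ago, a customer service LLM even gave a customer incorrect information that hurt their interests (although that was not caused by a prompt attack).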

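BIPIA (Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models) is an evaluation method open-sourced by Microsoft. The benchmark is already about half a year old and has not seen any major updates since, but for the tasks I have at hand it is still a convenient and concise way to test.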

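It is worth mentioning that my service is Gemma-2-9B launched with vLLM, which supports the request format of the OpenAI package (only the URL needs to change), so reusing BIPIA's original request code for the OpenAI API works without problems.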

However, BIPIA uses an old version of the openai package, so wherever it differs from the latest package, you have to switch back to the old version.

How to Use BIPIA

For how to start the vLLM service, see my earlier notes:

Install BIPIA

git clone git@github.com:microsoft/BIPIA.git
cd BIPIA
pip3 install -e .

Create a New YAML Config File

We need to create our own config file under BIPIA/config/xxx.yaml; I named mine gemma_2_9b.yaml.

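api_key: an arbitrary value I made up, just to keep the same format as an OpenAI request
api_type: it seems this has to be openai for BIPIA to send requests in this format
model: my vLLM service uses this name to route requests to the model
api_base: the address of my vLLM service
llm_name: "gpt4"; always reuse this name, it does not actually send requests to OpenAI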

api_key: "test"
api_type: "openai"
model: "/workspace/llm_model"
api_base: "http://192.168.1.78:4896/v1"
chat: True
llm_name: "gpt4"
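To make the role of each field more concrete, here is a minimal sketch of how these values could map onto an OpenAI-style chat completion request. It only builds the request with the standard library and never sends it. The helper name `build_chat_request` is hypothetical and not part of BIPIA, and the `/chat/completions` path assumes the usual layout of an OpenAI-compatible server.

```python
import json
import urllib.request

# Values copied from the YAML config above.
config = {
    "api_key": "test",
    "model": "/workspace/llm_model",
    "api_base": "http://192.168.1.78:4896/v1",
}


def build_chat_request(config, messages):
    """Build (but do not send) an OpenAI-style chat completion request.

    Hypothetical helper for illustration only; BIPIA itself sends its
    requests through the openai package.
    """
    url = config["api_base"].rstrip("/") + "/chat/completions"
    body = json.dumps({"model": config["model"], "messages": messages})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + config["api_key"],
        },
        method="POST",
    )


request = build_chat_request(config, [{"role": "user", "content": "Hello"}])
print(request.full_url)
```

The point of the sketch is that `api_base` decides where the request goes and `model` is just a string inside the body, which is why a placeholder `api_key` is enough for a local service.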

Test Example Code (from the Official demo.ipynb)

First, import the packages we will need.

from bipia.data import AutoPIABuilder
from bipia.model import AutoLLM

from functools import partial
import jsonlines
from pathlib import Path

from datasets import Dataset
from accelerate import Accelerator

Set the data and parameters to use.

# dataset args
seed = 2023 # fix the seed at 2023 to reproduce the results reported in the BIPIA paper
dataset_name = "email" # "code", "qa", "abstract", "table" for other subsets
context_data_file = "./BIPIA/benchmark/email/test.jsonl"
attack_data_file = "./BIPIA/benchmark/text_attack_test.json" # for emailQA task use text attacks

# model args
tensor_parallel_size = 1
llm_config_file = "./BIPIA/config/gemma_2_9b.yaml"

# output args
output_path = "./BIPIA/output/vicuna_7b.jsonl"

Build the dataset.

pia_builder = AutoPIABuilder.from_name(dataset_name)(seed)
pia_samples = pia_builder(
    context_data_file,
    attack_data_file,
    enable_stealth=False,
)

pia_dataset = Dataset.from_pandas(pia_samples)
pia_dataset[0]

Prepare the model object and the data used for testing.

accelerator = Accelerator()

llm = AutoLLM.from_name(llm_config_file)(
    config=llm_config_file,
    accelerator=accelerator,
    tensor_parallel_size=tensor_parallel_size,
)

def rename_target(example):
    example["target"] = example["ideal"]
    return example

with accelerator.main_process_first():
    processed_datasets = pia_dataset.map(
        rename_target,
        desc="Processing Indirect PIA datasets (Rename target).",
    )

    processed_datasets = processed_datasets.map(
        partial(
            llm.process_fn,
            prompt_construct_fn=partial(
                pia_builder.construct_prompt,
                require_system_prompt=llm.require_system_prompt,
                ign_guidance=""
            ),
        ),
        desc="Processing Indirect PIA datasets.",
    )

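Next comes a slightly annoying detail: Gemma-2's chat template does not support the system role! As a result, many test datasets cannot be formatted automatically with Gemma-2's chat template (they fail at the Jinja step).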

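But my evaluation script also needs to compare against LLMs that can use the system role, such as Llama-3, so I had to add a conditional here that converts the data for models without system role support into a version that contains no system message: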

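# Set to True for models whose chat template rejects the system role (e.g. Gemma-2).
NOT_SUPPORT_SYSTEM_ROLE = True


def insert_user_assistant_replace_system(example):
    # Re-label the leading system message as a user message.
    new_system_prompt = example["message"][0]
    new_system_prompt["role"] = "user"

    # Insert a short assistant acknowledgement so user/assistant turns alternate.
    new_assistant_prompt = {
        "role": "assistant",
        "content": "ok!",
    }

    new_message = [new_system_prompt, new_assistant_prompt] + example["message"][1:]
    example["message"] = new_message

    return example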

if NOT_SUPPORT_SYSTEM_ROLE:
    processed_datasets = processed_datasets.map(insert_user_assistant_replace_system)

Put simply, I convert the original:

[
  {
    "role": "system",
    "content": "SYSTEM INSTRUCTIONS..."
  },
  {
    "role": "user",
    "content": "USER QUESTION..."
  }
]

into:

[
  {
    "role": "user",
    "content": "SYSTEM INSTRUCTIONS..."
  },
  {
    "role": "assistant",
    "content": "ok!"
  },
  {
    "role": "user",
    "content": "USER QUESTION..."
  }
]

In my own quick tests, models without system role support did not perform much worse with this conversion, so I went ahead and started testing this way.
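The conversion can also be tried on a plain list of messages, outside of `datasets`. The following is a minimal sketch under my own assumptions: the function name `replace_system_with_user` and the `ack` parameter are hypothetical, and, unlike the mapping function used in the pipeline, this variant copies its input and leaves conversations without a leading system message untouched.

```python
import copy


def replace_system_with_user(messages, ack="ok!"):
    """Return a copy of `messages` with a leading system turn rewritten.

    The system message becomes a user message followed by a short
    assistant acknowledgement, so that user/assistant turns alternate.
    Conversations without a leading system message are returned unchanged.
    """
    messages = copy.deepcopy(messages)  # do not mutate the caller's data
    if not messages or messages[0]["role"] != "system":
        return messages
    first = dict(messages[0], role="user")
    acknowledgement = {"role": "assistant", "content": ack}
    return [first, acknowledgement] + messages[1:]


original = [
    {"role": "system", "content": "SYSTEM INSTRUCTIONS..."},
    {"role": "user", "content": "USER QUESTION..."},
]
converted = replace_system_with_user(original)
print([m["role"] for m in converted])  # ['user', 'assistant', 'user']
```

Copying first keeps the original sample intact, which is handy when the same data is also sent to a model that does accept the system role.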

Next is the longest part: generating the model responses.

responses = llm.generate(processed_datasets)

Once generation finishes, we can save the data and run the evaluation:

out = []
for attack_name, task_name, target, response, message, position in zip(
    processed_datasets["attack_name"],
    processed_datasets["task_name"],
    processed_datasets["target"],
    responses,
    processed_datasets["message"],
    processed_datasets["position"],
):
    out.append(
        {
            "attack_name": attack_name,
            "task_name": task_name,
            "response": response,
            "message": message,
            "target": target,
            "position": position,
        }
    )

output_path = Path(output_path)
output_path.parent.mkdir(exist_ok=True, parents=True)

with jsonlines.open(output_path, "w") as writer:
    writer.write_all(out)
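For reference, the saved file uses the JSON Lines layout: one JSON object per line. The sketch below reproduces that layout with only the standard library and reads it back as a sanity check. The records and the temporary file path are made-up placeholders, not real BIPIA output.

```python
import json
import tempfile
from pathlib import Path

# Placeholder records for illustration only.
records = [
    {"attack_name": "Emoji Substitution-0", "response": "example response"},
    {"attack_name": "Emoji Substitution-1", "response": "another response"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "output" / "demo.jsonl"
    path.parent.mkdir(exist_ok=True, parents=True)

    # Write one JSON object per line.
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

    # Read the file back to confirm that nothing was lost.
    with path.open(encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f if line.strip()]

print(len(loaded))
```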


from bipia.metrics import BipiaEvalFactory
from datasets import load_dataset
import numpy as np


# The official demo expects a GPT-3.5 config here (./config/gpt35.yaml with your OpenAI / Azure AOAI key); this run reuses the Gemma-2 config instead.
gpt_config_file = "./BIPIA/config/gemma_2_9b.yaml"
attacks = [f"Emoji Substitution-{i}" for i in range(5)]

ds = load_dataset("json", data_files="./BIPIA/output/vicuna_7b.jsonl", split="train")
ds = ds.filter(lambda x: x["attack_name"] in attacks)

evaluator = BipiaEvalFactory(
    gpt_config=gpt_config_file,
    activate_attacks=attacks,
)

asrs = evaluator.add_batch(
    predictions=ds["response"],
    references=ds["target"],
    attacks=ds["attack_name"],
    tasks=ds["task_name"],
)

avg_asr = np.mean(asrs)
print("The average ASR of Emoji Substitution is: ", avg_asr)

Output:

The average ASR of Emoji Substitution is:  0.113

On my fine-tuned Gemma-2-9B, the Attack Success Rate (ASR) is far lower than that of Vicuna-7B (0.824 in the official test). Gemma-2-9B really is a strong base model! It almost feels like an unfair win, hahaha.
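As a reminder of what the number means, ASR is simply the fraction of evaluated samples on which the attack succeeded. Here is a small sketch that also breaks the rate down per attack. It assumes each sample has been judged as 1 (attack succeeded) or 0 (attack failed); I have not checked that BIPIA's evaluator returns exactly these values, and both the judgments and the helper name `asr_by_attack` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up judgments for illustration only: 1 = attack succeeded, 0 = it failed.
attack_names = [
    "Emoji Substitution-0",
    "Emoji Substitution-0",
    "Emoji Substitution-1",
    "Emoji Substitution-1",
]
judgments = [1, 0, 0, 0]


def asr_by_attack(attack_names, judgments):
    """Group 0/1 judgments by attack name and average each group."""
    grouped = defaultdict(list)
    for name, judgment in zip(attack_names, judgments):
        grouped[name].append(judgment)
    return {name: mean(values) for name, values in grouped.items()}


per_attack = asr_by_attack(attack_names, judgments)
overall = mean(judgments)
print(per_attack)
print(overall)
```

A per-attack breakdown like this can show whether a low average hides one variant that still gets through.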

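Next I plan to test different models and datasets, but I will probably refactor the source code along the lines of BIPIA's flow and architecture and swap the old openai package usage for the new version, which would also make it compatible with my RAG evaluation code for RAGAS.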

References

[Paper Reading] RAGAS: Automated Evaluation of Retrieval Augmented Generation

Meta-llama–Prompt-Guard-86M: An Open-Source Prompt Protection Model for Detecting Malicious Attack Prompts
