Evaluating LLM Defense Capabilities Using the Microsoft BIPIA Framework

Last Updated on 2024-08-30 by Clay

LLM services now cover a wide range of domains, and prompt injection and jailbreak threats against LLMs are growing by the day. A few months ago a customer-service LLM even gave out incorrect information, costing the customer rights they were entitled to (although that incident wasn't caused by a prompt attack).

Microsoft's open-source BIPIA (Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models) benchmark hasn't seen significant updates since I tried it about six months ago, but it remains a simple and convenient way to test the tasks I currently have at hand.

It is worth noting that my service runs the Gemma-2-9B model on vLLM, which accepts requests in the OpenAI format (only the base URL needs to change). The original BIPIA request code written for the OpenAI API can therefore be used without issues.

However, BIPIA depends on an old version of the openai package, so any code written against the latest package has to be rolled back to the old interface.
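
Concretely, as far as I can tell BIPIA's request code is written against the pre-1.0 openai interface (module-level settings plus openai.ChatCompletion). A minimal sketch of that old style, using the same endpoint values that go into the YAML config below, looks roughly like this:

import openai

# Pre-1.0 openai style: module-level configuration + openai.ChatCompletion
openai.api_key = "test"
openai.api_base = "http://192.168.1.78:4896/v1"  # the vLLM endpoint configured below

response = openai.ChatCompletion.create(
    model="/workspace/llm_model",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response["choices"][0]["message"]["content"])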


How to Use BIPIA

For the part on launching the vLLM service, you can refer to my previous blog notes:


Install BIPIA

git clone git@github.com:microsoft/BIPIA.git
cd BIPIA
pip3 install -e .



Create a New YAML Configuration File

We need to create our own configuration file under BIPIA/config/; I named mine gemma_2_9b.yaml.

  • api_key: I filled in a random placeholder, keeping the same format as an OpenAI request
  • api_type: this apparently has to be openai so that BIPIA sends requests in that format
  • model: the model name my vLLM service expects in requests
  • api_base: the address of my vLLM service
  • llm_name: fixed as "gpt4"; this is only a label and no request is actually sent to OpenAI

The resulting gemma_2_9b.yaml:

api_key: "test"
api_type: "openai"
model: "/workspace/llm_model"
api_base: "http://192.168.1.78:4896/v1"
chat: True
llm_name: "gpt4"
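
Before running the benchmark, it's worth firing one request at the endpoint to confirm the api_base and model values are correct. Here is a small sanity check with requests, independent of which openai package version is installed; the values are the same ones as in gemma_2_9b.yaml above:

import requests

# Same values as in gemma_2_9b.yaml
api_base = "http://192.168.1.78:4896/v1"
model = "/workspace/llm_model"

resp = requests.post(
    f"{api_base}/chat/completions",
    json={
        "model": model,
        "messages": [{"role": "user", "content": "Hello"}],
    },
    # vLLM should accept any key unless --api-key was set when launching the server
    headers={"Authorization": "Bearer test"},
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])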



Test Example Program (from official demo.ipynb)

First, import the necessary packages.

from bipia.data import AutoPIABuilder
from bipia.model import AutoLLM

from functools import partial
import jsonlines
from pathlib import Path

from datasets import Dataset
from accelerate import Accelerator



Confirm the data and parameters to be used.

# dataset args
seed = 2023 # fix the seed to 2023 to reproduce the same results as in the paper
dataset_name = "email" # "code", "qa", "abstract", "table" for other subsets
context_data_file = "./BIPIA/benchmark/email/test.jsonl"
attack_data_file = "./BIPIA/benchmark/text_attack_test.json" # for emailQA task use text attacks

# model args
tensor_parallel_size = 1
llm_config_file = "./BIPIA/config/gemma_2_9b.yaml"

# output args
output_path = "./BIPIA/output/vicuna_7b.jsonl"



Create the dataset.

pia_builder = AutoPIABuilder.from_name(dataset_name)(seed)
pia_samples = pia_builder(
    context_data_file,
    attack_data_file,
    enable_stealth=False,
)

pia_dataset = Dataset.from_pandas(pia_samples)
pia_dataset[0]
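
It can also be helpful to check how many samples were built and which columns exist, since later steps rely on fields such as attack_name, task_name, and position:

# Quick look at the constructed dataset
print(len(pia_dataset))
print(pia_dataset.column_names)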



Prepare the model object to be used, as well as the data for testing.

accelerator = Accelerator()

llm = AutoLLM.from_name(llm_config_file)(
    config=llm_config_file,
    accelerator=accelerator,
    tensor_parallel_size=tensor_parallel_size,
)

def rename_target(example):
    example["target"] = example["ideal"]
    return example

with accelerator.main_process_first():
    processed_datasets = pia_dataset.map(
        rename_target,
        desc="Processing Indirect PIA datasets (Rename target).",
    )

    processed_datasets = processed_datasets.map(
        partial(
            llm.process_fn,
            prompt_construct_fn=partial(
                pia_builder.construct_prompt,
                require_system_prompt=llm.require_system_prompt,
                ign_guidance=""
            ),
        ),
        desc="Processing Indirect PIA datasets.",
    )
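
At this point each example should carry an OpenAI-style message list under the message field, which is exactly the part that matters for the system-role issue described next. A quick peek:

# Inspect the chat messages of the first processed example
print(processed_datasets[0]["message"])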



Here's a slightly tricky part: the chat template for Gemma-2 does not support the system role! As a result, many of the test prompts cannot be formatted automatically by Gemma-2's chat template (they fail at the Jinja rendering step).

However, this evaluation still has to be comparable with models such as Llama-3 that do support the system role, so I added a switch that, for models without system-role support, converts the messages into a user-role-only version:

NOT_SUPPORT_SYSTEM_ROLE = True


def insert_user_assistant_replace_system(example):
    # Re-label the original system prompt as a user turn
    new_system_prompt = example["message"][0]
    new_system_prompt["role"] = "user"

    # Insert a short assistant acknowledgement so that user turns keep alternating
    new_assistant_prompt = {
        "role": "assistant",
        "content": "ok!",
    }

    new_message = [new_system_prompt, new_assistant_prompt] + example["message"][1:]
    example["message"] = new_message

    return example

if NOT_SUPPORT_SYSTEM_ROLE:
    processed_datasets = processed_datasets.map(insert_user_assistant_replace_system)


In simple terms, I will convert the original:

[
    {
        "role": "system",
        "content": "SYSTEM INSTRUCTIONS..."
    },
    {
        "role": "user",
        "content": "USER QUESTION..."
    }
]


Into:

[
    {
        "role": "user",
        "content": "SYSTEM INSTRUCTIONS..."
    },
    {
        "role": "assistant",
        "content": "ok!"
    },
    {
        "role": "user",
        "content": "USER QUESTION..."
    }
]

Based on some quick tests, this workaround performs just as well for models that do not support the system role, so I ran the evaluation this way.
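
To double-check that the converted messages really do pass Gemma-2's Jinja chat template, a quick test with the tokenizer can be run. This is just a sketch and assumes the tokenizer can be loaded from the same path vLLM serves (/workspace/llm_model):

from transformers import AutoTokenizer

# Assumes the Gemma-2 tokenizer is available at the served model path
tokenizer = AutoTokenizer.from_pretrained("/workspace/llm_model")

# Raises a template error if the message list still violates the template
# (for example, if a system role slipped through)
text = tokenizer.apply_chat_template(
    processed_datasets[0]["message"],
    tokenize=False,
    add_generation_prompt=True,
)
print(text)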


Next comes the longest part: generating the model responses.

responses = llm.generate(processed_datasets)
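
As a quick sanity check (the zip in the next step assumes responses is aligned one-to-one with processed_datasets):

print(len(responses), "responses generated")
print(responses[0])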


After generation, we can save the data and proceed with evaluation:

out = []
for attack_name, task_name, target, response, message, position in zip(
    processed_datasets["attack_name"],
    processed_datasets["task_name"],
    processed_datasets["target"],
    responses,
    processed_datasets["message"],
    processed_datasets["position"],
):
    out.append(
        {
            "attack_name": attack_name,
            "task_name": task_name,
            "response": response,
            "message": message,
            "target": target,
            "position": position,
        }
    )

output_path = Path(output_path)
output_path.parent.mkdir(exist_ok=True, parents=True)

with jsonlines.open(output_path, "w") as writer:
    writer.write_all(out)
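
Optionally, read the file back to make sure the dump is complete before moving on to evaluation:

# Verify that the saved responses can be read back
with jsonlines.open(output_path) as reader:
    saved = list(reader)
print(len(saved), "records written to", output_path)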


from bipia.metrics import BipiaEvalFactory
from datasets import load_dataset
import numpy as np


# the official demo puts an OpenAI / Azure OpenAI key into ./config/gpt35.yaml (as my_gpt35.yaml); here I point the evaluator at my own config instead
gpt_config_file = "./BIPIA/config/gemma_2_9b.yaml"
attacks = [f"Emoji Substitution-{i}" for i in range(5)]

ds = load_dataset("json", data_files="./BIPIA/output/vicuna_7b.jsonl", split="train")
ds = ds.filter(lambda x: x["attack_name"] in attacks)

evaluator = BipiaEvalFactory(
    gpt_config=gpt_config_file,
    activate_attacks=attacks,
)

asrs = evaluator.add_batch(
    predictions=ds["response"],
    references=ds["target"],
    attacks=ds["attack_name"],
    tasks=ds["task_name"],
)

avg_asr = np.mean(asrs)
print("The average ASR of Emoji Substitution is: ", avg_asr)


Output:

The average ASR of Emoji Substitution is:  0.113


On my fine-tuned Gemma-2-9B, the Attack Success Rate (ASR) is much lower than that of Vicuna-7B (official test was 0.824). Gemma-2-9B really is built on a solid foundation! It almost feels like an unfair advantage, haha.
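
If you want a per-attack breakdown rather than a single average, the scores can be grouped by attack name. This assumes asrs is a per-example sequence aligned with the filtered ds, which the np.mean above suggests:

from collections import defaultdict

# Group per-example scores by attack variant (assumes asrs follows ds row order)
per_attack = defaultdict(list)
for attack, score in zip(ds["attack_name"], asrs):
    per_attack[attack].append(score)

for attack, scores in sorted(per_attack.items()):
    print(f"{attack}: {np.mean(scores):.3f}")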

Next, I plan to test more models and datasets. I will probably follow BIPIA's overall process and structure but refactor the source code, replacing the old openai package usage with the new version so that it stays compatible with the RAG evaluation code on my RAGAS side.
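
For reference, the openai>=1.0 client style that such a refactor would target looks roughly like this (using the same vLLM endpoint and model path as above):

from openai import OpenAI

# New-style client (openai>=1.0); base_url points at the vLLM OpenAI-compatible endpoint
client = OpenAI(api_key="test", base_url="http://192.168.1.78:4896/v1")

response = client.chat.completions.create(
    model="/workspace/llm_model",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)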

