
Using the OpenAI Moderation Endpoint to Detect Inappropriate Content

Last Updated on 2023-06-15 by Clay

Introduction

The Moderation model is a free tool provided by OpenAI for screening so-called "inappropriate content". The detailed prohibitions are listed at https://openai.com/policies/usage-policies

At the moment the tool supports English best; it may not work as well for other languages.

With it, you can detect inappropriate content and act on it, for example by filtering out offending messages (a sketch of this use case appears at the end of this post).

The categories of content that moderation filters are as follows:

hate: Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is not covered by this category.
hate/threatening: Hateful content that also includes violence or serious harm towards the targeted group.
self-harm: Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
sexual: Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).
sexual/minors: Sexual content that includes an individual who is under 18 years old.
violence: Content that promotes or glorifies violence or celebrates the suffering or humiliation of others.
violence/graphic: Violent content that depicts death, violence, or serious physical injury in extreme graphic detail.

Usage

Calling the endpoint is a single POST request. With cURL:

curl https://api.openai.com/v1/moderations \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"input": "Sample text goes here"}'
import os
import requests

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
}

data = {
    "input": "I want to kill them."
}

response = requests.post("https://api.openai.com/v1/moderations", headers=headers, json=data)

# Print the response
print(response.json())

Output:

{'id': 'modr-7PTCmG5D6bTT5hTbBwJEOjVoGfBa0',
 'model': 'text-moderation-004',
 'results': [{'flagged': True,
   'categories': {'sexual': False,
    'hate': False,
    'violence': True,
    'self-harm': False,
    'sexual/minors': False,
    'hate/threatening': False,
    'violence/graphic': False},
   'category_scores': {'sexual': 9.530887e-07,
    'hate': 0.18386647,
    'violence': 0.8870859,
    'self-harm': 1.7594473e-09,
    'sexual/minors': 1.3112696e-08,
    'hate/threatening': 0.003258761,
    'violence/graphic': 3.173159e-08}}]}


As you can see, the predicted score for the "violence" category is very high. The tool's classification can be said to be quite accurate; it feels like what sits behind it is a sentiment analysis model doing multi-label classification.
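
If you would rather inspect the per-category scores than rely on the built-in verdict, they are under results[0]['category_scores']. A minimal sketch continuing from the requests call above (the 0.5 cut-off is my own choice, not an official threshold):

result = response.json()["results"][0]

# List the categories from highest to lowest model score
for category, score in sorted(result["category_scores"].items(),
                              key=lambda item: item[1], reverse=True):
    print(f"{category:20s}{score:.6f}")

# Apply a custom cut-off instead of the endpoint's own 'flagged' verdict;
# the 0.5 threshold here is my own choice, not an official value
custom_flagged = any(score > 0.5 for score in result["category_scores"].values())
print(custom_flagged)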

For me, though, what matters most is how well it works on Chinese. Unfortunately, after several rounds of testing, it still proved too unreliable to use.
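
To close with the filtering use case mentioned in the introduction, here is a minimal, self-contained sketch that drops any message the endpoint flags; the helper name is_flagged and the sample messages are my own:

import os
import requests

MODERATION_URL = "https://api.openai.com/v1/moderations"

def is_flagged(text: str) -> bool:
    """Return True if the Moderation endpoint flags the text."""
    response = requests.post(
        MODERATION_URL,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
        },
        json={"input": text},
    )
    response.raise_for_status()
    return response.json()["results"][0]["flagged"]

messages = ["Have a nice day!", "I want to kill them."]
print([m for m in messages if not is_flagged(m)])  # the violent message is removed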

