Last Updated on 2023-06-15 by Clay
Introduction
The Moderation model is a free tool provided by OpenAI for screening so-called "inappropriate content." The detailed usage prohibitions are listed at https://openai.com/policies/usage-policies.

At present the tool works best on English text; for other languages it may be noticeably less reliable.

With it, you can detect inappropriate content and act on it, for example by filtering out offending messages.

The categories of content that Moderation flags are as follows:
| CATEGORY | DESCRIPTION |
|---|---|
| hate | Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is not covered by this category. |
| hate/threatening | Hateful content that also includes violence or serious harm towards the targeted group. |
| self-harm | Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders. |
| sexual | Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness). |
| sexual/minors | Sexual content that includes an individual who is under 18 years old. |
| violence | Content that promotes or glorifies violence or celebrates the suffering or humiliation of others. |
| violence/graphic | Violent content that depicts death, violence, or serious physical injury in extreme graphic detail. |
Usage
```bash
curl https://api.openai.com/v1/moderations \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"input": "Sample text goes here"}'
```
```python
import os

import requests

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
}
data = {
    "input": "I want to kill them.",
}

response = requests.post(
    "https://api.openai.com/v1/moderations",
    headers=headers,
    json=data,
)

# Print the response
print(response.json())
```
Output:

```python
{'id': 'modr-7PTCmG5D6bTT5hTbBwJEOjVoGfBa0',
 'model': 'text-moderation-004',
 'results': [{'flagged': True,
              'categories': {'sexual': False,
                             'hate': False,
                             'violence': True,
                             'self-harm': False,
                             'sexual/minors': False,
                             'hate/threatening': False,
                             'violence/graphic': False},
              'category_scores': {'sexual': 9.530887e-07,
                                  'hate': 0.18386647,
                                  'violence': 0.8870859,
                                  'self-harm': 1.7594473e-09,
                                  'sexual/minors': 1.3112696e-08,
                                  'hate/threatening': 0.003258761,
                                  'violence/graphic': 3.173159e-08}}]}
```
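As mentioned earlier, a common use of this response is to filter out flagged messages. Below is a minimal sketch of that idea; the `moderate` helper and its placeholder text are my own illustration, not part of the API, and the hard-coded dict stands in for the `results[0]` entry of `response.json()` shown above.

```python
# Sample result, abridged from the API response shown above; in practice
# this would be response.json()["results"][0].
result = {
    "flagged": True,
    "categories": {
        "sexual": False,
        "hate": False,
        "violence": True,
        "self-harm": False,
        "sexual/minors": False,
        "hate/threatening": False,
        "violence/graphic": False,
    },
}


def moderate(message: str, result: dict) -> str:
    """Replace a message with a placeholder when any category is flagged.

    This helper is a hypothetical illustration, not part of the OpenAI API.
    """
    if result["flagged"]:
        hits = [name for name, hit in result["categories"].items() if hit]
        return f"[message removed: {', '.join(hits)}]"
    return message


print(moderate("I want to kill them.", result))
# -> [message removed: violence]
```

The helper relies only on the boolean `flagged` and `categories` fields, so it works regardless of how the raw scores are distributed.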
As you can see, the predicted score for the "violence" category is very high, and the tool's classification is quite accurate. My guess is that behind the scenes it is a multi-label classification model along the lines of sentiment analysis.
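Because the response also exposes the raw `category_scores`, you could in principle tune sensitivity by applying your own cutoff rather than relying on the boolean flags. A minimal sketch follows; the 0.5 threshold is an arbitrary assumption for illustration, not an official recommendation.

```python
# category_scores copied from the sample response above.
category_scores = {
    "sexual": 9.530887e-07,
    "hate": 0.18386647,
    "violence": 0.8870859,
    "self-harm": 1.7594473e-09,
    "sexual/minors": 1.3112696e-08,
    "hate/threatening": 0.003258761,
    "violence/graphic": 3.173159e-08,
}

# Assumed cutoff for illustration only; lower it to flag more aggressively.
THRESHOLD = 0.5

flagged = {name for name, score in category_scores.items() if score >= THRESHOLD}
print(flagged)
# -> {'violence'}
```

With the cutoff at 0.5 only "violence" crosses the line; lowering it to 0.1 would also catch "hate" in this example.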
For me, though, the most important question was how usable it is for Chinese. Unfortunately, after several rounds of testing, I found it still falls short of being practical.
References
- https://platform.openai.com/docs/guides/moderation
- https://openai.com/blog/new-and-improved-content-moderation-tooling