Detecting Inappropriate Content with the OpenAI Moderation Endpoint

2023-06-15


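The Moderation model is a free tool provided by OpenAI for screening so-called "inappropriate content". For the detailed list of prohibited content, refer to OpenAI's usage policies.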



The Moderation endpoint flags content in the following categories:

- hate: Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is not covered by this category.
- hate/threatening: Hateful content that also includes violence or serious harm towards the targeted group.
- self-harm: Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
- sexual: Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).
- sexual/minors: Sexual content that includes an individual who is under 18 years old.
- violence: Content that promotes or glorifies violence or celebrates the suffering or humiliation of others.
- violence/graphic: Violent content that depicts death, violence, or serious physical injury in extreme graphic detail.
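The category names follow a parent/subcategory pattern (for example, hate/threatening narrows down hate). The minimal sketch below groups the names by parent and rolls per-category flags up to the parent level. The helper names and the roll-up rule (a parent counts as flagged if it or any of its subcategories is flagged) are hypothetical conveniences, not part of the API.

```python
# The seven category names listed above.
CATEGORIES = (
    "hate",
    "hate/threatening",
    "self-harm",
    "sexual",
    "sexual/minors",
    "violence",
    "violence/graphic",
)


def group_by_parent(categories):
    """Group category names by the part before the first '/'."""
    groups = {}
    for name in categories:
        parent = name.split("/", 1)[0]
        groups.setdefault(parent, []).append(name)
    return groups


def rollup(categories: dict[str, bool]) -> dict[str, bool]:
    """Hypothetical roll-up: a parent is flagged if it or any subcategory is flagged."""
    parents: dict[str, bool] = {}
    for name, hit in categories.items():
        parent = name.split("/", 1)[0]
        parents[parent] = parents.get(parent, False) or hit
    return parents


print(group_by_parent(CATEGORIES))
print(rollup({"hate": False, "hate/threatening": True, "violence": False}))
```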


```bash
curl https://api.openai.com/v1/moderations \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"input": "Sample text goes here"}'
```
```python
import os

import requests

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
}

data = {
    "input": "I want to kill them."
}

# Send the text to the Moderation endpoint
response = requests.post("https://api.openai.com/v1/moderations", headers=headers, json=data)

# Print the response
print(response.json())
```
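The same request can also be written with only the standard library and wrapped in a reusable function. This is a sketch under assumptions: `urllib.request` stands in for `requests`, the helper names are hypothetical, and only the first entry of `results` is returned because a single string is sent. Building the request is kept separate from sending it so the request can be inspected without network access.

```python
import json
import os
import urllib.request

MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_moderation_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Moderation endpoint."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def moderate(text: str, timeout: float = 10.0) -> dict:
    """Send one string to the endpoint and return its result object.

    urlopen raises urllib.error.HTTPError on non-2xx responses.
    """
    api_key = os.environ["OPENAI_API_KEY"]
    req = build_moderation_request(text, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # One input string produces one entry in "results".
    return payload["results"][0]
```

Calling `moderate("I want to kill them.")` with a valid key should return the dictionary under `results[0]` in the response shown next.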


```text
{'id': 'modr-7PTCmG5D6bTT5hTbBwJEOjVoGfBa0',
 'model': 'text-moderation-004',
 'results': [{'flagged': True,
   'categories': {'sexual': False,
    'hate': False,
    'violence': True,
    'self-harm': False,
    'sexual/minors': False,
    'hate/threatening': False,
    'violence/graphic': False},
   'category_scores': {'sexual': 9.530887e-07,
    'hate': 0.18386647,
    'violence': 0.8870859,
    'self-harm': 1.7594473e-09,
    'sexual/minors': 1.3112696e-08,
    'hate/threatening': 0.003258761,
    'violence/graphic': 3.173159e-08}}]}
```
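Since the response is plain JSON, the flagged categories and the highest-scoring category can be pulled out with two small helpers. This is a minimal sketch: the helper names are hypothetical, it assumes a single input so only `results[0]` is read, and the sample data is the response shown above.

```python
# The sample response above, as a Python dict.
SAMPLE_RESPONSE = {
    "id": "modr-7PTCmG5D6bTT5hTbBwJEOjVoGfBa0",
    "model": "text-moderation-004",
    "results": [{
        "flagged": True,
        "categories": {
            "sexual": False, "hate": False, "violence": True,
            "self-harm": False, "sexual/minors": False,
            "hate/threatening": False, "violence/graphic": False,
        },
        "category_scores": {
            "sexual": 9.530887e-07, "hate": 0.18386647,
            "violence": 0.8870859, "self-harm": 1.7594473e-09,
            "sexual/minors": 1.3112696e-08,
            "hate/threatening": 0.003258761,
            "violence/graphic": 3.173159e-08,
        },
    }],
}


def flagged_categories(response: dict) -> list[str]:
    """Names of the categories marked True in the first result."""
    result = response["results"][0]
    return [name for name, hit in result["categories"].items() if hit]


def top_category(response: dict) -> tuple[str, float]:
    """The category with the highest score, and that score."""
    scores = response["results"][0]["category_scores"]
    name = max(scores, key=scores.get)
    return name, scores[name]


print(flagged_categories(SAMPLE_RESPONSE))  # ['violence']
print(top_category(SAMPLE_RESPONSE))        # ('violence', 0.8870859)
```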

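As you can see, the predicted score for the "violence" category is high (about 0.887) and the input is flagged, so at least on this example the tool's classification is fairly accurate. It feels like what sits behind it is a multi-label classification model, similar to a sentiment analysis model.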
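Because each category gets its own score (the sample scores sum to more than 1, so they are not a single probability distribution over the categories), an application could apply its own per-category thresholds instead of relying only on the `flagged` field. This is a sketch: the threshold values and helper name are hypothetical and chosen only for illustration; they are not values published by OpenAI.

```python
# Category scores copied from the sample response above.
scores = {
    "sexual": 9.530887e-07,
    "hate": 0.18386647,
    "violence": 0.8870859,
    "self-harm": 1.7594473e-09,
    "sexual/minors": 1.3112696e-08,
    "hate/threatening": 0.003258761,
    "violence/graphic": 3.173159e-08,
}

# Hypothetical per-category thresholds, for illustration only.
# Categories not listed fall back to DEFAULT_THRESHOLD.
THRESHOLDS = {"hate": 0.1, "violence": 0.5}
DEFAULT_THRESHOLD = 0.5


def labels_above_threshold(
    scores: dict[str, float],
    thresholds: dict[str, float],
    default: float = DEFAULT_THRESHOLD,
) -> list[str]:
    """Multi-label decision: return every category whose score meets its threshold."""
    return sorted(
        name for name, score in scores.items()
        if score >= thresholds.get(name, default)
    )


print(sum(scores.values()))                       # greater than 1
print(labels_above_threshold(scores, THRESHOLDS))  # ['hate', 'violence']
print(labels_above_threshold(scores, {}))          # ['violence']
```

With the lower hypothetical threshold for hate, the same text receives two labels at once, which a single-label classifier could not express.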



