Last Updated on 2023-06-15 by Clay
Introduction
The Moderation endpoint is a free tool developed by OpenAI. It is designed to detect content that violates OpenAI's usage policies, i.e. sensitive or harmful content; for the detailed list, see https://openai.com/policies/usage-policies.
The tool works best on English text and may be less effective for other languages.
You can use it to detect policy-violating content and then handle it, for example by filtering it out (see the filtering sketch after the table below).
The moderation endpoint detects the following categories:
| CATEGORY | DESCRIPTION |
|---|---|
| hate | Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is not covered by this category. |
| hate/threatening | Hateful content that also includes violence or serious harm towards the targeted group. |
| self-harm | Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders. |
| sexual | Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness). |
| sexual/minors | Sexual content that includes an individual who is under 18 years old. |
| violence | Content that promotes or glorifies violence or celebrates the suffering or humiliation of others. |
| violence/graphic | Violent content that depicts death, violence, or serious physical injury in extreme graphic detail. |
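Once you have a moderation response (the Usage section below shows the actual API call), filtering is just a matter of reading the per-category boolean flags. A minimal sketch, assuming the response dict shape shown later in this post and a hypothetical blocklist of your own choosing:

```python
# Hypothetical policy: categories we refuse to pass through.
BLOCKED_CATEGORIES = {"hate", "hate/threatening", "violence/graphic", "sexual/minors"}

def is_blocked(moderation_response: dict) -> bool:
    """Return True if the endpoint flagged any category on our blocklist."""
    categories = moderation_response["results"][0]["categories"]
    return any(categories.get(name, False) for name in BLOCKED_CATEGORIES)
```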
Usage
```bash
curl https://api.openai.com/v1/moderations \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"input": "Sample text goes here"}'
```
```python
import os
import requests

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
}
data = {
    "input": "I want to kill them."
}

response = requests.post("https://api.openai.com/v1/moderations", headers=headers, json=data)

# Print the response
print(response.json())
```
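If you prefer the official openai Python package (the pre-1.0 releases current at the time of writing), the same request can be made without building the headers by hand. A sketch; it hits the same endpoint, so the response shape matches the raw-HTTP example above:

```python
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# Equivalent to the raw HTTP call above (openai-python < 1.0 API).
response = openai.Moderation.create(input="I want to kill them.")
print(response["results"][0]["flagged"])
```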
Output:
```python
{'id': 'modr-7PTCmG5D6bTT5hTbBwJEOjVoGfBa0',
 'model': 'text-moderation-004',
 'results': [{'flagged': True,
   'categories': {'sexual': False,
    'hate': False,
    'violence': True,
    'self-harm': False,
    'sexual/minors': False,
    'hate/threatening': False,
    'violence/graphic': False},
   'category_scores': {'sexual': 9.530887e-07,
    'hate': 0.18386647,
    'violence': 0.8870859,
    'self-harm': 1.7594473e-09,
    'sexual/minors': 1.3112696e-08,
    'hate/threatening': 0.003258761,
    'violence/graphic': 3.173159e-08}}]}
```
As you can see, the violence score is very high, which suggests the tool's classification is quite accurate. It feels as if a sentiment-analysis-style model is doing multi-label classification behind the scenes.
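Because the raw scores are exposed, you can also apply your own thresholds instead of relying only on the boolean flags. A minimal sketch with an arbitrary cut-off (0.5 here is my own assumption, not an official recommendation):

```python
THRESHOLD = 0.5  # Arbitrary cut-off chosen for illustration.

def categories_over_threshold(moderation_response: dict) -> list[str]:
    """Return the categories whose score exceeds our own threshold."""
    scores = moderation_response["results"][0]["category_scores"]
    return [name for name, score in scores.items() if score > THRESHOLD]

# With the example response above, this returns ['violence'].
```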
However, what matters most to me is how well it handles Chinese. Unfortunately, after several tests, I found it practically unusable for Chinese text.
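For reference, a quick way to reproduce such a test is to send a Chinese translation of the earlier example through the same endpoint and inspect the flags and scores yourself:

```python
import os
import requests

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
}
# "I want to kill them." translated into Chinese, used to probe non-English support.
data = {"input": "我想殺了他們。"}

response = requests.post("https://api.openai.com/v1/moderations", headers=headers, json=data)
print(response.json()["results"][0])
```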
References
- https://platform.openai.com/docs/guides/moderation
- https://openai.com/blog/new-and-improved-content-moderation-tooling