
Using the OpenAI Moderation Endpoint to Detect Inappropriate Content

Introduction

The Moderation endpoint is a free tool developed by OpenAI. It is designed to detect inappropriate or sensitive content. For the detailed policy list, refer to https://openai.com/policies/usage-policies.

The tool works best on English text and may be less effective for other languages.

Users can use this tool to detect inappropriate content and then act on it, for example by filtering it out (see the filter sketch after the example output below).

The moderation endpoint detects the following categories:

hate: Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is not covered by this category.
hate/threatening: Hateful content that also includes violence or serious harm towards the targeted group.
self-harm: Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
sexual: Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).
sexual/minors: Sexual content that includes an individual who is under 18 years old.
violence: Content that promotes or glorifies violence or celebrates the suffering or humiliation of others.
violence/graphic: Violent content that depicts death, violence, or serious physical injury in extreme graphic detail.


Usage

curl https://api.openai.com/v1/moderations \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"input": "Sample text goes here"}'
import requests
import os

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
}

data = {
    "input": "I want to kill them."
}

response = requests.post("https://api.openai.com/v1/moderations", headers=headers, json=data)

# Print the response
print(response.json())

Output:

{'id': 'modr-7PTCmG5D6bTT5hTbBwJEOjVoGfBa0',
 'model': 'text-moderation-004',
 'results': [{'flagged': True,
   'categories': {'sexual': False,
    'hate': False,
    'violence': True,
    'self-harm': False,
    'sexual/minors': False,
    'hate/threatening': False,
    'violence/graphic': False},
   'category_scores': {'sexual': 9.530887e-07,
    'hate': 0.18386647,
    'violence': 0.8870859,
    'self-harm': 1.7594473e-09,
    'sexual/minors': 1.3112696e-08,
    'hate/threatening': 0.003258761,
    'violence/graphic': 3.173159e-08}}]}


As you can see, the violence score is very high, which suggests the tool's classification is quite accurate. It feels like a sentiment-analysis-style model is doing multi-label classification behind the scenes.
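
Building on that response, here is a minimal sketch of the filtering use case mentioned earlier. The moderate and filter_messages helpers and the drop-if-flagged policy are my own assumptions for illustration; the endpoint itself only returns the flags and scores.

import os
import requests

MODERATION_URL = "https://api.openai.com/v1/moderations"

def moderate(text):
    """Call the moderation endpoint and return the first result dict."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
    }
    response = requests.post(MODERATION_URL, headers=headers, json={"input": text})
    response.raise_for_status()
    return response.json()["results"][0]

def filter_messages(messages):
    """Drop any message the endpoint flags (a hypothetical filtering policy)."""
    kept = []
    for message in messages:
        result = moderate(message)
        if result["flagged"]:
            # Report which categories the model flagged for this message.
            hits = [name for name, hit in result["categories"].items() if hit]
            print(f"dropped ({', '.join(hits)}): {message!r}")
        else:
            kept.append(message)
    return kept

print(filter_messages(["Sample text goes here", "I want to kill them."]))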

However, for me, the most important thing is how well it handles Chinese. Unfortunately, after several tests, it turned out to be unusable for that.
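
If you want to reproduce that kind of test, one way is to send an English sentence and its Chinese counterpart side by side and compare the scores. The Chinese sentence below is my own translation of the earlier example, and actual scores will depend on the model version:

import os
import requests

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
}

# "我想杀了他们。" is a direct Chinese translation of the earlier
# English example "I want to kill them." (my own test sentence).
for text in ["I want to kill them.", "我想杀了他们。"]:
    response = requests.post(
        "https://api.openai.com/v1/moderations",
        headers=headers,
        json={"input": text},
    )
    result = response.json()["results"][0]
    print(f"{text!r}: flagged={result['flagged']}, "
          f"violence={result['category_scores']['violence']:.4f}")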


References

https://openai.com/policies/usage-policies

