
Using the OpenAI Moderation Endpoint to Detect Inappropriate Content

Introduction

The Moderation endpoint is a free tool developed by OpenAI. It is designed to detect inappropriate or sensitive content. For the detailed policy list, refer to https://openai.com/policies/usage-policies.

The tool works best on English text and may be less effective for other languages.

Users can use this tool to detect inappropriate content and then act on it, for example by filtering it out (see the filter sketch after the example output below).

The moderation endpoint detects the following categories:

hate: Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is not covered by this category.
hate/threatening: Hateful content that also includes violence or serious harm towards the targeted group.
self-harm: Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
sexual: Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).
sexual/minors: Sexual content that includes an individual who is under 18 years old.
violence: Content that promotes or glorifies violence or celebrates the suffering or humiliation of others.
violence/graphic: Violent content that depicts death, violence, or serious physical injury in extreme graphic detail.


Usage

curl https://api.openai.com/v1/moderations \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"input": "Sample text goes here"}'
import requests
import os

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
}

data = {
    "input": "I want to kill them."
}

response = requests.post("https://api.openai.com/v1/moderations", headers=headers, json=data)

# Print the response
print(response.json())

Output:

{'id': 'modr-7PTCmG5D6bTT5hTbBwJEOjVoGfBa0',
 'model': 'text-moderation-004',
 'results': [{'flagged': True,
   'categories': {'sexual': False,
    'hate': False,
    'violence': True,
    'self-harm': False,
    'sexual/minors': False,
    'hate/threatening': False,
    'violence/graphic': False},
   'category_scores': {'sexual': 9.530887e-07,
    'hate': 0.18386647,
    'violence': 0.8870859,
    'self-harm': 1.7594473e-09,
    'sexual/minors': 1.3112696e-08,
    'hate/threatening': 0.003258761,
    'violence/graphic': 3.173159e-08}}]}


As you can see, the violence score is very high, which suggests the tool's classification is quite accurate. It feels like a sentiment-analysis-style model is doing multi-label classification behind the scenes.
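
Building on that response, here is a minimal sketch of the filtering use case mentioned earlier. The moderate and filter_messages helpers and the drop-if-flagged policy are my own assumptions for illustration; the endpoint itself only returns the flags and scores.

import os
import requests

MODERATION_URL = "https://api.openai.com/v1/moderations"

def moderate(text):
    """Call the moderation endpoint and return the first result dict."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
    }
    response = requests.post(MODERATION_URL, headers=headers, json={"input": text})
    response.raise_for_status()
    return response.json()["results"][0]

def filter_messages(messages):
    """Drop any message the endpoint flags (a hypothetical filtering policy)."""
    kept = []
    for message in messages:
        result = moderate(message)
        if result["flagged"]:
            # Report which categories the model flagged for this message.
            hits = [name for name, hit in result["categories"].items() if hit]
            print(f"dropped ({', '.join(hits)}): {message!r}")
        else:
            kept.append(message)
    return kept

print(filter_messages(["Sample text goes here", "I want to kill them."]))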

However, for me, the most important thing is how well it handles Chinese. Unfortunately, after several tests, it turned out to be unusable for that.
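
If you want to reproduce that kind of test, one way is to send an English sentence and its Chinese counterpart side by side and compare the scores. The Chinese sentence below is my own translation of the earlier example, and actual scores will depend on the model version:

import os
import requests

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
}

# "我想杀了他们。" is a direct Chinese translation of the earlier
# English example "I want to kill them." (my own test sentence).
for text in ["I want to kill them.", "我想杀了他们。"]:
    response = requests.post(
        "https://api.openai.com/v1/moderations",
        headers=headers,
        json={"input": text},
    )
    result = response.json()["results"][0]
    print(f"{text!r}: flagged={result['flagged']}, "
          f"violence={result['category_scores']['violence']:.4f}")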


References

https://openai.com/policies/usage-policies

