
[Keras] How to build a Multi-label Classification Model

Recently, I ran into a task that required multi-label classification, and I realized that I had never trained a model for this kind of problem.

The difficulty of building a model usually ranges from binary classification, to multi-class classification, to multi-label classification. Multi-label classification is also the closest to the situations we encounter in reality.

In the simplest terms, suppose we have the following picture:

[Image: a landscape with mountains, trees, and houses]

Binary classification asks: Is there a mountain in this picture?
The answer is: “Yes”. We pick one of two possible answers.

Multi-class classification asks: Is the scenery in this picture “nature”, “ocean”, “outer space”, or “desert”?
The answer is: “nature”. Out of several options, we pick the single correct one.

Multi-label classification asks: Is there a “mountain” in this picture? A “house”? A “tree”? An “alien”?
The answer is: there are “mountains”, there are “houses”, there are “trees”, and there are no “aliens”.
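
In other words, the answer is not one class but an independent yes/no flag for every label. A minimal numpy sketch (the label order here is just my illustration):

import numpy as np

# One independent 0/1 flag per label; the order is arbitrary
labels = ['mountain', 'house', 'tree', 'alien']
target = np.array([1, 1, 1, 0])   # mountains, houses, and trees are present; aliens are not

print(dict(zip(labels, target)))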

Below, I will use the classic MNIST dataset for testing. Besides classifying the digit, I also want to predict one extra label: whether the digit is greater than or equal to 5.

If you want a plain MNIST classifier for reference, see: [Keras] Use CNN to build a simple classifier to MNIST.

If you want to study the syntax of Keras, you can refer to: https://keras.io/.


Wrong Method

My first thought was: I have multiple labels, so why not just use softmax to output a probability for each label? However, my quick experiment turned out not to work.

To be clear up front: what is recorded here is the wrong approach. If you are not interested, you can skip ahead to the Sigmoid section below.

# -*- coding: utf-8 -*-
import numpy as np
from keras.models import Sequential, load_model
from keras.layers import Dense, Flatten, Conv2D, MaxPool2D
from keras.utils import np_utils
from keras.datasets import mnist
import matplotlib.pyplot as plt



First, import all the packages we need.

# Mnist Dataset
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
x_train = X_train.reshape(60000, 1, 28, 28)/255   # channels-first (1, 28, 28), scaled to [0, 1]
x_test = X_test.reshape(10000, 1, 28, 28)/255

y_train = np_utils.to_categorical(Y_train).astype(int).tolist()
y_test = np_utils.to_categorical(Y_test).astype(int).tolist()


# Append an 11th label value: 1 if the digit is >= 5, else 0
for n in range(len(y_train)):
    if y_train[n].index(1) < 5:
        y_train[n].append(0)
    else:
        y_train[n].append(1)

for n in range(len(y_test)):
    if y_test[n].index(1) < 5:
        y_test[n].append(0)
    else:
        y_test[n].append(1)


y_train = np.array(y_train)
y_test = np.array(y_test)



This time, I appended one extra value after the one-hot label: if the digit is greater than or equal to 5, I mark it 1; otherwise, I mark it 0.

In this way, the model not only has to predict the digit class as before, but also has to decide whether the digit is 5 or greater, which makes this a multi-label classification.
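
To make the result concrete, here is a quick sanity check of the augmented labels (run after the code above; the first MNIST training digit happens to be 5):

# Sanity check of the augmented labels (assumes y_train, Y_train from above)
print(y_train.shape)             # (60000, 11): 10 one-hot digit units + 1 ">= 5" flag
print(Y_train[0], y_train[0])    # 5 [0 0 0 0 0 1 0 0 0 0 1]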

# Model Structure
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=3, input_shape=(1, 28, 28), activation='relu', padding='same', data_format='channels_first'))
model.add(MaxPool2D(pool_size=2, data_format='channels_first'))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(11, activation='softmax'))
print(model.summary())



This is my model architecture. The point worth noting is that the activation function of the final output layer is set to softmax.

# Train
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=64, verbose=1)



We start to train.

# Test
loss, accuracy = model.evaluate(x_test, y_test)
print('Test:')
print('Loss: %s\nAccuracy: %s' % (loss, accuracy))

# Save model
model.save('./CNN_Mnist.h5')

# Load Model
model = load_model('./CNN_Mnist.h5')

# Display
def image_predict(model, n):
    predict = model.predict(x_test)
    print('Answer:', Y_test[n])

    plt.plot(predict[n])
    plt.show()

    plt.imshow(X_test[n], cmap='gray')
    plt.show()


if __name__ == '__main__':
    image_predict(model, 9)



After training, I saved the model. If you don't plan to use it again, you can skip the save and load steps. Then I predicted the image at index 9 of our test data.

Output:

Test:
Loss: 0.07145553915500641
Accuracy: 0.9517456293106079
Answer: 9

At first glance, the accuracy is very high, and the softmax probabilities also look accurate: the digit is “9” and the final “greater than or equal to 5” flag is 1.

Everything seemed correct at first, but after thinking it over, I realized something was wrong.

Although I am using MNIST to test multi-label classification here, the real task I want to tackle has many, many more labels! Shouldn't I set a “threshold” to judge whether a sample really carries a given label?

Moreover, the number of labels per sample is not fixed: some samples may have only one label, while others may have ten at once.

With softmax outputs that always sum to 1, how am I supposed to set such a threshold?

That is where I went wrong. What I want is for each label to have its own independent probability, so I can judge whether a sample really carries that label.
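
The root of the problem is that softmax normalizes across all output units together, so the probabilities must sum to 1 and two labels can never both be close to 1. A tiny numpy illustration with made-up logit values:

import numpy as np

logits = np.array([2.0, -1.5, 3.5, -0.3])       # made-up scores for four labels
probs = np.exp(logits) / np.exp(logits).sum()   # softmax
print(probs, probs.sum())                       # sums to 1.0: pushing one label up pushes the rest down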

So, what I need is not softmax, but sigmoid.
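
Sigmoid, by contrast, squashes each output to (0, 1) independently, so a single fixed threshold per label (0.5 below is my assumption, not a rule) becomes meaningful:

import numpy as np

logits = np.array([2.0, -1.5, 3.5, -0.3])   # the same made-up scores
probs = 1 / (1 + np.exp(-logits))           # sigmoid: each unit squashed to (0, 1) on its own
print(probs)                                # roughly [0.88 0.18 0.97 0.43]; no normalization
print(probs > 0.5)                          # [ True False  True False]: one threshold per label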


Sigmoid

The following is the complete code after I changed it:

# -*- coding: utf-8 -*-
import numpy as np
from keras.models import Sequential, load_model
from keras.layers import Dense, Flatten, Conv2D, MaxPool2D
from keras.utils import np_utils
from keras.datasets import mnist
import matplotlib.pyplot as plt


# Mnist Dataset
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
x_train = X_train.reshape(60000, 1, 28, 28)/255   # channels-first (1, 28, 28), scaled to [0, 1]
x_test = X_test.reshape(10000, 1, 28, 28)/255

y_train = np_utils.to_categorical(Y_train).astype(int).tolist()
y_test = np_utils.to_categorical(Y_test).astype(int).tolist()


# Append an 11th label value: 1 if the digit is >= 5, else 0
for n in range(len(y_train)):
    if y_train[n].index(1) < 5:
        y_train[n].append(0)
    else:
        y_train[n].append(1)

for n in range(len(y_test)):
    if y_test[n].index(1) < 5:
        y_test[n].append(0)
    else:
        y_test[n].append(1)


y_train = np.array(y_train)
y_test = np.array(y_test)


# Model Structure
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=3, input_shape=(1, 28, 28), activation='relu', padding='same', data_format='channels_first'))
model.add(MaxPool2D(pool_size=2, data_format='channels_first'))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(11, activation='sigmoid'))   # independent probability for each of the 11 labels
print(model.summary())

# Train
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=64, verbose=1)

# Test
loss, accuracy = model.evaluate(x_test, y_test)
print('Test:')
print('Loss: %s\nAccuracy: %s' % (loss, accuracy))

# Save model
model.save('./CNN_Mnist.h5')

# Load Model
model = load_model('./CNN_Mnist.h5')

# Display
def image_predict(model, n):
    predict = model.predict(x_test)
    print('Answer:', Y_test[n])

    plt.plot(predict[n])
    plt.show()

    plt.imshow(X_test[n], cmap='gray')
    plt.show()


if __name__ == '__main__':
    image_predict(model, 9)



Output:

Test:
Loss: 0.013495276327256578
Accuracy: 0.995473325252533
Answer: 9

This time we can see that the outputs for “9” and for “greater than or equal to 5” are both close to 1, while all the others are close to 0. Each output value is independent, and this is the multi-label classification we want.
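
To turn these independent probabilities into final answers, one reasonable decoding (my own convention, not part of the script above) is an argmax over the first ten units for the digit plus a threshold on the last unit for the flag:

# Decode one prediction (assumes model and x_test from the script above)
probs = model.predict(x_test)[9]
digit = int(np.argmax(probs[:10]))   # the ten one-hot digit units
over_five = bool(probs[10] > 0.5)    # the appended ">= 5" unit, threshold 0.5
print('Digit:', digit, '| >= 5:', over_five)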
