
[Scikit-Learn] Tutorial (0) What is “Scikit-Learn”

Scikit-Learn is an open-source machine learning library for Python. Its functionality is organized into six main areas:


Classification

  • Identify which category (class) an object belongs to.
  • Application: Spam detection, Image identification
  • Algorithm: SVM, Nearest neighbors, Random forest
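As a minimal sketch of the classification workflow, the following fits an SVM classifier (`SVC`) to the bundled Iris dataset and predicts the species of flowers it has not seen. The dataset, the RBF kernel, `C=1.0`, and the 25% test split are illustrative assumptions, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Iris toy dataset: 150 flowers, 4 measurements each, 3 species (classes)
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples (stratified by class) to test on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a support vector classifier, then predict one class label per test sample
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)

# score() returns the mean accuracy on the held-out samples
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Swapping `SVC` for `KNeighborsClassifier` or `RandomForestClassifier` leaves the rest of the code unchanged, because every scikit-learn estimator exposes the same `fit`/`predict`/`score` methods.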


Regression

  • Predict a continuous-valued attribute associated with an object.
  • Application: Drug response, Stock prices
  • Algorithm: SVR, Ridge regression, Lasso
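A comparable sketch for regression, assuming the bundled Diabetes dataset (whose target is a continuous measure of disease progression), fits the Ridge and Lasso models listed above. The `alpha` values are arbitrary illustrative choices; `score()` reports the coefficient of determination R² on the held-out split.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

# Diabetes toy dataset: 442 patients, 10 features, continuous target
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

scores = {}
for name, model in [("ridge", Ridge(alpha=1.0)), ("lasso", Lasso(alpha=0.1))]:
    model.fit(X_train, y_train)
    # R^2 on unseen data: 1.0 is a perfect fit, 0.0 matches predicting the mean
    scores[name] = model.score(X_test, y_test)
    print(name, round(scores[name], 3))

# Predictions are real numbers, not class labels (here from the last model, Lasso)
y_pred = model.predict(X_test)
```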


Clustering

  • Automatically group similar objects into clusters, without using labels.
  • Application: Customer segmentation, Grouping experiment outcomes
  • Algorithm: K-Means, Spectral clustering, Mean-shift
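A minimal clustering sketch, assuming synthetic data from `make_blobs` with three well-separated groups: `KMeans` receives only the features, never the true labels, and assigns each point to one of `n_clusters` groups. Passing `n_init=10` explicitly is an assumption that keeps behavior consistent across scikit-learn versions, since that parameter's default changed in recent releases.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# 300 synthetic 2-D points around 3 centres; the generated labels are discarded
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Ask for 3 clusters; run 10 random initializations and keep the best one
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # one cluster index (0, 1 or 2) per point

print(kmeans.cluster_centers_)  # coordinates of the 3 learned centres
```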


Dimensionality reduction

  • Reduce the number of random variables (features) we need to consider.
  • Application: Visualization, Increased efficiency
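This section names no algorithm, so the sketch below assumes PCA (`sklearn.decomposition.PCA`), one common choice. It projects the 64-pixel Digits images down to 2 components, few enough to plot, and reports how much of the original variance those 2 components retain.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Digits toy dataset: 1797 images of 8x8 pixels, i.e. 64 features each
X, y = load_digits(return_X_y=True)

# Keep only the 2 directions of largest variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X.shape, "->", X_2d.shape)
print("variance kept:", round(pca.explained_variance_ratio_.sum(), 3))
```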


Model selection

  • Compare, validate, and choose parameters and models
  • Application: Improve accuracy by adjusting parameters
  • Modules: Grid search, Cross-validation, Metrics
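Grid search and cross-validation can be sketched together on the Iris data: `cross_val_score` estimates one model's accuracy over 5 folds, and `GridSearchCV` repeats that for every combination in a parameter grid and keeps the best. The grid values, `cv=5`, and the use of `SVC` are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Cross-validation: train and evaluate one candidate model on 5 different splits
cv_scores = cross_val_score(SVC(C=1.0), X, y, cv=5)
print("accuracy per fold:", cv_scores)

# Grid search: every C/kernel combination is scored by 5-fold CV (3 x 2 = 6 candidates)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```

After fitting, `search` refits the best combination on all the data by default, so it can be used directly for `predict`.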


Preprocessing

  • Feature extraction and normalization
  • Application: Transforming input data, such as text, into a form machine learning algorithms can use


Data Set

Scikit-Learn bundles several small toy datasets that users can load to try out their models:

  • Boston house-prices (deprecated in version 1.0 and removed in version 1.2)
  • Iris
  • Diabetes
  • Digits
  • Linnerud
  • Wine
  • Breast cancer Wisconsin
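The toy datasets ship with the library, so loading one requires no download. As a sketch (the selection of loaders is arbitrary), each `load_*` function returns a `Bunch` object with `.data`, `.target`, and metadata such as `feature_names`, or a plain `(X, y)` tuple when `return_X_y=True`.

```python
from sklearn.datasets import load_breast_cancer, load_diabetes, load_iris, load_wine

# Bunch form: features, targets, and descriptive metadata in one object
iris = load_iris()
print(iris.data.shape, list(iris.target_names))
print(iris.feature_names)

# Tuple form: just the feature matrix X and the target vector y
loaders = {"wine": load_wine, "diabetes": load_diabetes, "breast_cancer": load_breast_cancer}
shapes = {}
for name, loader in loaders.items():
    X, y = loader(return_X_y=True)
    shapes[name] = X.shape
    print(f"{name}: {X.shape[0]} samples, {X.shape[1]} features")
```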

However, note that because these datasets are so small, they cannot really substitute for real-world data.

In the next posts of this Scikit-Learn introductory tutorial series, I will work through applying different models to these datasets.

You may also want to refer to the algorithm cheat sheet ("Choosing the right estimator") on the official Scikit-Learn website.
