Last Updated on 2021-03-29 by Clay
Scikit-Learn is an open-source machine learning framework for Python. Its functionality falls into six main areas:
Classification
- Identify which category (class) a sample belongs to.
- Application: Spam detection, Image identification
- Algorithm: SVM, Nearest neighbors, Random forest
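As a rough illustration of classification, here is a minimal sketch that fits an SVM on the Iris toy dataset. The default `SVC()` settings, the default 25% test split, and `random_state=0` are my own choices for the example, not recommended values.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the Iris toy dataset: 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out part of the data for evaluation (25% by default).
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a support vector classifier with its default settings.
clf = SVC()
clf.fit(X_train, y_train)

# score() returns the mean accuracy on the held-out samples.
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```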
Regression
- Predict a continuous-valued attribute for a sample.
- Application: Drug response, Stock prices
- Algorithm: SVR, Ridge regression, Lasso
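A regression model is used in much the same way. The sketch below fits Ridge regression on the Diabetes toy dataset; `alpha=1.0` and `random_state=0` are arbitrary values picked for illustration only.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Load the Diabetes toy dataset: 442 samples, 10 features, continuous target.
X, y = load_diabetes(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ridge regression is linear regression with an L2 penalty on the coefficients.
reg = Ridge(alpha=1.0)
reg.fit(X_train, y_train)

# For regressors, score() returns the R^2 coefficient of determination.
r2 = reg.score(X_test, y_test)
print(f"Test R^2: {r2:.3f}")

# Predict continuous values for the first three test samples.
predictions = reg.predict(X_test[:3])
print(predictions)
```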
Clustering
- Automatically group similar samples into clusters, without using labels.
- Application: Customer segmentation, Grouping experiment outcomes
- Algorithm: K-Means, Spectral clustering, Mean-shift
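To see clustering in action, the following sketch runs K-Means on synthetic data. The data comes from `make_blobs`, and the number of clusters (3) is assumed to be known in advance, which is rarely the case with real data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate 300 two-dimensional points around 3 centers (synthetic data).
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Fit K-Means; n_init=10 restarts the algorithm from 10 random initializations.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Each sample gets a cluster index; the centers live in the feature space.
print(labels[:10])
print(kmeans.cluster_centers_)
```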
Dimensionality reduction
- Reduce the number of random variables (features) we need to consider.
- Application: Visualization, Increased efficiency
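As one possible example of dimensionality reduction, the sketch below uses PCA to project the Digits dataset from 64 features down to 2, which is a common first step before plotting. PCA is my pick for the illustration; Scikit-Learn offers other techniques as well.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Load the Digits toy dataset: 1797 samples, 64 features (8x8 pixel images).
X, y = load_digits(return_X_y=True)

# Keep only the 2 directions with the largest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X.shape, "->", X_2d.shape)

# Fraction of the total variance retained by each kept component.
print(pca.explained_variance_ratio_)
```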
Model selection
- Compare, validate, and choose parameters and models.
- Application: Improve accuracy by adjusting parameters
- Module: Grid search, Cross validation, Metrics
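Grid search and cross validation are often used together. Here is a minimal sketch with `GridSearchCV` on the Iris dataset; the parameter grid and the 5-fold setting are arbitrary choices for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate values to try: 3 values of C times 2 kernels = 6 combinations.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Evaluate every combination with 5-fold cross validation.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

# Best parameter combination and its mean cross-validated accuracy.
print(search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

After fitting, `search` itself can be used like a regular estimator, because by default it is refit on the whole dataset with the best parameters.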
Preprocessing
- Feature extraction and normalization
- Application: Preprocessing, Feature extraction
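As a small normalization example, the sketch below standardizes the Wine dataset with `StandardScaler`, so that every feature ends up with zero mean and unit variance. Other scalers exist; this one is just a common starting point.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Load the Wine toy dataset: 178 samples, 13 features on very different scales.
X, y = load_wine(return_X_y=True)

# Learn each feature's mean and standard deviation, then rescale the data.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After scaling, each column has mean ~0 and standard deviation ~1.
print(np.round(X_scaled.mean(axis=0), 6))
print(np.round(X_scaled.std(axis=0), 6))
```

With a train/test split, the scaler would normally be fitted on the training data only and then applied to the test data with `transform()`.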
Data Set
Scikit-Learn ships with several small toy datasets that users can load to try out their models:
- Boston house-prices (deprecated in Scikit-Learn 1.0 and removed in 1.2)
- Iris
- Diabetes
- Digits
- Linnerud
- Wine
- Breast cancer Wisconsin
Note, however, that these datasets are very small, so they cannot really stand in for real-world data.
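The toy datasets can be loaded through the `load_*` functions in `sklearn.datasets`. The sketch below loads six of them and prints their shapes. The Boston house-prices loader is left out because it is no longer available in recent Scikit-Learn releases.

```python
from sklearn.datasets import (
    load_breast_cancer,
    load_diabetes,
    load_digits,
    load_iris,
    load_linnerud,
    load_wine,
)

# Map a readable name to each loader function.
loaders = {
    "iris": load_iris,
    "diabetes": load_diabetes,
    "digits": load_digits,
    "linnerud": load_linnerud,
    "wine": load_wine,
    "breast_cancer": load_breast_cancer,
}

# Each loader returns a Bunch object with .data (features) and .target.
shapes = {}
for name, loader in loaders.items():
    bunch = loader()
    shapes[name] = bunch.data.shape
    print(f"{name}: {bunch.data.shape[0]} samples, {bunch.data.shape[1]} features")
```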
In the following posts of this Scikit-Learn introduction series, I will go through different models applied to different datasets.
For help choosing a model, you can also refer to the algorithm map provided on the official Scikit-Learn website.