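While doing Logistic Regression data analysis, I did not know where to download suitable datasets. I then came across this remarkable site, so I am backing up its list here.
For other public data, see: Public Datasets for Data Analysis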
I was looking for a list of Machine Learning datasets for comparing Logistic Regression models, but I couldn’t find one easily. I spent some time curating this list based on my needs.
This post is a collection of such datasets, which you can download for your own use.
1. Iris Dataset
The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
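To make the linear separability point concrete, here is a minimal sketch of a logistic regression fitted by plain batch gradient descent, using only the Python standard library. The six points are made-up toy values in the spirit of petal length and width, not actual Iris rows, and the learning rate and epoch count are arbitrary assumptions. When one class is linearly separable from the rest, a fit like this can reach perfect training accuracy on that class-versus-rest split.

```python
import math

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: p(x) - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    # Classify as 1 when the predicted probability is at least 0.5.
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Made-up toy points (length, width), NOT real Iris measurements.
X = [[1.4, 0.2], [1.5, 0.3], [1.3, 0.2], [4.7, 1.4], [5.1, 1.9], [4.5, 1.5]]
y = [1, 1, 1, 0, 0, 0]  # 1 = the separable class, 0 = everything else

w, b = fit_logistic(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
print("training accuracy:", accuracy)
```

For the two classes that are not linearly separable from each other, the same procedure would still run, but it could not be expected to classify every training point correctly.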
2. Titanic Dataset
The task is to use machine learning to create a model that predicts which passengers survived the Titanic shipwreck.
3. Bank Marketing Dataset
The data is related to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (bank term deposit) would be subscribed (‘yes’) or not (‘no’).
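Since the target here is a ‘yes’/‘no’ string, it has to be mapped to 1/0 before a logistic regression can use it. The sketch below shows one way to do that with the standard `csv` module. The sample rows are hypothetical, and the `;` delimiter and the target column name `y` are assumptions that should be checked against the file you actually download.

```python
import csv
import io

# Hypothetical rows in the style of the bank marketing CSV. The column names
# and the ';' delimiter are assumptions, not copied from the real file.
SAMPLE = """age;job;duration;y
30;technician;120;no
45;management;610;yes
52;retired;95;no
"""

def load_rows(text, target="y"):
    """Parse delimited text and map the 'yes'/'no' target to 1/0."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter=";"):
        label = record.pop(target).strip().lower()
        if label not in ("yes", "no"):
            raise ValueError(f"unexpected target value: {label!r}")
        rows.append((record, 1 if label == "yes" else 0))
    return rows

rows = load_rows(SAMPLE)
labels = [label for _, label in rows]
print(labels)
```

Raising on an unexpected label, rather than silently treating it as ‘no’, makes it easier to notice a wrong delimiter or column name early.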
4. Haberman’s Survival Data Set
The dataset contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago’s Billings Hospital on the survival of patients who had undergone surgery for breast cancer.
5. Census Income Dataset
Predict whether income exceeds $50K/yr based on census data.
6. Wine Quality Data Set
Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests.
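Wine quality is a numeric score rather than a two-class label, so using it with a binary logistic regression means picking a cut-off first. A minimal sketch follows. The threshold of 7 is purely an assumption for illustration, and the scores are made up rather than taken from the dataset.

```python
def binarize_quality(scores, threshold=7):
    """Return 1 for scores at or above the threshold, else 0."""
    return [1 if s >= threshold else 0 for s in scores]

# Hypothetical quality scores, not taken from the dataset.
scores = [5, 6, 7, 4, 8]
labels = binarize_quality(scores)

# Share of samples labelled "good" under this threshold; worth checking,
# because a high cut-off can leave very few positive examples.
positive_rate = sum(labels) / len(labels)
print(labels, positive_rate)
```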
7. Credit Card Dataset
This research examines customers’ default payments in Taiwan and compares the predictive accuracy of the probability of default among six data mining methods.
8. Pima Indian Diabetes Dataset
The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.
Conclusion
I found this interesting post on Quora about how to find the required dataset on Kaggle: