Investigating Kaggle’s Titanic Dataset

The goal of this notebook is to try a few things on Kaggle’s Titanic: Machine Learning from Disaster dataset. You can find the dataset and some background information on the Kaggle competition page. I use Python for this project. First, let’s read the data and see what we have:

import pandas as pd
directory = '../../Datasets/Titanic/'
titanic_train = pd.read_csv(directory + 'train.csv')
titanic_test = pd.read_csv(directory + 'test.csv')
titanic_train.info()
titanic_test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 66.2+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 0 to 417
Data columns (total 11 columns):
PassengerId    418 non-null int64
Pclass         418 non-null int64
Name           418 non-null object
Sex            418 non-null object
Age            332 non-null float64
SibSp          418 non-null int64
Parch          418 non-null int64
Ticket         418 non-null object
Fare           417 non-null float64
Cabin          91 non-null object
Embarked       418 non-null object
dtypes: float64(2), int64(4), object(5)
memory usage: 27.8+ KB

There are missing values in Age, Cabin, Embarked, and Fare. Let’s go over both the training and test data and fill them in. For now, I’ll use simple methods; we can improve them later.
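
To see exactly which columns need attention, a quick per-column null count helps (a minimal sketch, using the same titanic_train / titanic_test frames loaded above):

for name, df in [('train', titanic_train), ('test', titanic_test)]:
    # Count missing values per column and show only the columns that have any
    missing = df.isnull().sum()
    print(name)
    print(missing[missing > 0])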

titanic_train_test = [titanic_train, titanic_test]
for dataset in titanic_train_test:
    dataset['Age'].fillna(dataset.Age.median(), inplace=True)
    dataset['Cabin'].fillna('U', inplace=True)
    dataset['Embarked'].fillna('S', inplace=True)
    dataset['Fare'].fillna(dataset.Fare.mean(), inplace=True)
titanic_train.info()
titanic_test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
PassengerId       891 non-null int64
Survived          891 non-null int64
Pclass            891 non-null int64
Name              891 non-null object
Sex               891 non-null object
Age               891 non-null float64
SibSp             891 non-null int64
Parch             891 non-null int64
Ticket            891 non-null object
Fare              891 non-null float64
Cabin             891 non-null object
Embarked          891 non-null object
FamilySize        891 non-null int64
CategoricalAge    891 non-null category
Title             891 non-null object
dtypes: category(1), float64(2), int64(6), object(6)
memory usage: 77.6+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 0 to 417
Data columns (total 11 columns):
PassengerId    418 non-null int64
Pclass         418 non-null int64
Name           418 non-null object
Sex            418 non-null object
Age            418 non-null float64
SibSp          418 non-null int64
Parch          418 non-null int64
Ticket         418 non-null object
Fare           418 non-null float64
Cabin          418 non-null object
Embarked       418 non-null object
dtypes: float64(2), int64(4), object(5)
memory usage: 27.8+ KB

I’m going to define a few more columns. The first is FamilySize, based on SibSp (number of siblings/spouses aboard the Titanic) and Parch (number of parents/children aboard the Titanic).

for dataset in titanic_train_test:
    dataset['FamilySize'] = dataset.SibSp + dataset.Parch
import matplotlib.pyplot as plt
%matplotlib inline  
titanic_train.groupby('Age').count().PassengerId.plot()
[Figure: passenger counts by Age]

It’s better to group the ages into ranges so we can work with a handful of age groups instead of the raw values.

for dataset in titanic_train_test:
    dataset['AgeRange'], AgeBins = pd.cut(dataset['Age'], 10, retbins=True)
titanic_train.groupby('AgeRange').count().PassengerId
AgeRange
(0.34, 8.378]        54
(8.378, 16.336]      46
(16.336, 24.294]    177
(24.294, 32.252]    346
(32.252, 40.21]     118
(40.21, 48.168]      70
(48.168, 56.126]     45
(56.126, 64.084]     24
(64.084, 72.042]      9
(72.042, 80.0]        2
Name: PassengerId, dtype: int64
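
Most passengers fall in the 16–40 bins, so equal-width bins end up quite unbalanced. A quantile-based alternative is a sketch worth considering (the AgeQuantile column name is just for illustration; duplicates='drop' is needed because the median-filled ages produce repeated bin edges):

for dataset in titanic_train_test:
    # qcut puts roughly the same number of passengers into each bin
    dataset['AgeQuantile'] = pd.qcut(dataset['Age'], 10, duplicates='drop')
titanic_train.groupby('AgeQuantile').count().PassengerId
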
titanic_train.Name.head(10)
0                              Braund, Mr. Owen Harris
1    Cumings, Mrs. John Bradley (Florence Briggs Th...
2                               Heikkinen, Miss. Laina
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                             Allen, Mr. William Henry
5                                     Moran, Mr. James
6                              McCarthy, Mr. Timothy J
7                       Palsson, Master. Gosta Leonard
8    Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
9                  Nasser, Mrs. Nicholas (Adele Achem)
Name: Name, dtype: object

We can extract titles from the names and use them instead of Sex. There is a clear correspondence between Sex and Title (a male passenger cannot have the title Mrs), so it doesn’t make sense to keep both. At the same time, I think Title carries more information than a binary Sex variable.

for dataset in titanic_train_test:
    dataset['Title'] = dataset['Name'].map(lambda name: name.split(',')[1].split('.')[0].strip())
titanic_train.Title.head(10)
0        Mr
1       Mrs
2      Miss
3       Mrs
4        Mr
5        Mr
6        Mr
7    Master
8       Mrs
9       Mrs
Name: Title, dtype: object
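
To confirm that Title essentially determines Sex, a quick cross-tabulation over the same training frame is enough (a minimal sketch):

# Each title should map (almost) exclusively to one sex
pd.crosstab(titanic_train['Title'], titanic_train['Sex'])
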
plt.subplot(2,1,1)
titanic_train.groupby('Title').count().PassengerId.plot()
plt.subplot(2,1,2)
titanic_test.groupby('Title').count().PassengerId.plot()
[Figure: title counts for the training (top) and test (bottom) data]

It makes sense to keep only a few common titles and group the rare ones together.

Title_Dictionary = {
    "Capt": "Rare",
    "Col": "Rare",
    "Major": "Rare",
    "Jonkheer": "Rare",
    "Don": "Rare",
    "Sir": "Rare",
    "Dr": "Rare",
    "Rev": "Rare",
    "the Countess": "Rare",
    "Dona": "Rare",
    "Mme": "Mrs",
    "Mlle": "Miss",
    "Ms": "Mrs",
    "Mr": "Mr",
    "Mrs": "Mrs",
    "Miss": "Miss",
    "Master": "Master",
    "Lady": "Rare"
}
for dataset in titanic_train_test:
    dataset['Title'] = dataset['Title'].map(Title_Dictionary)
titanic_train.groupby('Title').count().PassengerId.plot()
titanic_test.groupby('Title').count().PassengerId.plot()
[Figure: counts of the grouped titles for the training and test data]
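
Any title missing from Title_Dictionary would become NaN after the map. A small defensive check is a reasonable sketch here (defaulting unknown titles to "Rare" is my assumption, not something the original notebook does):

for dataset in titanic_train_test:
    # Titles the dictionary doesn't know about end up as NaN after .map
    unknown = dataset['Title'].isnull().sum()
    if unknown:
        print('Unmapped titles:', unknown)
        dataset['Title'].fillna('Rare', inplace=True)
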

Let’s see what we can get from last names. If the same last name identifies people from the same family, it could be useful for grouping. The problem is that some last names are shared by unrelated passengers, so we shouldn’t rely on them alone (note the max count of 9 in the training data).

for dataset in titanic_train_test:
    dataset['LastName'] = dataset['Name'].map(lambda name: name.split(',')[0].strip())
titanic_train.groupby('LastName').count().PassengerId.describe()

count    667.000000
mean       1.335832
std        0.854922
min        1.000000
25%        1.000000
50%        1.000000
75%        1.000000
max        9.000000
Name: PassengerId, dtype: float64
titanic_test.groupby('LastName').count().PassengerId.describe()
count    352.000000
mean       1.187500
std        0.505314
min        1.000000
25%        1.000000
50%        1.000000
75%        1.000000
max        4.000000
Name: PassengerId, dtype: float64

Let’s investigate what we have from tickets, cabins, and fares. Remember, ‘U’ in Cabin means unknown; the count of 687 in the training data corresponds to it. Tickets look more useful: the largest number of passengers sharing a ticket is 7, so maybe we can use it to find groups. A shared fare by itself means little in most cases, but if very few passengers paid the exact same fare, they may have bought their tickets together.
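
For a concrete look at the groups behind those numbers, listing the most frequently shared tickets helps (a sketch over the same training frame):

# Tickets shared by the most passengers
most_shared = titanic_train['Ticket'].value_counts()
print(most_shared.head())
# Passengers on the most shared ticket, to see what such a group looks like
print(titanic_train[titanic_train['Ticket'] == most_shared.index[0]][['Name', 'Ticket', 'Fare']])
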

titanic_train.groupby('Cabin').count().PassengerId.describe()
count    148.000000
mean       6.020270
std       56.360775
min        1.000000
25%        1.000000
50%        1.000000
75%        2.000000
max      687.000000
Name: PassengerId, dtype: float64
titanic_train.groupby('Ticket').count().PassengerId.describe()
count    681.000000
mean       1.308370
std        0.792652
min        1.000000
25%        1.000000
50%        1.000000
75%        1.000000
max        7.000000
Name: PassengerId, dtype: float64
titanic_train.groupby('Fare').count().PassengerId.describe()
count    248.000000
mean       3.592742
std        5.848930
min        1.000000
25%        1.000000
50%        2.000000
75%        4.000000
max       43.000000
Name: PassengerId, dtype: float64

Now let’s compute the frequency of each ticket, cabin, fare, and last name to see how they are distributed.

for dataset in titanic_train_test:
    for col in ['Ticket', 'Cabin', 'Fare', 'LastName']:
        freq_col = f'Freq{col}'

        # How many passengers share each value of this column
        freq = dataset[col].value_counts().to_frame()
        freq.columns = [freq_col]

        # Attach that count to every row with the matching value
        dataset[freq_col] = dataset.merge(freq, how='left', left_on=col, right_index=True)[freq_col]
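
The merge works, but mapping the value counts directly onto the column is a simpler equivalent (a sketch that produces the same frequency columns):

for dataset in titanic_train_test:
    for col in ['Ticket', 'Cabin', 'Fare', 'LastName']:
        # value_counts gives value -> count; map looks up each row's value
        dataset['Freq' + col] = dataset[col].map(dataset[col].value_counts())
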
titanic_train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 22 columns):
PassengerId       891 non-null int64
Survived          891 non-null int64
Pclass            891 non-null int64
Name              891 non-null object
Sex               891 non-null object
Age               891 non-null float64
SibSp             891 non-null int64
Parch             891 non-null int64
Ticket            891 non-null object
Fare              891 non-null float64
Cabin             891 non-null object
Embarked          891 non-null object
FamilySize        891 non-null int64
CategoricalAge    891 non-null category
Title             891 non-null object
AgeCat            891 non-null category
AgeRange          891 non-null category
LastName          891 non-null object
FreqTicket        891 non-null int64
FreqCabin         891 non-null int64
FreqFare          891 non-null int64
FreqLastName      891 non-null int64
dtypes: category(3), float64(2), int64(10), object(7)
memory usage: 111.0+ KB
titanic_train.groupby('FreqTicket').count().PassengerId.plot()
[Figure: passenger counts by FreqTicket]

titanic_train.groupby('FreqCabin').count().PassengerId
FreqCabin
1      101
2       76
3       15
4       12
687    687
Name: PassengerId, dtype: int64
titanic_train.groupby('FreqLastName').count().PassengerId.plot()
[Figure: passenger counts by FreqLastName]

titanic_train.groupby('FreqFare').count().PassengerId.plot()
[Figure: passenger counts by FreqFare]

Now we can group things together. I’m going to use FamilySize first, then FreqTicket, then FreqCabin, then FreqLastName, and finally FreqFare. If none of them indicates a group, the passenger was traveling alone.

def groupify(x):
    max_group = 5
    # Family members aboard take priority
    if x['FamilySize'] > 0:
        return x['FamilySize']
    # Then passengers sharing the same ticket
    elif x['FreqTicket'] > 1:
        return x['FreqTicket']
    # Then a shared (known) cabin
    elif x['FreqCabin'] > 1 and x['Cabin'] != 'U':
        return x['FreqCabin']
    # Then a shared last name, but only for small groups
    elif 1 < x['FreqLastName'] < max_group:
        return x['FreqLastName']
    # Finally a shared fare, again only for small groups
    elif 1 < x['FreqFare'] < max_group:
        return x['FreqFare']
    # Otherwise, assume the passenger was alone
    else:
        return 0
for dataset in titanic_train_test:
    dataset['GroupSize'] = dataset.apply(groupify, axis=1)
titanic_train.groupby('GroupSize').count().PassengerId.plot()
[Figure: passenger counts by GroupSize]

Let’s see how these features relate to survival:

print(titanic_train[['Pclass', 'Survived']].groupby(['Pclass'], as_index=False).mean())
print()
print(titanic_train[['GroupSize', 'Survived']].groupby(['GroupSize'], as_index=False).mean())
print()
print(titanic_train[['Embarked', 'Survived']].groupby(['Embarked'], as_index=False).mean())
print()
print(titanic_train[['AgeRange', 'Survived']].groupby(['AgeRange'], as_index=False).mean())
print()
print(titanic_train[['Title', 'Survived']].groupby(['Title'], as_index=False).mean())
print()

   Pclass  Survived
0       1  0.629630
1       2  0.472826
2       3  0.242363

   GroupSize  Survived
0          0  0.241486
1          1  0.552795
2          2  0.439614
3          3  0.594937
4          4  0.384615
5          5  0.125000
6          6  0.333333
7          7  0.384615
8         10  0.000000

  Embarked  Survived
0        C  0.553571
1        Q  0.389610
2        S  0.339009

           AgeRange  Survived
0     (0.34, 8.378]  0.666667
1   (8.378, 16.336]  0.413043
2  (16.336, 24.294]  0.355932
3  (24.294, 32.252]  0.338150
4   (32.252, 40.21]  0.440678
5   (40.21, 48.168]  0.342857
6  (48.168, 56.126]  0.466667
7  (56.126, 64.084]  0.375000
8  (64.084, 72.042]  0.000000
9    (72.042, 80.0]  0.500000

    Title  Survived
0  Master  0.575000
1    Miss  0.701087
2      Mr  0.156673
3     Mrs  0.795276
4    Rare  0.347826

Let’s remove the unused columns and clean up the final dataset.

y = titanic_train['Survived']
titanic_train.drop(['Survived'], axis=1, inplace=True)
for dataset in titanic_train_test:
    dataset['Title'] = dataset['Title'].map({"Mr": 1, "Miss": 2, "Mrs": 3, "Master": 4, "Rare": 5}).astype(int)

    dataset['Embarked'] = dataset['Embarked'].map({'S': 0, 'C': 1, 'Q': 2}).astype(int)

    for AgeGroup in range(0, len(AgeBins)):
        if AgeGroup == len(AgeBins) - 1:
            dataset.loc[dataset['Age'] > AgeBins[AgeGroup], 'Age'] = AgeGroup
        else:
            dataset.loc[
                (dataset['Age'] > AgeBins[AgeGroup]) & (dataset['Age'] <= AgeBins[AgeGroup + 1]), 'Age'] = AgeGroup

    dataset["Pclass"] = dataset["Pclass"].astype('int')

    # Sex & Title are correlated. We keep Title.
    for col in dataset.columns:
        if col not in ['Pclass', 'Age', 'Embarked', 'Title', 'GroupSize']:
            dataset.drop([col], inplace=True, axis=1)
    for col in dataset.columns:
        dataset[col] = dataset[col].astype("category")
titanic_train.columns
Index(['Pclass', 'Age', 'Embarked', 'Title', 'GroupSize'], dtype='object')
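
As an aside, the manual loop over AgeBins above can be written as a single pd.cut call with the saved bin edges (a sketch, assuming the same AgeBins computed earlier; note that ages above the last edge become NaN here, whereas the loop assigns them to the last group):

for dataset in titanic_train_test:
    # labels=False returns the bin index (0..9) instead of an interval
    dataset['Age'] = pd.cut(dataset['Age'], bins=AgeBins, labels=False, include_lowest=True)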

As you can see, I removed almost all of the columns and kept only a few. I believe the dropped columns are strongly correlated with the ones I kept, so they would add little new information.

I’m going to turn these categorical columns into binary (dummy) variables.

titanic_train = pd.get_dummies(titanic_train, columns=None)
titanic_test = pd.get_dummies(titanic_test, columns=None)
titanic_train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 31 columns):
Pclass_1        891 non-null uint8
Pclass_2        891 non-null uint8
Pclass_3        891 non-null uint8
Age_0.0         891 non-null uint8
Age_1.0         891 non-null uint8
Age_2.0         891 non-null uint8
Age_3.0         891 non-null uint8
Age_4.0         891 non-null uint8
Age_5.0         891 non-null uint8
Age_6.0         891 non-null uint8
Age_7.0         891 non-null uint8
Age_8.0         891 non-null uint8
Age_9.0         891 non-null uint8
Age_10.0        891 non-null uint8
Embarked_0      891 non-null uint8
Embarked_1      891 non-null uint8
Embarked_2      891 non-null uint8
Title_1         891 non-null uint8
Title_2         891 non-null uint8
Title_3         891 non-null uint8
Title_4         891 non-null uint8
Title_5         891 non-null uint8
GroupSize_0     891 non-null uint8
GroupSize_1     891 non-null uint8
GroupSize_2     891 non-null uint8
GroupSize_3     891 non-null uint8
GroupSize_4     891 non-null uint8
GroupSize_5     891 non-null uint8
GroupSize_6     891 non-null uint8
GroupSize_7     891 non-null uint8
GroupSize_10    891 non-null uint8
dtypes: uint8(31)
memory usage: 27.0 KB

To make sure we use the same feature set for both train and test, I need to clean the datasets up a little more.

# Add dummy columns that exist in train but not in test
missing_cols = set(titanic_train.columns) - set(titanic_test.columns)
for c in missing_cols:
    titanic_test[c] = 0
# ... and the other way around
missing_cols = set(titanic_test.columns) - set(titanic_train.columns)
for c in missing_cols:
    titanic_train[c] = 0
# Keep the column order identical in both frames
titanic_test = titanic_test[titanic_train.columns]
X_train, y_train = titanic_train, y
X_test = titanic_test
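
An equivalent, more compact way to line the two frames up is DataFrame.align (a sketch: join='outer' keeps the union of columns and fill_value=0 fills the dummies that are missing on either side):

titanic_train, titanic_test = titanic_train.align(titanic_test, join='outer', axis=1, fill_value=0)
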

I’m going to try a few classifiers, tuning their parameters with cross-validation.

from sklearn.model_selection import StratifiedShuffleSplit

# Set the parameters by cross-validation
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
from sklearn.model_selection import GridSearchCV
from sklearn import svm
import numpy as np

# run svm
C_range = np.logspace(-3, 3, 7)
gamma_range = np.logspace(-3, 3, 7)
param_grid = dict(gamma=gamma_range, C=C_range)
svm_model = GridSearchCV(svm.SVC(), param_grid=param_grid, cv=cv)
svm_model.fit(X_train, y_train)
GridSearchCV(cv=StratifiedShuffleSplit(n_splits=5, random_state=0, test_size=0.2,
            train_size=None),
       error_score='raise',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'gamma': array([  1.00000e-03,   1.00000e-02,   1.00000e-01,   1.00000e+00,
         1.00000e+01,   1.00000e+02,   1.00000e+03]), 'C': array([  1.00000e-03,   1.00000e-02,   1.00000e-01,   1.00000e+00,
         1.00000e+01,   1.00000e+02,   1.00000e+03])},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=None, verbose=0)
print("[SVM] The best parameters are %s with a score of %0.2f"
      % (svm_model.best_params_, svm_model.best_score_))
[SVM] The best parameters are {'C': 1.0, 'gamma': 0.10000000000000001} with a score of 0.82

Now, let’s try a Multi-layer Perceptron.

from sklearn.neural_network import MLPClassifier

# MLP
alpha_range = np.logspace(-3, 3, 7)
param_grid = dict(alpha=alpha_range)
mlp = GridSearchCV(MLPClassifier(solver='lbfgs'), param_grid=param_grid, cv=cv)
mlp.fit(X_train, y_train)
GridSearchCV(cv=StratifiedShuffleSplit(n_splits=5, random_state=0, test_size=0.2,
            train_size=None),
       error_score='raise',
       estimator=MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='lbfgs', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'alpha': array([  1.00000e-03,   1.00000e-02,   1.00000e-01,   1.00000e+00,
         1.00000e+01,   1.00000e+02,   1.00000e+03])},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=None, verbose=0)
print("[MLP] The best parameters are %s with a score of %0.2f"
      % (mlp.best_params_, mlp.best_score_))
[MLP] The best parameters are {'alpha': 10.0} with a score of 0.81
from sklearn.tree import DecisionTreeClassifier

# Tree
max_depth_range = np.linspace(10, 15, 6).astype(int)
min_samples_split_range = np.linspace(2, 5, 4).astype(int)
param_grid = dict(max_depth=max_depth_range, min_samples_split=min_samples_split_range)
clf = GridSearchCV(DecisionTreeClassifier(), param_grid=param_grid, cv=cv)
clf.fit(X_train, y_train)

GridSearchCV(cv=StratifiedShuffleSplit(n_splits=5, random_state=0, test_size=0.2,
            train_size=None),
       error_score='raise',
       estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best'),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'max_depth': array([10, 11, 12, 13, 14, 15]), 'min_samples_split': array([2, 3, 4, 5])},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=None, verbose=0)
print("[TREE] The best parameters are %s with a score of %0.2f"
      % (clf.best_params_, clf.best_score_))
[TREE] The best parameters are {'max_depth': 10, 'min_samples_split': 3} with a score of 0.81
from sklearn.ensemble import RandomForestClassifier

# Random Forest
param_grid = {"n_estimators": [250, 300],
              "criterion": ["gini", "entropy"],
              "max_depth": [10, 15, 20],
              "min_samples_split": [2, 3, 4]}
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
forest = GridSearchCV(RandomForestClassifier(), param_grid=param_grid, cv=cv, verbose=2)
forest.fit(X_train, y_train)

Fitting 5 folds for each of 36 candidates, totalling 180 fits
[CV] criterion=gini, max_depth=10, min_samples_split=2, n_estimators=250 
[CV]  criterion=gini, max_depth=10, min_samples_split=2, n_estimators=250, total=   0.5s
[CV] criterion=gini, max_depth=10, min_samples_split=2, n_estimators=250 

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.5s remaining:    0.0s

...

[Parallel(n_jobs=1)]: Done 180 out of 180 | elapsed:  2.6min finished




GridSearchCV(cv=StratifiedShuffleSplit(n_splits=5, random_state=0, test_size=0.2,
            train_size=None),
       error_score='raise',
       estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'n_estimators': [250, 300], 'criterion': ['gini', 'entropy'], 'max_depth': [10, 15, 20], 'min_samples_split': [2, 3, 4]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=None, verbose=2)
print("[FOREST] The best parameters are %s with a score of %0.2f"
      % (forest.best_params_, forest.best_score_))
[FOREST] The best parameters are {'criterion': 'entropy', 'max_depth': 10, 'min_samples_split': 4, 'n_estimators': 300} with a score of 0.82

So, it seems the random forest works best here, with the SVM essentially tied at the same cross-validation score.
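
To turn the best model into a Kaggle submission, one way is to predict on the test features and write out PassengerId alongside the predictions (a minimal sketch; it re-reads test.csv because PassengerId was dropped from titanic_test during cleaning, and the output file name is just an example):

# Predict with the best estimator found by the grid search
predictions = forest.predict(X_test)

# PassengerId was dropped earlier, so take it from the raw file again
passenger_ids = pd.read_csv(directory + 'test.csv')['PassengerId']
submission = pd.DataFrame({'PassengerId': passenger_ids, 'Survived': predictions})
submission.to_csv('titanic_submission.csv', index=False)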
