
Invalid parameter error when running the scikit-learn grid search method in Python

I am trying to learn how to find the optimal hyperparameters for a decision tree classifier using GridSearchCV() from scikit-learn.

The thing is, it works fine if I specify only one parameter option, as in the following:

print(__doc__) 

# Code source: Gael Varoquaux 
# Modified for documentation by Jaques Grobler 
# License: BSD 3 clause 

from sklearn import datasets 
from sklearn.grid_search import GridSearchCV 
from sklearn.tree import DecisionTreeClassifier 

# define classifier 
dt = DecisionTreeClassifier() 

# import some data to play with 
iris = datasets.load_iris() 
X = iris.data[:, :2] # we only take the first two features. 
y = iris.target 

# define parameter values that should be searched 
min_samples_split_options = range(2, 4) 

# create a parameter grid: map the parameter names to the values that should be searched 
param_grid_dt = dict(min_samples_split= min_samples_split_options) # for DT 

# instantiate the grid 
grid = GridSearchCV(dt, param_grid_dt, cv=10, scoring='accuracy') 

# fit the grid with param 
grid.fit(X, y) 

# view complete results 
grid.grid_scores_ 

'''# examine results from first tuple 
print grid.grid_scores_[0].parameters 
print grid.grid_scores_[0].cv_validation_scores 
print grid.grid_scores_[0].mean_validation_score''' 

# examine the best model 
print '*******Final results*********' 
print grid.best_score_ 
print grid.best_params_ 
print grid.best_estimator_ 

Result:

None 
*******Final results********* 
0.68 
{'min_samples_split': 3} 
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None, 
      max_features=None, max_leaf_nodes=None, min_samples_leaf=1, 
      min_samples_split=3, min_weight_fraction_leaf=0.0, 
      presort=False, random_state=None, splitter='best') 

But when I add more parameter options to the mix, it gives me an "Invalid parameter" error, as follows:

print(__doc__) 


# Code source: Gael Varoquaux 
# Modified for documentation by Jaques Grobler 
# License: BSD 3 clause 

from sklearn import datasets 
from sklearn.grid_search import GridSearchCV 
from sklearn.tree import DecisionTreeClassifier 

# define classifier 
dt = DecisionTreeClassifier() 

# import some data to play with 
iris = datasets.load_iris() 
X = iris.data[:, :2] # we only take the first two features. 
y = iris.target 

# define parameter values that should be searched 
max_depth_options = range(10, 251) # for DT 
min_samples_split_options = range(2, 4) 

# create a parameter grid: map the parameter names to the values that should be searched 
param_grid_dt = dict(max_depth=max_depth_options, min_sample_split=min_samples_split_options) # for DT 

# instantiate the grid 
grid = GridSearchCV(dt, param_grid_dt, cv=10, scoring='accuracy') 

# fit the grid with param 
grid.fit(X, y) 

'''# view complete results 
grid.grid_scores_ 

# examine results from first tuple 
print grid.grid_scores_[0].parameters 
print grid.grid_scores_[0].cv_validation_scores 
print grid.grid_scores_[0].mean_validation_score 

# examine the best model 
print '*******Final results*********' 
print grid.best_score_ 
print grid.best_params_ 
print grid.best_estimator_''' 

Result:

None 
Traceback (most recent call last): 
    File "C:\Users\KubiK\Desktop\GridSearch_ex6.py", line 31, in <module> 
    grid.fit(X, y) 
    File "C:\Users\KubiK\Anaconda2\lib\site-packages\sklearn\grid_search.py", line 804, in fit 
    return self._fit(X, y, ParameterGrid(self.param_grid)) 
    File "C:\Users\KubiK\Anaconda2\lib\site-packages\sklearn\grid_search.py", line 553, in _fit 
    for parameters in parameter_iterable 
    File "C:\Users\KubiK\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 800, in __call__ 
    while self.dispatch_one_batch(iterator): 
    File "C:\Users\KubiK\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 658, in dispatch_one_batch 
    self._dispatch(tasks) 
    File "C:\Users\KubiK\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 566, in _dispatch 
    job = ImmediateComputeBatch(batch) 
    File "C:\Users\KubiK\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 180, in __init__ 
    self.results = batch() 
    File "C:\Users\KubiK\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 72, in __call__ 
    return [func(*args, **kwargs) for func, args, kwargs in self.items] 
    File "C:\Users\KubiK\Anaconda2\lib\site-packages\sklearn\cross_validation.py", line 1520, in _fit_and_score 
    estimator.set_params(**parameters) 
    File "C:\Users\KubiK\Anaconda2\lib\site-packages\sklearn\base.py", line 270, in set_params 
    (key, self.__class__.__name__)) 
ValueError: Invalid parameter min_sample_split for estimator DecisionTreeClassifier. Check the list of available parameters with `estimator.get_params().keys()`. 
[Finished in 0.3s] 

Your code should use 'min_samples_split', not 'min_sample_split' – maxymoo

Answer


There is a typo in your code: it should be min_samples_split, not min_sample_split. That is exactly what the last line of the traceback says; you can always list the valid parameter names with estimator.get_params().keys(), as the error message suggests.
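A minimal sketch of the corrected grid search. It uses the modern sklearn.model_selection import path (the sklearn.grid_search module used in the question was deprecated and later removed), and a smaller max_depth range than the question's range(10, 251), just to keep the example fast; those choices are illustrative, not part of the original code.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# same data setup as in the question
iris = datasets.load_iris()
X = iris.data[:, :2]  # only the first two features
y = iris.target

dt = DecisionTreeClassifier(random_state=0)

# you can check the valid parameter names before building the grid
print(sorted(dt.get_params().keys()))

# note the "s": min_samples_split, not min_sample_split
param_grid_dt = dict(max_depth=range(1, 6),
                     min_samples_split=range(2, 4))

grid = GridSearchCV(dt, param_grid_dt, cv=10, scoring='accuracy')
grid.fit(X, y)  # no ValueError now

print(grid.best_score_)
print(grid.best_params_)
```

With the typo fixed, set_params() accepts every combination in the grid and the search runs to completion.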