# **Advanced Model Training Using Hyperopt**

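In the Advanced Model Training tutorial, we saw how to perform hyperparameter optimization with GridHyperparamOpt from the deepchem package. In this tutorial, we will look at another hyperparameter tuning library called Hyperopt.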

## Colab

This tutorial and the rest in this sequence can be done in Google Colab. If you'd like to open this notebook in Colab, you can use the following link.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepchem/deepchem/blob/master/examples/tutorials/Hyperopt_training.ipynb)

## Setup

Running this tutorial requires the Hyperopt library.

To run DeepChem and Hyperopt within Colab, you'll need to run the following installation commands. You can of course run this tutorial locally if you prefer. In that case, don't run these cells, since they will download and install DeepChem and Hyperopt on your local machine again.

In [1]:
!pip install deepchem
!pip install hyperopt

Collecting deepchem
  Downloading deepchem-2.6.1-py3-none-any.whl (608 kB)



## Hyperparameter Optimization via hyperopt

Let's start by loading the HIV dataset. It classifies over 40,000 molecules based on whether they inhibit HIV replication.

In [2]:
import deepchem as dc
tasks, datasets, transformers = dc.molnet.load_hiv(featurizer='ECFP', splitter='scaffold')
train_dataset, valid_dataset, test_dataset = datasets


Now, let's import the hyperopt library, which we will use to find the best parameters.

In [3]:
from hyperopt import hp, fmin, tpe, Trials

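Next, we have to declare a dictionary containing all the hyperparameters and the ranges over which they will be tuned. This dictionary will serve as the search space for hyperopt.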

Some basic ways of declaring ranges in the dictionary are:

*   hp.choice('label', [*choices*]): specifies a list of choices to pick from.
*   hp.uniform('label', low=*low_value*, high=*high_value*): specifies a uniform distribution between the low and high values. Values in between can be any real number, not necessarily an integer.
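
To make the two kinds of range concrete, here is a plain-Python analogue that draws one configuration from a discrete list and a continuous interval. It uses the standard `random` module rather than hyperopt itself; the helper `sample_config`, the example values, and the seed are hypothetical and only meant as a sketch.

```python
import random

# Plain-Python stand-ins for the two range declarations (illustration only,
# not hyperopt's implementation).
CHOICES = [[500], [1000], [2000], [1000, 1000]]  # plays the role of hp.choice
LOW, HIGH = 0.2, 0.5                             # plays the role of hp.uniform


def sample_config(rng):
    """Draw one configuration: a discrete pick plus a real-valued draw."""
    return {
        "layer_sizes": rng.choice(CHOICES),  # exactly one of the listed options
        "dropouts": rng.uniform(LOW, HIGH),  # any real number between LOW and HIGH
    }


rng = random.Random(0)  # hypothetical seed so the draw is repeatable
config = sample_config(rng)
print(config)
```

The discrete entry always comes back as one of the listed options, while the continuous entry is a float that will rarely, if ever, be a round number.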

Here, we will use a multitask classifier to classify the HIV dataset, so an appropriate search space looks like this.

In [None]:
search_space = {
    'layer_sizes': hp.choice('layer_sizes',[[500], [1000], [2000],[1000,1000]]),
    'dropouts': hp.uniform('dropout',low=0.2, high=0.5),
    'learning_rate': hp.uniform('learning_rate',high=0.001, low=0.0001)
}

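Next, we declare the function that hyperopt will minimize. It builds and trains our multitask classifier with the given hyperparameters. We use a validation callback to validate the classifier every 1000 steps, and then return the best score. The metric used here is `roc_auc_score`, which needs to be maximized. Maximizing a non-negative value is equivalent to minimizing its negative, so we return the negative of the validation score.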
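
The sign flip can be checked on a handful of numbers. The scores below are hypothetical stand-ins for validation results at successive checkpoints; the sketch only shows that minimizing the negated score selects the same checkpoint as maximizing the score.

```python
# Hypothetical validation scores recorded at successive checkpoints.
scores = [0.777, 0.755, 0.783, 0.764]

best_score = max(scores)  # what we actually want: the highest ROC AUC
loss = -1 * best_score    # what a minimizer is given instead

# Picking the largest score and picking the smallest negated score
# land on the same checkpoint index.
best_by_max = max(range(len(scores)), key=lambda i: scores[i])
best_by_min = min(range(len(scores)), key=lambda i: -scores[i])
print(best_by_max, best_by_min, loss)
```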

In [None]:
import tempfile
#tempfile is used to save the best checkpoint later in the program.

metric = dc.metrics.Metric(dc.metrics.roc_auc_score)

def fm(args):
  save_dir = tempfile.mkdtemp()
  model = dc.models.MultitaskClassifier(
      n_tasks=len(tasks), n_features=1024,
      layer_sizes=args['layer_sizes'], dropouts=args['dropouts'],
      learning_rate=args['learning_rate'])
  # Validation callback that saves the best checkpoint, i.e. the one with the maximum score.
  validation = dc.models.ValidationCallback(
      valid_dataset, 1000, [metric], save_dir=save_dir,
      transformers=transformers, save_on_minimum=False)

  model.fit(train_dataset, nb_epoch=25, callbacks=validation)

  #restoring the best checkpoint and passing the negative of its validation score to be minimized.
  model.restore(model_dir=save_dir)
  valid_score = model.evaluate(valid_dataset, [metric], transformers)

  return -1*valid_score['roc_auc_score']

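Here, we call hyperopt's fmin function, passing it the function to minimize, the algorithm to follow, the maximum number of evaluations, and a Trials object. The Trials object stores all the hyperparameters, losses, and other information, which means you can access them after running the optimization. Trials can also help you save important information to load later and then resume the optimization process.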

There are three choices for the algorithm that can be used without additional configuration. They are:


*   Random Search - rand.suggest
*   TPE (Tree of Parzen Estimators) - tpe.suggest
*   Adaptive TPE - atpe.suggest
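
As a rough mental model of the simplest of these strategies, random search, the loop below evaluates a fixed number of randomly drawn configurations and keeps a record of each one. This is a minimal plain-Python sketch, not hyperopt's implementation: `random_search`, `toy_objective`, and `toy_sample` are hypothetical names, and the toy objective stands in for the much slower model training.

```python
import random


def random_search(objective, sample, max_evals, seed=0):
    """Minimize `objective` by evaluating `max_evals` random configurations.

    Returns the best configuration and a list of every trial, loosely
    mirroring the return value of fmin and the contents of a Trials object.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(max_evals):
        config = sample(rng)
        trials.append({"config": config, "loss": objective(config)})
    best = min(trials, key=lambda t: t["loss"])
    return best["config"], trials


def toy_objective(config):
    """Toy loss with a known minimum at x = 0.3."""
    return (config["x"] - 0.3) ** 2


def toy_sample(rng):
    """Draw x uniformly between 0 and 1."""
    return {"x": rng.uniform(0.0, 1.0)}


best, trials = random_search(toy_objective, toy_sample, max_evals=15)
print(best, len(trials))
```

TPE differs from this loop in how the next configuration is proposed: it uses the losses recorded so far instead of drawing blindly, which is why the history of trials matters.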

In [None]:
trials = Trials()
best = fmin(fm,
            space=search_space,
            algo=tpe.suggest,
            max_evals=15,
            trials=trials)


  0%|          | 0/15 [00:00<?, ?it/s, best loss: ?]Step 1000 validation: roc_auc_score=0.777648
Step 2000 validation: roc_auc_score=0.755485
Step 3000 validation: roc_auc_score=0.739519
Step 4000 validation: roc_auc_score=0.764756
Step 5000 validation: roc_auc_score=0.757006
Step 6000 validation: roc_auc_score=0.752609
Step 7000 validation: roc_auc_score=0.763002
Step 8000 validation: roc_auc_score=0.749202
  7%|▋         | 1/15 [05:37<1:18:46, 337.58s/it, best loss: -0.7776476459925534]Step 1000 validation: roc_auc_score=0.750455
Step 2000 validation: roc_auc_score=0.783594
Step 3000 validation: roc_auc_score=0.775872
Step 4000 validation: roc_auc_score=0.768825
Step 5000 validation: roc_auc_score=0.769555
Step 6000 validation: roc_auc_score=0.765324
Step 7000 validation: roc_auc_score=0.771146
Step 8000 validation: roc_auc_score=0.760138
 13%|█▎        | 2/15 [07:05<41:16, 190.51s/it, best loss: -0.7835939030962179]  Step 1000 validation: roc_auc_score=0.744178
Step 2000 validation

The code below prints the best hyperparameters found by hyperopt.

In [None]:
print("Best: {}".format(best))


Best: {'dropout': 0.3749846096922802, 'layer_sizes': 0, 'learning_rate': 0.0007544819475363869}
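
Note that `layer_sizes` is reported as `0` rather than as a list: for an `hp.choice` entry, fmin returns the index of the selected option, not the option itself. The sketch below maps that index back to the actual value in plain Python, using the values printed above and the choices from the search space. hyperopt also provides `space_eval(search_space, best)` for the same purpose.

```python
# Values copied from the printed result and from the search space definition.
best = {
    "dropout": 0.3749846096922802,
    "layer_sizes": 0,
    "learning_rate": 0.0007544819475363869,
}
layer_size_choices = [[500], [1000], [2000], [1000, 1000]]

# Replace the hp.choice index with the option it points to.
resolved = dict(best, layer_sizes=layer_size_choices[best["layer_sizes"]])
print(resolved["layer_sizes"])  # [500]
```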


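The hyperparameters found here are not necessarily the best ones, but they give a general idea of which parameters are effective. To get more accurate results, the number of validation epochs and the number of epochs for model fitting have to be increased. But doing so may increase the time needed to find the best hyperparameters.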

# Congratulations! Time to join the Community!

Congratulations on completing this tutorial notebook! If you enjoyed working through the tutorial, and want to continue working with DeepChem, we encourage you to finish the rest of the tutorials in this series. You can also help the DeepChem community in the following ways:

## Star DeepChem on [GitHub](https://github.com/deepchem/deepchem)
This helps build awareness of the DeepChem project and the tools for open source drug discovery that we're trying to build.

## Join the DeepChem Gitter
The DeepChem [Gitter](https://gitter.im/deepchem/Lobby) hosts a number of scientists, developers, and enthusiasts interested in deep learning for the life sciences. Join the conversation!