Managing a script's configuration with HYDRA

Two years ago, while working on my Master's project, which consisted of training an AI model for textual information retrieval, I had to train my models with different parameters to analyse their behaviour. At first, I changed the values directly in my code, which was very tedious, so I decided to pass these parameters on the command line using the argparse module from Python's standard library. As the number of parameters grew, this became really complicated. That's when I discovered HYDRA, which allowed me to move forward more quickly and complete my experiments. So I decided to write this brief article to introduce HYDRA and how it works. I hope it will be as helpful to others as it was for me.

What is HYDRA?

HYDRA is a powerful open-source tool developed by Facebook's researchers to make it easier to create and compose configurations dynamically. HYDRA reads its configuration from YAML files, and any value in that configuration can be overridden with key=value arguments on the command line. The key features of HYDRA include:

  • Hierarchical configuration is composable from multiple sources.

  • Configuration can be specified or overridden from the command line.

  • Dynamic command line tab completion.

  • Run your application locally or launch it to run remotely.

  • Run multiple jobs with different arguments with a single command.
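
The last feature, multirun, is easiest to picture as a grid: when the --multirun flag is given, each comma-separated list of values is combined with the others, and one job is launched per combination. The standard-library sketch below only mimics that expansion to show which jobs a command would produce. It is not HYDRA's own code, the function name expand_sweep is made up for this article, and the db.driver and db.user keys come from the configuration file introduced later on.

```python
from itertools import product
from typing import Dict, List


def expand_sweep(overrides: List[str]) -> List[Dict[str, str]]:
    """Expand overrides such as 'db.driver=mysql,postgres' into one dict per job."""
    keys: List[str] = []
    choices: List[List[str]] = []
    for override in overrides:
        key, _, values = override.partition("=")
        keys.append(key)
        choices.append(values.split(","))
    # One job per element of the Cartesian product of all value lists.
    return [dict(zip(keys, combo)) for combo in product(*choices)]


if __name__ == "__main__":
    for job in expand_sweep(["db.driver=mysql,postgres", "db.user=alice,bob"]):
        print(job)
```

With HYDRA itself, the corresponding command would look something like python main.py --multirun db.driver=mysql,postgres db.user=alice,bob, which would launch four runs.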

How does HYDRA work?

Installation

HYDRA is a Python package and can therefore be installed from the Python Package Index (PyPI) using the following command:

pip install hydra-core
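
To confirm that the installation worked, one option is to ask Python which version of the package it can see. The sketch below uses only the standard library; the helper name installed_version is made up for this article.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("hydra-core")
    if found is None:
        print("hydra-core is not installed; run: pip install hydra-core")
    else:
        print(f"hydra-core {found} is installed")
```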

Managing configuration with HYDRA

To manage configurations with HYDRA, the first step is to write the configuration as a YAML file. The following is an example of a configuration file, saved as conf.yaml in the same directory as the script:

db:
  driver: postgres
  database: database_name
  user: username
  pass: password

social:
  google:
    client_id: <GOOGLE_CLIENT_ID>
    client_secret: <GOOGLE_CLIENT_SECRET>

This configuration can be used in a Python module as follows:

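import hydra
from omegaconf import DictConfig, OmegaConf

# DBDriver and GoogleAuth stand in for your own application classes.
# from database import DBDriver
# from socialauth import GoogleAuth

@hydra.main(version_base=None, config_path=".", config_name="conf")
def main(configs: DictConfig) -> None:
    # db = DBDriver(**configs.db)
    # google_provider = GoogleAuth(**configs.social.google)
    print(configs.db)              # the "db" section
    print(configs.social.google)   # the "social.google" section
    print(OmegaConf.to_yaml(configs))  # the whole configuration as YAML

if __name__ == "__main__":
    main()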

If you run this script with python main.py without providing any command-line parameters, you will get the following output:

python main.py
{'driver': 'postgres', 'database': 'database_name', 'user': 'username', 'pass': 'password'}
{'client_id': '<GOOGLE_CLIENT_ID>', 'client_secret': '<GOOGLE_CLIENT_SECRET>'}
db:
  driver: postgres
  database: database_name
  user: username
  pass: password

social:
  google:
    client_id: <GOOGLE_CLIENT_ID>
    client_secret: <GOOGLE_CLIENT_SECRET>

However, thanks to the flexibility of HYDRA, these values can be overridden from the command line when the program is launched, without editing the file. The following example shows the result of running the program with some parameters overridden:

python main.py db.driver=mysql db.database=example social.google.client_id=a4a4a4a4 social.google.client_secret=f5f5f5f5f5f5
{'driver': 'mysql', 'database': 'example', 'user': 'username', 'pass': 'password'}
{'client_id': 'a4a4a4a4', 'client_secret': 'f5f5f5f5f5f5'}
db:
  driver: mysql
  database: example
  user: username
  pass: password

social:
  google:
    client_id: a4a4a4a4
    client_secret: f5f5f5f5f5f5
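
To make the key=value syntax less magical, here is a rough standard-library sketch of what an override does conceptually: the dotted key is split into a path, and the value at the end of that path is replaced. It is an illustration only, not HYDRA's implementation, whose override syntax is richer (typed values, adding or removing keys, and so on). The name apply_overrides is invented for this article.

```python
import copy
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Return a copy of `config` with each 'a.b.c=value' override applied."""
    result = copy.deepcopy(config)  # leave the defaults untouched
    for override in overrides:
        dotted_key, _, value = override.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node[name]  # KeyError if the path does not exist
        if leaf not in node:
            raise KeyError(f"unknown configuration key: {dotted_key}")
        node[leaf] = value
    return result


# Plain-dict mirror of the conf.yaml file shown earlier.
defaults = {
    "db": {"driver": "postgres", "database": "database_name",
           "user": "username", "pass": "password"},
    "social": {"google": {"client_id": "<GOOGLE_CLIENT_ID>",
                          "client_secret": "<GOOGLE_CLIENT_SECRET>"}},
}

if __name__ == "__main__":
    updated = apply_overrides(
        defaults, ["db.driver=mysql", "social.google.client_id=a4a4a4a4"]
    )
    print(updated["db"]["driver"])                   # mysql
    print(updated["social"]["google"]["client_id"])  # a4a4a4a4
```

The sketch rejects keys that are not already in the configuration. HYDRA behaves similarly by default: an override for a key that does not exist in the configuration, such as a mistyped social.google_secret, is refused rather than silently added.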

It is important to note that the values written in the configuration file are just default values. It is also possible to declare parameters without a default value, in which case the user must supply the value at runtime, much like a required argument of a function. Such a parameter is written with the special value ???, as in the following:

db:
  driver: postgres
  database: database_name
  user: username
  pass: password

social:
  google:
    client_id: ???
    client_secret: ???

If we run our program without passing values for the parameters social.google.client_id and social.google.client_secret, we will get the following error as soon as the program tries to read them:

python main.py
{'driver': 'postgres', 'database': 'database_name', 'user': 'username', 'pass': 'password'}
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/username/github/tutorials/hydra/main.py", line 9, in main
    google_provider = GoogleAuth(**configs.social.google)
omegaconf.errors.MissingMandatoryValue: Missing mandatory value: social.google.client_id
    full_key: social.google.client_id
    object_type=dict
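
Because the error only shows up when a missing value is actually read, it can be handy to look for ??? entries up front and report all of them at once. The sketch below does this on a plain nested dict that mirrors the YAML file above. The helper name find_missing is hypothetical and the code is only an illustration of the idea, not part of HYDRA; on a real configuration object, OmegaConf.is_missing(cfg, key) can be used to test a single key.

```python
from typing import Any, Dict, List

MISSING = "???"  # marker used in the YAML file for a mandatory value


def find_missing(config: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return the dotted keys whose value is still the '???' marker."""
    missing: List[str] = []
    for key, value in config.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted prefix.
            missing.extend(find_missing(value, prefix=f"{full_key}."))
        elif value == MISSING:
            missing.append(full_key)
    return missing


# Plain-dict mirror of the configuration with mandatory values.
config = {
    "db": {"driver": "postgres", "database": "database_name",
           "user": "username", "pass": "password"},
    "social": {"google": {"client_id": "???", "client_secret": "???"}},
}

if __name__ == "__main__":
    print(find_missing(config))
    # ['social.google.client_id', 'social.google.client_secret']
```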

Conclusion

In this article, we saw how HYDRA, the open-source tool developed by Facebook's researchers, manages a script's configuration: default values live in a YAML file, any of them can be overridden from the command line with key=value arguments, and parameters marked ??? must be supplied at runtime.