Two years ago, while working on my Master's project, which consisted of training an AI model for textual information retrieval, I had to train my models with different parameters to analyse their behaviour. At first, I changed the values directly in my code, which was very tedious, so I decided to pass these parameters on the command line using the argparse module from Python's standard library. As the number of parameters grew, this became really complicated, and that's when I discovered HYDRA. It definitely allowed me to move forward more quickly and complete my experiments. So I decided to write this brief article to introduce you to HYDRA and how it works. I hope it will be as helpful to others as it was for me.
What is HYDRA
HYDRA is a powerful open-source tool developed by Facebook's researchers to facilitate dynamic configuration creation. HYDRA reads configurations from YAML files, and their values can then be overridden with key=value arguments on the command line. The key features of HYDRA include:
Hierarchical configuration is composable from multiple sources.
Configuration can be specified or overridden from the command line.
Dynamic command line tab completion.
Run your application locally or launch it to run remotely.
Run multiple jobs with different arguments with a single command.
How does HYDRA work?
Installation
HYDRA is a Python package and can therefore be installed from the Python Package Index (PyPI) using the following command.
pip install hydra-core
Managing configuration with HYDRA
To manage configurations with HYDRA, the first step is to write our configuration as a YAML file. The following is an example of a configuration file, saved as conf.yaml next to the script that will use it.
db:
  driver: postgres
  database: database_name
  user: username
  pass: password
social:
  google:
    client_id: <GOOGLE_CLIENT_ID>
    client_secret: <GOOGLE_CLIENT_SECRET>
This configuration can be used in a Python module as follows:
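import hydra
from omegaconf import DictConfig, OmegaConf

# DBDriver and GoogleAuth are application-specific classes, not part of HYDRA.
# Uncomment these imports together with the two lines in main() to use them.
# from database import DBDriver
# from socialauth import GoogleAuth


@hydra.main(version_base=None, config_path=".", config_name="conf")
def main(configs: DictConfig) -> None:
    # db = DBDriver(configs.db)
    # google_provider = GoogleAuth(configs.social.google)
    print(configs.db)  # the "db" section only
    print(configs.social.google)  # the nested "social.google" section
    print(OmegaConf.to_yaml(configs))  # the whole configuration as YAML


if __name__ == "__main__":
    main()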
If you run this script with python main.py without providing any command-line parameters, you will get the following output:
python main.py
{'driver': 'postgres', 'database': 'database_name', 'user': 'username', 'pass': 'password'}
{'client_id': '<GOOGLE_CLIENT_ID>', 'client_secret': '<GOOGLE_CLIENT_SECRET>'}
db:
  driver: postgres
  database: database_name
  user: username
  pass: password
social:
  google:
    client_id: <GOOGLE_CLIENT_ID>
    client_secret: <GOOGLE_CLIENT_SECRET>
However, thanks to the flexibility of HYDRA, the values of these parameters can be overridden from the command line when the program is launched. The following example shows the result of running the program with modified parameters:
python main.py db.driver=mysql db.database=example social.google.client_id=a4a4a4a4 social.google.client_secret=f5f5f5f5f5f5
{'driver': 'mysql', 'database': 'example', 'user': 'username', 'pass': 'password'}
{'client_id': 'a4a4a4a4', 'client_secret': 'f5f5f5f5f5f5'}
db:
  driver: mysql
  database: example
  user: username
  pass: password
social:
  google:
    client_id: a4a4a4a4
    client_secret: f5f5f5f5f5f5
It is important to note that the values written in the configuration file are just default values. It is also possible to define parameters without default values. In this case, the user has to pass the values at runtime, much like the required arguments of a function. Such a parameter is declared with the special value ???, as we can see in the following:
db:
  driver: postgres
  database: database_name
  user: username
  pass: password
social:
  google:
    client_id: ???
    client_secret: ???
If we run our program without passing values for the parameters social.google.client_id and social.google.client_secret, HYDRA raises an error as soon as the code reads one of them. With the GoogleAuth line of main enabled, we get the following error:
python main.py
{'driver': 'postgres', 'database': 'database_name', 'user': 'username', 'pass': 'password'}
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/username/github/tutorials/hydra/main.py", line 9, in main
    google_provider = GoogleAuth(**configs.social.google)
omegaconf.errors.MissingMandatoryValue: Missing mandatory value: social.google.client_id
    full_key: social.google.client_id
    object_type=dict
Conclusion
In this article, we saw how HYDRA, an open-source tool developed by Facebook's researchers, facilitates dynamic configuration creation: configurations are defined in YAML files, their values can be overridden from the command line, and parameters without a default value can be marked as mandatory with ???.