{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting started with configs\n", "\n", "One of the key features of utilsd is connecting principled configs written as `configclass` (essentially `dataclass`) with declarative languages (e.g., JSON/YAML) and with the concrete objects to be configured. This tutorial is a step-by-step walkthrough of the most important features of `utilsd.config`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `configclass`\n", "\n", "Imagine a simple scenario where you want to configure a deep learning experiment with some hyper-parameters, say learning rate, batch size, and number of epochs. To manage your experiment, you would like:\n", "\n", "1. new configurations to be easy to write (with as little code change as possible);\n", "2. configurations to be human-friendly and manageable;\n", "3. the config to look like a Python object when writing code (with type-checking and code-completion).\n", "\n", "If these are your concerns, `configclass` is exactly what you need. Next, we will go through the core features of `configclass` by implementing the scenario above." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from utilsd.config import configclass" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "@configclass\n", "class ExperimentConf:\n", "    learning_rate: float\n", "    batch_size: int  # annotate with type\n", "    num_epochs: int = 10  # default value is 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the example above, we create a config class with 3 fields. The field `num_epochs` has a default value of 10.\n", "\n", "The syntax is very similar to (actually almost the same as) that of dataclass.
Refer to the Python documentation of [dataclass](https://docs.python.org/3/library/dataclasses.html) for background knowledge.\n", "\n", "Afterwards, an experiment config can be created with:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ExperimentConf(learning_rate=0.001, batch_size=10, num_epochs=10)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ExperimentConf(1e-3, 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The power of a config class is that it can be created from dict-like config data. That is to say, users don't have to construct the experiment config in Python as shown above. Instead, they can prepare a dict-like config beforehand:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ExperimentConf(learning_rate=0.001, batch_size=10, num_epochs=30)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "config_data = {'learning_rate': 1e-3, 'batch_size': 10, 'num_epochs': 30}\n", "ExperimentConf.fromdict(config_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This conversion becomes non-trivial when the config class is complex (nested), or when the config data is not well formatted, which usually happens when it is read from a text file."
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ExperimentConf(optimizer=OptimizerConf(opt_type='adam', learning_rate=0.001), batch_size=4, num_epochs=1)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@configclass\n", "class OptimizerConf:\n", "    opt_type: str\n", "    learning_rate: float = 0.1\n", "\n", "@configclass\n", "class ExperimentConf:\n", "    optimizer: OptimizerConf\n", "    batch_size: int  # annotate with type\n", "    num_epochs: int = 10  # default value is 10\n", "\n", "ExperimentConf.fromdict(dict(optimizer={'opt_type': 'adam', 'learning_rate': 0.1}, batch_size=4, num_epochs=2))\n", "\n", "# `fromdict` converts between int/float/str automatically.\n", "# The following will also work:\n", "ExperimentConf.fromdict(dict(\n", "    optimizer={\n", "        'opt_type': 'adam',\n", "        'learning_rate': '1e-3'  # expects a float but gets a str here\n", "    },\n", "    batch_size=4.0,\n", "    num_epochs=1\n", "))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The example above shows the usage of basic types like int/float/str, as well as how to write a nested config.\n", "\n", "Apart from these usages, `configclass` supports the following type annotations:\n", "\n", "* `typing.Any` (try to avoid it, because no type-checking is available for `Any`)\n", "* `typing.Optional` (setting the field to `None` is also legal)\n", "* `typing.List[xxx]`\n", "* `typing.Dict[xxx, xxx]`\n", "* `typing.Tuple[xxx, xxx, ...]` (the ellipsis here means the tuple can have any number of element types; writing a literal `...` in the annotation is not currently supported)\n", "* `typing.Union[aaa, bbb, ccc]` (the types are tried one by one until one of them validates)\n", "* `Enum`\n", "\n", "Note that the inner types in `Optional`, `List`, `Dict`, and `Tuple` will be expanded for type-checking and conversion. We show a (complex) example below."
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from enum import Enum\n", "from typing import Optional, Tuple, Union, List, Dict, Any\n", "\n", "class OptimizerType(str, Enum):\n", "    SGD = 'sgd'\n", "    Adam = 'adam'\n", "\n", "@configclass\n", "class OptimizerConfig:\n", "    opt_type: OptimizerType\n", "    learning_rate: float\n", "    momentum: float\n", "    weight_decay: float\n", "    grad_clip: Optional[float]  # Optional, but must be set explicitly: either None or a float\n", "    betas: Optional[Tuple[float, float]] = None\n", "    other_params: Optional[Dict[str, Any]] = None\n", "\n", "@configclass\n", "class TrainerConfig:\n", "    optimizer: OptimizerConfig\n", "    num_epochs: Union[int, List[int]]\n", "    batch_size: int\n", "    fast_dev_run: bool = False" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "TrainerConfig(optimizer=OptimizerConfig(opt_type=<OptimizerType.Adam: 'adam'>, learning_rate=0.1, momentum=0.9, weight_decay=0.0, grad_clip=None, betas=None, other_params={'eps': '1e-8'}), num_epochs=[10, 20], batch_size=10, fast_dev_run=True)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "config = TrainerConfig.fromdict({\n", "    'optimizer': {\n", "        'opt_type': 'adam',\n", "        'learning_rate': 0.1,\n", "        'momentum': 0.9,\n", "        'weight_decay': 0,\n", "        'grad_clip': None,  # must be set explicitly, otherwise `fromdict` will complain\n", "        # betas has a default value and can be omitted\n", "        'other_params': {\n", "            'eps': '1e-8'  # not converted, because the annotated type is Any\n", "        }\n", "    },\n", "    'num_epochs': [10, 20],  # union type\n", "    'batch_size': 10,\n", "    'fast_dev_run': True\n", "})\n", "\n", "config" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'optimizer': {'opt_type': 'adam',\n", "  'learning_rate': 0.1,\n", "  'momentum': 0.9,\n", "  'weight_decay': 0.0,\n", "  'grad_clip': None,\n", "  'betas': None,\n", "  'other_params': {'eps':
'1e-8'}},\n", " 'num_epochs': [10, 20],\n", " 'batch_size': 10,\n", " 'fast_dev_run': True}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "config.asdict()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## From file\n", "\n", "To manage configs without touching Python code, we recommend saving them in files like JSON or YAML (users can even put them into a separate Python file if they want). The syntax of those YAML/JSON files is similar to that of [MMCV config](https://mmcv.readthedocs.io/en/latest/understand_mmcv/config.html). We recommend reading that tutorial, because some features can be very helpful, such as using `_base_` to inherit from a base config.\n", "\n", "Afterwards, a config can be created via:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "optimizer:\n", "  opt_type: adam\n", "  learning_rate: 1e-3 # pyyaml loads it as a string, but it's okay because we have our converting\n", "  momentum: 0.9\n", "  weight_decay: 0\n", "  grad_clip: null\n", "  other_params:\n", "    eps: 1.0e-8 # converting doesn't help because the annotated type is any\n", "num_epochs: [10, 20]\n", "batch_size: 10\n", "fast_dev_run: true\n" ] } ], "source": [ "!
cat assets/config_trainer.yml" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "TrainerConfig(optimizer=OptimizerConfig(opt_type=<OptimizerType.Adam: 'adam'>, learning_rate=0.001, momentum=0.9, weight_decay=0.0, grad_clip=None, betas=None, other_params={'eps': 1e-08}), num_epochs=[10, 20], batch_size=10, fast_dev_run=True)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "TrainerConfig.fromfile('assets/config_trainer.yml')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## From command line\n", "\n", "When debugging, it can be helpful to override some fields at runtime without changing any Python or YAML code. To this end, we provide `XXXConfig.fromcli()`, which automatically generates a command-line parser that accepts a base config file followed by overriding arguments. The overriding arguments take precedence over the values in the base config." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "from enum import Enum\n", "from typing import Optional, Tuple, Union, List, Dict, Any\n", "from utilsd.config import configclass\n", "\n", "class OptimizerType(str, Enum):\n", "    SGD = 'sgd'\n", "    Adam = 'adam'\n", "\n", "@configclass\n", "class OptimizerConfig:\n", "    opt_type: OptimizerType\n", "    learning_rate: float\n", "    momentum: float\n", "    weight_decay: float\n", "    grad_clip: Optional[float] # optional but must set, either set to none or a float\n", "    betas: Optional[Tuple[float, float]] = None\n", "    other_params: Optional[Dict[str, Any]] = None\n", "\n", "@configclass\n", "class TrainerConfig:\n", "    optimizer: OptimizerConfig\n", "    num_epochs: Union[int, List[int]]\n", "    batch_size: int\n", "    fast_dev_run: bool = False\n", "\n", "config = TrainerConfig.fromcli()\n", "print(config)\n" ] } ], "source": [ "!
cat assets/config_trainer.py" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: config_trainer.py [--batch_size INTEGER] [--fast_dev_run BOOL]\n", " [--num_epochs JSON] [--num_epochs.0 INTEGER]\n", " [--num_epochs.1 INTEGER] [--optimizer JSON]\n", " [--optimizer.grad_clip FLOAT]\n", " [--optimizer.learning_rate FLOAT]\n", " [--optimizer.momentum FLOAT]\n", " [--optimizer.opt_type STRING]\n", " [--optimizer.other_params JSON]\n", " [--optimizer.weight_decay FLOAT] [-h]\n", " exp\n", "\n", "Command line auto-generated with utilsd.config. A path to base config file\n", "(like JSON/YAML) needs to be specified first. Then some extra arguments to\n", "override the fields in the base config. Please note the type of arguments\n", "(always use `-h` for reference): `JSON` type means the field accepts a `JSON`\n", "for overriding.\n", "\n", "positional arguments:\n", " exp Experiment YAML file\n", "\n", "optional arguments:\n", " --batch_size INTEGER\n", " --fast_dev_run BOOL\n", " --num_epochs JSON\n", " --num_epochs.0 INTEGER\n", " --num_epochs.1 INTEGER\n", " --optimizer JSON\n", " --optimizer.grad_clip FLOAT\n", " --optimizer.learning_rate FLOAT\n", " --optimizer.momentum FLOAT\n", " --optimizer.opt_type STRING\n", " --optimizer.other_params JSON\n", " --optimizer.weight_decay FLOAT\n", " -h, --help Show this help message and exit\n" ] } ], "source": [ "! python assets/config_trainer.py assets/config_trainer.yml -h" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The help message shows that not only primitives can be overridden, but also lists and dicts. To override a list or dict, pass the new value as a JSON string."
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TrainerConfig(optimizer=OptimizerConfig(opt_type=<OptimizerType.Adam: 'adam'>, learning_rate=0.001, momentum=0.9, weight_decay=0.0, grad_clip=None, betas=None, other_params={'eps': 1e-08}), num_epochs=[1, 2, 3], batch_size=10, fast_dev_run=True)\n" ] } ], "source": [ "! python assets/config_trainer.py assets/config_trainer.yml --num_epochs \"[1, 2, 3]\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For advanced usage, please refer to the API reference of `utilsd.config`." ] } ], "metadata": { "interpreter": { "hash": "e719d61bf2f8783310605d7d33b4a79d3d3808a4b1e839bcbd4d25925ed7dae7" }, "kernelspec": { "display_name": "Python 3.8.12 64-bit ('utilsd': conda)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }