Getting started with configs#

One of the key features of utilsd is that it connects principled configs written as a configclass (essentially a dataclass) with declarative languages (e.g., JSON/YAML) and with the concrete objects being configured. This tutorial is a step-by-step walkthrough of the most important features of utilsd.config.

configclass#

Imagine a simple scenario in which you want to configure a deep learning experiment with a few hyper-parameters, say learning rate, batch size, and number of epochs. To manage your experiment, you would like:

  1. to easily write new configurations (as little code change as possible);

  2. to keep configurations human-friendly and manageable;

  3. to have the config look like a Python object when writing code (with type-checking and code-completion).

If these requirements sound familiar, configclass is exactly what you need. Next, we go through the core features of configclass by implementing the scenario described above.

[1]:
from utilsd.config import configclass
[2]:
@configclass
class ExperimentConf:
    learning_rate: float
    batch_size: int       # annotate with type
    num_epochs: int = 10  # default value is 10

In the example above, we create a config class with 3 fields. The field num_epochs has a default value 10.

The syntax is very similar to (in fact almost the same as) that of dataclass. Refer to the Python documentation of dataclasses for background knowledge.

Afterwards, an experiment config can be created with:

[3]:
ExperimentConf(1e-3, 10)
[3]:
ExperimentConf(learning_rate=0.001, batch_size=10, num_epochs=10)
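For comparison, here is a minimal sketch of the same definition as a plain standard-library dataclass. It uses only `dataclasses`, not utilsd, and the name `PlainExperimentConf` is made up for illustration. The definition and the repr look the same, but a plain dataclass stores whatever it is given without checking or converting it.

```python
from dataclasses import dataclass


@dataclass
class PlainExperimentConf:  # hypothetical name, stdlib only
    learning_rate: float
    batch_size: int
    num_epochs: int = 10    # default value, same syntax as in configclass


plain = PlainExperimentConf(1e-3, 10)
print(plain)
# PlainExperimentConf(learning_rate=0.001, batch_size=10, num_epochs=10)

# A plain dataclass treats annotations as hints only: wrong types are stored as-is.
unchecked = PlainExperimentConf('1e-3', 4.0)
print(type(unchecked.learning_rate).__name__)  # str
print(type(unchecked.batch_size).__name__)     # float
```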

The power of a config class is that it can be created from dict-like config data. That is, users do not have to construct the experiment config in Python code as shown above. Instead, they can prepare a dict-like config beforehand:

[4]:
config_data = {'learning_rate': 1e-3, 'batch_size': 10, 'num_epochs': 30}
ExperimentConf.fromdict(config_data)
[4]:
ExperimentConf(learning_rate=0.001, batch_size=10, num_epochs=30)

This conversion becomes non-trivial when the config class is complex (nested), or when the config data is not well typed, which often happens when it is read from a text file.

[5]:
@configclass
class OptimizerConf:
    opt_type: str
    learning_rate: float = 0.1

@configclass
class ExperimentConf:
    optimizer: OptimizerConf
    batch_size: int          # annotate with type
    num_epochs: int = 10     # default value is 10

ExperimentConf.fromdict(dict(optimizer={'opt_type': 'adam', 'learning_rate': 0.1}, batch_size=4, num_epochs=2))

# `fromdict` will do the conversion automatically between int/float/str.
# The following will also work:
ExperimentConf.fromdict(dict(
    optimizer={
        'opt_type': 'adam',
        'learning_rate': '1e-3'  # expect a float but found a str here
    },
    batch_size=4.0,
    num_epochs=1
))
[5]:
ExperimentConf(optimizer=OptimizerConf(opt_type='adam', learning_rate=0.001), batch_size=4, num_epochs=1)

The example above shows the usage of basic types such as int/float/str, as well as how to write a nested config.
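To make the idea concrete, here is a rough sketch of how a dict could be turned into nested dataclasses with int/float/str conversion. It is not utilsd's implementation: `from_dict`, `coerce_primitive`, and the `Sketch*` classes are hypothetical names, it uses only the standard library, and it assumes annotations are real types rather than strings.

```python
from dataclasses import MISSING, dataclass, fields, is_dataclass


def coerce_primitive(value, expected):
    """Convert between int/float/str, refusing a lossy float -> int conversion."""
    if isinstance(value, expected):
        return value
    if expected is int and isinstance(value, float) and not value.is_integer():
        raise ValueError(f'cannot convert {value!r} to int without loss')
    return expected(value)


def from_dict(cls, data):
    """Hypothetical helper: build dataclass `cls` from a plain dict."""
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            if f.default is MISSING and f.default_factory is MISSING:
                raise ValueError(f'missing required field {f.name!r}')
            continue  # omitted field: let the dataclass default apply
        value = data[f.name]
        if is_dataclass(f.type):
            value = from_dict(f.type, value)         # recurse into nested config
        elif f.type in (int, float, str):
            value = coerce_primitive(value, f.type)  # e.g. '1e-3' -> 0.001
        kwargs[f.name] = value
    return cls(**kwargs)


@dataclass
class SketchOptimizerConf:
    opt_type: str
    learning_rate: float = 0.1


@dataclass
class SketchExperimentConf:
    optimizer: SketchOptimizerConf
    batch_size: int
    num_epochs: int = 10


result = from_dict(SketchExperimentConf, {
    'optimizer': {'opt_type': 'adam', 'learning_rate': '1e-3'},
    'batch_size': 4.0,
})
print(result)
```

The sketch rejects a value such as `batch_size=4.5` instead of truncating it silently. Whether utilsd behaves the same way in that case is not covered in this tutorial.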

Apart from these usages, configclass supports the following type annotations:

  • typing.Any (try to avoid using it, because no type-checking is available for Any)

  • typing.Optional (None is also a legal value for this field)

  • typing.List[xxx]

  • typing.Dict[xxx, xxx]

  • typing.Tuple[xxx, xxx, ...] (the ellipsis here stands for any number of element types, so the tuple can be arbitrarily long; writing a literal ellipsis in the annotation is not currently supported)

  • typing.Union[aaa, bbb, ccc] (the types are tried one by one until one type is validated)

  • Enum

Note that the inner types in optional, list, dict, and tuple will be expanded for type-checking and conversion. We show a (complex) example below.

[6]:
from enum import Enum
from typing import Optional, Tuple, Union, List, Dict, Any

class OptimizerType(str, Enum):
    SGD = 'sgd'
    Adam = 'adam'

@configclass
class OptimizerConfig:
    opt_type: OptimizerType
    learning_rate: float
    momentum: float
    weight_decay: float
    grad_clip: Optional[float]   # no default, so it must be set explicitly, to either None or a float
    betas: Optional[Tuple[float, float]] = None
    other_params: Optional[Dict[str, Any]] = None

@configclass
class TrainerConfig:
    optimizer: OptimizerConfig
    num_epochs: Union[int, List[int]]
    batch_size: int
    fast_dev_run: bool = False
[7]:
config = TrainerConfig.fromdict({
    'optimizer': {
        'opt_type': 'adam',
        'learning_rate': 0.1,
        'momentum': 0.9,
        'weight_decay': 0,
        'grad_clip': None,  # must be set explicitly (no default), otherwise fromdict will complain
        # betas has default value, can be omitted
        'other_params': {
            'eps': '1e-8'  # stays a str: no conversion is applied to Any
        }
    },
    'num_epochs': [10, 20],  # union type
    'batch_size': 10,
    'fast_dev_run': True
})

config
[7]:
TrainerConfig(optimizer=OptimizerConfig(opt_type=<OptimizerType.Adam: 'adam'>, learning_rate=0.1, momentum=0.9, weight_decay=0.0, grad_clip=None, betas=None, other_params={'eps': '1e-8'}), num_epochs=[10, 20], batch_size=10, fast_dev_run=True)
[8]:
config.asdict()
[8]:
{'optimizer': {'opt_type': 'adam',
  'learning_rate': 0.1,
  'momentum': 0.9,
  'weight_decay': 0.0,
  'grad_clip': None,
  'betas': None,
  'other_params': {'eps': '1e-8'}},
 'num_epochs': [10, 20],
 'batch_size': 10,
 'fast_dev_run': True}
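The way inner types are expanded for checking and conversion can be pictured as a recursive walk over the annotation. The sketch below is an illustration under assumptions, not utilsd's actual code: `convert` and `SketchOptimizerType` are hypothetical names, and it relies only on `typing.get_origin` and `typing.get_args` (Python 3.8+). Union members are tried in order until one succeeds, as described in the list of supported annotations.

```python
from enum import Enum
from typing import Any, Dict, List, Optional, Tuple, Union, get_args, get_origin


def convert(value, annotation):
    """Hypothetical helper: validate/convert `value` against a type annotation."""
    if annotation is Any:
        return value                      # no checking or conversion for Any
    origin, args = get_origin(annotation), get_args(annotation)
    if origin is Union:                   # Optional[X] is Union[X, None]
        for candidate in args:            # try member types one by one
            try:
                return convert(value, candidate)
            except (TypeError, ValueError):
                continue
        raise ValueError(f'{value!r} matches no member of {annotation}')
    if origin is list:
        if not isinstance(value, (list, tuple)):
            raise TypeError(f'expected a list, got {value!r}')
        return [convert(v, args[0]) for v in value]
    if origin is dict:
        if not isinstance(value, dict):
            raise TypeError(f'expected a dict, got {value!r}')
        return {convert(k, args[0]): convert(v, args[1]) for k, v in value.items()}
    if origin is tuple:                   # fixed length, one type per element
        if not isinstance(value, (list, tuple)) or len(value) != len(args):
            raise TypeError(f'expected {len(args)} elements, got {value!r}')
        return tuple(convert(v, a) for v, a in zip(value, args))
    if annotation is type(None):
        if value is not None:
            raise TypeError(f'expected None, got {value!r}')
        return None
    if isinstance(annotation, type) and issubclass(annotation, Enum):
        return annotation(value)          # ValueError for an unknown member
    if annotation in (int, float, str):
        if isinstance(value, annotation):
            return value
        if annotation is int and isinstance(value, float) and not value.is_integer():
            raise ValueError(f'cannot convert {value!r} to int without loss')
        return annotation(value)
    raise TypeError(f'unsupported annotation: {annotation!r}')


class SketchOptimizerType(str, Enum):
    SGD = 'sgd'
    Adam = 'adam'


epochs = convert([10, 20], Union[int, List[int]])            # int fails, List[int] succeeds
clip = convert(None, Optional[float])                        # None is legal for Optional
rate = convert('1e-3', Optional[float])                      # str converted to float
betas = convert([0.9, '0.999'], Optional[Tuple[float, float]])
extra = convert({'eps': '1e-8'}, Optional[Dict[str, Any]])   # value untouched under Any
kind = convert('adam', SketchOptimizerType)
```

Because the first matching Union member wins, the order of types in a Union matters in this sketch: putting `str` first would accept almost anything.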

From file#

To manage configs without touching Python code, we recommend saving them into files such as JSON or YAML (users can even put them into a separate Python file if they want). The syntax of these YAML/JSON files is similar to that of MMCV configs. We recommend reading the MMCV config tutorial, because some features can be very helpful, such as using _base_ to inherit from a base config.

Afterwards, a config can be created from the file via:

[9]:
! cat assets/config_trainer.yml
optimizer:
  opt_type: adam
  learning_rate: 1e-3  # PyYAML loads this as a string, which is fine because fromfile converts it to a float
  momentum: 0.9
  weight_decay: 0
  grad_clip: null
  other_params:
    eps: 1.0e-8  # written as 1.0e-8 so that PyYAML parses a float; no conversion is applied because the annotated type is Any
num_epochs: [10, 20]
batch_size: 10
fast_dev_run: true
[10]:
TrainerConfig.fromfile('assets/config_trainer.yml')
[10]:
TrainerConfig(optimizer=OptimizerConfig(opt_type=<OptimizerType.Adam: 'adam'>, learning_rate=0.001, momentum=0.9, weight_decay=0.0, grad_clip=None, betas=None, other_params={'eps': 1e-08}), num_epochs=[10, 20], batch_size=10, fast_dev_run=True)
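The `_base_` inheritance mentioned above can be pictured as a recursive merge, where the child file overrides the base key by key. The following sketch is a simplified illustration using JSON and the standard library only. `merge` and `load_with_base` are hypothetical helpers, and the sketch assumes `_base_` names a single file relative to the child. The actual MMCV-style rules are richer, so consult that tutorial for the exact behavior.

```python
import json
import os
import tempfile


def merge(base, override):
    """Return a copy of `base` with `override` applied, recursing into nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_with_base(path):
    """Hypothetical loader: resolve a single `_base_` file relative to `path`."""
    with open(path) as f:
        data = json.load(f)
    base_name = data.pop('_base_', None)
    if base_name is None:
        return data
    base_path = os.path.join(os.path.dirname(path), base_name)
    return merge(load_with_base(base_path), data)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'base.json'), 'w') as f:
        json.dump({'optimizer': {'opt_type': 'adam', 'learning_rate': 0.1},
                   'batch_size': 10}, f)
    with open(os.path.join(tmp, 'exp.json'), 'w') as f:
        json.dump({'_base_': 'base.json',
                   'optimizer': {'learning_rate': 0.001},
                   'num_epochs': [10, 20]}, f)
    loaded = load_with_base(os.path.join(tmp, 'exp.json'))

print(loaded)
```

The resulting dict has the same shape as the input of `fromdict`, so the child file only needs to list the fields that differ from the base.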

From command line#

When debugging, it can be helpful to override some arguments at runtime without changing any Python or YAML code. To this end, we provide XXXConfig.fromcli(), which automatically generates a command line parser that accepts a base config file followed by overriding arguments. The overriding arguments take precedence over the values in the base config.

[11]:
! cat assets/config_trainer.py
from enum import Enum
from typing import Optional, Tuple, Union, List, Dict, Any
from utilsd.config import configclass

class OptimizerType(str, Enum):
    SGD = 'sgd'
    Adam = 'adam'

@configclass
class OptimizerConfig:
    opt_type: OptimizerType
    learning_rate: float
    momentum: float
    weight_decay: float
    grad_clip: Optional[float]   # no default, so it must be set explicitly, to either None or a float
    betas: Optional[Tuple[float, float]] = None
    other_params: Optional[Dict[str, Any]] = None

@configclass
class TrainerConfig:
    optimizer: OptimizerConfig
    num_epochs: Union[int, List[int]]
    batch_size: int
    fast_dev_run: bool = False

config = TrainerConfig.fromcli()
print(config)
[12]:
! python assets/config_trainer.py assets/config_trainer.yml -h
usage: config_trainer.py [--batch_size INTEGER] [--fast_dev_run BOOL]
                         [--num_epochs JSON] [--num_epochs.0 INTEGER]
                         [--num_epochs.1 INTEGER] [--optimizer JSON]
                         [--optimizer.grad_clip FLOAT]
                         [--optimizer.learning_rate FLOAT]
                         [--optimizer.momentum FLOAT]
                         [--optimizer.opt_type STRING]
                         [--optimizer.other_params JSON]
                         [--optimizer.weight_decay FLOAT] [-h]
                         exp

Command line auto-generated with utilsd.config. A path to base config file
(like JSON/YAML) needs to be specified first. Then some extra arguments to
override the fields in the base config. Please note the type of arguments
(always use `-h` for reference): `JSON` type means the field accepts a `JSON`
for overriding.

positional arguments:
  exp                   Experiment YAML file

optional arguments:
  --batch_size INTEGER
  --fast_dev_run BOOL
  --num_epochs JSON
  --num_epochs.0 INTEGER
  --num_epochs.1 INTEGER
  --optimizer JSON
  --optimizer.grad_clip FLOAT
  --optimizer.learning_rate FLOAT
  --optimizer.momentum FLOAT
  --optimizer.opt_type STRING
  --optimizer.other_params JSON
  --optimizer.weight_decay FLOAT
  -h, --help            Show this help message and exit

The help message shows that not only primitives can be overridden, but also lists and dicts. To override a list or dict, pass the new value as a JSON string.

[13]:
! python assets/config_trainer.py assets/config_trainer.yml --num_epochs "[1, 2, 3]"
TrainerConfig(optimizer=OptimizerConfig(opt_type=<OptimizerType.Adam: 'adam'>, learning_rate=0.001, momentum=0.9, weight_decay=0.0, grad_clip=None, betas=None, other_params={'eps': 1e-08}), num_epochs=[1, 2, 3], batch_size=10, fast_dev_run=True)
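The dotted option names and JSON values can be reproduced in a few lines of standard-library code. This sketch is only meant to convey the idea: `flatten`, `parse_value`, `set_by_path`, and `override_from_args` are hypothetical helpers, and the flags are derived from a base dict rather than from type annotations, so the generated options can differ from what `fromcli` actually produces.

```python
import argparse
import copy
import json


def flatten(node, prefix=''):
    """Yield a dotted path for every node of a nested dict/list."""
    items = node.items() if isinstance(node, dict) else enumerate(node)
    for key, value in items:
        path = f'{prefix}{key}'
        yield path
        if isinstance(value, (dict, list)):
            yield from flatten(value, path + '.')


def parse_value(raw):
    """Parse a CLI string as JSON, falling back to the raw string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def set_by_path(config, dotted_key, value):
    """Assign `value` at `dotted_key`, using integer parts as list indices."""
    *parents, last = dotted_key.split('.')
    node = config
    for part in parents:
        node = node[int(part)] if isinstance(node, list) else node[part]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


def override_from_args(base, argv):
    """Build one optional flag per path in `base`, then apply the given overrides."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for path in flatten(base):
        parser.add_argument('--' + path, default=argparse.SUPPRESS)
    overrides = vars(parser.parse_args(argv))  # only flags that were passed
    config = copy.deepcopy(base)               # leave the base config untouched
    for path in sorted(overrides):             # parents sort before their children
        set_by_path(config, path, parse_value(overrides[path]))
    return config


base = {
    'optimizer': {'opt_type': 'adam', 'learning_rate': 0.001},
    'num_epochs': [10, 20],
    'batch_size': 10,
}
updated = override_from_args(base, [
    '--num_epochs', '[1, 2, 3]',
    '--optimizer.learning_rate', '0.01',
    '--optimizer.opt_type', 'sgd',
])
print(updated)
```

In a complete pipeline, the overridden dict would then go through the same type checking and conversion as any other dict-like config.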

For advanced usage, please refer to the API reference of utilsd.config.