{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Getting started with configs\n",
    "\n",
    "One of the key features provided by utilsd is to connect pricipled configs of `configclass` (`dataclass` essentially), with declarative languages (e.g., JSON/YAML), and concrete objects that are to configure. This tutorial is a step-by-step walkthrough of the most important features `utilsd.config` is capable of."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `configclass`\n",
    "\n",
    "Imagine a simple scenario, where you want to configure a deep learning experiment with some hyper-parameters. Let's say learning rate, batch size, and number of epochs. In order to manage your experiment, you would like:\n",
    "\n",
    "1. to easily write new configurations (as little code change as possible);\n",
    "2. configurations are human-friendly and manageable;\n",
    "3. when writing code, the config should look like a python object (with type-checking and code-completion).\n",
    "\n",
    "If the things above bothers you, `configclass` is exactly what you need. Next, we will go through the core features for `configclass`, by implementing the scenario mentioned previously."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from utilsd.config import configclass"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "@configclass\n",
    "class ExperimentConf:\n",
    "    learning_rate: float\n",
    "    batch_size: int       # annotate with type\n",
    "    num_epochs: int = 10  # default value is 10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the example above, we create a config class with 3 fields. The field `num_epochs` has a default value 10.\n",
    "\n",
    "The syntax are very similar to (actually almost same as) those in dataclass. Refer to python documentation of [dataclass](https://docs.python.org/3/library/dataclasses.html) for background knowledge.\n",
    "\n",
    "Afterwards, a experiment config can be created with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ExperimentConf(learning_rate=0.001, batch_size=10, num_epochs=10)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ExperimentConf(1e-3, 10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The power of config class is that, it can be created with a dict-like config data. That is to say, users don't have to prepare the experiment config in the python way above. Instead, they can prepare a dict-like config beforehand:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ExperimentConf(learning_rate=0.001, batch_size=10, num_epochs=30)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "config_data = {'learning_rate': 1e-3, 'batch_size': 10, 'num_epochs': 30}\n",
    "ExperimentConf.fromdict(config_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This can become non-trivial when the config class becomes complex (nested), or config data is not well formatted, which usually happens when it is read from a text file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ExperimentConf(optimizer=OptimizerConf(opt_type='adam', learning_rate=0.001), batch_size=4, num_epochs=1)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@configclass\n",
    "class OptimizerConf:\n",
    "    opt_type: str\n",
    "    learning_rate: float = 0.1\n",
    "\n",
    "@configclass\n",
    "class ExperimentConf:\n",
    "    optimizer: OptimizerConf\n",
    "    batch_size: int          # annotate with type\n",
    "    num_epochs: int = 10     # default value is 10\n",
    "\n",
    "ExperimentConf.fromdict(dict(optimizer={'opt_type': 'adam', 'learning_rate': 0.1}, batch_size=4, num_epochs=2))\n",
    "\n",
    "# `fromdict` will do the conversion automatically between int/float/str.\n",
    "# The following will also work:\n",
    "ExperimentConf.fromdict(dict(\n",
    "    optimizer={\n",
    "        'opt_type': 'adam',\n",
    "        'learning_rate': '1e-3'  # expect a float but found a str here\n",
    "    },\n",
    "    batch_size=4.0,\n",
    "    num_epochs=1\n",
    "))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We show the usage of basic types like int/float/str, as well as how to write nested config above.\n",
    "\n",
    "Apart from these usages, `configclass` supports the following type annoations:\n",
    "\n",
    "* `typing.Any` (try to avoid using it, because no type-checking is available for Any)\n",
    "* `typing.Optional` (when set to None is also legal for this field)\n",
    "* `typing.List[xxx]`\n",
    "* `typing.Dict[xxx, xxx]`\n",
    "* `typing.Tuple[xxx, xxx, ...]` (the ellipsis here means tuple can be arbitrarily long, writing ellipsis here is not currently supported)\n",
    "* `typing.Union[aaa, bbb, ccc]` (the types are tried one by one until one type is validated)\n",
    "* `Enum`\n",
    "\n",
    "Note that the inner types in optional, list, dict, and tuple will be expanded for type-checking and conversion. We show a (complex) example below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from enum import Enum\n",
    "from typing import Optional, Tuple, Union, List, Dict, Any\n",
    "\n",
    "class OptimizerType(str, Enum):\n",
    "    SGD = 'sgd'\n",
    "    Adam = 'adam'\n",
    "\n",
    "@configclass\n",
    "class OptimizerConfig:\n",
    "    opt_type: OptimizerType\n",
    "    learning_rate: float\n",
    "    momentum: float\n",
    "    weight_decay: float\n",
    "    grad_clip: Optional[float]   # optional but must set, either set to none or a float\n",
    "    betas: Optional[Tuple[float, float]] = None\n",
    "    other_params: Optional[Dict[str, Any]] = None\n",
    "\n",
    "@configclass\n",
    "class TrainerConfig:\n",
    "    optimizer: OptimizerConfig\n",
    "    num_epochs: Union[int, List[int]]\n",
    "    batch_size: int\n",
    "    fast_dev_run: bool = False"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "TrainerConfig(optimizer=OptimizerConfig(opt_type=<OptimizerType.Adam: 'adam'>, learning_rate=0.1, momentum=0.9, weight_decay=0.0, grad_clip=None, betas=None, other_params={'eps': '1e-8'}), num_epochs=[10, 20], batch_size=10, fast_dev_run=True)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "config = TrainerConfig.fromdict({\n",
    "    'optimizer': {\n",
    "        'opt_type': 'adam',\n",
    "        'learning_rate': 0.1,\n",
    "        'momentum': 0.9,\n",
    "        'weight_decay': 0,\n",
    "        'grad_clip': None,  # has to set, otherwise will complain\n",
    "        # betas has default value, can be omitted\n",
    "        'other_params': {\n",
    "            'eps': '1e-8'  # not converting, because it's any\n",
    "        }\n",
    "    },\n",
    "    'num_epochs': [10, 20],  # union type\n",
    "    'batch_size': 10,\n",
    "    'fast_dev_run': True\n",
    "})\n",
    "\n",
    "config"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'optimizer': {'opt_type': 'adam',\n",
       "  'learning_rate': 0.1,\n",
       "  'momentum': 0.9,\n",
       "  'weight_decay': 0.0,\n",
       "  'grad_clip': None,\n",
       "  'betas': None,\n",
       "  'other_params': {'eps': '1e-8'}},\n",
       " 'num_epochs': [10, 20],\n",
       " 'batch_size': 10,\n",
       " 'fast_dev_run': True}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "config.asdict()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## From file\n",
    "\n",
    "To manage config files without touching Python code, we recommend saving the configs into files like JSON or YAML (users can even put them into a separate Python file if they want). The syntax of those YAML/JSON files are similar to those within [MMCV config](https://mmcv.readthedocs.io/en/latest/understand_mmcv/config.html). We recommend reading the tutorial, because some features can be very helpful, such as using `_base_` to inherit base config.\n",
    "\n",
    "Afterwards, config can be created via:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "optimizer:\n",
      "  opt_type: adam\n",
      "  learning_rate: 1e-3  # pyyaml loads it as a string, but it's okay because we have our converting\n",
      "  momentum: 0.9\n",
      "  weight_decay: 0\n",
      "  grad_clip: null\n",
      "  other_params:\n",
      "    eps: 1.0e-8  # converting doesn't help because the annotated type is any\n",
      "num_epochs: [10, 20]\n",
      "batch_size: 10\n",
      "fast_dev_run: true\n"
     ]
    }
   ],
   "source": [
    "! cat assets/config_trainer.yml"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "TrainerConfig(optimizer=OptimizerConfig(opt_type=<OptimizerType.Adam: 'adam'>, learning_rate=0.001, momentum=0.9, weight_decay=0.0, grad_clip=None, betas=None, other_params={'eps': 1e-08}), num_epochs=[10, 20], batch_size=10, fast_dev_run=True)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "TrainerConfig.fromfile('assets/config_trainer.yml')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## From command line\n",
    "\n",
    "When debugging, it can be helpful to have some extra arguments which can be hot-updated at runtime, without changing any Python code or YAML code. To this end, we provide `XXXConfig.fromcli()`, which automatically generates a command line parser that accepts a base config as well as overriding arguments. The overriding arguments will override the base config."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "from enum import Enum\n",
      "from typing import Optional, Tuple, Union, List, Dict, Any\n",
      "from utilsd.config import configclass\n",
      "\n",
      "class OptimizerType(str, Enum):\n",
      "    SGD = 'sgd'\n",
      "    Adam = 'adam'\n",
      "\n",
      "@configclass\n",
      "class OptimizerConfig:\n",
      "    opt_type: OptimizerType\n",
      "    learning_rate: float\n",
      "    momentum: float\n",
      "    weight_decay: float\n",
      "    grad_clip: Optional[float]   # optional but must set, either set to none or a float\n",
      "    betas: Optional[Tuple[float, float]] = None\n",
      "    other_params: Optional[Dict[str, Any]] = None\n",
      "\n",
      "@configclass\n",
      "class TrainerConfig:\n",
      "    optimizer: OptimizerConfig\n",
      "    num_epochs: Union[int, List[int]]\n",
      "    batch_size: int\n",
      "    fast_dev_run: bool = False\n",
      "\n",
      "config = TrainerConfig.fromcli()\n",
      "print(config)\n"
     ]
    }
   ],
   "source": [
    "! cat assets/config_trainer.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "usage: config_trainer.py [--batch_size INTEGER] [--fast_dev_run BOOL]\n",
      "                         [--num_epochs JSON] [--num_epochs.0 INTEGER]\n",
      "                         [--num_epochs.1 INTEGER] [--optimizer JSON]\n",
      "                         [--optimizer.grad_clip FLOAT]\n",
      "                         [--optimizer.learning_rate FLOAT]\n",
      "                         [--optimizer.momentum FLOAT]\n",
      "                         [--optimizer.opt_type STRING]\n",
      "                         [--optimizer.other_params JSON]\n",
      "                         [--optimizer.weight_decay FLOAT] [-h]\n",
      "                         exp\n",
      "\n",
      "Command line auto-generated with utilsd.config. A path to base config file\n",
      "(like JSON/YAML) needs to be specified first. Then some extra arguments to\n",
      "override the fields in the base config. Please note the type of arguments\n",
      "(always use `-h` for reference): `JSON` type means the field accepts a `JSON`\n",
      "for overriding.\n",
      "\n",
      "positional arguments:\n",
      "  exp                   Experiment YAML file\n",
      "\n",
      "optional arguments:\n",
      "  --batch_size INTEGER\n",
      "  --fast_dev_run BOOL\n",
      "  --num_epochs JSON\n",
      "  --num_epochs.0 INTEGER\n",
      "  --num_epochs.1 INTEGER\n",
      "  --optimizer JSON\n",
      "  --optimizer.grad_clip FLOAT\n",
      "  --optimizer.learning_rate FLOAT\n",
      "  --optimizer.momentum FLOAT\n",
      "  --optimizer.opt_type STRING\n",
      "  --optimizer.other_params JSON\n",
      "  --optimizer.weight_decay FLOAT\n",
      "  -h, --help            Show this help message and exit\n"
     ]
    }
   ],
   "source": [
    "! python assets/config_trainer.py assets/config_trainer.yml -h"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It can be seen in the help message that, not only the primitives can be replaced, but also the lists and dicts. To replace them, try to write the object into a JSON."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TrainerConfig(optimizer=OptimizerConfig(opt_type=<OptimizerType.Adam: 'adam'>, learning_rate=0.001, momentum=0.9, weight_decay=0.0, grad_clip=None, betas=None, other_params={'eps': 1e-08}), num_epochs=[1, 2, 3], batch_size=10, fast_dev_run=True)\n"
     ]
    }
   ],
   "source": [
    "! python assets/config_trainer.py assets/config_trainer.yml --num_epochs \"[1, 2, 3]\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For advanced usages, please refer to API references of `utilsd.config`."
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "e719d61bf2f8783310605d7d33b4a79d3d3808a4b1e839bcbd4d25925ed7dae7"
  },
  "kernelspec": {
   "display_name": "Python 3.8.12 64-bit ('utilsd': conda)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}