Introduction ============ .. include:: ../../README.rst Cookbooks --------- The cookbooks are the user facing units of automation. There is a collection of cookbooks to automate and orchestrate operations in the WMF infrastructure. The cookbooks are executed by the ``cookbook`` binary provided by the ``spicerack`` package. Cookbooks hierarchy ^^^^^^^^^^^^^^^^^^^ The cookbooks must be structured in a tree, as they can be run also from an interactive menu that shows the tree from an arbitrary entry point downwards. Each cookbook filename must be a valid Python module name, hence all lowercase, with underscore if that improves readability and that doesn't start with a number. Given that the cookbooks are imported dynamically, a broader set of characters like dashes and starting with a number are technically allowed and the current standard at WMF is to name the cookbooks with dashes instead of underscores. Example of cookbooks tree:: cookbooks |-- __init__.py |-- top-level-cookbook.py |-- group1 | |-- __init__.py | `-- important-cookbook.py `-- group2 |-- __init__.py `-- subgroup1 |-- __init__.py `-- some-task.py API interfaces ^^^^^^^^^^^^^^ Each cookbook must follow one of the two available API interfaces: * `Class interface`_ (preferred) * `Module interface`_ Class interface """"""""""""""" When using the class interface, you will need to define two classes: * one *runner* class that extends :py:class:`spicerack.cookbook.CookbookRunnerBase` and implements the ``__init__``, ``run`` and (optionally) ``rollback`` and ``runtime_description`` methods * one *base* class that extends :py:class:`spicerack.cookbook.CookbookBase` and implements the ``argument_parser`` and ``get_runner`` methods. Let's see how to implement a simple cookbook that depools a service from dns discovery. Our cookbook will accept three command-line arguments: the service name, the datacenter and the action to perform. :: import argparse from wmflib.constants import CORE_DATACENTERS from spicerack.cookbook import CookbookRunnerBase, CookbookBase class ServiceRouter(CookbookBase): def argument_parser(self) -> argparse.ArgumentParser: parser = super().argument_parser() # returns a bare ArgumentParser with the correct defaults parser.add_argument("service") parser.add_argument("datacenter", choice=CORE_DATACENTERS) parser.add_argument("action", choice=("pool", "depool")) return parser def get_runner(self, args) -> CookbookRunnerBase: return ServiceRouterRunner(args, self.spicerack) Here, `self.spicerack` is a an initialized :py:class:`spicerack.Spicerack` instance. Now we need to implement the class that will actually do the work. There are four methods to implement: * `__init__(args, spicerack)` to set up the properties of the class * `runtime_description`, returning a string that will be used to log the cookbook action to SAL * `run` that should contain the cookbook operations. * `rollback` an optional rollback method. Typically for a rollback method to work, you'll need to store some state in the class. So here is our *runner* class: :: class ServiceRouterRunner(CookbookRunnerBase): def __init__(args, spicerack): # args here is the result of CookbookBase.argument_parser().parse_args() self.service = args.service self.datacenter = args.datacenter self.action = args.action self.discovery = spicerack.discovery(self.service) # Save the initial state for eventual rollback state = self.discovery.active_datacenters self.was_pooled = self.datacenter in state[self.service] def runtime_description(self) -> str: return f"{self.action} service {self.service} in {self.datacenter}" def run(self): if self.action == "pool": self.discovery.pool(self.datacenter) else: self.discovery.depool(self.datacenter) def rollback(self): if self.was_pooled: self.discovery.pool(self.datacenter) else: self.discovery.depool(self.datacenter) If the ``run`` method returns a non-zero exit code or raises any exception the optional ``rollback`` method will be called to allow the cookbook to perform any cleanup action. Any exception raised by the ``rollback`` method will be logged and the cookbook will exit with a reserved exit code. The derived classes can have any name and multiple cookbooks in the same module are supported. Module interface """""""""""""""" A simple function-based API interface for the cookbooks in which each cookbook is a Python module that defines the following constants and functions. .. module:: cookbook-module .. attribute:: __title__ A module attribute that defines the cookbook title. It must be a single line string. :type: str .. attribute:: MAX_CONCURRENCY Optional module attribute that defines how many parallel runs of the cookbook are allowed. If not set the value defined in :py:attr:`spicerack.cookbook.CookbookRunnerBase.max_concurrency` will be used. :type: int .. attribute:: LOCK_TTL Optional module attribute that defines the concurrency lock time to live (TTL) in seconds. For each concurrent run a lock is acquired for this amount of seconds. If not set the value defined in :py:attr:`spicerack.cookbook.CookbookRunnerBase.lock_ttl` will be used. :type: int .. function:: argument_parser() -> argparse.ArgumentParser: Optional module function to define if the cookbook should accept command line arguments. If defined the returned argument parser will be used to parse the cookbook's arguments. If not defined it means that the cookbook doesn't accept any argument and if called with arguments it's considered an error. Cookbooks are encouraged to define an ``argument_parser()`` function so that an help message is automatically available with ``-h/--help`` and it can be shown both when running a cookbook directly or in the interactive menu. :returns: the argument parser instance. :rtype: argparse.ArgumentParser .. function:: run(args, spicerack) Mandatory module function with the actual execution of the cookbook. :param args: the parsed arguments that were parsed using the defined ``argument_parser()`` module function or :py:data:`None` if the cookbook doesn't support any argument. :type args: argparse.Namespace or None :param spicerack: the Spicerack accessor instance with which the cookbook can access all the Spicerack capabilities. :type spicerack: spicerack.Spicerack :returns: the return code of the cookbook, it should be zero or :py:data:`None` on success, a positive integer smaller than ``128`` and not in the range ``90-99`` (see :ref:`Reserved exit codes`) in case of failure. :rtype: int or None Logging ^^^^^^^ The logging is already pre-setup by the ``cookbook`` entry point script that initialize the root logger, so that each cookbook can just initialize its own :py:mod:`logging` instance and log. A special logger to send notification to the ``#wikimedia-operations`` IRC channel with the ``!log`` prefix is also available through the ``spicerack`` argument, passed to the cookbook's ``run()`` function for the module API or available in the cookbook class as ``self.spicerack`` for the class API, in its ``sal_logger`` property. An additional ``irc_logger`` logger is also available to just write to the ``#wikimedia-operations`` IRC channel. Both IRC loggers log to both IRC and the nomal log outputs of Spicerack. If the dry-run mode is set it does not log to IRC. Log files """"""""" The log files can be found in ``/var/log/spicerack/${PATH_OF_THE_COOKBOOK}`` on the host where the cookbooks are run. All normal log messages are sent to two separate files, of which one always logs at ``DEBUG`` level even if ``-v/--verbose`` is not set. So for example running the cookbook ``foo.bar.baz`` will generate two log files:: /var/log/spicerack/foo/bar/baz.log # with INFO and higher log levels /var/log/spicerack/foo/bar/baz-extended.log # with all log levels If the cookbook is started with a directory of multiple cookbooks then the logs are all concentrated in the directory path ones:: /var/log/spicerack/foo/bar.log # with INFO and higher log levels /var/log/spicerack/foo/bar-extended.log # with all log levels Example """"""" :: import logging logger = logging.getLogger(__name__) logger.info('message') # this goes to stdout in the operator shell and is logged in both files. logger.debug('message') # this goes to stdout in the operator shell only if -v/--verbose is set and is logged only # in the extended file. def run(args, spicerack): spicerack.irc_logger.info('message') # This sends a message to the #wikimedia-operation IRC channel with: # !log user@host message Spicerack library ^^^^^^^^^^^^^^^^^ All the available modules in the Spicerack package are exposed to the cookbooks through the ``spicerack`` instance injected in the cookbook. It offers helper methods to obtain initialized instances of all the available libraries. This instance exposes also some of the global CLI arguments parsed by the ``cookbook`` entry point script such as ``dry_run`` and ``verbose`` as getters. See :py:class:`spicerack.Spicerack` for more details. Exception handling ^^^^^^^^^^^^^^^^^^ In general each module in the :py:mod:`spicerack` package has its own exception class to raise specific errors, and all of them are derived from the base class :py:class:`spicerack.exceptions.SpicerackError`. .. _reserved-codes: Reserved exit codes ^^^^^^^^^^^^^^^^^^^ Cookbook exit codes in the range ``90-99`` are reserved by Spicerack and must not be used by the cookbooks. The currently defined reserved exit codes are documented in the :py:mod:`spicerack.cookbook` module. .. _distributed-locking: Distributed locking ^^^^^^^^^^^^^^^^^^^ Spicerack supports also distributed locking to prevent some actions from being executed multiple times in parallel in the environments with etcd configured. Each lock can be defined with arbitraty concurrency and TTL (time to live). That means that each lock can either be exclusive or allow a given number of parallel executions. The locks are saved in etcd. The locking support can be globablly enabled/disabled via configuration file and can also be disabled on a given cookbook run via the ``--no-locks`` command line flag. This can be used in an emergency if unable to acquire locks or if there are issues with the locking backend. Spicerack will automatically retry for half an hour to acquire a lock if there is no slot available for the given key and concurrency, listing which are the holders of the exiting locks for the same key in the form ``user@host [PID]``. Example output in case of being unable to acquire the lock:: [1/27, retrying in 5.00s] Unable to acquire lock: {'concurrency': 1, 'created': '2023-10-19 12:52:06.006568', 'owner': 'user1@cumin2002 [249024]', 'ttl': 300} for key /spicerack/locks/cookbooks/sre.dns.netbox. There are already 1 concurrent locks and the concurrency allowed is 1: 2023-10-19 12:52:05.985199: user2@cumin1001 [340699] There are three types of locks: * **Spicerack locks**: acquired by Spicerack modules around specific lines of code that are deemed critical and require a dedicated lock. * **Cookbooks custom locks**: locks created by the cookbooks using the Spicerack accessor :py:meth:`spicerack.Spicerack.lock` around specific lines of code. * **Automatic cookbook locks for each run**: Spicerack acquires a lock for each cookbook run with the cookbook full name as key (e.g. ``sre.hosts.name``). By default it uses the concurrency and TTL defined in :py:attr:`spicerack.cookbook.CookbookRunnerBase.max_concurrency` and :py:attr:`spicerack.cookbook.CookbookRunnerBase.lock_ttl` respectively. The cookbook can customize these parameters in two different ways: * **Static override**: just overriding the ``max_concurrency`` and ``lock_ttl`` class properties in the cookbook runner class will make the lock be acquired with these parameters. * **Dynamic override**: for a more in-depth customization, the cookbook runner class can override the :py:attr:`spicerack.cookbook.CookbookRunnerBase.lock_args` instance property to dynamically return a :py:attr:`spicerack.cookbook.LockArgs` instance based on any live argument. This way the cookbook can also provide a custom key suffix to use for the lock key, allowing to hold a different lock based on the use case. For example: * If the cookbook has a read-only (e.g. check, list, etc.) and a read-write (e.g. create, update, delete) mode of operation, it could set the ``max_concurrency`` to ``0`` when executed in read-only mode and to ``1`` or a very low value when executed in read-write mode. * If the cookbook targets a specific host/cluster it could use the host/cluster name as suffix so that the lock will be per-host/cluster. An unlimited concurrent runs of the cookbook can be made with different hosts/clusters but for example it could limit to only one concurrent run of the cookbook for any given host/cluster.:: @property def lock_args(self): """Make the cookbook lock per-cluster.""" return LockArgs(suffix=self.cluster, concurrency=1, ttl=600)