Introduction

Spicerack - Automation framework for the WMF production infrastructure

Spicerack provides an entry point to all the libraries needed to automate and orchestrate tasks inside the Wikimedia Foundation's (WMF) production infrastructure. It provides also an entry point script cookbook to list and run the available cookbooks, both one by one or via an interactive menu.

Cookbooks

The cookbooks are the user facing units of automation. There is a collection of cookbooks to automate and orchestrate operations in the WMF infrastructure. The cookbooks are executed by the cookbook binary provided by the spicerack package.

Cookbooks hierarchy

The cookbooks must be structured in a tree, as they can be run also from an interactive menu that shows the tree from an arbitrary entry point downwards.

Each cookbook filename must be a valid Python module name, hence all lowercase, with underscore if that improves readability and that doesn't start with a number.

Given that the cookbooks are imported dynamically, a broader set of characters like dashes and starting with a number are technically allowed and the current standard at WMF is to name the cookbooks with dashes instead of underscores.

Example of cookbooks tree:

cookbooks
|-- __init__.py
|-- top-level-cookbook.py
|-- group1
|   |-- __init__.py
|   `-- important-cookbook.py
`-- group2
    |-- __init__.py
    `-- subgroup1
        |-- __init__.py
        `-- some-task.py

API interfaces

Each cookbook must follow one of the two available API interfaces:

Class interface (preferred)
Module interface

Class interface

When using the class interface, you will need to define two classes:

one runner class that extends spicerack.cookbook.CookbookRunnerBase and implements the __init__, run and (optionally) rollback and runtime_description methods
one base class that extends spicerack.cookbook.CookbookBase and implements the argument_parser and get_runner methods.

Let's see how to implement a simple cookbook that depools a service from dns discovery. Our cookbook will accept three command-line arguments: the service name, the datacenter and the action to perform.

import argparse
from wmflib.constants import CORE_DATACENTERS
from spicerack.cookbook import CookbookRunnerBase, CookbookBase
class ServiceRouter(CookbookBase):
    def argument_parser(self) -> argparse.ArgumentParser:
        parser = super().argument_parser() # returns a bare ArgumentParser with the correct defaults
        parser.add_argument("service")
        parser.add_argument("datacenter", choice=CORE_DATACENTERS)
        parser.add_argument("action", choice=("pool", "depool"))
        return parser

    def get_runner(self, args) -> CookbookRunnerBase:
        return ServiceRouterRunner(args, self.spicerack)

Here, self.spicerack is a an initialized spicerack.Spicerack instance. Now we need to implement the class that will actually do the work. There are four methods to implement:

__init__(args, spicerack) to set up the properties of the class
runtime_description, returning a string that will be used to log the cookbook action to SAL
run that should contain the cookbook operations.
rollback an optional rollback method. Typically for a rollback method to work, you'll need to store some state in the class.

So here is our runner class:

class ServiceRouterRunner(CookbookRunnerBase):
    def __init__(self, args, spicerack):
        # args here is the result of CookbookBase.argument_parser().parse_args()
        self.service = args.service
        self.datacenter = args.datacenter
        self.action = args.action
        self.discovery = spicerack.discovery(self.service)
        # Save the initial state for eventual rollback
        state = self.discovery.active_datacenters
        self.was_pooled = self.datacenter in state[self.service]

    def runtime_description(self) -> str:
        return f"{self.action} service {self.service} in {self.datacenter}"

    def run(self):
        if self.action == "pool":
            self.discovery.pool(self.datacenter)
        else:
            self.discovery.depool(self.datacenter)

    def rollback(self):
        if self.was_pooled:
            self.discovery.pool(self.datacenter)
        else:
            self.discovery.depool(self.datacenter)

If the run method returns a non-zero exit code or raises any exception the optional rollback method will be called to allow the cookbook to perform any cleanup action. Any exception raised by the rollback method will be logged and the cookbook will exit with a reserved exit code.

The derived classes can have any name and multiple cookbooks in the same module are supported.

Module interface

A simple function-based API interface for the cookbooks in which each cookbook is a Python module that defines the following constants and functions.

cookbook-module.__title__

A module attribute that defines the cookbook title. It must be a single line string.

Type:: str

cookbook-module.MAX_CONCURRENCY

Optional module attribute that defines how many parallel runs of the cookbook are allowed. If not set the value defined in spicerack.cookbook.CookbookRunnerBase.max_concurrency will be used.

Type:: int

cookbook-module.LOCK_TTL

Optional module attribute that defines the concurrency lock time to live (TTL) in seconds. For each concurrent run a lock is acquired for this amount of seconds. If not set the value defined in spicerack.cookbook.CookbookRunnerBase.lock_ttl will be used.

Type:: int

cookbook-module.argument_parser() → argparse.ArgumentParser:

Optional module function to define if the cookbook should accept command line arguments.

If defined the returned argument parser will be used to parse the cookbook's arguments.

If not defined it means that the cookbook doesn't accept any argument and if called with arguments it's considered an error.

Cookbooks are encouraged to define an argument_parser() function so that an help message is automatically available with -h/--help and it can be shown both when running a cookbook directly or in the interactive menu.

Returns:: the argument parser instance.
Return type:: argparse.ArgumentParser

cookbook-module.run(args, spicerack)

Mandatory module function with the actual execution of the cookbook.

Parameters:

args (argparse.Namespace or None) -- the parsed arguments that were parsed using the defined argument_parser() module function or None if the cookbook doesn't support any argument.
spicerack (spicerack.Spicerack) -- the Spicerack accessor instance with which the cookbook can access all the Spicerack capabilities.

Returns:

the return code of the cookbook, it should be zero or None on success, a positive integer smaller than 128 and not in the range 90-99 (see Reserved exit codes) in case of failure.

Return type:

int or None

Logging

The logging is already pre-setup by the cookbook entry point script that initialize the root logger, so that each cookbook can just initialize its own logging instance and log.

A special logger to send notification to the #wikimedia-operations IRC channel with the !log prefix is also available through the spicerack argument, passed to the cookbook's run() function for the module API or available in the cookbook class as self.spicerack for the class API, in its sal_logger property. An additional irc_logger logger is also available to just write to the #wikimedia-operations IRC channel.

Both IRC loggers log to both IRC and the nomal log outputs of Spicerack. If the dry-run mode is set it does not log to IRC.

Log files

The log files can be found in /var/log/spicerack/${PATH_OF_THE_COOKBOOK} on the host where the cookbooks are run. All normal log messages are sent to two separate files, of which one always logs at DEBUG level even if -v/--verbose is not set. So for example running the cookbook foo.bar.baz will generate two log files:

/var/log/spicerack/foo/bar/baz.log  # with INFO and higher log levels
/var/log/spicerack/foo/bar/baz-extended.log  # with all log levels

If the cookbook is started with a directory of multiple cookbooks then the logs are all concentrated in the directory path ones:

/var/log/spicerack/foo/bar.log  # with INFO and higher log levels
/var/log/spicerack/foo/bar-extended.log  # with all log levels

Example

import logging

logger = logging.getLogger(__name__)

logger.info('message')  # this goes to stdout in the operator shell and is logged in both files.
logger.debug('message') # this goes to stdout in the operator shell only if -v/--verbose is set and is logged only
                        # in the extended file.

def run(args, spicerack):
    spicerack.irc_logger.info('message')  # This sends a message to the #wikimedia-operation IRC channel with:
                                          # !log user@host message

Spicerack library

All the available modules in the Spicerack package are exposed to the cookbooks through the spicerack instance injected in the cookbook. It offers helper methods to obtain initialized instances of all the available libraries. This instance exposes also some of the global CLI arguments parsed by the cookbook entry point script such as dry_run and verbose as getters. See spicerack.Spicerack for more details.

Exception handling

In general each module in the spicerack package has its own exception class to raise specific errors, and all of them are derived from the base class spicerack.exceptions.SpicerackError.

Reserved exit codes

Cookbook exit codes in the range 90-99 are reserved by Spicerack and must not be used by the cookbooks. The currently defined reserved exit codes are documented in the spicerack.cookbook module.

Distributed locking

Spicerack supports also distributed locking to prevent some actions from being executed multiple times in parallel in the environments with etcd configured. Each lock can be defined with arbitraty concurrency and TTL (time to live). That means that each lock can either be exclusive or allow a given number of parallel executions. The locks are saved in etcd.

The locking support can be globablly enabled/disabled via configuration file and can also be disabled on a given cookbook run via the --no-locks command line flag. This can be used in an emergency if unable to acquire locks or if there are issues with the locking backend.

Spicerack will automatically retry for half an hour to acquire a lock if there is no slot available for the given key and concurrency, listing which are the holders of the exiting locks for the same key in the form user@host [PID].

Example output in case of being unable to acquire the lock:

[1/27, retrying in 5.00s] Unable to acquire lock: {'concurrency': 1, 'created': '2023-10-19 12:52:06.006568', 'owner': 'user1@cumin2002 [249024]', 'ttl': 300} for key /spicerack/locks/cookbooks/sre.dns.netbox.
There are already 1 concurrent locks and the concurrency allowed is 1:
      2023-10-19 12:52:05.985199: user2@cumin1001 [340699]

There are three types of locks:

Spicerack locks: acquired by Spicerack modules around specific lines of code that are deemed critical and require a dedicated lock.
Cookbooks custom locks: locks created by the cookbooks using the Spicerack accessor spicerack.Spicerack.lock() around specific lines of code.
Automatic cookbook locks for each run: Spicerack acquires a lock for each cookbook run with the cookbook full name as key (e.g. sre.hosts.name). By default it uses the concurrency and TTL defined in spicerack.cookbook.CookbookRunnerBase.max_concurrency and spicerack.cookbook.CookbookRunnerBase.lock_ttl respectively. The cookbook can customize these parameters in two different ways:
- Static override: just overriding the max_concurrency and lock_ttl class properties in the cookbook runner class will make the lock be acquired with these parameters.
- Dynamic override: for a more in-depth customization, the cookbook runner class can override the spicerack.cookbook.CookbookRunnerBase.lock_args instance property to dynamically return a spicerack.cookbook.LockArgs instance based on any live argument. This way the cookbook can also provide a custom key suffix to use for the lock key, allowing to hold a different lock based on the use case. For example:
  - If the cookbook has a read-only (e.g. check, list, etc.) and a read-write (e.g. create, update, delete) mode of operation, it could set the max_concurrency to 0 when executed in read-only mode and to 1 or a very low value when executed in read-write mode.
  - If the cookbook targets a specific host/cluster it could use the host/cluster name as suffix so that the lock will be per-host/cluster. An unlimited concurrent runs of the cookbook can be made with different hosts/clusters but for example it could limit to only one concurrent run of the cookbook for any given host/cluster.:
    @property def lock_args(self): """Make the cookbook lock per-cluster.""" return LockArgs(suffix=self.cluster, concurrency=1, ttl=600)