# Puppet Function: cron_splay

- Defined in: puppet/modules/wmflib/lib/puppet/parser/functions/cron_splay.rb
- Function type: Ruby 3.x API

## Overview

Given an array of FQDNs that a cron job applies to, and a period argument that is one of 'hourly', 'daily', or 'weekly', this function sorts the FQDN set with per-datacenter interleaving for DC-numbered hosts, splays the hosts to fixed, even intervals within the total period, and then outputs a set of crontab time fields for the FQDN currently being compiled for.
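As a rough illustration of the splaying step, the sketch below maps a host's position in the ordered list to a start minute within the period. The helper name `splay_slot` and the constant `PERIOD_MINUTES` are hypothetical and exist only for this example; the arithmetic follows the `this_idx * mins / ordered.length` expression in the function's source, before the seed offset is added.

```ruby
# Minutes in each supported period, matching the function's `case period` branch.
PERIOD_MINUTES = { 'hourly' => 60, 'daily' => 1440, 'weekly' => 10_080 }.freeze

# Hypothetical helper: start minute of the slot at position idx among count hosts.
def splay_slot(idx, count, period)
  mins = PERIOD_MINUTES.fetch(period)
  idx * mins / count # integer division truncates toward zero
end

# Four hosts over a day land exactly six hours apart.
slots = (0...4).map { |i| splay_slot(i, 4, 'daily') }
# slots == [0, 360, 720, 1080]

# Seven hosts over an hour cannot divide evenly, so gaps are 8 or 9 minutes.
uneven = (0...7).map { |i| splay_slot(i, 7, 'hourly') }
# uneven == [0, 8, 17, 25, 34, 42, 51]
```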

The idea is to ensure that each host in the set executes the cron job once per time period, and that the time between hosts is consistent (no pair of hosts much closer together than the average), by splaying them as evenly as integer rounding allows. For hosts whose names contain an NNNN number whose first digit indicates the datacenter, we also maximize the period between any two hosts in a given datacenter by interleaving sorted per-DC lists of hosts before splaying.
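The interleaving can be sketched as follows. The host names are hypothetical, and this example sorts each per-DC list alphabetically for readability, whereas the function itself orders each list by a seeded MD5 digest. The regular expression and the `zip`/`flatten`/`compact` chain are the ones used in the function's source.

```ruby
# Hypothetical host names; the first digit of the 4-digit number marks the DC.
hosts = %w[
  web1001.example.org web1002.example.org
  web2001.example.org web2002.example.org
  web3001.example.org
]

# Group by DC digit, defaulting to 0 when the name has no NNNN number.
by_dc = hosts.group_by do |h|
  (m = /([1-9])[0-9]{3}/.match(h)) ? m[1].to_i : 0
end

# Assumption for this sketch: plain alphabetical order within each DC.
sublists = by_dc.values.map(&:sort)

# Zip the shorter lists onto the longest one, then drop the nil padding.
longest = sublists.max_by(&:length)
rest = sublists.reject { |s| s.equal?(longest) }
ordered = longest.zip(*rest).flatten.compact
# ordered == %w[web1001.example.org web2001.example.org web3001.example.org
#               web1002.example.org web2002.example.org]
```

Because consecutive positions in `ordered` come from different datacenters, two hosts in the same DC end up several slots apart once the list is splayed.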

The third and final argument is a static seed, which modulates the splayed values in two different ways to minimize the effects of multiple cron_splay() calls with the same host list and period. It is used to select a deterministically random “offset” for the splayed time values (so that the first host doesn't always start at 00:00), and it is also used to permute the order of the hosts within each DC uniquely.
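A minimal sketch of the two uses of the seed, assuming a hypothetical three-host list and a daily period. The `Digest::MD5` calls mirror those in the function's source; the variable names `permuted`, `interval`, and `offset` are chosen for this example only.

```ruby
require 'digest/md5'

seed = 'foo-static-seed'
# Hypothetical hosts, all in the same datacenter.
hosts = %w[web1001.example.org web1002.example.org web1003.example.org]

# Use 1: permute the hosts deterministically by sorting on MD5(seed + hostname).
# A different seed generally yields a different order for the same hosts.
permuted = hosts.sort_by { |h| Digest::MD5.hexdigest(seed + h) }

# Use 2: derive an offset from MD5(seed), reduced modulo the per-host interval,
# so the offset is always smaller than the gap between adjacent hosts.
mins = 1440                       # daily period
interval = mins / hosts.length    # 480 minutes between adjacent hosts
offset = Digest::MD5.hexdigest(seed).to_i(16) % interval
```

Since both values depend only on the seed and the host names, every Puppet compilation produces the same schedule for the same inputs.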

**Examples:**

```
$times = cron_splay($hosts, 'weekly', 'foo-static-seed')
cron { 'foo':
    minute  => $times['minute'],
    hour    => $times['hour'],
    weekday => $times['weekday'],
}
```
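To show what the returned hash looks like, the sketch below converts a splayed minute count into the three crontab fields. The helper name `cron_fields` is hypothetical; the arithmetic follows the output section of the function's source, where unused fields are set to `'*'`.

```ruby
# Hypothetical helper: turn a minute count within the period into crontab fields.
def cron_fields(tval, period)
  {
    'minute'  => tval % 60,
    'hour'    => period == 'hourly' ? '*' : (tval / 60) % 24,
    'weekday' => period == 'weekly' ? tval / 1440 : '*'
  }
end

# Minute 3725 of the week is weekday 2, 14:05.
weekly = cron_fields(3725, 'weekly')
# weekly == { 'minute' => 5, 'hour' => 14, 'weekday' => 2 }

# Minute 125 of the day is 02:05 on every weekday.
daily = cron_fields(125, 'daily')
# daily == { 'minute' => 5, 'hour' => 2, 'weekday' => '*' }
```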

```ruby
# File 'puppet/modules/wmflib/lib/puppet/parser/functions/cron_splay.rb', line 8

newfunction(:cron_splay, :type => :rvalue, :doc => <<-EOS
Given an array of fqdn which a cron is applicable to, and a period arg which is
one of 'hourly', 'daily', or 'weekly', this sorts the fqdn set with
per-datacenter interleaving for DC-numbered hosts, splays them to fixed even
intervals within the total period, and then outputs a set of crontab time
fields for the fqdn currently being compiled-for.

The idea here is to ensure each host in the set executes the cron once per time
period, and also ensure the time between hosts is consistent (no edge cases
much closer than the average) by splaying them as evenly as possible with
rounding errors.  For the case of hosts with NNNN numbers indicating the
datacenter in the first digit, we also maximize the period between any two
hosts in a given datacenter by interleaving sorted per-DC lists of hosts before
splaying.

The third and final argument is a static seed which modulates the splayed
values in two different ways to minimize the effects of multiple cron_splay()
with the same hostlist and period.  It is used to select a determinstically
random "offset" for the splayed time values (so that the first host doesn't
always start at 00:00), and is also used to permute the order of the hosts
within each DC uniquely.

*Examples:*

    $times = fqdn_splay($hosts, 'weekly', 'foo-static-seed')
    cron { 'foo':
        minute  => $times['minute'],
        hour    => $times['hour'],
        weekday => $times['weekday'],
    }

  EOS
) do |arguments|

  unless arguments.size == 3
    raise(Puppet::ParseError, "cron_splay(): Wrong number of arguments " +
      "given (#{arguments.size} for 3)")
  end

  hosts = arguments[0]
  period = arguments[1]
  seed = arguments[2]

  unless hosts.is_a?(Array)
    raise(Puppet::ParseError, 'cron_splay(): Argument 1 must be an array')
  end

  unless period.is_a?(String)
    raise(Puppet::ParseError, 'cron_splay(): Argument 2 must be an string')
  end

  unless seed.is_a?(String)
    raise(Puppet::ParseError, 'cron_splay(): Argument 3 must be an string')
  end

  case period
  when 'hourly'
    mins = 60
  when 'daily'
    mins = 1440
  when 'weekly'
    mins = 10080
  else
    raise(Puppet::ParseError, 'cron_splay(): invalid period')
  end

  # Avoid this edge case for now.  At sufficiently large host counts and
  # small period, randomization is probably better anyways.
  if hosts.length > mins
    raise(Puppet::ParseError, 'cron_splay(): too many hosts for period')
  end

  # split hosts into N lists based the first digit of /NNNN/, defaulting to zero
  sublists = [ [], [], [], [], [], [], [], [], [], [] ]
  for h in hosts
    match = /([1-9])[0-9]{3}/.match(h)
    if match
      sublists[match[1].to_i].push(h)
    else
      sublists[0].push(h)
    end
  end

  # sort each sublist into a determinstic order based on seed
  for s in sublists
    s.sort_by! { |x| Digest::MD5.hexdigest(seed + x) }
  end

  # interleave sublists into "ordered"
  longest = sublists.max_by(&:length)
  sublists -= [longest]
  ordered = longest.zip(*sublists).flatten.compact

  # find the index of this host in ordered
  this_idx = ordered.index(lookupvar('::fqdn'))
  if this_idx.nil?
    raise(Puppet::ParseError, 'cron_splay(): this host not in set')
  end

  # find the truncated-integer splayed value of this host
  tval = this_idx * mins / ordered.length

  # use the seed (again) to add a time offset to the splayed values,
  # the time offset never being larger than the splayed interval
  tval += Digest::MD5.hexdigest(seed).to_i(16) % (mins / ordered.length)

  # generate the output
  output = {}
  output['minute'] = tval % 60

  if period == 'hourly'
    output['hour'] = '*'
  else
    output['hour'] = (tval / 60) % 24
  end

  if period == 'weekly'
    output['weekday'] = tval / 1440
  else
    output['weekday'] = '*'
  end

  return output
end
```