{"id":279,"date":"2015-02-23T17:23:06","date_gmt":"2015-02-23T17:23:06","guid":{"rendered":"https:\/\/laur.ie\/blog\/?p=279"},"modified":"2015-02-23T17:23:06","modified_gmt":"2015-02-23T17:23:06","slug":"systemd-using-execstop-to-depool-nodes-for-fun-and-profit","status":"publish","type":"post","link":"https:\/\/laur.ie\/blog\/2015\/02\/systemd-using-execstop-to-depool-nodes-for-fun-and-profit\/","title":{"rendered":"systemd: Using ExecStop to depool nodes for fun and profit"},"content":{"rendered":"<p><em>Preface: There is a tonne of drama about systemd on the internets; it won&#8217;t take you long to find it, if you&#8217;re curious. Despite that, I&#8217;m largely a fan and focusing on all the cool stuff I can finally do as an ops person without basically re-writing crappy bash scripts for a living (cough sys-v init)\u00a0<\/em><\/p>\n<h2>Process Supervision<\/h2>\n<p>Without going into the basics about systemd too much (I quite enjoy <a href=\"http:\/\/blog.jorgenschaefer.de\/2014\/07\/why-systemd.html\">this post<\/a> as an intro), you tell systemd to run your executable using the &#8220;<code>ExecStart<\/code>&#8221; part of the config, and it will go and run that command and make sure it keeps running. Wonderful! In this case, we wanted to keep <a href=\"http:\/\/hhvm.com\/\">HHVM<\/a> running all the time, so we told systemd to do it, in 3 lines. Waaaay easier than sys-v init.<\/p>\n<h2>ExecStop<\/h2>\n<p>By default when you tell systemd to stop a process, and you haven&#8217;t told it\u00a0<strong>how<\/strong> to stop the process, it&#8217;s just going to gracefully kill the process and any other processes it spawned.<\/p>\n<p>However, there is also the <code>ExecStop<\/code> configuration option that will be executed\u00a0<strong>before<\/strong> systemd kills your processes, adding a new &#8220;deactivating&#8221; step to the process. 
It takes any executable name (or many) as an argument, so you can abuse this to do literally anything as cleanup before your processes get killed.<\/p>\n<p>Systemd will also continue to do its regular killing of processes if, by the end of running your <code>ExecStop<\/code> script, the processes are not all dead.<\/p>\n<h2>Load balancer health checks<\/h2>\n<p>We have a load balancer that uses a bunch of health checks to ensure that the node that it&#8217;s asking to do work can actually still do work before it sends it there.<\/p>\n<p>One of these is hitting an HTTP endpoint we set up, let&#8217;s call it &#8220;<code>status.php<\/code>&#8221;, which just contains the text &#8220;Status:OK&#8221;. This way, if the server dies, or PHP breaks, or Apache breaks, that node will be automatically depooled and we don&#8217;t serve garbage to the user. Yay!<\/p>\n<h2>Example: automatic depooling using ExecStop<\/h2>\n<p>Armed with my new <code>ExecStop<\/code> super power, I realised we were able to let the load balancer know this node was no longer available\u00a0<strong>before<\/strong> killing the process.<\/p>\n<p>I wrote a simple bash script that:<\/p>\n<ul>\n<li>Moves the status.php file to status.php.disabled<\/li>\n<li>Starts pinging the built-in HHVM &#8220;load&#8221; endpoint (which tells you how many requests are in flight in HHVM) to see if the load has hit 0<\/li>\n<li>If the curl to the &#8220;load&#8221; endpoint fails, tries again after 1 second<\/li>\n<li>If we hit 30 seconds and the load isn&#8217;t 0 or we still can&#8217;t reach the endpoint, just carries on anyway; something is wrong.<\/li>\n<li>Once the load is &#8220;0&#8221;, continues<\/li>\n<li>Uses `pidof` to kill the HHVM process<\/li>\n<li>Moves status.php.disabled back to status.php<\/li>\n<\/ul>\n<p>And now, I can reference this in our HHVM systemd unit file:<\/p>\n<pre id=\"LC1\" class=\"line\">[Unit]\r\nDescription=HHVM HipHop Virtual Machine 
(FCGI)\r\n\r\n[Service]\r\nRestart=always\r\nExecStart=\/usr\/bin\/hhvm -c &lt;snip&gt;\r\nExecStop=\/usr\/local\/bin\/hhvm_stop.sh<\/pre>\n<p>Now when I call <code>service hhvm stop<\/code>, it takes 6-10 seconds for the stop to complete, because the traffic is gracefully removed.<\/p>\n<h2>Logging<\/h2>\n<p>Another thing I personally love about systemd is the increased visibility the operator gets about what&#8217;s going on. In sys-v, if you&#8217;re lucky, someone put a &#8220;status&#8221; action in their bash script and it might tell you if the PID exists.<\/p>\n<p>In systemd, you get a tonne of information about what&#8217;s going on: the processes that have been launched (including child processes), the PIDs, logs associated with that process, and in the case of something like Apache, the process can report information back:<\/p>\n<figure style=\"width: 1329px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/laur.ie\/grb\/7s-h44efoq3gg0o4.png\"><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"https:\/\/laur.ie\/grb\/7s-h44efoq3gg0o4.png\" alt=\"\" width=\"1329\" height=\"278\" \/><\/a><figcaption class=\"wp-caption-text\">Apache systemd status output showing requests per second<\/figcaption><\/figure>\n<p>In this case, our ExecStop script output gets shown when you look at the status output of systemd:<\/p>\n<pre class=\"p1\">[root@hhvm01 ~]# systemctl status hhvm -l\r\nhhvm.service - HHVM HipHop Virtual Machine (FCGI)\r\n\u00a0 \u00a0Loaded: loaded (\/usr\/lib\/systemd\/system\/hhvm.service; enabled)\r\n\u00a0 \u00a0Active: <b>inactive (dead)<\/b> since Tue 2015-02-17 22:00:52 UTC; 48s ago\r\n\u00a0 Process: 23889 <b>ExecStop=\/usr\/local\/bin\/hhvm_stop.sh (code=exited, status=0\/SUCCESS<\/b>)\r\n\u00a0 Process: 37601 ExecStart=\/usr\/bin\/hhvm &lt;snip&gt; (code=killed, signal=TERM)\r\n\u00a0 Main PID: 37601 (code=killed, signal=TERM)\r\n\r\nFeb 17 22:00:45 hhvm01 hhvm_stop.sh[23889]: <strong>Moving status.php to 
status.php.disabled<\/strong>\r\nFeb 17 22:00:47 hhvm01 hhvm_stop.sh[23889]: <b>Waiting another second (currently up to 8) because the load is still 16\r\n<\/b>Feb 17 22:00:48 hhvm01 hhvm_stop.sh[23889]: Waiting another second (currently up to 9) because the load is still 10\r\nFeb 17 22:00:49 hhvm01 hhvm_stop.sh[23889]: Waiting another second (currently up to 10) because the load is still 10\r\nFeb 17 22:00:50 hhvm01 hhvm_stop.sh[23889]: <b>Load was 0 after 11 seconds, now we can kill HHVM.\r\n<\/b>Feb 17 22:00:50 hhvm01 hhvm_stop.sh[23889]: <b>Killing HHVM\r\n<\/b>Feb 17 22:00:52 hhvm01 hhvm_stop.sh[23889]: Flipping status.php.disabled to status.php\r\nFeb 17 22:00:52 hhvm01 systemd[1]: Stopped HHVM HipHop Virtual Machine (FCGI).<\/pre>\n<p>Now all the information about what happened during the ExecStop process is captured for debugging later! No more having no idea what happened during the shutdown.<\/p>\n<p>While the script is running, the <code>systemctl status<\/code> output will show the unit as &#8220;deactivating&#8221; so you know it&#8217;s still ongoing.<\/p>\n<p>&nbsp;<\/p>\n<h2>Summary<\/h2>\n<p>This is just one example of how you might use\/abuse <code>ExecStop<\/code> to do work before killing processes. Whilst this was technically possible before, IMO the ease of use and the added introspection means this is actually feasible for production systems.<\/p>\n<p>I&#8217;ve gisted a copy of the script <a href=\"https:\/\/gist.github.com\/lozzd\/adce5651587538ca034b\">here<\/a>, if you want to steal it and modify it for your own use.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Preface: There is a tonne of drama about systemd on the internets; it won&#8217;t take you long to find it, if you&#8217;re curious. 
Despite that, I&#8217;m largely a fan and focusing on all the cool stuff I can finally do as an ops person without basically re-writing crappy bash scripts for a living (cough sys-v &hellip; <a href=\"https:\/\/laur.ie\/blog\/2015\/02\/systemd-using-execstop-to-depool-nodes-for-fun-and-profit\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">systemd: Using ExecStop to depool nodes for fun and profit<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[3,5],"tags":[],"class_list":["post-279","post","type-post","status-publish","format-standard","hentry","category-technology","category-work"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pioRW-4v","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/laur.ie\/blog\/wp-json\/wp\/v2\/posts\/279","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/laur.ie\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/laur.ie\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/laur.ie\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/laur.ie\/blog\/wp-json\/wp\/v2\/comments?post=279"}],"version-history":[{"count":1,"href":"https:\/\/laur.ie\/blog\/wp-json\/wp\/v2\/posts\/279\/revisions"}],"predecessor-versi
on":[{"id":280,"href":"https:\/\/laur.ie\/blog\/wp-json\/wp\/v2\/posts\/279\/revisions\/280"}],"wp:attachment":[{"href":"https:\/\/laur.ie\/blog\/wp-json\/wp\/v2\/media?parent=279"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/laur.ie\/blog\/wp-json\/wp\/v2\/categories?post=279"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/laur.ie\/blog\/wp-json\/wp\/v2\/tags?post=279"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}