6. Diagnosing System Problems and OutagesΒΆ

This chapter covers using dimscli as a distributed shell for diagnosing problems throughout a DIMS deployment.

Ansible has two primary CLI programs, ansible and ansible-playbook. Both of these programs are passed a set of hosts on which they are to operate using an Inventory.

Note

Read about Ansible and how it is used by the DIMS project in Section ansibleplaybooks:ansiblefundamentals of ansibleplaybooks:ansibleplaybooks.

[dimsenv] dittrich@dimsdemo1:~/dims/git/python-dimscli (develop*) $ cat complete_inventory
[all]
floyd2-p.prisem.washington.edu
foswiki-int.prisem.washington.edu
git.prisem.washington.edu
hub.prisem.washington.edu
jenkins-int.prisem.washington.edu
jira-int.prisem.washington.edu
lapp-int.prisem.washington.edu
lapp.prisem.washington.edu
linda-vm1.prisem.washington.edu
rabbitmq.prisem.washington.edu
sso.prisem.washington.edu
time.prisem.washington.edu
u12-dev-svr-1.prisem.washington.edu
u12-dev-ws-1.prisem.washington.edu
wellington.prisem.washington.edu

Using this inventory, the modules command and shell can be used to run commands as needed to diagnose all of these hosts at once.

[dimsenv] dittrich@dimsdemo1:~/dims/git/python-dimscli (develop*) $ dimscli ansible command --program "uptime" --inventory complete_inventory --remote-port 8422 --remote-user dittrich
+-------------------------------------+--------+-------------------------------------------------------------------------+
| Host                                | Status | Results                                                                 |
+-------------------------------------+--------+-------------------------------------------------------------------------+
| rabbitmq.prisem.washington.edu      | GOOD   |  22:07:53 up 33 days,  4:32,  1 user,  load average: 0.07, 0.13, 0.09   |
| wellington.prisem.washington.edu    | GOOD   |  22:07:57 up 159 days, 12:16,  1 user,  load average: 1.16, 0.86, 0.58  |
| linda-vm1.prisem.washington.edu     | GOOD   |  22:07:54 up 159 days, 12:03,  1 user,  load average: 0.00, 0.01, 0.05  |
| git.prisem.washington.edu           | GOOD   |  22:07:54 up 159 days, 12:03,  2 users,  load average: 0.00, 0.01, 0.05 |
| time.prisem.washington.edu          | GOOD   |  22:07:55 up 33 days,  4:33,  2 users,  load average: 0.01, 0.07, 0.12  |
| jenkins-int.prisem.washington.edu   | GOOD   |  22:07:55 up 159 days, 12:03,  1 user,  load average: 0.00, 0.01, 0.05  |
| u12-dev-ws-1.prisem.washington.edu  | GOOD   |  22:07:56 up 159 days, 12:03,  1 user,  load average: 0.00, 0.02, 0.05  |
| sso.prisem.washington.edu           | GOOD   |  22:07:56 up 159 days, 12:03,  1 user,  load average: 0.00, 0.01, 0.05  |
| lapp-int.prisem.washington.edu      | GOOD   |  22:07:54 up 159 days, 12:04,  2 users,  load average: 0.00, 0.01, 0.05 |
| foswiki-int.prisem.washington.edu   | GOOD   |  22:07:55 up 159 days, 12:04,  1 user,  load average: 0.00, 0.01, 0.05  |
| u12-dev-svr-1.prisem.washington.edu | GOOD   |  22:07:59 up 155 days, 14:56,  1 user,  load average: 0.05, 0.08, 0.06  |
| hub.prisem.washington.edu           | GOOD   |  06:07:53 up 141 days, 12:19,  1 user,  load average: 0.08, 0.03, 0.05  |
| floyd2-p.prisem.washington.edu      | GOOD   |  22:07:53 up 33 days,  4:32,  1 user,  load average: 0.00, 0.01, 0.05   |
| jira-int.prisem.washington.edu      | GOOD   |  22:07:54 up 159 days, 12:03,  2 users,  load average: 0.00, 0.01, 0.05 |
| lapp.prisem.washington.edu          | GOOD   |  22:07:54 up 159 days, 12:04,  2 users,  load average: 0.00, 0.01, 0.05 |
+-------------------------------------+--------+-------------------------------------------------------------------------+
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
 To: dims-devops@uw.ops-trust.net
 From: Jenkins <dims@eclipse.prisem.washington.edu>
 Subject: [dims devops] [Jenkins] [FAILURE] jenkins-update-cifbulk-server-develop-16
 Date: Thu Jan 14 20:35:21 PST 2016
 Message-ID: <20160115043521.C7D5E1C004F@jenkins>

 Started by an SCM change
 [EnvInject] - Loading node environment variables.
 Building in workspace /var/lib/jenkins/jobs/update-cifbulk-server-develop/workspace

 Deleting project workspace... done

 [ssh-agent] Using credentials ansible (Ansible user ssh key - root)
 [ssh-agent] Looking for ssh-agent implementation...
 [ssh-agent]   Java/JNR ssh-agent
 [ssh-agent] Started.

  ...

 TASK: [cifbulk-server | Make config change available and restart if updating existing] ***
 <rabbitmq.prisem.washington.edu> REMOTE_MODULE command . /opt/dims/envs/dimsenv/bin/activate && supervisorctl -c /etc/supervisord.conf reread #USE_SHELL
 failed: [rabbitmq.prisem.washington.edu] => (item=reread) => {"changed": true, "cmd": ". /opt/dims/envs/dimsenv/bin/activate && supervisorctl -c /etc/supervisord.conf reread", "delta": "0:00:00.229614", "end": "2016-01-14 20:34:49.409784", "item": "reread", "rc": 2, "start": "2016-01-14 20:34:49.180170"}
 stderr: Error: could not find config file /etc/supervisord.conf
 For help, use /usr/bin/supervisorctl -h
 <rabbitmq.prisem.washington.edu> REMOTE_MODULE command . /opt/dims/envs/dimsenv/bin/activate && supervisorctl -c /etc/supervisord.conf update #USE_SHELL
 failed: [rabbitmq.prisem.washington.edu] => (item=update) => {"changed": true, "cmd": ". /opt/dims/envs/dimsenv/bin/activate && supervisorctl -c /etc/supervisord.conf update", "delta": "0:00:00.235882", "end": "2016-01-14 20:34:50.097224", "item": "update", "rc": 2, "start": "2016-01-14 20:34:49.861342"}
 stderr: Error: could not find config file /etc/supervisord.conf
 For help, use /usr/bin/supervisorctl -h

 FATAL: all hosts have already failed -- aborting

 PLAY RECAP ********************************************************************
            to retry, use: --limit @/var/lib/jenkins/cifbulk-server-configure.retry

 rabbitmq.prisem.washington.edu : ok=11   changed=4    unreachable=0    failed=1

 Build step 'Execute shell' marked build as failure
 [ssh-agent] Stopped.
 Warning: you have no plugins providing access control for builds, so falling back to legacy behavior of permitting any downstream builds to be triggered
 Finished: FAILURE
 --
 [[ UW/DIMS ]]: All message content remains the property of the author
 and must not be forwarded or redistributed without explicit permission.
[dimsenv] dittrich@dimsdemo1:~/dims/git/ansible-playbooks (develop*) $ grep -r supervisord.conf
roles/supervisor-install/tasks/main.yml:  template: "src=supervisord.conf.j2 dest={{ dims_supervisord_conf }} owner=root group=root"
roles/supervisor-install/tasks/main.yml:  file: path=/etc/dims-supervisord.conf state=absent
roles/supervisor-install/templates/supervisor.j2:DAEMON_OPTS="-c {{ dims_supervisord_conf }} $DAEMON_OPTS"
roles/cifbulk-server/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} {{ item }}"
roles/cifbulk-server/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} start {{ name_base }}:"
roles/prisem-scripts-deploy/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} restart {{ item }}:"
roles/anon-server/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} {{ item }}"
roles/anon-server/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} start {{ name_base }}:"
roles/consul-install/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} remove {{ consul_basename }}"
roles/consul-install/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} {{ item }}"
roles/consul-install/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} start {{ consul_basename }}:"
roles/crosscor-server/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} {{ item }}"
roles/crosscor-server/tasks/main.yml:  shell: ". {{ dimsenv_activate }} && supervisorctl -c {{ dims_supervisord_conf }} start {{ name_base }}:"
group_vars/all:dims_supervisord_conf: '/etc/supervisord.conf'
[dimsenv] dittrich@dimsdemo1:~/dims/git/python-dimscli (develop*) $ dimscli ansible shell --program "find /etc -name supervisord.conf" --inventory complete_inventory --remote-port 8422 --remote-u
ser dittrich
+-------------------------------------+--------+----------------------------------+
| Host                                | Status | Results                          |
+-------------------------------------+--------+----------------------------------+
| rabbitmq.prisem.washington.edu      | GOOD   | /etc/supervisor/supervisord.conf |
| wellington.prisem.washington.edu    | GOOD   |                                  |
| hub.prisem.washington.edu           | GOOD   |                                  |
| git.prisem.washington.edu           | GOOD   | /etc/supervisor/supervisord.conf |
| u12-dev-ws-1.prisem.washington.edu  | GOOD   |                                  |
| sso.prisem.washington.edu           | GOOD   |                                  |
| jenkins-int.prisem.washington.edu   | GOOD   | /etc/supervisor/supervisord.conf |
| foswiki-int.prisem.washington.edu   | GOOD   |                                  |
| lapp-int.prisem.washington.edu      | GOOD   |                                  |
| u12-dev-svr-1.prisem.washington.edu | GOOD   | /etc/supervisor/supervisord.conf |
| linda-vm1.prisem.washington.edu     | GOOD   |                                  |
| lapp.prisem.washington.edu          | GOOD   |                                  |
| floyd2-p.prisem.washington.edu      | GOOD   |                                  |
| jira-int.prisem.washington.edu      | GOOD   | /etc/supervisor/supervisord.conf |
| time.prisem.washington.edu          | GOOD   |                                  |
+-------------------------------------+--------+----------------------------------+
[dimsenv] dittrich@dimsdemo1:~/dims/git/python-dimscli (develop*) $ dimscli ansible shell --program "find /etc -name '*supervisor'*" --inventory complete_inventory --remote-port 8422 --remote-use
r dittrich
+-------------------------------------+--------+-------------------------------------------------+
| Host                                | Status | Results                                         |
+-------------------------------------+--------+-------------------------------------------------+
| rabbitmq.prisem.washington.edu      | GOOD   | /etc/rc0.d/K20supervisor                        |
|                                     |        | /etc/rc3.d/S20supervisor                        |
|                                     |        | /etc/rc1.d/K20supervisor                        |
|                                     |        | /etc/default/supervisor                         |
|                                     |        | /etc/rc2.d/S20supervisor                        |
|                                     |        | /etc/rc6.d/K20supervisor                        |
|                                     |        | /etc/supervisor                                 |
|                                     |        | /etc/supervisor/supervisord.conf.20140214204135 |
|                                     |        | /etc/supervisor/supervisord.conf.20140214200547 |
|                                     |        | /etc/supervisor/supervisord.conf.20140616162335 |
|                                     |        | /etc/supervisor/supervisord.conf.20140814132409 |
|                                     |        | /etc/supervisor/supervisord.conf.20140616162451 |
|                                     |        | /etc/supervisor/supervisord.conf.20140616162248 |
|                                     |        | /etc/supervisor/supervisord.conf.20140131230939 |
|                                     |        | /etc/supervisor/supervisord.conf.20140222154901 |
|                                     |        | /etc/supervisor/supervisord.conf.20140214194415 |
|                                     |        | /etc/supervisor/supervisord.conf.20140222155042 |
|                                     |        | /etc/supervisor/supervisord.conf.20150208174308 |
|                                     |        | /etc/supervisor/supervisord.conf.20140814132717 |
|                                     |        | /etc/supervisor/supervisord.conf.20140215134451 |
|                                     |        | /etc/supervisor/supervisord.conf.20150208174742 |
|                                     |        | /etc/supervisor/supervisord.conf.20140911193305 |
|                                     |        | /etc/supervisor/supervisord.conf.20140219200951 |
|                                     |        | /etc/supervisor/supervisord.conf.20140911202633 |
|                                     |        | /etc/supervisor/supervisord.conf                |
|                                     |        | /etc/supervisor/supervisord.conf.20140222154751 |
|                                     |        | /etc/supervisor/supervisord.conf.20150208174403 |
|                                     |        | /etc/supervisor/supervisord.conf.20140814132351 |
|                                     |        | /etc/supervisor/supervisord.conf.20140814132759 |
|                                     |        | /etc/rc4.d/S20supervisor                        |
|                                     |        | /etc/init.d/supervisor                          |
|                                     |        | /etc/rc5.d/S20supervisor                        |
| wellington.prisem.washington.edu    | GOOD   |                                                 |
| linda-vm1.prisem.washington.edu     | GOOD   | /etc/rc0.d/K20supervisor                        |
|                                     |        | /etc/rc3.d/S20supervisor                        |
|                                     |        | /etc/rc1.d/K20supervisor                        |
|                                     |        | /etc/rc2.d/S20supervisor                        |
|                                     |        | /etc/rc6.d/K20supervisor                        |
|                                     |        | /etc/supervisor                                 |
|                                     |        | /etc/rc4.d/S20supervisor                        |
|                                     |        | /etc/dims-supervisord.conf                      |
|                                     |        | /etc/init.d/supervisor                          |
|                                     |        | /etc/rc5.d/S20supervisor                        |
| git.prisem.washington.edu           | GOOD   | /etc/rc0.d/K20supervisor                        |
|                                     |        | /etc/rc3.d/S20supervisor                        |
|                                     |        | /etc/rc1.d/K20supervisor                        |
|                                     |        | /etc/default/supervisor                         |
|                                     |        | /etc/rc2.d/S20supervisor                        |
|                                     |        | /etc/rc6.d/K20supervisor                        |
|                                     |        | /etc/supervisor                                 |
|                                     |        | /etc/supervisor/supervisord.conf                |
|                                     |        | /etc/rc4.d/S20supervisor                        |
|                                     |        | /etc/init.d/supervisor                          |
|                                     |        | /etc/rc5.d/S20supervisor                        |
| time.prisem.washington.edu          | GOOD   |                                                 |
| jenkins-int.prisem.washington.edu   | GOOD   | /etc/rc0.d/K20supervisor                        |
|                                     |        | /etc/rc3.d/S20supervisor                        |
|                                     |        | /etc/rc1.d/K20supervisor                        |
|                                     |        | /etc/default/supervisor                         |
|                                     |        | /etc/rc2.d/S20supervisor                        |
|                                     |        | /etc/rc6.d/K20supervisor                        |
|                                     |        | /etc/supervisor                                 |
|                                     |        | /etc/supervisor/supervisord.conf                |
|                                     |        | /etc/rc4.d/S20supervisor                        |
|                                     |        | /etc/init.d/supervisor                          |
|                                     |        | /etc/rc5.d/S20supervisor                        |
| u12-dev-ws-1.prisem.washington.edu  | GOOD   |                                                 |
| sso.prisem.washington.edu           | GOOD   |                                                 |
| lapp-int.prisem.washington.edu      | GOOD   |                                                 |
| foswiki-int.prisem.washington.edu   | GOOD   |                                                 |
| u12-dev-svr-1.prisem.washington.edu | GOOD   | /etc/rc2.d/S20supervisor                        |
|                                     |        | /etc/rc4.d/S20supervisor                        |
|                                     |        | /etc/init.d/supervisor                          |
|                                     |        | /etc/rc5.d/S20supervisor                        |
|                                     |        | /etc/rc3.d/S20supervisor                        |
|                                     |        | /etc/supervisor                                 |
|                                     |        | /etc/supervisor/supervisord.conf                |
|                                     |        | /etc/rc6.d/K20supervisor                        |
|                                     |        | /etc/rc1.d/K20supervisor                        |
|                                     |        | /etc/rc0.d/K20supervisor                        |
| hub.prisem.washington.edu           | GOOD   |                                                 |
| floyd2-p.prisem.washington.edu      | GOOD   |                                                 |
| jira-int.prisem.washington.edu      | GOOD   | /etc/rc0.d/K20supervisor                        |
|                                     |        | /etc/rc3.d/S20supervisor                        |
|                                     |        | /etc/rc1.d/K20supervisor                        |
|                                     |        | /etc/default/supervisor                         |
|                                     |        | /etc/rc2.d/S20supervisor                        |
|                                     |        | /etc/rc6.d/K20supervisor                        |
|                                     |        | /etc/supervisor                                 |
|                                     |        | /etc/supervisor/supervisord.conf                |
|                                     |        | /etc/rc4.d/S20supervisor                        |
|                                     |        | /etc/init.d/supervisor                          |
|                                     |        | /etc/rc5.d/S20supervisor                        |
| lapp.prisem.washington.edu          | GOOD   |                                                 |
+-------------------------------------+--------+-------------------------------------------------+

While the concept of putting a list of host names into a file with a label is simple to understand, it is not very flexible or scalable. Ansible supports a concept called a Dynamic Inventory. Rather than passing a hosts file using -i or --inventory, you can pass a Python script that produces a special JSON object.

What is not very widely known is that you can also trigger creation of a dynamic inventory within ansible or ansible-playbook by passing a list for the -i or --inventory option. Rather than creating a temporary file with [all] at the top, followed by a list of three host names, then passing that file with -i or --inventory, just pass a comma-separated list instead:

[dimsenv] dittrich@dimsdemo1:~/dims/git/python-dimscli (develop*) $ dimscli ansible shell --program "find /etc -name supervisord.conf" --inventory rabbitmq.prisem.washington.edu,time.prisem.washi
ngton.edu,u12-dev-svr-1.prisem.washington.edu --remote-port 8422 --remote-user dittrich
+-------------------------------------+--------+----------------------------------+
| Host                                | Status | Results                          |
+-------------------------------------+--------+----------------------------------+
| rabbitmq.prisem.washington.edu      | GOOD   | /etc/supervisor/supervisord.conf |
| time.prisem.washington.edu          | GOOD   |                                  |
| u12-dev-svr-1.prisem.washington.edu | GOOD   | /etc/supervisor/supervisord.conf |
+-------------------------------------+--------+----------------------------------+

There is a subtle trick for passing just a single host, and that is to pass the name with a trailing comma (,), as seen here:

[dimsenv] dittrich@dimsdemo1:~/dims/git/python-dimscli (develop*) $ dimscli ansible shell --program "find /etc -name supervisord.conf" --inventory rabbitmq.prisem.washington.edu, --remote-port 84
22 --remote-user dittrich
+--------------------------------+--------+----------------------------------+
| Host                           | Status | Results                          |
+--------------------------------+--------+----------------------------------+
| rabbitmq.prisem.washington.edu | GOOD   | /etc/supervisor/supervisord.conf |
+--------------------------------+--------+----------------------------------+