How to debug problems with Ansible?

Sometimes Ansible does not do what you want, and increasing the verbosity doesn't help. For example, I'm currently trying to start the coturn server, which ships with an init script, on a systemd OS (Debian Jessie). Ansible believes the service is running, but it is not. How can I see what happens under the hood? Which commands are executed, and with what exit code?

+10
6 answers

Debugging Modules

  • The easiest way is to run ansible / ansible-playbook with increased verbosity by adding -vvv to the command line.

  • For modules written in Python (Linux / Unix), the easiest way is to run ansible / ansible-playbook with the environment variable ANSIBLE_KEEP_REMOTE_FILES set to 1 (on the control machine); a minimal invocation sketch follows this list.

    This causes Ansible to leave an exact copy of the Python scripts it ran (successful or not) on the target machine.

    The script path is printed in the Ansible log, and for regular tasks the scripts are stored under the SSH user's home directory: ~/.ansible/tmp/ .

    The exact logic is built into the scripts and depends on each module. Some of them use Python with standard or external libraries, some of them call external commands.
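
For example, here is a minimal sketch of that workflow (the playbook name debug_demo.yml is made up, and the coturn service is taken from the question):

    # debug_demo.yml -- hypothetical playbook, used only to illustrate the workflow
    - hosts: all
      become: yes
      tasks:
        - name: Ensure coturn is running
          service:
            name: coturn
            state: started

Running ANSIBLE_KEEP_REMOTE_FILES=1 ansible-playbook -vvv debug_demo.yml prints the path of the copied module script under ~/.ansible/tmp/ on the target, and the script is not deleted afterwards, so you can log in and run it by hand to see exactly what the service module executed and which exit code it received (depending on the Ansible version, the leftover file is either a plain Python script or a wrapper that can unpack the real module code, e.g. via an explode argument in recent releases).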

Debugging Playbooks

  • As with debugging modules, increasing the verbosity with the -vvv switch causes more data to be printed to the Ansible log.

  • Since Ansible 2.1, the Playbook Debugger allows you to debug failed tasks interactively: inspect and modify the data, then re-run the task (a minimal example follows this list).
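
A hedged sketch of what that looks like (the play below is invented for illustration and deliberately references an undefined variable so that the task fails):

    - hosts: all
      strategy: debug        # drop into the playbook debugger when a task fails
      gather_facts: no
      tasks:
        - name: This task fails because wrong_var is undefined
          ping:
            data: "{{ wrong_var }}"

When the task fails you land on a (debug) prompt, where you can, for example, run p task.args to inspect the failed task's arguments, assign a corrected value, and re-run the task; the exact command set (redo, continue, quit, ...) depends on the Ansible version.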

Debugging Connections

  • Add -vvvv to the ansible / ansible-playbook invocation to include debugging information for connections.
+14

Here is what I came up with.

Ansible sends the modules to the target system and runs them there, so if you change a module locally, your changes will take effect the next time you run the playbook. On my machine the modules live in /usr/lib/python2.7/site-packages/ansible/modules ( ansible-2.1.2.0 ), and the service module is in core/system/service.py . Modules (instances of the AnsibleModule class declared in module_utils/basic.py ) have a log method that sends messages to the systemd journal if it is available, falling back to syslog otherwise. So, run journalctl -f on the target system, add debug statements ( module.log(msg='test') ) to the local copy of the module, and run your playbook. You will see the debug statements under the name ansible-basic.py .

Also, when you run ansible-playbook with -vvv , you can see some debug output in the systemd journal, at least the invocation message and error messages, if any.

One more thing: if you try to debug code that is executed locally with pdb ( import pdb; pdb.set_trace() ), you will most likely get a BdbQuit exception. This is because Python closes stdin when it spawns the Ansible worker process. The solution is to reopen stdin before calling pdb.set_trace() , as suggested here :

    sys.stdin = open('/dev/tty')
    import pdb; pdb.set_trace()
+7

Debugging roles / playbooks

Basically, debugging automation that manages large inventories across large networks is essentially debugging a distributed network application. This can be very tedious and delicate, and user-friendly tools are scarce.

Thus, I believe the answer to your question is a combination of all the answers above, plus a small addition. So here:

  • absolutely necessary: you need to know what is happening, that is, what you are automating and what you expect. For example, failure to detect a service with a systemd unit as running or stopped usually means a bug either in the unit file or in the service module, so you need to 1. identify the bug, 2. report the bug to the vendor / community, 3. provide a workaround with a TODO and a link to the bug, 4. remove the workaround once the bug is fixed

  • to make your code easier to debug, use modules as much as you can

  • give all tasks and variables meaningful names.

  • use static code analysis tools like ansible-lint . This will save you from some really silly little mistakes.

  • use verbosity flags and log path

  • use the debug module wisely

  • "Know your facts" - sometimes it is useful to dump the target machine's facts to a file and fetch it to the Ansible control machine (see the sketch after this list)

    • use strategy: debug ; in some cases you can drop into the task debugger when a task fails. Then you can evaluate all the parameters the task uses and decide what to do next.

    • the last resort is to use the Python debugger, attached to the local Ansible process and / or to the remote Python process executing the modules. This is usually tricky: you need to allow an additional port to be opened on the machine, and what if the code that opens the port is the very code causing the problem?
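
To make the debug and "know your facts" points concrete, here is a hedged sketch (the registered variable service_result and the /tmp path are invented for illustration):

    - name: Show what the previous step actually returned
      debug:
        var: service_result
        verbosity: 1        # only printed when run with -v or higher

    - name: Dump this host's facts to the control machine for offline inspection
      copy:
        content: "{{ hostvars[inventory_hostname] | to_nice_json }}"
        dest: "/tmp/facts-{{ inventory_hostname }}.json"
      delegate_to: localhost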

In addition, it is sometimes useful to "look sideways": connect to the target nodes and increase their own debugging (more verbose logging).

Of course, log collection makes it easier to track changes caused by otherwise invisible operations.

As you can see, as with any other distributed application or framework, the debugging experience is still not what we would like it to be.

Filters / Plugins

This is basically Python development; debug it like any other Python application.

Modules

Depending on the technology and on the complexity of what you need to observe, both locally and remotely, you had better choose a module language that is simple enough to debug remotely.

+2

You can use register together with the debug module to print return values. For example, I want to know the return code of a script called somescript.sh , so my play will have tasks like:

    - name: my task
      shell: "bash somescript.sh"
      register: output

    - debug:
        msg: "{{ output.rc }}"

For the full set of return values you can access in Ansible, check this page: http://docs.ansible.com/ansible/latest/common_return_values.html
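
As a small, hedged extension of the same example (still the hypothetical somescript.sh ), the registered result can also drive task logic instead of only being printed:

    - name: Run the script, accepting exit codes 0 and 2
      shell: "bash somescript.sh"
      register: output
      failed_when: output.rc not in [0, 2]

    - name: Show the script's output line by line
      debug:
        var: output.stdout_lines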

+2

There are several levels of debugging you might need, but the easiest is to add the environment variable ANSIBLE_STRATEGY=debug , which drops you into the debugger on the first error.

+1

Debugging Ansible tasks can be almost impossible if the tasks are not your own - contrary to what the Ansible website says:

No special coding skills required

Ansible requires highly specialized programming skills, because it is neither YAML nor Python; it is a messy mixture of both.

The idea of using markup languages for programming has been tried before. XML was very popular in the Java community at one time, and XSLT is another great example.

As Ansible projects grow, the complexity grows exponentially. Take, for example, the OpenShift Ansible project, which contains the following task:

    - name: Create the master server certificate
      command: >
        {{ hostvars[openshift_ca_host]['first_master_client_binary'] }} adm ca create-server-cert
        {% for named_ca_certificate in openshift.master.named_certificates | default([]) | lib_utils_oo_collect('cafile') %}
        --certificate-authority {{ named_ca_certificate }}
        {% endfor %}
        {% for legacy_ca_certificate in g_master_legacy_ca_result.files | default([]) | lib_utils_oo_collect('path') %}
        --certificate-authority {{ legacy_ca_certificate }}
        {% endfor %}
        --hostnames={{ hostvars[item].openshift.common.all_hostnames | join(',') }}
        --cert={{ openshift_generated_configs_dir }}/master-{{ hostvars[item].openshift.common.hostname }}/master.server.crt
        --key={{ openshift_generated_configs_dir }}/master-{{ hostvars[item].openshift.common.hostname }}/master.server.key
        --expire-days={{ openshift_master_cert_expire_days }}
        --signer-cert={{ openshift_ca_cert }}
        --signer-key={{ openshift_ca_key }}
        --signer-serial={{ openshift_ca_serial }}
        --overwrite=false
      when: item != openshift_ca_host
      with_items: "{{ hostvars
                      | lib_utils_oo_select_keys(groups['oo_masters_to_config'])
                      | lib_utils_oo_collect(attribute='inventory_hostname', filters={'master_certs_missing':True}) }}"
      delegate_to: "{{ openshift_ca_host }}"
      run_once: true

I think we can all agree that this is programming in YAML - and not a good idea. This particular fragment may fail with a message like:

    fatal: [master0]: FAILED! => {"msg": "The conditional check 'item != openshift_ca_host' failed. The error was: error while evaluating conditional (item != openshift_ca_host): 'item' is undefined\n\nThe error appears to have been in '/home/user/openshift-ansible/roles/openshift_master_certificates/tasks/main.yml': line 39, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Create the master server certificate\n  ^ here\n"}

If you hit a message like that, you are doomed. But we have a debugger, don't we? OK, let's see what it gives us:

    [master0] TASK: openshift_master_certificates : Create the master server certificate (debug)> p task.args
    {u'_raw_params': u"{{ hostvars[openshift_ca_host]['first_master_client_binary'] }} adm ca create-server-cert {% for named_ca_certificate in openshift.master.named_certificates | default([]) | lib_utils_oo_collect('cafile') %} --certificate-authority {{ named_ca_certificate }} {% endfor %} {% for legacy_ca_certificate in g_master_legacy_ca_result.files | default([]) | lib_utils_oo_collect('path') %} --certificate-authority {{ legacy_ca_certificate }} {% endfor %} --hostnames={{ hostvars[item].openshift.common.all_hostnames | join(',') }} --cert={{ openshift_generated_configs_dir }}/master-{{ hostvars[item].openshift.common.hostname }}/master.server.crt --key={{ openshift_generated_configs_dir }}/master-{{ hostvars[item].openshift.common.hostname }}/master.server.key --expire-days={{ openshift_master_cert_expire_days }} --signer-cert={{ openshift_ca_cert }} --signer-key={{ openshift_ca_key }} --signer-serial={{ openshift_ca_serial }} --overwrite=false"}
    [master0] TASK: openshift_master_certificates : Create the master server certificate (debug)> exit

How does this help? It doesn't.

The point here is that using YAML as a programming language is an incredibly bad idea. It is a mess, and the symptoms of that mess are everywhere.

Some additional facts: the prerequisites phase of OpenShift Ansible on Azure takes 50+ minutes, and the deployment phase takes more than 70 minutes - every time, whether it is the first run or a subsequent one. And there is no way to limit provisioning to a single node. This limit problem was part of Ansible in 2012, and it is still part of Ansible today. That fact tells us something.

The point is that Ansible should be used as intended: for simple tasks, without YAML programming. It is great for a large number of servers, but it should not be used for complex configuration management tasks.

Ansible is not an Infrastructure as Code (IaC) tool.

If you are asking how to debug Ansible problems, you are using it in a way it was not intended to be used. Do not use it as an IaC tool.

+1
