At the time of writing there is no VCAP6-NV Design exam, so passing the VCAP6-NV Deploy exam automatically grants you the new VMware Certified Implementation Expert 6 – Network Virtualisation (VCIX6-NV) certification.
For previous blogs in this series, please refer to the VCAP6-NV Reference Guide I created. It has all the links to VMware NSX content and lists each exam objective with its associated blog. Check it out here: Exam Objective Reference Guide.
This blog covers:
Section 7 – Perform Advanced VMware NSX Troubleshooting
Objective 7.1 – Troubleshoot Common VMware NSX Installation/Configuration Issues
- Troubleshoot NSX Manager Services
- Download Technical Supports Logs from NSX Manager
- Troubleshoot Host Preparation Issues
- Troubleshoot NSX Controller Cluster Status, Roles and Connectivity
- Troubleshoot Logical Switch Transport Zone and NSX Edge Mappings
- Troubleshoot Logical Router Interface and Route Mappings
- Troubleshoot Distributed and Edge Firewall Implementations
Another brief blog, getting to the dregs now!
Troubleshoot NSX Manager Services
First I would want to log into the NSX Manager and confirm what services are having issues, assuming you can still log into the appliance.
Log into the NSX Manager by hitting the webpage. For me this is https://labnsx01.lab.local
Click the View Summary tab.
Confirm the following services are started: vPostgres, RabbitMQ, NSX Management Service and, if you are running cross-vCenter NSX, the NSX Universal Synchronization Service.
If any of these are not running, try starting them and watch the outcome.
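The check above can be written down as a small helper. This is a minimal sketch, assuming you copy the service states from the Summary page into a dict by hand; it does not talk to NSX Manager, and the "running" status string and the function name are my assumptions rather than anything the appliance defines.

```python
# Service names as listed on the NSX Manager Summary page.
REQUIRED_SERVICES = ["vPostgres", "RabbitMQ", "NSX Management Service"]
CROSS_VC_SERVICE = "NSX Universal Synchronization Service"


def services_needing_attention(statuses, cross_vcenter=False):
    """Return the required services that are not reported as running.

    `statuses` is a hand-built dict of service name -> status string
    (assumption: a healthy service is recorded as "running").
    A service missing from the dict is treated as not running.
    """
    required = list(REQUIRED_SERVICES)
    if cross_vcenter:
        # Only relevant when cross-vCenter NSX is in use.
        required.append(CROSS_VC_SERVICE)
    return [
        name for name in required
        if statuses.get(name, "missing").strip().lower() != "running"
    ]


if __name__ == "__main__":
    observed = {
        "vPostgres": "Running",
        "RabbitMQ": "Stopped",
        "NSX Management Service": "Running",
    }
    # Lists the services to start and watch.
    print(services_needing_attention(observed))
```

Anything the helper returns is a service to start and then watch in the logs.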
To access the NSX Manager logs you need to SSH into the NSX Manager appliance.
There are two logs to have a good look at: (1) the NSX Manager log and (2) the System log.
The commands are: show log manager and show log system
Either command can have one of the following appended: follow to continuously update the log, reverse to show the log in reverse order, or last to show only the last n lines.
Example below of show log manager
Example below of show log system follow
You can also configure Syslog on the NSX Manager, which I have already done (see Specify a Syslog Server in blog 2).
You can see the Syslog settings on the NSX Manager appliance – under the Manage Appliance Settings tab.
Once the Syslog settings are configured, NSX Manager sends all its logs to that destination.
My syslog server is VMware vRealize Log Insight. If I log into the Log Insight web page, I can locate and search the NSX Manager logs.
Download Technical Supports Logs from NSX Manager
Technical Support Logs are downloaded as a compressed .gz file. You download the file and then upload it to VMware Support for analysis.
Log into the NSX Manager appliance.
Click Download Tech Support Log.
The logs start to be collected.
Click Download to save the log files locally.
The files are downloaded.
You can also download Tech Support Logs for the Edge Services Gateway (ESG) and the NSX Controllers.
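Before uploading a bundle to VMware Support, it can be worth confirming the download is not truncated or corrupt. A minimal sketch, assuming only that the file is gzip-compressed as described above; the function name is hypothetical and the check says nothing about what is inside the bundle.

```python
import gzip
import zlib


def is_readable_gzip(path, chunk_size=1024 * 1024):
    """Return True if the file at `path` decompresses cleanly as gzip.

    The whole stream is read in chunks so that a truncated or corrupt
    download is detected, not just a bad header.
    """
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers "not a gzip file", EOFError a truncated stream,
        # zlib.error corrupt compressed data.
        return False
```

Call it with the path of the saved file; a False result suggests downloading the bundle again before sending it on.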
Troubleshoot Host Preparation Issues
Read more about host preparation in blog 3 under Prepare a cluster for NSX.
Host preparation installs NSX VIBs and drivers onto ESXi hosts. You want all your hosts to be prepared and ready for service. Sometimes there are issues and you will need to resolve them.
Log into the Web Client.
Click Networking and Security.
Click Installation, then Host Preparation. This will show all vCenter Server clusters and hosts.
The below is a healthy cluster and all hosts have a green tick.
If any host shows an error or a state other than healthy, you will need to resolve it.
You might see Not Ready because the VIBs and drivers were uninstalled from the host(s) and a reboot is still required.
Check the Communication Channel Health. Run this option at either the cluster or the host level.
Check DNS records for hosts are correct, both forward and reverse.
Make sure Common Components are installed and running on the NSX Manager appliance. Check and restart the RabbitMQ service if more than one host is affected.
Check the ESX Agent Manager. This is responsible for automating vSphere agents. Check this from Web Client under vCenter Server Extensions.
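The forward and reverse DNS check from the list above is easy to script. A minimal sketch using Python's standard socket module; the function name is hypothetical, and the resolver arguments exist only so the logic can be exercised without a real DNS server.

```python
import socket


def check_dns(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Check that hostname -> IP -> hostname round-trips.

    Returns (ok, detail). `forward` and `reverse` default to the
    system resolver and can be swapped out for testing.
    """
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"forward lookup failed: {exc}"
    try:
        name, aliases, _addresses = reverse(ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return False, f"reverse lookup of {ip} failed: {exc}"
    # Compare case-insensitively and ignore a trailing dot.
    known = {n.lower().rstrip(".") for n in [name, *aliases]}
    if hostname.lower().rstrip(".") in known:
        return True, f"{hostname} <-> {ip}"
    return False, f"{ip} resolves back to {name}, not {hostname}"
```

Run it once per ESXi host using the FQDN registered in vCenter Server, for example check_dns("esxi01.lab.local"), where that host name is a made-up example in my lab domain.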
Troubleshoot NSX Controller Cluster Status, Roles and Connectivity
There are 3 items here: status, roles and connectivity.
Log into the Web Client.
Click Networking and Security.
Click Installation, then Management.
Controller Cluster Status
Make sure all the controllers are healthy and connected.
If you have an issue with a controller you can delete it and recreate it. This is pretty fast, taking only minutes to redeploy.
You can also check the status of a controller by connecting to it over SSH.
Run the command: show control-cluster status
The above controller is healthy and joined to the cluster.
Here are all the show control-cluster commands:
Controller Cluster Roles
In any cluster something has to be the boss, be the primary or master etc. To check the roles assigned to an NSX Controller you need to SSH into the controller and run some commands.
Run the command: show control-cluster roles.
A YES in the MASTER field means this controller is the master for that one role.
The controller shown above is the master for all roles.
Running the same command on another controller shows the roles from that controller's point of view.
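Once you have run the command on each controller, you can sanity check the results together. A minimal sketch, assuming each role should have exactly one master across the cluster and that you transcribe the YES/NO values by hand; the controller names and the two role names in the usage note are placeholders, so substitute whatever your own output lists.

```python
def masters_per_role(role_table):
    """Map each role to the controllers that report being its master.

    `role_table` is hand-transcribed from each controller:
    {controller_name: {role_name: True if MASTER is YES else False}}.
    """
    masters = {}
    for controller, roles in role_table.items():
        for role, is_master in roles.items():
            owners = masters.setdefault(role, [])
            if is_master:
                owners.append(controller)
    return masters


def roles_with_problems(role_table):
    """Return, sorted, the roles that do not have exactly one master."""
    return sorted(
        role
        for role, owners in masters_per_role(role_table).items()
        if len(owners) != 1
    )
```

For example, a table such as {"controller-1": {"api_provider": True, "switch_manager": True}, "controller-2": {"api_provider": False, "switch_manager": False}} returns an empty list, meaning every role has a single master.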
Controller Cluster Connectivity
Make sure the controllers are connected.
Run the command: show control-cluster connections
This command shows cluster communication. You can see this controller is listening on multiple ports and currently has 6 connections open on port 443, among others.
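You can also test reachability from the outside, for example from a management workstation. A minimal sketch using Python's standard socket module; the function name is hypothetical, and port 443 is used only because it appears in the output above. A successful TCP connect shows the port is reachable, not that the cluster is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

Call it once per controller IP, for example port_is_open(controller_ip, 443), where controller_ip is the address of one of your controllers.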
Troubleshoot Distributed and Edge Firewall Implementations
Remember that the DFW is in the kernel of the hypervisor.
At a minimum, check that the NSX Firewall is enabled on the hosts.
Make sure VMware Tools are installed on the VMs.
I am not going into details on troubleshooting. This will come from experience and playing/breaking the product and learning from it.
And that’s it for this blog!
- Monitor and Analyze Virtual Machine Traffic with Flow Monitoring
- Troubleshoot Virtual Machine Connectivity
- Troubleshoot Dynamic Routing Protocols