Troubleshooting Elasticsearch Cluster Health: VMware Workspace ONE Access Operational Tutorial
Introduction
Elasticsearch is an open source search and analytics solution used within VMware Workspace ONE® Access. Although it is not a VMware product, on-premises Workspace ONE Access administrators can benefit from a generic Elasticsearch troubleshooting guide.
You can search the Internet for more tips and tricks on how to troubleshoot your Workspace ONE Access and Elasticsearch environment.
Audience
This guide is intended for experienced IT administrators of existing environments. Knowledge of Workspace ONE Access and Elasticsearch is assumed.
Troubleshooting Elasticsearch Cluster Health
Most issues with Elasticsearch and Workspace ONE Access arise when you create a cluster. It is common to see errors reported on the System Diagnostic page.
The first step is to try to understand why there is an issue with the Elasticsearch status.
Connect to the console (either via vSphere remote console or SSH) of your Workspace ONE Access virtual appliances. After you log in as root, run the following command to verify the cluster health:
curl 'http://localhost:9200/_cluster/health?pretty=true'
When there is a problem, the status is usually either yellow or red. (If you have a single Workspace ONE Access node, the cluster health is always yellow because there is no cluster, so replica shards cannot be assigned to another node.)
You can also see how many unassigned shards your cluster has.
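To pull just the status and the unassigned shard count out of the health output, a small shell helper can be used. The following is a minimal sketch, not part of the product: the function name health_field is hypothetical, and it assumes the pretty-printed output produced by ?pretty=true, with one field per line.

```shell
#!/bin/sh
# health_field NAME: read pretty-printed cluster health JSON on stdin
# and print the value of the top-level field NAME.
# Hypothetical helper; assumes one "name" : value pair per line.
health_field() {
    sed -n 's/^[[:space:]]*"'"$1"'"[[:space:]]*:[[:space:]]*"\{0,1\}\([^",]*\)"\{0,1\},\{0,1\}[[:space:]]*$/\1/p'
}
```

For example, curl -s 'http://localhost:9200/_cluster/health?pretty=true' | health_field unassigned_shards prints only the number of unassigned shards, and passing status instead prints the cluster color.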
Next, verify that all three nodes report the same primary node. Run the following command on each of the nodes (the URL is quoted so that the shell does not interpret the ? character):
curl 'http://localhost:9200/_cluster/state/master_node,nodes?pretty'
Compare the output from all three nodes. A common reason for a red status on the Elasticsearch cluster is that not all nodes share a common view of the cluster. You might see that one node lists a different node as the primary (the master_node value in the output), and it might not even list the other nodes. This node is most likely the cause of the red state.
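Comparing three outputs by eye is error-prone, so the comparison can be scripted. The sketch below assumes that the output of the cluster state command has been saved to one file per node; the function names master_of and same_master, and the file names used in the example, are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the saved output of
#   curl 'http://localhost:9200/_cluster/state/master_node,nodes?pretty'
# collected from each node into its own file.

# master_of FILE: print the master_node ID recorded in FILE.
master_of() {
    sed -n 's/^[[:space:]]*"master_node"[[:space:]]*:[[:space:]]*"\([^"]*\)".*$/\1/p' "$1" | head -n 1
}

# same_master FILE...: succeed only if every file names the same,
# non-empty master_node; print each file's value for inspection.
same_master() {
    first=$(master_of "$1")
    result=0
    for f in "$@"; do
        m=$(master_of "$f")
        printf '%s: %s\n' "$f" "${m:-<none>}"
        if [ -z "$m" ] || [ "$m" != "$first" ]; then
            result=1
        fi
    done
    return "$result"
}
```

After saving the output on each node (for example, by redirecting the curl command to node1.json, node2.json, and node3.json) and copying the files to one place, same_master node1.json node2.json node3.json prints the primary each node reports and returns a non-zero exit status if they disagree.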
Most of the time, a simple restart of the Elasticsearch service is enough to bring the node back into the cluster.
On the node at fault, stop the Elasticsearch service:
service elasticsearch stop
Wait a few minutes and then run:
service elasticsearch start
Give Elasticsearch time to start, and then verify that all nodes now report the same primary node and that each node lists all nodes in the cluster:
curl 'http://localhost:9200/_cluster/state/master_node,nodes?pretty'
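Rather than guessing how long the service needs after a restart, the local endpoint can be polled until it answers. This is a minimal sketch: the function name wait_for_es is hypothetical, and the default of 30 attempts spaced 10 seconds apart is an arbitrary assumption, not a documented startup time.

```shell
#!/bin/sh
# wait_for_es [URL] [TRIES] [DELAY]: poll URL until it answers with an
# HTTP success code, or give up after TRIES attempts DELAY seconds apart.
# Hypothetical helper; the defaults below are arbitrary assumptions.
wait_for_es() {
    url=${1:-http://localhost:9200/_cluster/health}
    tries=${2:-30}
    delay=${3:-10}
    i=1
    while [ "$i" -le "$tries" ]; do
        # -f makes curl fail on HTTP errors; -s silences progress output.
        if curl -s -f --max-time 5 -o /dev/null "$url"; then
            echo "Elasticsearch answered after $i attempt(s)"
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        [ "$i" -lt "$tries" ] && sleep "$delay"
        i=$((i + 1))
    done
    echo "Elasticsearch did not answer after $tries attempt(s)" >&2
    return 1
}
```

Calling wait_for_es with no arguments right after the service start returns once the node responds on port 9200. A response only means the service is up; it does not mean the node has rejoined the cluster, so the state and health checks are still needed.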
Next, check the cluster health again:
curl 'http://localhost:9200/_cluster/health?pretty=true'
If you still have unassigned shards, they must be dealt with. There are a couple of different routes; first, try to reassign the shards. The following guide walks you through the steps: How to fix your elasticsearch cluster.
But if this fails, as a last resort, you can delete the unassigned shards by following this blog post: ELK: Deleting unassigned shards to restore cluster health.
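Before following either guide, it can help to see exactly which shards are unassigned. The sketch below filters the output of the Elasticsearch _cat/shards API, which is not mentioned in the original guide; the function name unassigned_shards is hypothetical, and the filter assumes the default column order (index, shard number, primary or replica, state).

```shell
#!/bin/sh
# unassigned_shards: read _cat/shards output on stdin and print the
# index, shard number, and p/r flag of every shard whose state column
# is UNASSIGNED. Hypothetical helper; assumes default _cat columns.
unassigned_shards() {
    awk '$4 == "UNASSIGNED" { print $1, $2, $3 }'
}
```

For example, curl -s 'http://localhost:9200/_cat/shards' | unassigned_shards lists one line per unassigned shard, where p marks a primary shard and r marks a replica. An empty result means that no shards are unassigned.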
After you have completed the previous steps, the cluster health check should report a green status with no unassigned shards.
At that point, each component on the System Diagnostic page should also report a green status.
Conclusion
This guide provided basic steps to troubleshoot Elasticsearch cluster health in your Workspace ONE Access environment.
For more information, see the Workspace ONE Access Activity Path. Activity paths provide step-by-step guidance to help you level-up in your product knowledge. You will find everything from beginner to advanced curated assets in the form of articles, videos, and labs.
About the Author
This guide was written by:
Peter Bjork, Principal Architect, End-User-Computing Technical Marketing, VMware.
To comment on this paper, contact VMware End-User-Computing Technical Marketing at euc_tech_content_feedback@vmware.com.