Troubleshooting

Entering Anaconda Enterprise environment

To enter the Anaconda Enterprise environment and gain access to kubectl and other commands within Anaconda Enterprise, use the command:

sudo gravity enter

Moving files and data

Occasionally you may need to move files and data from the host machine to the Anaconda Enterprise environment. If so, there are two shared mounts to pass data back and forth between the two environments:

  • host: /opt/anaconda/ -> AE environment: /opt/anaconda/
  • host: /var/lib/gravity/planet/share -> AE environment: /ext/share

If data is written to either of these locations, it is available both on the host machine and within the Anaconda Enterprise environment.
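As an illustration of the second mount, the sketch below copies a file into the host-side directory and prints the path where it should appear inside the Anaconda Enterprise environment. The helper name copy_to_share and the SHARE_DIR variable are hypothetical and exist only for this example. Depending on directory permissions, the copy may need to be run with sudo.

```shell
# Host-side half of the shared mount (path taken from the list above).
SHARE_DIR="${SHARE_DIR:-/var/lib/gravity/planet/share}"

# copy_to_share FILE: copy FILE into the shared mount and print the path
# under which it is expected to appear inside the AE environment.
# (copy_to_share is a hypothetical helper, not part of Anaconda Enterprise.)
copy_to_share() {
    cp "$1" "$SHARE_DIR/" && echo "/ext/share/$(basename "$1")"
}

# Example, run on the host:
#   copy_to_share ./dataset.csv
#   sudo gravity enter
#   ls -l /ext/share/dataset.csv
```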

Debugging

On AWS, network traffic must be able to reach the public IPs and ports. Either use a canonical security group with the proper ports opened, or manually add the specific ports listed in Network Requirements.

Failed installations

If an installation fails, you can view the logs from the failed installation as part of the support bundle in the failed installation UI.

After executing sudo gravity enter, you can troubleshoot a failed installation in two ways: check /var/log/messages, or run journalctl to look at the logs of the gravity service:

journalctl -u gravity-23423lkqjfefqpfh2.service

NOTE: Replace gravity-23423lkqjfefqpfh2.service with the name of your gravity service.

You may see messages in /var/log/messages related to errors such as “etcd cluster is misconfigured” and “etcd has no leader” from one of the installation jobs, particularly gravity-site. This usually indicates that etcd needs more compute power, needs more space or is on a slow disk.
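To check quickly whether either of these etcd messages is present, you can filter the log with grep. The following is a minimal sketch. The helper name find_etcd_errors is hypothetical, and the log path defaults to /var/log/messages as described above.

```shell
# find_etcd_errors [LOGFILE]: print log lines that mention either etcd error.
# LOGFILE defaults to /var/log/messages.
# (find_etcd_errors is a hypothetical helper used only for illustration.)
find_etcd_errors() {
    grep -E 'etcd cluster is misconfigured|etcd has no leader' "${1:-/var/log/messages}"
}

# Example: show the 20 most recent matches
#   find_etcd_errors | tail -n 20
```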

Anaconda Enterprise is very sensitive to disk latency, so we usually recommend using a better disk for /var/lib/gravity on target machines and/or putting etcd data on a separate disk. For example, you can mount etcd under /var/lib/gravity/planet/etcd on the hosts.

After a failed installation, you can uninstall Anaconda Enterprise and start over with a fresh installation.

Failed on pulling gravitational/rbac

If the node refuses to install and fails on pulling gravitational/rbac, create a new directory, TMPDIR, before installing and give user 1000 write access to it.
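One way to prepare such a directory is sketched below. It rests on two assumptions that the text above does not confirm: that TMPDIR refers to the standard TMPDIR environment variable read by the installer, and that changing the directory owner to UID 1000 is an acceptable way to grant write access. The helper name prepare_tmpdir and the path /opt/ae-tmp are hypothetical.

```shell
# prepare_tmpdir DIR [OWNER]: create DIR and give OWNER (default: UID 1000)
# ownership and write access.
# (prepare_tmpdir is a hypothetical helper used only for illustration.)
prepare_tmpdir() {
    dir="$1"
    owner="${2:-1000}"
    mkdir -p "$dir" && chown "$owner" "$dir" && chmod u+rwx "$dir"
}

# Example, run as root before starting the installer
# (assumes the installer honors the TMPDIR environment variable):
#   prepare_tmpdir /opt/ae-tmp
#   export TMPDIR=/opt/ae-tmp
```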

Problems during air gap project migration

The command anaconda-project lock over-specifies the channel list, which triggers a conda bug: conda adds defaults from the internet to the list of channels.

Solution:

Add default_channels to the .condarc file. This way, when conda adds “defaults” to the command, it adds the internal repo server rather than the repo.continuum.io URLs.

EXAMPLE:

default_channels:
  - anaconda
channels:
  - our-internal
  - out-partners
  - rdkit
  - bioconda
  - defaults
  - r-channel
  - conda-forge
channel_alias: https://:8086/conda
auto_update_conda: false
ssl_verify: /etc/ssl/certs/ca.2048.cer

Problems during post-install or post-upgrade steps

Post-install and post-upgrade steps run as Kubernetes jobs. When they finish running, the pods used to run them are not removed. These and other stopped pods can be found using:

kubectl get pods -a

The logs in each of these three pods are helpful for diagnosing issues in the following steps:

Pod             Issues in this step
ae-wagonwheel   post-install UI
install         installation step
postupdate      post-update steps
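The sketch below collects the logs of these pods in one pass. It assumes that the full pod names contain the strings listed above and that you pass in the namespace the pods run in. The kube-system namespace in the example is an assumption, and the helper names job_pods and dump_job_logs are hypothetical. It uses the same -a flag as the command above.

```shell
# job_pods: read `kubectl get pods` output on stdin and print the names of
# pods whose name contains ae-wagonwheel, install, or postupdate.
job_pods() {
    awk '$1 ~ /ae-wagonwheel|install|postupdate/ { print $1 }'
}

# dump_job_logs NAMESPACE: print the logs of each matching pod in NAMESPACE.
# (Both helpers are hypothetical and used only for illustration.)
dump_job_logs() {
    ns="$1"
    kubectl get pods -a -n "$ns" --no-headers | job_pods | while read -r pod; do
        echo "===== $pod ====="
        kubectl logs "$pod" -n "$ns" </dev/null
    done
}

# Example, inside the environment after sudo gravity enter
# (kube-system is an assumed namespace):
#   dump_job_logs kube-system > job-logs.txt
```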

Post-install configuration does not work

To reinitialize the post-install configuration UI, which is useful for regenerating temporary (self-signed) SSL certificates or reconfiguring the platform for your domain name, you must re-create and re-expose the service on a new port.

First, recreate the ae-wagonwheel deployment:

kubectl create -f /var/lib/gravity/site/packages/unpacked/gravitational.io/AnacondaEnterprise/5.X.X/resources/wagonwheel.yaml -n kube-system

NOTE: Replace 5.X.X with your actual version number.

Then execute sudo gravity enter and run:

kubectl get deploy -n kube-system

to list the deployments in the kube-system namespace. One of these should be ae-wagonwheel, the post-install configuration UI. To make it reachable from outside the cluster, run:

kubectl expose deploy ae-wagonwheel --port=8000 --type=NodePort --name=post-install -n kube-system

This exposes the UI on a new port, allocated by Kubernetes, under the service name post-install. Run:

kubectl get svc -n kube-system | grep post-install

to find out which port it is listening on, then navigate to http://<your domain>:<this port> to see the post-install UI again.
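If you prefer to read the port programmatically, kubectl can print the node port directly with a jsonpath output template. The following is a minimal sketch that assumes the service was exposed with a single port, as in the expose command above. The helper names post_install_port and post_install_url, and the domain in the example, are hypothetical.

```shell
# post_install_port: print the node port assigned to the post-install service.
post_install_port() {
    kubectl get svc post-install -n kube-system \
        -o jsonpath='{.spec.ports[0].nodePort}'
}

# post_install_url DOMAIN: print the URL of the post-install UI on DOMAIN.
# (Both helpers are hypothetical and used only for illustration.)
post_install_url() {
    echo "http://$1:$(post_install_port)"
}

# Example, inside the environment after sudo gravity enter
# (anaconda.example.com is a placeholder domain):
#   post_install_url anaconda.example.com
```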

LDAP error in ap-auth

[LDAP: error code 12 - Unavailable Critical Extension]; remaining name 'dc=acme, dc=com'

This error can occur when pagination is turned on. Pagination is a server-side extension and is not supported by some LDAP servers, notably the Sun Directory Server.
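To check whether an LDAP server advertises support for pagination, you can query its root DSE for the supportedControl attribute and look for the Simple Paged Results control (OID 1.2.840.113556.1.4.319, defined in RFC 2696). The sketch below assumes the OpenLDAP ldapsearch client is installed and that the server allows anonymous reads of the root DSE. The helper name supports_paging and the host ldap.example.com are hypothetical.

```shell
# OID of the Simple Paged Results control (RFC 2696).
PAGED_RESULTS_OID="1.2.840.113556.1.4.319"

# supports_paging: read root DSE output (LDIF) on stdin and succeed if the
# server lists the paged results control among its supported controls.
# (supports_paging is a hypothetical helper used only for illustration.)
supports_paging() {
    grep -qF "supportedControl: $PAGED_RESULTS_OID"
}

# Example (ldap.example.com is a placeholder host):
#   ldapsearch -x -H ldap://ldap.example.com -s base -b "" supportedControl \
#     | supports_paging && echo "paged results supported" \
#                       || echo "paged results not advertised"
```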