Troubleshooting
Entering Anaconda Enterprise environment
To enter the Anaconda Enterprise environment and gain access to kubectl and
other commands within Anaconda Enterprise, use the command:
sudo gravity enter
Moving files and data
Occasionally you may need to move files and data from the host machine to the Anaconda Enterprise environment. If so, there are two shared mounts to pass data back and forth between the two environments:
- host: /opt/anaconda/ -> AE environment: /opt/anaconda/
- host: /var/lib/gravity/planet/share -> AE environment: /ext/share
Data written to either location is available both on the host machine and within the Anaconda Enterprise environment.
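The mapping between the two sides can be captured in a small shell function. This is a minimal sketch: the name to_ae_path is hypothetical, and the function only encodes the two mounts listed above.

```shell
# Hypothetical helper: translate a host path into the path where the same
# file appears inside the Anaconda Enterprise environment, using the two
# shared mounts listed above. Returns 1 (no output) for any other path.
to_ae_path() {
  local host_path="$1"
  case "$host_path" in
    /var/lib/gravity/planet/share|/var/lib/gravity/planet/share/*)
      # The host share directory is mounted at /ext/share inside the environment
      printf '%s\n' "/ext/share${host_path#/var/lib/gravity/planet/share}"
      ;;
    /opt/anaconda|/opt/anaconda/*)
      # /opt/anaconda is mounted at the same path on both sides
      printf '%s\n' "$host_path"
      ;;
    *)
      return 1
      ;;
  esac
}
```

For example, after copying a file on the host with sudo cp results.csv /var/lib/gravity/planet/share/, running to_ae_path /var/lib/gravity/planet/share/results.csv prints /ext/share/results.csv, which is the path to use after sudo gravity enter.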
Debugging
On AWS, the security group must allow traffic to the cluster's public IPs and ports. Either use a canonical security group with the required ports open, or manually add the specific ports listed in Network Requirements.
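If you add the ports manually, the AWS CLI command aws ec2 authorize-security-group-ingress opens one port per call. The sketch below loops over a list of ports. The function name open_ae_ports, the security group ID, the source CIDR, and the ports in the usage comment are all placeholders; take the real port list from Network Requirements. With DRY_RUN=1 it only prints the commands.

```shell
# Sketch: open inbound TCP ports on an existing AWS security group.
# Arguments: <security-group-id> <source-cidr> <port>...
# All values are placeholders; use the ports from Network Requirements.
open_ae_ports() {
  local sg_id="$1" source_cidr="$2"
  shift 2
  local port
  for port in "$@"; do
    local cmd=(aws ec2 authorize-security-group-ingress
      --group-id "$sg_id" --protocol tcp --port "$port" --cidr "$source_cidr")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command instead of calling AWS
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}" || return 1
    fi
  done
}

# Example (placeholder values):
# DRY_RUN=1 open_ae_ports sg-0123456789abcdef0 203.0.113.0/24 80 443
```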
Failed installations
If an installation fails, you can view the logs of the failed installation in the support bundle provided by the failed installation UI.
After running sudo gravity enter, you can check /var/log/messages to
troubleshoot a failed installation or other errors. You can also run journalctl to
view the logs of the gravity service:
journalctl -u gravity-23423lkqjfefqpfh2.service
NOTE: Replace gravity-23423lkqjfefqpfh2.service with the name of your
gravity service.
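If you are not sure of the service name, you can list the systemd units and filter for gravity. This sketch assumes the unit name begins with gravity and ends in .service, as in the example above; the function name gravity_units is hypothetical.

```shell
# Hypothetical helper: read `systemctl list-units` output on stdin and print
# unit names that look like gravity services. Some systemd versions prefix
# failed units with a bullet, so the first two fields are checked.
gravity_units() {
  awk '{ for (i = 1; i <= 2; i++) if ($i ~ /^gravity.*\.service$/) { print $i; break } }'
}

# Usage, in the same place you run journalctl:
# systemctl list-units --type=service --all --no-legend | gravity_units
# journalctl -u "$(systemctl list-units --type=service --all --no-legend | gravity_units | head -n 1)"
```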
You may see errors such as “etcd cluster is misconfigured” and “etcd has no
leader” in /var/log/messages from one of the installation jobs, particularly
gravity-site. These usually indicate that etcd needs more compute power or
more disk space, or is running on a slow disk.
Anaconda Enterprise is very sensitive to disk latency, so we recommend
using a faster disk for /var/lib/gravity on the target machines, putting
etcd data on a separate disk, or both. For example, you can mount a dedicated
disk for etcd at /var/lib/gravity/planet/etcd on the hosts.
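To find these messages quickly, you can search for both phrases at once. A minimal sketch: etcd_errors is a hypothetical name, and it assumes the log lines contain the phrases exactly as quoted above.

```shell
# Hypothetical helper: print log lines mentioning the etcd symptoms above.
# Defaults to /var/log/messages; pass another file path to override.
# Exits with status 1 (grep's convention) when nothing matches.
etcd_errors() {
  local log="${1:-/var/log/messages}"
  grep -E 'etcd cluster is misconfigured|etcd has no leader' "$log"
}

# Usage, after sudo gravity enter:
# etcd_errors | grep gravity-site
```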
After a failed installation, you can uninstall Anaconda Enterprise and start over with a fresh installation.
Failed on pulling gravitational/rbac
If the node refuses to install and fails while pulling gravitational/rbac,
create a new directory to use as TMPDIR before installing, and give user 1000
write access to it.
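One way to prepare the directory is sketched below. The function name prepare_install_tmpdir and the path in the usage comment are hypothetical. It grants user 1000 write access with chmod 1777 (world-writable with the sticky bit, like /tmp); running sudo chown 1000 on the directory instead is a tighter alternative.

```shell
# Sketch: create a directory for TMPDIR and make it writable by user 1000.
# chmod 1777 makes it writable by every user (sticky bit set, as on /tmp).
prepare_install_tmpdir() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  chmod 1777 "$dir" || return 1
  # Export so the installer started from this shell inherits it
  export TMPDIR="$dir"
}

# Usage (placeholder path):
# prepare_install_tmpdir /opt/ae-install-tmp
```

If you run the installer with sudo, note that sudo resets most environment variables by default, so make sure TMPDIR is exported in the shell that actually runs the installer.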
Problems during air gap project migration
The anaconda-project lock command over-specifies the channel list, which triggers a conda bug that adds the internet-hosted defaults channels to the list of channels.
Solution:
Add a default_channels entry to your .condarc. That way, when conda adds defaults to the command, it uses the channels listed under default_channels (your internal repository server) instead of the repo.continuum.io URLs.
Example:
default_channels:
- anaconda
channels:
- our-internal
- our-partners
- rdkit
- bioconda
- defaults
- r-channel
- conda-forge
channel_alias: https://:8086/conda
auto_update_conda: false
ssl_verify: /etc/ssl/certs/ca.2048.cer
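Before rerunning the migration, you can sanity-check the file. check_condarc is a hypothetical helper that only looks for a top-level default_channels key and for public repository URLs (repo.continuum.io or repo.anaconda.com); it does not validate the YAML. Running conda config --show default_channels channels should print the values conda actually uses.

```shell
# Hypothetical helper: check a .condarc for the air-gap fix above.
# Succeeds if a top-level default_channels key exists and no public
# repo.continuum.io / repo.anaconda.com URL appears in the file.
check_condarc() {
  local rc="${1:-$HOME/.condarc}"
  if ! grep -q '^default_channels:' "$rc"; then
    echo "missing default_channels in $rc" >&2
    return 1
  fi
  if grep -Eq 'repo\.(continuum\.io|anaconda\.com)' "$rc"; then
    echo "public repository URL found in $rc" >&2
    return 1
  fi
}
```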
Problems during post-install or post-upgrade steps
Post-install and post-upgrade steps run as Kubernetes jobs. When a job finishes, the pod that ran it is not removed. You can list these and other stopped pods with:
kubectl get pods -a
On newer kubectl versions, where the -a flag is deprecated, stopped pods are listed by default. The logs of the following three pods are useful for diagnosing issues in the corresponding steps:
| Pod | Issues in this step |
|---|---|
| ae-wagonwheel | post-install UI |
| install | installation step |
| postupdate | post-update steps |
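Because Kubernetes appends generated suffixes to pod names, a small filter helps locate the pods in the table. This sketch assumes the pod names start with the names shown above; ae_step_pods is a hypothetical function name.

```shell
# Hypothetical helper: read `kubectl get pods --all-namespaces --no-headers`
# output on stdin (columns: NAMESPACE NAME READY STATUS ...) and print
# "<namespace> <pod>" for pods whose names start with a prefix from the table.
ae_step_pods() {
  awk '$2 ~ /^(ae-wagonwheel|install|postupdate)/ { print $1, $2 }'
}

# Usage: print the logs of each matching pod
# kubectl get pods --all-namespaces --no-headers | ae_step_pods |
#   while read -r ns pod; do kubectl logs "$pod" -n "$ns"; done
# On older kubectl versions that hide completed pods, add -a to the get command.
```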
Post-install configuration does not work
To reinitialize the post-install configuration UI, for example to regenerate temporary (self-signed) SSL certificates or to reconfigure the platform for your domain name, you must re-create the deployment and expose it on a new port.
First, re-create the ae-wagonwheel deployment:
kubectl create -f /var/lib/gravity/site/packages/unpacked/gravitational.io/AnacondaEnterprise/5.X.X/resources/wagonwheel.yaml -n kube-system
NOTE: Replace 5.X.X with your actual version number.
Then run sudo gravity enter, and run:
kubectl get deploy -n kube-system
to check the deployments running in the kube-system namespace. One of these should be ae-wagonwheel, the
post-install configuration UI. To make it reachable from outside the cluster, run:
kubectl expose deploy ae-wagonwheel --port=8000 --type=NodePort --name=post-install -n kube-system
This exposes the UI as a service named post-install, on a new port allocated by Kubernetes. Run:
kubectl get svc -n kube-system | grep post-install
to find the port it is listening on (the number after the colon in the PORT(S) column, for example 31234 in 8000:31234/TCP), then navigate to http://<your domain>:<this port>
to see the post-install UI again.
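The node port can also be extracted from the kubectl get svc output with a short filter. node_port is a hypothetical helper; alternatively, kubectl get svc post-install -n kube-system -o jsonpath='{.spec.ports[0].nodePort}' returns the same number directly.

```shell
# Hypothetical helper: read `kubectl get svc` output on stdin and print the
# node port from the first PORT(S) entry shaped like 8000:31234/TCP.
node_port() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9]+:[0-9]+\/(TCP|UDP)$/) {
        split($i, p, "[:/]")  # p[1]=service port, p[2]=node port
        print p[2]
        exit
      }
  }'
}

# Usage:
# port="$(kubectl get svc -n kube-system | grep post-install | node_port)"
# echo "http://<your domain>:$port"
```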
LDAP error in ap-auth
[LDAP: error code 12 - Unavailable Critical Extension]; remaining name 'dc=acme, dc=com'
This error can occur when pagination is turned on. Pagination is a server-side extension that some LDAP servers, notably Sun Directory Server, do not support. If your server does not support it, turn pagination off.
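To check whether your server advertises paging, you can query its root DSE for supportedControl and look for the Simple Paged Results control OID 1.2.840.113556.1.4.319 (RFC 2696). This is a sketch: the host ldap.example.com and the function name supports_paging are placeholders, and the query assumes anonymous reads of the root DSE are allowed.

```shell
# Hypothetical helper: read ldapsearch output for the root DSE on stdin and
# succeed if the Simple Paged Results control OID is listed.
supports_paging() {
  grep -Eq '^supportedControl: 1\.2\.840\.113556\.1\.4\.319[[:space:]]*$'
}

# Usage (placeholder host):
# ldapsearch -x -H ldap://ldap.example.com -b "" -s base supportedControl | supports_paging \
#   && echo "paged results supported" || echo "paged results not advertised"
```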