Troubleshoot Virtual Appliance Issues

Use the following troubleshooting steps if you encounter these issues during or after installing Splunk AppDynamics On-Premises Virtual Appliance.

appd-mysql and authn-mysql pods are in Unknown state

The appd-mysql and authn-mysql pods can enter the Unknown state.

The kubectl get innodbcluster command displays UNKNOWN status under the following conditions:
  • A node has recently failed over or recovered.
  • The MySQL pods are active, but the application or authentication services are unable to establish a connection to the database.

Follow this procedure if appd-mysql or authn-mysql pods are stuck in the UNKNOWN state.

  1. Clear finalizers on the stuck pod. The example uses the authn namespace; use -n mysql for an appd-mysql pod.
    CODE
    kubectl patch pod <STUCK_POD_NAME> -n authn --type merge -p '{"metadata":{"finalizers":null}}'
  2. Force delete the pod.
    appd-mysql
    CODE
    kubectl delete pod <STUCK_POD_NAME> -n mysql --force --grace-period=0
    authn-mysql
    CODE
    kubectl delete pod <STUCK_POD_NAME> -n authn --force --grace-period=0

    The MySQL operator recreates the pod. Wait until the pod becomes active (Running).
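
The two steps above can be wrapped in one helper. This is a minimal sketch, not part of the product tooling: the function name force_recreate_pod and the KUBECTL override variable are hypothetical, and the kubectl commands are the same ones shown in the steps.

```shell
# Sketch: clear finalizers on a stuck pod, then force delete it.
# KUBECTL is a hypothetical override; set KUBECTL=echo to preview the commands.
force_recreate_pod() {
  local ns="$1" pod="$2"
  local kubectl="${KUBECTL:-kubectl}"
  # Step 1: remove finalizers so the delete is not blocked.
  $kubectl patch pod "$pod" -n "$ns" --type merge \
    -p '{"metadata":{"finalizers":null}}' || return 1
  # Step 2: force delete; the MySQL operator recreates the pod.
  $kubectl delete pod "$pod" -n "$ns" --force --grace-period=0
}

# Usage (namespace first, then pod name):
#   force_recreate_pod mysql appd-mysql-0
#   force_recreate_pod authn <STUCK_POD_NAME>
```

After the helper returns, you can watch the pod come back with kubectl get pods -n <namespace> -w.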

Follow this procedure if a member pod is active but appears offline in Group Replication (for example, the pod is stuck in the RECOVERING state or is missing from the group).

  1. Identify the primary server of the group.
    CODE
    kubectl exec -n mysql appd-mysql-1 -c mysql -- \
    mysql -uroot -p"$(kubectl get secret mysql-secret -n mysql -o jsonpath='{.data.rootPassword}' | base64 -d)" \
    -e "SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE FROM performance_schema.replication_group_members ORDER BY MEMBER_HOST;"
    Note: In the next steps, use the host of the pod that currently has the PRIMARY MEMBER_ROLE.
    Example:
    CODE
    appd-mysql-2.appd-mysql-instances.mysql.svc.cluster.local
  2. Rejoin the offline or recovering instance.
    1. Run this command from a MySQL pod that is not stuck or unreachable.
      In this command, replace <EXEC_POD> with the pod you run the command from, <REJOIN_POD_HOST> with the host of the offline or recovering instance, and <PRIMARY_POD_HOST> with the host of the primary identified in step 1.
      CODE
      ROOT_PASS=$(kubectl get secret mysql-secret -n mysql -o jsonpath='{.data.rootPassword}' | base64 -d)

      kubectl exec -n mysql <EXEC_POD> -c mysql -- \
      mysqlsh --js \
      --uri="root@<PRIMARY_POD_HOST>:3306" \
      --password="${ROOT_PASS}" \
      --execute="var c=dba.getCluster(); c.rejoinInstance('root@<REJOIN_POD_HOST>:3306'); print(c.status());"
      Example: rejoin appd-mysql-0 when appd-mysql-2 is PRIMARY and you exec through appd-mysql-1.
      CODE
      ROOT_PASS=$(kubectl get secret mysql-secret -n mysql -o jsonpath='{.data.rootPassword}' | base64 -d)

      kubectl exec -n mysql appd-mysql-1 -c mysql -- \
      mysqlsh --js \
      --uri="root@appd-mysql-2.appd-mysql-instances.mysql.svc.cluster.local:3306" \
      --password="${ROOT_PASS}" \
      --execute="var c=dba.getCluster(); c.rejoinInstance('root@appd-mysql-0.appd-mysql-instances.mysql.svc.cluster.local:3306'); print(c.status());"
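
To pick the values for <PRIMARY_POD_HOST> and <REJOIN_POD_HOST>, the output of the step 1 query can be filtered with awk. The sketch below assumes the four-column output shown in this section (host, port, state, role) arrives on standard input; the function names primary_host and not_online_members are hypothetical. A member that is missing from the group entirely does not appear in the query output, so it cannot be detected this way.

```shell
# Sketch: read the replication_group_members query output on stdin.

# Print the MEMBER_HOST whose MEMBER_ROLE is PRIMARY.
primary_host() {
  awk '$1 != "MEMBER_HOST" && $4 == "PRIMARY" { print $1 }'
}

# Print "host state" for every listed member that is not ONLINE.
not_online_members() {
  awk '$1 != "MEMBER_HOST" && NF >= 4 && $3 != "ONLINE" { print $1, $3 }'
}

# Usage: save the output of the step 1 command to a file, then run:
#   primary_host < members.txt
#   not_online_members < members.txt
```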

Follow this procedure to verify the group replication status.

Run this command.
CODE
kubectl exec -n mysql appd-mysql-1 -c mysql -- \
mysql -uroot -p"$(kubectl get secret mysql-secret -n mysql -o jsonpath='{.data.rootPassword}' | base64 -d)" \
-e "SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE FROM performance_schema.replication_group_members ORDER BY MEMBER_HOST;"
Group replication is healthy if the output shows the following:
  • MEMBER_STATE displays ONLINE for all members.
  • MEMBER_ROLE displays PRIMARY for one member and SECONDARY for the remaining members.
    Sample Output:
    MEMBER_HOST                                                MEMBER_PORT  MEMBER_STATE  MEMBER_ROLE
    appd-mysql-0.appd-mysql-instances.mysql.svc.cluster.local  3306         ONLINE        SECONDARY
    appd-mysql-1.appd-mysql-instances.mysql.svc.cluster.local  3306         ONLINE        SECONDARY
    appd-mysql-2.appd-mysql-instances.mysql.svc.cluster.local  3306         ONLINE        PRIMARY
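
The two success criteria above can be checked mechanically. The following is a minimal sketch that assumes the query output is piped in with the same four columns; the function name check_group_replication is hypothetical.

```shell
# Sketch: exit 0 and print "healthy" only if every member is ONLINE
# and exactly one member has the PRIMARY role.
check_group_replication() {
  awk '
    $1 == "MEMBER_HOST" || NF < 4 { next }   # skip the header and blank lines
    { members++ }
    $3 != "ONLINE"  { bad++ }
    $4 == "PRIMARY" { primaries++ }
    END {
      if (members > 0 && bad == 0 && primaries == 1) { print "healthy"; exit 0 }
      print "unhealthy"; exit 1
    }'
}

# Usage: pipe the output of the verification command into the function:
#   <verification command> | check_group_replication
```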

Update DNS Configuration for an Air-Gapped Environment

An air-gapped environment is a network setup that does not have Internet connectivity. In this environment, DNS may become unreachable. To fix this issue, configure a DNS server that can be reached.

Note: The following example values are used to explain how to update the DNS configuration:
  • The IP addresses 10.0.0.1, 10.0.0.2, and 10.0.0.3 belong to the Virtual Appliance cluster.
  • 10.0.0.5 is the IP address of the standalone Controller.
  • standalone-controller is the DNS name of the standalone on-premises Controller.

  1. Update the /etc/hosts file.
    This ensures the appdcli ping command reaches the DNS server.

    Example

    CODE
    # AppDOS Cluster Hosts
    10.0.0.1 example-air-gap-va-node-3 10.0.0.1.nip.io
    10.0.0.2 example-air-gap-va-node-1 10.0.0.2.nip.io
    10.0.0.3 example-air-gap-va-node-2 10.0.0.3.nip.io
  2. Edit the coredns configmap file to add the external Controller IP address.
    CODE
    kubectl -n kube-system edit configmap/coredns
  3. In the coredns configmap file, add the following entry in the .:53 section:

    Example

    CODE
    hosts {
        10.0.0.5 standalone-controller
        fallthrough
    }
  4. Edit the globals.yaml.gotmpl file to update dnsDomain and dbHost with the DNS name of the standalone on-premises Controller.
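
After step 1, it can help to confirm that each node name maps to the expected address in the hosts file. This is a minimal sketch; the function name hosts_lookup is hypothetical, and it only reads the file, so it does not prove that resolution works inside the cluster.

```shell
# Sketch: print the IP address mapped to a name in an /etc/hosts-style file.
# Returns a non-zero status when the name is not present.
hosts_lookup() {
  awk -v name="$2" '
    { sub(/#.*/, "") }                      # ignore comments
    {
      for (i = 2; i <= NF; i++)
        if ($i == name) { print $1; found = 1; exit }
    }
    END { exit !found }' "$1"
}

# Usage:
#   hosts_lookup /etc/hosts example-air-gap-va-node-1
```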

Update CIDR of the Pod

If you need to change the default pod CIDR, you can update it to an available subnet range. Perform the following steps to update the pod CIDR:

  1. Log in to the node console using the appduser credentials.
  2. Stop the services:
    CODE
    appdcli stop appd
    appdcli stop operators
  3. Back up the following files:
    CODE
    /var/snap/microk8s/current/args/cni-network/cni.yaml
    /var/snap/microk8s/current/args/kube-proxy
  4. Update the cni.yaml file. Provide the available subnet range. For example: 10.2.0.0/16.
    Existing content:
    CODE
    - name: CALICO_IPV4POOL_CIDR
      value: "10.1.0.0/16"
    Updated content:
    CODE
    - name: CALICO_IPV4POOL_CIDR
      value: "10.<Number>.0.0/16"
  5. Update the kube-proxy file. Provide the available subnet range. For example: 10.2.0.0/16.
    Existing content:
    CODE
    --cluster-cidr=10.1.0.0/16
    Updated content:
    CODE
    --cluster-cidr=10.<Number>.0.0/16
  6. Run the following command to apply the changes:
    CODE
    microk8s kubectl apply -f /var/snap/microk8s/current/args/cni-network/cni.yaml
  7. Restart MicroK8s.
    CODE
    microk8s stop
    microk8s start
  8. Verify the node status.
    CODE
    microk8s status
    Note: Repeat steps 3 through 8 on every node.
  9. Delete the IP pool and restart the calico-node DaemonSet:
    CODE
    microk8s kubectl delete ippools default-ipv4-ippool
    microk8s kubectl rollout restart daemonset/calico-node -n kube-system
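
The backup in step 3 and the kube-proxy edit in step 5 can be combined in a small helper. This is a sketch under stated assumptions: the function name set_cluster_cidr is hypothetical, the file is assumed to contain a line that starts with --cluster-cidr=, and only the 10.<Number>.0.0/16 form from the steps above is accepted. Writing to the file under /var/snap may require elevated privileges.

```shell
# Sketch: back up a kube-proxy args file, then rewrite its --cluster-cidr line.
# $1 = path to the kube-proxy file, $2 = new CIDR (for example 10.2.0.0/16)
set_cluster_cidr() {
  # Accept only 10.N.0.0/16 with N between 0 and 255.
  if ! printf '%s\n' "$2" | grep -Eq '^10\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.0\.0/16$'; then
    echo "unexpected CIDR: $2" >&2
    return 1
  fi
  # Keep a copy of the original next to the file (step 3).
  cp -p "$1" "$1.bak" || return 1
  # Replace the existing --cluster-cidr line; all other lines are unchanged.
  sed "s|^--cluster-cidr=.*|--cluster-cidr=$2|" "$1.bak" > "$1"
}

# Usage:
#   set_cluster_cidr /var/snap/microk8s/current/args/kube-proxy 10.2.0.0/16
```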

Error Appears for appdctl show boot

When you run the appdctl show boot command, the following error appears if any background processes are pending:

CODE
Error: Get "https://127.0.0.1/boot": Socket /var/run/appd-os.sock not found. Bootstrapping maybe in progress
Please check appd-os service status with following command:
systemctl status appd-os

Wait a few minutes, and then run the command again.
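
Instead of rerunning the command by hand, a generic retry loop can repeat it until it succeeds. This is a minimal sketch; the function name retry and the attempt count and delay in the usage line are illustrative choices, not documented values.

```shell
# Sketch: run a command up to N times, sleeping between failed attempts.
# $1 = maximum attempts, $2 = delay in seconds, remaining arguments = command
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only if another attempt follows.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage: try every 30 seconds, up to 10 times.
#   retry 10 30 appdctl show boot
```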

Insufficient Permissions to Access MicroK8s

This error sometimes appears if the terminal was inactive between installation steps. If you see this error, log out of the terminal and log in again.

Restore the MySQL Service

If a virtual machine in the cluster restarts, the MySQL service does not start automatically. To start the MySQL service, complete the following steps:

  1. Run the following command:
    CODE
    appdcli run mysql_restore
  2. Verify the pod status.
    CODE
    appdcli run infra_inspect
    Sample output:
    CODE
    NAME                                READY   STATUS      RESTARTS   AGE
    appd-mysqlsh-0                      1/1     Running     0          4m33s
    appd-mysql-0                        2/2     Running     0          4m33s
    appd-mysql-1                        2/2     Running     0          4m33s
    appd-mysql-2                        2/2     Running     0          4m33s
    appd-mysql-router-9f8bc6784-g7zx7   1/1     Running     0          5s
    appd-mysql-router-9f8bc6784-fhjnp   1/1     Running     0          5s
    appd-mysql-router-9f8bc6784-wrcwk   1/1     Running     0          5s
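
The pod table can be checked without reading it row by row. The sketch below assumes the input has the columns shown in the sample output (NAME, READY, STATUS, RESTARTS, AGE) and arrives on standard input; the function name all_pods_ready is hypothetical, and whether appdcli run infra_inspect prints only this table is not confirmed here.

```shell
# Sketch: print the name of every pod that is not Running with all containers
# ready; exit 0 only when no such pod is found.
all_pods_ready() {
  awk '
    $1 == "NAME" || NF < 3 { next }         # skip the header and blank lines
    {
      split($2, r, "/")                     # READY column, for example 2/2
      if ($3 != "Running" || r[1] != r[2]) { print $1; bad = 1 }
    }
    END { exit bad }'
}

# Usage:
#   kubectl get pods -n mysql | all_pods_ready && echo "MySQL pods are ready"
```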

EUM Health Fails After Multiple Retries

Run the following commands to restart the Events and EUM pods:

CODE
kubectl delete pod events-ss-0 -n cisco-events
kubectl delete pod eum-ss-0 -n cisco-eum

IOException Error Occurs in the Controller UI

In the Controller UI, when you select Alert and Respond > Anomaly Detection, the following IOException error occurs:

CODE
IOException while calling 'https://pi.appdynamics.com/pi-rca/alarms/modelSensitivityType/getAll?accountId=2&controllerId=onprem&startRecordNo=0&appId=7&recordCount=1'

To work around this issue, run the following commands:

CODE
kubectl get pods -n cisco-controller
kubectl delete pod <Controller-Pod-Name> -n cisco-controller
This error may also appear for other services. In such cases, run the following commands:
  1. List all pods and note the ones in the Completed state.
    CODE
    kubectl get pods -A
  2. Delete the completed pods.
    CODE
    kubectl delete pod <POD_NAME> -n <NAMESPACE>
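
Finding the completed pods in a long listing can be done with a small filter. This sketch assumes the column layout that kubectl get pods -A prints (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE); the function name completed_pods is hypothetical.

```shell
# Sketch: read "kubectl get pods -A" output on stdin and print
# "namespace name" for every pod whose STATUS is Completed.
completed_pods() {
  awk '$1 != "NAMESPACE" && $4 == "Completed" { print $1, $2 }'
}

# Usage: review the list first, then delete each pod in its own namespace.
#   kubectl get pods -A | completed_pods
#   kubectl get pods -A | completed_pods | while read -r ns pod; do
#     kubectl delete pod "$pod" -n "$ns"
#   done
```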

MySQL Router Pods Fail to Start

MySQL Router pods sometimes fail to start.

When you restore data during an upgrade, the MySQL pods do not start.

Restart the services using the following commands:

  1. Stop the Splunk AppDynamics services.
    CODE
    appdcli stop appd

    Wait for the pods to terminate.

  2. Stop the operators:
    CODE
    appdcli stop operators
  3. Start the Splunk AppDynamics services:
    CODE
    appdcli start appd <profile>
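
The wait in step 1 can be scripted as a polling loop. This is a minimal sketch: the function name wait_for_no_pods and the KUBECTL override are hypothetical, and which namespaces are emptied by appdcli stop appd is an assumption you need to confirm for your deployment.

```shell
# Sketch: poll a namespace until it has no pods left.
# $1 = namespace, $2 = maximum checks (default 60), $3 = delay in seconds (default 5)
# Returns 0 when no pods remain, 1 on timeout, 2 if the kubectl call fails.
wait_for_no_pods() {
  local ns="$1" max_checks="${2:-60}" delay="${3:-5}"
  local kubectl="${KUBECTL:-kubectl}" i=0 out
  while [ "$i" -lt "$max_checks" ]; do
    # --no-headers prints only pod rows, so empty output means no pods remain.
    out=$($kubectl get pods -n "$ns" --no-headers 2>/dev/null) || return 2
    if [ -z "$out" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (the namespace is an example taken from this page):
#   appdcli stop appd
#   wait_for_no_pods cisco-controller && appdcli stop operators
```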