pavan kumar ceemala
4 min read · Mar 30, 2023

The current AI landscape offers many tools that integrate with AI platforms like ChatGPT to improve existing setups and enhance productivity. Among these tools is K8sGPT, which positions itself as an SRE troubleshooting tool for Kubernetes setups.

I tried K8sGPT with OKE, OCI’s Kubernetes offering, but it can be used on any K8s platform as long as we have a functional kubeconfig file and access to the Kubernetes API server.

Here’s a brief overview of my experience with K8sGPT.

Some intro:

OCI OKE: OCI OKE (Oracle Cloud Infrastructure Container Engine for Kubernetes) is a fully-managed, scalable, and highly available service that simplifies the deployment, management, and scaling of containerized applications on Kubernetes.

K8sGPT: K8sGPT is an open-source tool that utilizes GPT models to troubleshoot and resolve issues in K8s environments.

ChatGPT: ChatGPT is a language model developed by OpenAI for natural language processing.

Pre-req:

<<Skip to install brew or the next section if you have brew and a working kubeconfig file>>

Install OCI CLI:

https://docs.cloud.oracle.com/iaas/Content/API/SDKDocs/cliinstall.htm?tocpath=Developer%20Tools%20%7CCommand%20Line%20Interface%20(CLI)%20%7C_____1

Configure OKE:

If you have a public cluster (not recommended), you can use Cloud Shell to access it. If it’s a private cluster, you need to configure a bastion server or use the OCI Bastion service to reach the OKE cluster. You can then set up the kubeconfig by running these commands on your bastion server:

mkdir -p $HOME/.kube
oci ce cluster create-kubeconfig --cluster-id ocid1.cluster.oc1.my-region-1.aaaaaaaalu265emk63tou3fmiccoxxjxlq3a4wk2jqw2dcg5ocdasadsd92nepa --file $HOME/.kube/config --region <your-region> --token-version 2.0.0 --kube-endpoint PRIVATE_ENDPOINT
export KUBECONFIG=$HOME/.kube/config
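
Before moving on, it can help to confirm that the kubeconfig actually works. Here is a minimal sketch, assuming kubectl is installed on the bastion. The name check_kubeconfig is a hypothetical helper of mine, not part of the OCI CLI or kubectl:

```shell
#!/usr/bin/env bash
# check_kubeconfig [FILE]
# Hypothetical helper: verifies that a kubeconfig file exists and is non-empty,
# then asks the API server for the node list.
# Returns 1 if the file is missing/empty, 2 if kubectl is not available.
check_kubeconfig() {
  local cfg="${1:-${KUBECONFIG:-$HOME/.kube/config}}"

  if [ ! -s "$cfg" ]; then
    echo "kubeconfig not found or empty: $cfg" >&2
    return 1
  fi

  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found on PATH" >&2
    return 2
  fi

  # A successful node listing means the API server is reachable
  # and the credentials in the kubeconfig are accepted.
  kubectl --kubeconfig "$cfg" get nodes
}
```

Running check_kubeconfig with no arguments uses $KUBECONFIG if it is set, and falls back to $HOME/.kube/config otherwise.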

Install brew (see https://brew.sh):

Generate an API key:

Sign up for an OpenAI (ChatGPT) account and generate an API key (this key will be used for authentication).

Install k8sGPT:

brew tap k8sgpt-ai/k8sgpt
brew install k8sgpt

K8sGPT in action!!

Run this command at your command prompt:

k8sgpt auth
Using openai as backend AI provider
Enter openai Key: <enter the API key generated above>

That’s it, we are good to go…

Let’s analyze our whole cluster:

To get a high-level view of the errors across all namespaces on your cluster:

[opc@mypbastion ~]$ k8sgpt analyze
Service default/oracle.com-oci does not exist
0% | | (0/5, 0 it/hr) [0s:0s]
0 dev/my-sample-app(my-sample-app)
- Error: Service has not ready endpoints, pods: [Pod/my-sample-app-76568c776f-9vnc7 Pod/my-sample-app-6795f64877-2qb9p], expected 2
1 dev/my-sample-app(my-sample-app)
- Error: Service has not ready endpoints, pods: [Pod/my-sample-app-cd56c8b7c-6vjpx], expected 1
2 dev/my-sample-app-6795f64877-2qb9p(Deployment/my-sample-app)
- Error: Back-off pulling image "region.ocir.io/axnam4jb4j84/my-sample-app:1.16.0"
3 dev/my-sample-app-76568c776f-9vnc7(Deployment/my-sample-app)
- Error: Back-off pulling image "region.ocir.io/axnam4jb4j84/my-sample-app:0.0.0"
4 dev/my-sample-app-cd56c8b7c-6vjpx(Deployment/my-sample-app)
- Error: back-off 5m0s restarting failed container=my-sample-app pod=my-sample-app-cd56c8b7c-6vjpx_dev(e084cec5-8016-4309-8518-737b4f286c69)
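
If you save this output to a file, a few lines of shell can pull out the images that failed to download. This is only a sketch that assumes the "Back-off pulling image" wording shown above. The function name failing_images and the file name report.txt are hypothetical:

```shell
#!/usr/bin/env bash
# failing_images FILE
# Hypothetical helper: prints the unique image references that appear in
# 'Back-off pulling image "..."' lines of saved `k8sgpt analyze` output.
failing_images() {
  grep -o 'Back-off pulling image "[^"]*"' "$1" \
    | sed -e 's/^Back-off pulling image "//' -e 's/"$//' \
    | sort -u
}

# Example (assumes you saved the report first):
#   k8sgpt analyze > report.txt
#   failing_images report.txt
```

For the output above, this would list the two my-sample-app image tags, which makes it quick to check them against the registry.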

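To get an explanation of the analysis for Pods in a specific Kubernetes namespace (-e asks for an explanation, -f Pod filters on Pod resources, -n dev selects the dev namespace, and -o json prints JSON):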

[opc@mypbastion ~]$ k8sgpt analyze -e -f Pod -n dev -o json | jq .
100% |█████████████████████████████████████████████| (3/3, 20257 it/s)
{
  "kind": "Pod",
  "name": "dev/my-sample-app-cd56c8b7c-6vjpx",
  "error": [
    "back-off 5m0s restarting failed container=my-sample-app pod=my-sample-app-cd56c8b7c-6vjpx_dev(e084cec5-8016-4309-8518-737b4f286c69)"
  ],
  "details": "The container named \"my-sample-app\" in the pod \"my-sample-app-cd56c8b7c-6vjpx_dev\" has failed and is attempting to restart every 5 minutes. \n\nSolution: Check the logs for the container to determine the cause of the failure and make necessary adjustments to fix the issue. This could include updating dependencies, adjusting resource limits, or modifying configuration settings. Once the issue is resolved, the container should be able to run without errors and stop restarting.",
  "parentObject": "Deployment/my-sample-app"
}
{
  "kind": "Pod",
  "name": "dev/my-sample-app-6795f64877-2qb9p",
  "error": [
    "Back-off pulling image \"region.ocir.io/axnam4jb4j84/my-sample-app:1.16.0\""
  ],
  "details": "Simplified message: Kubernetes is having difficulty downloading the image \"region.ocir.io/axnam4jb4j84/my-sample-app:1.16.0\".\n\nSolution: Verify that the image repository is accessible and that the image exists in the specified location. If the issue persists, check the Kubernetes pod configuration to ensure that the image name and tag are correctly specified. Additionally, check for network connectivity issues that may be preventing the image from downloading.",
  "parentObject": "Deployment/my-sample-app"
}
{
  "kind": "Pod",
  "name": "dev/my-sample-app-76568c776f-9vnc7",
  "error": [
    "Back-off pulling image \"region.ocir.io/axnam4jb4j84/my-sample-app:0.0.0\""
  ],
  "details": "The Kubernetes cluster is having difficulty pulling the image \"region.ocir.io/axnam4jb4j84/my-sample-app:0.0.0\" required for the deployment. \n\nSolution: \n1. Check if the image exists and is accessible in the specified repository (\"region.ocir.io/axnam4jb4j84\")\n2. Check if there are any network connectivity issues.\n3. Verify if the image name, version, and repository are provided correctly in the deployment configuration.\n4. Retry the deployment.",
  "parentObject": "Deployment/my-sample-app"
}
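
The JSON is verbose, so for a quick overview I find it handy to reduce each finding to one line. The sketch below assumes the output shape shown above (a stream of objects with kind, name and error fields). Other K8sGPT versions may structure the JSON differently, so treat the field names as assumptions. The function name summarize_findings is hypothetical:

```shell
#!/usr/bin/env bash
# summarize_findings
# Hypothetical helper: reads a stream of JSON objects on stdin and prints
# one tab-separated line per object: kind, name, first error message.
# Requires jq.
summarize_findings() {
  jq -r '[.kind, .name, .error[0]] | @tsv'
}

# Example (same command as above, piped into the helper instead of `jq .`):
#   k8sgpt analyze -e -f Pod -n dev -o json | summarize_findings
```

Each output line then starts with the resource kind and name, followed by its first error, which is easier to scan or feed into grep than the full JSON.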

Although the example above is a simple error (a failed image pull from the registry) that we could easily spot in the events using kubectl, the explanation from K8sGPT adds troubleshooting and triage hints, such as checking network connectivity and the image name and tag.
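
For comparison, this is roughly how the same raw events can be pulled with kubectl alone. It is a minimal sketch: the wrapper name warning_events is hypothetical, and the namespace defaults to "default" if none is given:

```shell
#!/usr/bin/env bash
# warning_events [NAMESPACE]
# Hypothetical helper: lists Warning events in a namespace, oldest first,
# using standard kubectl options.
warning_events() {
  local ns="${1:-default}"
  kubectl get events -n "$ns" --field-selector type=Warning --sort-by=.lastTimestamp
}

# Example:
#   warning_events dev
```

This shows what failed (for example the back-off on the image pull), but the suggested next steps are what K8sGPT adds on top.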

For more details about the K8sGPT project, see https://github.com/k8sgpt-ai/k8sgpt