## Analyzers Schema

Each analyzer in the `analyzers` array is one of the analyzers defined in this section.

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: analyzers
spec:
  collectors: []
  analyzers: []
```

An OpenAPI schema for this type is published at: [https://github.com/replicatedhq/kots-lint/blob/main/kubernetes_json_schema/schema/troubleshoot/analyzer-troubleshoot-v1beta2.json](https://github.com/replicatedhq/kots-lint/blob/main/kubernetes_json_schema/schema/troubleshoot/analyzer-troubleshoot-v1beta2.json).

### Shared Properties

The following properties are supported on all analyzers:

#### `checkName`

Optionally, an analyzer can specify the `checkName` property to give the check a distinct name in the results.

#### `exclude`

For analyzers that are optional based on configuration available at runtime, the conditional can be specified in the `exclude` property. This is useful for deployment techniques that allow templating for optional components (Helm and [KOTS](https://kots.io/vendor/packaging/template-functions/)). When this value is `true`, the analyzer will not be called.

#### `strict`

Optionally, an analyzer can be set to strict. When `strict: true` is set for an analyzer, tools using Troubleshoot know that the analyzer must not fail. When `exclude: true` is specified, `exclude` overrides `strict` and the analyzer will not be executed.

---

The `Ceph Status` analyzer is available to check that Ceph is reporting healthy. The analyzer's outcome `when` clause may be used to evaluate and compare against the actual Ceph health status, and supports standard comparison operators.

## Parameters

**checkName:** (Optional) Analyzer name.

**collectorName:** (Optional) Must match the `collectorName` specified by the Ceph collector. If this is not provided, it will default to `rook-ceph`.
## Outcomes (Optional)

The `when` value in an outcome of this analyzer will be compared to the `ceph status` command's `.health.status` field, with possible values `HEALTH_OK`, `HEALTH_WARN`, and `HEALTH_ERR`. The `when` value can either be the desired status or can include an operator in the format `when: "<operator> <status>"`, for example `when: "< HEALTH_OK"`. Supported operators are `<`, `<=`, `>`, `>=`, `==`, and `!=`.

When unspecified, outcomes will default to:

```yaml
outcomes:
  - pass:
      message: "Ceph is healthy"
      when: "HEALTH_OK"
  - warn:
      message: "Ceph status is HEALTH_WARN"
      uri: "https://rook.io/docs/rook/v1.4/ceph-common-issues.html"
      when: "HEALTH_WARN"
  - fail:
      message: "Ceph status is HEALTH_ERR"
      uri: "https://rook.io/docs/rook/v1.4/ceph-common-issues.html"
      when: "HEALTH_ERR"
```

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: ceph-status
spec:
  collectors:
    - ceph: {}
  analyzers:
    - cephStatus:
        outcomes:
          - pass:
              message: "Ceph is healthy"
              when: "== HEALTH_OK"
          - warn:
              message: "Ceph status is unhealthy"
              uri: "https://rook.io/docs/rook/v1.4/ceph-common-issues.html"
              when: "<= HEALTH_WARN"
```

---

The `certificates` analyzer alerts users when a certificate is either invalid or nearing its expiration date. This analyzer's outcome `when` clause compares the condition specified with the resources present on the certificates.

The `when` value in an outcome of this analyzer contains the certificates that match the filters, if any filters are defined. If no filters are defined, the `when` value is based on the validity of the certificate. For pass outcomes, the valid certificate is matched. For fail outcomes, the invalid certificate is matched.

The conditional in the `when` value supports the following filters:

| Filter Name | Description |
|----|----|
| `notAfter < Today` | Indicates that the expiration date of the certificate must be earlier than the current day. |
| `notAfter < Today + (number) days` | Indicates that the expiration date of the certificate must be within a certain number of days from the current day. Expressed as a number, for example `365`. |

Collectors do not automatically include certificates because they often contain sensitive information. You can include the [certificates collector](https://troubleshoot.sh/docs/collect/certificates/) in a set of collectors to collect data about certificates.

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: default
spec:
  collectors:
    - certificates:
        secrets:
          - name: envoycert
            namespaces:
              - projectcontour
        configMaps:
          - name: kube-root-ca.crt
            namespaces:
              - kurl
  analyzers:
    - certificates: # Iterate through list of certificates
        outcomes:
          - pass:
              message: "certificate is valid"
          - warn:
              when: "notAfter < Today + 365 days"
              message: "certificate is about to expire"
          - fail:
              when: "notAfter < Today"
              message: "certificate has expired"
```

---

The `clusterContainerStatuses` analyzer is used to detect containers that have a certain status. It complements the existing [clusterPodStatuses analyzer](./cluster-pod-statuses) by allowing you to detect containers that are unhealthy. The `when` attribute supports standard comparators to compare the status of the container.

The `clusterContainerStatuses` analyzer uses data from the [clusterResources collector](https://troubleshoot.sh/collect/cluster-resources). The `clusterResources` collector is automatically added and will always be present.

The outcomes on this analyzer will be processed in order for each container of each pod, and execution will stop after the first outcome that is truthy.

## Parameters

**namespaces**: (Optional) The namespaces to look for the pods in. If not specified, it will default to all namespaces.

**restartCount**: (Optional) Only consider containers with a restart count greater than or equal to this value.
## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: container-statuses
spec:
  analyzers:
    - clusterContainerStatuses:
        checkName: oom-detector
        namespaces:
          - default
        restartCount: 1
        outcomes:
          - fail:
              when: "== OOMKilled"
              message: "Container {{ .ContainerName }} from pod {{ .Namespace }}/{{ .PodName }} was OOMKilled"
          - pass:
              message: "No OOMKilled containers found"
```

---

The `clusterPodStatuses` analyzer is used to detect pods that have a certain status. The `when` attribute supports standard comparators to compare the status of the pod.

The `clusterPodStatuses` analyzer uses data from the [clusterResources collector](https://troubleshoot.sh/collect/cluster-resources). The `clusterResources` collector is automatically added and will always be present.

The outcomes on this analyzer will be processed in order for each pod, and execution will stop after the first outcome that is truthy.

## Parameters

**namespaces**: (Optional) The namespaces to look for the pods in. If not specified, it will default to all namespaces.

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: pods-are-healthy
spec:
  analyzers:
    - clusterPodStatuses:
        namespaces:
          - default
          - myapp-namespace
        outcomes:
          - fail:
              when: "== CrashLoopBackOff"
              message: Pod {{ .Namespace }}/{{ .Name }} is in a CrashLoopBackOff state.
          - fail:
              when: "== ImagePullBackOff"
              message: Pod {{ .Namespace }}/{{ .Name }} is in an ImagePullBackOff state.
          - fail:
              when: "== Pending"
              message: Pod {{ .Namespace }}/{{ .Name }} is in a Pending state.
          - fail:
              when: "== Evicted"
              message: Pod {{ .Namespace }}/{{ .Name }} is in an Evicted state.
          - fail:
              when: "== Terminating"
              message: Pod {{ .Namespace }}/{{ .Name }} is in a Terminating state.
          - fail:
              when: "== Init:Error"
              message: Pod {{ .Namespace }}/{{ .Name }} is in an Init:Error state.
          - fail:
              when: "== Init:CrashLoopBackOff"
              message: Pod {{ .Namespace }}/{{ .Name }} is in an Init:CrashLoopBackOff state.
          - fail:
              when: "!= Healthy"
              # Catch all unhealthy pods. A pod is considered healthy if it has a status of Completed, or Running with all of its containers ready.
              # {{ .Status.Reason }} displays the current status of the pod, while {{ .Status.Message }} provides a detailed explanation of why the pod is unhealthy, based on logged events.
              message: Pod {{ .Namespace }}/{{ .Name }} is unhealthy with a status of {{ .Status.Reason }}. Message is {{ .Status.Message }}
```

---

The `clusterResource` analyzer can be used to check any attribute of any known resource in the cluster in a generic manner. The `clusterResource` analyzer uses data from the [clusterResources collector](/docs/collect/cluster-resources/). The `clusterResources` collector is automatically added and is always present.

You must specify the `kind` and `name` attributes. There is an optional `namespace` attribute to target the Kubernetes resource. The `yamlPath` attribute is used to specify a dot-delimited YAML path of a property on the Kubernetes resource referenced in `name`. The `when` attribute supports standard arithmetic comparison operators.

The outcomes on this analyzer are processed in order, and execution stops after the first outcome that is truthy.

## Parameters

**checkName**: (Optional) Analyzer name. Used for uniqueness if multiple analyzers are defined with similar parameters.

**kind**: (Required) The type of Kubernetes resource being targeted by `name`. Supported values:

- `deployment`
- `statefulset`
- `networkpolicy`
- `pod`
- `ingress`
- `service`
- `resourcequota`
- `job`
- `persistentvolumeclaim`
- `pvc`
- `replicaset`
- `namespace`
- `persistentvolume`
- `pv`
- `node`
- `storageclass`
- `configmap`

**name**: (Required) The name of the resource to check.

**namespace**: (Optional) The namespace to look in for the resource.
If a namespace is not specified, the analyzer searches for cluster-scoped resources.

**yamlPath**: (Required) The dot-delimited YAML path of a property on the Kubernetes resource.

**regex**: (Optional) See [Regular Expression](/docs/analyze/regex/) documentation. **Note:** This is not supported when using arithmetic comparison in `when`.

**regexGroups**: (Optional) See [Regular Expression](/docs/analyze/regex/) documentation. **Note:** This is not supported when using arithmetic comparison in `when`.

## Outcomes

The `when` value in an outcome of this analyzer can accept a few variations. If the `yamlPath` points to a quantity-based value (such as a size or quota), standard arithmetic comparison operators can be used: `<`, `<=`, `>`, `>=`, `==`, `!=`. Alternatively, if `regex` specifies an expected value, a boolean `"true"` or `"false"` in the `when` clause is acceptable. The boolean value must be written as a string in double quotes.

## Example Analyzer Definition

The following example shows how to analyze a specific `PersistentVolumeClaim` size plus access mode with custom outcomes, and ensure that it is bound (attached):

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: preflight-sample-pvc
spec:
  collectors:
    - clusterResources: {}
  analyzers:
    - clusterResource:
        checkName: wordpress pvc size
        kind: pvc
        namespace: wordpress
        name: data-wordpress-mariadb-0
        yamlPath: "spec.resources.requests.storage"
        outcomes:
          - pass:
              when: ">= 5Gi"
              message: you have enough storage
          - fail:
              message: there is not enough storage
    - clusterResource:
        checkName: check-access-mode
        kind: pvc
        namespace: wordpress
        name: data-wordpress-mariadb-0
        yamlPath: "spec.accessModes"
        regex: ReadWriteOnce
        outcomes:
          - fail:
              when: "false"
              message: is not ReadWriteOnce
          - pass:
              when: "true"
              message: is ReadWriteOnce
    - clusterResource:
        checkName: check-pvc-is-bound
        kind: pvc
        namespace: wordpress
        name: data-wordpress-mariadb-0
        yamlPath: "status.phase"
        regex: Bound
        outcomes:
          - fail:
              when: "false"
              message: is not bound
          - pass:
              when: "true"
              message: is bound
    - clusterResource:
        checkName: check-replicas-number
        kind: deployment
        namespace: default
        name: strapi-db
        yamlPath: "spec.replicas"
        regex: "1"
        outcomes:
          - fail:
              when: "false"
              message: replicas are not matching
          - pass:
              when: "true"
              message: replicas are matching
    - clusterResource:
        checkName: check-replicas-number-case-insensitive
        kind: Deployment
        namespace: default
        name: strapi-db
        yamlPath: "spec.replicas"
        regex: "1"
        outcomes:
          - fail:
              when: "false"
              message: replicas are not matching
          - pass:
              when: "true"
              message: replicas are matching
    - clusterResource:
        checkName: check-cm-confg
        kind: configmap
        namespace: default
        name: strapi-db-config
        yamlPath: "data.MYSQL_DATABASE"
        regex: "strapi-k8s"
        outcomes:
          - fail:
              when: "false"
              message: is not strapi-k8s
          - pass:
              when: "true"
              message: is strapi-k8s
    - clusterResource:
        checkName: check-storageclass-in-cluster-scope
        kind: storageclass
        name: standard
        yamlPath: "volumeBindingMode"
        regex: Immediate
        clusterScoped: true
        outcomes:
          - fail:
              when: "false"
              message: is not Immediate
          - pass:
              when: "true"
              message: is Immediate
```

---

The `clusterVersion` analyzer is used to report on the installed version of Kubernetes. This checks the cluster version, not the version of kubectl. The `when` attribute specifies a semver range to compare the running version against and supports all standard comparison operators.

The `clusterVersion` analyzer uses data from the [clusterInfo collector](https://troubleshoot.sh/collect/cluster-info). The `clusterInfo` collector is automatically added and will always be present.

To implement an analyzer that has a minimum version, specify that version as a fail or warn outcome first, and have a default outcome for pass. This allows the pass outcome to always succeed when the fail or warn outcomes are not truthy.
An example `clusterVersion` analyzer that reports a failure on Kubernetes versions earlier than 1.16.0, a warning when running 1.16.x, and a pass on 1.17.0 or later is included here.

## Parameters

*There are no parameters available for this analyzer.*

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: check-kubernetes-version
spec:
  analyzers:
    - clusterVersion:
        outcomes:
          - fail:
              when: "< 1.16.0"
              message: The application requires Kubernetes 1.16.0 or later
              uri: https://kubernetes.io
          - warn:
              when: "< 1.17.0"
              message: Your cluster meets the minimum version of Kubernetes, but we recommend you update to 1.17.0 or later.
              uri: https://kubernetes.io
          - pass:
              message: Your cluster meets the recommended and required versions of Kubernetes.
```

---

The ConfigMap analyzer is available to require or warn if a specific Kubernetes ConfigMap is not present or does not contain a required key. The `when` attribute is not supported in the outcomes of this analyzer.

Collectors do not automatically include ConfigMaps because these often contain sensitive information. The [configMap collector](https://troubleshoot.sh/docs/collect/configmap/) can be included in a set of collectors to include data about the ConfigMap. It's not recommended, and therefore not the default, to include the values of ConfigMaps.

The most common use of this analyzer is to detect the existence of a specific key in a specific ConfigMap.

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: preflight-sample
spec:
  collectors:
    - configMap:
        name: my-app-postgres
        namespace: default
        key: uri
        includeValue: false # this is the default, specified here for clarity
  analyzers:
    - configMap:
        checkName: Postgres URI ConfigMap
        configMapName: my-app-postgres
        namespace: default
        key: uri
        outcomes:
          - fail:
              message: The `my-app-postgres` ConfigMap was not found or the `uri` key was not detected.
          - pass:
              message: The Postgres URI was found in a ConfigMap in the cluster.
```

---

The `containerRuntime` analyzer is used to analyze the container runtime(s) available in the cluster. The `when` attribute supports standard comparators to compare to the detected runtime.

The `containerRuntime` analyzer uses data from the [clusterResources collector](https://troubleshoot.sh/collect/cluster-resources). The `clusterResources` collector is automatically added and will always be present.

The `containerRuntime` analyzer is based on the `containerRuntimeVersion` field that is available on each Kubernetes node. This is reflected in the following support bundle example under `cluster-resources/nodes.json` as:

```
"nodeInfo": {
  "containerRuntimeVersion": "docker://20.10.5",
```

The value for `containerRuntimeVersion` can also be retrieved by manually running the following command:

`kubectl get node [nodename] --no-headers -o=jsonpath='{.status.nodeInfo.containerRuntimeVersion}'`

**Example Output:** `containerd://1.6.8`

Some common container runtimes are:

- `containerd`
- `docker`
- `cri-o`

## Parameters

*There are no parameters available for this analyzer.*

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: no-gvisor
spec:
  analyzers:
    - containerRuntime:
        outcomes:
          - fail:
              when: "== gvisor"
              message: The application does not support gvisor
          - pass:
              message: A supported container runtime was found
```

---

The customResourceDefinition analyzer is available to check for the existence of a Custom Resource Definition (CRD) that is expected to be installed. The `customResourceDefinition` analyzer uses data from the [clusterResources collector](https://troubleshoot.sh/collect/cluster-resources). The `clusterResources` collector is automatically added and will always be present.

## Parameters

This analyzer requires exactly one parameter:

**customResourceDefinitionName**: (Required) The name of the CRD that should be present.
## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: analyzer-sample
spec:
  analyzers:
    - customResourceDefinition:
        customResourceDefinitionName: cephclusters.ceph.rook.io
        outcomes:
          - fail:
              message: The Rook CRD was not found in the cluster.
          - pass:
              message: Rook is installed and available.
```

---

The `deploymentStatus` analyzer is used to report on the number of replicas that are "Ready" in a deployment. The `when` attribute supports standard comparators to compare the number of ready replicas.

The `deploymentStatus` analyzer uses data from the [clusterResources collector](https://troubleshoot.sh/collect/cluster-resources). The `clusterResources` collector is automatically added and will always be present.

The target deployment can be identified by name. The outcomes on this analyzer will be processed in order, and execution will stop after the first outcome that is truthy.

## Parameters

**name**: (Optional) The name of the deployment to check. If name is omitted, all deployments will be analyzed.

**namespace**: (Optional) The namespace to look for the deployment in. If specified, analysis will be limited to deployments in this namespace.

**namespaces**: (Optional) The namespaces to look for the deployment in. If specified, analysis will be limited to deployments in these namespaces.

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: api-deployment-running
spec:
  analyzers:
    - deploymentStatus:
        name: api
        namespace: default
        outcomes:
          - fail:
              when: "absent" # note that the "absent" failure state must be listed first if used.
              message: The API deployment is not present.
          - fail:
              when: "< 1"
              message: The API deployment does not have any ready replicas.
          - warn:
              when: "= 1"
              message: The API deployment has only a single ready replica.
          - pass:
              message: There are multiple replicas of the API deployment ready.
```

---

The `distribution` analyzer is used to check for known managed (hosted) and self-hosted distributions of Kubernetes. The `when` attribute supports standard comparators to compare the result to.

The `distribution` analyzer uses data from the [clusterResources collector](https://troubleshoot.sh/collect/cluster-resources). The `clusterResources` collector is automatically added and will always be present.

The `distribution` analyzer supports the following distributions:

* `aks` (Azure Kubernetes Service)
* `digitalocean` (DigitalOcean)
* `docker-desktop` (Docker Desktop)
* `eks` (Amazon Elastic Kubernetes Service)
* `embedded-cluster` (Replicated Embedded Cluster)
* `gke` (Google Kubernetes Engine)
* `ibm` (IBM Cloud)
* `k0s` (Mirantis k0s)
* `k3s` (K3s)
* `kind` (Kind)
* `kurl` (Replicated kURL)
* `microk8s` (MicroK8s)
* `minikube` (minikube)
* `oke` (Oracle Cloud Infrastructure Container Engine for Kubernetes)
* `openShift` (Red Hat OpenShift)
* `rke2` (Rancher RKE2)

## Parameters

*There are no parameters available for this analyzer.*

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: require-hosted-k8s
spec:
  analyzers:
    - distribution:
        outcomes:
          - pass:
              when: "== k0s"
              message: k0s is a supported distribution
          - pass:
              when: "== openShift"
              message: OpenShift is a supported distribution
          - fail:
              when: "== docker-desktop"
              message: The application does not support Docker Desktop
          - warn:
              when: "== microk8s"
              message: The application does not support MicroK8s
          - warn:
              when: "== kind"
              message: The application does not support Kind
          - pass:
              when: "== eks"
              message: EKS is a supported distribution
          - pass:
              when: "== gke"
              message: GKE is a supported distribution
          - pass:
              when: "== aks"
              message: AKS is a supported distribution
          - pass:
              when: "== digitalocean"
              message: DigitalOcean is a supported distribution
          - warn:
              when: "== minikube"
              message: Minikube is not suitable for production environments
          - warn:
              when: "== ibm"
              message: The application does not support IBM Cloud
          - warn:
              message: Unable to determine the distribution of Kubernetes
```

---

The `Event` analyzer checks if an Event exists within the cluster resources in a given namespace. The analyzer uses data from the [clusterResources collector](https://troubleshoot.sh/collect/cluster-resources). The `clusterResources` collector is automatically added and will always be present.

The target Event can be identified by `Reason`, `Kind`, or a regular expression matching the Event `Message`. The outcomes on this analyzer will be processed in order, and execution will stop after the first outcome that is truthy. The analyzer also has access to all fields in the Event [object](https://kubernetes.io/docs/reference/kubernetes-api/cluster-resources/event-v1/), and Go templating can be used for dynamic messages, e.g. `{{ .Reason }} {{ .InvolvedObject.Name }}`.

## Parameters

**reason**: (Required) Event Reason. For example, `InvalidDiskCapacity`. A list of possible reasons can be found in the Kubernetes [source code](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/events/event.go).

**namespace**: (Optional) The namespace to look for the Event in. If specified, analysis will be limited to Events in this namespace.

**kind**: (Optional) The REST resource the Event represents.
For example, `Pod`.

**regex**: (Optional) A regular expression pattern to test against the Event `Message`.

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: preflight-sample-event
spec:
  analyzers:
    - event:
        checkName: event-oom-check
        namespace: default
        reason: "OOMKilling"
        kind: Node
        outcomes:
          - fail:
              when: "true"
              message: Event {{ .Reason }} by object {{ .InvolvedObject.Name }} kind {{ .InvolvedObject.Kind }} has message {{ .Message }}
          - pass:
              when: "false"
              message: No OOMKilling event detected
```

---

The goldpinger analyzer is used to analyze results collected by the [goldpinger collector](/docs/collect/goldpinger/). The goldpinger analyzer reports the following:

- A failed outcome for each pod-to-pod ping that was not successful.
- A warning outcome if any pod ping result is missing.
- A success outcome for each pod that successfully pinged all other pods in the cluster.

*NOTE: A ping in goldpinger terminology involves making an HTTP request, as opposed to making an ICMP ping request.*

## Parameters

**filePath**: (Optional) Parameter pointing to where goldpinger results are collected. By default, this is `goldpinger/check_all.json`.

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: goldpinger
spec:
  collectors:
    - goldpinger: {}
  analyzers:
    - goldpinger: {}
```

---

## HTTP Analyzer

The `http` analyzer is used to analyze information collected by the [HTTP Requests](/docs/collect/http/) collector. It supports multiple outcomes. For example:

- `error`: An error occurred connecting to the URL.
- `statusCode == 200`: Successfully connected to the URL.
### Examples of the collected JSON output to analyze

The response received from the server is stored in the `response` key of the resulting JSON file:

```json
{
  "response": {
    "status": 200,
    "body": "{\"status\": \"healthy\"}",
    "headers": {
      "Connection": "keep-alive",
      "Date": "Fri, 19 Jul 2019 20:13:44 GMT",
      "Server": "nginx/1.8.1",
      "Strict-Transport-Security": "max-age=31536000; includeSubDomains"
    }
  }
}
```

In case a client-side error occurs and no response is received, the error text is stored in the `error` key:

```json
{
  "error": {
    "message": "Put : unsupported protocol scheme \"\""
  }
}
```

### Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: http
spec:
  hostCollectors:
    - http:
        collectorName: get-replicated-app
        get:
          url: https://replicated.app
  hostAnalyzers:
    - http:
        checkName: Can Access Replicated API
        collectorName: get-replicated-app
        outcomes:
          - warn:
              when: "error"
              message: Error connecting to https://replicated.app
          - pass:
              when: "statusCode == 200"
              message: Connected to https://replicated.app
          - warn:
              message: "Unexpected response"
```

---

The `imagePullSecret` analyzer checks that a secret exists with credentials to pull images from a registry. It does not verify that the credentials in the secret are valid.

The `imagePullSecret` analyzer uses data from the [clusterResources collector](https://troubleshoot.sh/collect/cluster-resources). The `clusterResources` collector is automatically added and will always be present.
## Parameters

**registryName**: (Required) The name of the registry to check.

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: preflight-sample
spec:
  analyzers:
    - imagePullSecret:
        checkName: Pull from Quay
        registryName: quay.io
        outcomes:
          - fail:
              message: Did not find credentials to pull from Quay
          - pass:
              message: Found credentials to pull from Quay
```

---

Analyzers are YAML specifications that define a set of criteria and operations to run against data collected in a preflight check or support bundle. Each analyzer included will result in either zero or one [outcomes](outcomes). If an analyzer produces zero outcomes, it will not be displayed in the results.

Analyzers are specified inside either a Preflight or a SupportBundle YAML file. To build a set of analyzers, start with a Kubernetes YAML file:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: my-application-name
spec:
  analyzers: []
```

or

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: my-application-name
spec:
  analyzers: []
```

The above files contain all of the necessary scaffolding and structure needed to write analyzers, but don't contain any analyzers. Given this analyzer definition, there will be nothing on the analysis page.

The Troubleshoot project defines a number of built-in and easy-to-use analyzers, and many helper functions to build custom analyzers. To add additional analyzers to a manifest, read the docs in this section to understand each one, and add them as an array item below `spec`.
For example, a complete Preflight check might be:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: my-application-name
spec:
  analyzers:
    - imagePullSecret:
        checkName: Has Access to Quay.io
        registryName: quay.io
        outcomes:
          - fail:
              message: Cannot pull from quay.io
          - pass:
              message: Found credentials to pull from quay.io
    - clusterVersion:
        outcomes:
          - fail:
              when: "< 1.13.0"
              message: Sorry, my-application-name requires at least Kubernetes 1.13.0. Please update your Kubernetes cluster before installing.
              uri: https://enterprise.my-application.com/install/requirements/kubernetes
          - warn:
              when: "< 1.15.0"
              message: The version of Kubernetes you are running meets the minimum requirements to run my-application-name. It's recommended to run Kubernetes 1.15.0 or later.
              uri: https://enterprise.my-application.com/install/requirements/kubernetes
          - pass:
              message: The version of Kubernetes you have installed meets the required and recommended versions.
    - storageClass:
        checkName: Required storage classes
        storageClassName: "microk8s-hostpath"
        outcomes:
          - fail:
              message: The required storage class was not found in the cluster.
          - pass:
              message: The required storage class was found in the cluster.
    - customResourceDefinition:
        customResourceDefinitionName: cephclusters.ceph.rook.io
        outcomes:
          - fail:
              message: Rook is required for my-application. Rook was not found in the cluster.
          - pass:
              message: Found a supported version of Rook installed and running in the cluster.
```

---

### Use Cases

There are two use cases for the IngressClass analyzer:

- Check for the presence of a specific IngressClass by name, in which case `ingressClassName` must be provided (Example 1).
- Check if there is an IngressClass set as default. The analyzer checks if there is any IngressClass with the `ingressclass.kubernetes.io/is-default-class` annotation set to `"true"` (Examples 2 and 3).

In the second case, all arguments are optional.
If none are provided, default messages will indicate whether a default IngressClass was found.

#### Example 1: Check for a specific IngressClass

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: preflight-sample
spec:
  analyzers:
    - ingressClass:
        checkName: Required ingress class
        ingressClassName: "nginx"
        outcomes:
          - fail:
              message: The nginx ingress class was not found
          - pass:
              message: The nginx ingress class is available
```

#### Example 2: Check for the presence of a default IngressClass

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: preflight-sample
spec:
  analyzers:
    - ingressClass:
        checkName: Check for default ingress class
        outcomes:
          - fail:
              message: No default ingress class found
          - pass:
              message: Default ingress class found
```

#### Example 3: Check for a default IngressClass using default messages

Defaults for the ingressClass analyzer are:

- `checkName` = 'Default Ingress Class'
- Fail message = 'No Default Ingress Class found'
- Pass message = 'Default Ingress Class found'

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: preflight-sample
spec:
  analyzers:
    - ingressClass: {}
```

---

The Ingress analyzer checks if a given Ingress is listed within the cluster resources in a given namespace.

> `Ingress` was introduced in KOTS 1.20.0 and Troubleshoot 0.9.43.

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: preflight-sample
spec:
  analyzers:
    - ingress:
        namespace: default
        ingressName: connect-to-me
        outcomes:
          - fail:
              message: The ingress isn't listed in the cluster
          - pass:
              message: Ingress rule found
```

---

The jobStatus analyzer is used to report on the status of a job. The `when` attribute supports standard comparators to compare the number of successful and failed pods within the job. The `jobStatus` analyzer uses data from the [clusterResources collector](https://troubleshoot.sh/collect/cluster-resources).
The `clusterResources` collector is automatically added and will always be present.

The target job can be identified by name. The outcomes on this analyzer will be processed in order, and execution will stop after the first outcome that is truthy.

Outcomes are optional in this analyzer. If no outcomes are specified, the Job's spec and status will be examined to automatically determine its status. In this case, only failed jobs will be reported in the results.

## Parameters

**name**: (Optional) The name of the job to check. If name is not specified, all jobs will be analyzed.

**namespace**: (Optional) The namespace to look for the job in. If specified, analysis will be limited to jobs in this namespace.

**namespaces**: (Optional) The namespaces to look for the job in. If specified, analysis will be limited to jobs in these namespaces.

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: post-install-job
spec:
  analyzers:
    - jobStatus:
        name: post-install-job
        namespace: default
        outcomes:
          - pass:
              when: "succeeded > 5"
              message: The post-install job has succeeded.
          - fail:
              when: "failed > 1"
              message: Too many containers in post-install job have failed.
```

---

The JSON compare analyzer is used to compare a JSON snippet with part or all of a collected JSON file.

## Parameters

**fileName**: (Required) Path of the collected file to analyze.

**value**: (Required) JSON value to compare. If the value matches the collected file, the outcome that has `when` set to `"true"` will be executed. If a `when` expression is not specified, the `pass` outcome defaults to `"true"`. This value _must_ be specified as a multi-line YAML string, and any string values must be in double quotes (see the examples).

**path**: (Optional) Portion of the collected JSON file to compare against. The default behavior is to compare against the entire collected file.

**jsonPath**: (Optional) JSONPath template of the collected JSON file to compare against.
This follows the same rules and syntax as [kubectl's jsonpath support](https://kubernetes.io/docs/reference/kubectl/jsonpath/) so if the template resolves to a single result it will *not* be wrapped in an array. ## Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: json-compare-example spec: collectors: - data: name: example.json data: | { "foo": "bar", "stuff": { "foo": "bar", "bar": true }, "morestuff": [ { "foo": { "bar": 123 } } ] } analyzers: - jsonCompare: checkName: Compare JSON Example fileName: example.json path: "morestuff.[0].foo.bar" value: | 123 outcomes: - fail: when: "false" message: The collected data does not match the value. - pass: when: "true" message: The collected data matches the value. ``` ## Example Analyzer Definition using JSONPath ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: jsonpath-compare-example spec: collectors: - data: name: example.json data: | { "foo": "bar", "stuff": { "foo": "bar", "bar": true }, "morestuff": [ { "foo": { "bar": 123 } }, { "foo": { "bar": 45 } } ] } analyzers: - jsonCompare: checkName: Compare JSONPath Example fileName: example.json jsonPath: "{$.morestuff[?(@.foo.bar>100)].foo.bar}" value: | 123 outcomes: - fail: when: "false" message: The collected data does not match the value. - pass: when: "true" message: The collected data matches the value. ``` ## Example Analyzer Definition to Check the Cluster Platform ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: json-compare-example spec: collectors: - clusterInfo: {} analyzers: - jsonCompare: checkName: Check Cluster Platform fileName: cluster-info/cluster_version.json path: "info.platform" value: | "linux/amd64" outcomes: - fail: when: "false" message: The cluster platform is not linux/amd64. - pass: when: "true" message: The cluster platform is linux/amd64. 
``` ## Example Analyzer Definition using Templating in the Outcome Messages ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: jsonpath-compare-example spec: collectors: - data: name: example.json data: | { "stuff": { "status": "ready", "info": "foo" } } analyzers: - jsonCompare: checkName: Compare JSONPath Example fileName: example.json path: "stuff.status" value: | "ready" outcomes: - fail: when: "false" message: "Not Ready, Info: {{ .stuff.info }}" - pass: when: "true" message: "Ready, Info: {{ .stuff.info }}" ``` --- The longhorn analyzer runs several checks to detect problems with a [longhorn](https://longhorn.io) installation. ## Parameters **namespace:** (Optional) Namespace where longhorn is installed. Will default to `longhorn-system` if not set. ## Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: longhorn spec: collectors: - longhorn: {} analyzers: - longhorn: {} ``` ## Included Analyzers ### nodes.longhorn.io Each nodes.longhorn.io resource in the bundle will be analyzed. Warnings will be generated for problems such as the node not being schedulable. ### replicas.longhorn.io Each replicas.longhorn.io resource in the bundle will be analyzed. Warnings will be generated for problems such as failed replicas or replicas not in the desired state. ### engines.longhorn.io Each engines.longhorn.io resource in the bundle will be analyzed. Warnings will be generated for problems such as the engine not being in the desired state. ### Replica checksums Any difference in the contents of replicas for a single volume will generate a warning. Refer to the [longhorn collector docs](/docs/collect/longhorn/) for information about when replica checksums will be collected for a volume. --- The `MS SQL` analyzer is available to check version and connection status of a Microsoft SQL Server database. It relies on the data collected by the [MS SQL collector](/docs/collect/mssql/). 
The analyzer's outcome `when` clause may be used to evaluate the database connection status or a version range to compare against the running version, and supports standard comparison operators.

## Parameters

**checkName:** Optional name.

**collectorName:** (Recommended) Must match the `collectorName` specified by the mssql collector.

## Outcomes

The `when` value in an outcome of this analyzer contains the connection or version information. The conditional in the `when` value supports the following:

**connected:** A boolean representing whether the database is connected. Can be compared to a boolean value with the `==` operator.

**version:** A string representing the version of the database. Can be compared to an assembly version string using `<`, `<=`, `>`, `>=`, `==`, `!=`, with the letter 'x' as a version wildcard (10.x). The 'x' is parsed as '0'.

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: sample
spec:
  collectors:
    - mssql:
        collectorName: mssql
        uri: sqlserver://username:password@hostname:1433/defaultdb
  analyzers:
    - mssql:
        checkName: Must be SQLServer 15.x or later
        collectorName: mssql
        outcomes:
          - fail:
              when: "connected == false"
              message: Cannot connect to SQLServer
          - fail:
              when: "version < 15.x"
              message: The SQLServer must be at least version 15
          - pass:
              message: The SQLServer connection checks out
```

---

The `MySQL` analyzer is available to check version and connection status of a MySQL database. It relies on the data collected by the [MySQL collector](/docs/collect/mysql/). The analyzer's outcome `when` clause may be used to evaluate the database connection status or a semver range to compare against the running version, and supports standard comparison operators.

## Parameters

**checkName:** Optional name.

**collectorName:** (Recommended) Must match the `collectorName` specified by the MySQL collector.
## Outcomes

The `when` value in an outcome of this analyzer contains the connection or version information. The conditional in the `when` value supports the following:

**connected:** A boolean representing whether the database is connected. Can be compared to a boolean value with the `==` operator.

**version:** A string representing the semantic version of the database. Can be compared to a semver string using `<`, `<=`, `>`, `>=`, `==`, `!=`, with the letter 'x' as a version wildcard (10.x). The 'x' is parsed as '0'.

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: supported-mysql-version
spec:
  collectors:
    - mysql:
        collectorName: mysql
        uri: '<username>:<password>@tcp(<hostname>:<port>)/<database>'
  analyzers:
    - mysql:
        checkName: Must be MySQL 8.x or later
        collectorName: mysql
        outcomes:
          - fail:
              when: connected == false
              message: Cannot connect to MySQL server
          - fail:
              when: version < 8.x
              message: The MySQL server must be at least version 8
          - pass:
              message: The MySQL server is ready
```

## Test MySQL Analyzer locally

If you want to test it locally, you can spin up a mysql database running the following Docker command. Be sure to specify the image version `mysql:<version>`. In this case, the version is 8.0:

```shell
docker run --rm --name mysql_db -p 3306:3306 -e MYSQL_ROOT_PASSWORD=mysecretpassword -d mysql:8.0
```

You should use the following `uri` in the collector:

```yaml
uri: 'root:mysecretpassword@tcp(localhost:3306)/mysql'
```

Once it's running, you can run preflight and test the results.

---

The `nodeMetrics` analyzer is available to analyze [node metrics](https://kubernetes.io/docs/reference/instrumentation/node-metrics/) data collected by `kubelet` and served via the Kubernetes API server. The metrics are collected by the [nodeMetrics collector](/docs/collect/node-metrics/). The analyzer can be used in support bundles or preflights that need to report on checks such as `pvc` usage capacity violations.
This analyzer's `when` outcome clause compares the condition specified with the resources present, such as a `pvc`. This analyzer also supports a `filters` property. If provided, the resources analyzed are filtered to any resource that matches the filters specified. If no filters are specified, all collected metrics are inspected by the analyzer.

## Available Filters

Filters can be used to limit what resources are analyzed. Filters are usually used in conjunction with a few other outcome fields.

| Filter Name | Description |
|-------------|-------------|
| `pvc.namespace` | The namespace the PVC is deployed in. Used to filter down PVC resources to analyze. It is used in conjunction with the `pvcUsedPercentage` outcome. |
| `pvc.nameRegex` | A regular expression of the PVC name. Used to filter down PVC resources to analyze. It is used in conjunction with the `pvcUsedPercentage` outcome. |

## Outcomes

The `when` value in an outcome of this analyzer contains scalar quantities such as percentages. They are compared with values generated from various values in the raw metrics. Comparisons are done using available [logical operators](/docs/analyze/outcomes/#logical-operators). The conditional in the `when` clause can accept the following fields:

| Field | Description |
|-------|-------------|
| `pvcUsedPercentage` | Percentage value to compare with the available space remaining in a PVC. The formula used is `(available space / capacity ) * 100` |

The `message` field can contain strings representing [go text templates](https://pkg.go.dev/text/template).
The analyzer supports the following template placeholders:

| Field | Description |
|-------|-------------|
| `PVC.ConcatenatedNames` | Comma-separated concatenated list of PVC names that matched the defined `when` clause from the filtered list of resources |
| `PVC.Names` | List of PVC names that matched the defined `when` clause from the filtered list of resources |

## Example Analyzer Definitions

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
spec:
  analyzers:
    - nodeMetrics:
        checkName: Check for PVCs using more than 80% storage space in the entire cluster
        outcomes:
          - fail:
              when: "pvcUsedPercentage >= 80"
              message: "There are PVCs using more than 80% of storage: {{ .PVC.ConcatenatedNames }}"
          - pass:
              message: "No PVCs are using more than 80% of storage"
```

Example of filtering the PVC resources to analyze:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
spec:
  analyzers:
    - nodeMetrics:
        checkName: Check if minio pvc storage usage is less than 80%
        filters:
          pvc:
            nameRegex: "minio-data-ha-minio.*"
            namespace: "minio"
        outcomes:
          - fail:
              when: "pvcUsedPercentage >= 80"
              message: "There are {{ len .PVC.Names }} PVCs using more than 80% of storage"
          - pass:
              message: "No PVCs are using more than 80% of storage"
```

---

The `nodeResources` analyzer is available to determine if the nodes in the cluster have sufficient resources to run an application. This is useful in preflight checks to avoid deploying a version that will not work, and it's useful in support bundles to collect and analyze in case the available resources of a shared cluster are being reserved for cluster workloads or if an autoscaling group is changing the resources available. This analyzer's outcome `when` clause compares the condition specified with the resources present on each or all nodes. It's possible to create an analyzer to report on either aggregate values of all nodes in the cluster or individual values of any node in the cluster. This analyzer also supports a `filters` property.
If provided, the nodes analyzed will be filtered to any node that matches the filters specified.

## Available Filters

All filters can be integers or strings that are parsed using the Kubernetes resource standard. The fields here are from the [nodes capacity and allocatable](https://kubernetes.io/docs/concepts/architecture/nodes/#capacity). Note that allocatable is not "free" or "available", but it's the amount of the capacity that is not reserved by other pods and processes.

| Filter Name | Description |
|-------------------------------|----------------------------------------------------------------------------------------|
| `cpuArchitecture` | The architecture of the CPU available to the node. Expressed as a string, e.g. `amd64` |
| `cpuCapacity` | The amount of CPU available to the node |
| `cpuAllocatable` | The amount of allocatable CPU after the Kubernetes components have been started |
| `memoryCapacity` | The amount of memory available to the node |
| `memoryAllocatable` | The amount of allocatable memory after the Kubernetes components have been started |
| `podCapacity` | The number of pods that can be started on the node |
| `podAllocatable` | The number of pods that can be started on the node after Kubernetes is running |
| `ephemeralStorageCapacity` | The amount of ephemeral storage on the node |
| `ephemeralStorageAllocatable` | The amount of ephemeral storage on the node after Kubernetes is running |
| `matchLabel` | Specific selector label or labels the node must contain in its metadata |
| `matchExpressions` | A list of selector label expressions that the node needs to match in its metadata |
| `resourceName` | The name of the resource to filter on. This is useful for filtering on custom resources |
| `resourceCapacity` | The amount of the resource available to the node |
| `resourceAllocatable` | The amount of allocatable resource after the Kubernetes components have been started |

CPU and Memory units are expressed as Go [Quantities](https://pkg.go.dev/k8s.io/apimachinery/pkg/api/resource#Quantity): `16Gi`, `8Mi`, `1.5m`, `5` etc.

## Outcomes

The `when` value in an outcome of this analyzer contains the nodes that match the filters, if any filters are defined. If there are no defined filters, the `when` value contains all nodes in the cluster. The conditional in the `when` value supports the following:

| Aggregate | Description |
|-----------|-------------|
| `count()` | The number of nodes that match the filter (default if not specified) |
| `sum(filterName)` | Sum of `filterName` in all nodes that match any filter specified |
| `min(filterName)` | Min of `filterName` in all nodes that match any filter specified |
| `max(filterName)` | Max of `filterName` in all nodes that match any filter specified |
| `nodeCondition(conditionType)` | Used for checking [node conditions](https://kubernetes.io/docs/reference/node/node-status/#condition) such as Ready, PIDPressure, etc |

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: sample
spec:
  analyzers:
    - nodeResources:
        checkName: Must have at least 3 nodes in the cluster
        outcomes:
          - fail:
              when: "count() < 3"
              message: This application requires at least 3 nodes
          - warn:
              when: "count() < 5"
              message: This application recommends at least 5 nodes.
          - pass:
              message: This cluster has enough nodes.
```

```yaml
- nodeResources:
    checkName: Every node in the cluster must have at least 16Gi of memory
    outcomes:
      - fail:
          when: "min(memoryCapacity) < 16Gi"
          message: All nodes must have at least 16 GB of memory
      - pass:
          message: All nodes have at least 16 GB of memory
```

```yaml
- nodeResources:
    checkName: Total CPU Cores in the cluster is 20 or greater
    outcomes:
      - fail:
          when: "sum(cpuCapacity) < 20"
          message: The cluster must contain at least 20 cores
      - pass:
          message: There are at least 20 cores in the cluster
```

```yaml
- nodeResources:
    checkName: Nodes that have 6 cores must also have at least 16 GB of memory
    filters:
      cpuCapacity: "6"
    outcomes:
      - fail:
          when: "min(memoryCapacity) < 16Gi"
          message: All nodes that have 6 or more cores must have at least 16 GB of memory
      - pass:
          message: All nodes with 6 or more cores have at least 16 GB of memory
```

```yaml
- nodeResources:
    checkName: Must have 3 nodes with at least 6 cores
    filters:
      cpuCapacity: "6"
    outcomes:
      - fail:
          when: "count() < 3"
          message: This application requires at least 3 nodes with 6 cores each
      - pass:
          message: This cluster has enough nodes with enough cores
```

```yaml
- nodeResources:
    checkName: Must have 1 node with 16 GB (available) memory and 5 cores (on a single node) with amd64 architecture
    filters:
      memoryAllocatable: 16Gi
      cpuArchitecture: amd64
      cpuCapacity: "5"
    outcomes:
      - fail:
          when: "count() < 1"
          message: This application requires at least 1 node with 16GB available memory and 5 cpu cores with amd64 architecture
      - pass:
          message: This cluster has a node with enough memory and cpu cores
```

```yaml
- nodeResources:
    checkName: Node status check
    outcomes:
      - fail:
          when: "nodeCondition(Ready) == False"
          message: "Not all nodes are online."
      - fail:
          when: "nodeCondition(Ready) == Unknown"
          message: "Not all nodes are online."
      - pass:
          message: "All nodes are online."
```

### Filter by labels

> Filtering by labels was introduced in Kots 1.19.0 and Troubleshoot 0.9.42.
Labels are intended to be used to specify identifying attributes of objects that are meaningful and relevant to users, but do not directly imply semantics to the core system. Labels can be used to organize and to select subsets of objects. Troubleshoot allows users to analyze nodes that match one or more labels, for example to require a certain number of nodes with certain labels as a preflight check. Multiple filters may be specified and all are required to match for the node to match.

```yaml
- nodeResources:
    checkName: Must have Mongo running
    filters:
      memoryAllocatable: 16Gi
      cpuCapacity: "5"
      selector:
        matchLabel:
          kubernetes.io/role: database-primary-replica
    outcomes:
      - fail:
          when: "count() < 1"
          message: Must have 1 node with 16 GB (available) memory and 5 cores (on a single node) running Mongo Operator.
      - pass:
          message: This cluster has a node with enough memory and cpu capacity running Mongo Operator.
```

```yaml
- nodeResources:
    checkName: Must have at least 1 node with 3 cores that is not a storage, queue or control plane node
    filters:
      cpuCapacity: "3"
      selector:
        matchExpressions: # An AND operation will be applied to this list of expressions
          # Nodes that are not storage or queue nodes
          - key: node.kubernetes.io/role
            operator: NotIn # Other operations are In, Exists, DoesNotExist
            values: # An OR operation i.e any node that does not have "node.kubernetes.io/role=storage" or "node.kubernetes.io/role=queue" label
              - storage
              - queue
          # Nodes that are not control-plane nodes
          - key: node-role.kubernetes.io/control-plane
            operator: NotIn
            values:
              - "true"
    outcomes:
      - pass:
          when: "count() >= 1"
          message: "Found {{ .NodeCount }} nodes with at least 3 CPU cores"
      - fail:
          message: "{{ .NodeCount }} nodes do not meet the minimum requirements"
```

### Filter by GPU resources

`resourceName` is used to filter on custom resources. For example, to filter on GPU resources, you can use the `resourceName` filter with the resource name `nvidia.com/gpu`.
`resourceCapacity` and `resourceAllocatable` filters are used to filter on the capacity and allocatable resources of the custom resource.

```yaml
- nodeResources:
    checkName: Must have at least 1 node with 1 GPU
    filters:
      resourceName: nvidia.com/gpu
      resourceCapacity: "1"
    outcomes:
      - pass:
          when: "count() >= 1"
          message: "Found {{ .NodeCount }} nodes with at least 1 GPU"
      - fail:
          message: "{{ .NodeCount }} nodes do not meet the minimum requirements"
```

```yaml
- nodeResources:
    checkName: Must have at least 4 Intel i915 GPUs in the cluster
    filters:
      resourceName: gpu.intel.com/i915
    outcomes:
      - pass:
          when: "min(resourceAllocatable) > 4"
          message: "Found nodes with more than 4 allocatable Intel i915 GPUs"
      - fail:
          message: "{{ .NodeCount }} nodes do not meet the minimum requirements"
```

```yaml
- nodeResources:
    filters:
      resourceName: nvidia.com/gpu
    checkName: Must have at least 3 GPU-enabled nodes in the cluster
    outcomes:
      - pass:
          when: "count() >= 3"
          message: "Found at least 3 GPU-enabled nodes"
```

## Message Templating

To make the outcome message more informative, you can include certain values gathered by the NodeResources collector as templates. The templates are enclosed in double curly braces with a dot separator.
The following templates are available:

| Template | Description |
|----|----|
| `.NodeCount` | The number of nodes that match the filter |
| `.CPUArchitecture` | The architecture of the CPU available to the node |
| `.CPUCapacity` | The amount of CPU available to the node |
| `.MemoryCapacity` | The amount of memory available to the node |
| `.PodCapacity` | The number of pods that can be started on the node |
| `.EphemeralStorageCapacity` | The amount of ephemeral storage on the node |
| `.AllocatableMemory` | The amount of allocatable Memory after the Kubernetes components have been started |
| `.AllocatableCPU` | The amount of allocatable CPU after the Kubernetes components have been started |
| `.AllocatablePods` | The number of pods that can be started on the node after Kubernetes is running |
| `.AllocatableEphemeralStorage` | The amount of ephemeral storage on the node after Kubernetes is running |

## Example Analyzer Message Templating Definition

```yaml
- nodeResources:
    filters:
      cpuArchitecture: arm64
    checkName: Must have at least 3 nodes in the cluster
    outcomes:
      - fail:
          when: "count() < 3"
          message: "This application requires at least 3 nodes. {{ .CPUArchitecture }}, it should only return the {{ .NodeCount }} nodes that match that filter"
      - warn:
          when: "count() < 5"
          message: This application recommends at least 5 nodes.
      - pass:
          message: This cluster has enough nodes.
```

```yaml
- nodeResources:
    filters:
      cpuArchitecture: arm64
      cpuCapacity: "2"
    checkName: Must have at least 3 nodes in the cluster
    outcomes:
      - fail:
          when: "count() < 3"
          message: "This application requires at least 3 nodes. {{ .CPUArchitecture }}-{{ .CPUCapacity }}, it should only return the {{ .NodeCount }} nodes that match that filter"
      - warn:
          when: "count() < 5"
          message: This application recommends at least 5 nodes.
      - pass:
          message: This cluster has enough nodes.
```

---

Analyzer Outcomes are the output of an analyzer and contain up to 4 attributes:

```yaml
outcomes:
  - pass:
      message: The message to display below the title
      title: The title of the analyzer card
      uri: A link to display in the Read More icon
      when: A conditional to use when deciding if this analysis outcome is truthy
```

The `outcomes` attribute in an analyzer is an array of outcomes, each under a field that identifies if it's a `pass`, `warn` or `fail` result. Outcomes are evaluated in order until one returns true for the analyzer. Once an outcome returns true, no additional outcomes are evaluated for the analyzer. This allows you to write outcomes as you would a "select" or "switch" statement when programming. If there is no `when` attribute on an outcome, it will always return true when executed and be the displayed result.

## Title

The title attribute contains a title to display in the analyzer card on the UI. This should be a short message since it's limited to one line. If the text provided extends over 1 line, it will be truncated with an ellipsis. The title attribute does not support markdown; it's rendered as a header element. Each analyzer provides a default title if one is not specified in the spec.

## Message

The message attribute is a text message that shows in smaller font below the title. If this is not provided, there is no built-in or automatic text that is rendered here. The message attribute supports markdown and can be used to display links, bold, emphasis and other basic formatting.

## URI

When the uri attribute is present, a small "Read More" icon will be displayed on the card. This will be connected to the URI provided in this attribute.

## When

Some analyzers implement the `when` attribute. The details and implementation of this attribute vary between analyzers.
For example, the [cluster version](https://troubleshoot.sh/docs/analyze/cluster-version/) analyzer uses this as a semver comparator, while the [image-pull-secrets](https://troubleshoot.sh/docs/analyze/image-pull-secrets/) analyzer does not need `when` at all; its output is simply binary. In some cases, the `when` attribute can be used with the logical operators below to compare values. Which of these operators are supported depends on the specific analyzer, but they all compare scalar values. For more information about the analyzers and example definitions, see the [Troubleshoot Analyzer documentation](https://troubleshoot.sh/docs/analyze/) and select a specific analyzer from the content list.

#### Logical operators

| Operator | Description |
|----------|-------------|
| `=`, `==`, `===` | Equal to comparison |
| `!=`, `!==` | Not equal to comparison |
| `>` | Greater than comparison |
| `<` | Less than comparison |
| `<=` | Less than or equal to comparison |
| `>=` | Greater than or equal to comparison |

---

The `PostgreSQL` analyzer is available to check version and connection status of a PostgreSQL database. It relies on the data collected by the [PostgreSQL collector](/docs/collect/postgresql/). The analyzer's outcome `when` clause may be used to evaluate the database connection status or a semver range to compare against the running version, and supports standard comparison operators.

## Parameters

**checkName:** Optional name.

**collectorName:** (Recommended) Must match the `collectorName` specified by the postgres collector.

## Outcomes

The `when` value in an outcome of this analyzer contains the connection or version information. The conditional in the `when` value supports the following:

**connected:** A boolean representing whether the database is connected. Can be compared to a boolean value with the `==` operator.

**version:** A string representing the semantic version of the database.
Can be compared to a semver string using `<`, `<=`, `>`, `>=`, `==`, `!=`, with the letter 'x' as a version wildcard (10.x). The 'x' is parsed as '0'.

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: supported-postgres-version
spec:
  collectors:
    - postgres:
        collectorName: postgresql
        uri: 'postgresql://user:password@hostname:5432/dbname?sslmode=require'
  analyzers:
    - postgres:
        checkName: Must be PostgreSQL 10.x or later
        collectorName: postgresql
        outcomes:
          - fail:
              when: connected == false
              message: Cannot connect to PostgreSQL server
          - fail:
              when: version < 10.x
              message: The PostgreSQL server must be at least version 10
          - pass:
              message: The PostgreSQL server is ready
```

## Test PostgreSQL Analyzer locally

If you want to test it locally, you can spin up a postgres database running the following Docker command. Be sure to specify the image version `postgres:<version>`. In this case, the version is 11.9:

```shell
docker run --rm --name some-postgres -p 5432:5432 -e POSTGRES_PASSWORD=mysecretpassword -e POSTGRES_USER=postgres -d postgres:11.9
```

You should use the following `uri` in the collector:

```yaml
uri: postgresql://postgres:mysecretpassword@localhost:5432/postgres?sslmode=disable
```

Once it's running, you can run preflight and test the results.

---

The `Redis` analyzer is available to check version and connection status of a Redis database. It relies on the data collected by the [Redis collector](/docs/collect/redis/). The analyzer's outcome `when` clause may be used to evaluate the database connection status or a semver range to compare against the running version, and supports standard comparison operators.

## Parameters

**checkName:** Optional name.

**collectorName:** (Recommended) Must match the `collectorName` specified by the redis collector.

## Outcomes

The `when` value in an outcome of this analyzer contains the connection or version information.
The conditional in the `when` value supports the following:

**connected:** A boolean representing whether the database is connected. Can be compared to a boolean value with the `==` operator.

**version:** A string representing the semantic version of the database. Can be compared to a semver string using `<`, `<=`, `>`, `>=`, `==`, `!=`, with the letter 'x' as a version wildcard (10.x). The 'x' is parsed as '0'.

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: supported-redis-version
spec:
  collectors:
    - redis:
        collectorName: redis
        uri: 'redis://redis:replicated@server:6379'
  analyzers:
    - redis:
        checkName: Must be Redis 7.x or later
        collectorName: redis
        outcomes:
          - fail:
              when: connected == false
              message: Cannot connect to Redis server
          - fail:
              when: version < 7.x
              message: The Redis server must be at least version 7
          - pass:
              message: The Redis server is ready
```

## Test Redis Analyzer locally

If you want to test it locally, you can spin up a redis database running the following Docker command. Be sure to specify the image version `redis:<version>`. In this case, the version is 7.2:

```shell
docker run --rm --name some-redis -d -p 6379:6379 redis:7.2
```

You should use the following `uri` in the collector:

```yaml
uri: redis://localhost:6379
```

Once it's running, you can run preflight and test the results.

---

The regex analyzer is used to run arbitrary regular expressions against text data collected into a bundle. You can use the regex analyzer with any text data collector, such as the `data`, `runPod`, `runDaemonSet`, `copy`, `logs`, and `exec` collectors.

## Parameters

Either `regex` or `regexGroups` must be set, but not both. This analyzer uses the [`regexp`](https://pkg.go.dev/regexp) package from the Go standard library, which implements [RE2 regular expression syntax](https://github.com/google/re2/wiki/Syntax).

**regex**: (Optional) A regex pattern to test.
If the pattern matches the file, the outcome that has `when` set to `"true"` will be executed. If no `when` expression has been specified, the `pass` outcome defaults to `"true"`.

**regexGroups**: (Optional) A regex pattern to match. Matches from named capturing groups are available to `when` expressions in outcomes. The captured group names can be used as template variables in the outcome message. These template variables will be replaced by the strings extracted through regular expression matching.

**fileName**: (Required) Path to the file in the support bundle to analyze. This can be an exact name, a prefix, or a file path pattern as defined by Go's [`filepath.Match`](https://pkg.go.dev/path/filepath#Match) function.

**ignoreIfNoFiles**: (Optional) If no file matches, this analyzer will produce a warn outcome by default. This flag can be set to `true` in order to suppress the warning.

## Example Analyzer Definition for regex

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: my-app
spec:
  collectors:
    - logs:
        selector:
          - app=my-app
        name: my-app
  analyzers:
    - textAnalyze:
        checkName: Database Authentication
        fileName: my-app/my-app-0/my-app.log
        regex: 'FATAL: password authentication failed for user'
        outcomes:
          - pass:
              when: "false"
              message: "Database credentials okay"
          - fail:
              when: "true"
              message: "Problem with database credentials"
```

## Example Analyzer Definition for regexGroups

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: ping
spec:
  collectors:
    - run:
        collectorName: "run-ping"
        image: busybox:1
        name: ping.txt
        namespace: default
        command: ["ping"]
        args: ["-w", "10", "-c", "10", "-i", "0.3", "www.google.com"]
        imagePullPolicy: IfNotPresent
  analyzers:
    - textAnalyze:
        checkName: "run-ping"
        fileName: ping.txt/run-ping.log
        regexGroups: '(?P<Transmitted>\d+) packets? transmitted, (?P<Received>\d+) packets? received, (?P<Loss>\d+)(\.\d+)?% packet loss'
        outcomes:
          - pass:
              when: "Loss < 5"
              message: "{{ .Transmitted }} packets transmitted and {{ .Received }} packets received with {{ .Loss }}% packet loss"
          - fail:
              message: "High packet loss of {{ .Loss }}%"
```

---

The `registryImages` analyzer is available to check the output of the [Registry Images](/docs/collect/registry-images/) collector. The analyzer provides a set of predefined results that can be used in the analyzer's outcome `when` clauses.

## Parameters

**checkName:** Optional name.

**collectorName:** (Recommended) Must match the `collectorName` specified by the `registryImages` collector.

## Outcomes

The conditional in the `when` value supports the following:

**missing:** An integer representing the number of missing images.

**errors:** An integer representing the number of images that could not be checked due to errors.

**verified:** An integer representing the number of images that were successfully verified.

## Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: registry-images
spec:
  collectors:
    - registryImages:
        images:
          - "alpine:3.9"
          - "nginx:latest"
  analyzers:
    - registryImages:
        checkName: Private Registry Images
        outcomes:
          - fail:
              when: "missing > 0"
              message: Images are missing from registry
          - warn:
              when: "errors > 0"
              message: Failed to check if images are present in registry
          - pass:
              message: All images are present in registry
```

---

The `replicasetStatus` analyzer is used to report on the number of replicas that are "Ready" or "Available" in a ReplicaSet. The `when` attribute supports standard comparators to compare the number of ready replicas. The `replicasetStatus` analyzer uses data from the [clusterResources collector](https://troubleshoot.sh/collect/cluster-resources). The `clusterResources` collector is automatically added and will always be present. The target replicaset can be identified by name or a label selector.
If name is specified, selectors will be ignored. The outcomes on this analyzer will be processed in order, and execution will stop after the first outcome that is truthy. Outcomes are optional in this analyzer. If no outcomes are specified, the ReplicaSet's spec and status will be examined to automatically determine its availability. In this case, only the ReplicaSets that do not satisfy their own availability requirements will be reported in the result. ## Parameters **selector**: (Optional) The label selector used to find the replicasets to check. If a selector is specified, the analyzer will only be applied to the replicasets that match it. If neither selector nor name is specified, all replicasets will be analyzed. **name**: (Optional) The name of the replicaset to check. If a name is specified, selector will be ignored, and only the replicaset with the matching name will be analyzed. **namespace**: (Optional) The namespace to look for the replicaset in. If specified, analysis will be limited to replicasets in this namespace. **namespaces**: (Optional) The namespaces to look for the replicasets in. If specified, analysis will be limited to replicasets in these namespaces. ## Example Analyzer Definition The example below shows how to analyze a specific ReplicaSet with custom outcomes: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: rook-replicaset-ready spec: analyzers: - replicasetStatus: selector: - app=csi-cephfsplugin-provisioner namespace: rook-ceph outcomes: - fail: when: "ready < 2" message: The csi-cephfsplugin-provisioner replicaset does not have enough ready replicas. - warn: when: "available < 2" message: The csi-cephfsplugin-provisioner replicaset does not have enough available replicas. - pass: message: There are multiple replicas of the csi-cephfsplugin-provisioner replicaset available.
``` The example below shows how to analyze all ReplicaSets: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: replicasets-ready spec: analyzers: - replicasetStatus: {} ``` --- The `s3Status` analyzer is available to check the connection status of an S3 or S3-compatible bucket. It relies on the data collected by the [S3 Status collector](/docs/collect/s3-status/). The analyzer's outcome `when` clause may be used to evaluate the bucket connection status, and supports standard comparison operators. When a fail outcome matches and the collected result contains an error, the error message is appended to the outcome message. ## Parameters **checkName:** (Optional) Analyzer name. **collectorName:** (Recommended) Must match the `collectorName` specified by the `s3Status` collector. ## Outcomes The `when` value in an outcome of this analyzer contains the connection information. The conditional in the `when` value supports the following: **connected:** A boolean representing whether the bucket is accessible. Can be compared to a boolean value with the `==` operator. ## Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: s3-bucket-check spec: collectors: - s3Status: collectorName: my-bucket bucketName: my-app-data endpoint: https://minio.example.com accessKeyID: minioadmin secretAccessKey: minioadmin usePathStyle: true analyzers: - s3Status: checkName: S3 Bucket Accessible collectorName: my-bucket outcomes: - fail: when: connected == false message: Cannot access the S3 bucket. - pass: when: connected == true message: S3 bucket is accessible. ``` --- The Secret analyzer is available to require or warn if a specific Kubernetes Secret is not present or does not contain a required key. The `when` attribute is not supported in the outcomes of this analyzer. Collectors do not automatically include Secrets because these often contain sensitive information.
The [secret collector](https://troubleshoot.sh/docs/collect/secret/) can be included in a set of collectors to include data about the Secret. It's not recommended, and therefore not the default, to include the values of Secrets. The most common use of this analyzer is to detect the existence of a specific key in a specific Secret. ## Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: preflight-sample spec: collectors: - secret: name: my-app-postgres namespace: default key: uri includeValue: false # this is the default, specified here for clarity analyzers: - secret: checkName: Postgres URI Secret secretName: my-app-postgres namespace: default key: uri outcomes: - fail: message: The `my-app-postgres` Secret was not found or the `uri` key was not detected. - pass: message: The Postgres URI was found in a Secret in the cluster. ``` --- The `statefulsetStatus` analyzer is used to report on the number of replicas that are "Ready" in a statefulset. The `when` attribute supports standard comparators to compare the number of ready replicas. The `statefulsetStatus` analyzer uses data from the [clusterResources collector](https://troubleshoot.sh/collect/cluster-resources). The `clusterResources` collector is automatically added and will always be present. The target statefulset can be identified by name. The outcomes on this analyzer will be processed in order, and execution will stop after the first outcome that is truthy. Outcomes are optional in this analyzer. If no outcomes are specified, the statefulset's spec and status will be examined to automatically determine its status. In this case, only failed statefulsets will be reported in the results. ## Parameters **name**: (Optional) The name of the statefulset to check. If name is omitted, all statefulsets will be analyzed. **namespace**: (Optional) The namespace to look for the statefulset in. If specified, analysis will be limited to statefulsets in this namespace.
**namespaces**: (Optional) The namespaces to look for the statefulsets in. If specified, analysis will be limited to statefulsets in these namespaces. ## Example Analyzer Definition The example below shows how to analyze a specific StatefulSet with custom outcomes: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: redis-statefulset-running spec: analyzers: - statefulsetStatus: name: redis namespace: default outcomes: - fail: when: "absent" # note that the "absent" failure state must be listed first if used. message: The redis statefulset is not present. - fail: when: "< 1" message: The redis statefulset does not have any ready replicas. - warn: when: "= 1" message: The redis statefulset has only a single ready replica. - pass: message: There are multiple replicas of the redis statefulset ready. ``` The example below shows how to analyze all StatefulSets: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: statefulsets-running spec: analyzers: - statefulsetStatus: {} ``` --- ### Use Cases > The ability to check for a default storage class was introduced in Kots 1.19.0 and Troubleshoot 0.9.42. There are two use cases for the Storage Class Analyzer: - Check for the presence of a specific storage class, in which case `storageClassName` must be provided (Example 1) - Check if there is a storage class set as default. The analyzer checks if there is any storage class with the `isDefaultStorageClass` field set to `true`. (Examples 2 and 3) In the second case, all arguments are optional. If none are provided, default messages will indicate that no default Storage Class was found.
#### Example 1: Check for a specific storage class ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: preflight-sample spec: analyzers: - storageClass: checkName: Required storage classes storageClassName: "microk8s-hostpath" outcomes: - fail: message: The microk8s storage class was not found - pass: message: All good on storage classes ``` #### Example 2: Check for the presence of a default storage class ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: preflight-sample spec: analyzers: - storageClass: checkName: Check for default storage class outcomes: - fail: message: No default storage class found - pass: message: Default storage class found ``` #### Example 3: Check for the presence of a default storage class using default messages and checkName Defaults for the storageClass analyzer are: - `checkName` = 'Default Storage Class' - Fail Message = 'No default storage class found' - Pass Message = 'Default Storage Class found' ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: preflight-sample spec: analyzers: - storageClass: {} ``` --- The `sysctl` analyzer checks the output of the [Sysctl](/docs/collect/sysctl/) collector. ## Parameters *There are no parameters available for this analyzer.* ## Outcomes The conditional in the `when` value tests whether a `sysctl` parameter is equal to a value for at least one node. For example, the conditional `when: net.ipv4.ip_forward = 0` evaluates to `true` if at least one node is found to have IP forwarding disabled. All nodes for which the condition is `true` are prefixed to the message in the outcome. For example, if the outcome message is `IP forwarding not enabled`, and the nodes `a.example.com` and `c.example.com` matched the condition, the result message would be `Nodes a.example.com, c.example.com: IP forwarding not enabled`.
## Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - sysctl: image: debian:buster-slim analyzers: - sysctl: checkName: IP forwarding enabled outcomes: - fail: when: "net.ipv4.ip_forward = 0" message: "IP forwarding is not enabled" ``` --- The `Velero` analyzer is available to check the statuses of Custom Resources installed by Velero, such as backup storage locations, backup repositories, backups, and restores. ## Parameters **collectorName:** (N/A) Velero currently does not require a special collector, as all the Custom Resources are already collected in the support bundle by the `Cluster Resources` collector. ## Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: velero spec: collectors: - clusterResources: {} # implicitly added if not defined in a spec - logs: namespace: velero name: velero/logs analyzers: - velero: {} ``` **Note** For the logs collector: - `name` should always be `velero/logs`, as it's the default path created in the support bundle and read by the velero analyzer. - `namespace` can be changed if Velero was installed in a different namespace. ## Included Analyzers ### backuprepositories.velero.io Checks that at least 1 backup repository is configured and available. ### backups.velero.io Warns if one or more backups are in any of the following phases: - `BackupPhaseFailed` - `BackupPhasePartiallyFailed` - `BackupPhaseFailedValidation` - `BackupPhaseFinalizingPartiallyFailed` - `BackupPhaseWaitingForPluginOperationsPartiallyFailed` ### backupstoragelocations.velero.io Checks that at least 1 backup storage location is configured and available. ### deletebackuprequests.velero.io Generates a summary of 'delete backup' requests if any are found in progress. ### podvolumebackups.velero.io Generates a 'pod volume backup' count summary and reports any failures.
### podvolumerestores.velero.io Generates a 'pod volume restore' count summary and reports any failures. ### restores.velero.io Generates a 'restore' count summary and reports any failures. Failures are reported if any of the following states are found: - `RestorePhaseFailed` - `RestorePhasePartiallyFailed` - `RestorePhaseFailedValidation` - `RestorePhaseWaitingForPluginOperationsPartiallyFailed` ### schedules.velero.io Generates a 'schedule' count summary and reports any failures (`SchedulePhaseFailedValidation`). ### volumesnapshotlocations.velero.io Generates a 'volume snapshot location' count summary and reports any that are found to be unavailable. ### node-agent logs Analyzes the logs for the Velero node agent. This analyzer will only run if the `logs` collector is included in the bundle spec. Checks for the following strings in `node-agent*` pod log file(s): - `error|panic|fatal` - `permission denied` --- The Weave analyzer runs several checks to detect problems with the [Weave](https://www.weave.works/docs/net/latest/kubernetes/kube-addon/) Container Network Interface (CNI) provider. ## Parameters **reportFileGlob:** Filepath in the support bundle for collected Weave reports. ## Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: weave-sample spec: collectors: - exec: collectorName: weave-report command: - /home/weave/weave args: - --local - report containerName: weave exclude: "" name: kots/kurl/weave namespace: kube-system selector: - name=weave-net timeout: 10s analyzers: - weaveReport: reportFileGlob: 'kots/kurl/weave/kube-system/*/weave-report-stdout.txt' ``` ## Included Analyzers ### IPAM Pool Utilization A warning will be generated when at least 85% of the available IPs in the Weave subnet are in use by pods. ### IPAM Pending Allocation A warning will be generated when there are pods waiting to be allocated an IP. This indicates that there are currently no available IPs in the pool.
### Connection Not Established A warning will be generated when a connection between nodes is not in the established state. A connection in the pending state may indicate that UDP is blocked between nodes and a connection in the failed state may indicate that the Weave pod on the peer node is not ready. ### Connection Protocol Sleeve A warning will be generated when the connection between nodes is using the sleeve protocol rather than the fastdp protocol. --- The YAML compare analyzer is used to compare a YAML snippet with part or all of a collected YAML file. ## Parameters **fileName**: (Required) Path of the collected file to analyze. **value**: (Required) YAML value to compare. If the value matches the collected file, the outcome that has `when` set to `"true"` will be executed. If a `when` expression is not specified, the `pass` outcome defaults to `"true"`. **path**: (Optional) Portion of the collected YAML file to compare against. The default behavior is to compare against the entire collected file. ## Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: yaml-compare-example spec: collectors: - data: name: example.yaml data: | foo: bar stuff: foo: bar bar: foo morestuff: - foo: bar: 123 analyzers: - yamlCompare: checkName: Compare YAML Example fileName: example.yaml path: "morestuff.[0].foo" value: | bar: 123 outcomes: - fail: when: "false" message: The collected data does not match the value. - pass: when: "true" message: The collected data matches the value. 
``` ## Example Analyzer Definition using Templating in the Outcome Messages ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: yaml-compare-example spec: collectors: - data: name: example.yaml data: | stuff: status: ready info: foo analyzers: - yamlCompare: checkName: Compare YAML Example fileName: example.yaml path: "stuff.status" value: | "ready" outcomes: - fail: when: "false" message: "Not Ready, Info: {{ .stuff.info }}" - pass: when: "true" message: "Ready, Info: {{ .stuff.info }}" ``` --- ## Kubernetes Cluster Info - [clusterInfo](./cluster-info): collects basic information about the cluster - [clusterResources](./cluster-resources): enumerates all available resources in the cluster ## Data and logs - [logs](./logs): collects logs (stdout and stderr) from pods and includes them in the collected output - [copy](./copy): copies files or folders from a pod into the collected output - [copy-from-host](./copy-from-host): copies files or folders from all hosts into the collected output - [data](./data): writes static or predefined data into the collected output - [configmap](./configmap): includes information about Kubernetes ConfigMaps in the collected output - [secret](./secret): includes information about Kubernetes Secrets in the collected output - [collectd](./collectd): includes collectd files from all hosts in the cluster - [dns](./dns): includes data to troubleshoot DNS resolution issues - [etcd](./etcd): includes data to troubleshoot etcd, the Kubernetes cluster's backing store ## Generated and dynamic data - [runPod](./run-pod): runs new pods and includes the results in the collected output - [runDaemonSet](./run-daemonset): runs a DaemonSet and includes the results for all nodes in the collected output - [http](./http): executes HTTP requests and includes results in the collected output - [exec](./exec): execs into existing pods and runs commands to include in the collected output ## Databases - [postgresql](./postgresql): collects
information related to a postgresql server - [mysql](./mysql): collects information related to a mysql server - [redis](./redis): collects information related to a redis cluster ## CSI - [ceph](./ceph): collects information about a ceph installation - [longhorn](./longhorn): collects information about a longhorn installation ## Registry - [registryImages](./registry-images): collects information about image existence in a registry --- The `ceph` collector will add information about a Ceph cluster to a support bundle. ## Parameters The `ceph` collector has the following parameters: ##### `collectorName` (Optional) The name of the collector. ##### `namespace` (Optional) The namespace of the Ceph cluster. If this is not provided, it will default to `rook-ceph`. ## Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - ceph: {} ``` ## Included resources ### `/ceph/([collector-name]/)status.json` The output of command `ceph status -f json-pretty`. ### `/ceph/([collector-name]/)status-txt.txt` The output of command `ceph status` (plain text format). ### `/ceph/([collector-name]/)fs.json` The output of command `ceph fs status -f json-pretty`. ### `/ceph/([collector-name]/)fs-txt.txt` The output of command `ceph fs status` (plain text format). ### `/ceph/([collector-name]/)fs-ls.json` The output of command `ceph fs ls -f json-pretty`. ### `/ceph/([collector-name]/)osd-status.json` The output of command `ceph osd status -f json-pretty`. ### `/ceph/([collector-name]/)osd-tree.json` The output of command `ceph osd tree -f json-pretty`. ### `/ceph/([collector-name]/)osd-pool.json` The output of command `ceph osd pool ls detail -f json-pretty`. ### `/ceph/([collector-name]/)health.json` The output of command `ceph health detail -f json-pretty`. ### `/ceph/([collector-name]/)auth.json` The output of command `ceph auth ls -f json-pretty`.
### `/ceph/([collector-name]/)rgw-stats.json` The output of command `radosgw-admin bucket stats --rgw-cache-enabled=false`. ### `/ceph/([collector-name]/)rbd-du-txt.txt` The output of command `rbd du --pool=replicapool`. ### `/ceph/([collector-name]/)df.json` The output of command `ceph df -f json-pretty`. ### `/ceph/([collector-name]/)df-txt.txt` The output of command `ceph df` (plain text format). ### `/ceph/([collector-name]/)osd-df.json` The output of command `ceph osd df -f json-pretty`. ### `/ceph/([collector-name]/)osd-df-txt.txt` The output of command `ceph osd df` (plain text format). --- The `certificates` collector can be used to gather information about the TLS certificates from Kubernetes ConfigMaps and Secrets. This collector can be used multiple times, referencing different Secrets and ConfigMaps. ## Parameters In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `certificates` collector accepts the following parameters: ##### `secrets` (Optional) Find matching Secrets across one or more namespaces. If specified, the Secrets in the list are collected. The `secrets` field at the collector level accepts a list of objects with the following parameters: - ##### `name` (Required) The name of the Secret. - ##### `namespaces` (Optional) The namespaces where the Secret exists. If multiple namespaces are specified, all matching Secrets from these namespaces are collected. ##### `configMaps` (Optional) Find matching ConfigMaps across one or more namespaces. If specified, the ConfigMaps in the list are collected. The `configMaps` field at the collector level accepts a list of objects with the following parameters: - ##### `name` (Required) The name of the ConfigMap. - ##### `namespaces` (Optional) The namespaces where the ConfigMap exists. If multiple namespaces are specified, all matching ConfigMaps from these namespaces are collected. 
## Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: default spec: collectors: - certificates: secrets: - name: envoycert namespaces: - kube-system - projectcontour - name: kube-root-ca.crt namespaces: - default - kube-public configMaps: - name: kube-root-ca.crt namespaces: - curlie - kurl ``` ## Example ConfigMap ```yaml apiVersion: v1 kind: ConfigMap metadata: annotations: kubernetes.io/description: Contains a CA bundle that can be used to verify the kube-apiserver when using internal endpoints such as the internal service IP or kubernetes.default.svc. No other usage is guaranteed across distributions of Kubernetes clusters. name: kube-root-ca.crt namespace: kurl data: ca.crt: | -----BEGIN CERTIFICATE----- valid cert -----END CERTIFICATE----- ``` ## Included resources When this collector is executed, it includes the following file in a support bundle. All certificate metadata collected is stored in this file as a JSON array of objects. Each object in the array contains a `source` object that identifies where the certificate metadata was collected from. ### `/certificates/certificates.json` ```json [ { "source": { "configMap": "kube-root-ca.crt", "namespace": "kurl" }, "certificateChain": [ { "certificate": "ca.crt", "subject": "CN=kubernetes", "subjectAlternativeNames": [ "kubernetes" ], "issuer": "CN=kubernetes", "notAfter": "2033-04-13T22:09:47Z", "notBefore": "2023-04-16T22:09:47Z", "isValid": true, "isCA": true } ] }, { ...
} ] ``` If an error is encountered, this collector includes the following file: ### `/certificates/certificates.json` ```json [ { "source": { "configMap": "kube-root-ca.crt", "namespace": "curlie" }, "errors": [ "Either the configMap does not exist in this namespace or RBAC permissions are preventing certificate collection" ] } ] ``` --- The `clusterInfo` collector will add common information about a Kubernetes cluster. This collector is a default collector and it will be automatically included in your collector spec if you don't include it. This collector cannot be removed. ## Parameters The `clusterInfo` collector supports the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties) and no additional parameters. ## Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - clusterInfo: {} ``` ## Included resources When the `clusterInfo` collector is executed, it will include the following file(s): ### `/cluster-info/cluster_version.json` This file contains information describing the Kubernetes cluster version. ```json { "info": { "major": "1", "minor": "13", "gitVersion": "v1.13.6", "gitCommit": "abdda3f9fefa29172298a2e42f5102e777a8ec25", "gitTreeState": "clean", "buildDate": "2019-05-08T13:46:28Z", "goVersion": "go1.11.5", "compiler": "gc", "platform": "linux/amd64" }, "string": "v1.13.6" } ``` --- The `clusterResources` collector will enumerate all resources of known types that are deployed to the cluster. This will attempt to collect information from all namespaces, but if RBAC policies prevent the collector from accessing a namespace or resource, it will still include the resources that are accessible. Any RBAC policy errors will be included in the collected output. This collector is a default collector and it will be automatically included in your collector spec if you don't include it. This collector cannot be removed.
## Parameters In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `clusterResources` collector accepts the following parameters: ##### `namespaces` (Optional) The list of namespaces from which the resources and information will be collected. If not specified, it will default to collecting information from all namespaces. ##### `ignoreRBAC` (Optional) Defaults to `false`. When set to `true`, skip checking for RBAC authorization before collecting resource information from each namespace. This is useful when your cluster uses [authorization webhooks](https://kubernetes.io/docs/reference/access-authn-authz/webhook/) that do not support SelfSubjectRuleReviews. ## Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - clusterResources: namespaces: - default - myapp-namespace ``` ## Included resources When the `clusterResources` collector is executed, it will include the following file(s): ### `/cluster-resources/namespaces.json` This file contains information about all known namespaces in the cluster ```json [ { "metadata": { "name": "default", "selfLink": "/api/v1/namespaces/default", "uid": "5b8eebdc-70e6-11e9-a49e-42010aa8001a", "resourceVersion": "4", "creationTimestamp": "2019-05-07T16:37:09Z" }, "spec": { "finalizers": [ "kubernetes" ] }, "status": { "phase": "Active" } }, { "metadata": { "name": "kube-public", "selfLink": "/api/v1/namespaces/kube-public", "uid": "5b974803-70e6-11e9-a49e-42010aa8001a", "resourceVersion": "21", "creationTimestamp": "2019-05-07T16:37:09Z" }, "spec": { "finalizers": [ "kubernetes" ] }, "status": { "phase": "Active" } }, { "metadata": { "name": "kube-system", "selfLink": "/api/v1/namespaces/kube-system", "uid": "5b96e1a9-70e6-11e9-a49e-42010aa8001a", "resourceVersion": "19", "creationTimestamp": "2019-05-07T16:37:09Z" }, "spec": { "finalizers": [ "kubernetes" ] }, "status": { "phase": 
"Active" } } ] ``` ### `/cluster-resources/nodes.json` This file contains information about all of the nodes in the cluster. This is equivalent to running `kubectl get nodes -o json`. ### `/cluster-resources/storage-classes.json` This file contains information about all installed storage classes in the cluster. This is equivalent to running `kubectl get storageclasses -o json`. ```json [ { "metadata": { "name": "microk8s-hostpath", "selfLink": "/apis/storage.k8s.io/v1beta1/storageclasses/microk8s-hostpath", "uid": "024f5ccf-9ba5-11e9-8bb5-42010aa8001a", "resourceVersion": "6622060", "creationTimestamp": "2019-07-01T02:07:42Z", "annotations": { "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"storage.k8s.io/v1\",\"kind\":\"StorageClass\",\"metadata\":{\"annotations\":{\"storageclass.kubernetes.io/is-default-class\":\"true\"},\"name\":\"microk8s-hostpath\"},\"provisioner\":\"microk8s.io/hostpath\"}\n", "storageclass.kubernetes.io/is-default-class": "true" } }, "provisioner": "microk8s.io/hostpath", "reclaimPolicy": "Delete", "volumeBindingMode": "Immediate" } ] ``` ### `/cluster-resources/custom-resource-definitions.json` This file contains information about all installed CRDs in the cluster. 
```json [ { "metadata": { "name": "clusters.clusters.replicated.com", "selfLink": "/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/clusters.clusters.replicated.com", "uid": "b1ff5bfe-7c9c-11e9-82ad-42010aa8001a", "resourceVersion": "1783952", "generation": 1, "creationTimestamp": "2019-05-22T14:20:05Z", "labels": { "controller-tools.k8s.io": "1.0" }, "annotations": { "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"apiextensions.k8s.io/v1beta1\",\"kind\":\"CustomResourceDefinition\",\"metadata\":{\"annotations\":{},\"creationTimestamp\":null,\"labels\":{\"controller-tools.k8s.io\":\"1.0\"},\"name\":\"clusters.clusters.replicated.com\"},\"spec\":{\"group\":\"clusters.replicated.com\",\"names\":{\"kind\":\"Cluster\",\"plural\":\"clusters\"},\"scope\":\"Namespaced\",\"validation\":{\"openAPIV3Schema\":{\"properties\":{\"apiVersion\":{\"description\":\"APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#resources\",\"type\":\"string\"},\"kind\":{\"description\":\"Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. 
More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#types-kinds\",\"type\":\"string\"},\"metadata\":{\"type\":\"object\"},\"spec\":{\"properties\":{\"shipApiServer\":{\"type\":\"string\"},\"token\":{\"type\":\"string\"}},\"required\":[\"shipApiServer\",\"token\"],\"type\":\"object\"},\"status\":{\"type\":\"object\"}}}},\"version\":\"v1alpha1\"},\"status\":{\"acceptedNames\":{\"kind\":\"\",\"plural\":\"\"},\"conditions\":[],\"storedVersions\":[]}}\n" } }, "spec": { "group": "clusters.replicated.com", "version": "v1alpha1", "names": { "plural": "clusters", "singular": "cluster", "kind": "Cluster", "listKind": "ClusterList" }, "scope": "Namespaced", "validation": { "openAPIV3Schema": { "properties": { "apiVersion": { "description": "APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#resources", "type": "string" }, "kind": { "description": "Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. 
More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#types-kinds", "type": "string" }, "metadata": { "type": "object" }, "spec": { "type": "object", "required": [ "shipApiServer", "token" ], "properties": { "shipApiServer": { "type": "string" }, "token": { "type": "string" } } }, "status": { "type": "object" } } } }, "versions": [ { "name": "v1alpha1", "served": true, "storage": true } ], "conversion": { "strategy": "None" } }, "status": { "conditions": [ { "type": "NamesAccepted", "status": "True", "lastTransitionTime": "2019-05-22T14:20:05Z", "reason": "NoConflicts", "message": "no conflicts found" }, { "type": "Established", "status": "True", "lastTransitionTime": null, "reason": "InitialNamesAccepted", "message": "the initial names have been accepted" } ], "acceptedNames": { "plural": "clusters", "singular": "cluster", "kind": "Cluster", "listKind": "ClusterList" }, "storedVersions": [ "v1alpha1" ] } } ] ``` ### `/cluster-resources/daemonsets/[namespace]/[name].json` This file contains information about all daemonsets, separated by namespace. ### `/cluster-resources/deployments/[namespace]/[name].json` This file contains information about all deployments, separated by namespace. ### `/cluster-resources/cronjobs/[namespace]/[name].json` This file contains information about all cronjobs, separated by namespace. ### `/cluster-resources/jobs/[namespace]/[name].json` This file contains information about all jobs, separated by namespace. ### `/cluster-resources/replicasets/[namespace]/[name].json` This file contains information about all replicasets, separated by namespace. ### `/cluster-resources/statefulsets/[namespace]/[name].json` This file contains information about all statefulsets, separated by namespace. ### `/cluster-resources/services/[namespace]/[name].json` This file contains information about all services, separated by namespace. 
### `/cluster-resources/endpoints/[namespace]/[name].json` This file contains information about all endpoints, separated by namespace. ### `/cluster-resources/pods/[namespace]/[name].json` This file contains information about all pods, separated by namespace. ### `/cluster-resources/pods/logs/[namespace]/[pod]/[container].log` This file contains logs from current containers for pods that have terminated with an error or are crash-looping. The maximum size for a pod's log file is 5MB. ### `/cluster-resources/pods/logs/[namespace]/[pod]/[container]-previous.log` This file contains logs from previous containers for pods that have terminated with an error or are crash-looping. The maximum size for a pod's log file is 5MB. ### `/cluster-resources/ingress/[namespace]/[name].json` This file contains information about all ingresses, separated by namespace. ### `/cluster-resources/configmaps/[namespace]/[name].json` This file contains information about all configmaps, separated by namespace. ### `/cluster-resources/serviceaccounts/[namespace]/[name].json` This file contains information about all serviceaccounts, separated by namespace. ### `/cluster-resources/leases/[namespace]/[name].json` This file contains information about all leases, separated by namespace. ### `/cluster-resources/groups.json` This file contains information about all Kubernetes API resource groups in the cluster. The below is a partial example only. Actual results will be significantly longer.
```json
[
  {
    "name": "",
    "versions": [
      {
        "groupVersion": "v1",
        "version": "v1"
      }
    ],
    "preferredVersion": {
      "groupVersion": "v1",
      "version": "v1"
    }
  },
  {
    "name": "apiregistration.k8s.io",
    "versions": [
      {
        "groupVersion": "apiregistration.k8s.io/v1",
        "version": "v1"
      },
      {
        "groupVersion": "apiregistration.k8s.io/v1beta1",
        "version": "v1beta1"
      }
    ],
    "preferredVersion": {
      "groupVersion": "apiregistration.k8s.io/v1",
      "version": "v1"
    }
  },
  ...
```

### `/cluster-resources/resources.json`

This file contains information about all Kubernetes API resources in the cluster. The below is a partial example only. Actual results will be significantly longer.

```json
[
  {
    "kind": "APIResourceList",
    "groupVersion": "v1",
    "resources": [
      {
        "name": "bindings",
        "singularName": "",
        "namespaced": true,
        "kind": "Binding",
        "verbs": [
          "create"
        ]
      },
      {
        "name": "componentstatuses",
        "singularName": "",
        "namespaced": false,
        "kind": "ComponentStatus",
        "verbs": [
          "get",
          "list"
        ],
        "shortNames": [
          "cs"
        ]
      },
      {
        "name": "configmaps",
        "singularName": "",
        "namespaced": true,
        "kind": "ConfigMap",
        "verbs": [
          "create",
          "delete",
          "deletecollection",
          "get",
          "list",
          "patch",
          "update",
          "watch"
        ],
        "shortNames": [
          "cm"
        ]
      },
      ...
```

### `/cluster-resources/events/[namespace].json`

> Collection of Kubernetes events was introduced in Kots 1.19.0 and Troubleshoot 0.9.42.

Each file contains information about Kubernetes events in each namespace of the cluster. The below is a partial example only. Actual results will be significantly longer.
```json
[
  {
    "metadata": {
      "name": "coredns-5644d7b6d9-dqt6l.1630b6076a8d13b4",
      "namespace": "kube-system",
      "selfLink": "/api/v1/namespaces/kube-system/events/coredns-5644d7b6d9-dqt6l.1630b6076a8d13b4",
      "uid": "f0e347ac-910f-4a14-bb54-e6805425e09b",
      "resourceVersion": "325449",
      "creationTimestamp": "2020-09-01T16:33:30Z"
    },
    "involvedObject": {
      "kind": "Pod",
      "namespace": "kube-system",
      "name": "coredns-5644d7b6d9-dqt6l",
      "uid": "6e57304c-af69-4d91-a0e3-bb15112a0e94",
      "apiVersion": "v1",
      "resourceVersion": "100939",
      "fieldPath": "spec.containers{coredns}"
    },
    "reason": "Unhealthy",
    "message": "Readiness probe failed: Get http://***HIDDEN***:8181/ready: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)",
    "source": {
      "component": "kubelet",
      "host": "docker-desktop"
    },
    "firstTimestamp": "2020-09-01T16:33:30Z",
    "lastTimestamp": "2020-09-01T16:33:30Z",
    "count": 1,
    "type": "Warning",
    "eventTime": null,
    "reportingComponent": "",
    "reportingInstance": ""
  }
]
```

---

The `collectd` collector can be used to run a DaemonSet in the cluster with the parameters provided. The collector will delete and clean up this DaemonSet and any artifacts after it completes.

## Parameters

In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `collectd` collector accepts the following parameters:

##### `namespace`

(Optional) The namespace where the DaemonSet will be created. If not specified, it will assume the "current" namespace that the kubectl context is set to.

##### `image`

(Required) The image to use for the pods controlled by this DaemonSet.
This image should be accessible to the nodes in the cluster. The commands `sleep` and `tar` must be available in the image.

##### `hostPath`

(Required) Location of the files on the host systems.

##### `timeout`

(Optional) A [duration](https://golang.org/pkg/time/#Duration) that will be honored when collecting files. The timer starts after all pods in the DaemonSet become ready. If not specified, no timeout will be used.

##### `imagePullPolicy`

(Optional) A valid string representation of the policy to use when pulling the image. If not specified, this will be set to `IfNotPresent`.

##### `imagePullSecret`

(Optional) Troubleshoot offers two ways to use image pull secrets: referencing a pre-existing secret in the cluster by name, or dynamically creating a temporary secret that is used to pull the image and then deleted once the collector is done. The `imagePullSecret` field accepts the following parameters:

- If a pre-existing ImagePullSecret is used:
  - ##### `name` (required): The name of the pre-existing secret.

  ```yaml
  imagePullSecret:
    name: my-image-pull-secret
  ```

- If an ImagePullSecret will be created for the collector to pull the image:
  - ##### `name` (optional)
  - ##### `data`
    - ###### `.dockerconfigjson` (required) A string containing a valid base64-encoded docker config.json file.
  - ##### `type` (required) A string indicating that the secret is of type "kubernetes.io/dockerconfigjson".

  ```yaml
  imagePullSecret:
    name: mysecret
    data:
      .dockerconfigjson: ewoJICJhdXRocyI6IHsKCQksHR0cHM6Ly9pbmRleC5kb2NrZXIuaW8vdjEvIjoge30KCX0sCgkiSHR0cEhlYWRlcnMiOiB7CgkJIlVzZXItQWdlbnQiOiAiRG9ja2VyLUNsaWVudC8xOS4wMy4xMiAoZGFyd2luKSIKCX0sCgkiY3JlZHNTdG9yZSI6ICJkZXNrdG9wIiwKCSJleHBlcmltZW50YWwiOiAiZGlzYWJsZWQiLAoJInN0YWNrT3JjaGVzdHJhdG9yIjogInN3YXJtIgp9
    type: kubernetes.io/dockerconfigjson
  ```

Further information about the config.json file and dockerconfigjson secrets may be found [here](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/).
See the examples below for use cases. ## Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - collectd: collectorName: "collectd" image: busybox:1 namespace: default hostPath: "/var/lib/collectd/rrd" imagePullPolicy: IfNotPresent imagePullSecret: name: my-temporary-secret data: .dockerconfigjson: ewoJICJhdXRocyI6IHsKzCQksHR0cHM6Ly9pbmRleC5kb2NrZXIuaW8vdjEvIjoge30KCX0sCgkiSHR0cEhlYWRlcnMiOiB7CgkJIlVzZXItQWdlbnQiOiAiRG9ja2VyLUNsaWVudC8xOS4wMy4xMiAoZGFyd2luKSIKCX0sCgkiY3JlZHNTdG9yZSI6ICJkZXNrdG9wIiwKCSJleHBlcmltZW50YWwiOiAiZGlzYWJsZWQiLAoJInN0YWNrT3JjaGVzdHJhdG9yIjogInN3YXJtIgp9 type: kubernetes.io/dockerconfigjson ``` ## Included resources When this collector is executed, it will include the following files in a support bundle: ### `/collectd/rrd` This will contain a tar archive with rrd files or files with error information if the collector fails. --- ## Collectors Schema Each collector in the `collectors` array is one of the collectors defined in this section. ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: collectors spec: collectors: [] ``` An OpenAPI Schema for this type is published at: [https://github.com/replicatedhq/kots-lint/blob/main/kubernetes_json_schema/schema/troubleshoot/collector-troubleshoot-v1beta2.json](https://github.com/replicatedhq/kots-lint/blob/main/kubernetes_json_schema/schema/troubleshoot/collector-troubleshoot-v1beta2.json). ### Shared Properties The following properties are supported on all collectors: #### `collectorName` Optionally, a collector can specify the `collectorName` property. In some collectors this controls the path where result files will be stored in the support bundle. #### `exclude` For collectors that are optional, based on runtime available configuration, the conditional can be specified in the `exclude` property. 
This is useful for deployment techniques that allow templating for optional components (Helm and [KOTS](https://kots.io/vendor/packaging/template-functions/)). When this value is `true`, the collector will not be included. --- The `configMap` collector can be used to include metadata about ConfigMaps (and optionally the value) in the collected data. This collector can be included multiple times, referencing different ConfigMaps. ## Parameters In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `configMap` collector accepts the following parameters: ##### `name` (Required if no selector) The name of the ConfigMap. ##### `selector` (Required if no name) The selector to use to locate the ConfigMaps. Specified as a list of labels. If multiple labels are specified, only resources which match ALL of the labels will be collected. > Example: ```yaml collectors: - configMap: selector: - app.kubernetes.io/name=nginx - app.kubernetes.io/component=frontend ``` ##### `namespace` (Required) The namespace where the ConfigMap exists. ##### `key` (Optional) A key within the ConfigMap. Required if `includeValue` is `true`. ##### `includeValue` (Optional) Whether to include the key value. Defaults to false. ##### `includeAllData` (Optional) Whether to include all of the key-value pairs from the ConfigMap data. Defaults to false. 
## Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: sample
spec:
  collectors:
    - configMap:
        namespace: default
        name: my-configmap
        includeValue: true
        key: password
        includeAllData: true
```

## Example ConfigMap

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-configmap
  namespace: default
data:
  other-key: other-value
  password: mypass
```

## Included resources

When this collector is executed, it will include the following file in a support bundle:

### `/configmaps/[namespace]/[name]/[key].json`

```json
{
  "namespace": "default",
  "name": "my-configmap",
  "key": "password",
  "configMapExists": true,
  "keyExists": true,
  "value": "mypass",
  "data": {
    "other-key": "other-value",
    "password": "mypass"
  }
}
```

If `key` is not set in the collector spec, the file will be created at:

### `/configmaps/[namespace]/[name].json`

If there is an error encountered, it will include the following file:

### `/configmaps-errors/[namespace]/[name].json`

```json
[
  "configmaps \"my-configmap\" not found"
]
```

---

The `copyFromHost` collector can be used to copy files or an entire directory from hosts and include the contents in the collected data. This collector will collect files from all hosts in the cluster. This collector can be included multiple times to copy different files or directories.

## Parameters

In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `copyFromHost` collector accepts the following parameters:

##### `namespace`

(Optional) The namespace where the DaemonSet will be created. If not specified, it will assume the "current" namespace that the kubectl context is set to.

##### `name`

(Optional) The path to store the collected files. This is optional, and if not provided will default to `hostPath`.

##### `image`

(Required) The image to use for the pods controlled by this DaemonSet. This image should be accessible to the nodes in the cluster.
The commands `sleep` and `tar` must be available in the image.

##### `hostPath`

(Required) Location of the files on the host systems.

##### `extractArchive`

(Optional) By default the archive will not be extracted. Set to `true` to extract.

##### `timeout`

(Optional) A [duration](https://golang.org/pkg/time/#Duration) that will be honored when collecting files. The timer starts after all pods in the DaemonSet become ready. If not specified, no timeout will be used.

##### `imagePullPolicy`

(Optional) A valid string representation of the policy to use when pulling the image. If not specified, this will be set to `IfNotPresent`.

##### `imagePullSecret`

(Optional) Troubleshoot offers two ways to use image pull secrets: referencing a pre-existing secret in the cluster by name, or dynamically creating a temporary secret that is used to pull the image and then deleted once the collector is done. The `imagePullSecret` field accepts the following parameters:

- If a pre-existing ImagePullSecret is used:
  - ##### `name` (required): The name of the pre-existing secret.

  ```yaml
  imagePullSecret:
    name: my-image-pull-secret
  ```

- If an ImagePullSecret will be created for the collector to pull the image:
  - ##### `name` (optional)
  - ##### `data`
    - ###### `.dockerconfigjson` (required) A string containing a valid base64-encoded docker config.json file.
  - ##### `type` (required) A string indicating that the secret is of type "kubernetes.io/dockerconfigjson".
```yaml imagePullSecret: name: my-temporary-secret data: .dockerconfigjson: ewoJICJhdXRocyI6IHsKCQksHR0cHM6Ly9pbmRleC5kb2NrZXIuaW8vdjEvIjoge30KCX0sCgkiSHR0cEhlYWRlcnMiOiB7CgkJIlVzZXItQWdlbnQiOiAiRG9ja2VyLUNsaWVudC8xOS4wMy4xMiAoZGFyd2luKSIKCX0sCgkiY3JlZHNTdG9yZSI6ICJkZXNrdG9wIiwKCSJleHBlcmltZW50YWwiOiAiZGlzYWJsZWQiLAoJInN0YWNrT3JjaGVzdHJhdG9yIjogInN3YXJtIgp9 type: kubernetes.io/dockerconfigjson ``` Further information about config.json file and dockerconfigjson secrets may be found [here](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/). See the examples below for use cases. ## Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - copyFromHost: collectorName: "copy os-release" image: busybox:1 hostPath: "/etc/os-release" imagePullPolicy: IfNotPresent imagePullSecret: name: my-temporary-secret data: .dockerconfigjson: ewoJICJhdXRocyI6IHsKzCQksHR0cHM6Ly9pbmRleC5kb2NrZXIuaW8vdjEvIjoge30KCX0sCgkiSHR0cEhlYWRlcnMiOiB7CgkJIlVzZXItQWdlbnQiOiAiRG9ja2VyLUNsaWVudC8xOS4wMy4xMiAoZGFyd2luKSIKCX0sCgkiY3JlZHNTdG9yZSI6ICJkZXNrdG9wIiwKCSJleHBlcmltZW50YWwiOiAiZGlzYWJsZWQiLAoJInN0YWNrT3JjaGVzdHJhdG9yIjogInN3YXJtIgp9 type: kubernetes.io/dockerconfigjson ``` ## Included resources When this collector is executed, it will include the following files in a support bundle: ### `/[name or hostPath]/[node-name]/archive.tar` When `extractArchive` is `false` (default), this will contain tar archives of the directory or file from all nodes. ### `/[name or hostPath]/[node-name]/[extracted-files]` When `extractArchive` is set to `true`, individual files are extracted and placed at this path instead of creating an archive. --- > The ability to copy folders was introduced in Kots 1.19.0 and Troubleshoot 0.9.42. The `copy` collector can be used to copy files or an entire folder from pods and include the contents in the collected data. 
This collector can be included multiple times to copy different files or folders from different pods.

## Parameters

In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `copy` collector accepts the following parameters:

##### `selector`

(Required) The selector to use to locate the pod when copying files. If this selector matches more than one pod replica, the files will be copied out of each replica that matches the selector.

##### `namespace`

(Optional) The namespace to look for the selector in. If not provided, it will default to the current namespace from the context.

##### `containerPath`

(Required) The path in the container of the file(s) to copy. This supports glob syntax, but only a single file can be copied: every glob pattern should match exactly one file.

##### `containerName`

(Optional) When specified, this will collect files from the requested container name. For single container pods, this is not required. If a pod has multiple containers and this parameter is not provided, the files will be copied from the first container in the pod that matches the selector.

##### `extractArchive`

(Optional) By default the archive will not be extracted. Set to `true` to extract.

## Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: sample
spec:
  collectors:
    # Copies resolv.conf file
    - copy:
        selector:
          - app=api
        namespace: default
        containerPath: /etc/resolv.conf
        containerName: api
    # Copies htdocs folder
    - copy:
        selector:
          - app=myhttpd
        namespace: default
        containerPath: /usr/local/apache2/htdocs
```

## Included resources

When this collector is executed, it will include the following files in a support bundle:

### `/[name]/[namespace]/[pod-name]/[container-name]/[path]`

When a `name` is specified in the collector, it is used as a prefix to the output path. If `name` is not specified, the path starts directly with the namespace.
This will contain the pod's folder or file specified in the collector.

---

The `customMetrics` collector can be used to include value lists for custom metrics in the collected data. This collector can be included multiple times, requesting sets of metrics exposed through `/apis/custom.metrics.k8s.io/v1beta1/`.

## Parameters

In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `customMetrics` collector accepts the following parameters:

##### `metricRequests`

(Required) A list of metrics to be collected. Each request is of the following format:

- ###### `namespace` The namespace for which to collect the metric values; empty for non-namespaced resources.
- ###### `objectName` The object for which to collect metric values; all resources are considered when empty. For namespaced resources, a namespace has to be supplied regardless.
- ###### `resourceMetricName` Name of the `MetricValueList` as per the `APIResourceList` from "custom.metrics.k8s.io/v1beta1"

## Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: sample
spec:
  collectors:
    - customMetrics:
        metricRequests:
          - namespace: default
            resourceMetricName: pods/cpu_usage
            # objectName is empty, thus, all pods in the namespace will have their metric values collected.
          - namespace: my-namespace
            objectName: my-service
            resourceMetricName: services/node_memory_HugePages_Free
            # objectName is non-empty, thus, only the service specified will have its metric values collected.
          - objectName: node1
            # For nodes and namespaces no namespace is specified.
            resourceMetricName: nodes/node_cpu_guest
```

## Included resources

When this collector is executed, it will include the following files in a support bundle:

```
metrics
|_ [resource-kind]   # can be Namespace, Pod, Node etc.
   |_ [metric-name]  # raw metric name truncated from the resource name as per custom.metrics.k8s.io/v1beta1/ e.g.
"namespaces/cpu_usage" would result in metric_name "cpu_usage"
      |_ [namespace].json or [resource-name].json  # values for namespaced resources metrics are saved together in a file named after the namespace. For non-namespaced resources, each resource has its metric values in a separate file named after the resource.
```

### `/metrics/Pod/cpu_usage/default.json`

```json
[
  {
    "DescribedObject": {
      "Kind": "Pod",
      "Namespace": "default",
      "Name": "alertmanager",
      "UID": "",
      "APIVersion": "/v1",
      "ResourceVersion": "",
      "FieldPath": ""
    },
    "Metric": {
      "Name": "cpu_usage",
      "Selector": null
    },
    "Timestamp": "2023-05-23T14:04:48Z",
    "WindowSeconds": null,
    "Value": "1m"
  },
  ...
]
```

### `/metrics/Service/node_memory_HugePages_Free/my-namespace.json`

```json
[
  {
    "DescribedObject": {
      "Kind": "Service",
      "Namespace": "my-service",
      "Name": "kube-prometheus-stack-prometheus-node-exporter",
      "UID": "",
      "APIVersion": "/v1",
      "ResourceVersion": "",
      "FieldPath": ""
    },
    "Metric": {
      "Name": "node_memory_HugePages_Free",
      "Selector": null
    },
    "Timestamp": "2023-05-23T14:04:48Z",
    "WindowSeconds": null,
    "Value": "0"
  }
]
```

### `/metrics/Node/node_cpu_guest/node1.json`

```json
[
  {
    "DescribedObject": {
      "Kind": "Node",
      "Namespace": "",
      "Name": "ip-10-0-71-154.us-west-2.compute.internal",
      "UID": "",
      "APIVersion": "/v1",
      "ResourceVersion": "",
      "FieldPath": ""
    },
    "Metric": {
      "Name": "node_cpu_guest",
      "Selector": null
    },
    "Timestamp": "2023-06-05T16:45:24Z",
    "WindowSeconds": null,
    "Value": "0"
  },
  ...
]
```

If there is an error encountered, it will include the following file:

### `/metrics/errors.json`

```json
[
  "could not query endpoint /apis/custom.metrics.k8s.io/v1beta1/namespaces/my-namespace/services/my-service/node_memory_HugePages_Free: the server could not find the requested resource", // service specified doesn't exist
  "could not query endpoint /apis/custom.metrics.k8s.io/v1beta1/nodes/node1/node_cpu_guest: the server could not find the requested resource" // no node with such a name
]
```

---

The `data` collector can be used to add static content to the collected data.

## Parameters

In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `data` collector accepts the following parameters:

##### `name`

(Optional) This will be used as part of the output path in the support bundle. This field is required if `collectorName` is not set. If both are set then `collectorName` will be appended to `name`.

##### `data`

(Required) The contents of this field will be written to the file.

## Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: sample
spec:
  collectors:
    - data:
        name: static/data.txt
        data: |
          any static data can be used here
```

## Included resources

When the `data` collector is executed it will include the file named using the `name` property of the collector.

---

> Deprecated as of v0.33.0. See the new [Run Pod](https://troubleshoot.sh/docs/collect/run-pod) collector

The `run` collector can be used to run a pod in the cluster with the parameters provided. The collector will delete and clean up this pod and any artifacts after it completes. This collector can be included multiple times, each defining different commands to run.
## Parameters

In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `run` collector accepts the following parameters:

##### `name`

(Optional) The name of the collector. This will be prefixed to the path that the output is written to in the support bundle.

##### `namespace`

(Optional) The namespace to schedule the pod in. If not specified, it will assume the "current" namespace that the kubectl context is set to.

##### `image`

(Required) The image to run when starting the pod. This should be accessible to the nodes in the cluster.

##### `command`

(Required) An array of strings containing the command to use when starting the pod.

##### `args`

(Optional) An array of strings containing the arguments to pass to the command when starting.

##### `timeout`

(Optional) A [duration](https://golang.org/pkg/time/#Duration) that will be honored when running the pod. This cannot be greater than 30 seconds (30s) and if not specified, the default is 20s.

##### `serviceAccountName`

(Optional) A service account to be used as the identity for processes running in the pod. If not specified, it will assume the "default" service account.

##### `imagePullPolicy`

(Optional) A valid string representation of the policy to use when pulling the image. If not specified, this will be set to `IfNotPresent`.

##### `imagePullSecret`

(Optional)

> `imagePullSecret` support was introduced in Kots 1.19.0 and Troubleshoot 0.9.42.

Troubleshoot offers two ways to use image pull secrets: referencing a pre-existing secret in the cluster by name, or dynamically creating a temporary secret that is used to pull the image and then deleted once the run collector is done. The `imagePullSecret` field accepts the following parameters:

- If a pre-existing ImagePullSecret is used:
  - ##### `name` (required): The name of the pre-existing secret.
```yaml imagePullSecret: name: my-image-pull-secret ``` - If an ImagePullSecret will be created for the run collector to pull the image: - ##### `name` (optional) - ##### `data` - ###### `.dockerconfigjson` (required) A string containing a valid base64-encoded docker config.json file. - ##### `type` (required) A string indicating that the secret is of type "kubernetes.io/dockerconfigjson". ```yaml imagePullSecret: name: mysecret data: .dockerconfigjson: ewoJICJhdXRocyI6IHsKCQksHR0cHM6Ly9pbmRleC5kb2NrZXIuaW8vdjEvIjoge30KCX0sCgkiSHR0cEhlYWRlcnMiOiB7CgkJIlVzZXItQWdlbnQiOiAiRG9ja2VyLUNsaWVudC8xOS4wMy4xMiAoZGFyd2luKSIKCX0sCgkiY3JlZHNTdG9yZSI6ICJkZXNrdG9wIiwKCSJleHBlcmltZW50YWwiOiAiZGlzYWJsZWQiLAoJInN0YWNrT3JjaGVzdHJhdG9yIjogInN3YXJtIgp9 type: kubernetes.io/dockerconfigjson ``` Further information about config.json file and dockerconfigjson secrets may be found [here](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/). See the examples below for use cases. ## Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - run: collectorName: "run-ping" image: busybox:1 namespace: default command: ["ping"] args: ["-w", "5", "www.google.com"] imagePullPolicy: IfNotPresent serviceAccountName: default ``` ## Examples using private images with `imagePullSecret` ### Using a pre-existing secret If a pull secret already exists in the cluster, you can use it by providing the run collector with the name of the secret. ```yaml spec: collectors: - run: collectorName: "myPrivateApp" image: my-private-repository/myRestApi namespace: default args: ["go", "run", "main.go"] imagePullSecret: name: mysecret ``` ### Using dockerconfigjson secrets Troubleshoot will create a temporary secret, use it to pull the image from the private repository and delete it after the run collector is completed. 
```yaml
spec:
  collectors:
    - run:
        collectorName: "myPrivateApp"
        image: my-private-repository/myRestApi
        namespace: default
        args: ["go", "run", "main.go"]
        imagePullSecret:
          name: my-temporary-secret
          data:
            .dockerconfigjson: ewoJICJhdXRocyI6IHsKzCQksHR0cHM6Ly9pbmRleC5kb2NrZXIuaW8vdjEvIjoge30KCX0sCgkiSHR0cEhlYWRlcnMiOiB7CgkJIlVzZXItQWdlbnQiOiAiRG9ja2VyLUNsaWVudC8xOS4wMy4xMiAoZGFyd2luKSIKCX0sCgkiY3JlZHNTdG9yZSI6ICJkZXNrdG9wIiwKCSJleHBlcmltZW50YWwiOiAiZGlzYWJsZWQiLAoJInN0YWNrT3JjaGVzdHJhdG9yIjogInN3YXJtIgp9
          type: kubernetes.io/dockerconfigjson
```

## Included resources

When this collector is executed, it will include the following files in a support bundle:

### `/[name]/[collector-name].log`

This will contain the pod output (up to 10000 lines).

---

The `dns` collector can be used to help diagnose DNS resolution problems, such as detecting search domain misconfiguration. During execution, the collector does the following:

- Output the `Kubernetes` Service Cluster IP retrieved from kube-apiserver
- Run a test pod of image `registry.k8s.io/e2e-test-images/agnhost:2.39`, and run the `dig` command
  - against the `kubernetes` Service, and output the content of `/etc/resolv.conf`
  - against a non-resolvable domain, to check for a potential wildcard DNS issue
- Check if DNS pods are running
- Check if the DNS service is up
- Check if DNS endpoints are populated
- Output the CoreDNS/KubeDNS config

## Parameters

In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `dns` collector accepts the following parameters:

##### `image`

(Optional) Utility image to run the `dig` command. Must have `dig` installed. Defaults to `registry.k8s.io/e2e-test-images/agnhost:2.39`.

##### `nonResolvable`

(Optional) A non-resolvable domain. The collector will make a DNS query to this domain. Defaults to `*`.

See the examples below for use cases.
## Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - dns: image: registry.k8s.io/e2e-test-images/agnhost:2.39 nonResolvable: "*" ``` ## Included resources When this collector is executed, it includes the following file in a support bundle: ### `/dns/debug.txt` ``` === Kubernetes Cluster IP from API Server: 10.43.0.1 === Test DNS resolution in pod registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3: === /etc/resolv.conf === search default.svc.cluster.local svc.cluster.local cluster.local nameserver 10.43.0.10 options ndots:5 === dig kubernetes === 10.43.0.1 === dig non-existent-domain === === Running kube-dns pods: coredns-77ccd57875-76dt4 === Running kube-dns service: 10.43.0.10 === kube-dns endpoints: 10.42.0.6:53 === CoreDNS config: .:53 { errors health ready kubernetes cluster.local in-addr.arpa ip6.arpa { pods insecure fallthrough in-addr.arpa ip6.arpa } hosts /etc/coredns/NodeHosts { ttl 60 reload 15s fallthrough } prometheus :9153 forward . /etc/resolv.conf cache 30 loop reload loadbalance import /etc/coredns/custom/*.override } ``` ### `/dns/debug.json` ```json { "kubernetesClusterIP": "10.43.0.1", "podResolvConf": "search default.svc.cluster.local svc.cluster.local cluster.local\nnameserver 10.43.0.10\noptions ndots:5\n", "query": { "kubernetes": { "name": "kubernetes", "address": "10.43.0.1" }, "nonResolvableDomain": { "name": "*", "address": "" } }, "kubeDNSPods": ["coredns-77ccd57875-76dt4"], "kubeDNSService": "10.43.0.10", "kubeDNSEndpoints": "10.42.0.6:53" } ``` --- The `etcd` collector gathers essential data to troubleshoot etcd cluster problems in Kubernetes environments. It executes a series of `etcdctl` commands to assess the health and status of your etcd cluster. 
During execution, the collector runs the following `etcdctl` commands on the existing etcd cluster:

```bash
etcdctl endpoint health
etcdctl endpoint status
etcdctl member list
etcdctl alarm list
etcdctl version
```

This collector is compatible with the following Kubernetes distributions:

- kURL
- k0s

## Parameters

The `etcd` collector supports the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties) and the following parameters.

##### `image`

(Optional) The image for the pod to run `etcdctl` commands. Defaults to `quay.io/coreos/etcd:latest`.

See the examples below for use cases.

## Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: etcd-collector
spec:
  collectors:
    - etcd:
        image: quay.io/coreos/etcd:latest
```

## Included resources

When this collector is executed, it includes the following files in a support bundle:

### `/etcd/endpoint-health.json`

Contains the output of the command `etcdctl endpoint health --write-out json`.

```json
[
  {
    "endpoint": "https://127.0.0.1:2379",
    "health": true,
    "took": "19.292721ms"
  }
]
```

### `/etcd/endpoint-status.json`

Contains the output of the command `etcdctl endpoint status --write-out json`.

```json
[
  {
    "Endpoint": "https://127.0.0.1:2379",
    "Status": {
      "header": {
        "cluster_id": 6711744062120372582,
        "member_id": 8657109746518078165,
        "revision": 12298,
        "raft_term": 2
      },
      "version": "3.5.12",
      "dbSize": 15659008,
      "leader": 8657109746518078165,
      "raftIndex": 13128,
      "raftTerm": 2,
      "raftAppliedIndex": 13128,
      "dbSizeInUse": 6660096
    }
  }
]
```

### `/etcd/member-list.json`

Contains the output of the command `etcdctl member list --write-out json`.
```json
{
  "header": {
    "cluster_id": 6711744062120372582,
    "member_id": 8657109746518078165,
    "raft_term": 2
  },
  "members": [
    {
      "ID": 8657109746518078165,
      "name": "eff45d970",
      "peerURLs": ["https://:2380"],
      "clientURLs": ["https://:2379"]
    }
  ]
}
```

### `/etcd/alarm-list.json`

Contains the output of the command `etcdctl alarm list --write-out json`.

```json
{
  "header": {
    "cluster_id": 6711744062120372582,
    "member_id": 8657109746518078165,
    "revision": 12310,
    "raft_term": 2
  }
}
```

### `/etcd/version.json`

Contains the output of the command `etcdctl version`.

```
etcdctl version: 3.5.1
API version: 3.5
```

---

The `exec` collector can be used to run a command in an existing pod and include the output in the collected data. The pod to execute the command in is specified as a selector in the collector definition. When the selector refers to more than one replica of a pod, the exec collector will execute in only one of the pods. This spec can be included multiple times, each defining different commands and/or label selectors to use.

## Parameters

In addition to the [shared collector properties](/docs/collect/collectors/#shared-properties), the `exec` collector accepts the following parameters:

##### `name`

(Optional) The name of the collector. This will be the path prefix that the output is written to in the support bundle.

##### `selector`

(Required) The selector to use when locating the pod. The exec command will execute in the first pod that is returned from the Kubernetes API when queried for this label selector.

##### `namespace`

(Optional) The namespace to look for the pod selector in. If not specified, it will assume the "current" namespace that the kubectl context is set to.

##### `containerName`

(Optional) The name of the container in which to execute the command. If not specified, the first container in the pod will be used. This field is used only for container selection; it does not affect the output file naming. To control output file names, use `collectorName`.
##### `command` (Required) An array of strings containing the command to execute in the pod. ##### `args` (Optional) An array of strings containing the arguments to pass to the command when executing. ##### `timeout` (Optional) A [duration](https://golang.org/pkg/time/#Duration) that will be honored when executing the command. This cannot be greater than 20 seconds (20s) and if not specified, the default is 20s. ## Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - exec: name: mysql-version selector: - app=mysql namespace: default command: ["mysql"] args: ["-V"] timeout: 5s ``` ## Included Resources When this collector is executed, it will include the following file in a support bundle. The `[collector-name]` in the paths below refers to the `collectorName` field (not `name` or `containerName`). The `name` field is used as the top-level directory prefix. ### `/[name]/[namespace]/[pod-name]/[collector-name]-stdout.txt` ``` mysql Ver 14.14 Distrib 5.6.44, for Linux (x86_64) using EditLine wrapper ``` The result of running a command in a container may produce the following files if there was an error: ### `/[name]/[namespace]/[pod-name]/[collector-name]-stderr.txt` ``` Warning: Using a password on the command line interface can be insecure.\nERROR 1064 (42000) at line 1: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'process list' at line 1 ``` ### `/[name]/[namespace]/[pod-name]/[collector-name]-errors.json` ```json [ "command terminated with exit code 1" ] ``` --- The `goldpinger` collector is used to collect pod-ping checks gathered by a [goldpinger service](https://github.com/bloomberg/goldpinger) installed in a Kubernetes cluster. The collector makes a request to the `/check_all` endpoint.
Periodically, this service will ping pods running on every node (daemonset pods) to ensure that nodes can reach all other nodes in that cluster. It caches this information and surfaces it via HTTP endpoints. If this collector is run within a Kubernetes cluster, the collector will directly make the HTTP request to the goldpinger endpoint (`http://goldpinger.<namespace>.svc.cluster.local:80/check_all`). If not, the collector attempts to launch a pod in the cluster, configured with the `podLaunchOptions` parameter, and makes the request within the running container. If goldpinger is not installed, the collector will attempt to temporarily install it, and uninstall goldpinger once the collector has completed. ## Parameters In addition to the [shared collector properties](/docs/collect/collectors/#shared-properties), it accepts the following parameters: - ##### `namespace` (Optional) The namespace where goldpinger is installed. This value is used to form the goldpinger service endpoint, i.e. `http://goldpinger.<namespace>.svc.cluster.local:80`. Defaults to the `default` namespace. - ##### `image` (optional) The image to use for the goldpinger daemonset pods if Troubleshoot has to deploy them - ##### `collectDelay` (optional) Delay collection to allow goldpinger time to start sending requests. Defaults to 0s if an existing goldpinger installation is detected, and 6s if troubleshoot installs the temporary goldpinger service. - ##### `podLaunchOptions` (Optional) Pod launch options to start a pod - ##### `namespace` (Optional) Namespace to launch the pod in. Defaults to the `default` namespace. - ##### `image` (Optional) Image to use to launch the container with. The image needs to have [wget](https://www.gnu.org/software/wget/) which is used to query the goldpinger HTTP endpoint. Defaults to the `alpine` image - ##### `imagePullSecret` (Optional) Image pull secret containing the image registry credentials.
No credentials are used by default - ##### `serviceAccountName` (Optional) Name of the service account to use when running the pod. Defaults to the `default` service account ## Example Collector Definitions Collector with defaults: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: goldpinger spec: collectors: - goldpinger: {} analyzers: - goldpinger: {} ``` Collector with pod launch options: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: goldpinger spec: collectors: - goldpinger: namespace: kurl podLaunchOptions: namespace: ns-to-launch-pod image: my-tools-image:v1 imagePullSecret: reg-secret serviceAccountName: tools-account analyzers: - goldpinger: {} ``` ## Included resources The result of this collector will be stored in the `goldpinger/` directory of the support bundle. ### `goldpinger/check_all.json` This file will contain the response of the `/check_all` endpoint: ```json { "hosts": [ { "hostIP": "100.64.0.1", "podIP": "10.32.0.9", "podName": "goldpinger-4hctt" }, { ... "podName": "goldpinger-tbdsb" }, { ... "podName": "goldpinger-jj9mw" } ], "responses": { "goldpinger-4hctt": { "HostIP": "100.64.0.1", "OK": true, "PodIP": "10.32.0.9", "response": { "podResults": { "goldpinger-4hctt": { "HostIP": "100.64.0.1", "OK": true, "PingTime": "2023-12-06T16:45:41.971Z", "PodIP": "10.32.0.9", "response": { "boot_time": "2023-12-06T14:13:58.540Z" }, "status-code": 200 }, "goldpinger-jj9mw": { ... }, "goldpinger-tbdsb": { ... } } } }, "goldpinger-jj9mw": { ... }, "goldpinger-tbdsb": { ... } } } ``` If there is an error fetching results, `goldpinger/error.txt` will contain the error message. The support bundle will contain either `goldpinger/check_all.json` or `goldpinger/error.txt`, but never both. --- The helm collector will collect details about helm releases and their history. ## Parameters The helm collector has the following parameters: #### `collectorName` (Recommended) The name of the collector.
#### `namespace` (Optional) The namespace of the helm release. If not specified, all namespaces will be searched or collected. #### `releaseName` (Optional) The name of the helm release. If not specified, all releases will be searched or collected. **Note:** if neither `namespace` nor `releaseName` is specified, all releases in all namespaces will be collected. #### `collectValues` (Optional) If set to `true`, the values of the helm release will be collected. Defaults to `false`. ## Example Collector Definitions Collect All Helm Releases in All Namespaces: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - helm: {} ``` Collect All Helm Releases in a Specific Namespace: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - helm: namespace: "default" ``` Collect a Specific Helm Release in a Specific Namespace: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - helm: releaseName: mysql-1692919203 namespace: "default" ``` Collect a Specific Helm Release in All Namespaces: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - helm: releaseName: mysql-1692919203 ``` Collect All Helm Releases in All Namespaces with Helm Values: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - helm: collectValues: true ``` ## Included resources When this collector is executed, it will include the following files in a support bundle: `/helm/[namespace].json` ```json [ { "releaseName": "mysql-1692919203", "chart": "mysql", "chartVersion": "9.10.9", "appVersion": "8.0.34", "namespace": "default", "releaseHistory": [ { "revision": "1", "date": "2023-08-25 11:20:05.153483 +1200 NZST", "status": "deployed", "values": { "affinity": {}, "image": { "digest": "", "pullPolicy": "IfNotPresent", "pullSecrets": [], "registry":
"docker.io", "repository": "bitnami/git", "tag": "2.41.0-debian-11-r76" }, }, } } ] ``` ### Fields #### `releaseName` The name of the helm release #### `chart` The name of the helm chart #### `chartVersion` The version of the helm chart #### `appVersion` The version of the helm chart application #### `namespace` The namespace of the helm release #### `releaseHistory` The history of the helm release --- The `http` collector can be used to execute HTTP requests at collection time. The collector makes requests from the network context of the process running the support bundle CLI. If the CLI runs inside a pod, requests use cluster networking (e.g. `*.svc.cluster.local` DNS resolves). If the CLI runs outside the cluster (CI runners, local machines), requests use the host network and in-cluster DNS names will not resolve. The response code and response body will be included in the collected data. The http collector can be specified multiple times in a collector spec. ## Parameters The `http` collector can be either of `get`, `post` or `put`. In addition to the [shared collector properties](/docs/collect/collectors/#shared-properties), it accepts the following parameters: - ##### `url` string (Required) The URL to make the HTTP request against. - ##### `tls.cacert` string (Optional) When present, the CA certificate to use for verifying the server's certificate. Valid options are a string containing the certificate in PEM format, or a path to a file or direcotry on disk containing the certificate. - ##### `proxy` string (Optional) When present, the proxy to use for the request. This parameter will also read from the `HTTPS_PROXY` environment variable set in the shell. The proxy address must be a valid URL in the format `scheme://[user:password@]host:port`. - ##### `insecureSkipVerify` boolean (Optional) When set to true, this will make connections to untrusted or self-signed certs. This defaults to false. 
- ##### `headers` string (Optional) When present, additional headers to send with the request. By default, there are no headers added to the request. - ##### `body` string (Optional) When present, the body to be sent with the request. By default, there is no body included in a request. This parameter is not supported if the method is `get`. - ##### `timeout` string (Optional) When present, the timeout for the request. Expressed as a string duration, such as `30s`, `5m`, `1h`. - ##### `name` string (Optional) When present, used as a directory prefix for the output file path in the support bundle. For example, if `name` is set to `my-app`, the output file will be saved at `my-app/[collector-name].json`. ## Example Collector Definition ### GET The `get` method will issue an HTTP or HTTPS request to the specified URL. The body parameter is not supported in the `get` method. ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - http: collectorName: healthz get: url: http://api:3000/healthz timeout: 5s - http: collectorName: proxy get: url: https://replicated.app proxy: https://proxy.example.com:3130 timeout: 15s tls: cacert: |- -----BEGIN CERTIFICATE----- -----END CERTIFICATE----- ``` ### POST/PUT The `post` and `put` methods will issue an HTTP or HTTPS POST or PUT request to the specified URL. ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - http: collectorName: service-control post: url: https://api:3000/service-control insecureSkipVerify: true body: '{"disable-service": true}' headers: Content-Type: 'application/json' ``` ## Included resources ### `[name]/[collector-name].json` The output file is stored at `[name]/[collector-name].json` in the support bundle. If the `name` field is not set, the file is stored in the root directory of the bundle. If the `collectorName` field is unset, the file will be named `result.json`.
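For instance, combining `name` and `collectorName` as described above, the response would be written to `my-app/healthz.json` in the bundle. This is a hypothetical sketch; the URL and field values are illustrative:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: sample
spec:
  collectors:
    - http:
        # directory prefix for the output file
        name: my-app
        # output file name within that directory: my-app/healthz.json
        collectorName: healthz
        get:
          url: http://api:3000/healthz
```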
The response received from the server will be stored in the `"response"` key of the resulting JSON file: ```json { "response": { "status": 200, "body": "{\"status\": \"healthy\"}", "headers": { "Connection": "keep-alive", "Date": "Fri, 19 Jul 2019 20:13:44 GMT", "Server": "nginx/1.8.1", "Strict-Transport-Security": "max-age=31536000; includeSubDomains" }, "raw_json": { "status": "healthy" } } } ``` If the body of the response is valid JSON, it will also be saved under the key named `"raw_json"`. If a client-side error occurs and no response is received, the error text will be stored in the `error` key: ```json { "error": { "message": "Put : unsupported protocol scheme \"\"" } } ``` The resulting file will contain either `response` or `error`, but never both. --- Every support bundle or preflight check starts with a collect phase. During this time, information is collected from the cluster, the environment, the application and other sources to be used later in the analysis or preflight results. When designing a support bundle or preflight check, be sure to add all necessary data to the collectors. The [analyze phase](/docs/analyze/) can only use the output of the collect phase to perform analysis and provide results; however, a large set of collectors are included automatically. ## Including Collectors To specify the data to include for later phases, the collect phase accepts a Kubernetes custom resource as defined here. A [full reference of the collectors](/docs/collect/collectors/) is also available. Collectors are specified inside either a Preflight or a SupportBundle YAML file. To build a set of collectors, start with a Kubernetes YAML file: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: my-application-name spec: collectors: [] ``` or ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: my-application-name spec: collectors: [] ``` The above file is a simple but valid set of collectors.
It will collect only the default data. To add additional collectors, specify each one in the `collectors` array. Each collector can be included multiple times, if there are different sets of options to use. For example, a complete spec might be: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: my-application-name spec: collectors: - clusterInfo: {} - clusterResources: {} - logs: selector: - app=api namespace: default limits: maxAge: 720h maxLines: 1000 - http: name: healthz get: url: http://api:3000/healthz - exec: name: mysql-version selector: - app=mysql namespace: default command: ["mysql"] args: ["-V"] timeout: 5s ``` --- The `logs` collector can be used to include logs from running pods. This collector can be included multiple times with different label selectors and/or namespaces. ## Parameters In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `logs` collector accepts the following parameters: ##### `selector` (Required) The selector to use to find matching pods. If this selector returns more than one pod, all matching pods will be collected. Multiple selectors are possible, but the results are a logical `AND`, not a logical `OR`, meaning that the resulting set of pods may not contain the expected results (possibly no results). It is best to use separate collectors to generate individual logs for different service pods. **Example** The pod labels in `service1-deployment.yaml`: ```yaml metadata: name: service1 labels: app.kubernetes.io/name: service1 ``` The collector YAML in `support-bundle.yaml`: ```yaml collectors: - logs: selector: - app.kubernetes.io/name=service1 name: service1/logs ``` ##### `namespace` (Optional) The namespace to search for the pod selector in. If this is not provided, it will default to the current namespace of the context. ##### `name` (Required) Name will be used to create a folder in the support bundle where logs will be saved.
Name can contain slashes to create a path in the support bundle. ##### `containerNames` (Optional) ContainerNames is an array of container names. If specified, logs for each container in the list will be collected. If omitted, logs for **all** containers in the pod are collected, including init containers. ##### `limits` (Optional) Provided to limit the size of the logs. By default, this is set to `maxLines: 10000`. Either `maxAge` or `maxLines` can be provided, but not both. ##### `limits.maxAge` The maximum age of log lines to include. Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h". For duration string format see [time.ParseDuration](https://pkg.go.dev/time#ParseDuration). ##### `limits.maxLines` The number of lines to include, starting from the newest. ##### `limits.maxBytes` The maximum file size of a collected pod log. Defaults to `5000000` bytes, which is 5MB. The value must be specified as an integer. ##### `timestamps` (Optional) When set to `true`, each log line will be prefixed with an RFC3339 timestamp from the Kubernetes API. Defaults to `false`. ## Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - logs: selector: - app=api namespace: default name: api/container/logs containerNames: - api - node limits: maxAge: 720h - logs: selector: - app=backend namespace: default name: backend/container/logs containerNames: - db limits: maxLines: 1000 maxBytes: 5000000 ``` ## Included resources When this collector is executed, it will include the following files in a support bundle: ### `/cluster-resources/pods/logs/[namespace]/[pod-name]/[container-name].log` The actual log files are stored under `cluster-resources/pods/logs/`. A symlink is created at `/[name]/[pod-name]/[container-name].log` pointing to the actual file, so logs can be accessed via either path.
This will be created for each pod that matches the selector. If any errors are encountered, the following file will be created: ### `/[name]/[pod-name]/[container-name]-errors.json` ```json [ "failed to get log stream: container node is not valid for pod api-6fd69d8f78-tmtf7" ] ``` --- This collector will add information about the [longhorn](https://longhorn.io/) storage provider in a cluster. ## Parameters The longhorn collector has the following parameters: #### `collectorName` (Optional) The name of the collector. #### `namespace` (Optional) The namespace where longhorn is running. If this is not provided, it will default to `longhorn-system`. ## Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - longhorn: {} ``` ## Included resources ### `/longhorn/[namespace]/settings.yaml` A copy of all settings.longhorn.io will be aggregated into a single yaml file. ### `/longhorn/[namespace]/logs/[pod]` A separate file for the logs of every container of every pod in the namespace will be found in the logs directory. ### `/longhorn/[namespace]/volumes/[volume-name]/replicachecksums/[replica].txt` Checksums of each replica of a volume will be collected if the following conditions are met: 1. The volume is in a detached state. To put a volume into detached state delete any pods consuming the volume's PVC prior to running the collector. 1. The volume has multiple replicas. Volumes in single-node Kubernetes clusters will not have multiple replicas. [More information](https://longhorn.io/docs/1.1.1/advanced-resources/data-recovery/corrupted-replica/) ### Custom Resources #### `/longhorn/[namespace]/nodes/[node-name].yaml` Each nodes.longhorn.io resource in the namespace will be in a separate yaml file. #### `/longhorn/[namespace]/volumes/[volume-name].yaml` Each volumes.longhorn.io resource in the namespace will be in a separate yaml file. 
#### `/longhorn/[namespace]/replicas/[replica-name].yaml` Each replicas.longhorn.io resource in the namespace will be in a separate yaml file. #### `/longhorn/[namespace]/engines/[engine-name].yaml` Each engines.longhorn.io resource in the namespace will be in a separate yaml file. #### `/longhorn/[namespace]/engineimages/[engineimage-name].yaml` Each engineimages.longhorn.io resource in the namespace will be in a separate yaml file. #### `/longhorn/[namespace]/instancemanagers/[instancemanager-name].yaml` Each instancemanagers.longhorn.io resource in the namespace will be in a separate yaml file. #### `/longhorn/[namespace]/backingimagemanagers/[backingimagemanager-name].yaml` Each backingimagemanagers.longhorn.io resource in the namespace will be in a separate yaml file. #### `/longhorn/[namespace]/backingimages/[backingimages-name].yaml` Each backingimages.longhorn.io resource in the namespace will be in a separate yaml file. #### `/longhorn/[namespace]/sharemanagers/[sharemanagers-name].yaml` Each sharemanagers.longhorn.io resource in the namespace will be in a separate yaml file. --- The `mssql` collector will validate and add information about an MS SQL server to a support bundle. ## Parameters The `mssql` collector has the following parameters: #### `collectorName` (Recommended) The name of the collector. It is recommended to set this to a string identifying the MS SQL instance; it can be used to refer to this collector in analyzers and preflight checks. If unset, this will be set to the string "mssql". #### `uri` (Required) The connection URI to use when connecting to the MS SQL server.
## Example Collector Definitions Plain text connection to a server: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: sample spec: collectors: - mssql: collectorName: mssql uri: sqlserver://username:password@hostname:1433/defaultdb ``` ## Included resources A single JSON file will be added to the support bundle, in the path `/mssql/[collector-name].json`: ```json { "isConnected": false, "error": "invalid password", "version": "15.0.2000.1565" } ``` ### Fields #### `isConnected` a boolean indicating if the collector was able to connect and authenticate using the connection string provided. #### `error` a string that indicates the connection error, if there was one #### `version` when connected, a string indicating the version of MS SQL that's running --- The `mysql` collector will validate and add information about a MySQL server to a support bundle. ## Parameters The `mysql` collector has the following parameters: #### `collectorName` (Recommended) The name of the collector. It is recommended to set this to a string identifying the MySQL instance; it can be used to refer to this collector in analyzers and preflight checks. If unset, this will be set to the string "mysql". #### `uri` (Required) The connection URI to use when connecting to the MySQL server. #### `parameters` (Optional) A list of variables to return as a result of the `SHOW VARIABLES` query.
## Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - mysql: collectorName: mysql uri: 'testuser:password@tcp(mysql:3306)/dbname?tls=false' parameters: - character_set_server - collation_server - init_connect - innodb_file_format - innodb_large_prefix - innodb_strict_mode - log_bin_trust_function_creators ``` ## Included resources A single JSON file will be added to the support bundle, in the path `/mysql/[collector-name].json`: ```json { "isConnected": false, "error": "invalid password", "version": "5.6.49", "variables": { "character_set_server": "utf8mb4", "collation_server": "utf8mb4_0900_ai_ci", "init_connect": "", "innodb_strict_mode": "ON", "log_bin_trust_function_creators": "OFF" } } ``` ### Fields #### `isConnected` a boolean indicating if the collector was able to connect and authenticate using the connection string provided. #### `error` a string that indicates the connection error, if there was one #### `version` when connected, a string indicating the version of MySQL that's running #### `variables` A filtered list of variables returned from the `SHOW VARIABLES` query. --- The `nodeMetrics` collector is used to gather [node metrics](https://kubernetes.io/docs/reference/instrumentation/node-metrics/) collected by the `kubelet` and stored in Kubernetes. These metrics include resource utilization stats of pods reported by container runtimes and node stats collected by kubelet processes running on nodes. These metrics are collected by calling the equivalent of `kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary"`. ## Parameters By default, if no parameters are defined, the collector collects metrics for all nodes. In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `nodeMetrics` collector accepts the following parameters: ##### `nodeNames` (Optional) List of nodes to filter by.
##### `selector` (Optional) Label selector to filter nodes by. ## Example Collector Definition Without any parameter, the collector collects metrics for all nodes: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle spec: collectors: - nodeMetrics: {} ``` The following example shows filtering by a list of node names and label selectors. In this case, the results of the filters are combined and metrics are collected for all the nodes found: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle spec: collectors: - nodeMetrics: nodeNames: - worker-1 selector: - node-role.kubernetes.io/control-plane=true ``` ## Included resources When this collector is executed, it includes a `/node-metrics/<node-name>.json` file with metrics for each node. Below is a truncated output: ```json { "node": { "nodeName": "k3d-mycluster-server-0", "cpu": { "usageNanoCores": 92206401, ... }, "memory": { "availableBytes": 7584931840, ... }, "network": { "interfaces": [ { "name": "tunl0", ... }, ... ] } }, "pods": [ { "podRef": { "name": "svclb-traefik-e4d64c5d-ngq4m", "namespace": "kube-system", "uid": "ea9bb709-63c4-482e-a484-ab6acab11afa" }, "startTime": "2024-03-28T12:40:51Z", "containers": [ { "name": "lb-tcp-443", "startTime": "2024-03-28T12:40:54Z", "cpu": { "usageCoreNanoSeconds": 9622000, ... }, "memory": { "usageBytes": 311296, ... } }, ... ], "cpu": { "usageCoreNanoSeconds": 27588000, ... }, "ephemeral-storage": { "availableBytes": 52204228608, ... } } ] } ``` --- The `postgres` collector will validate and add information about a PostgreSQL server to a support bundle. ## Parameters The `postgres` collector has the following parameters: #### `collectorName` (Recommended) The name of the collector. It is recommended to set this to a string identifying the PostgreSQL instance; it can be used to refer to this collector in analyzers and preflight checks. If unset, this will be set to the string "postgres". #### `uri` (Required) The connection URI to use when connecting to the PostgreSQL server.
The PostgreSQL collector uses the Go [`pgx.ParseConfig()`](https://pkg.go.dev/github.com/jackc/pgx/v4#ParseConfig) function, which expects URL-encoded connection strings. If your password contains special characters, like `@`, `#`, `&`, etc., you may need to URL-encode the password. See the [URL encoding](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING) documentation for more details. #### `tls` (Optional) TLS parameters are required whenever connections to the target postgres server are encrypted using TLS. The server can be configured to authenticate clients (`mTLS`) or to secure the connection (`TLS`). In `mTLS` mode, the required parameters are `client certificate`, `private key` and a `CA certificate`. If the server is configured to encrypt only the connection, then only the `CA certificate` is required. When the `skipVerify` option is set to `true`, then verifying the server certificate can be skipped. The `skipVerify` option is available only in `TLS` mode. **Note:** Parameters to pass in Certificate Revocation Lists (CRL) and Online Certificate Status Protocol (OCSP) links are not supported. ## Example Collector Definitions Plain text connection to a server: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - postgres: collectorName: pg uri: postgresql://user:password@hostname:5432/defaultdb?sslmode=require ``` URL-encoded password with special characters, using [Helm's `urlquery` function](https://helm.sh/docs/chart_template_guide/function_list/#urlquery): ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - postgres: collectorName: pg uri: 'postgresql://{{ $db_user | urlquery }}:{{ $db_pass | urlquery }}@{{ $db_host }}:{{ $db_port }}/{{ $db_name }}' ``` Secured (`mTLS`) connection to a server with inline TLS parameter configurations.
The parameters must be in `PEM` format: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - postgres: collectorName: pg uri: postgresql://user:password@hostname:5432/defaultdb?sslmode=require tls: cacert: | -----BEGIN CERTIFICATE----- ... ... -----END CERTIFICATE----- clientCert: | -----BEGIN CERTIFICATE----- ... ... -----END CERTIFICATE----- clientKey: | -----BEGIN RSA PRIVATE KEY----- ... ... -----END RSA PRIVATE KEY----- ``` Secured (`mTLS`) connection to a server with TLS parameters stored in a Kubernetes secret as `stringData`. The parameters must be in `PEM` format: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: sample spec: collectors: - postgres: collectorName: my-db uri: postgresql://user:password@hostname:5432/defaultdb?sslmode=require tls: secret: # This secret must contain the following keys: # cacert: # clientCert: if mTLS # clientKey: if mTLS name: pg-tls-secret namespace: default ``` Encrypted (`TLS`) connection to a server with TLS parameters inline. The parameters must be in `PEM` format. In this case, the server is configured not to authenticate clients: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: dbs-collector spec: collectors: - postgres: collectorName: my-db uri: postgresql://user:password@hostname:5432/defaultdb?sslmode=require tls: cacert: | -----BEGIN CERTIFICATE----- ... ... -----END CERTIFICATE----- ``` Skip verification of the server certificate when creating an encrypted connection. This works only if the postgres server is configured not to authenticate clients. 
The connection remains encrypted: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: dbs-collector spec: collectors: - postgres: collectorName: my-db uri: postgresql://user:password@hostname:5432/defaultdb?sslmode=require tls: skipVerify: true ``` ## Included resources A single JSON file will be added to the support bundle, in the path `/postgres/[collector-name].json`: ```json { "isConnected": false, "error": "invalid password", "version": "10.12" } ``` ### Fields #### `isConnected` a boolean indicating if the collector was able to connect and authenticate using the connection string provided. #### `error` a string that indicates the connection error, if there was one #### `version` when connected, a string indicating the version of PostgreSQL that's running --- The `redis` collector will validate and add information about a Redis server to a support bundle. ## Parameters The `redis` collector has the following parameters: #### `collectorName` (Recommended) The name of the collector. It is recommended to set this to a string identifying the Redis instance; it can be used to refer to this collector in analyzers and preflight checks. If unset, this will be set to the string "redis". #### `uri` (Required) The connection URI to use when connecting to the Redis server. You can use `redis://` for standard connections and `rediss://` for SSL connections. #### `tls` (Optional) TLS parameters are required whenever connections to the target redis server are encrypted using TLS. The server can be configured to authenticate clients (`mTLS`) or to secure the connection (`TLS`). In `mTLS` mode, the required parameters are `client certificate`, `private key` and a `CA certificate`. If the server is configured to encrypt only the connection, then only the `CA certificate` is required. When the `skipVerify` option is set to `true`, then verifying the server certificate can be skipped. The `skipVerify` option is available only in `TLS` mode.
**Note:** Parameters to pass in Certificate Revocation Lists (CRL) and Online Certificate Status Protocol (OCSP) links are not supported.

## Example Collector Definitions

Plain text connection to a server:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: sample
spec:
  collectors:
    - redis:
        collectorName: redis
        uri: redis://default:password@hostname:6379
```

Secured (`mTLS`) connection to a server with inline TLS parameter configurations. The parameters must be in `PEM` format:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: sample
spec:
  collectors:
    - redis:
        collectorName: redis
        uri: rediss://default:password@server:6379
        tls:
          cacert: |
            -----BEGIN CERTIFICATE-----
            ...
            ...
            -----END CERTIFICATE-----
          clientCert: |
            -----BEGIN CERTIFICATE-----
            ...
            ...
            -----END CERTIFICATE-----
          clientKey: |
            -----BEGIN RSA PRIVATE KEY-----
            ...
            ...
            -----END RSA PRIVATE KEY-----
```

Secured (`mTLS`) connection to a server with TLS parameters stored in a Kubernetes secret as `stringData`. The parameters must be in `PEM` format:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: sample
spec:
  collectors:
    - redis:
        collectorName: redis
        uri: rediss://default:replicated@server:6379
        tls:
          secret:
            # This secret must contain the following keys:
            # cacert:
            # clientCert: if mTLS
            # clientKey: if mTLS
            name: redis-tls-secret
            namespace: default
```

Encrypted (`TLS`) connection to a server with TLS parameters inline. The parameters must be in `PEM` format. In this case, the server is configured not to authenticate clients:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: dbs-collector
spec:
  collectors:
    - redis:
        collectorName: my-redis
        uri: rediss://default:replicated@server:6380
        tls:
          cacert: |
            -----BEGIN CERTIFICATE-----
            ...
            ...
            -----END CERTIFICATE-----
```

Skip verification of the server certificate when creating an encrypted connection.
This works only if the Redis server is configured not to authenticate clients. The connection remains encrypted:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: dbs-collector
spec:
  collectors:
    - redis:
        collectorName: my-redis
        uri: rediss://default:replicated@server:6380
        tls:
          skipVerify: true
```

## Included resources

A single JSON file will be added to the support bundle, in the path `/redis/[collector-name].json`:

```json
{
  "isConnected": false,
  "error": "invalid password",
  "version": "10.12.0"
}
```

### Fields

#### `isConnected`

A boolean indicating whether the collector was able to connect and authenticate using the connection string provided.

#### `error`

A string containing the connection error, if there was one.

#### `version`

When connected, a string indicating the version of Redis that is running.

---

The `registryImages` collector attempts to fetch each image's manifest to validate that the image exists.

## Parameters

The `registryImages` collector has the following parameters:

#### `collectorName`

The name of the collector. This is a string used to generate the output file name. If unset, this will be set to the string "images", and the output will be stored in the `/registry/images.json` file.

#### `images` (Required)

The list of images to validate.

#### `imagePullSecret` (Optional)

The image pull secret to be used with private images. If no pull secret is provided, private images cannot be validated and the resulting report will contain corresponding errors.

#### `namespace` (Optional)

If the `imagePullSecret` parameter specifies a secret name, this parameter can be used to specify the namespace where the secret is located. If not specified, the `default` namespace will be used.
## Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: sample
spec:
  collectors:
    - registryImages:
        namespace: test
        imagePullSecret:
          type: kubernetes.io/dockerconfigjson
          name: test-secret
        images:
          - "alpine:3.9"
          - "private-registry.someorg.com/ns/private-image:latest"
```

## Included resources

A single JSON file will be added to the support bundle, in the path `/registry/images.json`:

```json
{
  "images": {
    "alpine:3.9": {
      "exists": true
    },
    "private-registry.someorg.com/ns/private-image:latest": {
      "exists": false
    }
  }
}
```

### Fields

For each image in the map, the `exists` flag will be set to `true` or `false` depending on the image's status. If an image's existence could not be determined due to an error, the `error` field will be present for the corresponding images.

---

The `runDaemonSet` collector can be used to run a DaemonSet in the cluster with the parameters provided. The collector is designed as a run-once DaemonSet that can collect information on every node in the cluster. After all pods in the DaemonSet stop, the collection is completed. The collector then deletes the DaemonSet and cleans up any artifacts it created. This collector can be included multiple times, each defining different commands to run.

## Parameters

In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `runDaemonSet` collector accepts the following parameters:

##### `name` (Optional)

The name of the collector. The collector name is prefixed to the path that the output is written to in the support bundle. This is also used as the name of the pod and must meet pod naming criteria.

##### `namespace` (Optional)

The namespace to schedule the pod in. If not specified, the "default" namespace is used.

##### `podSpec` (Required)

The `corev1.PodSpec` for the `runDaemonSet` collector.
See the [Kubernetes API Reference](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec) for all available properties.

##### `timeout` (Optional)

A [duration](https://golang.org/pkg/time/#Duration) that is honored when running the pod. If the timeout elapses, the pod is terminated. Troubleshoot waits for a maximum of `60s` for the DaemonSet to safely terminate, then forcefully deletes it.

##### `imagePullSecret` (Optional)

Troubleshoot offers the ability to use ImagePullSecrets, either by referencing the name of a pre-existing secret in the `podSpec` or by dynamically creating a temporary secret that is used to pull the image and then deleted after the collector finishes.

The ImagePullSecret field at the collector level accepts the following parameters:

- ##### `name` (optional)
- ##### `data`
  - ###### `.dockerconfigjson` (required)

    A string containing a valid base64-encoded docker config.json file.
- ##### `type` (required)

  A string indicating that the secret is of type "kubernetes.io/dockerconfigjson".

To let Troubleshoot create an ImagePullSecret for the run collector to pull the image:

```yaml
imagePullSecret:
  name: mysecret
  data:
    .dockerconfigjson: ewoJICJhdXRocyI6IHsKCQksHR0cHM6Ly9pbmRleC5kb2NrZXIuaW8vdjEvIjoge30KCX0sCgkiSHR0cEhlYWRlcnMiOiB7CgkJIlVzZXItQWdlbnQiOiAiRG9ja2VyLUNsaWVudC8xOS4wMy4xMiAoZGFyd2luKSIKCX0sCgkiY3JlZHNTdG9yZSI6ICJkZXNrdG9wIiwKCSJleHBlcmltZW50YWwiOiAiZGlzYWJsZWQiLAoJInN0YWNrT3JjaGVzdHJhdG9yIjogInN3YXJtIgp9
  type: kubernetes.io/dockerconfigjson
```

For more information about the config.json file and dockerconfigjson secrets, see [Pull an Image from a Private Registry](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/) in the Kubernetes documentation.

To use an existing ImagePullSecret:

```yaml
podSpec:
  containers:
    - args: ["go", "run", "main.go"]
      image: my-private-repository/myRestApi
  imagePullSecrets:
    - name: my-image-pull-secret
```

See the examples below for use cases.
## Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - runDaemonSet: name: "connectivity" namespace: default podSpec: containers: - name: connectivity-test image: curlimages/curl args: ["-IsL", "www.google.com"] ``` ## Example using a private image with `imagePullSecret` ### Using dockerconfigjson secrets Troubleshoot creates a temporary secret, uses it to pull the image from the private repository, and deletes it after the run collector is completed. ```yaml spec: collectors: - runDaemonSet: name: "myPrivateApp" namespace: default imagePullSecret: name: my-temporary-secret data: .dockerconfigjson: ewoJICJhdXRocyI6IHsKzCQksHR0cHM6Ly9pbmRleC5kb2NrZXIuaW8vdjEvIjoge30KCX0sCgkiSHR0cEhlYWRlcnMiOiB7CgkJIlVzZXItQWdlbnQiOiAiRG9ja2VyLUNsaWVudC8xOS4wMy4xMiAoZGFyd2luKSIKCX0sCgkiY3JlZHNTdG9yZSI6ICJkZXNrdG9wIiwKCSJleHBlcmltZW50YWwiOiAiZGlzYWJsZWQiLAoJInN0YWNrT3JjaGVzdHJhdG9yIjogInN3YXJtIgp9 type: kubernetes.io/dockerconfigjson podSpec: containers: - args: ["go", "run", "main.go"] image: my-private-repository/myRestApi ``` ## Included resources When this collector is executed, it includes the following files in a support bundle: ### `/[collector-name]/[node-name].log` This contains the pod log for every node in the DaemonSet. --- > Looking for the old Run collector? See: [Run (Deprecated)](https://troubleshoot.sh/docs/collect/deprecated/run) The `runPod` collector can be used to run a pod in the cluster with the parameters provided. The collector will delete and clean up this pod and any artifacts after it's created. This collector can be included multiple times, each defining different commands to run. ## Parameters In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `runPod` collector accepts the following parameters: ##### `name` (Optional) The name of the collector. 
This will be prefixed to the path that the output is written to in the support bundle. This is also used as the name of the pod and must meet pod naming criteria.

##### `namespace` (Optional)

The namespace to schedule the pod in. If not specified, the "default" namespace will be used.

##### `annotations` (Optional)

Annotations to add to the pod. These can be used to attach metadata to the pod, which can be useful for various purposes such as monitoring, logging, or custom workflows.

##### `podSpec` (Required)

> Introduced in Troubleshoot v0.33.0.

The `corev1.PodSpec` for the `runPod` collector. See the [Kubernetes API Reference](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec) for all available properties.

##### `timeout` (Optional)

A [duration](https://golang.org/pkg/time/#Duration) that will be honored when running the pod. If the timeout elapses, the pod is terminated. Troubleshoot waits for a maximum of `60s` for the pod to safely terminate, then forcefully deletes it. If the timeout is not specified, the collector will run for as long as the underlying command does.

##### `allowImagePullRetries` (Optional)

A boolean field that controls whether ImagePullBackOff conditions should respect the configured timeout instead of failing immediately.

- When `false` (default): Maintains existing behavior - fails immediately on ImagePullBackOff
- When `true`: Waits for the configured timeout, allowing image pull retries to potentially succeed

##### `imagePullSecret` (Optional)

Troubleshoot offers the ability to use ImagePullSecrets, either by referencing the name of a pre-existing secret in the `podSpec` or by dynamically creating a temporary secret that is used to pull the image and then deleted after the collector finishes.

The ImagePullSecret field at the collector level accepts the following parameters:

- ##### `name` (optional)
- ##### `data`
  - ###### `.dockerconfigjson` (required)

    A string containing a valid base64-encoded docker config.json file.
- ##### `type` (required)

  A string indicating that the secret is of type "kubernetes.io/dockerconfigjson".

To let Troubleshoot create an ImagePullSecret for the run collector to pull the image:

```yaml
imagePullSecret:
  name: mysecret
  data:
    .dockerconfigjson: ewoJICJhdXRocyI6IHsKCQksHR0cHM6Ly9pbmRleC5kb2NrZXIuaW8vdjEvIjoge30KCX0sCgkiSHR0cEhlYWRlcnMiOiB7CgkJIlVzZXItQWdlbnQiOiAiRG9ja2VyLUNsaWVudC8xOS4wMy4xMiAoZGFyd2luKSIKCX0sCgkiY3JlZHNTdG9yZSI6ICJkZXNrdG9wIiwKCSJleHBlcmltZW50YWwiOiAiZGlzYWJsZWQiLAoJInN0YWNrT3JjaGVzdHJhdG9yIjogInN3YXJtIgp9
  type: kubernetes.io/dockerconfigjson
```

For more information about the config.json file and dockerconfigjson secrets, see [Pull an Image from a Private Registry](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/) in the Kubernetes documentation.

To use an existing ImagePullSecret:

```yaml
podSpec:
  containers:
    - args: ["go", "run", "main.go"]
      image: my-private-repository/myRestApi
  imagePullSecrets:
    - name: my-image-pull-secret
```

See the examples below for use cases.

## Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: sample
spec:
  collectors:
    - runPod:
        name: "run-ping"
        namespace: default
        annotations:
          example.com/annotation-key: "annotation-value"
        podSpec:
          containers:
            - name: run-ping
              image: busybox:1
              command: ["ping"]
              args: ["-w", "5", "www.google.com"]
```

## Example using a private image with `imagePullSecret`

### Using dockerconfigjson secrets

Troubleshoot will create a temporary secret, use it to pull the image from the private repository, and delete it after the run collector is completed.
```yaml spec: collectors: - runPod: name: "myPrivateApp" namespace: default imagePullSecret: name: my-temporary-secret data: .dockerconfigjson: ewoJICJhdXRocyI6IHsKzCQksHR0cHM6Ly9pbmRleC5kb2NrZXIuaW8vdjEvIjoge30KCX0sCgkiSHR0cEhlYWRlcnMiOiB7CgkJIlVzZXItQWdlbnQiOiAiRG9ja2VyLUNsaWVudC8xOS4wMy4xMiAoZGFyd2luKSIKCX0sCgkiY3JlZHNTdG9yZSI6ICJkZXNrdG9wIiwKCSJleHBlcmltZW50YWwiOiAiZGlzYWJsZWQiLAoJInN0YWNrT3JjaGVzdHJhdG9yIjogInN3YXJtIgp9 type: kubernetes.io/dockerconfigjson podSpec: containers: - args: ["go", "run", "main.go"] image: my-private-repository/myRestApi ``` ## Included resources When this collector is executed, it will include the following files in a support bundle: ### `/[collector-name]/[collector-name].log` This will contain the pod output (up to 10000 lines). ### `/[collector-name]/[collector-name].json` This will contain the pod status details in JSON format. ### `/[collector-name]/[collector-name]-events.json` This will contain Kubernetes events related to the pod in JSON format. --- The `s3Status` collector validates connectivity to an S3 or S3-compatible (e.g. MinIO) bucket using the provided credentials and adds the result to a support bundle. ## Parameters The `s3Status` collector has the following parameters: #### `collectorName` (Recommended) The name of the collector. This is recommended to set to a string identifying the S3 bucket, and can be used to refer to this collector in analyzers and preflight checks. If unset, this will be set to the string "s3Status". #### `bucketName` (Required) The name of the S3 bucket to check. #### `endpoint` (Optional) The endpoint URL for the S3-compatible service. Required for non-AWS S3-compatible services such as MinIO. #### `region` (Optional) The AWS region where the bucket is located. Defaults to `us-east-1` if not specified. #### `accessKeyID` (Optional) The access key ID for authenticating with the S3 service. #### `secretAccessKey` (Optional) The secret access key for authenticating with the S3 service. 
#### `usePathStyle` (Optional) When set to `true`, forces the use of path-style addressing (e.g. `https://endpoint/bucket`) instead of virtual-hosted-style (e.g. `https://bucket.endpoint`). This is required for most S3-compatible services such as MinIO. #### `insecure` (Optional) When set to `true`, TLS certificate verification is skipped. Use this for testing with self-signed certificates. ## Example Collector Definitions Check an AWS S3 bucket: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - s3Status: collectorName: my-s3-bucket bucketName: my-app-data region: us-west-2 accessKeyID: AKIAIOSFODNN7EXAMPLE secretAccessKey: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY ``` Check a MinIO bucket: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - s3Status: collectorName: my-minio-bucket bucketName: my-app-data endpoint: https://minio.example.com region: us-east-1 accessKeyID: minioadmin secretAccessKey: minioadmin usePathStyle: true ``` Check a MinIO bucket with a self-signed certificate: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - s3Status: collectorName: my-minio-bucket bucketName: my-app-data endpoint: https://minio.example.com accessKeyID: minioadmin secretAccessKey: minioadmin usePathStyle: true insecure: true ``` ## Included resources A single JSON file will be added to the support bundle, in the path `/s3Status/[collector-name].json`: ```json { "bucketName": "my-app-data", "endpoint": "https://minio.example.com", "region": "us-east-1", "isConnected": false, "error": "operation error S3: HeadBucket, StatusCode: 403, Forbidden" } ``` ### Fields #### `bucketName` The name of the S3 bucket that was checked. #### `endpoint` The endpoint URL that was used, if provided. #### `region` The AWS region that was used. 
#### `isConnected` A boolean indicating if the collector was able to connect to and access the bucket using the credentials provided. #### `error` A string that indicates the connection error, if there was one. --- The `secret` collector can be used to include metadata about Secrets (and optionally the value) in the collected data. This collector can be included multiple times, referencing different Secrets. ## Parameters In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `secret` collector accepts the following parameters: ##### `name` (Required if no selector) The name of the Secret. ##### `selector` (Required if no name) The selector to use to locate the Secrets. ##### `namespace` (Required) The namespace where the Secret exists. ##### `key` (Optional) A key within the Secret. Required if `includeValue` is `true`. ##### `includeValue` (Optional) Whether to include the key value. Defaults to false. ##### `includeAllData` (Optional) Whether to include all of the key-value pairs from the Secret data. When set to `true`, all secret key-value pairs are collected and converted from `[]byte` to `string`. This takes precedence over key-specific collection. Defaults to false. 
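Conceptually, the `includeAllData` conversion is just decoding each base64-encoded Secret value into a plain string. A minimal Python sketch of that conversion (not the collector's actual code; the sample keys and values are illustrative):

```python
import base64

# A Secret's data as stored by the Kubernetes API: values are base64-encoded.
raw_data = {
    "database-password": base64.b64encode(b"supersecret123").decode("ascii"),
    "api-key": base64.b64encode(b"abc123xyz").decode("ascii"),
}

# What includeAllData does conceptually: convert every value from bytes to string.
decoded = {key: base64.b64decode(value).decode("utf-8") for key, value in raw_data.items()}
print(decoded["database-password"])
```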
## Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - secret: namespace: default name: my-secret includeValue: true key: password ``` ## Example with includeAllData ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - secret: namespace: default name: my-app-config includeAllData: true ``` ## Usage Examples Collect all key-value pairs from a specific secret: ```yaml - secret: name: my-app-config namespace: default includeAllData: true ``` Collect all data from secrets matching a selector: ```yaml - secret: namespace: default selector: ["app=my-app"] includeAllData: true ``` ## Included resources When this collector is executed, it will include the following file in a support bundle: ### `/secrets/[namespace]/[name]/[key].json` When collecting a specific key: ```json { "namespace": "default", "name": "my-secret", "key": "password", "secretExists": true, "keyExists": true, "value": "mypass" } ``` ### `/secrets/[namespace]/[name].json` When `includeAllData` is set to `true`, the JSON output includes a `data` field: ```json { "namespace": "default", "name": "my-app-config", "secretExists": true, "data": { "database-password": "supersecret123", "api-key": "abc123xyz", "jwt-secret": "my-signing-key" } } ``` If `key` is not set in the collector spec and `includeAllData` is not enabled, the file will be created at: ### `/secrets/[namespace]/[name].json` If there is an error encountered, it will include the following file: ### `/secrets-errors/[namespace]/[name].json` ```json [ "secrets \"my-secret\" not found" ] ``` --- The `sonobuoy` collector is used to download [sonobuoy](https://sonobuoy.io/) results tarballs from the cluster. Sonobuoy must have already been run in the cluster and the results available. 
This collector is equivalent to running the [sonobuoy retrieve](https://sonobuoy.io/docs/v0.57.1/cli/sonobuoy_retrieve/) CLI command. The tarball can later be analyzed with the [sonobuoy results](https://sonobuoy.io/docs/v0.57.1/cli/sonobuoy_results/) CLI command.

## Parameters

In addition to the [shared collector properties](/docs/collect/collectors/#shared-properties), it accepts the following parameters:

- ##### `namespace` (Optional)

  The namespace where sonobuoy is installed.

## Example Collector Definitions

Collector with defaults:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: sonobuoy
spec:
  collectors:
    - sonobuoy: {}
```

Collector with namespace options:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: sonobuoy
spec:
  collectors:
    - sonobuoy:
        namespace: sonobuoy-custom
```

## Included resources

Result tarballs from the collector will be stored in the `sonobuoy/` directory of the support bundle.

---

The `supportBundleMetadata` collector reads all key-value pairs from the `replicated-support-metadata` Kubernetes Secret and includes them in the support bundle. This is useful for including details about the installation in support bundles. For example, you could include metadata about the customer's license entitlements, their support tier, or their installation environment.

The `supportBundleMetadata` secret is created automatically by Replicated SDK versions 1.18.0 and later. For more information about distributing the Replicated SDK with your application, see [About the Replicated SDK](https://docs.replicated.com/vendor/replicated-sdk-overview) in the Replicated documentation.

You can also create the secret manually. For information about the required format of the secret, see [Secret Format](#secret-format) on this page. The secret name `replicated-support-metadata` is fixed and cannot be changed.
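When creating the secret manually, a Kubernetes `stringData` Secret avoids base64-encoding the values by hand; Kubernetes converts `stringData` entries to `data` on write. A sketch (the key names and values are illustrative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: replicated-support-metadata
  namespace: example
type: Opaque
stringData:
  appVersion: "1.2.3"
  environment: "staging"
```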
## Parameters In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `supportBundleMetadata` collector accepts the following parameters: ##### `namespace` (Required) The namespace where the `replicated-support-metadata` Secret exists. ## Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sample spec: collectors: - supportBundleMetadata: namespace: example ``` ## Included resources When this collector is executed, it includes the following file in a support bundle: ### `/metadata/cluster.json` ```json { "appVersion": "1.2.3", "enabledFeatures": "[\"feature1\",\"experimental1\"]", "environment": "staging" } ``` The file contains all key-value pairs from the `replicated-support-metadata` Secret's `data` field, with byte values converted to strings. ### Secret Format The following yaml demonstrates the format of the `replicated-support-metadata` secret. ```yaml apiVersion: v1 kind: Secret metadata: name: replicated-support-metadata namespace: example type: Opaque data: appVersion: MS4yLjM= # "1.2.3" enabledFeatures: WyJmZWF0dXJlMSIsImV4cGVyaW1lbnRhbDEiXQ== # ["feature1","experimental1"] environment: c3RhZ2luZw== # "staging" ``` --- The `sysctl` collector reads kernel parameter settings from /proc/sys/net/ipv4, /proc/sys/net/bridge, and /proc/sys/vm on all nodes. This collector schedules a pod on every node using the specified image. ## Parameters In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `sysctl` collector accepts the following parameters: ##### `namespace` (Optional) The namespace where the pods will be created. If not specified, the namespace that is currently set for the kubectl context is used. ##### `image` (Required) The image to use for the pods scheduled on each node. This image should be accessible to the nodes in the cluster. 
The image must have a shell with the `find`, `cat`, and `echo` commands available.

##### `timeout` (Optional)

A [duration](https://golang.org/pkg/time/#Duration) that will be honored when collecting data. The timer should allow enough time to pull images if needed. Default: 1 minute

##### `imagePullPolicy` (Optional)

A valid string representation of the policy to use when pulling the image. Default: IfNotPresent

##### `imagePullSecret` (Optional)

The same [image pull secret options](/docs/collect/copy-from-host/#imagepullsecret-optional) available to the `copyFromHost` collector are supported for the `sysctl` collector.

## Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: sample
spec:
  collectors:
    - sysctl:
        collectorName: "sysctl network parameters"
        image: debian:buster-slim
        namespace: default
        imagePullPolicy: IfNotPresent
        imagePullSecret:
          name: my-temporary-secret
          data:
            .dockerconfigjson: ewoJICJhdXRocyI6IHsKzCQksHR0cHM6Ly9pbmRleC5kb2NrZXIuaW8vdjEvIjoge30KCX0sCgkiSHR0cEhlYWRlcnMiOiB7CgkJIlVzZXItQWdlbnQiOiAiRG9ja2VyLUNsaWVudC8xOS4wMy4xMiAoZGFyd2luKSIKCX0sCgkiY3JlZHNTdG9yZSI6ICJkZXNrdG9wIiwKCSJleHBlcmltZW50YWwiOiAiZGlzYWJsZWQiLAoJInN0YWNrT3JjaGVzdHJhdG9yIjogInN3YXJtIgp9
          type: kubernetes.io/dockerconfigjson
```

## Included resources

When this collector is executed, it includes the following files in a support bundle:

### `/sysctl/[node name]`

The sysctl parameters collected for each node are aggregated into a single file with the following format:

```
/proc/sys/net/ipv4/cipso_cache_bucket_size = 10
/proc/sys/net/ipv4/cipso_cache_enable = 1
/proc/sys/net/ipv4/cipso_rbm_optfmt = 0
...
```

---

## All Host Collectors and Analyzers

### CPU

- [cpu](./cpu): Collects and analyzes information about the number of CPU cores.

### Memory

- [memory](./memory): Collects and analyzes information about the total amount of memory on the machine.
### Storage

- [blockDevices](./blockDevices): Collects and analyzes information about the block devices.
- [diskUsage](./diskUsage): Collects and analyzes information about disk usage on a specified path.
- [filesystemPerformance](./filesystemPerformance): Benchmarks sequential write latency on a single file.

### Networking

- [certificate](./certificate): Collects and analyzes information about the TLS certificate at the specified path.
- [httpLoadBalancer](./httpLoadBalancer): Collects and analyzes information about the ability to connect to the specified HTTP load balancer address.
- [ipv4Interfaces](./ipv4Interfaces): Collects and analyzes information about the host system IPv4 interfaces.
- [subnetAvailable](./subnetAvailable): Collects and analyzes information about checking for an available (IPv4) subnet.
- [tcpConnect](./tcpConnect): Collects and analyzes information about the ability to connect to the specified TCP address.
- [tcpLoadBalancer](./tcpLoadBalancer): Collects and analyzes information about the ability to connect to the specified TCP load balancer address.
- [tcpPortStatus](./tcpPortStatus): Collects and analyzes information about the specified TCP port.
- [udpPortStatus](./udpPortStatus): Collects and analyzes information about the specified UDP port.
- [dns](./dns): Collects information about DNS resolution.
- [networkNamespaceConnectivity](./networkNamespaceConnectivity): Collects and analyzes connectivity between distinct network namespaces.
- [subnetContainsIp](./subnetcontainsip): Analyzes if a specified IP address is contained in a subnet.

### Generated and Dynamic Data

- [run](./run): Runs a specified command and includes the results in the collected output.

### Other

- [hostServices](./hostServices): Collects and analyzes information about the available host system services.
- [hostOS](./hostOS): Collects and analyzes information about the operating system installed on the machine.
- [sysctl](./sysctl): Collects and analyzes information about the host kernel parameters at runtime using `sysctl`.
- [systemPackages](./systemPackages): Collects and analyzes information about the host system packages for the specified operating system.
- [time](./time): Collects and analyzes information about the system clock.
- [kernelConfigs](./kernelConfigs): Collects and analyzes information about available Kernel Configs on the machine.
- [journald](./journald): Collects journal entries from the journald service.

---

## Block Devices Collector

To collect information about all of the block devices on a host, use the `blockDevices` collector.

### Parameters

None.

### Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: block-devices
spec:
  hostCollectors:
    - blockDevices: {}
```

### Included Resources

The results of the `blockDevices` collector are stored in the `host-collectors/system` directory of the support bundle.

#### `block_devices.json`

Example of the resulting JSON file:

```json
[
  {
    "name": "sda1",
    "kernel_name": "sda1",
    "parent_kernel_name": "sda",
    "type": "part",
    "major": 8,
    "minor": 1,
    "size": 85782937088,
    "filesystem_type": "ext4",
    "mountpoint": "/",
    "serial": "",
    "read_only": false,
    "removable": false
  },
  {
    "name": "sda14",
    "kernel_name": "sda14",
    "parent_kernel_name": "sda",
    "type": "part",
    "major": 8,
    "minor": 14,
    "size": 4194304,
    "filesystem_type": "",
    "mountpoint": "",
    "serial": "",
    "read_only": false,
    "removable": false
  }
]
```

## Block Devices Analyzer

The `blockDevices` analyzer supports multiple outcomes. The `when` value in each outcome accepts a conditional in the format `"<device-name-regex> <operator> <count>"`, for example `"sdb > 0"`. The following block devices are not counted:

* Devices with a filesystem
* Partitioned devices
* Read-only devices
* Loopback devices
* Removable devices

### Parameters

#### `includeUnmountedPartitions` (Optional)

Includes unmounted partitions in the analysis. Disabled by default.
#### `minimumAcceptableSize` (Optional) The minimum acceptable size to filter the available block devices during analysis. Disabled by default. #### `additionalDeviceTypes` (Optional) A list of extra lsblk TYPE values (e.g. `loop`, `lvm`) that should count toward outcomes, in addition to whole disks and (when `includeUnmountedPartitions` is set) partitions. By default, only `disk` type devices are counted. Types in this list are eligible regardless of the `includeUnmountedPartitions` setting. ### Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: block-devices spec: hostCollectors: - blockDevices: {} hostAnalyzers: - blockDevices: includeUnmountedPartitions: true minimumAcceptableSize: 10737418240 # 1024 ^ 3 * 10, 10GiB additionalDeviceTypes: - lvm outcomes: - pass: when: ".* == 1" message: One available block device - pass: when: ".* > 1" message: Multiple available block devices - fail: message: No available block devices ``` --- ## TLS Certificate Collector To collect information about a certificate key pair on the host, use the `certificate` collector. ### Parameters In addition to the [shared collector properties](/docs/collect/collectors/#shared-properties), the `certificate` collector accepts the following parameters: #### `certificatePath` (Required) The path to the TLS certificate file on the host (e.g. `/etc/ssl/corp.crt`). #### `keyPath` (Required) The path to the private key file on the host (e.g. `/etc/ssl/corp.key`). ### Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: certificate spec: hostCollectors: - certificate: certificatePath: /etc/ssl/corp.crt keyPath: /etc/ssl/corp.key ``` ### Included Resources The results of the `certificate` collector are stored in the `host-collectors/certificate` directory of the support bundle. #### `[collector-name].json` If the `collectorName` field is unset, it will be named `certificate.json`. 
Example of the resulting file:

```
key-pair-valid
```

## TLS Certificate Analyzer

The `certificate` analyzer supports multiple outcomes. For example:

- `key-pair-missing`: Key pair files do not exist.
- `key-pair-switched`: PEM inputs may have been switched.
- `key-pair-encrypted`: Key pair is encrypted, could not read the key.
- `key-pair-mismatch`: Private key does not match the public key.
- `key-pair-invalid`: Key pair is invalid.
- `key-pair-valid`: Key pair is valid.

### Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: certificate
spec:
  hostCollectors:
    - certificate:
        certificatePath: /etc/ssl/corp.crt
        keyPath: /etc/ssl/corp.key
  hostAnalyzers:
    - certificate:
        outcomes:
          - fail:
              when: "key-pair-missing"
              message: Certificate key pair not found in /etc/ssl
          - fail:
              when: "key-pair-switched"
              message: Cert and key pair are switched
          - fail:
              when: "key-pair-encrypted"
              message: Private key is encrypted
          - fail:
              when: "key-pair-mismatch"
              message: Cert and key do not match
          - fail:
              when: "key-pair-invalid"
              message: Certificate key pair is invalid
          - pass:
              when: "key-pair-valid"
              message: Certificate key pair is valid
```

---

## SSL/TLS Certificates Collection Collector

To collect certificate chain data on the host, use the `certificatesCollection` collector. Unlike the [`certificate`](/docs/host-collect-analyze/certificate/) collector, which is designed to collect a specific certificate key pair, the `certificatesCollection` collector focuses on collecting a collection of certificates from multiple file paths.

### Parameters

In addition to the [shared collector properties](/docs/collect/collectors/#shared-properties), the `certificatesCollection` collector accepts the following parameters:

#### `paths` (Required)

Includes multiple file paths for certificates on the host.
### Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: certificates
spec:
  hostCollectors:
    - certificatesCollection:
        paths:
          - /Users/ubuntu/apiserver-kubelet-client.crt
          - /etc/ssl/corp.crt
```

### Included Resources

The results of the `certificatesCollection` collector are stored in the `host-collectors/certificatesCollection` directory of the support bundle.

#### `[collector-name].json`

If the `collectorName` field is not specified, it will be named `certificatesCollection.json`.

Example of the resulting file:

```
[
  {
    "certificatePath": "/Users/ubuntu/apiserver-kubelet-client.crt",
    "certificateChain": [
      {
        "certificate": "",
        "subject": "CN=kubernetes",
        "subjectAlternativeNames": [
          "kubernetes"
        ],
        "issuer": "CN=kubernetes",
        "notAfter": "2033-04-17T06:11:21Z",
        "notBefore": "2023-04-20T06:11:21Z",
        "isValid": true,
        "isCA": true
      }
    ],
    "message": "cert-valid"
  },
  {
    "certificatePath": "/etc/ssl/corp.crt",
    "message": "cert-missing"
  }
]
```

## SSL Certificates Collection Analyzer

The certificates analyzer validates certificates and checks their expiration dates, and can provide multiple outcomes such as:

- `Certificate is valid`: The certificate is valid and not expired.
- `notAfter < Today + 4 days`: The certificate expires within 4 days.
- `notAfter < Today`: The certificate has expired.

### Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: certificate
spec:
  hostAnalyzers:
    - certificatesCollection:
        outcomes:
          - pass:
              message: Certificate is valid
          - warn:
              when: "notAfter < Today + 4 days"
              message: Certificate is about to expire
          - fail:
              when: "notAfter < Today"
              message: Certificate is expired
```

---

The `cgroups` collector is used to gather [Control Group](https://www.man7.org/linux/man-pages/man7/cgroups.7.html) configuration from a Linux-based system and stores this information in a JSON file.
The collector accepts an optional `mountPoint` parameter specifying where the cgroup virtual filesystem is expected to be mounted. The collector checks for the version of `cgroups` and all enabled controllers.

## Parameters

In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `cgroups` collector accepts the following parameters:

##### `mountPoint`

(Optional) Mount point path of the cgroup virtual filesystem. If it is not provided, the collector defaults to `/sys/fs/cgroup`. This is the default mount point on most systems and the default for `systemd`.

## Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: sample
spec:
  collectors:
    - cgroups:
        collectorName: cgroups
        mountPoint: /sys/fs/cgroup
```

## Included resources

### `/host-collectors/system/cgroups.json`

This file contains the JSON results of the collected cgroups configuration. It includes a `"cgroup-enabled"` boolean field that is true whenever the collector detects either `cgroup v1` or `cgroup v2` in the mounted filesystem. There is also an `"allControllers"` array that contains all detected controllers. This field is meant to make it easier for analyzers to test whether a controller is enabled, regardless of the cgroup version.
Example of the resulting JSON files:

#### Version 1

```json
{
  "cgroup-enabled": true,
  "cgroup-v1": {
    "enabled": false,
    "mountPoint": "",
    "controllers": []
  },
  "cgroup-v2": {
    "enabled": true,
    "mountPoint": "/sys/fs/cgroup",
    "controllers": [
      "cpuset",
      "cpu",
      "io",
      "memory",
      "hugetlb",
      "pids",
      "rdma",
      "misc",
      "freezer",
      "devices"
    ]
  },
  "allControllers": [
    "cpu",
    "cpuset",
    "devices",
    "freezer",
    "hugetlb",
    "io",
    "memory",
    "misc",
    "pids",
    "rdma"
  ]
}
```

#### Version 2

```json
{
  "cgroup-enabled": true,
  "cgroup-v1": {
    "enabled": false,
    "mounts": null,
    "controllers": null
  },
  "cgroup-v2": {
    "enabled": true,
    "mountPoint": "/sys/fs/cgroup",
    "controllers": [
      "cpuset",
      "cpu",
      "io",
      "memory",
      "hugetlb",
      "pids",
      "rdma",
      "misc",
      "freezer",
      "devices"
    ]
  }
}
```

---

> The ability to copy folders was introduced in Troubleshoot v0.60.0.

The `copy` collector can be used to copy files or an entire folder from a host and include the contents in the collected data. This collector can be included multiple times to copy different files or folders from different host paths. This collector does not require Kubernetes to be running.

## Parameters

In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `copy` collector accepts the following parameters:

##### `path`

(Required) The path on the host containing the file(s) and folder(s) to copy. This supports glob matching patterns.

## Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: sample
spec:
  hostCollectors:
    - copy:
        collectorName: copy-nginx-logs
        path: /var/log/nginx/acc* # glob pattern to collect access logs
```

## Included resources

### `/host-collectors/[collector-name OR copy]/...`

This directory contains the collected folders and files.

---

## CPU Collector

To collect information about the number of CPU cores and their features on a host, use the `cpu` collector.

### Parameters

None.
### Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: cpu
spec:
  hostCollectors:
    - cpu: {}
```

### Included Resources

The results of the `cpu` collector are stored in the `host-collectors/system` directory of the support bundle.

#### `cpu.json`

Example of the resulting JSON file:

```json
{"logicalCount": 4, "physicalCount": 2, "flags": ["cmov", "cx8", "fpu", "fxsr"]}
```

## CPU Analyzer

The `cpu` analyzer supports multiple outcomes by validating the number of CPU cores, for example:

- `count < 32`: Less than 32 CPU cores were detected.
- `count > 4`: More than 4 CPU cores were detected.

This analyzer also supports validating the presence of specific CPU features, for example:

- `supports x86-64-v2`: The CPU supports the x86-64-v2 feature set.
- `supports x86-64-v3`: The CPU supports the x86-64-v3 feature set.

Supported CPU feature sets (microarchitecture levels) are:

- `x86-64`
- `x86-64-v2`
- `x86-64-v3`
- `x86-64-v4`

Checking for individual CPU flags is also supported.
The `HostPreflight` spec below shows how to check for specific CPU flags:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: ec-cluster-preflight
spec:
  collectors:
    - cpu: {}
  analyzers:
    - cpu:
        checkName: CPU
        outcomes:
          - pass:
              when: "hasFlags cmov,cx8,fpu,fxsr,mmx"
              message: CPU supports all required flags
          - fail:
              message: CPU not supported
```

### Example Analyzer Definitions

Collecting and analyzing the number of CPU cores:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: cpu
spec:
  hostCollectors:
    - cpu: {}
  hostAnalyzers:
    - cpu:
        checkName: "Number of CPUs"
        outcomes:
          - fail:
              when: "count < 2"
              message: At least 2 CPU cores are required, and 4 CPU cores are recommended
          - warn:
              when: "count < 4"
              message: At least 4 CPU cores are recommended
          - pass:
              message: This server has at least 4 CPU cores
```

Collecting and analyzing the presence of specific CPU features:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: cpu
spec:
  hostCollectors:
    - cpu: {}
  hostAnalyzers:
    - cpu:
        checkName: "Supports x86-64-v2"
        outcomes:
          - pass:
              when: "supports x86-64-v2"
              message: This server's CPU supports the x86-64-v2 feature set
          - fail:
              message: This server does not support the x86-64-v2 feature set
```

Collecting and analyzing the machine architecture:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: cpu
spec:
  hostCollectors:
    - cpu: {}
  hostAnalyzers:
    - cpu:
        checkName: "Check machine architecture"
        outcomes:
          - fail:
              when: "machineArch == x86_64"
              message: x86_64 machine architecture is not supported
          - pass:
              when: "machineArch == arm64"
              message: It is recommended to use arm64 machine architecture
          - warn:
              message: Supported machine architecture was not detected
```

---

## Disk Usage Collector

The `diskUsage` collector returns the disk usage of a specified directory in bytes.
### Parameters

In addition to the [shared collector properties](/docs/collect/collectors/#shared-properties), the `diskUsage` collector accepts the following parameters:

#### `path`

(Required) The path in the host filesystem for which to evaluate disk usage.

### Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: disk-usage
spec:
  hostCollectors:
    - diskUsage:
        collectorName: var-lib-kubelet
        path: /var/lib/kubelet
```

### Included Resources

The results of the `diskUsage` collector are stored in the `host-collectors/diskUsage` directory of the support bundle.

#### `[collector-name].json`

If the `collectorName` field is unset, it will be named `diskUsage.json`.

Example of the resulting JSON file:

```json
{"total_bytes":83067539456,"used_bytes":35521277952}
```

## Disk Usage Analyzer

The `diskUsage` analyzer supports multiple outcomes by validating the disk usage of the directory. For example:

- `total < 30Gi`: The disk containing the directory has less than 30Gi of total space.
- `used/total > 80%`: The disk containing the directory is more than 80% full.
- `available < 10Gi`: The disk containing the directory has less than 10Gi of disk space available.
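The arithmetic behind these conditions can be illustrated with a short sketch against the sample `[collector-name].json` payload above. This is only an illustration of how the thresholds evaluate, not the analyzer's actual implementation:

```python
import json

# Sample payload shaped like the diskUsage collector's [collector-name].json
payload = '{"total_bytes":83067539456,"used_bytes":35521277952}'
usage = json.loads(payload)

total = usage["total_bytes"]
used = usage["used_bytes"]
available = total - used

# "used/total > 80%" compares the used fraction to a percentage threshold
print(f"used/total = {used / total:.1%}")
print("more than 80% full:", used / total > 0.80)

# "available < 10Gi" compares free bytes to 10 * 1024**3 bytes
print("less than 10Gi available:", available < 10 * 1024 ** 3)
```

With these sample numbers the disk is roughly 43% full with about 44Gi available, so both the `used/total > 80%` and `available < 10Gi` conditions evaluate to false.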
### Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: disk-usage spec: hostCollectors: - diskUsage: collectorName: var-lib-kubelet path: /var/lib/kubelet hostAnalyzers: - diskUsage: checkName: "Ephemeral Disk Usage /var/lib/kubelet" collectorName: var-lib-kubelet outcomes: - fail: when: "total < 30Gi" message: The disk containing directory /var/lib/kubelet has less than 30Gi of total space - fail: when: "used/total > 80%" message: The disk containing directory /var/lib/kubelet is more than 80% full - warn: when: "used/total > 60%" message: The disk containing directory /var/lib/kubelet is more than 60% full - warn: when: "available < 10Gi" message: The disk containing directory /var/lib/kubelet has less than 10Gi of disk space available - pass: message: The disk containing directory /var/lib/kubelet has at least 30Gi of total space, has at least 10Gi of disk space available, and is less than 60% full ``` --- The `dns` host collector can be used to help diagnose DNS resolution problems on the host machine. During execution, the collector performs various DNS record queries to troubleshoot DNS resolution. It does the following: - Reads the contents of `/etc/resolv.conf` - Performs DNS `A`, `AAAA`, `CNAME` lookups for specified hostnames - Outputs query results including IP addresses (if any) ## Parameters In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `dns` host collector accepts the following parameters: ##### `collectorName` (Required) The name of the collector. No spaces or special characters are allowed because the collector name is used as a directory name. ##### `hostnames` (Required) A list of hostnames to query. These can include both resolvable domains and non-resolvable domains to test various scenarios (for example, wildcard DNS). 
## Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: sample
spec:
  collectors:
    - dns:
        collectorName: wildcard-check
        hostnames:
          - '*'
    - dns:
        collectorName: valid-check
        hostnames:
          - replicated.app
  analyzers:
    - jsonCompare:
        checkName: Detect wildcard DNS
        fileName: host-collectors/dns/wildcard-check/result.json
        path: 'resolvedFromSearch'
        value: |
          ""
        outcomes:
          - fail:
              when: 'false'
              message: 'Possible wildcard DNS detected at: {{ .resolvedFromSearch }}. Please remove the search domain OR remove the wildcard DNS entry.'
          - pass:
              when: 'true'
              message: No wildcard DNS detected.
```

## Included resources

When this collector is executed, it includes the following files in a support bundle:

### `/host-collectors/dns/[collector-name]/resolv.conf`

This file contains the contents of the host's `/etc/resolv.conf` file.

```
nameserver 8.8.8.8
nameserver 8.8.4.4
search mydomain.com
```

### `/host-collectors/dns/[collector-name]/result.json`

This file contains the results of the DNS queries in JSON format.

#### Example of result for DNS queries that detect wildcard DNS

```json
{
  "query": {
    "*": [
      {
        "server": "127.0.0.53:53",
        "search": ".foo.testcluster.net.",
        "name": "*.foo.testcluster.net.",
        "answer": "*.foo.testcluster.net.\t60\tIN\tA\t192.1.2.3",
        "record": "192.1.2.3"
      },
      {
        "server": "127.0.0.53:53",
        "search": ".artifactory.testcluster.net.",
        "name": "*.artifactory.testcluster.net.",
        "answer": "*.artifactory.testcluster.net.\t300\tIN\tCNAME\tartifactory-elb-506539455.us-west-2.elb.amazonaws.com.",
        "record": ""
      },
      {
        "server": "127.0.0.53:53",
        "search": "",
        "name": "*.c.replicated-qa.internal.",
        "answer": "",
        "record": ""
      },
      {
        "server": "127.0.0.53:53",
        "search": "",
        "name": "*.google.internal.",
        "answer": "",
        "record": ""
      },
      {
        "server": "127.0.0.53:53",
        "search": "",
        "name": "*.",
        "answer": "",
        "record": ""
      }
    ]
  },
  "resolvedFromSearch": ".foo.testcluster.net., .artifactory.testcluster.net."
}
```

The `resolvedFromSearch` attribute contains the list of search domains that resolved the hostnames.

#### Example of a normal DNS resolution

```json
{
  "query": {
    "replicated.app": [
      {
        "server": "127.0.0.53:53",
        "search": "",
        "name": "replicated.app.",
        "answer": "replicated.app.\t300\tIN\tA\t162.159.134.41",
        "record": "162.159.134.41"
      },
      {
        "server": "127.0.0.53:53",
        "search": "",
        "name": "replicated.app.c.replicated-qa.internal.",
        "answer": "",
        "record": ""
      },
      {
        "server": "127.0.0.53:53",
        "search": "",
        "name": "replicated.app.google.internal.",
        "answer": "",
        "record": ""
      }
    ]
  },
  "resolvedFromSearch": ""
}
```

---

## Filesystem Performance Collector

The `filesystemPerformance` collector uses the [`fio` tool](https://github.com/axboe/fio) to benchmark sequential write latency on a single file. The optional background IOPS feature attempts to mimic real-world conditions by running read and write workloads prior to and during benchmark execution.

Note: the `fio` binary must be installed and available in your system's `$PATH`.

### Parameters

In addition to the [shared collector properties](/docs/collect/collectors/#shared-properties), the `filesystemPerformance` collector accepts the following parameters:

#### `timeout`

(Required) Specifies the total timeout, including background IOPS setup and warmup if enabled.

#### `directory`

(Required) Specifies the directory where the benchmark will create files.

#### `fileSize`

(Required) Specifies the size of the file used in the benchmark. The number of IO operations for the benchmark will be `fileSize` divided by `operationSize`. This parameter accepts valid Kubernetes resource units, such as `Mi`.

#### `operationSize`

(Required) Specifies the size of each write operation performed while benchmarking. This parameter does not apply to the background IOPS feature if enabled, since those operations are fixed at 4096 bytes.

#### `sync`

(Optional) Specifies whether to call sync on the file after each write. This does not apply to the background IOPS task.
#### `datasync`

(Optional) Specifies whether to call datasync on the file after each write. This is skipped if `sync` is also true. It does not apply to the background IOPS task.

#### `runTime`

(Optional) Limits the runtime. The test will run until it completes the configured I/O workload or until it has run for this specified amount of time, whichever occurs first. When the unit is omitted, the value is interpreted in seconds. Defaults to 120 seconds. Set to `"0"` to disable.

#### `enableBackgroundIOPS`

(Optional) Enables the background IOPS feature.

#### `backgroundIOPSWarmupSeconds`

(Optional) Specifies how long to run the background IOPS read and write workloads prior to starting the benchmark.

#### `backgroundWriteIOPS`

(Optional) Specifies the target write IOPS to run while benchmarking. This is a limit, and there is no guarantee it will be reached. This is the total IOPS for all background write jobs.

#### `backgroundReadIOPS`

(Optional) Specifies the target read IOPS to run while benchmarking. This is a limit, and there is no guarantee it will be reached. This is the total IOPS for all background read jobs.

#### `backgroundWriteIOPSJobs`

(Optional) Specifies the number of threads to use for background write IOPS. This value should be set high enough to reach the target specified in `backgroundWriteIOPS`. For example, if `backgroundWriteIOPS` is 100 and write latency is 10ms, then a single job would barely be able to reach 100 IOPS, so this value should be at least 2.

#### `backgroundReadIOPSJobs`

(Optional) Specifies the number of threads to use for background read IOPS. This should be set high enough to reach the target specified in `backgroundReadIOPS`.
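As a quick illustration of the relationship between `fileSize` and `operationSize`, the values used in the example definition (a 22Mi file with 2300-byte writes) work out to roughly ten thousand write operations. This is a sketch of the arithmetic only, not collector code:

```python
# fileSize accepts Kubernetes resource units: 22Mi = 22 * 1024**2 bytes
file_size = 22 * 1024 ** 2   # 23068672 bytes
operation_size = 2300        # bytes per write operation

# The benchmark performs roughly fileSize / operationSize write operations
operations = file_size // operation_size
print(operations)  # 10029
```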
### Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: filesystem-performance
spec:
  hostCollectors:
    - filesystemPerformance:
        collectorName: filesystem-latency-two-minute-benchmark
        timeout: 3m
        directory: /var/lib/etcd
        fileSize: 22Mi
        operationSize: 2300
        datasync: true
        enableBackgroundIOPS: true
        backgroundIOPSWarmupSeconds: 10
        backgroundWriteIOPS: 300
        backgroundWriteIOPSJobs: 6
        backgroundReadIOPS: 50
        backgroundReadIOPSJobs: 1
        runTime: "120"
```

### Included Resources

The results of the `filesystemPerformance` collector are stored in the `host-collectors/filesystemPerformance` directory of the support bundle.

#### `[collector-name].json`

If the `collectorName` field is unset, it will be named `filesystemPerformance.json`.

Example of the resulting JSON file (truncated):

```json
{
  "fio version" : "fio-3.28",
  "timestamp" : 1691679955,
  "timestamp_ms" : 1691679955590,
  "time" : "Thu Aug 10 15:05:55 2023",
  "global options" : {
    "rw" : "write",
    "ioengine" : "sync",
    "fdatasync" : "1",
    "directory" : "/var/lib/etcd",
    "size" : "23068672",
    "bs" : "1024"
  },
  ...
  "sync" : {
    "total_ios" : 0,
    "lat_ns" : {
      "min" : 200,
      "max" : 1000000000,
      "mean" : 55000,
      "stddev" : 12345.6789,
      "N" : 32400,
      "percentile" : {
        "1.000000" : 1000,
        "5.000000" : 5000,
        "10.000000" : 10000,
        "20.000000" : 20000,
        "30.000000" : 30000,
        "40.000000" : 40000,
        "50.000000" : 50000,
        "60.000000" : 60000,
        "70.000000" : 70000,
        "80.000000" : 80000,
        "90.000000" : 90000,
        "95.000000" : 95000,
        "99.000000" : 99000,
        "99.500000" : 995000,
        "99.900000" : 999000,
        "99.950000" : 5000000,
        "99.990000" : 9000000
      }
    }
  },
  ...
}
```

## Filesystem Performance Analyzer

The `filesystemPerformance` analyzer supports multiple outcomes by validating the filesystem latency results. For example:

- `p99 < 10ms`: The p99 write latency is less than 10ms.
- `p90 > 20ms`: The p90 write latency is greater than 20ms.
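To make the units concrete: fio reports latencies in nanoseconds (`lat_ns`), while the analyzer conditions are written in milliseconds. The sketch below shows that conversion against a trimmed stand-in for the collected JSON above; it is an illustration of the comparison, not the analyzer's actual code:

```python
import json

# Trimmed stand-in for the fio JSON shown above; latencies are in nanoseconds
fio_output = json.loads("""
{
  "sync": {
    "lat_ns": {
      "percentile": {
        "90.000000": 90000,
        "99.000000": 99000
      }
    }
  }
}
""")

percentiles = fio_output["sync"]["lat_ns"]["percentile"]
p99_ms = percentiles["99.000000"] / 1e6  # nanoseconds -> milliseconds

# "p99 < 10ms" passes when the 99th-percentile write latency is under 10ms
print(f"p99 = {p99_ms}ms -> pass: {p99_ms < 10}")
```

In this sample, a p99 of 99000ns is 0.099ms, comfortably under a 10ms threshold.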
You can override the default "file not found" error by setting the first outcome condition to `fileNotCollected` with a message of your choice. This can be useful when `fio` may not be present on the system being analyzed and you do not want that to result in a failure.

### Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: filesystem-performance
spec:
  hostCollectors:
    - filesystemPerformance:
        collectorName: filesystem-latency-two-minute-benchmark
        timeout: 3m
        directory: /var/lib/etcd
        fileSize: 22Mi
        operationSize: 2300
        datasync: true
        enableBackgroundIOPS: true
        backgroundIOPSWarmupSeconds: 10
        backgroundWriteIOPS: 300
        backgroundWriteIOPSJobs: 6
        backgroundReadIOPS: 50
        backgroundReadIOPSJobs: 1
        runTime: "120"
  hostAnalyzers:
    - filesystemPerformance:
        collectorName: filesystem-latency-two-minute-benchmark
        outcomes:
          - warn:
              when: "fileNotCollected"
              message: "Filesystem Performance was not collected"
          - pass:
              when: "p99 < 10ms"
              message: "Write latency is ok (p99 target < 10ms)"
          - warn:
              message: "Write latency is high (p99 target >= 10ms)"
```

---

## Host OS Collector

To collect information about the operating system installed on the machine, you can use the `hostOS` collector.

### Parameters

None.

### Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: hostOS
spec:
  hostCollectors:
    - hostOS: {}
```

### Included Resources

The results of the `hostOS` collector are stored in the `host-collectors/system` directory of the support bundle.

#### `hostos_info.json`

Example of the resulting JSON file:

```json
{
  "name": "localhost",
  "kernelVersion": "5.13.0-1024-gcp",
  "platformFamily": "debian",
  "platformVersion": "20.04",
  "platform": "ubuntu"
}
```

## Host OS Analyzer

The `hostOS` analyzer supports multiple outcomes by validating the name and version of the detected operating system. For example:

- `ubuntu = 20.04`: The detected OS is Ubuntu 20.04.
- `centos >= 7 && < 8`: The detected OS is CentOS 7, which might be anything from `7.0` to `7.9`, and so requires a range. Multiple comparisons can be joined by `&&` or `||`.
- `rhel >= 8 && < 9`: The detected platform family is RHEL with an `8.x` version. `rhel` includes Red Hat Enterprise Linux, CentOS, Oracle Linux, Alma Linux, Rocky Linux, and more. The mapping of platform family to platform can be viewed [here](https://github.com/shirou/gopsutil/blob/8e62971/host/host_linux.go#L293).
- `kernelVersion > 5.12.0`: Checks whether the `kernelVersion` value in the JSON output, regardless of OS, is greater than 5.12.0.
- `ubuntu-16.04-kernel >= 4.14`: Detects whether Ubuntu 16.04 has a kernel version greater than or equal to `4.14`. This string follows the `<platform>-<platform-version>-kernel <operator> <kernel-version>` format.

### Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: hostOS
spec:
  hostCollectors:
    - hostOS: {}
  hostAnalyzers:
    - hostOS:
        outcomes:
          - pass:
              when: "centos >= 7 && < 8"
              message: "CentOS 7 is supported"
          - pass:
              when: "centos >= 8 && < 9"
              message: "CentOS 8 is supported"
          - fail:
              when: "ubuntu = 16.04"
              message: "Ubuntu 16.04 is not supported"
          - pass:
              when: "ubuntu = 18.04"
              message: "Ubuntu 18.04 is supported"
          - pass:
              when: "ubuntu = 20.04"
              message: "Ubuntu 20.04 is supported"
          - pass:
              when: "kernelVersion > 5.12.0"
              message: "kernel version is supported"
```

### `runHostCollectorsInPod` enabled

If the spec has `runHostCollectorsInPod: true`, the host collectors will be run in a privileged pod. The collectors and analyzers will collect and analyze results from every node in the cluster, with the output categorized by node. For example, if a cluster has 2 nodes running this support bundle spec, the output will be categorized by each node.
```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sb spec: runHostCollectorsInPod: true # default is false hostCollectors: - hostOS: {} hostAnalyzers: - hostOS: outcomes: - pass: when: "ubuntu >= 22.04" message: "Ubuntu 22.04 is supported" - fail: when: "ubuntu <= 16.04" message: "Ubuntu 16.04 is not supported" ``` The result: ```json [ { "name": "host.os.info.node.multinode.demo.m02", "labels": { "desiredPosition": "1", "iconKey": "", "iconUri": "" }, "insight": { "name": "host.os.info.node.multinode.demo.m02", "labels": { "iconKey": "", "iconUri": "" }, "primary": "Host OS Info - Node multinode-demo-m02", "detail": "Ubuntu 22.04 is supported", "severity": "debug" }, "severity": "debug", "analyzerSpec": "" }, { "name": "host.os.info.node.multinode.demo.m02.node.multinode.demo", "labels": { "desiredPosition": "1", "iconKey": "", "iconUri": "" }, "insight": { "name": "host.os.info.node.multinode.demo.m02.node.multinode.demo", "labels": { "iconKey": "", "iconUri": "" }, "primary": "Host OS Info - Node multinode-demo-m02 - Node multinode-demo", "detail": "Ubuntu 22.04 is supported", "severity": "debug" }, "severity": "debug", "analyzerSpec": "" } ] ``` --- ## Host Services Collector To collect information about the available host system services, you can use the `hostServices` collector. ### Parameters None. ### Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: hostServices spec: hostCollectors: - hostServices: {} ``` ### Included Resources The results of the `hostServices` collector are stored in the `host-collectors/system` directory of the support bundle. 
#### `systemctl_services.json` Example of the resulting JSON file: ```json [ { "Unit":"accounts-daemon.service", "Load":"loaded", "Active":"active", "Sub":"running" }, { "Unit":"apparmor.service", "Load":"loaded", "Active":"active", "Sub":"exited" } ] ``` ## Host Services Analyzer The `hostServices` analyzer supports multiple outcomes by validating the status of certain host system services. For example: - `ufw = active`: UFW system service is active. - `connman = inactive`: ConnMan system service is inactive. ### Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: hostServices spec: hostCollectors: - hostServices: {} hostAnalyzers: - hostServices: checkName: "Host UFW status" outcomes: - fail: when: "ufw = active" message: UFW is active ``` --- ## HTTP Load Balancer Collector To collect information about the ability to connect to a specified HTTP load balancer address, you can use the `httpLoadBalancer` collector. This collector listens on a host port on `0.0.0.0` and then attempts to connect through an HTTP load balancer. A successful connection requires sending and receiving a random token through the load balancer to the test server. ### Parameters In addition to the [shared collector properties](/docs/collect/collectors/#shared-properties), the `httpLoadBalancer` collector accepts the following parameters: #### `port` (Required) The port number to use. #### `address` (Required) The address to check the connection to. #### `timeout` (Optional) Specifies the total timeout. ### Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: httpLoadBalancer spec: hostCollectors: - httpLoadBalancer: collectorName: httploadbalancer port: 80 address: http://app.corporate.internal timeout: 10s ``` ### Included Resources The results of the `httpLoadBalancer` collector are stored in the `host-collectors/httpLoadBalancer` directory of the support bundle. 
#### `[collector-name].json`

If the `collectorName` field is unset, it will be named `httpLoadBalancer.json`.

Example of the resulting file:

```
address-in-use
```

## HTTP Load Balancer Analyzer

The `httpLoadBalancer` analyzer supports multiple outcomes:

- `invalid-address`: The load balancer address is not valid.
- `connection-refused`: Connection to the load balancer address was refused.
- `connection-timeout`: Timed out connecting to the load balancer address.
- `address-in-use`: The specified port is unavailable.
- `connected`: Successfully connected to the load balancer address.
- `bind-permission-denied`: Failed to bind to the address and port.
- `error`: Unexpected error connecting to the load balancer address.

### Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: httpLoadBalancer
spec:
  hostCollectors:
    - httpLoadBalancer:
        collectorName: httploadbalancer
        port: 80
        address: http://app.corporate.internal
        timeout: 10s
  hostAnalyzers:
    - httpLoadBalancer:
        collectorName: httploadbalancer
        outcomes:
          - fail:
              when: "connection-refused"
              message: Connection to port 80 via load balancer was refused.
          - fail:
              when: "address-in-use"
              message: Another process was already listening on port 80.
          - fail:
              when: "connection-timeout"
              message: Timed out connecting to port 80 via load balancer. Check your firewall.
          - fail:
              when: "bind-permission-denied"
              message: Bind permission denied. Try running as root.
          - fail:
              when: "error"
              message: Failed to connect to port 80 via load balancer.
          - pass:
              when: "connected"
              message: Successfully connected to port 80 via load balancer.
```

---

## IPv4 Interfaces Collector

To collect information about the IPv4 interfaces on the host, you can use the `ipv4Interfaces` collector.

### Parameters

None.
### Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: ipv4Interfaces
spec:
  hostCollectors:
    - ipv4Interfaces: {}
```

### Included Resources

The results of the `ipv4Interfaces` collector are stored in the `host-collectors/system` directory of the support bundle.

#### `ipv4Interfaces.json`

Example of the resulting JSON file:

```json
[
  {
    "Index": 2,
    "MTU": 1460,
    "Name": "ens4",
    "HardwareAddr": "QgEKlgAo",
    "Flags": 19
  },
  {
    "Index": 3,
    "MTU": 1500,
    "Name": "docker0",
    "HardwareAddr": "AkJSnADP",
    "Flags": 19
  },
  {
    "Index": 7,
    "MTU": 1376,
    "Name": "weave",
    "HardwareAddr": "WgtTyJAI",
    "Flags": 19
  }
]
```

## IPv4 Interfaces Analyzer

The `ipv4Interfaces` analyzer supports multiple outcomes:

- `count ==`: Number of interfaces is equal to
- `count >=`: Number of interfaces is greater than or equal to
- `count <=`: Number of interfaces is less than or equal to
- `count >`: Number of interfaces is greater than
- `count <`: Number of interfaces is less than

### Example Analyzer Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: ipv4Interfaces
spec:
  hostCollectors:
    - ipv4Interfaces: {}
  hostAnalyzers:
    - ipv4Interfaces:
        outcomes:
          - fail:
              when: "count == 0"
              message: No IPv4 interfaces detected
          - warn:
              when: "count >= 2"
              message: Multiple IPv4 interfaces detected
          - pass:
              when: "count == 1"
              message: IPv4 interface detected
```

---

## Journald Collector

To collect log entries from the journald service, you can use the `journald` collector.

### Parameters

In addition to the [shared collector properties](/docs/collect/collectors/#shared-properties), the `journald` collector accepts the following parameters to filter journal records:

#### `system`

(Optional) Show messages from system services and the kernel. Defaults to `false`.

#### `dmesg`

(Optional) Include messages from the kernel ring buffer. Defaults to `false`.

#### `units`

(Optional) A list of systemd units to include messages from.
If empty, messages from all units will be included.

#### `since`

(Optional) Specify a starting point for the journal entries. This can be a timestamp or a relative time (for example, `"1 day ago"` for the previous day).

#### `until`

(Optional) Specify an ending point for the journal entries. This can be a timestamp or a relative time.

#### `output`

(Optional) Specify the format for the collected logs. Defaults to `"short"`.

#### `lines`

(Optional) Limit the number of lines to fetch from the journal. If set to `0`, all lines will be fetched. Defaults to `0`.

#### `reverse`

(Optional) Show the newest entries first. Defaults to `false`.

#### `utc`

(Optional) Show timestamps in UTC. Defaults to `false`.

#### `timeout`

(Optional) Specify a timeout for collecting the logs. Defaults to `"30s"`.

### Example Collector Definition

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: logs
spec:
  hostCollectors:
    - journald:
        collectorName: "k0s"
        system: false
        dmesg: false
        units:
          - k0scontroller
          - containerd
        since: "1 day ago"
        output: "json"
        lines: 1000
        reverse: true
        utc: true
        timeout: "30s"
```

### Included Resources

The results of the `journald` collector are stored in the `host-collectors/journald` directory of the bundle. Two files per collector execution will be stored in this directory:

- `[collector-name].txt` - output of the logs from `journalctl`
- `[collector-name]-info.json` - the command that was executed, its exit code, and any output read from `stderr`. See the example below:

```json
{
  "command": "/usr/bin/journalctl -u k0scontroller -n 100 --reverse --utc --no-pager",
  "exitCode": "0",
  "error": "",
  "outputDir": "",
  "input": "",
  "env": null
}
```

---

## Kernel Configs Collector

To collect information about the available kernel configs, you can use the `kernelConfigs` collector. Only config options with values `=y` (built into the kernel), `=m` (loadable module), or `=n` (feature is disabled) will be collected.

### Parameters

None.
### Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: kernel-configs spec: hostCollectors: - kernelConfigs: {} ``` ### Included Resources The results of the `kernelConfigs` collector are stored in the `host-collectors/system` directory of the support bundle. #### `kernel-configs.json` Example of the resulting JSON file: ```json { "CONFIG_9P_FS": "y", "CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE": "y", "CONFIG_ARCH_HAS_STRICT_MODULE_RWX": "y", "CONFIG_ARCH_HAVE_TRACE_MMIO_ACCESS": "y", "CONFIG_ARCH_SUPPORTS_UPROBES": "y", "CONFIG_ARCH_WANT_DEFAULT_BPF_JIT": "y", "CONFIG_AUTOFS_FS": "y", "CONFIG_BLK_CGROUP": "y", "CONFIG_BLK_CGROUP_PUNT_BIO": "y" } ``` ## Kernel Configs Analyzer The `kernelConfigs` analyzer supports only `pass` or `fail` outcomes. In the case of a `fail` outcome, the placeholder `{{ .ConfigsNotFound }}` can be used to render a list of missing kernel configurations in the message. ### Parameters #### `selectedConfigs` (Required) List of kernel config parameters that must be available. ### Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: kernel-configs spec: hostCollectors: - kernelConfigs: {} hostAnalyzers: - kernelConfigs: collectorName: "Kernel Configs Test" strict: true selectedConfigs: - CONFIG_CGROUP_FREEZER=y - CONFIG_NETFILTER_XTABLES=m outcomes: - pass: message: "required kernel configs are available" - fail: message: "missing kernel config(s): {{ .ConfigsNotFound }}" ``` --- ## Memory Collector To collect information about the total amount of memory on the machine in bytes, you can use the `memory` collector. ### Parameters None. ### Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: memory spec: hostCollectors: - memory: {} ``` ### Included Resources The results of the `memory` collector are stored in the `host-collectors/system` directory of the support bundle. 
#### `memory.json` Example of the resulting JSON file: ```json {"total":16777601024} ``` ## Memory Analyzer The `memory` analyzer supports multiple outcomes by validating the total amount of memory. For example: - `< 32G`: Less than 32G of memory was detected. - `> 4G`: More than 4G of memory was detected. ### Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: memory spec: hostCollectors: - memory: {} hostAnalyzers: - memory: checkName: "Amount of Memory" outcomes: - fail: when: "< 4G" message: At least 4G of memory is required, and 8G of memory is recommended - warn: when: "< 8G" message: At least 8G of memory is recommended - pass: message: The system has at least 8G of memory ``` --- ## Network Namespace Connectivity Collector To collect information about network namespace connectivity, use the `NetworkNamespaceConnectivity` collector. This collector creates two distinct network namespaces and verifies if TCP and UDP traffic can traverse between them. ### Parameters In addition to the [shared collector properties](https://troubleshoot.sh/docs/collect/collectors/#shared-properties), the `networkNamespaceConnectivity` collector accepts the following parameters: ##### `fromCIDR` (Required) The CIDR to be used on the first network namespace. ##### `toCIDR` (Required) The CIDR to be used on the second network namespace. ##### `port` (Optional) The port to use for the UDP and TCP connections. Defaults to `8080`. ##### `timeout` (Optional) The time to wait for the UDP and TCP connections to be established. This parameter is expressed with a string following the [Go duration format](https://pkg.go.dev/time#ParseDuration). Defaults to `5s`. 
### Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: namespace-connectivity spec: hostCollectors: - networkNamespaceConnectivity: collectorName: check-network-connectivity fromCIDR: 10.0.0.0/24 toCIDR: 10.0.1.0/24 port: 9090 timeout: 10s ``` ### Included Resources The results of the `NetworkNamespaceConnectivity` collector are stored in the `host-collectors/system` directory of the support bundle. #### `check-network-connectivity.json` Example of the resulting JSON file: ```json { "from_cidr": "10.0.0.0/24", "to_cidr": "10.0.1.0/24", "errors": { "from_cidr_creation": "", "to_cidr_creation": "", "udp_client": "error reading from udp socket: read udp 10.0.0.1:60767->10.0.1.1:8888: i/o timeout", "udp_server": "error reading from udp socket: read udp 10.0.1.1:8888: i/o timeout", "tcp_client": "error dialing tcp: dial tcp 10.0.1.1:8888: i/o timeout", "tcp_server": "error accepting connection: accept tcp 10.0.1.1:8888: i/o timeout" }, "output": { "logs": [ "[2024-11-07T13:29:26Z] creating network namespace \"from\" with cidr \"10.0.0.0/24\"", "[2024-11-07T13:29:26Z] network namespace \"from\" address range: \"10.0.0.1\" - \"10.0.0.254\"", "[2024-11-07T13:29:26Z] creating interface pair \"from-in\" and \"from-out\"", "[2024-11-07T13:29:26Z] attaching interface \"from-in\" to namespace \"from\"", "[2024-11-07T13:29:26Z] setting default gateway \"10.0.0.254\" for namespace \"from\"", "[2024-11-07T13:29:26Z] creating network namespace \"to\" with cidr \"10.0.1.0/24\"", "[2024-11-07T13:29:26Z] network namespace \"to\" address range: \"10.0.1.1\" - \"10.0.1.254\"", "[2024-11-07T13:29:26Z] creating interface pair \"to-in\" and \"to-out\"", "[2024-11-07T13:29:26Z] attaching interface \"to-in\" to namespace \"to\"", "[2024-11-07T13:29:26Z] setting default gateway \"10.0.1.254\" for namespace \"to\"", "[2024-11-07T13:29:26Z] starting udp echo server on namespace \"to\"(\"10.0.1.1:8888\")", "[2024-11-07T13:29:26Z] starting 
tcp echo server on namespace \"to\"(\"10.0.1.1:8888\")", "[2024-11-07T13:29:26Z] reaching to \"10.0.1.1\" from \"10.0.0.1\" with udp", "[2024-11-07T13:29:31Z] reaching to \"10.0.1.1\" from \"10.0.0.1\" with tcp", "[2024-11-07T13:29:36Z] network namespace connectivity test finished" ] }, "success": false } ``` ## Network Namespace Connectivity Analyzer The `NetworkNamespaceConnectivity` analyzer supports two outcomes, one for `pass` and one for `fail`. An example is provided below. ## Message Templating To make the outcome message more informative, you can include certain values gathered by the `NetworkNamespaceConnectivity` collector as templates. The templates are enclosed in double curly braces with a dot separator. The following templates are available: | Template | Description | |----|----| |`{{ .ErrorMessage }}` | Show all error messages found during the collection | |`{{ .FromCIDR }}` | The CIDR provided in the collector `fromCIDR` property | |`{{ .ToCIDR }}` | The CIDR provided in the collector `toCIDR` property | ### Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: namespace-connectivity spec: hostCollectors: - networkNamespaceConnectivity: collectorName: check-network-connectivity fromCIDR: 10.0.0.0/24 toCIDR: 10.0.1.0/24 port: 9090 timeout: 10s hostAnalyzers: - networkNamespaceConnectivity: collectorName: check-network-connectivity outcomes: - pass: message: "Communication between {{ .FromCIDR }} and {{ .ToCIDR }} is working" - fail: message: "{{ .ErrorMessage }}" ``` --- > New in v0.40.0 of Troubleshoot! ## Introduction If you need to collect and analyze information that is not available when using in-cluster collectors, you can use host collectors to gather information about the environment, such as CPU, memory, available block devices, and so on. This is especially useful when you need to debug a Kubernetes cluster that is down. 
### Differences Between In-Cluster and Host Collectors [In-cluster collectors](https://troubleshoot.sh/collect/collectors), specified with the `collectors` property in the `SupportBundle` specification, collect information from a running Kubernetes cluster or schedule a resource in the cluster to dynamically generate data. Host collectors gather information directly from the host that they are run on and do not have Kubernetes as a dependency. They can be used to test network connectivity, collect information about the operating system, and gather the output of provided commands. ## Getting Started 1. Download the support bundle binary from GitHub: ``` curl -L https://github.com/replicatedhq/troubleshoot/releases/download/v0.40.0/support-bundle_linux_amd64.tar.gz | tar xzvf - ``` **Note**: You can see the latest available releases at https://github.com/replicatedhq/troubleshoot/releases 2. Create a YAML file using `kind: SupportBundle` and specify all of your host collectors and analyzers. You can use the following example as a test: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: host-collectors spec: hostCollectors: - cpu: {} - memory: {} hostAnalyzers: - cpu: checkName: "Number of CPUs" outcomes: - fail: when: "count < 2" message: At least 2 CPU cores are required, and 4 CPU cores are recommended - pass: message: This server has at least 4 CPU cores - memory: checkName: "Amount of Memory" outcomes: - fail: when: "< 4G" message: At least 4G of memory is required - pass: message: The system has at least 8G of memory ``` 3. Generate the support bundle: ``` ./support-bundle --interactive=false support-bundle.yaml ``` ## Known Limitations and Considerations 1. Root access is not required to run any of the host collectors. However, depending on what you want to collect, you might need to run the binary with elevated permissions.
For example, if you run the `filesystemPerformance` host collector against `/var/lib/etcd` and the user running the binary does not have permissions on this directory, collection fails. ## Included Resources Each collector generates results specific to its function, with detailed information available on the respective collector's documentation page. When collectors run remotely from a pod (with `runHostCollectorsInPod`), the output files are prefixed with the name of the node where the collector executed. For example: ```bash host-collectors/ ├── run-host │ ├── node-1 │ │ ├── ping-google-info.json │ │ └── ping-google.txt │ └── node-2 │ ├── ping-google-info.json │ └── ping-google.txt ├── subnetAvailable │ ├── node-1 │ │ └── result.json │ └── node-2 │ └── result.json ├── system │ ├── node-1 │ │ ├── block_devices.json │ │ ├── cpu.json │ │ ├── hostos_info.json │ │ ├── ipv4Interfaces.json │ │ ├── kernel-configs.json │ │ ├── packages-packages.json │ │ ├── systemctl_services.json │ │ └── time.json │ ├── node-2 │ │ ├── block_devices.json │ │ ├── cpu.json │ │ ├── hostos_info.json │ │ ├── ipv4Interfaces.json │ │ ├── kernel-configs.json │ │ ├── packages-packages.json │ │ ├── systemctl_services.json │ │ └── time.json │ └── node_list.json ``` --- The regex analyzer is used to run arbitrary regular expressions against text data collected into a bundle. You can use the regex analyzer with any text data collector, such as the `data`, `runPod`, `runDaemonSet`, `copy`, `logs`, and `exec` collectors. ## Parameters Either `regex` or `regexGroups` must be set, but not both. This analyzer uses the [`regexp`](https://pkg.go.dev/regexp) package from the Go standard library, which implements Go's [RE2 regular expression syntax](https://github.com/google/re2/wiki/Syntax). **regex**: (Optional) A regex pattern to test. If the pattern matches the file, the outcome that has set `when` to `"true"` is executed. If no `when` expression is specified, the `pass` outcome defaults to `"true"`.
**regexGroups**: (Optional) A regex pattern to match. Matches from named capturing groups are available to `when` expressions in outcomes. **fileName**: (Required) Path to the file in the support bundle to analyze. This can be an exact name, a prefix, or a file path pattern as defined by Go's [`filepath.Match`](https://pkg.go.dev/path/filepath#Match) function. **ignoreIfNoFiles**: (Optional) If no file matches, this analyzer produces a warn outcome by default. This flag can be set to `true` to suppress the warning. ## Example Analyzer Definition for regex ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: example spec: hostCollectors: - run: collectorName: "localhost-ips" command: "sh" args: ["-c", "host localhost"] hostAnalyzers: - textAnalyze: checkName: Check if localhost resolves to 127.0.0.1 fileName: host-collectors/run-host/localhost-ips.txt regex: 'localhost has address 127.0.0.1' outcomes: - fail: when: "false" message: "'localhost' does not resolve to 127.0.0.1 ip address" - pass: when: "true" message: "'localhost' resolves to 127.0.0.1 ip address" ``` ## Example Analyzer Definition for regexGroups ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: ping spec: hostCollectors: - run: collectorName: "ping-google" command: "ping" args: ["-c", "5", "google.com"] hostAnalyzers: - textAnalyze: checkName: "run-ping" fileName: host-collectors/run-host/ping-google.txt regexGroups: '(?P<Transmitted>\d+) packets? transmitted, (?P<Received>\d+) packets? received, (?P<Loss>\d+)(\.\d+)?% packet loss' outcomes: - pass: when: "Loss < 5" message: Solid connection to google.com - fail: message: High packet loss ``` --- ## Run Collector The `run` collector runs the specified command and includes the results in the collected output. By default, it will inherit all of the environment variables from the parent process.
### Parameters In addition to the [shared collector properties](/docs/collect/collectors/#shared-properties), the `run` collector accepts the following parameters: ##### `command` (Required) The command to execute on the host. The command gets executed directly and is not processed by a shell. You can specify a shell if you want to use constructs like pipes, redirection, loops, etc. Note that if you want to run your command in a shell, then your command should be a single string argument passed to something like `sh -c`. See the `run-with-shell` example. ##### `args` (Required) The arguments to pass to the specified command. ##### `env` (Optional) Extra environment variables to pass to the specified command. This is a list of key/value pairs separated by `=`, for example, `MY_ENV_VAR=my-value`. ##### `ignoreParentEnvs` (Optional) Whether the command should run with the environment variables of the parent process or not. When not specified, it defaults to `false`. Note that the `PATH`, `KUBECONFIG` and `PWD` variables will always be present in the spawned process of the command to run, even if this is set to `true`. ##### `inheritEnvs` (Optional) A subset of environment variables to inherit from the parent process, if you don't want to inherit all of them. By default and when `ignoreParentEnvs` is `false`, it inherits all environment variables from the parent process. Note that if you specify this and set `ignoreParentEnvs` to `true`, the value of `inheritEnvs` will still be ignored. ##### `outputDir` (Optional) The directory that your command can write output to if you want to include your command run's file output in your bundle. If defined, an environment variable `TS_OUTPUT_DIR` will be available for your command to write output to. NOTE: `stdout` and/or `stderr` (if the command fails) output will be written to separate files. ##### `input` (Optional) Input files (for example, configuration files or sample data) that you wish to feed into your command run.
If defined, an environment variable `TS_INPUT_DIR`, which is a directory to store these files, will be available to your command run. The value is a simple map where keys are file names of the files created in `TS_INPUT_DIR` and values are the contents written to these files. ##### `timeout` (Optional) A [duration](https://golang.org/pkg/time/#Duration) that will be honored when running the command. If the timeout elapses, the command is terminated with exit code `signal: killed` or `-1`. Defaults to no timeout. ## Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: run spec: hostCollectors: - run: collectorName: "ping-google" command: "ping" args: ["-c", "5", "google.com"] - run: collectorName: "run-with-shell" command: "sh" args: ["-c", "du -sh | sort -rh | head -5"] # Multiline shell script - run: collectorName: "hostnames" command: "sh" args: - -c - | echo "hostname = $(hostname)" echo "/proc/sys/kernel/hostname = $(cat /proc/sys/kernel/hostname)" echo "uname -n = $(uname -n)" # Redirect stderr to stdout - run: collectorName: "docker-logs-etcd" command: "sh" args: ["-c", "docker logs $(docker ps -a --filter label=io.kubernetes.container.name=etcd -q -l) 2>&1"] ``` ## Example Collector Definition With Analyzer ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: run spec: hostCollectors: - run: collectorName: "ping-google" command: "ping" args: ["-c", "5", "google.com"] analyzers: - textAnalyze: checkName: "run-ping" fileName: host-collectors/run-host/ping-google.txt regexGroups: '(?P<Transmitted>\d+) packets? transmitted, (?P<Received>\d+) packets? received, (?P<Loss>\d+)(\.\d+)?% packet loss' outcomes: - pass: when: "Loss < 5" message: Solid connection to google.com - fail: message: High packet loss ``` ## Example Collector Definition With Command Run File Input and Output Saving to the Bundle ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: HostCollector metadata: name: run-host-cmd-and-save-output spec: collectors: - run: collectorName: "my-custom-run" command: "sh" # this is for demonstration purposes only -- you probably don't want to drop your input to the bundle! args: - "-c" - "cat $TS_INPUT_DIR/dummy.yaml > $TS_OUTPUT_DIR/dummy_content.yaml" outputDir: "myCommandOutputs" env: - AWS_REGION=us-west-1 # if ignoreParentEnvs is true, it will not inherit envs from parent process. # values specified in inheritEnv will not be used either # ignoreParentEnvs: true inheritEnvs: - USER input: dummy.conf: |- [hello] hello = 1 [bye] bye = 2 dummy.yaml: |- username: postgres password: dbHost: map: key: value list: - val1 - val2 ``` ### Included Resources The results of the `run` collector are stored in the `host-collectors/run-host` directory of the bundle. Two files per collector execution will be stored in this directory, along with an optional outputs directory. - `[collector-name].txt` - output of the command read from `stdout` - `[collector-name]-info.json` - the command that was executed, its exit code, and any output read from `stderr`. See the example below: ```json { "command": "/sbin/ping -c 5 google.com", "exitCode": "0", "error": "" } ``` - `host-collectors/[collector-name]/[outputDir]` - Optional directory containing output files written by the command that was run. _NOTE: If the `collectorName` field is unset, it will default to `run-host`._ Example of the resulting files: ``` # ping-google.txt PING google.com (***HIDDEN***) 56(84) bytes of data.
64 bytes from bh-in-f113.1e100.net (***HIDDEN***): icmp_seq=1 ttl=118 time=2.17 ms 64 bytes from bh-in-f113.1e100.net (***HIDDEN***): icmp_seq=2 ttl=118 time=1.29 ms 64 bytes from bh-in-f113.1e100.net (***HIDDEN***): icmp_seq=3 ttl=118 time=1.36 ms 64 bytes from bh-in-f113.1e100.net (***HIDDEN***): icmp_seq=4 ttl=118 time=1.25 ms 64 bytes from bh-in-f113.1e100.net (***HIDDEN***): icmp_seq=5 ttl=118 time=1.31 ms --- google.com ping statistics --- 5 packets transmitted, 5 received, 0% packet loss, time 4006ms rtt min/avg/max/mdev = 1.252/1.478/2.171/0.348 ms ``` and ``` # run-with-shell.txt 3.4G /var/lib/kurl/assets 1.3G /var/lib/kurl/assets/rook-1.5.9.tar.gz 1.1G /var/log/apiserver 897M /var/lib/kurl/assets/kubernetes-1.19.16.tar.gz 812M /var/lib/kurl/assets/docker-20.10.5.tar.gz ``` After running the `run-host-cmd-and-save-output` collector, the following file will also be in the support bundle: ```yaml # host-collectors/my-custom-run/myCommandOutputs/dummy_content.yaml username: postgres password: dbHost: map: key: value list: - val1 - val2 ``` --- ## Subnet Available Collector To check if there is an available (IPv4) subnet on a node, you can use the `subnetAvailable` collector. This is useful for Pod/Service CIDR ranges. This collector searches for overlap with the routing table of the node to help avoid conflicts. ### Parameters In addition to the [shared collector properties](/docs/collect/collectors/#shared-properties), the `subnetAvailable` collector accepts the following parameters: #### `CIDRRangeAlloc` (Required) The overarching subnet range to search for available CIDR blocks to use. The format must be `"x.x.x.x/y"`, with an IPv4 network and `y` being a CIDR mask between 1 and 32. #### `desiredCIDR` (Required) An integer between 1 and 32. Searches in `CIDRRangeAlloc` for an IP subnet of this CIDR block size.
### Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: HostPreflight metadata: name: subnet-available spec: collectors: # would output yes/no depending on whether there is a /22 available in 10.0.0.0/8 - subnetAvailable: CIDRRangeAlloc: "10.0.0.0/8" desiredCIDR: 22 ``` ### Included Resources The results of the `subnetAvailable` collector are stored in the `host-collectors/subnetAvailable` directory of the support bundle. #### `[collector-name].json` If the `collectorName` field is not set, the file is named `result.json`. Example of the resulting JSON file: ```json { "CIDRRangeAlloc": "10.0.0.0/8", "desiredCIDR": 22, "status": "a-subnet-is-available" } ``` ## Subnet Available Analyzer The `subnetAvailable` analyzer supports the following outcomes: - `a-subnet-is-available`: Indicates that a subnet of the `desiredCIDR` size is available within `CIDRRangeAlloc`. - `no-subnet-available`: Indicates that the entirety of `CIDRRangeAlloc` is exhausted by the node routing table, and that no subnets of the `desiredCIDR` size can be allocated. ### Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: HostPreflight metadata: name: subnet-available spec: analyzers: - subnetAvailable: outcomes: - fail: when: "no-subnet-available" message: failed to find available subnet - pass: when: "a-subnet-is-available" message: available /22 subnet found ``` --- ## Subnet Contains IP Analyzer The `subnetContainsIP` analyzer checks if a given IP address falls within a given subnet (CIDR range). This is useful for validating network configurations and ensuring IP addresses are within expected ranges. ### Parameters #### `cidr` (Required) The subnet range to check against. The format must be `"x.x.x.x/y"`, with an IPv4 network and `y` being a CIDR mask between 1 and 32. #### `ip` (Required) The IPv4 address to check. Must be in the format `"x.x.x.x"`.
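The containment check is equivalent to parsing the CIDR and testing membership, as in this minimal Go sketch (not the analyzer's actual code):

```go
package main

import (
	"fmt"
	"net"
)

// subnetContainsIP returns the analyzer's true/false outcome for a cidr/ip pair.
func subnetContainsIP(cidr, ip string) (bool, error) {
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return false, err
	}
	addr := net.ParseIP(ip)
	if addr == nil {
		return false, fmt.Errorf("invalid IP %q", ip)
	}
	return network.Contains(addr), nil
}

func main() {
	ok, err := subnetContainsIP("10.0.0.0/8", "10.0.0.5")
	if err != nil {
		panic(err)
	}
	fmt.Println(ok) // the "true" outcome matches for this pair
}
```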
### Outcomes The analyzer supports the following conditions: - `true`: Indicates that the IP address is within the specified subnet range - `false`: Indicates that the IP address is not within the specified subnet range ### Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: HostPreflight metadata: name: subnet-contains-ip spec: analyzers: - subnetContainsIP: cidr: "10.0.0.0/8" ip: "10.0.0.5" outcomes: - fail: when: "false" message: The IP address is not within the subnet range - pass: when: "true" message: The IP address is within the subnet range ``` --- ## Host Sysctl Collector To collect information about the configured kernel parameters you can use the `sysctl` collector. This will read the Kernel's parameters at runtime through the `sysctl` utility, similar to what you would get by running `sysctl -a`. ### Parameters None. ### Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sysctl spec: hostCollectors: - sysctl: {} ``` ### Included Resources The results of the `sysctl` collector are stored in the `host-collectors/system` directory of the support bundle. #### `sysctl.json` Example of the resulting JSON file: ``` { (...) "net.ipv4.conf.all.arp_accept": "0", "net.ipv4.conf.all.arp_announce": "0", "net.ipv4.conf.all.arp_evict_nocarrier": "1", "net.ipv4.conf.all.arp_filter": "0", "net.ipv4.conf.all.arp_ignore": "0", "net.ipv4.conf.all.arp_notify": "0", "net.ipv4.conf.all.drop_gratuitous_arp": "0", "net.ipv4.conf.all.proxy_arp": "0", "net.ipv4.conf.all.proxy_arp_pvlan": "0", "net.netfilter.nf_log.0": "NONE", "net.netfilter.nf_log.1": "NONE", "net.netfilter.nf_log.10": "nf_log_ipv6", "net.netfilter.nf_log.2": "nf_log_ipv4", "net.netfilter.nf_log.3": "nf_log_arp", (...) } ``` ## Host Sysctl Analyzer The `sysctl` analyzer supports multiple outcomes by validating the values of multiple properties. 
For example: - `net.ipv4.conf.all.arp_ignore > 2`: Value for the `net.ipv4.conf.all.arp_ignore` property is greater than 2. - `net.ipv4.conf.all.arp_filter = 0`: Value for the `net.ipv4.conf.all.arp_filter` property equals 0. **Note:** inequality operators (`>`, `>=`, `<` and `<=`) will only work when the type of the value being analyzed can be converted to `int`. ### Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: sysctl spec: hostCollectors: - sysctl: {} hostAnalyzers: - sysctl: checkName: "ARP parameters" outcomes: - fail: when: "net.ipv4.conf.all.arp_ignore > 0" message: "ARP ignore is enabled for all interfaces on the host. Disable it by running 'sysctl net.ipv4.conf.all.arp_ignore=0'." - warn: when: "net.ipv4.conf.all.arp_filter = 1" message: "ARP filtering is enabled for all interfaces on the host. Disable it by running 'sysctl net.ipv4.conf.all.arp_filter=0'." ``` --- ## System Packages Collector To collect information about the host system packages for the specified operating system, you can use the `systemPackages` collector. ### Parameters #### `ubuntu` (Optional) An array of the names of packages to collect information about if the operating system is `Ubuntu`, regardless of the version. #### `ubuntu16` (Optional) An array of the names of packages to collect information about if the operating system is `Ubuntu` version `16.x`. #### `ubuntu18` (Optional) An array of the names of packages to collect information about if the operating system is `Ubuntu` version `18.x`. #### `ubuntu20` (Optional) An array of the names of packages to collect information about if the operating system is `Ubuntu` version `20.x`. #### `rhel` (Optional) An array of the names of packages to collect information about if the operating system is `RHEL`, regardless of the version. #### `rhel7` (Optional) An array of the names of packages to collect information about if the operating system is `RHEL` version `7.x`.
#### `rhel8` (Optional) An array of the names of packages to collect information about if the operating system is `RHEL` version `8.x`. #### `rhel9` (Optional) An array of the names of packages to collect information about if the operating system is `RHEL` version `9.x`. #### `centos` (Optional) An array of the names of packages to collect information about if the operating system is `CentOS`, regardless of the version. #### `centos7` (Optional) An array of the names of packages to collect information about if the operating system is `CentOS` version `7.x`. #### `centos8` (Optional) An array of the names of packages to collect information about if the operating system is `CentOS` version `8.x`. #### `centos9` (Optional) An array of the names of packages to collect information about if the operating system is `CentOS` version `9.x`. #### `ol` (Optional) An array of the names of packages to collect information about if the operating system is `Oracle Linux`, regardless of the version. #### `ol7` (Optional) An array of the names of packages to collect information about if the operating system is `Oracle Linux` version `7.x`. #### `ol8` (Optional) An array of the names of packages to collect information about if the operating system is `Oracle Linux` version `8.x`. #### `ol9` (Optional) An array of the names of packages to collect information about if the operating system is `Oracle Linux` version `9.x`. #### `rocky` (Optional) An array of the names of packages to collect information about if the operating system is `Rocky Linux`, regardless of the version. #### `rocky8` (Optional) An array of the names of packages to collect information about if the operating system is `Rocky Linux` version `8.x`. #### `rocky9` (Optional) An array of the names of packages to collect information about if the operating system is `Rocky Linux` version `9.x`. 
#### `amzn` (Optional) An array of the names of packages to collect information about if the operating system is `Amazon Linux`, regardless of the version. #### `amzn2` (Optional) An array of the names of packages to collect information about if the operating system is `Amazon Linux` version `2.x`. ### Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: systemPackages spec: hostCollectors: - systemPackages: collectorName: system-packages ubuntu: - open-iscsi ubuntu20: - nmap - nfs-common centos: - iscsi-initiator-utils centos7: - libzstd centos8: - nfs-utils - openssl ``` ### Included Resources The results of the `systemPackages` collector are stored in the `host-collectors/system` directory of the support bundle. #### `[collector-name].json` If the `collectorName` field is unset, it will be named `packages.json`. Example of the resulting JSON file: ```json { "os": "ubuntu", "osVersion": "18.04", "packages": [ { "details": "Package: open-iscsi\nStatus: install ok installed\nPriority: optional\nSection: net\nInstalled-Size: 1389\nMaintainer: Ubuntu Developers \u003cubuntu-devel-discuss@lists.ubuntu.com\u003e\nArchitecture: amd64\nVersion: 2.0.874-5ubuntu2.10\nDepends: udev, debconf (\u003e= 0.5) | debconf-2.0, libc6 (\u003e= 2.14), libisns0 (\u003e= 0.96-4~), libmount1 (\u003e= 2.24.2), lsb-base (\u003e= 3.0-6)\nPre-Depends: debconf | debconf-2.0\nRecommends: busybox-initramfs\nConffiles:\n /etc/default/open-iscsi 5744c65409cbdea2bcf5b99dbff89e96\n /etc/init.d/iscsid f45c4e0127bafee72454ce97a7ce2f6c\n /etc/init.d/open-iscsi b0cdf36373e443ad1e4171959dc8046f\n /etc/iscsi/iscsid.conf fc72bdd1c530ad5b8fd5760d260c7d91\nDescription: iSCSI initiator tools\n Open-iSCSI is a high-performance, transport independent, multi-platform\n implementation of the RFC3720 Internet Small Computer Systems Interface\n (iSCSI).\n .\n Open-iSCSI is partitioned into user and kernel parts, where the kernel\n portion implements the iSCSI 
data path (i.e. iSCSI Read and iSCSI Write).\n The userspace contains the entire control plane:\n * Configuration Manager;\n * iSCSI Discovery;\n * Login and Logout processing;\n * Connection level error processing;\n * Nop-In and Nop-Out handling;\n * (in the future) Text processing, iSNS, SLP, Radius, etc.\n .\n This package includes a daemon, iscsid, and a management utility,\n iscsiadm.\nHomepage: http://www.open-iscsi.com/\nOriginal-Maintainer: Debian iSCSI Maintainers \u003cpkg-iscsi-maintainers@lists.alioth.debian.org\u003e\n", "exitCode": "0", "name": "open-iscsi" }, { "details": "", "error": "dpkg-query: package 'nmap' is not installed and no information is available\nUse dpkg --info (= dpkg-deb --info) to examine archive files,\nand dpkg --contents (= dpkg-deb --contents) to list their contents.\n", "exitCode": "1", "name": "nmap" } ] } ``` ## System Packages Analyzer The `systemPackages` analyzer is used to analyze information about the collected packages. For example, the analyzer can check whether a certain package is installed, if the version of a package is greater than or equal to a certain version, and more. The analyzer also supports template functions to help customize the outcomes as desired. Some of the fields that are accessible using template functions are detailed in the following JSON object: ```json { "OS": "ubuntu", "OSVersion": "18.04", "OSVersionMajor": "18", "OSVersionMinor": "4", "Name": "openssl", "Error": "", "ExitCode": "0", "IsInstalled": true } ``` The analyzer also has access to the fields in the `details` field for a package from the collector. For example, in the `details` field in the collector output above, you can reference the `Version` field with `{{ .Version }}`.
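Outcome messages such as `Package {{ .Name }} is not installed` are expanded as Go templates over these fields. A minimal sketch of that rendering (a simplification; the analyzer may support additional template functions):

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// renderMessage expands an outcome message template against the fields
// collected for a package, e.g. {{ .Name }} or {{ .IsInstalled }}.
func renderMessage(msg string, fields map[string]any) (string, error) {
	tmpl, err := template.New("outcome").Parse(msg)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, fields); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	fields := map[string]any{"Name": "openssl", "IsInstalled": true}
	out, err := renderMessage("Package {{ .Name }} is installed", fields)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```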
### Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: systemPackages spec: hostCollectors: - systemPackages: collectorName: system-packages ubuntu: - open-iscsi ubuntu20: - nmap - nfs-common centos: - iscsi-initiator-utils centos7: - libzstd centos8: - nfs-utils - openssl analyzers: - systemPackages: collectorName: system-packages outcomes: - fail: when: '{{ not .IsInstalled }}' message: Package {{ .Name }} is not installed - pass: message: Package {{ .Name }} is installed ``` --- ## TCP Connect Collector To collect information about the ability to connect to a specified TCP address, you can use the `tcpConnect` collector. ### Parameters In addition to the [shared collector properties](/docs/collect/collectors/#shared-properties), the `tcpConnect` collector accepts the following parameters: #### `address` (Required) The address to check the connection to. #### `timeout` (Optional) Specifies the total timeout. ### Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: tcpConnect spec: hostCollectors: - tcpConnect: collectorName: kubernetes-api-tcp-conn-status address: 10.128.0.29:6443 timeout: 10s ``` ### Included Resources The results of the `tcpConnect` collector are stored in the `host-collectors/connect` directory of the support bundle. #### `[collector-name].json` If the `collectorName` field is unset, it will be named `connect.json`. Example of the resulting file: ``` connection-refused ``` ## TCP Connect Analyzer The `tcpConnect` analyzer supports multiple outcomes: - `connection-refused`: Connection to the address was refused. - `connection-timeout`: Timed out connecting to the address. - `connected`: Successfully connected to the address. - `error`: Unexpected error connecting to the address. 
### Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: tcpConnect spec: hostCollectors: - tcpConnect: collectorName: kubernetes-api-tcp-conn-status address: 10.128.0.29:6443 timeout: 10s hostAnalyzers: - tcpConnect: checkName: "Kubernetes API TCP Connection Status" collectorName: kubernetes-api-tcp-conn-status outcomes: - fail: when: "connection-refused" message: Connection to the Kubernetes API address was refused - fail: when: "connection-timeout" message: Timed out connecting to the Kubernetes API address - fail: when: "error" message: Unexpected error connecting to the Kubernetes API address - pass: when: "connected" message: Successfully connected to the Kubernetes API address ``` --- ## TCP Load Balancer Collector To collect information about the ability to connect to the specified TCP load balancer address, you can use the `tcpLoadBalancer` collector. This collector listens on a host port on `0.0.0.0` and then attempts to connect through a TCP load balancer. A successful connection requires sending and receiving a random token through the load balancer to the test server. ### Parameters In addition to the [shared collector properties](/docs/collect/collectors/#shared-properties), the `tcpLoadBalancer` collector accepts the following parameters: #### `port` (Required) The port number to use. #### `address` (Required) The address to check the connection to. #### `timeout` (Optional) Specifies the total timeout. ### Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: loadbalancer spec: hostCollectors: - tcpLoadBalancer: collectorName: kubernetes-api-lb port: 6443 address: 10.128.0.29:6443 timeout: 10s ``` ### Included Resources The results of the `tcpLoadBalancer` collector are stored in the `host-collectors/tcpLoadBalancer` directory of the support bundle. 
#### `[collector-name].json` If the `collectorName` field is unset, it will be named `tcpLoadBalancer.json`. Example of the resulting file: ``` address-in-use ``` ## TCP Load Balancer Analyzer The `tcpLoadBalancer` analyzer supports multiple outcomes: - `invalid-address`: The load balancer address is not valid. - `connection-refused`: Connection to the load balancer address was refused. - `connection-timeout`: Timed out connecting to the load balancer address. - `address-in-use`: Specified port is unavailable. - `connected`: Successfully connected to the load balancer address. - `bind-permission-denied`: Failed to bind to the address:port. - `error`: Unexpected error connecting to the load balancer address. ### Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: loadbalancer spec: hostCollectors: - tcpLoadBalancer: collectorName: kubernetes-api-lb port: 6443 address: 10.128.0.29:6443 timeout: 10s hostAnalyzers: - tcpLoadBalancer: checkName: "Kubernetes API Server Load Balancer" collectorName: kubernetes-api-lb outcomes: - fail: when: "invalid-address" message: The load balancer address is not valid - warn: when: "connection-refused" message: Connection via the load balancer was refused - warn: when: "connection-timeout" message: Timed out connecting to the load balancer. Check your firewall. - warn: when: "error" message: Unexpected port status - warn: when: "address-in-use" message: Port 6443 is unavailable - pass: when: "connected" message: Successfully connected to the load balancer - warn: message: Unexpected port status ``` --- ## TCP Port Status Collector To collect information about the specified TCP port on the host where the collector runs, you can use the `tcpPortStatus` collector. If an interface is specified in the collector, this preflight check looks up the IPv4 address of that interface, binds to it, and connects to the same address.
If no interface is specified, the test server binds to `0.0.0.0` and attempts to connect to the first non-loopback IPv4 address found on a network interface on the host. ### Parameters In addition to the [shared collector properties](/docs/collect/collectors/#shared-properties), the `tcpPortStatus` collector accepts the following parameters: #### `port` (Required) The port number to check on the host where the collector is run. #### `interface` (Optional) If set, the collector uses the IP address of the specified interface. ### Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: tcpPortStatus spec: hostCollectors: - tcpPortStatus: collectorName: kubernetes-api-tcp-port port: 6443 interface: eth0 ``` ### Included Resources The results of the `tcpPortStatus` collector are stored in the `host-collectors/tcpPortStatus` directory of the support bundle. #### `[collector-name].json` If the `collectorName` field is unset, it will be named `tcpPortStatus.json`. Example of the resulting file: ``` {"status":"connected","message":""} ``` ## TCP Port Status Analyzer The `tcpPortStatus` analyzer supports multiple outcomes: - `connection-refused`: Connection to the port was refused. - `connection-timeout`: Timed out connecting to the port. - `address-in-use`: Specified port is unavailable. - `connected`: Successfully connected to the port. - `error`: Unexpected error connecting to the port. ### Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: tcpPortStatus spec: hostCollectors: - tcpPortStatus: collectorName: kubernetes-api-tcp-port port: 6443 interface: eth0 hostAnalyzers: - tcpPortStatus: checkName: "Kubernetes API TCP Port Status" collectorName: kubernetes-api-tcp-port outcomes: - fail: when: "connection-refused" message: Connection to port 6443 was refused.
This is likely to be a routing problem since this preflight configures a test server to listen on this port. - warn: when: "address-in-use" message: Another process was already listening on port 6443. - fail: when: "connection-timeout" message: Timed out connecting to port 6443. Check your firewall. - fail: when: "error" message: Unexpected port status - pass: when: "connected" message: Port 6443 is open - warn: message: Unexpected port status ``` --- ## Time Collector To collect information about the system clock, you can use the `time` collector. ### Parameters None. ### Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: time spec: hostCollectors: - time: {} ``` ### Included Resources The results of the `time` collector are stored in the `host-collectors/system` directory of the support bundle. #### `time.json` Example of the resulting JSON file: ```json {"timezone":"UTC","ntp_synchronized":true,"ntp_active":true} ``` ## Time Analyzer The time analyzer supports multiple outcomes, by checking either the ntp status or the timezone. For example: - `ntp == unsynchronized+inactive`: System clock is not synchronized. - `ntp == unsynchronized+active`: System clock not yet synchronized. - `ntp == synchronized+active`: System clock is synchronized. - `timezone != UTC`: Timezone is not set to UTC. - `timezone == UTC`: Timezone is set to UTC. 
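The `ntp` values above combine the two booleans reported in `time.json`. A plausible mapping, sketched here as an assumption based on the documented values rather than the actual Troubleshoot code:

```python
# Hypothetical sketch: derive the analyzer's ntp status string from the
# collected time.json fields (assumed from the documented status values).
def ntp_status(info):
    sync = "synchronized" if info["ntp_synchronized"] else "unsynchronized"
    active = "active" if info["ntp_active"] else "inactive"
    return f"{sync}+{active}"

collected = {"timezone": "UTC", "ntp_synchronized": True, "ntp_active": True}
# ntp_status(collected) yields "synchronized+active", which the
# `ntp == synchronized+active` outcome would match.
```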
### Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: time spec: hostCollectors: - time: {} hostAnalyzers: - time: checkName: "NTP Status" outcomes: - fail: when: "ntp == unsynchronized+inactive" message: "System clock is not synchronized" - warn: when: "ntp == unsynchronized+active" message: System clock not yet synchronized - pass: when: "ntp == synchronized+active" message: "System clock is synchronized" - warn: when: "timezone != UTC" message: "Non UTC timezone can interfere with system function" - pass: when: "timezone == UTC" message: "Timezone is set to UTC" ``` --- ## UDP Port Status Collector To collect information about the specified UDP port on the host where the collector runs, you can use the `udpPortStatus` collector. If an interface is specified in the collector, this preflight check looks up the IPv4 address of that interface and binds to it. If no interface is specified, the test server binds to `0.0.0.0`. ### Parameters In addition to the [shared collector properties](/docs/collect/collectors/#shared-properties), the `udpPortStatus` collector accepts the following parameters: #### `port` (Required) The port number to check on the host where the collector is run. #### `interface` (Optional) If set, the collector uses the IP address of the specified interface. ### Example Collector Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: udpPortStatus spec: hostCollectors: - udpPortStatus: collectorName: flannel-vxlan-udp-port port: 8472 ``` ### Included Resources The results of the `udpPortStatus` collector are stored in the `host-collectors/udpPortStatus` directory of the support bundle. #### `[collector-name].json` If the `collectorName` field is unset, it will be named `udpPortStatus.json`.
Example of the resulting file: ``` {"status":"connected","message":""} ``` ## UDP Port Status Analyzer The `udpPortStatus` analyzer supports multiple outcomes: - `address-in-use`: Specified port is unavailable. - `connected`: Successfully bound to the port. - `error`: Unexpected error binding to the port. ### Example Analyzer Definition ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: SupportBundle metadata: name: udpPortStatus spec: hostCollectors: - udpPortStatus: collectorName: flannel-vxlan-udp-port port: 8472 hostAnalyzers: - udpPortStatus: checkName: "Flannel VXLAN UDP Status" collectorName: flannel-vxlan-udp-port outcomes: - warn: when: "address-in-use" message: Another process was already listening on port 8472. - fail: when: "error" message: Unexpected port status - pass: when: "connected" message: Port 8472 is open - warn: message: Unexpected port status ``` --- Welcome! The easiest way to get started with the Troubleshoot project is to see it in action through a couple examples. ## Installation Do one of the following to install: - Download and uncompress the latest release binary To install the latest release, download and uncompress the latest release asset for the target platform from the [release page](https://github.com/replicatedhq/troubleshoot/releases): ``` curl -L https://github.com/replicatedhq/troubleshoot/releases/latest/download/support-bundle_linux_amd64.tar.gz | tar xzvf - curl -L https://github.com/replicatedhq/troubleshoot/releases/latest/download/preflight_linux_amd64.tar.gz | tar xzvf - sudo ./support-bundle version sudo ./preflight version ``` - Install with krew You can execute Preflight Checks and Support Bundles using a client-side utility, packaged as a `kubectl` plugin that is distributed through the [krew](https://krew.dev/) package manager. 
If you don't already have krew installed, head over to the [krew installation guide](https://krew.sigs.k8s.io/docs/user-guide/setup/install/), follow the steps there, and then come back here. Install the Preflight and Support Bundle plugins using: ```shell kubectl krew install preflight kubectl krew install support-bundle ``` **Note:** This will not install anything to your cluster; it only places a single binary per plugin in your path. ## Examples Now that you have the plugins installed, let's look at a simple example of a Preflight Check and a Support Bundle to get a sense of how they are structured and how to run them. ### Example Preflight check Preflight Checks can be executed before installing an application into a Kubernetes cluster. The checks are defined in a YAML file. Run the following command on your cluster to see an example Preflight Check. ```shell # using krew kubectl preflight https://raw.githubusercontent.com/replicatedhq/troubleshoot/main/examples/preflight/sample-preflight.yaml # using installation sudo ./preflight https://raw.githubusercontent.com/replicatedhq/troubleshoot/main/examples/preflight/sample-preflight.yaml ``` ### Example Support Bundle A Support Bundle needs to know what to collect and, optionally, what to analyze. This is defined in a YAML file. There's a lot already included in the default collectors. Run the following command on your cluster to see an example Support Bundle. ```shell # using krew kubectl support-bundle https://raw.githubusercontent.com/replicatedhq/troubleshoot-specs/main/in-cluster/default.yaml # using installation sudo ./support-bundle https://raw.githubusercontent.com/replicatedhq/troubleshoot-specs/main/in-cluster/default.yaml ``` ## What's next?

- **Create your own**: Learn how to add custom preflight checks and support bundles to your application.
- **Read the docs**: Ready to dig into the reference docs?

--- If you don't have preflight installed, you can check out the [installation guide](https://troubleshoot.sh/docs/#installation). You can run preflight checks to verify that a cluster or host meets the application requirements before attempting an installation. While this capability is built into some applications, you can run preflight checks using the CLI. ## Usage To run, `preflight` needs a specification to understand what information to collect and what to do with that information. The specification can be a local YAML file, hosted at a URL, located in an OCI registry, or provided on stdin. To use stdin, supply `-` as the argument. Example usage: ```shell ./preflight https://raw.githubusercontent.com/replicatedhq/troubleshoot/main/examples/preflight/sample-preflight.yaml kubectl preflight oci://my.oci.registry/image ./preflight my-preflight-spec.yaml helm template mychart --values my-values.yaml | ./preflight - ``` As of v0.69.0, valid input for specs can include: - Documents of `kind: Preflight` - Documents of `kind: Secret` that have the label `troubleshoot.sh/kind: preflight` Multiple YAML documents (specs) are supported as input; any documents other than the supported kinds above are filtered out. This allows feeding in an entire set of manifests (e.g. a full Helm chart), and preflight will use only the relevant specs. ```shell preflight [url] [flags] [-] ``` ## Options
| Flag | Type | Description |
|------|------|-------------|
| `--as` | string | Username to impersonate for the operation. User could be a regular user or a service account in a namespace. |
| `--as-group` | stringArray | Group to impersonate for the operation. This flag can be repeated to specify multiple groups. |
| `--as-uid` | string | UID to impersonate for the operation. |
| `--cache-dir` | string | Default cache directory. Default: `~/.kube/cache` |
| `--certificate-authority` | string | Path to a certificate file for the certificate authority. |
| `--client-certificate` | string | Path to a client certificate file for TLS. |
| `--client-key` | string | Path to a client key file for TLS. |
| `--cluster` | string | The name of the kubeconfig cluster to use. |
| `--collect-without-permissions` | | Always run preflight checks even if some require permissions that preflight does not have. Default: `true` |
| `--collector-image` | string | The full name of the collector image to use. |
| `--collector-pullpolicy` | string | The pull policy of the collector image. |
| `--context` | string | The name of the kubeconfig context to use. |
| `--cpuprofile` | string | File path to write CPU profiling data. |
| `--debug` | | Enable debug logging. This is equivalent to `--v=0`. |
| `--disable-compression` | | If true, opt-out of response compression for all requests to the server. |
| `--format` | string | Output format, one of `human`, `json`, `yaml`. Only used when interactive is set to `false`. Default: `human` |
| `-h`, `--help` | | Help for preflight. |
| `--insecure-skip-tls-verify` | | If true, the server's certificate will not be checked for validity and your HTTPS connections will be insecure. |
| `--interactive` | | Interactive preflights. Default: `true` |
| `--kubeconfig` | string | Path to the kubeconfig file to use for CLI requests. |
| `--memprofile` | string | File path to write memory profiling data. |
| `-n`, `--namespace` | string | If present, the namespace scope for this CLI request. |
| `-o`, `--output` | string | Specify the output file path for the preflight checks. |
| `--request-timeout` | string | The length of time to wait before giving up on a single server request. Non-zero values should contain a corresponding time unit, such as `1s`, `2m`, `3h`. A value of zero means that requests will not time out. Default: `0` |
| `--selector` | string | Selector (label query) to filter remote collection nodes on. |
| `-s`, `--server` | string | The address and port of the Kubernetes API server. |
| `--since` | string | Force pod logs collectors to return logs newer than a relative duration, such as `5s`, `2m`, or `3h`. |
| `--since-time` | string | Force pod logs collectors to return logs after a specific date (RFC3339). |
| `--tls-server-name` | string | Server name to use for server certificate validation. If it is not provided, the hostname used to contact the server is used. |
| `--token` | string | Bearer token for authentication to the API server. |
| `--user` | string | The name of the kubeconfig user to use. |
| `-v`, `--v` | level | Number for the log level verbosity. |
--- In this step, we'll add a few basic preflight checks to: 1. Verify the version of Kubernetes is supported 2. Verify that the cluster is running in a supported managed Kubernetes provider These checks are designed to show how easy it is to add new preflight checks to an application. ## Create a `preflight.yaml` :::note This tutorial demonstrates how to author a v1beta2 Preflight spec. For information about v1beta3 Preflight specs, see [About Preflight v1beta3](v1beta3-overview). ::: To start, create a new file on your computer named `preflight.yaml` and paste the following content into it: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: preflight-tutorial spec: analyzers: [] ``` We aren't going to deploy this file to our cluster. Once you've saved this file, let's run it using the `kubectl` plugin: ```shell kubectl preflight ./preflight.yaml ``` When you execute this command, you should see a spinner while the Preflight application is running, and then a message that says: ```shell Error: no data has been collected ``` We now have the basic workflow of executing preflight checks; we just haven't defined any checks yet. ## Check Kubernetes Version Let's add a check that has a minimum and a recommended Kubernetes version. We want to show a warning message if the Kubernetes cluster is not at the recommended version, and show an error message if it's below the minimum version. For this example, let's assume that our application requires Kubernetes 1.16.0 or higher, and we recommend 1.18.0 or higher. Edit the `./preflight.yaml` file and add a new analyzer: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: preflight-tutorial spec: analyzers: - clusterVersion: outcomes: - fail: when: "< 1.16.0" message: The application requires at least Kubernetes 1.16.0, and recommends 1.18.0.
uri: https://kubernetes.io - warn: when: "< 1.18.0" message: Your cluster meets the minimum version of Kubernetes, but we recommend you update to 1.18.0 or later. uri: https://kubernetes.io - pass: message: Your cluster meets the recommended and required versions of Kubernetes. ``` Let's review the changes to this YAML: **Line 7**: We are adding a new `clusterVersion` analyzer to be evaluated when Preflight Checks are running. This key tells the Preflight application how to interpret and parse the `outcomes` below. The documentation for `clusterVersion` is in the [Analyze documentation](/docs/analyze/cluster-version/). **Line 8**: We are defining all possible outcomes for this analysis. An outcome is the result of a preflight analysis. Outcomes are evaluated in order, much like a switch statement is evaluated in code. If an outcome matches, then execution of this preflight terminates with the result being the matching outcome. **Lines 9-12**: We are defining an outcome where the message, icon, and colors will be "fail". The `clusterVersion` analyzer accepts semver ranges for the `when` clause, and we are declaring that all versions of Kubernetes less than 1.16.0 will match this outcome. Finally, we are defining the message and URI to show in the results when this outcome matches. **Lines 13-16**: Much like the failure outcome above, we are defining a warning-level outcome for Kubernetes versions less than 1.18.0. Because the failure outcome is listed above this outcome, all Kubernetes versions less than 1.16.0 will already be removed from analysis. **Lines 17-18**: Finally, we are defining a pass outcome for any analysis that makes it this far. We don't need to include a `when` clause because this is the default outcome at this point.
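The ordered, first-match evaluation described above can be sketched as follows. This is an illustration under simplifying assumptions (versions compared as integer tuples, only the `<` operator handled), not Troubleshoot's actual semver logic:

```python
# Sketch of ordered outcome evaluation for clusterVersion: outcomes are
# tried top to bottom, the first matching `when` clause wins, and a clause
# without `when` behaves like a switch statement's default case.
def parse(version):
    return tuple(int(p) for p in version.split("."))

def evaluate(outcomes, cluster_version):
    for outcome in outcomes:
        when = outcome.get("when")
        if when is None:
            return outcome               # catch-all default
        op, target = when.split()        # e.g. "< 1.16.0"
        if op == "<" and parse(cluster_version) < parse(target):
            return outcome
    return None

outcomes = [
    {"result": "fail", "when": "< 1.16.0"},
    {"result": "warn", "when": "< 1.18.0"},
    {"result": "pass"},
]
# A 1.17.x cluster skips the fail outcome, matches the warn outcome;
# anything at or above 1.18.0 falls through to the pass outcome.
```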
Just like before, save and execute these preflight checks with: ```shell kubectl preflight ./preflight.yaml ``` It will take a few seconds for the Preflight Checks to collect the required data, and then you'll see a screen similar to below. Note that you might see a failure, warning or pass message, depending on the version of Kubernetes you are running. You can press `q` to exit this screen. ## Adding Cluster Provider Let's continue this and add a preflight check that verifies that the cluster is running on a supported *distribution* of Kubernetes. Once again, we have three categories (pass, warn and fail) that we want to check for. To write this preflight check, assume that our application does not work on Docker Desktop or Microk8s. Our application is known and validated on Amazon EKS, Google GKE, and Azure AKS. Any other distribution is supported on a best-effort basis and not validated. To add this check, open that `./preflight.yaml` again and edit the contents to match: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: preflight-tutorial spec: analyzers: - clusterVersion: outcomes: - fail: when: "< 1.16.0" message: The application requires at least Kubernetes 1.16.0, and recommends 1.18.0. uri: https://kubernetes.io - warn: when: "< 1.18.0" message: Your cluster meets the minimum version of Kubernetes, but we recommend you update to 1.18.0 or later. uri: https://kubernetes.io - pass: message: Your cluster meets the recommended and required versions of Kubernetes. 
- distribution: outcomes: - pass: when: "== gke" message: GKE is a supported platform - pass: when: "== aks" message: AKS is a supported platform - pass: when: "== eks" message: EKS is a supported platform - fail: when: "== docker-desktop" message: This application does not support Docker Desktop - fail: when: "== microk8s" message: This application does not support Microk8s - warn: message: The Kubernetes platform is not validated, but there are no known compatibility issues. ``` Reviewing this YAML, we've added an additional analyzer in the `analyzers` key. This analyzer uses the built-in [distribution](/docs/analyze/distribution/) check, which can detect the Kubernetes distribution the cluster is running on. Once again, save and run this file with: ```shell kubectl preflight ./preflight.yaml ``` After collecting and analyzing, you'll see a screen similar to below: You can use the arrow keys to move up and down to select different preflight checks and see the results. Again, press `q` to exit when finished. ## Run preflights using multiple specs > Introduced in Troubleshoot v0.50.0 You may need to run preflights using the collectors and analyzers specified in multiple different specs. As of Troubleshoot `v0.50.0`, you can now pass multiple specs as arguments to the `preflight` CLI. Run preflights using multiple specs from the filesystem: ```shell ./preflight ./preflight-1.yaml ./preflight-2.yaml ``` Run preflights using a spec from a URL, a file, and a Kubernetes secret: ```shell ./preflight https://raw.githubusercontent.com/replicatedhq/troubleshoot/main/examples/preflight/sample-preflight.yaml \ ./preflight-1.yaml \ secret/path/to/my/spec ``` --- > Introduced in Troubleshoot v0.63.0 When using `preflight` via the CLI, there are specific exit codes returned based on the outcomes of any checks.
## Exit Codes | Code | Description | |------|----| | `0` | All tests passed, and no errors occurred | | `1` | Some error occurred (most likely not related to a specific preflight check), catch-all | | `2` | Invalid input (CLI options, invalid Preflight spec, etc.) | | `3` | At least one Preflight check resulted in a `FAIL` | | `4` | No Preflight checks failed, but at least one resulted in a `WARN` | --- This tutorial will walk you through defining a set of Preflight Checks that your customer can execute before installing your application into their Kubernetes cluster. ## Goals By completing this tutorial, you will know how to write Preflight Checks, including: 1. How to write a new Preflight Check 2. How to execute Preflight Checks against a new environment ## Prerequisites Before starting this tutorial, you should have the following: 1. The Troubleshoot plugins [installed](/docs/#installation). 2. A Kubernetes cluster and local kubectl access to the cluster. If you don't have one for testing, consider [k0s](https://k0sproject.io/), [kURL](https://kurl.sh), [KiND](https://github.com/kubernetes-sigs/kind), or [K3S](https://k3s.io). --- Congratulations. You've just completed the introduction to Preflight Checks tutorial. Next, you can write your own `kind: Preflight` manifest to ensure the environment and configuration provided by your customer meet expectations. Head over to the [Analyzers](https://troubleshoot.sh/docs/analyze/) documentation to browse all of the built-in analyzers. ## Including Preflight Checks There are several ways to include Preflight Checks in your application. #### KOTS If you are packaging a [KOTS](https://kots.io) application, you can simply include a `kind: Preflight` document in your application and the KOTS Admin Console will show a browser-based representation of the Preflight results.
#### Command line Another approach to including Preflight Checks is to host the Preflight YAML on a server (even a GitHub Gist) and include instructions to manually run them before installing. Adding this to your installation documentation is as simple as asking the user to run `kubectl preflight https://your-server/preflight.yaml`. This method allows potential customers to run Preflight Checks before attempting the installation, which helps prevent common errors caused by misconfigured environments. For detailed instructions on using the command line, see [CLI Usage](https://troubleshoot.sh/docs/preflight/cli-usage). --- In this step, we'll expand the Preflight Checks we've already added to: 1. Verify that the cluster has at least 3 nodes 2. Verify that at least 1 node in the cluster has 16 GB of RAM and 8 CPUs ## Verify node count Let's add a Preflight Check to show an error message if the cluster does not have at least 3 nodes. To add this check, open that `./preflight.yaml` again and edit the contents to match: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: preflight-tutorial spec: analyzers: - clusterVersion: outcomes: - fail: when: "< 1.16.0" message: The application requires at least Kubernetes 1.16.0, and recommends 1.18.0. uri: https://kubernetes.io - warn: when: "< 1.18.0" message: Your cluster meets the minimum version of Kubernetes, but we recommend you update to 1.18.0 or later. uri: https://kubernetes.io - pass: message: Your cluster meets the recommended and required versions of Kubernetes.
- distribution: outcomes: - pass: when: "== gke" message: GKE is a supported platform - pass: when: "== aks" message: AKS is a supported platform - pass: when: "== eks" message: EKS is a supported platform - fail: when: "== docker-desktop" message: This application does not support Docker Desktop - fail: when: "== microk8s" message: This application does not support Microk8s - warn: message: The Kubernetes platform is not validated, but there are no known compatibility issues. - nodeResources: checkName: Must have at least 3 nodes in the cluster outcomes: - fail: when: "count() < 3" message: This application requires at least 3 nodes - pass: message: This cluster has enough nodes. ``` Reviewing this YAML, we've added an additional analyzer in the `analyzers` key starting at line 38. Let's review the changes to this YAML: **Line 38**: We are adding a new `nodeResources` analyzer to be evaluated when Preflight Checks are running. This key tells the Preflight application how to interpret and parse the `outcomes` below. The documentation for `nodeResources` is in the [Analyze documentation](/docs/analyze/node-resources/). **Line 39**: Provide a custom title to show up on the results page for this check. This attribute is available for any analyzer. **Line 40**: The following section will define the possible outcomes of this analyzer. **Line 41**: Define the failure outcome first. This outcome will be evaluated and, if true, evaluation of this analyzer will stop. **Line 42**: The criteria for this analyzer to evaluate. We are providing a basic analyzer here that simply checks if the total count of the nodes is less than three. If this evaluates to true, then this analyzer will have the current outcome (fail) and stop processing. **Line 43**: The message to show when this outcome is true. **Line 44**: Define the pass outcome next. There is no `when` attribute on this outcome, so it's the catch-all. **Line 45**: This is the message to show for the pass outcome. 
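The count check above boils down to comparing the number of nodes against the threshold, with the `when`-less pass outcome as the fallback. A minimal sketch, offered as an assumption about the semantics rather than the Troubleshoot implementation:

```python
# Hypothetical sketch of the nodeResources "count() < 3" analysis:
# the fail outcome wins as soon as its condition holds; otherwise the
# outcome without a `when` clause (pass) acts as the catch-all.
def analyze_node_count(nodes, minimum=3):
    if len(nodes) < minimum:   # corresponds to `when: "count() < 3"`
        return "fail"
    return "pass"              # no `when`: default outcome

# A two-node cluster fails the check; a three-node cluster passes.
```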
## Node Memory and CPU Requirements Next, we will add another analyzer to ensure that at least 1 node has a minimum of 16 GB of RAM and 8 CPU cores available. To add this check, open that `./preflight.yaml` again and edit the contents to match: ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: preflight-tutorial spec: analyzers: - clusterVersion: outcomes: - fail: when: "< 1.16.0" message: The application requires at least Kubernetes 1.16.0, and recommends 1.18.0. uri: https://kubernetes.io - warn: when: "< 1.18.0" message: Your cluster meets the minimum version of Kubernetes, but we recommend you update to 1.18.0 or later. uri: https://kubernetes.io - pass: message: Your cluster meets the recommended and required versions of Kubernetes. - distribution: outcomes: - pass: when: "== gke" message: GKE is a supported platform - pass: when: "== aks" message: AKS is a supported platform - pass: when: "== eks" message: EKS is a supported platform - fail: when: "== docker-desktop" message: This application does not support Docker Desktop - fail: when: "== microk8s" message: This application does not support Microk8s - warn: message: The Kubernetes platform is not validated, but there are no known compatibility issues. - nodeResources: checkName: Must have at least 3 nodes in the cluster outcomes: - fail: when: "count() < 3" message: This application requires at least 3 nodes - pass: message: This cluster has enough nodes. - nodeResources: checkName: One node must have 16 GB RAM and 8 CPU Cores filters: allocatableMemory: 16Gi cpuCapacity: "8" outcomes: - fail: when: count() < 1 message: Cannot find a node with sufficient memory and cpu - pass: message: Sufficient CPU and memory is available ``` Reviewing this new analyzer, the interesting parts are lines 48-50 where we add filters. Filters will filter the nodes analyzed to match the specified values. 
In this case, we are filtering the list of nodes to those that have at least 16Gi of allocatable memory and 8 CPUs. By filtering, we can use the simple `count()` function in the outcomes to analyze the results. ## Executing the preflights Let's stop here and execute the preflight checks. For this demo, I'm running this on my local Kubernetes cluster that has 5 nodes, but each node only has 2 cores and 8GB of RAM. The expectation here is that my local cluster will pass on the 3 node minimum check, but fail on the second one because I do not have sufficient memory and CPU. I'll run this file using: ```shell kubectl preflight ./preflight.yaml ``` And the results are: ## Next Steps Continue to the final part of this tutorial to learn how to distribute Preflight Checks as part of your application or documentation. --- ## Overview This guide shows how to author preflight YAML specs in a modular, values-driven style. The goal is to keep checks self-documenting, easy to toggle on/off, and customizable via values files or inline `--set` flags. 
### Core structure - **Header** - `apiVersion`: `troubleshoot.sh/v1beta3` - `kind`: `Preflight` - `metadata.name`: a short, stable identifier - **Spec** - `spec.analyzers`: list of checks (analyzers) - Each analyzer is optionally guarded by templating conditionals (e.g., `{{- if .Values.kubernetes.enabled }}`) - A `docString` accompanies each analyzer, describing the requirement, why it matters, and any links ### Example skeleton to start a new spec ```yaml apiVersion: troubleshoot.sh/v1beta3 kind: Preflight metadata: name: your-product-preflight spec: {{- /* Determine if we need explicit collectors beyond always-on clusterResources */}} {{- $needExtraCollectors := or .Values.databases.postgres.enabled .Values.http.enabled }} collectors: # Always collect cluster resources to support core analyzers - clusterResources: {} {{- if .Values.databases.postgres.enabled }} - postgres: collectorName: '{{ .Values.databases.postgres.collectorName }}' uri: '{{ .Values.databases.postgres.uri }}' {{- end }} analyzers: {{- if .Values.kubernetes.enabled }} - docString: | Title: Kubernetes Control Plane Requirements Requirement: - Version: - Minimum: {{ .Values.kubernetes.minVersion }} - Recommended: {{ .Values.kubernetes.recommendedVersion }} - Docs: https://kubernetes.io These version targets ensure required APIs and defaults are available and patched. clusterVersion: checkName: Kubernetes version outcomes: - fail: when: '< {{ .Values.kubernetes.minVersion }}' message: Requires at least Kubernetes {{ .Values.kubernetes.minVersion }}. - warn: when: '< {{ .Values.kubernetes.recommendedVersion }}' message: Recommended to use Kubernetes {{ .Values.kubernetes.recommendedVersion }} or later. - pass: when: '>= {{ .Values.kubernetes.recommendedVersion }}' message: Meets recommended and required Kubernetes versions. 
{{- end }} {{- if .Values.storageClass.enabled }} - docString: | Title: Default StorageClass Requirements Requirement: - A StorageClass named "{{ .Values.storageClass.className }}" must exist A default StorageClass enables dynamic PVC provisioning without manual intervention. storageClass: checkName: Default StorageClass storageClassName: '{{ .Values.storageClass.className }}' outcomes: - fail: message: Default StorageClass not found - pass: message: Default StorageClass present {{- end }} {{- if .Values.databases.postgres.enabled }} - docString: | Title: Postgres Connectivity Requirement: - Postgres checks collected by '{{ .Values.databases.postgres.collectorName }}' must pass postgres: checkName: Postgres checks collectorName: '{{ .Values.databases.postgres.collectorName }}' outcomes: - fail: message: Postgres checks failed - pass: message: Postgres checks passed {{- end }} ``` ## Use Helm Templating This section describes how to use Helm templating when authoring v1beta3 Preflight specs. ### Templating with Helm Engine v1beta3 uses Helm's rendering engine, which means you have access to: **Available Builtin Objects:** - `.Values` - Values from your values files and `--set` overrides - `.Release` - Release information (Name, Namespace, IsInstall, IsUpgrade, etc.) - `.Chart` - Chart metadata (Name, Version, AppVersion, etc.) - `.Capabilities` - Cluster capabilities (KubeVersion, APIVersions, etc.) - `.Template` - Template file information (Name, BasePath) **Sprig Functions:** Full Sprig function library is available for string manipulation, math, date functions, etc. 
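As a sketch of what Sprig enables inside a spec, the standard Sprig `default` function can provide a fallback when a value is unset (value names follow the skeleton above; treating `"standard"` as the fallback is an assumption for illustration):

```yaml
analyzers:
  - storageClass:
      # Sprig `default`: fall back to "standard" when no class name is supplied
      checkName: StorageClass {{ .Values.storageClass.className | default "standard" }}
      storageClassName: '{{ .Values.storageClass.className | default "standard" }}'
      outcomes:
        - fail:
            message: StorageClass '{{ .Values.storageClass.className | default "standard" }}' not found
        - pass:
            message: Required StorageClass present
```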
**Supply values via:** - Multiple values files: `--values base.yaml --values prod.yaml` - Command-line overrides: `--set storage.className=fast` - Both combined (sets override files) **Example using builtin objects:** ```yaml apiVersion: troubleshoot.sh/v1beta3 kind: Preflight metadata: name: my-app-preflight spec: analyzers: {{- if .Values.kubernetes.enabled }} - clusterVersion: checkName: Kubernetes version outcomes: - fail: when: '< {{ .Values.kubernetes.minVersion }}' message: Requires Kubernetes {{ .Values.kubernetes.minVersion }}+ - pass: message: Kubernetes version meets requirements {{- end }} # Using .Capabilities to conditionally check based on Kubernetes version {{- if .Capabilities.KubeVersion.GitVersion }} - distribution: checkName: Distribution for {{ .Capabilities.KubeVersion.GitVersion }} outcomes: - pass: message: Running on {{ .Capabilities.KubeVersion.GitVersion }} {{- end }} ``` **Note on Helm integration:** While you have access to Helm builtins, Preflight specs are rendered independently. If you want to use `.Chart` or `.Release` information, you'll need to either: - Pass those values explicitly via `--set` (e.g., `--set release.name=my-release`) - Use Helm's `helm template` to pre-render your Preflight spec as part of your chart, then pipe it to preflight **Caveat about Helm built-ins (e.g., `.Capabilities`):** In Preflight, many Helm built-ins are present but often populated with defaults rather than live cluster or release data. - `.Capabilities` is not cluster-aware in this context. It comes from Helm's `chartutil.DefaultCapabilities`, not API discovery against your cluster. Do not use it to gate analyzers based on supposed API availability or Kubernetes version; instead, write analyzers that directly check for the resources/APIs you need. - Other built-ins that typically come from chart or release context (such as `.Chart`, `.Release`, and parts of `.Capabilities`) may be empty or defaulted unless you explicitly provide values. 
- We recommend avoiding Helm built-ins entirely inside standalone Preflight specs. If you require Helm context, move your Preflight spec into your chart and render it with `helm template`, or explicitly pass needed values via `--set`. - **Toggling sections**: wrap analyzer blocks in conditionals tied to values. ```yaml {{- if .Values.storageClass.enabled }} - docString: | Title: Default StorageClass Requirements Requirement: - A StorageClass named "{{ .Values.storageClass.className }}" must exist Default StorageClass enables dynamic PVC provisioning without manual intervention. storageClass: checkName: Default StorageClass storageClassName: '{{ .Values.storageClass.className }}' outcomes: - fail: message: Default StorageClass not found - pass: message: Default StorageClass present {{- end }} ``` - **Values**: template expressions directly use values from your values files. ```yaml {{ .Values.kubernetes.minVersion }} ``` - **Nested conditionals**: further constrain checks (e.g., only when a specific CRD is required). ```yaml {{- if .Values.crd.enabled }} - docString: | Title: Required CRD Presence Requirement: - CRD must exist: {{ .Values.crd.name }} The application depends on this CRD for controllers to reconcile desired state. customResourceDefinition: checkName: Required CRD customResourceDefinitionName: '{{ .Values.crd.name }}' outcomes: - fail: message: Required CRD not found - pass: message: Required CRD present {{- end }} ``` ### Values files: shape and examples Provide a values schema that mirrors your toggles and thresholds. 
Example full and minimal values are included in this repository: - `values-v1beta3-full.yaml` (all features enabled, opinionated defaults) - `values-v1beta3-minimal.yaml` (most features disabled, conservative thresholds) Typical structure: ```yaml clusterVersion: enabled: true minVersion: "1.24.0" recommendedVersion: "1.28.0" storageClass: enabled: true className: "standard" crd: enabled: true name: "samples.mycompany.com" containerRuntime: enabled: true distribution: enabled: true supported: ["eks", "gke", "aks", "kubeadm"] unsupported: [] nodeResources: count: enabled: true min: 1 recommended: 3 cpu: enabled: true min: "4" memory: enabled: true minGi: 8 recommendedGi: 16 ephemeral: enabled: true minGi: 20 recommendedGi: 50 workloads: deployments: enabled: true namespace: "default" name: "example-deploy" minReady: 1 databases: postgres: enabled: true collectorName: "postgres" uri: "postgres://user:pass@postgres:5432/db?sslmode=disable" mysql: enabled: true collectorName: "mysql" uri: "mysql://user:pass@tcp(mysql:3306)/db" ``` ### Values File Structure A typical values file mirrors the structure of your checks: ```yaml # Base requirements kubernetes: enabled: true minVersion: "1.24.0" recommendedVersion: "1.27.0" # Storage requirements storage: enabled: true className: "standard" # Node requirements nodes: enabled: true minimum: 3 # Resource requirements resources: memory: enabled: true minPerNodeGi: 16 totalMinGi: 64 cpu: enabled: true totalCores: 12 ephemeral: enabled: true minPerNodeGi: 50 # Distribution constraints distribution: enabled: true supported: [eks, gke, aks] unsupported: [kind, minikube] # Custom Resource Definitions crd: enabled: true name: "myapp.example.com" # Database connectivity databases: postgres: enabled: false uri: "" ``` ## Write Documentation This section describes how to add documentation to preflight checks when authoring v1beta3 Preflight specs. 
### Write Documentation with docStrings Every analyzer should include a `docString` that describes the requirement, rationale, and links. The docString uses templates to show actual configured values rather than placeholders, and can be extracted automatically for documentation. **Example:** ```yaml - docString: | Title: Kubernetes Version Requirements Requirement: - Minimum: {{ .Values.kubernetes.minVersion }} - Recommended: {{ .Values.kubernetes.recommendedVersion }} Ensures required APIs and security patches are available. Links: - https://kubernetes.io/releases/ clusterVersion: checkName: Kubernetes version outcomes: - fail: when: '< {{ .Values.kubernetes.minVersion }}' message: Requires at least {{ .Values.kubernetes.minVersion }} - pass: message: Kubernetes version OK - docString: | Title: Storage Class Availability Requirement: - StorageClass "{{ .Values.storage.className }}" must exist Dynamic provisioning requires a properly configured StorageClass. storageClass: checkName: Storage Class storageClassName: '{{ .Values.storage.className }}' outcomes: - fail: message: StorageClass {{ .Values.storage.className }} not found - pass: message: StorageClass {{ .Values.storage.className }} exists ``` When rendered with values, the docString will show the actual requirements (e.g., "Minimum: 1.24.0" instead of a placeholder). 
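For example, rendering the first analyzer above with `minVersion: "1.24.0"` and `recommendedVersion: "1.27.0"` in the values file produces a docString with the concrete numbers baked in:

```yaml
- docString: |
    Title: Kubernetes Version Requirements
    Requirement:
    - Minimum: 1.24.0
    - Recommended: 1.27.0

    Ensures required APIs and security patches are available.
```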
### Author high-quality docString blocks Every analyzer should start with a `docString` so you can extract documentation automatically: - **Title**: a concise name for the requirement - **Requirement**: bullet list of specific, testable criteria (e.g., versions, counts, names) - **Rationale**: 1–3 sentences explaining why the requirement exists and the impact if unmet - **Links**: include authoritative docs with stable URLs Example: ```yaml docString: | Title: Required CRDs and Ingress Capabilities Requirement: - Ingress Controller: Contour - CRD must be present: - Group: heptio.com - Kind: IngressRoute - Version: v1beta1 or later served version The ingress layer terminates TLS and routes external traffic to Services. Contour relies on the IngressRoute CRD to express host/path routing, TLS configuration, and policy. If the CRD is not installed and served by the API server, Contour cannot reconcile desired state, leaving routes unconfigured and traffic unreachable. ``` ### Create Dynamic docStrings and Messages This example shows how to create **dynamic docStrings and messages** that adapt based on actual cluster state. Imagine checking for sufficient memory across nodes, but providing specific feedback about what's needed: ```yaml apiVersion: troubleshoot.sh/v1beta3 kind: Preflight metadata: name: advanced-preflight spec: collectors: - clusterResources: {} analyzers: {{- if .Values.memory.enabled }} - docString: | Title: Node Memory Requirements Requirement: - Each node must have at least {{ .Values.memory.minPerNodeGi }} GiB memory - Total cluster memory must be at least {{ .Values.memory.totalMinGi }} GiB Rationale: The application workloads require {{ .Values.memory.minPerNodeGi }} GiB per node to run database replicas and caching layers. If nodes have less memory, pods will fail to schedule or may be evicted under load. 
nodeResources: checkName: Node memory check outcomes: - fail: when: 'min(memoryCapacity) < {{ .Values.memory.minPerNodeGi }}Gi' message: | Insufficient memory on one or more nodes. Minimum required: {{ .Values.memory.minPerNodeGi }} GiB per node Smallest node has: {{ "{{" }} min(memoryCapacity) {{ "}}" }} Action: Add {{ "{{" }} subtract({{ .Values.memory.minPerNodeGi }}Gi, min(memoryCapacity)) {{ "}}" }} more memory to the smallest node, or add nodes with at least {{ .Values.memory.minPerNodeGi }} GiB memory. - warn: when: 'sum(memoryCapacity) < {{ .Values.memory.totalMinGi }}Gi' message: | Total cluster memory below recommended minimum. Required total: {{ .Values.memory.totalMinGi }} GiB Current total: {{ "{{" }} sum(memoryCapacity) {{ "}}" }} Additional memory needed: {{ "{{" }} subtract({{ .Values.memory.totalMinGi }}Gi, sum(memoryCapacity)) {{ "}}" }} - pass: message: | Memory requirements met. Per-node minimum: {{ "{{" }} min(memoryCapacity) {{ "}}" }} (required: {{ .Values.memory.minPerNodeGi }} GiB) Total cluster: {{ "{{" }} sum(memoryCapacity) {{ "}}" }} (required: {{ .Values.memory.totalMinGi }} GiB) {{- end }} {{- if .Values.cpu.enabled }} - docString: | Title: CPU Core Requirements Requirement: - Minimum {{ .Values.cpu.totalCores }} cores across all nodes Rationale: Application services require {{ .Values.cpu.totalCores }} cores for compute-intensive workloads. The scheduler may fail to place pods if insufficient CPU capacity is available. nodeResources: checkName: Total CPU capacity outcomes: - fail: when: 'sum(cpuCapacity) < {{ .Values.cpu.totalCores }}' message: | Insufficient CPU capacity. Required: {{ .Values.cpu.totalCores }} cores Available: {{ "{{" }} sum(cpuCapacity) {{ "}}" }} cores Need {{ "{{" }} subtract({{ .Values.cpu.totalCores }}, sum(cpuCapacity)) {{ "}}" }} more cores. Consider scaling the cluster or using larger instance types. - pass: message: | CPU capacity meets requirements. 
Available: {{ "{{" }} sum(cpuCapacity) {{ "}}" }} cores (required: {{ .Values.cpu.totalCores }}) {{- end }} {{- if .Values.distribution.enabled }} - docString: | Title: Supported Kubernetes Distribution Requirement: - Must be one of: {{ join ", " .Values.distribution.supported }} {{- if .Values.distribution.unsupported }} - Must NOT be: {{ join ", " .Values.distribution.unsupported }} {{- end }} The application has been tested and certified on specific distributions. Using unsupported distributions may result in compatibility issues. distribution: checkName: Distribution check outcomes: {{- range $dist := .Values.distribution.unsupported }} - fail: when: '== {{ $dist }}' message: 'Distribution "{{ $dist }}" is not supported. Please use one of: {{ join ", " $.Values.distribution.supported }}' {{- end }} {{- range $dist := .Values.distribution.supported }} - pass: when: '== {{ $dist }}' message: 'Distribution "{{ $dist }}" is supported' {{- end }} - warn: message: | Unable to determine distribution. Supported distributions: {{ join ", " .Values.distribution.supported }} Please verify your cluster is running a supported distribution. {{- end }} ``` **Values file for advanced example (values-advanced.yaml):** ```yaml memory: enabled: true minPerNodeGi: 16 totalMinGi: 64 cpu: enabled: true totalCores: 12 distribution: enabled: true supported: - eks - gke - aks - kops unsupported: - kind - minikube ``` **Run it:** ```bash preflight advanced-preflight.yaml --values values-advanced.yaml ``` #### How Dynamic Messages Work In the advanced example: 1. **Template expressions in outcomes** use double curly braces: `{{ "{{" }} min(memoryCapacity) {{ "}}" }}` - These are evaluated at **runtime** against collected data - Show actual cluster values in messages 2. **Template expressions in the spec** use single curly braces: `{{ .Values.memory.minPerNodeGi }}` - These are evaluated at **template render time** - Insert values from your values files 3.
**Math and logic in messages** can calculate gaps: ``` Need {{ "{{" }} subtract({{ .Values.cpu.totalCores }}, sum(cpuCapacity)) {{ "}}" }} more cores ``` This shows "Need 4 more cores" if you require 12 but only have 8. 4. **Dynamic docStrings** reflect your actual configuration: ```yaml Title: CPU Core Requirements Requirement: - Minimum 12 cores across all nodes ``` The "12" comes from `.Values.cpu.totalCores`, not a hardcoded value. ## Render Templates, Run Preflights, and Extract Documentation You can render templates, run preflights with values, and extract requirement docs without running checks. - **Render a templated preflight spec** to stdout or a file: ```bash preflight template v1beta3.yaml \ --values values-base.yaml \ --values values-prod.yaml \ --set storage.className=fast-local \ -o rendered-preflight.yaml ``` - **Run preflights with values** (values and sets also work with `preflight` root command): ```bash preflight run rendered-preflight.yaml # or run directly against the template with values preflight run v1beta3.yaml --values values-prod.yaml --set cluster.minNodes=5 ``` - **Extract only documentation** from enabled analyzers in one or more templates: ```bash preflight docs v1beta3.yaml other-spec.yaml \ --values values-prod.yaml \ --set kubernetes.enabled=true \ -o REQUIREMENTS.md ``` Notes: - Multiple `--values` files are merged in order; later files win. - `--set` uses Helm-style semantics for nested keys and types, applied after files.
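A small worked example of that precedence (file contents are illustrative): the later values file overrides the earlier one, and `--set` overrides both:

```yaml
# values-base.yaml
storage:
  className: standard

# values-prod.yaml  (passed second, so it wins over values-base.yaml)
storage:
  className: fast

# preflight template v1beta3.yaml \
#   --values values-base.yaml --values values-prod.yaml \
#   --set storage.className=fast-local
#
# The spec renders with storage.className = "fast-local",
# because --set is applied after all values files.
```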
### Rendering Templates Preview the rendered YAML before running checks: ```bash # Render with values preflight template my-spec.yaml --values values.yaml # Render with multiple values files and overrides preflight template my-spec.yaml \ --values base.yaml \ --values prod.yaml \ --set cluster.minNodes=5 \ -o rendered.yaml ``` ### Running Templated Checks Run preflights directly with values: ```bash # Run with values file preflight my-spec.yaml --values prod-values.yaml # Run with overrides preflight my-spec.yaml \ --values base.yaml \ --set kubernetes.version.minimum=1.25.0 # Run already-rendered spec preflight rendered.yaml ``` ### Extract Documentation When Running Checks Generate markdown documentation from enabled checks: ```bash # Extract docs with specific values preflight docs my-spec.yaml \ --values prod-values.yaml \ -o REQUIREMENTS.md # Extract from multiple specs preflight docs spec1.yaml spec2.yaml \ --values shared-values.yaml \ -o REQUIREMENTS.md ``` ## CLI Reference ### Template Command ```bash preflight template SPEC [flags] Flags: --values strings Values files (can be repeated, merged in order) --set strings Override individual values (Helm-style: key=value, key.nested=value) -o, --output Write output to file instead of stdout ``` ### Docs Command ```bash preflight docs SPEC [SPEC...] [flags] Flags: --values strings Values files to use when rendering specs --set strings Override individual values -o, --output Write documentation to file ``` ### Run with Values ```bash preflight [run] SPEC [flags] Flags: --values strings Values files to use --set strings Override values # ... all other preflight run flags ``` ## Authoring Best Practices This section includes guidelines and best practices for authoring v1beta3 Preflight specs. ### Best Practices 1. **Always use `docString`** - Makes specs self-documenting and enables automated doc generation 2. **Gate optional checks** - Use `{{- if .Values.feature.enabled }}` so users enable only what they need 3. 
**Parameterize thresholds** - Never hardcode values; use `.Values` expressions 4. **Provide clear messages** - Use dynamic expressions to show actual vs. required values 5. **Include rationale** - Explain *why* a requirement exists in the docString 6. **Link to docs** - Add authoritative documentation URLs 7. **Test multiple scenarios** - Render with different values files (minimal, full, production) 8. **Use meaningful checkNames** - These appear in output and should be user-friendly ### Authoring checklist - Add `docString` with Title, Requirement bullets, rationale, and links. - Gate optional analyzers with `{{- if .Values.<feature>.enabled }}`. - Parameterize thresholds and names with `.Values` expressions. - Ensure all required values are present in your values files since there are no fallback defaults. - Use precise, user-actionable `message` text for each outcome; add `uri` where helpful. - Prefer a minimal values file with everything disabled, and a full values file enabling most checks. - Test with `preflight template` (no values, minimal, full) and verify `preflight docs` output reads well. ### Design conventions for maintainability - **Guard every optional analyzer** with a values toggle, so consumers can enable only what they need. - **Always include collectors section** when analyzers require them (databases, http, registryImages, etc.). - **Use `checkName`** to provide a stable, user-facing label for each check. - **Prefer `fail` for unmet hard requirements**, `warn` for soft requirements, and `pass` with a direct, affirmative message. - **Attach `uri`** to outcomes when helpful for remediation. - **Keep docString in sync** with the actual checks; avoid drift by templating values into both the docs and the analyzer. - **Ensure values files contain all required fields** since templates now directly use values without fallback defaults.
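To make the minimal-values-file item concrete, here is a sketch that disables every optional analyzer, using the same key structure shown earlier in this guide; consumers then enable checks one at a time:

```yaml
# values-minimal.yaml (sketch): everything off by default
kubernetes:
  enabled: false
storageClass:
  enabled: false
crd:
  enabled: false
containerRuntime:
  enabled: false
distribution:
  enabled: false
nodeResources:
  count:
    enabled: false
databases:
  postgres:
    enabled: false
```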
### Choose the right analyzer type and outcomes Use the analyzer that matches the requirement, and enumerate `outcomes` with clear messages. Common analyzers in this style: - **clusterVersion**: compare to min and recommended versions ```yaml clusterVersion: checkName: Kubernetes version outcomes: - fail: when: '< {{ .Values.kubernetes.minVersion }}' message: Requires at least Kubernetes {{ .Values.kubernetes.minVersion }}. - warn: when: '< {{ .Values.kubernetes.recommendedVersion }}' message: Recommended to use Kubernetes {{ .Values.kubernetes.recommendedVersion }} or later. - pass: when: '>= {{ .Values.kubernetes.recommendedVersion }}' message: Meets recommended and required Kubernetes versions. ``` - **customResourceDefinition**: ensure a CRD exists ```yaml customResourceDefinition: checkName: Required CRD customResourceDefinitionName: '{{ .Values.crd.name }}' outcomes: - fail: message: Required CRD not found - pass: message: Required CRD present ``` - **containerRuntime**: verify container runtime ```yaml containerRuntime: outcomes: - pass: when: '== containerd' message: containerd runtime detected - fail: message: Unsupported container runtime; containerd required ``` - **storageClass**: check for a named StorageClass (often the default) ```yaml storageClass: checkName: Default StorageClass storageClassName: '{{ .Values.storageClass.className }}' outcomes: - fail: message: Default StorageClass not found - pass: message: Default StorageClass present ``` - **distribution**: allow or reject specific distributions ```yaml distribution: checkName: Supported distribution outcomes: {{- range $d := .Values.distribution.unsupported }} - fail: when: '== {{ $d }}' message: '{{ $d }} is not supported' {{- end }} {{- range $d := .Values.distribution.supported }} - pass: when: '== {{ $d }}' message: '{{ $d }} is a supported distribution' {{- end }} - warn: message: Unable to determine the distribution ``` - **nodeResources**: aggregate across nodes; common patterns
include count, CPU, memory, and ephemeral storage ```yaml # Node count requirement nodeResources: checkName: Node count outcomes: - fail: when: 'count() < {{ .Values.nodeResources.count.min }}' message: Requires at least {{ .Values.nodeResources.count.min }} nodes - warn: when: 'count() < {{ .Values.nodeResources.count.recommended }}' message: Recommended at least {{ .Values.nodeResources.count.recommended }} nodes - pass: message: Cluster has sufficient nodes # Cluster CPU total nodeResources: checkName: Cluster CPU total outcomes: - fail: when: 'sum(cpuCapacity) < {{ .Values.nodeResources.cpu.min }}' message: Requires at least {{ .Values.nodeResources.cpu.min }} cores - pass: message: Cluster CPU capacity meets requirement # Per-node memory (Gi) nodeResources: checkName: Per-node memory outcomes: - fail: when: 'min(memoryCapacity) < {{ .Values.nodeResources.memory.minGi }}Gi' message: All nodes must have at least {{ .Values.nodeResources.memory.minGi }} GiB - warn: when: 'min(memoryCapacity) < {{ .Values.nodeResources.memory.recommendedGi }}Gi' message: Recommended {{ .Values.nodeResources.memory.recommendedGi }} GiB per node - pass: message: All nodes meet recommended memory # Per-node ephemeral storage (Gi) nodeResources: checkName: Per-node ephemeral storage outcomes: - fail: when: 'min(ephemeralStorageCapacity) < {{ .Values.nodeResources.ephemeral.minGi }}Gi' message: All nodes must have at least {{ .Values.nodeResources.ephemeral.minGi }} GiB - warn: when: 'min(ephemeralStorageCapacity) < {{ .Values.nodeResources.ephemeral.recommendedGi }}Gi' message: Recommended {{ .Values.nodeResources.ephemeral.recommendedGi }} GiB per node - pass: message: All nodes meet recommended ephemeral storage ``` - **deploymentStatus**: verify workload deployment status ```yaml deploymentStatus: checkName: Deployment ready namespace: '{{ .Values.workloads.deployments.namespace }}' name: '{{ .Values.workloads.deployments.name }}' outcomes: - fail: when: absent message: Deployment 
not found - fail: when: '< {{ .Values.workloads.deployments.minReady }}' message: Deployment has insufficient ready replicas - pass: when: '>= {{ .Values.workloads.deployments.minReady }}' message: Deployment has sufficient ready replicas ``` - **postgres/mysql/redis**: database connectivity (requires collectors) ```yaml # Collector section - postgres: collectorName: '{{ .Values.databases.postgres.collectorName }}' uri: '{{ .Values.databases.postgres.uri }}' # Analyzer section postgres: checkName: Postgres checks collectorName: '{{ .Values.databases.postgres.collectorName }}' outcomes: - fail: message: Postgres checks failed - pass: message: Postgres checks passed ``` - **textAnalyze/yamlCompare/jsonCompare**: analyze collected data ```yaml textAnalyze: checkName: Text analyze collectorName: 'cluster-resources' fileName: '{{ .Values.textAnalyze.fileName }}' regex: '{{ .Values.textAnalyze.regex }}' outcomes: - fail: message: Pattern matched in files - pass: message: Pattern not found ``` ## References - Example template in this repo: `v1beta3-all-analyzers.yaml` - Values example: `values-v1beta3-all-analyzers.yaml` --- ## Overview This guide walks through converting v1beta2 Preflight specs to v1beta3. The v1beta3 format introduces templating and values-driven configuration, making specs more flexible and maintainable. ## Why Migrate? **v1beta3 offers several advantages:** 1. **Reusable specs** - One spec works across multiple environments with different values files 2. **Dynamic configuration** - Toggle checks on/off without editing YAML 3. **Self-documenting** - Extract requirements documentation automatically 4. **Type-safe values** - Centralized values with clear structure 5. **Reduced duplication** - Template repeated patterns instead of copy-paste 6. **Runtime context** - Messages show actual vs. required values dynamically **v1beta2 remains supported** - This is not a breaking change. Migrate when the benefits match your needs. 
## Key Differences | Feature | v1beta2 | v1beta3 | |---------|---------|---------| | **API Version** | `troubleshoot.sh/v1beta2` | `troubleshoot.sh/v1beta3` | | **Templating** | Not supported | Go templates + Sprig | | **Values** | Hardcoded in spec | External values files | | **Documentation** | Comments only | Extractable `docString` | | **Configuration** | Edit YAML | Supply different values | | **Toggles** | Maintain multiple files | Conditional blocks | ## Migration Process ### Migration Checklist - [ ] Update `apiVersion` to `troubleshoot.sh/v1beta3` - [ ] Add `docString` to every analyzer - [ ] Extract hardcoded values to a values file - [ ] Replace hardcoded values with `{{ .Values.* }}` expressions - [ ] Wrap optional checks in `{{- if .Values.feature.enabled }}` - [ ] Update messages to use runtime expressions where helpful - [ ] Test rendering: `preflight template spec.yaml --values values.yaml` - [ ] Test with multiple values scenarios (dev, prod, minimal) - [ ] Extract docs: `preflight docs spec.yaml --values values.yaml` - [ ] Verify extracted documentation is clear and complete ### Step 1: Change API Version ```yaml # v1beta2 apiVersion: troubleshoot.sh/v1beta2 # v1beta3 apiVersion: troubleshoot.sh/v1beta3 ``` ### Step 2: Add docStrings Add documentation to each analyzer using `docString`. This should describe the requirement, rationale, and links. **Before (v1beta2):** ```yaml spec: analyzers: - clusterVersion: checkName: Kubernetes version outcomes: - fail: when: "< 1.24.0" message: Requires Kubernetes 1.24.0 or later - pass: message: Kubernetes version is supported ``` **After (v1beta3):** ```yaml spec: analyzers: - docString: | Title: Kubernetes Version Requirements Requirement: - Minimum: 1.24.0 - Recommended: 1.27.0 Ensures required APIs and security patches are available. Older versions may lack necessary features or contain known vulnerabilities. 
Links: - https://kubernetes.io/releases/ clusterVersion: checkName: Kubernetes version outcomes: - fail: when: "< 1.24.0" message: Requires Kubernetes 1.24.0 or later - pass: message: Kubernetes version is supported ``` ### Step 3: Extract Values Identify hardcoded values and move them to a values file. **Before (v1beta2):** ```yaml spec: analyzers: - clusterVersion: outcomes: - fail: when: "< 1.24.0" message: Requires Kubernetes 1.24.0 or later - warn: when: "< 1.27.0" message: Kubernetes 1.27.0 or later recommended - storageClass: storageClassName: "standard" outcomes: - fail: message: StorageClass 'standard' not found - nodeResources: outcomes: - fail: when: "count() < 3" message: Requires at least 3 nodes ``` **Create values.yaml:** ```yaml kubernetes: enabled: true minVersion: "1.24.0" recommendedVersion: "1.27.0" storage: enabled: true className: "standard" nodes: enabled: true minimum: 3 ``` **After (v1beta3):** ```yaml spec: analyzers: {{- if .Values.kubernetes.enabled }} - docString: | Title: Kubernetes Version Requirement: - Minimum: {{ .Values.kubernetes.minVersion }} - Recommended: {{ .Values.kubernetes.recommendedVersion }} Ensures compatibility with required APIs. clusterVersion: checkName: Kubernetes version outcomes: - fail: when: '< {{ .Values.kubernetes.minVersion }}' message: Requires Kubernetes {{ .Values.kubernetes.minVersion }} or later - warn: when: '< {{ .Values.kubernetes.recommendedVersion }}' message: Kubernetes {{ .Values.kubernetes.recommendedVersion }} or later recommended - pass: message: Kubernetes version meets requirements {{- end }} {{- if .Values.storage.enabled }} - docString: | Title: Storage Class Requirement: - StorageClass "{{ .Values.storage.className }}" must exist Required for dynamic volume provisioning. 
storageClass: checkName: Storage Class storageClassName: '{{ .Values.storage.className }}' outcomes: - fail: message: StorageClass {{ .Values.storage.className }} not found - pass: message: StorageClass {{ .Values.storage.className }} exists {{- end }} {{- if .Values.nodes.enabled }} - docString: | Title: Node Count Requirement: - Minimum {{ .Values.nodes.minimum }} nodes Ensures high availability and capacity. nodeResources: checkName: Node count outcomes: - fail: when: 'count() < {{ .Values.nodes.minimum }}' message: Requires at least {{ .Values.nodes.minimum }} nodes - pass: message: Sufficient nodes available {{- end }} ``` ### Step 4: Add Conditional Toggles Wrap optional analyzers in conditionals so they can be enabled/disabled via values. This is especially useful for: - Database checks (not all deployments use databases) - Distribution-specific checks - Development vs. production requirements - Custom resource definitions that may not always be needed **Pattern:** ```yaml {{- if .Values.feature.enabled }} - docString: | ... analyzerType: ... 
{{- end }} ``` ## Complete Example: Side-by-Side ### v1beta2 Original ```yaml apiVersion: troubleshoot.sh/v1beta2 kind: Preflight metadata: name: my-app-preflight spec: collectors: - clusterResources: {} analyzers: # Check Kubernetes version - clusterVersion: outcomes: - fail: when: "< 1.24.0" message: Kubernetes 1.24.0 or later required - warn: when: "< 1.27.0" message: Kubernetes 1.27.0 recommended - pass: message: Kubernetes version OK # Check node count - nodeResources: outcomes: - fail: when: "count() < 3" message: At least 3 nodes required for HA - pass: message: Sufficient nodes # Check memory per node - nodeResources: outcomes: - fail: when: "min(memoryCapacity) < 16Gi" message: Each node must have at least 16 GiB memory - pass: message: Memory requirements met # Check storage class - storageClass: storageClassName: "standard" outcomes: - fail: message: StorageClass 'standard' not found - pass: message: StorageClass exists # Check distribution - distribution: outcomes: - fail: when: "== kind" message: kind is not supported in production - pass: when: "== eks" message: EKS is supported - pass: when: "== gke" message: GKE is supported - pass: when: "== aks" message: AKS is supported - warn: message: Unknown distribution ``` ### v1beta3 Converted **preflight-v1beta3.yaml:** ```yaml apiVersion: troubleshoot.sh/v1beta3 kind: Preflight metadata: name: my-app-preflight spec: collectors: - clusterResources: {} analyzers: {{- if .Values.kubernetes.enabled }} - docString: | Title: Kubernetes Version Requirements Requirement: - Minimum: {{ .Values.kubernetes.minVersion }} - Recommended: {{ .Values.kubernetes.recommendedVersion }} Ensures the cluster has required APIs and security patches. Older versions may not support features required by this application. 
Links: - https://kubernetes.io/releases/ clusterVersion: checkName: Kubernetes version outcomes: - fail: when: '< {{ .Values.kubernetes.minVersion }}' message: Kubernetes {{ .Values.kubernetes.minVersion }} or later required - warn: when: '< {{ .Values.kubernetes.recommendedVersion }}' message: Kubernetes {{ .Values.kubernetes.recommendedVersion }} recommended for optimal performance - pass: message: Kubernetes version meets all requirements {{- end }} {{- if .Values.nodes.enabled }} - docString: | Title: High Availability Node Count Requirement: - Minimum {{ .Values.nodes.minimum }} nodes Multiple nodes ensure the application remains available during node maintenance or failures. Single-node clusters risk downtime. nodeResources: checkName: Node count outcomes: - fail: when: 'count() < {{ .Values.nodes.minimum }}' message: At least {{ .Values.nodes.minimum }} nodes required for high availability - pass: message: Sufficient nodes for HA ({{ "{{" }} count() {{ "}}" }} nodes) {{- end }} {{- if .Values.memory.enabled }} - docString: | Title: Per-Node Memory Requirements Requirement: - Each node: minimum {{ .Values.memory.minGi }} GiB Application pods require {{ .Values.memory.minGi }} GiB to run database and cache workloads. Insufficient memory causes scheduling failures. nodeResources: checkName: Node memory outcomes: - fail: when: 'min(memoryCapacity) < {{ .Values.memory.minGi }}Gi' message: Each node must have at least {{ .Values.memory.minGi }} GiB memory - pass: message: Memory requirements met (smallest node: {{ "{{" }} min(memoryCapacity) {{ "}}" }}) {{- end }} {{- if .Values.storage.enabled }} - docString: | Title: Storage Class Availability Requirement: - StorageClass "{{ .Values.storage.className }}" must exist Dynamic volume provisioning depends on a configured StorageClass. Without it, PVCs cannot be automatically fulfilled. 
storageClass: checkName: Storage Class storageClassName: '{{ .Values.storage.className }}' outcomes: - fail: message: StorageClass '{{ .Values.storage.className }}' not found - pass: message: StorageClass '{{ .Values.storage.className }}' exists {{- end }} {{- if .Values.distribution.enabled }} - docString: | Title: Supported Kubernetes Distribution Requirement: - Must be one of: {{ join ", " .Values.distribution.supported }} - Must NOT be: {{ join ", " .Values.distribution.unsupported }} This application is tested and certified on specific distributions. Unsupported distributions may have compatibility issues. distribution: checkName: Distribution check outcomes: {{- range $dist := .Values.distribution.unsupported }} - fail: when: '== {{ $dist }}' message: '{{ $dist }} is not supported in production' {{- end }} {{- range $dist := .Values.distribution.supported }} - pass: when: '== {{ $dist }}' message: '{{ $dist }} is a supported distribution' {{- end }} - warn: message: Unable to determine distribution. Supported: {{ join ", " .Values.distribution.supported }} {{- end }} ``` **values.yaml:** ```yaml kubernetes: enabled: true minVersion: "1.24.0" recommendedVersion: "1.27.0" nodes: enabled: true minimum: 3 memory: enabled: true minGi: 16 storage: enabled: true className: "standard" distribution: enabled: true supported: - eks - gke - aks unsupported: - kind ``` **Usage:** ```bash # Render template preflight template preflight-v1beta3.yaml --values values.yaml # Run checks preflight preflight-v1beta3.yaml --values values.yaml # Extract documentation preflight docs preflight-v1beta3.yaml --values values.yaml -o REQUIREMENTS.md ``` ## Common Migration Patterns ### Pattern 1: Multiple Environment Configurations **Before (v1beta2):** Maintain separate files per environment. ``` preflight-dev.yaml preflight-staging.yaml preflight-prod.yaml ``` **After (v1beta3):** One spec, multiple values files. 
```yaml # preflight.yaml (shared) apiVersion: troubleshoot.sh/v1beta3 kind: Preflight spec: analyzers: {{- if .Values.nodes.enabled }} - nodeResources: outcomes: - fail: when: 'count() < {{ .Values.nodes.minimum }}' {{- end }} ``` ```yaml # values-dev.yaml nodes: enabled: false # Dev can be single-node # values-prod.yaml nodes: enabled: true minimum: 3 # Prod requires HA ``` ```bash # Dev preflight preflight.yaml --values values-dev.yaml # Prod preflight preflight.yaml --values values-prod.yaml ``` ### Pattern 2: Optional Database Checks **Before (v1beta2):** Comment out or manually remove. ```yaml # Uncomment if using PostgreSQL # - postgres: # uri: "postgres://..." ``` **After (v1beta3):** Toggle in values. ```yaml spec: collectors: {{- if .Values.databases.postgres.enabled }} - postgres: collectorName: postgres uri: '{{ .Values.databases.postgres.uri }}' {{- end }} analyzers: {{- if .Values.databases.postgres.enabled }} - docString: | Title: PostgreSQL Connectivity Requirement: - Database must be reachable at {{ .Values.databases.postgres.uri }} Application requires PostgreSQL for persistent storage. postgres: collectorName: postgres outcomes: - fail: message: Cannot connect to PostgreSQL - pass: message: PostgreSQL connection successful {{- end }} ``` ```yaml # values.yaml databases: postgres: enabled: true uri: "postgres://user:pass@postgres:5432/db?sslmode=disable" ``` ### Pattern 3: Dynamic Outcome Messages **Before (v1beta2):** Static messages. ```yaml - nodeResources: outcomes: - fail: when: "sum(cpuCapacity) < 8" message: "Need at least 8 CPU cores" ``` **After (v1beta3):** Show actual gap. ```yaml - nodeResources: outcomes: - fail: when: 'sum(cpuCapacity) < {{ .Values.cpu.minimum }}' message: | Insufficient CPU capacity. Required: {{ .Values.cpu.minimum }} cores Available: {{ "{{" }} sum(cpuCapacity) {{ "}}" }} cores Need {{ "{{" }} subtract({{ .Values.cpu.minimum }}, sum(cpuCapacity)) {{ "}}" }} more cores ``` ## Tips and Best Practices 1. 
**Start with values extraction** - Identify all hardcoded values first 2. **Group related values** - Use nested structure (e.g., `nodes.count`, `nodes.memory`) 3. **Always include `enabled` flags** - Allows disabling entire check groups 4. **Keep docStrings simple** - Use templates minimally in docs, primarily for values 5. **Test incrementally** - Migrate one analyzer at a time, test between changes 6. **Use consistent naming** - Match values keys to analyzer types where possible 7. **Provide defaults** - Create a "base" values file with sensible defaults 8. **Document values schema** - Add comments in values files explaining each field ## Troubleshooting ### Template Syntax Errors **Error:** `template: ...:10:5: executing "..." at <.Values.foo>: map has no entry for key "foo"` **Fix:** Ensure the value exists in your values file. Check spelling and nesting. ```yaml # Missing: foo bar: value # Fixed: foo: value bar: value ``` ### Quoting Issues **Error:** YAML parsing errors with template expressions. **Fix:** Always quote template expressions in YAML string contexts: ```yaml # Wrong storageClassName: {{ .Values.storage.className }} # Correct storageClassName: '{{ .Values.storage.className }}' ``` ### Conditional Not Working **Error:** Check still runs even though `enabled: false` **Fix:** Ensure conditional syntax is correct: ```yaml # Wrong - note the spacing {{ if .Values.feature.enabled }} # Correct - note the dash to strip whitespace {{- if .Values.feature.enabled }} ``` ### Runtime vs. Template Expressions **Confusion:** When to use `{{ }}` vs `{{ "{{" }} {{ "}}" }}`? 
**Template-time** (single braces): Values from values files, rendered before execution ```yaml when: '< {{ .Values.kubernetes.minVersion }}' ``` **Runtime** (escaped braces): Collector data, evaluated during preflight execution ```yaml message: 'Found {{ "{{" }} count() {{ "}}" }} nodes' ``` ## Additional Resources - [v1beta3 Overview](./v1beta3-overview.md) - Complete guide to v1beta3 features - [Authoring Guide](./v1beta3-guide.md) - Detailed reference for all analyzer types - [Analyze Reference](/docs/analyze) - All available analyzers - [Collect Reference](/docs/collect) - All available collectors ## Need Help? - Open an issue: [GitHub Issues](https://github.com/replicatedhq/troubleshoot/issues) - Ask in discussions: [GitHub Discussions](https://github.com/replicatedhq/troubleshoot/discussions) --- ## Overview Preflight v1beta3 introduces a templated, values-driven approach to authoring Preflight checks. This allows you to: - **Template your checks** using Go templates and Sprig functions - **Drive configuration with values files** similar to Helm charts - **Toggle checks on/off** based on deployment requirements - **Generate dynamic documentation** that reflects actual configuration - **Maintain reusable, modular specs** that work across environments With v1beta3, you can write a single Preflight spec that adapts to different scenarios by supplying different values files or command-line overrides. ## Basic Example Here's a simple v1beta3 spec with a few common checks: ```yaml apiVersion: troubleshoot.sh/v1beta3 kind: Preflight metadata: name: basic-preflight spec: collectors: - clusterResources: {} analyzers: {{- if .Values.kubernetes.enabled }} - docString: | Title: Kubernetes Version Requirement: - Minimum: {{ .Values.kubernetes.minVersion }} Ensures the cluster meets minimum API requirements. 
clusterVersion: checkName: Kubernetes version outcomes: - fail: when: '< {{ .Values.kubernetes.minVersion }}' message: Requires Kubernetes {{ .Values.kubernetes.minVersion }} or later - pass: message: Kubernetes version meets requirements {{- end }} {{- if .Values.storage.enabled }} - docString: | Title: Default Storage Class Requirement: - StorageClass "{{ .Values.storage.className }}" must exist Enables dynamic volume provisioning for application data. storageClass: checkName: Default StorageClass storageClassName: '{{ .Values.storage.className }}' outcomes: - fail: message: StorageClass not found - pass: message: StorageClass {{ .Values.storage.className }} is available {{- end }} {{- if .Values.nodes.enabled }} - docString: | Title: Minimum Node Count Requirement: - At least {{ .Values.nodes.minimum }} nodes required Ensures sufficient capacity for high availability. nodeResources: checkName: Node count outcomes: - fail: when: 'count() < {{ .Values.nodes.minimum }}' message: Requires at least {{ .Values.nodes.minimum }} nodes (found {{ "{{" }} count() {{ "}}" }}) - pass: message: Cluster has sufficient nodes {{- end }} ``` **Corresponding values file (values.yaml):** ```yaml kubernetes: enabled: true minVersion: "1.24.0" storage: enabled: true className: "standard" nodes: enabled: true minimum: 3 ``` **Run it:** ```bash preflight basic-preflight.yaml --values values.yaml ``` ## Get Started - Review the [v1beta3 migration guide](./v1beta3-migration.md) to convert existing v1beta2 specs - See the [authoring guide](./v1beta3-guide.md) for detailed reference on all analyzer types - Explore analyzer types in the [Analyze section](/docs/analyze) - Learn about collectors in the [Collect section](/docs/collect) --- Troubleshoot automatically redacts API token environment variables in JSON. 
This redaction is equivalent to the following redact yaml:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Redactor
metadata:
  name: API Tokens
spec:
  redactors:
  - name: Redact values for environment variables with names beginning with 'token'
    removals:
      regex:
      - redactor: '(?i)(\\\"name\\\":\\\"[^\"]*token[^\"]*\\\",\\\"value\\\":\\\")(?P<mask>[^\"]*)(\\\")'
  - name: Redact values that look like API tokens in multiline JSON
    removals:
      regex:
      - selector: '(?i)"name": *".*token[^\"]*"'
        redactor: '(?i)("value": *")(?P<mask>.*[^\"]*)(")'
```

---

Troubleshoot automatically redacts AWS credential environment variables in JSON. This redaction is equivalent to the following redact yaml:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Redactor
metadata:
  name: AWS Credentials
spec:
  redactors:
  - name: Redact values for environment variables that look like AWS Secret Access Keys
    removals:
      regex:
      - redactor: '(?i)(\\\"name\\\":\\\"[^\"]*SECRET_?ACCESS_?KEY\\\",\\\"value\\\":\\\")(?P<mask>[^\"]*)(\\\")'
  - name: Redact values for environment variables that look like AWS Access Keys
    removals:
      regex:
      - redactor: '(?i)(\\\"name\\\":\\\"[^\"]*ACCESS_?KEY_?ID\\\",\\\"value\\\":\\\")(?P<mask>[^\"]*)(\\\")'
  - name: Redact values for environment variables that look like AWS Owner or Account numbers
    removals:
      regex:
      - redactor: '(?i)(\\\"name\\\":\\\"[^\"]*OWNER_?ACCOUNT\\\",\\\"value\\\":\\\")(?P<mask>[^\"]*)(\\\")'
  - name: Redact AWS Secret Access Key values in multiline JSON
    removals:
      regex:
      - selector: '(?i)"name": *"[^\"]*SECRET_?ACCESS_?KEY[^\"]*"'
        redactor: '(?i)("value": *")(?P<mask>.*[^\"]*)(")'
  - name: Redact AWS Access Key ID values in multiline JSON
    removals:
      regex:
      - selector: '(?i)"name": *"[^\"]*ACCESS_?KEY_?ID[^\"]*"'
        redactor: '(?i)("value": *")(?P<mask>.*[^\"]*)(")'
  - name: Redact AWS Owner and Account Numbers in multiline JSON
    removals:
      regex:
      - selector: '(?i)"name": *"[^\"]*OWNER_?ACCOUNT[^\"]*"'
        redactor: '(?i)("value": *")(?P<mask>.*[^\"]*)(")'
```

---

Automatically enabled is a set of built-in redactors...
---

Troubleshoot automatically redacts database connection strings containing a username and password, standard Postgres and MySQL connection string components, and 'database' environment variables in JSON. This redaction is equivalent to the following redact yaml:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Redactor
metadata:
  name: Database Connection Strings
spec:
  redactors:
  - name: Redact database connection strings that contain username and password
    removals:
      regex:
      - redactor: '\b(?P<mask>[^:\"\/]*){1}(:)(?P<mask>[^:\"\/]*){1}(@tcp\()(?P<mask>[^:\"\/]*){1}(?P<mask>:[\d]*)?(\)\/)(?P<mask>[\w\d\S-_]+){1}\b'
  - name: Redact values for environment variables with names beginning with 'database'
    removals:
      regex:
      - redactor: '(?i)(\\\"name\\\":\\\"[^\"]*database[^\"]*\\\",\\\"value\\\":\\\")(?P<mask>[^\"]*)(\\\")'
  - name: Redact 'Data Source' values commonly found in database connection strings
    removals:
      regex:
      - redactor: '(?i)(Data Source *= *)(?P<mask>[^\;]+)(;)'
  - name: Redact 'location' values commonly found in database connection strings
    removals:
      regex:
      - redactor: '(?i)(location *= *)(?P<mask>[^\;]+)(;)'
  - name: Redact 'User ID' values commonly found in database connection strings
    removals:
      regex:
      - redactor: '(?i)(User ID *= *)(?P<mask>[^\;]+)(;)'
  - name: Redact 'password' values commonly found in database connection strings
    removals:
      regex:
      - redactor: '(?i)(password *= *)(?P<mask>[^\;]+)(;)'
  - name: Redact 'Server' values commonly found in database connection strings
    removals:
      regex:
      - redactor: '(?i)(Server *= *)(?P<mask>[^\;]+)(;)'
  - name: Redact 'Database' values commonly found in database connection strings
    removals:
      regex:
      - redactor: '(?i)(Database *= *)(?P<mask>[^\;]+)(;)'
  - name: Redact 'UID' values commonly found in database connection strings
    removals:
      regex:
      - redactor: '(?i)(Uid *= *)(?P<mask>[^\;]+)(;)'
  - name: Redact 'Pwd' values commonly found in database connection strings
    removals:
      regex:
      - redactor: '(?i)(Pwd *= *)(?P<mask>[^\;]+)(;)'
  - name: Redact database connection strings in multiline JSON
    removals:
      regex:
      - selector: '(?i)"name": *".*database[^\"]*"'
        redactor: '(?i)("value": *")(?P<mask>.*[^\"]*)(")'
```

---

Troubleshoot automatically redacts http/ftp connection strings containing a username and password. This redaction is equivalent to the following redact yaml:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Redactor
metadata:
  name: Connection Strings
spec:
  redactors:
  - name: Redact connection strings with username and password
    removals:
      regex:
      - redactor: '(?i)(https?|ftp)(:\/\/)(?P<mask>[^:\"\/]+){1}(:)(?P<mask>[^@\"\/]+){1}(?P<mask>@[^:\/\s\"]+){1}(?P<mask>:[\d]+)?'
```

---

When collecting data from an application and the environment, it's common for sensitive information to be retrieved. This can include database connection strings, passwords, and API tokens. Both preflight checks and support bundles contain a built-in and extensible redaction phase.

---

Troubleshoot versions earlier than 0.49.0 redact IPv4 addresses automatically. To redact IPv4 addresses in Troubleshoot version 0.49.0 and later, add the following regex to your redactor specification:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Redactor
metadata:
  name: IP Addresses
spec:
  redactors:
  - name: Redact ipv4 addresses
    removals:
      regex:
      - redactor: '(?P<mask>\b(?P<mask>25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(?P<mask>25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(?P<mask>25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(?P<mask>25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)'
```

---

Troubleshoot automatically redacts password environment variables in JSON for the values provided in the `regex` arrays.

:::info
**Important:** Passwords that do not match the specified regular expressions are not redacted.
:::

This redaction is equivalent to the following redact yaml:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Redactor
metadata:
  name: Passwords
spec:
  redactors:
  - name: Redact values for environment variables with names beginning with 'password'
    removals:
      regex:
      - redactor: '(?i)(\\\"name\\\":\\\"[^\"]*password[^\"]*\\\",\\\"value\\\":\\\")(?P<mask>[^\"]*)(\\\")'
  - name: Redact password environment variables in multiline JSON
    removals:
      regex:
      - selector: '(?i)"name": *".*password[^\"]*"'
        redactor: '(?i)("value": *")(?P<mask>.*[^\"]*)(")'
```

---

Redactors are YAML specifications that define which data to remove when generating a support bundle.

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Redactor
metadata:
  name: my-redactor-name
spec:
  redactors:
  - name: replace password # names are not used internally, but are useful for recordkeeping
    fileSelector:
      file: data/my-password-dump # this targets a single file
    removals:
      values:
      - abc123 # this value is my password, and should never appear in a support bundle
  - name: all files # as no file is specified, this redactor will run against all files
    removals:
      regex:
      - redactor: (another)(?P<mask>.*)(here) # this will replace anything between the strings `another` and `here` with `***HIDDEN***`
      - selector: 'S3_ENDPOINT' # remove the value in lines following those that contain the string S3_ENDPOINT
        redactor: '("value": ").*(")'
      yamlPath:
      - "abc.xyz.*" # redact all items in the array at key xyz within key abc in yaml documents
```

Each redactor consists of a set of files it can apply to, a set of string literals to replace, a set of regex replacements to run, and a list of yaml paths to redact. Any of the four can be omitted. A redactor is divided into two sub-objects: `fileSelector` (containing `file` or `files`) and `removals` (containing `values`, `regex`, and/or `yamlPath`). `fileSelector` determines which files the redactor applies to, and `removals` determines what it removes.
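To make the replacement rules concrete, here is a minimal Python sketch — an illustration only, not Troubleshoot's actual implementation — of how a single `regex` removal behaves: text matched outside any capturing group is dropped, and groups named `mask` are replaced with `***HIDDEN***`. It assumes flat (non-nested) capturing groups.

```python
import re

HIDDEN = "***HIDDEN***"

def redact_line(pattern: str, line: str) -> str:
    """Sketch of regex-redactor semantics: matched text outside any
    capturing group is removed, groups named 'mask' are hidden, and
    all other groups are kept verbatim."""
    rx = re.compile(pattern)
    m = rx.search(line)
    if m is None:
        return line
    index_to_name = {v: k for k, v in rx.groupindex.items()}
    parts = []
    for i in range(1, rx.groups + 1):
        if m.group(i) is None:
            continue  # optional group that did not participate in the match
        parts.append(HIDDEN if index_to_name.get(i) == "mask" else m.group(i))
    # everything inside the overall match but outside a group is dropped
    return line[:m.start()] + "".join(parts) + line[m.end():]

print(redact_line(r"abc(123)", "test abc123"))            # test 123
print(redact_line(r"(?P<mask>abc)(123)", "test abc123"))  # test ***HIDDEN***123
```

Run against one of the built-in patterns, `redact_line(r"(?i)(password *= *)(?P<mask>[^;]+)(;)", "password = hunter2;")` masks only the secret portion, keeping the key and delimiter intact.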
### `file` and `files`

If a `file` or set of `files` is specified, the redactor will only be applied to matching files. Globbing is used to match files. For instance, `/my/test/glob/*` will match `/my/test/glob/file` but will not match `/my/test/glob/subdir/file`. If neither `file` nor `files` is specified, the redactor will be applied to all files.

### `values`

All entries in `values` will be replaced with the string `***HIDDEN***`.

### `regex`

Regex allows applying a regex to lines following a line that matches a filter. `selector` is used to identify lines, and then `redactor` is run on the next line. If `selector` is empty, the redactor will run on every line. This can be useful for removing values from pretty-printed JSON, among other things. For instance, a `selector` of `S3_ENDPOINT`, combined with a `redactor` of `("value": ").*(")` and run on the following string, removes `this is a secret` while leaving `this is NOT a secret` untouched.

```json
  {
    "name": "S3_ENDPOINT",
    "value": "this is a secret"
  },
  {
    "name": "ANOTHER_ENDPOINT",
    "value": "this is NOT a secret"
  },
```

Matches to entries in `regex` will be removed or redacted depending on how the regex is constructed. Any portion of a match not contained within a capturing group will be removed entirely. For instance, the regex `abc(123)`, when applied to the string `test abc123`, will be redacted to `test 123`, because `abc` was matched but not included within a capturing group. The contents of capturing groups tagged `mask` will be masked with `***HIDDEN***`. Thus `(?P<mask>abc)(123)` applied to `test abc123` will become `test ***HIDDEN***123`. Capturing groups tagged `drop` will be dropped, just as if they were not within a capturing group.

### `yamlPath`

The yamlPath redactor redacts items within yaml documents. Input is a `.`-delimited path to the items to be redacted. If an item in the path is the literal string `*`, the redactor will apply to all options at that level.
For instance, with the following yaml doc:

```yaml
abc:
  a:
    alpha: bravo
    charlie: delta
  c:
    charlie: delta
    echo: foxtrot
xyz:
- xray: yankee
  zulu: alpha
- zulu: alpha
  bravo: charlie
```

A redactor of `abc.*.charlie` would redact the values of `abc.a.charlie` and `abc.c.charlie`, yielding:

```yaml
abc:
  a:
    alpha: bravo
    charlie: '***HIDDEN***'
  c:
    charlie: '***HIDDEN***'
    echo: foxtrot
xyz:
- xray: yankee
  zulu: alpha
- zulu: alpha
  bravo: charlie
```

Items within an array can be addressed either with an integer position or the wildcard `*`. `xyz.0.zulu` would redact only one value in the original document, yielding this:

```yaml
abc:
  a:
    alpha: bravo
    charlie: delta
  c:
    charlie: delta
    echo: foxtrot
xyz:
- xray: yankee
  zulu: '***HIDDEN***'
- zulu: alpha
  bravo: charlie
```

Files that fail to parse as yaml, or that do not contain any matches, will not be modified by this redactor. Files that _do_ contain matches will be re-rendered, which will strip comments and custom formatting. Multi-doc yaml is not yet fully supported. Only the first document is checked for matches, and if a match is found, later documents are discarded entirely.

---

Troubleshoot automatically redacts username credential environment variables in JSON. This redaction is equivalent to the following redact yaml:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: Redactor
metadata:
  name: Usernames
spec:
  redactors:
  - name: Redact values for environment variables with names beginning with 'user'
    removals:
      regex:
      - redactor: '(?i)(\\\"name\\\":\\\"[^\"]*user[^\"]*\\\",\\\"value\\\":\\\")(?P<mask>[^\"]*)(\\\")'
  - name: Redact usernames in multiline JSON
    removals:
      regex:
      - selector: '(?i)"name": *".*user[^\"]*"'
        redactor: '(?i)("value": *")(?P<mask>.*[^\"]*)(")'
```

---

## Collect a support bundle

Now that we have the `kubectl` plugin installed, let's collect a support bundle. A support bundle needs to know what to collect and, optionally, what to analyze. This is defined in a YAML file.
Open your favorite editor and paste the following content in:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: supportbundle-tutorial
spec:
  collectors: []
  analyzers: []
```

Save the file as `support-bundle.yaml` and then execute it with:

```shell
kubectl support-bundle ./support-bundle.yaml
```

The support bundle plugin will work for a few seconds and then show you the filename that it created. Note: this does not deploy anything to the cluster; it's all client-side code. In my case, the file created was named `support-bundle.tar.gz`. You can `tar xzvf` the file and open it in your editor to look at the contents.

## Collect a support bundle using multiple specs

> Introduced in Troubleshoot v0.42.0

You may need to collect a support bundle using the collectors and analyzers specified in multiple different specs. As of Troubleshoot `v0.42.0`, you can pass multiple specs as arguments to the `support-bundle` CLI.

Create a support bundle using multiple specs from the filesystem:

```shell
kubectl support-bundle ./support-bundle-spec-1.yaml ./support-bundle-spec-2.yaml
```

Create a support bundle using a spec from a URL, a file, and a Kubernetes secret:

```shell
kubectl support-bundle https://raw.githubusercontent.com/replicatedhq/troubleshoot-specs/main/in-cluster/default.yaml \
  ./support-bundle-spec-1.yaml \
  secret/path/to/my/spec
```

## Collect a support bundle using specs discovered from the cluster

> Introduced in Troubleshoot v0.47.0

You can also use the `--load-cluster-specs` flag with the `support-bundle` CLI to collect a support bundle by automatically discovering Support Bundle and Redactor specs in Secrets and ConfigMaps in the cluster. For more information, see [Discover Cluster Specs](discover-cluster-specs).
### Notes on using multiple specs with the `runHostCollectorsInPod` flag

- If one spec has `runHostCollectorsInPod: true` and another does not, the merged spec sets `runHostCollectorsInPod: true` and includes all host collectors from both specs.
- If a spec's `uri` points to a spec hosted elsewhere and the downloaded spec does not set `runHostCollectorsInPod`, the merged output uses the default of `false`, regardless of the original spec's setting.

## Include user-provided metadata

> Introduced in Troubleshoot v0.125.0

You can attach arbitrary key-value metadata to a support bundle using the `--metadata` flag. The flag accepts `key=value` pairs and can be specified multiple times:

```shell
kubectl support-bundle ./support-bundle.yaml \
  --metadata contactEmail=support@example.com \
  --metadata ticketID=ISSUE-42
```

The provided pairs are saved as a JSON map at `metadata/user.json` inside the bundle:

```json
{
  "contactEmail": "support@example.com",
  "ticketID": "ISSUE-42"
}
```

---

> Introduced in Troubleshoot v0.47.0.

You can use the `--load-cluster-specs` flag with the `support-bundle` CLI to discover Support Bundle and Redactor specs in Secrets and ConfigMaps in the cluster. This allows you to use the `support-bundle` CLI to automatically discover specs at runtime, rather than manually specifying each spec on the command line. For Troubleshoot v0.42.0 and later, you can specify multiple specs on the command line. When you use the `--load-cluster-specs` flag, Troubleshoot applies the specs that you provide on the command line as well as any specs discovered in the cluster.

## Requirements

To use the `--load-cluster-specs` flag with the `support-bundle` CLI, there must be an existing Secret or ConfigMap object in the cluster. The Secret and ConfigMap objects in the cluster must meet the following requirements:

* The `labels` key must include the label `troubleshoot.sh/kind: support-bundle`.
  **NOTE**: You can override the expected label with the `-l` or `--selector` flag. For example, `./support-bundle -l troubleshoot.sh/kind=something-else`.

* The key under `data` in the Secret or ConfigMap object must be named `support-bundle-spec` or `redactor-spec`.

The following is an example of a ConfigMap with a `troubleshoot.sh/kind: support-bundle` label and a `data` key named `support-bundle-spec`:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    troubleshoot.sh/kind: support-bundle
  name: some-bundle
data:
  support-bundle-spec: |
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: example
    spec:
      collectors:
        - logs:
            selector:
              - app=example
              - component=nginx
            namespace: default
            name: app-example-logs
            limits:
              maxAge: 720h
              maxLines: 10000
        - runPod:
            collectorName: "static-hi"
            podSpec:
              containers:
                - name: static-hi
                  image: alpine:3
                  command: ["echo", "hi static!"]
      analyzers:
        - textAnalyze:
            checkName: Said hi!
            fileName: /static-hi.log
            regex: 'hi static'
            outcomes:
              - fail:
                  message: Didn't say hi.
              - pass:
                  message: Said hi!
```

## Usage

Generate a Support Bundle with specs found in the cluster:

`./support-bundle --load-cluster-specs`

Generate a Support Bundle with a spec from a CLI argument as well as the specs discovered in the cluster:

`./support-bundle https://raw.githubusercontent.com/replicatedhq/troubleshoot/main/sample-troubleshoot.yaml --load-cluster-specs`

Generate a Support Bundle with specs found in the cluster matching a custom label:

`./support-bundle --load-cluster-specs -l troubleshoot.sh/kind=something-else`

---

This tutorial will walk you through defining a Support Bundle that your customer can execute when something isn't working quite right. A support bundle will collect data from the cluster, redact sensitive fields, and then perform analysis on the data to provide remediation steps.

## Goals

By completing this tutorial, you will know how to write a Support Bundle, including: 1. How to collect data 2.
How to analyze collected data 3. How to generate a support bundle from a cluster.

## Prerequisites

Before starting this tutorial, you should have the following:

1. The Troubleshoot plugins [installed](/docs/#installation).
2. A Kubernetes cluster and local kubectl access to the cluster. If you don't have one for testing, consider [k0s](https://k0sproject.io/), [kURL](https://kurl.sh), [KiND](https://github.com/kubernetes-sigs/kind), or [K3S](https://k3s.io).

---

An OpenAPI Schema for this type is published at: [https://github.com/replicatedhq/kots-lint/blob/main/kubernetes_json_schema/schema/troubleshoot/supportbundle-troubleshoot-v1beta2.json](https://github.com/replicatedhq/kots-lint/blob/main/kubernetes_json_schema/schema/troubleshoot/supportbundle-troubleshoot-v1beta2.json).

## SupportBundle Schema

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: supportbundle
spec:
  runHostCollectorsInPod: true # default is false
  collectors: []
  hostCollectors: []
  analyzers: []
  hostAnalyzers: []
  uri: ""
```

## Properties

### `runHostCollectorsInPod`

Default is `false`. If set to `true`, the `hostCollectors` will be run in a privileged pod. This is useful for collecting host information across a group of nodes in a cluster, and reduces the number of support bundles that need to be collected to get a complete picture of the cluster nodes.

### `collectors`

Optional. A list of [`collectors`](https://troubleshoot.sh/docs/collect/). Returns information collected from the current `kubectl` context.

### `analyzers`

Optional. A list of [`analyzers`](https://troubleshoot.sh/docs/analyze/). Analyzes information collected from the current `kubectl` context.

### `hostCollectors`

Optional. A list of [`hostCollector`](https://troubleshoot.sh/docs/host-collect-analyze/overview/) properties. Returns information from the host where the `support-bundle` or `collect` binary is executed.

### `hostAnalyzers`

Optional.
A list of [`hostAnalyzer`](https://troubleshoot.sh/docs/host-collect-analyze/overview/) properties. Returns information from the host where the `support-bundle` or `collect` binary is executed.

### `uri`

Optional. A string containing the URI of a support bundle spec YAML file, using the `http://` or `https://` protocol.

**Usage**: if a `uri` is set in a support bundle spec, Troubleshoot attempts to download that resource; if the download succeeds, the downloaded spec entirely replaces the contents of the given spec. For example, given the following spec:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: supportbundle
spec:
  uri: https://raw.githubusercontent.com/replicatedhq/troubleshoot-specs/main/in-cluster/default.yaml
  collectors:
    - clusterInfo: {}
    - clusterResources: {}
```

Troubleshoot will attempt to retrieve https://raw.githubusercontent.com/replicatedhq/troubleshoot-specs/main/in-cluster/default.yaml and will use that spec in its entirety:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: default
spec:
  collectors:
    - clusterInfo: {}
    - clusterResources: {}
  ...
```

If Troubleshoot is unable to retrieve that file, or if the upstream file fails to parse as valid YAML, then it falls back to what was given in the original spec:

```yaml
spec:
  # uri: https://raw.githubusercontent.com/replicatedhq/troubleshoot-specs/main/in-cluster/default.yaml
  collectors:
    - clusterInfo: {}
    - clusterResources: {}
```

---