///////////////////////////////////////////////////////////////////////////////

    Copyright (c) 2019, 2024 Oracle and/or its affiliates.

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.

///////////////////////////////////////////////////////////////////////////////

= Health Checks
:description: Helidon health checks
:keywords: helidon, health checks, health, check, readiness, liveness, probes, kubernetes
:feature-name: Health Checks
:rootdir: {docdir}/..

include::{rootdir}/includes/se.adoc[]

== Contents

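- <<Overview, Overview>>
- <<Maven Coordinates, Maven Coordinates>>
- <<API, API>>
** <<Enabling Health Support (and Built-in Health Checks) in Your Application, Enabling Health Support>>
** <<Writing Custom Health Checks, Writing Custom Health Checks>>
** <<Built-In Health Checks, Built-In Health Checks>>
** <<Kubernetes Probes, Kubernetes Probes>>
** <<Troubleshooting Probes, Troubleshooting Probes>>
- <<Configuration, Configuration>>
- <<Examples, Examples>>
** <<JSON Response Example, JSON Response Example>>
** <<Kubernetes Example, Kubernetes Example>>
- <<Additional Information, Additional Information>>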

== Overview

It’s a good practice to monitor your microservice’s health to ensure that it is
available and performs correctly.
Applications implement health checks to expose health status that is collected
at regular intervals by external tooling, such as orchestrators like
Kubernetes. The orchestrator may then take action, such as restarting your
application if the health check fails.

A typical health check combines the statuses of all the dependencies that
affect availability and the ability to perform correctly:

* Network Latency
* Storage
* Database
* Other Services (used by your application)
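
The way individual dependency statuses roll up into one overall status can be sketched in plain Java. This is a minimal, JDK-only illustration and not Helidon's API: the `Status` enum, the `overall` method, and the dependency names are hypothetical, and the sketch assumes the overall status is `UP` only when every check reports `UP`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: none of these types are part of Helidon.
class HealthAggregationSketch {

    enum Status { UP, DOWN }

    // The overall status is UP only if every dependency check reports UP.
    // Evaluation stops at the first check that is not UP.
    static Status overall(Map<String, Supplier<Status>> checks) {
        for (Supplier<Status> check : checks.values()) {
            if (check.get() != Status.UP) {
                return Status.DOWN;
            }
        }
        return Status.UP;
    }

    public static void main(String[] args) {
        Map<String, Supplier<Status>> checks = new LinkedHashMap<>();
        checks.put("storage", () -> Status.UP);
        checks.put("database", () -> Status.DOWN); // simulated outage
        checks.put("other-service", () -> Status.UP);

        System.out.println("overall: " + overall(checks));
    }
}
```

A single failing dependency is enough to make the whole application report `DOWN`, which is why the choice of dependencies included in a check matters.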

include::{rootdir}/includes/dependencies.adoc[]

[source,xml]
----
<dependency>
    <groupId>io.helidon.webserver.observe</groupId>
    <artifactId>helidon-webserver-observe-health</artifactId>
</dependency>
----

Optional dependency to use built-in health checks:

[source,xml]
----
<dependency>
    <groupId>io.helidon.health</groupId>
    <artifactId>helidon-health-checks</artifactId>
</dependency>
----

== API

=== Enabling Health Support (and Built-in Health Checks) in Your Application
The health subsystem is part of the observability support. As a result, your application includes health support by default provided your project meets several conditions:

* Your project depends on the `helidon-webserver-observe-health` component as described above.
* (Optional) Your project depends on the `helidon-health-checks` component (if you want the built-in health checks).
* Your code allows the webserver's automatic feature discovery (enabled by default).
* Your code allows the observe feature's automatic observer discovery (also enabled by default).

If you disable either type of automatic discovery, you can add the observe feature to the webserver explicitly and add the health observer to the observe feature explicitly, customizing the behavior of each programmatically if you wish.
You can also use configuration to tailor some of the behavior of the health component (such as changing the URI path from `/observe/health` to something else).

=== Writing Custom Health Checks
In many cases, the ability of your application to do its job depends on conditions known only to your application: for example, whether certain external resources such as databases are available.
You can create custom health checks which reflect those conditions and add them to the overall health assessment of your application.

A health check is a Java functional interface that returns a new
`HealthCheckResponse` instance each time Helidon queries the health check.
Each health check also has a fixed name and a fixed health check type (start-up, liveness, or readiness).

Your code registers a custom health check by invoking a method on Helidon-provided types in one of the following ways:

* Pass the name and type of the health check and a `Supplier` of a `HealthCheckResponse` such as a method reference or a lambda expression.
* Pass an instance of a class which implements the `HealthCheck` interface.

Within an application, different techniques might make sense for different custom health checks, depending on the complexity of the logic for computing the status for each check.
The various styles are functionally equivalent; for a given custom health check choose the style which enhances the readability and clarity of your code.
The examples below, in no particular order, implement the same custom health check functionality in different ways to illustrate.

==== Option 1: Using a `HealthCheckResponse` supplier method
If you gather the logic for computing the health check response into a method, then you can use a method reference to register the health check.

[source,java]
.Declaring a health check response supplier method
----
include::{sourcedir}/se/HealthSnippets.java[tag=snippet_1, indent=0]
----

[source,java]
.Registering a health check using a method reference
----
include::{sourcedir}/se/HealthSnippets.java[tag=snippet_2, indent=0]
----
<1> Apply configuration to auto-discovered observers (e.g., health, metrics).
<2> Augment the web server by adding the `ObserveFeature` containing the `HealthObserver`. This replaces the auto-discovered health observer.
<3> Include the Helidon-supplied health checks.
<4> Add the custom health check, passing a reference to the method which returns the health check responses.
<5> Set the type of the custom health check.
<6> Set the name of the custom health check.

==== Option 2: Using an in-line lambda expression
If the logic for computing the health check response is fairly simple, express it as an in-line lambda when you register the health check.

[source,java]
.Registering a health check using an in-line lambda expression
----
include::{sourcedir}/se/HealthSnippets.java[tag=snippet_3, indent=0]
----
<1> Augment the web server by adding the `ObserveFeature` containing the `HealthObserver`.
<2> Add the custom health check passing a lambda expression supplying the health check response.
<3> In the lambda, set the health check response status.
<4> Still in the lambda, set a detail associated with the health check response.
<5> Still in the lambda, build the health check response.
<6> Set the type of the custom health check.
<7> Set the name of the custom health check.

Note that the logic in the lambda expression runs every time Helidon probes the added health check, so the values passed to `status` and `detail` are recomputed every time.
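
The re-evaluation behavior noted above is ordinary `Supplier` semantics and can be observed with the JDK alone. The following is a minimal sketch, not Helidon code: the class name, the `evaluations` counter, and the string standing in for a health check response are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: a plain Supplier stands in for a health check lambda.
class LambdaReevaluationSketch {

    // Counts how many times the lambda body has run.
    static final AtomicInteger evaluations = new AtomicInteger();

    // Returns a supplier whose body runs again on every get() call,
    // so the counter and the timestamp are recomputed each time.
    static Supplier<String> responseSupplier() {
        return () -> "evaluation #" + evaluations.incrementAndGet()
                + " at " + System.currentTimeMillis();
    }

    public static void main(String[] args) {
        Supplier<String> check = responseSupplier();
        System.out.println(check.get()); // simulated first probe
        System.out.println(check.get()); // simulated second probe
    }
}
```

Any value that should be fixed at registration time needs to be computed outside the lambda and captured by it.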

==== Option 3: Using a `HealthCheck` Instance
If a custom health check requires a lot of information to compute its health check response, it might be clearest to implement it as a class that implements the `HealthCheck` interface.
Your code instantiates the class with all the information, including references to other data, it might need to compute the response each time Helidon probes it.

This example _is not_ complicated in that way, but it's useful to illustrate this technique of writing a custom health check.

[source,java]
.Declaring a concrete `HealthCheck` implementation
----
include::{sourcedir}/se/HealthSnippets.java[tag=snippet_4, indent=0]
----
<1> Implement the `io.helidon.health.HealthCheck` interface. The default health check name is the simple class name of the implementing class. Your code can override the `name()` method to return a different name (not shown in this example).
<2> The default health check type is `LIVENESS` so this implementation overrides `type()` to declare a `READINESS` check.
<3> Sets a detail value `time` associated with the response to the current time.
<4> Reports `DOWN` until at least eight seconds have passed since the server start-up, then reports `UP` thereafter.
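
The timing rule in the last callout can be illustrated without any Helidon types. The following JDK-only sketch is not the included snippet; the class name `WarmUpCheckSketch`, its constructors, and the boolean result standing in for `UP`/`DOWN` are assumptions made for illustration.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of a check that stays DOWN during a warm-up period.
class WarmUpCheckSketch {

    private final Clock clock;
    private final Instant startedAt;
    private final Duration warmUp;

    // Records the current time of the given clock as the start-up time.
    WarmUpCheckSketch(Clock clock, Duration warmUp) {
        this(clock, clock.instant(), warmUp);
    }

    // Allows the start-up time to be supplied explicitly (useful in tests).
    WarmUpCheckSketch(Clock clock, Instant startedAt, Duration warmUp) {
        this.clock = clock;
        this.startedAt = startedAt;
        this.warmUp = warmUp;
    }

    // Returns true (UP) once at least the warm-up duration has elapsed.
    boolean isUp() {
        Duration elapsed = Duration.between(startedAt, clock.instant());
        return elapsed.compareTo(warmUp) >= 0;
    }

    public static void main(String[] args) {
        WarmUpCheckSketch check =
                new WarmUpCheckSketch(Clock.systemUTC(), Duration.ofSeconds(8));
        System.out.println("up right after start: " + check.isUp());
    }
}
```

Passing a `Clock` into the constructor keeps the time-dependent logic testable without waiting for real time to pass.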

[source,java]
.Registering a `HealthCheck` instance
----
include::{sourcedir}/se/HealthSnippets.java[tag=snippet_5, indent=0]
----
<1> Augment the web server by adding the `ObserveFeature` containing the `HealthObserver`.
<2> Instantiate the custom health check class and add the instance to the `HealthObserver`.

==== Adding Observability (including the Custom Health Checks) to Helidon
The code examples above prepare the `observe` feature instance using the built-in and custom health checks.
To activate the health subsystem and other auto-discovered observability subsystems, add that `observe` instance as a feature to the webserver and start the server.

[source,java]
.Register the observe feature with the server and start it
----
include::{sourcedir}/se/HealthSnippets.java[tag=snippet_6, indent=0]
----
<1> Add the previously prepared `observe` feature, which contains the health observer, to the server as a feature.

==== Triggering and Interpreting Health Check Output

Health support in Helidon is part of the observability feature.
`HealthObserver` is a Helidon-provided observability implementation that contains a collection of
registered `HealthCheck` instances and, when queried, invokes the registered
health checks and returns a response with a status code representing the overall
status of the application.

[cols="1,5",role="flex, sm7"]
.Health status codes
|=======
| `200` | The application is healthy (with health check details in the response).
| `204` | The application is healthy (with _no_ health check details in the response).
| `503` | The application is not healthy.
| `500` | An error occurred while reporting the health.
|=======

You control, either using configuration or adding code to your application, whether the HTTP responses to `GET` requests contain detailed information about each health check.
With details enabled, HTTP `GET` responses include JSON content showing the detailed results of all the health checks which the server executed after receiving the request.
With details disabled, HTTP `GET` responses have no payload.
HTTP `HEAD` requests always return only the status with no payload.
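
The status-code rules described above can be condensed into a small decision function. The following is a minimal JDK-only sketch rather than Helidon code: the class, method, and parameter names are hypothetical, and it models `GET` requests only.

```java
// Hypothetical sketch of the status-code rules; not Helidon code.
class HealthStatusCodeSketch {

    // Maps the outcome of a health query to the HTTP status code
    // listed in the "Health status codes" table.
    static int statusCode(boolean healthy, boolean detailsEnabled, boolean reportingFailed) {
        if (reportingFailed) {
            return 500; // an error occurred while reporting the health
        }
        if (!healthy) {
            return 503; // the application is not healthy
        }
        // Healthy: 200 carries health check details, 204 has no payload.
        return detailsEnabled ? 200 : 204;
    }

    public static void main(String[] args) {
        System.out.println("healthy, details on:  " + statusCode(true, true, false));
        System.out.println("healthy, details off: " + statusCode(true, false, false));
        System.out.println("unhealthy:            " + statusCode(false, true, false));
    }
}
```

A client that only needs a yes/no answer can treat both `200` and `204` as healthy and ignore the payload.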

If you add the Helidon health dependency to your `pom.xml` file, Helidon automatically registers the `HealthObserver` service and responds at the default `/observe/health` endpoint.
Further, if you add the built-in health checks dependency, Helidon automatically finds them and adds those checks to the `HealthObserver`.

Below are parts of health responses which include the custom health check added in the earlier example code.
This first response shows the health output within the first eight seconds after start-up. Recall that the custom health check will report `DOWN` during that time, so the overall health is `DOWN` and the HTTP response status is `503 Service Unavailable`.

[source,json]
.Response within 8 seconds: HTTP status 503 (not healthy)
----
{
  "status": "DOWN",
  "checks": [
    {
      "name": "live-after-8-seconds",
      "status": "DOWN",
      "data": {
        "time": 1701984253071
      }
    }
  ]
}
----

The next response shows the health output once the server has been running for at least eight seconds. The custom health check now reports `UP` so the overall health status is also `UP` now and the HTTP status is `200`.

[source,json]
.Response after 8 seconds: HTTP status 200
----
{
  "status": "UP",
  "checks": [
    {
      "name": "live-after-8-seconds",
      "status": "UP",
      "data": {
        "time": 1701984258292
      }
    }
  ]
}
----

TIP: Balance collecting a lot of information with the need to avoid overloading
the application and overwhelming users.

The following table provides a summary of the Health Check API classes.

[cols="4,6"]
.Health check API classes
|=======
| `io.helidon.health.HealthCheck`
| Java functional interface representing the logic of a single health check

| `io.helidon.health.HealthCheckResponse`
| Result of a health check invocation that contains a status

| `io.helidon.webserver.observe.health.HealthObserver`
| WebServer service that exposes `/observe/health` and invokes the registered health
checks
|=======

[[built-in-health-checks-table]]
=== Built-In Health Checks

You can use Helidon-provided health checks to report various
common health check statuses:

// Had to move the anchor to the heading above because the rendered page did not define the ID
// correctly on the table so the link did not work. The link itself looks OK; just no ID generated on the page.
//[[built-in-health-checks-table]]
[cols="1,1,3,15,3"]
|=======
|Built-in health check |Health check name |JavaDoc |Config properties (within `server.features.observe.observers.health`) |Default config value

|deadlock detection
|`deadlock`
| link:{health-javadoc-base-url}/io/helidon/health/checks/DeadlockHealthCheck.html[`DeadlockHealthCheck`]
| n/a
| n/a

|available disk space
|`diskSpace`
| link:{health-javadoc-base-url}/io/helidon/health/checks/DiskSpaceHealthCheck.html[`DiskSpaceHealthCheck`]
|`helidon.health.diskSpace.thresholdPercent` +
+
`helidon.health.diskSpace.path`
| `99.999` +
+
`/`
|available heap memory
| `heapMemory`
| link:{health-javadoc-base-url}/io/helidon/health/checks/HeapMemoryHealthCheck.html[`HeapMemoryHealthCheck`]
|`helidon.health.heapMemory.thresholdPercent`
|`98`
|=======

Simply adding the built-in health check dependency is sufficient to register all the built-in health checks automatically.
If you want to use only some of the built-in checks in your application, you can disable automatic discovery of the built-in health checks and register only the ones you want.

The following code adds only selected built-in health checks to your application:

[source,java]
.Adding selected built-in health checks
----
include::{sourcedir}/se/HealthSnippets.java[tag=snippet_7, indent=0]
----
<1> Disables automatic registration of the built-in health checks.
<2> Adds the specific built-in check(s) you want.
<3> Adds a custom check (in a previously-prepared variable `hc`).

You can control the thresholds for built-in health checks in either of two ways:

* Create the health checks individually
using their builders instead of using the `HealthChecks` convenience class.
Follow the JavaDoc links in the <<built-in-health-checks-table,table>> above.

* Use configuration as explained in <<Configuration, Configuration>>.

=== Kubernetes Probes
* <<Liveness Probe, Liveness Probe>>
* <<Readiness Probe, Readiness Probe>>
* <<Startup Probe, Startup Probe>>


Probes is the term used by Kubernetes to describe health checks for containers
(link:https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes[Kubernetes documentation]).

There are three types of probes:

* _liveness_: Indicates whether the container is running
* _readiness_: Indicates whether the container is ready to service requests
* _startup_: Indicates whether the application in the container has started

You can implement probes using the following mechanisms:

. Running a command inside a container
. Sending an `HTTP` request to a container
. Opening a `TCP` socket to a container

A microservice exposed to HTTP traffic will typically implement both the
liveness probe and the readiness probe using HTTP requests.
If the microservice takes a significant time to initialize itself, you can also define a startup probe, in which case
Kubernetes does not check liveness or readiness probes until the startup probe returns success.

You can configure several parameters for probes. The following are the most
relevant parameters:

[cols="2,5",role="flex, sm7"]
|=======
| `initialDelaySeconds`
| Number of seconds after the container has started before liveness or readiness
probes are initiated.

| `periodSeconds`
| Probe interval. Defaults to 10 seconds. Minimum value is 1.

| `timeoutSeconds`
| Number of seconds after which the probe times out. Defaults to 1 second.
Minimum value is 1.

| `failureThreshold`
| Number of consecutive failures after which Kubernetes gives up on the probe. Default: 3.
Minimum: 1.
|=======
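
The effect of `failureThreshold` is easiest to see with a sequence of probe results. The following JDK-only sketch is a simplified model, not Kubernetes or Helidon code: the class and method names are hypothetical, and it only counts consecutive failures, ignoring timing and every other probe parameter.

```java
import java.util.List;

// Hypothetical, simplified model of consecutive-failure counting.
class FailureThresholdSketch {

    // Returns the 1-based position of the probe at which the number of
    // consecutive failures first reaches failureThreshold, or -1 if it never does.
    // In the results list, true means the probe succeeded.
    static int probeAtWhichThresholdReached(List<Boolean> results, int failureThreshold) {
        int consecutiveFailures = 0;
        for (int i = 0; i < results.size(); i++) {
            if (results.get(i)) {
                consecutiveFailures = 0; // a success resets the count
            } else if (++consecutiveFailures >= failureThreshold) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Two isolated failures are tolerated; the third failure in a row is not.
        List<Boolean> results = List.of(true, false, false, true, false, false, false);
        System.out.println(probeAtWhichThresholdReached(results, 3));
    }
}
```

With a threshold of 3, the two early failures in this sequence are absorbed because a success follows them; only the run of three failures at the end trips the threshold.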

==== Liveness Probe

The liveness probe is used to detect whether the container has become unresponsive.
For example, it can be used to detect deadlocks or analyze heap usage. When
Kubernetes gives up on a liveness probe, the corresponding pod is restarted.

NOTE: The liveness probe can result in repeated restarts in certain cases.
For example, if the probe is implemented to check all the dependencies
strictly, then it can fail repeatedly for temporary issues. Repeated restarts
can also occur if `timeoutSeconds` or `periodSeconds` is too low.

We recommend the following:

* Avoid checking dependencies in a liveness probe.
* Set `timeoutSeconds` to avoid excessive probe failures.
* Acknowledge startup times with `initialDelaySeconds`.

==== Readiness Probe

The readiness probe is used to avoid routing requests to the pod until it is
ready to accept traffic. When Kubernetes gives up on a readiness probe, the
pod is not restarted, but traffic is no longer routed to it.

NOTE: In certain cases, the readiness probe can cause all the pods to be removed
from service routing. For example, if the probe is implemented to check all the
dependencies strictly, then it can fail repeatedly for temporary issues. This
issue can also occur if `timeoutSeconds` or `periodSeconds` is too low.

We recommend the following:

* Be conservative when checking shared dependencies.
* Be aggressive when checking local dependencies.
* Set `failureThreshold` according to `periodSeconds` in order to accommodate
temporary errors.
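
The relationship between `failureThreshold` and `periodSeconds` can be estimated with simple arithmetic. The sketch below is a rough approximation under stated assumptions, not Kubernetes behavior verbatim: it assumes probes fire exactly every `periodSeconds`, ignores `timeoutSeconds` and scheduling jitter, and uses hypothetical class and method names. Under those assumptions, an outage strictly shorter than `periodSeconds * (failureThreshold - 1)` cannot produce `failureThreshold` consecutive failures.

```java
import java.time.Duration;

// Hypothetical back-of-the-envelope helper; not part of Kubernetes or Helidon.
class ProbeToleranceSketch {

    // Upper bound on outage length that is always tolerated: any outage
    // strictly shorter than this overlaps fewer than failureThreshold probes.
    static Duration alwaysTolerated(int periodSeconds, int failureThreshold) {
        if (periodSeconds < 1 || failureThreshold < 1) {
            throw new IllegalArgumentException("both values must be at least 1");
        }
        return Duration.ofSeconds((long) periodSeconds * (failureThreshold - 1));
    }

    public static void main(String[] args) {
        // Example values: a 30-second period with a threshold of 3.
        System.out.println(alwaysTolerated(30, 3).toSeconds() + " seconds");
    }
}
```

Raising either value widens the window for temporary errors, at the cost of reacting more slowly to a real outage.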

==== Startup Probe

The startup probe prevents Kubernetes from prematurely checking the other probes if the application takes a long time to start.
Otherwise, Kubernetes might misinterpret a failed liveness or readiness probe and shut down the container when, in fact, the application is still coming up.


=== Troubleshooting Probes

Failed probes are recorded as events associated with their corresponding pods.
The event message contains only the status code.

[source,bash]
.Get the events of a single pod:
----
POD_NAME=$(kubectl get pod -l app=acme -o jsonpath='{.items[0].metadata.name}') # <1>
kubectl get event --field-selector involvedObject.name=${POD_NAME} # <2>
----
<1> Get the effective pod name by filtering pods with the label `app=acme`.
<2> Filter the events for the pod.

TIP: Create log messages in your health check implementation when setting a
`DOWN` status. This will allow you to correlate the cause of a failed probe.
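
One way to follow this tip is to log at the point where the check decides to report `DOWN`. The following is a minimal JDK-only sketch using `System.Logger`; the class name, the `probe` method, and its parameters are hypothetical, and a boolean stands in for the health check status.

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: logs the reason whenever a check reports DOWN.
class LoggingCheckSketch {

    private static final Logger LOGGER =
            System.getLogger(LoggingCheckSketch.class.getName());

    // Evaluates the check and, if it is down, logs the check name and reason
    // so the log entry can be correlated with the failed probe event.
    static boolean probe(String checkName, BooleanSupplier isUp, String reasonIfDown) {
        boolean up = isUp.getAsBoolean();
        if (!up) {
            LOGGER.log(Level.WARNING, "Health check {0} is DOWN: {1}",
                    checkName, reasonIfDown);
        }
        return up;
    }

    public static void main(String[] args) {
        // Simulated failing dependency.
        probe("database", () -> false, "connection refused");
    }
}
```

Because the probe event carries only the status code, the timestamp on the log entry is what ties the cause to the failed probe.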

== Configuration

Built-in health checks can be configured using the config property keys
described in this
<<built-in-health-checks-table,table>>. Further, you can suppress one or more of the built-in
health checks by setting the configuration item
`helidon.health.exclude` to a comma-separated list of the health check names
(from this <<built-in-health-checks-table,table>>) you want to exclude.

== Examples

=== JSON Response Example

Accessing the Helidon-provided `/observe/health` endpoint reports the health of your application
as shown below:

[source,json]
.JSON response:
----
{
    "status": "UP",
    "checks": [
        {
            "name": "deadlock",
            "status": "UP"
        },
        {
            "name": "diskSpace",
            "status": "UP",
            "data": {
                "free": "211.00 GB",
                "freeBytes": 226563444736,
                "percentFree": "45.31%",
                "total": "465.72 GB",
                "totalBytes": 500068036608
            }
        },
        {
            "name": "heapMemory",
            "status": "UP",
            "data": {
                "free": "215.15 MB",
                "freeBytes": 225600496,
                "max": "3.56 GB",
                "maxBytes": 3817865216,
                "percentFree": "99.17%",
                "total": "245.50 MB",
                "totalBytes": 257425408
            }
        }
    ]
}
----

=== Kubernetes Example

This example shows the usage of the Helidon health API in an application that
implements health endpoints for the liveness and readiness probes. Note that
the application code dissociates the health endpoints from the default routes,
so that the health endpoints are not exposed by the service. An example YAML
specification is also provided for the Kubernetes service and deployment.

[source,java]
.Application code:
----
include::{sourcedir}/se/HealthSnippets.java[tag=snippet_8, indent=0]
----
<1> The health service for the `liveness` probe is exposed at `/observe/health/live`.
<2> Using the built-in health checks for the `liveness` probe.
<3> The health service for the `readiness` probe is exposed at `/observe/health/ready`.
<4> Using a custom health check for a pseudo database that is always `UP`.
<5> Route the `observe` feature exclusively on the `observe` socket.
<6> The default socket uses port 8080 for the default routes.
<7> The default route: returns `It works!` for any request.
<8> The `observe` socket uses port 8081 for the "/observe" routes.

[source,yaml]
.Kubernetes descriptor:
----
kind: Service
apiVersion: v1
metadata:
  name: acme # <1>
  labels:
    app: acme
spec:
  type: NodePort
  selector:
    app: acme
  ports:
  - port: 8080
    targetPort: 8080
    name: http
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: acme # <2>
spec:
  replicas: 1
  selector:
    matchLabels:
      app: acme
  template:
    metadata:
      name: acme
      labels:
        app: acme
    spec:
      containers:
      - name: acme
        image: acme
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /observe/health/live # <3>
            port: 8081
          initialDelaySeconds: 3 # <4>
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /observe/health/ready # <5>
            port: 8081
          initialDelaySeconds: 10 # <6>
          periodSeconds: 30
          timeoutSeconds: 10
---
----
<1> A service of type `NodePort` that serves the default routes on port `8080`.
<2> A deployment with one replica of a pod.
<3> The HTTP endpoint for the liveness probe.
<4> The liveness probe configuration.
<5> The HTTP endpoint for the readiness probe.
<6> The readiness probe configuration.

== Additional Information

* link:{health-javadoc-base-url}/module-summary.html[Health Checks SE API JavaDocs].