Spring Boot 3 Observability with Grafana Stack

In this blog post - Spring Boot 3 Observability with Grafana Stack, we will learn how to implement Observability in our Spring Boot applications using Grafana Stack which comprises Grafana, Loki, and Tempo.

What is Observability?

In a nutshell, Observability is the process of understanding the internal state of the application with the help of different indicators such as Logs, Metrics, and Tracing information.

For a more detailed explanation, have a look at this article.

We will see how to implement Observability for a sample loan processing system built with Spring Boot 3 using the Grafana Stack.

Grafana Stack

Grafana Stack comprises about 3 softwares:

Grafana: This is the most widely used tool that helps to monitor and visualize the metrics of our application. Users can visualize the metrics by building different dashboards and can use different kinds of charts to visualize the metrics. We can also configure alerts to be notified whenever a metric reaches a certain required threshold.

To collect metrics, we will be using Prometheus, a metrics aggregation tool.
Loki: is a Log Aggregation tool that receives the logs from our application and indexes the logs to be visualized using Grafana.
Tempo: is used as a distributed tracing tool, which can track requests that span across different systems.

Implementing Observability

The below picture shows you a high-level overview of our project and how tools like Grafana, Loki, and Tempo fit into our overall architecture.

We have a loan-service that is responsible for accepting requests for loans and this request is validated against a fraud-service that verifies if the applicant is on the fraud list.

You can find the source code of this application at - https://github.com/SaiUpadhyayula/springboot3-observablity

This tutorial will only concentrate on implementing the observability aspects of the application, the initial working version of the application can be found in the branch - start-here.

The application is built as a maven multi-module project, where loan-service and fraud-service are created as maven modules.

Logging

Let's start with implementing logging in our application. To send our application logs to Loki, we have to add the below dependency to the pom.xml of both loan-service and fraud-service.

    <dependency>
        <groupId>com.github.loki4j</groupId>
        <artifactId>loki-logback-appender</artifactId>
        <version>1.3.2</version>
    </dependency>

The loki-logback-appender adds the necessary integration with Loki with the help of the Logback logging library.

Next, we have to define a logback-spring.xml file inside the src/main/resources which contains necessary information about how to structure our logs and where to send the logs (in other words it contains the information about Loki URL).

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <include resource="org/springframework/boot/logging/logback/base.xml"/>
    <springProperty scope="context" name="appName" source="spring.application.name"/>

    <appender name="LOKI" class="com.github.loki4j.logback.Loki4jAppender">
        <http>
            <url>http://localhost:3100/loki/api/v1/push</url>
        </http>
        <format>
            <label>
                <pattern>application=${appName},host=${HOSTNAME},level=%level</pattern>
            </label>
            <message>
                <pattern>${FILE_LOG_PATTERN}</pattern>
            </message>
            <sortByTime>true</sortByTime>
        </format>
    </appender>

    <root level="INFO">
        <appender-ref ref="LOKI"/>
    </root>
</configuration>

The <appender> defines the Loki4JAppender, which contains the reference to the Loki url under the <url> tag. It also defines the log pattern using the <pattern> tag which is defined as application=${app.name}, host=${HOSTNAME}, level=%level, where we display the application name which is defined in the <springProperty> tag, host, and the log level, which is defined as INFO under the <root> tag.

That's all we need to do to implement logging using Loki. You can download and run Loki on your machine using Docker. In the sample project, I am using docker-compose, add the below Loki configuration in the docker-compose.yml file:

loki:
  image: grafana/loki:main
  command: ['-config.file=/etc/loki/local-config.yaml']
  ports:
    - '3100:3100'

Now let's see how to implement Metrics using Prometheus and Grafana.

Metrics

Metrics can be any kind of measurable information about our application like JVM statistics, Thread Count, Heap Memory information, etc. To collect metrics of our application, we need to first enable Spring Boot Actuator in our project by adding the below dependency:

		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-actuator</artifactId>
		</dependency>

Next, we have to add another dependency to expose the metrics of our application, Spring Boot uses Micrometer to collect metrics, and by adding the below dependency we can configure Micrometer to expose an endpoint that can be scraped by Prometheus.

		<dependency>
			<groupId>io.micrometer</groupId>
			<artifactId>micrometer-registry-prometheus</artifactId>
			<scope>runtime</scope>
		</dependency>

To see different metrics exposed by Spring Boot you can refer to this link from Spring Boot documentation - https://docs.spring.io/spring-boot/docs/current/reference/html/actuator.html#actuator.metrics.supported

The next step is to add some properties to our application.yml file.

management.endpoints.web.exposure.include=health, info, metrics, prometheus
management.metrics.distribution.percentiles-histogram.http.server.requests=true
management.observations.key-values.application=loan-service

The property - management.endpoints.web.exposure.include=health, info, metrics, prometheus exposes the endpoints health, info, metrics, and prometheus through the actuator.

Next, we are defining a property called management.metrics.distribution.percentiles-histogram.http.server.requests=true which is used by the micrometer to gather the metrics in the form of a histogram and send it to Prometheus. You can read more about this concept here - https://micrometer.io/docs/concepts#_histograms_and_percentiles.

After adding the above properties run both applications and open the URL - http://localhost:8080/actuator/prometheus to see different metrics that are exposed by the micrometer.

You can run Prometheus by adding the below entry in the docker-compose.yml file

prometheus:
  image: prom/prometheus:v2.46.0
  command:
    - --enable-feature=exemplar-storage
    - --config.file=/etc/prometheus/prometheus.yml
  volumes:
    - ./docker/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
  ports:
    - '9090:9090'

We need a configuration file, to tell Prometheus where it can find the necessary metrics to scrape. For that, we need to create a file called prometheus.yml with the following content.

global:
  scrape_interval: 2s
  evaluation_interval: 2s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']
  - job_name: 'loan-service'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['host.docker.internal:8080'] ## only for demo purposes don't use host.docker.internal in production
  - job_name: 'fraud-detection'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['host.docker.internal:8081'] ## only for demo purposes don't use host.docker.internal in production

Under the global field, we defined the scrape and evaluation interval as 2s. In the scrape_configs section, we have 3 jobs, one for prometheus, loan-service, and fraud-detection service. Notice that to scrape the loan-service and fraud-detection services we defined the URL of both the services and the metrics path as - /actuator/prometheus

Tracing

Now let's go ahead and implement Distributed Tracing using Tempo. For that, we need to add some more dependencies.

Prior to Spring Boot 3, we used to add the Spring Cloud Sleuth dependency to add distributed tracing capabilities to our application, but from Spring Boot 3, Spring Cloud Sleuth is no longer needed and this is replaced by the Micrometer Tracing Project. To add the support, add the below dependencies:

		<dependency>
			<groupId>io.micrometer</groupId>
			<artifactId>micrometer-tracing-bridge-brave</artifactId>
		</dependency>
		<dependency>
			<groupId>io.zipkin.reporter2</groupId>
			<artifactId>zipkin-reporter-brave</artifactId>
		</dependency>

micrometer-tracing-bridge-brave is the dependency that does all the magic and adds distributed tracing for our application. Whereas zipkin-reporter-brave will exportthe tracing information to Tempo.

NOTE: You can also use other tracing implementation like OpenTelemetry - micrometer-tracing-bridge-otel dependency instead of Brave - micrometer-tracing-bridge-brave

If you want to trace the calls to the database, as we are using Spring Data JDBC, we can add the dependency datasource-micrometer-spring-boot dependency.

		<dependency>
			<groupId>net.ttddyy.observation</groupId>
			<artifactId>datasource-micrometer-spring-boot</artifactId>
			<version>1.0.1</version>
		</dependency>

As we are using a RestTemplate to call fraud-detection service from loan-service , the traceId and spanId are generated and propagated automatically.

But if you want to create manual tracing for specific calls you can use the Observation API and the @Observed annotation.

For example, as we wanted to trace the calls to the database, we can do that by adding the @Observed annotation on the LoanRepository interface.

@Repository
@RequiredArgsConstructor
@Observed
public class LoanRepository {

    private final JdbcClient jdbcClient;

    .....
    .....
    .....
}

Next, we need to define a bean of type `ObservedAspect` we can do that by creating a class called ObservationConfig.java

package com.programming.techie.loans.config;

import io.micrometer.observation.ObservationRegistry;
import io.micrometer.observation.aop.ObservedAspect;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ObservationConfig {
    @Bean
    ObservedAspect observedAspect(ObservationRegistry registry) {
        return new ObservedAspect(registry);
    }
}

Finally, to enable the Aspect Oriented Programming, we need to add the spring-boot-starter-aop dependency.

		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-aop</artifactId>
		</dependency>

Micrometer Tracing will only send 10% of the traces it generates to Tempo, just to avoid overwhelming it with a lot of requests. We can set it to 100% by adding the below property to our application.yml file

management.tracing.sampling.probability=1.0

Finally, you can run Tempo using docker, by adding the below piece of code inside the docker-compose.yml file:

tempo:
  image: grafana/tempo:2.2.2
  command: ['-config.file=/etc/tempo.yaml']
  volumes:
    - ./docker/tempo/tempo.yml:/etc/tempo.yaml:ro
    - ./docker/tempo/tempo-data:/tmp/tempo
  ports:
    - '3110:3100' # Tempo
    - '9411:9411' # zipkin

Finally, we need to configure a file called tempo.yml file to store the necessary settings to be used in Tempo. I created this file under the docker folder

server:
  http_listen_port: 3200

distributor:
  receivers:
    zipkin:

storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/blocks

You can observe that we are referring to this file inside the docker-compose service, and we are mounting this file into the /etc/ location of the container.

Running Grafana

Before testing our implementation, let's also see how to run Grafana using Docker. After all, this is what brings all the services like Tempo, Loki, and Prometheus together and visualizes the information produced by our services.

grafana:
  image: grafana/grafana:10.1.0
  volumes:
    - ./docker/grafana:/etc/grafana/provisioning/datasources:ro
  environment:
    - GF_AUTH_ANONYMOUS_ENABLED=true
    - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    - GF_AUTH_DISABLE_LOGIN_FORM=true
  ports:
    - '3000:3000'

The above configuration will run Grafana by disabling the login and authentication, do not use this configuration in Production.

Also for Grafana, we need to define the data sources from which it needs to gather the information to visualize, for that let's create a file called datasources.yml

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    editable: false
    jsonData:
      httpMethod: POST
      exemplarTraceIdDestinations:
        - name: trace_id
          datasourceUid: tempo
  - name: Tempo
    type: tempo
    access: proxy
    orgId: 1
    url: http://tempo:3200
    basicAuth: false
    isDefault: true
    version: 1
    editable: false
    apiVersion: 1
    uid: tempo
    jsonData:
      httpMethod: GET
      tracesToLogs:
        datasourceUid: 'loki'
      nodeGraph:
        enabled: true
  - name: Loki
    type: loki
    uid: loki
    access: proxy
    orgId: 1
    url: http://loki:3100
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    apiVersion: 1
    jsonData:
      derivedFields:
        - datasourceUid: tempo
            matcherRegex: \[.+,(.+?),
            name: TraceID
            url: $${__value.raw}

This file defines all the data sources like Prometheus, Loki, and Tempo and references to the respective URLs.

Testing

Okay, now it's Testing Time.

Start all the services by running the command:

docker compose up -d

Also, run both the loan-service and fraud-detection services.

After you make some calls to GET/loan and POST/loan, let's first open Loki and check for logs.

Open the URL - http://localhost:3000
Click on the toggle menu and click on 'Explore'
Under the dropdown select - 'Loki' and run the query with your desired parameters, e.g.: select the application label as - loan-service.

Now let's open Prometheus, and apply the same filter, you should see the results below:

Note down the traceId from the logs that are generated by the GET/loan (or) POST/loan calls.

Now open Tempo, go to the Query Type - TraceQL, paste the traceId, and press Shift-Enter.
You should see the tracing information of that particular request.

You can observe from the below image that the fraud-detection service also displays the calls made to the database, thanks to the datasource-micrometer-spring-boot dependency we added before.

Conclusion

Observability plays a vital role in ensuring that our applications are running as expected and provides us insights into the inner state of the application.

You can find the complete source code of this application on Github - https://github.com/SaiUpadhyayula/springboot3-observablity