Sentry for Spring Boot, a tutorial


Sentry is a performance monitoring and error tracking platform that helps developers identify, diagnose, and fix failures in their applications. Its main functionality is to display application errors in an aggregated way, so that the team responsible for the system can see which errors affect it the most and how those errors are distributed over time.

Sentry logo

But Sentry also offers distributed tracing functionality, which rivals dedicated solutions such as Jaeger, OpenZipkin, and Grafana Tempo. To be able to later compare these solutions, especially regarding support for distributed trace analysis, I decided to first understand Sentry in more depth and get to know all the potential benefits it can bring.

One motivation for publishing this tutorial was the scarcity of resources about Sentry, including good videos on YouTube. In addition, I noticed some relevant omissions and inaccuracies in the official Sentry documentation.

What we won’t cover here: basic concepts about distributed tracing.

So let’s go!

Scenario

For a good analysis of Sentry’s functionalities, especially in relation to distributed tracing, it is necessary to set up a suitable scenario involving at least two services. We then set up a scenario with services A and B, whose interaction flow is represented below as a sequence diagram:

Sequence diagram illustrating the sequence of steps described above

Figure 1 - Interaction between the services in the demonstration scenario

The source code for these services, already instrumented by Sentry, is available at https://gitlab.com/leonardofl/sentry-lab.

Note: the version of Spring Boot used is 3.4.1.

Initial setup

First, we created two Sentry projects on https://sentry.io: the servico-a project and the servico-b project. When creating a project, Sentry displays a page on how to perform the basic configurations, which are as follows.

In build.gradle:

plugins {
  id "io.sentry.jvm.gradle" version "4.14.1"
}

sentry {
  includeSourceContext = true
  org = "leite-oe"
  projectName = "servico-a"
  authToken = System.getenv("SENTRY_AUTH_TOKEN")
}

In application.properties:

sentry.dsn=https://ac25fa38da889012dc40717e0e32b071@o4508698572029952.ingest.us.sentry.io/450875676046XYZX
sentry.traces-sample-rate=1.0

Place the Sentry.captureException(e) call in the exception handler:

import io.sentry.Sentry;
import jakarta.servlet.http.HttpServletRequest;
import lombok.extern.log4j.Log4j2;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ControllerAdvice;
import org.springframework.web.bind.annotation.ExceptionHandler;

@ControllerAdvice
@Log4j2
public class ServiceAExceptionHandler {

    @ExceptionHandler(RuntimeException.class)
    public ResponseEntity<InternalErrorMessage> runtimeException(RuntimeException e, HttpServletRequest request) {

        log.error("Unexpected error: " + e.getMessage(), e);

        // Send the exception to Sentry
        Sentry.captureException(e);

        InternalErrorMessage msg = new InternalErrorMessage();
        return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR.value()).body(msg);
    }

}
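
To see this capture in action, a hypothetical endpoint that always fails can be used. The class and path below are illustrative only (they are not part of the demo services):

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class FailureDemoController {

    // Throwing here makes the request fall into ServiceAExceptionHandler,
    // which logs the error and sends it to Sentry via Sentry.captureException(e).
    @GetMapping("/fail")
    public String fail() {
        throw new IllegalStateException("Intentional failure to exercise the Sentry capture");
    }
}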

To start the service, we also created a script (where the value of SENTRY_AUTH_TOKEN is provided by the project’s initial configuration page):

set -e # stop script if any command fails
clear
export SENTRY_AUTH_TOKEN=sntrys_eyJpYXQiOjE3Mzg2MDgwOTYuNDkzNDMsInVybCI6Imh0dHBzOi8vc2VudHJ5LmlvIXYZX
./gradlew clean build
java -jar build/libs/serviceA-0.0.1-SNAPSHOT.jar

The above steps were applied to both service A and service B.
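
With both services running, a quick smoke test can be done from the terminal. Service B’s URL is the one that service A invokes (shown further below); the port and context path of service A are not given in this tutorial, so the second command is only an assumption to be adapted:

curl http://localhost:8092/service-b/albuns/titulos/aleatorio
# assumed port and context path for service A, adjust to your configuration
curl http://localhost:8091/service-a/albuns/aleatorio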

Inter-service invocation

For the trace view to work properly, one precaution must be taken: when using RestTemplate to invoke another service (in this case, A calling B), the restTemplate object must be created from a RestTemplateBuilder provided by the framework. In other words, do not instantiate restTemplate with new! To do this, we created the AppConfig class as a restTemplate factory:

@Configuration
public class AppConfig {

    // Building from the framework-provided builder lets Sentry's customizer instrument the RestTemplate
    @Bean
    RestTemplate restTemplate(RestTemplateBuilder builder) {
        return builder.build();
    }
    }
    
}

And where restTemplate is used, just inject it:

@RestController
@Log4j2
public class ServiceAController {

    @Autowired
    private RestTemplate restTemplate; // and then, just use it
    
    // ....
    
    private String accessAnotherService() {
        String url = "http://localhost:8092/service-b/albuns/titulos/aleatorio";
        try {
            ResponseEntity<String> response = restTemplate.getForEntity(url, String.class);
            return response.getBody();
        } catch (HttpStatusCodeException e) {
            throw new IllegalStateException("Failed to get random album title from service B", e);
        }
    }
}

This way, the propagation of the trace ID generated by Sentry instrumentation will be performed automatically.
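
If you want to confirm that this propagation is really happening, one possibility (a sketch, not part of the demo code) is to inspect the sentry-trace header that the instrumented RestTemplate adds to outgoing requests, for example by temporarily exposing it on the receiving side:

import jakarta.servlet.http.HttpServletRequest;
import lombok.extern.log4j.Log4j2;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@Log4j2
public class TraceProbeController { // hypothetical controller, for inspection only

    // Sentry propagates the trace context in the "sentry-trace" HTTP header
    // (accompanied by a "baggage" header); here we simply log and echo what arrived.
    @GetMapping("/debug/sentry-trace")
    public String sentryTraceHeader(HttpServletRequest request) {
        String header = request.getHeader("sentry-trace");
        log.info("Incoming sentry-trace header: {}", header);
        return header;
    }
}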

Custom span

To highlight a code snippet in the Sentry trace as a span, we can do the following:

var activeTransaction = Sentry.getSpan(); // the active span/transaction; may be null outside a transaction
var span = activeTransaction.startChild("Dormindo", "app.logic"); // "Dormindo" means sleeping

try {
    Thread.sleep(1000);
    span.finish(SpanStatus.OK);
} catch (InterruptedException e) {
    span.finish(SpanStatus.INTERNAL_ERROR);
    throw new IllegalStateException("Should never happen while sleeping.", e);
}

We put the snippet above at the beginning of the processing of Service A.

Database in Sentry

According to the Sentry documentation, its JDBC integration creates a span for each JDBC statement executed. Turning database calls into Sentry spans is fantastic for identifying which database calls are our performance bottlenecks.

To do this, we first changed build.gradle by adding the following dependency:

implementation 'io.sentry:sentry-jdbc:7.18.0'

Next, we also changed application.properties:

#spring.datasource.url=jdbc:postgresql://localhost:58432/sentry-lab?ApplicationName=service-b
#spring.datasource.driver-class-name=org.postgresql.Driver
spring.datasource.url=jdbc:p6spy:postgresql://localhost:58432/sentry-lab?ApplicationName=service-b
spring.datasource.driver-class-name=com.p6spy.engine.spy.P6SpyDriver

Finally, to avoid generating log files locally, we created the file src/main/resources/spy.properties:

modulelist=com.p6spy.engine.spy.P6SpyFactory

This was done basically following the Sentry documentation on JDBC integration, but with some catches: the sentry-jdbc version compatible with the project’s Spring Boot version was 7.18.0, not the 8.0.0 suggested by the documentation. And to avoid local logs, the documentation suggests two options (creating spy.properties or changing application.properties), but only the spy.properties option worked for us.

Integration with logs

In Sentry, logs are linked to captured errors. This is already done by default. To demonstrate this, we created a log record at the beginning of the execution of each of our service operations, as follows:

import lombok.extern.log4j.Log4j2;

@RestController
@Log4j2
public class ServiceAController {

    // ...

    @GetMapping("/albuns/aleatorio")
    public Album hello() {

        log.info("Something quite interesting seemed to have happened here...");

        // ... rest of the operation
    }
}

According to the documentation and some tutorials on the Internet, the io.sentry:sentry-logback:7.18.0 dependency should add more detail to the breadcrumbs that make up an error’s details in Sentry. But at least in our case we did not observe any apparent effect, so we chose not to use this dependency.
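
For reference, had we kept it, the dependency would simply be declared in build.gradle alongside sentry-jdbc:

implementation 'io.sentry:sentry-logback:7.18.0'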

Results in Sentry

View of captured errors

Below we have the view of Sentry issues:

Sentry issues screen showing different errors that occurred in Service A

Figure 2 - Sentry issues screen

On this screen (Figure 2), we can see:

  1. The different errors occurring in our system.
  2. How many times each error has occurred. It is also possible to see whether a given error affects many users or just a few; these questions are important to help prioritize the treatment of errors.
  3. The option to filter errors by module (service), which is important because it helps each team focus on its own problems.

For each occurrence of each error, we have a screen with the error details:

Sentry issue screen showing details of an error

Figure 3 - Sentry issue detail screen

In this same view, we also have a breadcrumb that displays the logs that occurred during the execution that generated the error:

Breadcrumb panel showing logs

Figure 4 - Breadcrumb panel, composing the issue screen

One observation: each error is associated with a module, so the stack trace and the breadcrumbs will only mention events within that module. In other words, for an error associated with service A, the stack trace will not show lines from service B and the breadcrumbs will not show logs recorded by service B.

Distributed trace view

And this is the amazing distributed trace view that Sentry provides us:

Distributed trace view showing several spans organized as lines in a stack of spans

Figure 5 - Distributed trace view in Sentry

In this view (Figure 5), we see the spans that make up the trace, including spans from all services involved in that distributed transaction (identified by a trace ID). In addition, we can see that:

  1. We can select the different spans that make up the trace.
  2. The interface displays the error associated with that span.
  3. The interface displays the module (service) in which the selected span was executed.
  4. We can also see the custom spans.
  5. We have the execution time of each span. In the case of the “dormindo” span, which was a custom span, we have a time of just over 1 second, which was expected since in this span we executed the command Thread.sleep(1000);.
  6. And we also have the spans of accesses to the database, including the SQL query performed.

Finding a trace in Sentry by trace ID

Given a trace ID (e.g.: eb30841f72144188abc3cea2592265db), we can find its corresponding trace in Sentry using the filter trace:eb30841f72144188abc3cea2592265db in the search bar of some screens, such as the issues screen and the performance screen. It was supposed to work on the traces screen, but it didn’t.

One project per service vs. one project for all services

In this tutorial, we used the configuration of one Sentry project per service. As we can see from the results obtained (Figures 2 to 5), in this option we have an integrated view of the distributed trace. At the same time, it is very easy to observe only the errors of a specific service. Furthermore, in the distributed trace view we have clarification about which service each span refers to. In other words, excellent.

But we also tried the strategy of using a single Sentry project for different services. However, this strategy presented drawbacks and no particular advantages (except for greater ease of configuration). In this option, in some contexts there is no practical way to filter items by module or identify the module associated with an item. In other words, in a situation where different teams take care of different modules, it will probably be difficult to “let me see only the things from my team here”.

Therefore, we recommend configuring a Sentry project per service (i.e., system module).

Sentry trace ID in logs and in internal error message

When a request is made to an “edge service” of our system, Sentry instrumentation generates a trace ID for that distributed transaction. From that point on, this trace ID is automatically propagated through the system services. However, using this trace ID is more effective if we do two more things: 1) print this trace ID in each log record and 2) return this trace ID in internal error messages (so that tickets can be opened with the client providing us with the trace ID).

To put the trace ID in the log, we first capture the trace ID from Sentry and make it available to the log in the “MDC context”:

import java.io.IOException;

import io.sentry.Sentry;
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;

@Component
@Order(1)
public class TraceIdFilter implements Filter {

    @Override
    public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain chain)
            throws IOException, ServletException {

        // Sentry.getSpan() returns the span/transaction that Sentry opened for this request
        String traceId = Sentry.getSpan().getSpanContext().getTraceId().toString();
        TraceIdManager.configurarTraceId(traceId);

        chain.doFilter(servletRequest, servletResponse);
    }
}

import org.apache.commons.lang3.StringUtils;
import org.slf4j.MDC;

public class TraceIdManager {

    private static final String MDC_KEY = "traceId";

    public static void configurarTraceId(String traceId) {
        if (StringUtils.isNotBlank(traceId)) {
            MDC.put(MDC_KEY, traceId);
        }
    }

    public static String getTraceId() {
        return MDC.get(MDC_KEY);
    }
}

Now, for the trace ID to appear in the log, we have to configure the log in application.properties:

logging.pattern.console=%d{yyyy-MM-dd HH:mm:ss} %-5level %logger Trace ID: %X{traceId} - %msg%n
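
With this pattern, a log line looks roughly like the following (timestamp and logger name are illustrative; the trace ID is the example one used earlier):

2025-02-03 14:32:10 INFO  com.example.servicea.ServiceAController Trace ID: eb30841f72144188abc3cea2592265db - Something quite interesting seemed to have happened here...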

And for the trace ID to also be returned in the internal error message:

import java.time.LocalDateTime;

import lombok.Data;

@Data
public class InternalErrorMessage {

    private final String title = "Internal error";
    private final LocalDateTime dateTime = LocalDateTime.now();
    private final String traceId = TraceIdManager.getTraceId();
    private final String error = "An unexpected error occurred at " + dateTime
        + ", please try again later. If the error persists, contact your system administrator and provide the trace ID: "
        + traceId + ".";
}
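
Serialized to JSON, the body returned to the client then looks something like this (the values are illustrative):

{
  "title": "Internal error",
  "dateTime": "2025-02-03T14:32:10.123",
  "traceId": "eb30841f72144188abc3cea2592265db",
  "error": "An unexpected error occurred at 2025-02-03T14:32:10.123, please try again later. If the error persists, contact your system administrator and provide the trace ID: eb30841f72144188abc3cea2592265db."
}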

So, when an internal error occurs in service B, we see the trace ID in the internal error message we receive from Service A, as well as in the logs of both services:

Screens showing the trace ID in the return of a HTTP request with internal error, in the logs recording these errors and in the Sentry screen

Figure 6 - Sentry Trace ID returned in the internal error message and appearing in the logs

Resilience and performance concerns

Consider the Sentry.captureException(e) command in the exception handler. At this point, the Sentry client (in our application) sends an event to the Sentry server. What happens if this communication is slow? Will it slow down our application? What happens if this communication fails? Will events be lost? In short, how does Sentry handle performance and resilience issues?

Well, Sentry has mechanisms to deal with these issues: events are sent asynchronously through a bounded in-memory queue, network calls have timeouts, and events can be cached on disk when sending fails. There are some Sentry client settings that influence the behavior of these mechanisms, and we thought it would be a good idea to customize the following (in application.properties):

# default 30
sentry.max-queue-size=50
# default 5000
sentry.read-timeout-millis=1000
# default 30
sentry.max-cache-items=0

The image below shows the settings used:

Terminal output displaying properties such as "MaxQueueSize: 50 (default = 30)", "ReadTimeoutMillis: 1000 (default = 5000)" and "maxCacheItems: 0 (default = 30)"

Figure 7 - Sentry properties related to resilience and performance

To print these properties:

import io.sentry.Sentry;
import io.sentry.SentryOptions;

public class SentryResilieConfiguration {

    public static void print() {
        System.out.println("=================");
        System.out.println("Sentry settings about resilience or that may affect application performance:");
        SentryOptions options = Sentry.getCurrentHub().getOptions();
        System.out.println("SampleRate: " + options.getSampleRate());
        System.out.println("MaxBreadcrumbs: " + options.getMaxBreadcrumbs());
        System.out.println("AttachStacktrace: " + options.isAttachStacktrace());
        System.out.println("ShutdownTimeoutMillis: " + options.getShutdownTimeoutMillis());
        System.out.println("FlushTimeoutMillis: " + options.getFlushTimeoutMillis());
        System.out.println("Now the most important ones:");
        System.out.println("CacheDirPath: " + options.getCacheDirPath());
        System.out.println("MaxCacheItems: " + options.getMaxCacheItems() + " (default = 30)");
        System.out.println("MaxQueueSize: " + options.getMaxQueueSize() + " (default = 30)");
        System.out.println("ReadTimeoutMillis: " + options.getReadTimeoutMillis() + " (default = 5000)");
        System.out.println("=================");
    }
}

@SpringBootApplication
public class ServiceBApplication {

    @EventListener(ContextRefreshedEvent.class)
    public void onApplicationEvent() {
        SentryResilieConfiguration.print();
    }

    public static void main(String[] args) {
        SpringApplication.run(ServiceBApplication.class, args);
    }
}

Conclusions

We configured our Spring Boot services to work with Sentry as-a-service (https://sentry.io) and, with that, we were able to see aggregated errors per module, navigate distributed traces across services (including database spans and custom spans), and correlate logs and internal error messages with Sentry trace IDs.

We experimented with configuring a single Sentry project for all modules (services) in the system and configuring a project for each module. We concluded that configuring a project per module is more appropriate: it is easy to filter errors from a given module without losing the integrated view of the distributed trace.

One caveat: Sentry is not designed to be used to audit requests. Although it seems possible to work around it, out of the box we do not have the request body and the response body of each invocation in Sentry. Furthermore, the recommendation for production is not to keep the traces of all invocations (inadequate for auditing), which also brings us to the next point.
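
For the request body specifically, the Sentry SDK exposes an option that we did not test in this lab but that, as far as we understand, controls whether request bodies are attached to captured events (application.properties):

# not tested in this lab; accepted values range from none to always
sentry.max-request-body-size=always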

Open questions

It may just be an impression, but Sentry seems to be recognized above all as a tool for analyzing errors (as we said, the quantitative aggregation of errors that Sentry does is quite valuable), while its use as a tracing tool seems somewhat neglected by the community.

Sentry itself indicates that the sentry.traces-sample-rate property should not be set to 1.0 in production. In other words, in production we would not have a distributed trace of all system executions (although we might have all the traces of the executions involved in errors). I wonder if this would be enough in production to handle the necessary analyses to be performed in case of complaints of slowness in the system. Perhaps a workaround would be to capture all traces for only a short period of time in certain situations.
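
One way to implement something along those lines (a sketch only, not tested in this lab) would be to register a SentryOptions.TracesSamplerCallback bean, which the Sentry Spring Boot integration uses to decide, per transaction, whether it should be recorded. The "capture everything" switch below is just a hypothetical flag:

import io.sentry.SentryOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SentrySamplingConfig { // hypothetical class, not part of the demo services

    // Hypothetical switch; in practice it could come from a feature flag or a configuration server.
    private static volatile boolean captureEverything = false;

    @Bean
    SentryOptions.TracesSamplerCallback tracesSampler() {
        // Record every trace while investigating slowness; otherwise keep a low rate
        // (10% here is an arbitrary, illustrative value).
        return samplingContext -> captureEverything ? 1.0 : 0.1;
    }
}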

I haven’t yet used Sentry’s distributed trace functionality in production, so I don’t have answers to the concerns mentioned above. However, I know that when we talk about distributed traces, names like Jaeger and OpenZipkin seem more consolidated. And then there’s Grafana Tempo, which seems very promising because it makes explicit its concern with performance and scalability when capturing all of a system’s traces.

PS: Other Sentry features

Since we only ran the demo services on localhost, we didn’t explore some Sentry features that are harder to evaluate in this context. But it’s worth mentioning two important capabilities that Sentry offers: