
Observability

Distributed logging and tracing

We have used the System.Diagnostics abstractions to instrument parts of our code for tracing. This lets you use trace collectors, such as Zipkin, to view distributed operations within MindLink.

Tracing enables an in-depth analysis of the performance of different operations within MindLink and is a valuable tool for understanding and debugging MindLink.

When tracing is enabled, additional instrumentation is activated within the following dependencies:

  • ASP.NET Core HTTP services
  • GraphQL services
  • HTTP clients
  • Orleans grain calls
  • Entity Framework Core (database) requests

Distributed tracing enables traces from every MindLink server in a cluster to be correlated. This means that you can see how an operation originating on one service triggered interactions with other services.
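To illustrate how traces from different servers can be correlated, the following Python sketch parses a W3C Trace Context `traceparent` header. It assumes MindLink propagates trace context in this format. This page does not state that, although the 32-hex-digit TraceId and 16-hex-digit SpanId in the log sample further down match it, and it is the default format for .NET's System.Diagnostics.Activity. The helper names are hypothetical.

```python
import re
from typing import NamedTuple

# Assumption: trace context is propagated as a W3C "traceparent" header
# (version-traceid-parentid-flags, lowercase hex). Not confirmed for MindLink.
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    version: str
    trace_id: str   # shared by every span in one distributed operation
    parent_id: str  # span id of the caller on the upstream server
    flags: str      # e.g. "01" means the trace is sampled


def parse_traceparent(header: str) -> TraceParent:
    """Parse a traceparent header value; raise ValueError if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError(f"not a valid traceparent: {header!r}")
    version, trace_id, parent_id, flags = match.groups()
    # The specification treats all-zero trace and parent ids as invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("all-zero trace id or parent id")
    return TraceParent(version, trace_id, parent_id, flags)


def same_operation(a: str, b: str) -> bool:
    """Two requests belong to the same distributed operation if their trace ids match."""
    return parse_traceparent(a).trace_id == parse_traceparent(b).trace_id
```

For example, if one server calls another with `traceparent: 00-34f22eeb0a75f6c66033f54f025df362-ef71596022bcdf5c-01`, the receiving server's spans keep the trace id 34f22eeb0a75f6c66033f54f025df362 and record ef71596022bcdf5c as their parent. That shared trace id is what lets a collector show both servers' work as a single trace.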

Distributed logging lets you view logs from multiple services in one place, correlate those log messages, and slice them by operation and by user.

Correlating user sessions and logs

We have added a generalized correlation identity, traceSessionId, which follows a session from the client through the server and is propagated on traces. This identity is included in logs and traces, giving you an easy way to filter logs for a particular user session.

To ensure that you can correlate logs with a user session, make sure that any custom Serilog output template includes a {TraceSessionId} placeholder.
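One way to audit custom templates is to scan each sink's outputTemplate for the placeholder. This Python sketch assumes you have already loaded your sinks as dictionaries shaped like the GrafanaLoki example on this page (a "Name" plus an "Args" object with an optional "outputTemplate"). Where those sink entries sit inside your serilog.json is not covered here, and the helper name is hypothetical.

```python
import re
from typing import Any

# Serilog placeholders may carry alignment and/or format,
# e.g. {TraceSessionId}, {TraceSessionId,-36} or {TraceSessionId:l}.
_PLACEHOLDER = re.compile(r"\{TraceSessionId(?:,[^}:]*)?(?::[^}]*)?\}")


def sinks_missing_trace_session_id(sinks: list[dict[str, Any]]) -> list[str]:
    """Return the names of sinks whose outputTemplate lacks a {TraceSessionId} placeholder.

    Sinks with no explicit outputTemplate are skipped, because their output
    format is decided by the sink itself rather than by a template you wrote.
    """
    missing = []
    for sink in sinks:
        template = sink.get("Args", {}).get("outputTemplate")
        if template is not None and not _PLACEHOLDER.search(template):
            missing.append(sink.get("Name", "<unnamed>"))
    return missing
```

A template that only writes {Message} would be reported, so you would know to add [{TraceSessionId}] to it.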

End users can see the correlation identity in the client: in the bottom-right corner of the log on screen, and in the user's profile menu when logged into a session.

[Images: Correlation identity in the log on screen; Correlation identity in the user profile]

In the default log file, you can filter for log messages that contain a correlation identity to see only the messages related to a specific session, for example 88669f7c-392a-4bae-9183-36012c524452:

...
[Information] 2023-09-12T21:36:26.730Z SERVER [Microsoft.AspNetCore.Mvc.Infrastructure.ObjectResultExecutor] [<88669f7c-392a-4bae-9183-36012c524452/>] Executing OkObjectResult.
[TraceId=34f22eeb0a75f6c66033f54f025df362 SpanId=ef71596022bcdf5c]
[Information] 2023-09-12T21:36:26.730Z SERVER [Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker] [88669f7c-392a-4bae-9183-36012c524452] Route matched with {action = "WaitForResponses", controller = "PersistentLongPolling"}.
[TraceId=34f22eeb0a75f6c66033f54f025df362 SpanId=ef71596022bcdf5c]
[Information] 2023-09-12T21:47:53.261Z SERVER [MindLink.Core.Collaboration.Connectors.Auditing.LoggingAuditor] ['88669f7c-392a-4bae-9183-36012c524452'] {"ChatName":"Chat Room","MessageReceivedEventArgs":{"ChatId":"ma-chan://domain.local/b3777084-7027-462c-b8ff-518b0299d2a7","Message":{"SenderId":"sip:alice@domain.local","Content":"[StoryContent: SubjectLength='36', Parts='1', IsAlert='False']","Id":{"MessageId":1510,"OrderingId":560,"Timestamp":"2023-09-12T20:47:53.0170000+00:00","$type":"PersistentGroupMessageId"},"Timestamp":"2023-09-12T20:47:53.2800000+00:00","MessageMetadata":[],"$type":"Message"},"MessageReceivalNotificationHandle":{"$type":"MessageReceivalNotificationHandle"},"$type":"MessageReceivedEventArgs"},"LocalUserId":"sip:bob@domain.local","Description":"message received","AuditEntryStatus":"Propagated","$type":"MessageReceivedAuditEntry"}
...
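The same filtering can be scripted. This Python sketch keeps the lines that mention a session's correlation identity and groups them by TraceId, so that each operation's messages sit together. It uses plain substring matching, so it does not depend on how the identity is bracketed, and it assumes each entry, including its [TraceId=... SpanId=...] suffix, is written on a single line. The function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Iterable

# Matches the 32-hex-digit trace id in a "[TraceId=... SpanId=...]" suffix.
_TRACE_ID = re.compile(r"TraceId=([0-9a-f]{32})")


def lines_for_session(lines: Iterable[str], session_id: str) -> list[str]:
    """Keep only the log lines that mention the given correlation identity."""
    return [line for line in lines if session_id in line]


def group_by_trace(lines: Iterable[str]) -> dict[str, list[str]]:
    """Group log lines by the TraceId they carry; lines without a TraceId are dropped."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        match = _TRACE_ID.search(line)
        if match:
            groups[match.group(1)].append(line)
    return dict(groups)
```

You can pass an open log file handle as `lines`. The location of the default log file depends on your installation.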

Example tooling

In this section we describe the use of Loki for collecting distributed logs, Zipkin for collecting distributed traces, and Grafana as a dashboard to interact with and visualize them.

It is also possible to use other distributed logging and trace collectors, such as Azure Application Insights. For logging, configure a different Serilog sink; for tracing, configure the OpenTelemetry Collector to forward traces to a different service.

Distributed logging with Loki

Loki is a log aggregation system. It stores and indexes the logs it receives from applications and makes that data available to Grafana. For more information on Loki, refer to the documentation at https://grafana.com/oss/loki/. To stream MindLink's application logs to a Loki instance, add a "GrafanaLoki" sink to your MindLink serilog.json configuration. The example below streams to a local Loki server on port 3100:

{
  "Name": "GrafanaLoki",
  "Args": {
    "uri": "http://localhost:3100",
    "labels": [
      {
        "key": "Logs",
        "value": "MindLinkLogs"
      }
    ],
    "filtrationMode": "Include",
    "filtrationLabels": [
      "Logs",
      "StartTimestamp"
    ],
    "outputTemplate": "[{Level}] {Timestamp:yyyy-MM-ddTHH:mm:ss.fffZ} {MachineName} [{SourceContext}] [{TraceSessionId}] {Message:lj}{NewLine}{Exception}",
    "useMindLinkFormatter": "true"
  }
}

Once configured to output logs to Loki, you can use Grafana's Loki data source to view and query the logs in Grafana.

LogQL

The query language for Loki is LogQL. You use it to retrieve logs from the Loki instance and display them in Grafana dashboards. For more information on LogQL, refer to the documentation at https://grafana.com/docs/loki/latest/logql/.

Distributed tracing with Zipkin

Zipkin is a distributed tracing system. It stores, indexes and correlates the traces that describe what is happening across a distributed system. This is useful for tracing the path that led to an error, seeing which operations a request involved, and using the associated timing information to identify slow code paths or failing dependent services.
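As an example of using that timing information, this Python sketch ranks the spans of one trace by duration. It assumes the spans are in Zipkin's v2 JSON format, as returned by Zipkin's /api/v2/trace/{traceId} endpoint: a list of span objects with a "name", a "duration" in microseconds and a "localEndpoint" carrying the "serviceName". The function name is hypothetical.

```python
from typing import Any


def slowest_spans(spans: list[dict[str, Any]], top: int = 5) -> list[tuple[str, str, float]]:
    """Rank Zipkin v2 spans by duration, slowest first.

    Returns (service name, span name, duration in milliseconds) tuples.
    Zipkin v2 reports "duration" in microseconds; spans without one are skipped.
    """
    ranked = []
    for span in spans:
        if "duration" not in span:
            continue
        service = span.get("localEndpoint", {}).get("serviceName", "unknown")
        ranked.append((service, span.get("name", ""), span["duration"] / 1000.0))
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked[:top]
```

The slowest entries are usually the best place to start when a request is slow, although a long parent span may simply be waiting on a slow child.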

Traces are pushed to Zipkin via the open-source OpenTelemetry (OTel) Collector (see https://opentelemetry.io/docs/collector/getting-started/). MindLink services produce OpenTelemetry traces when the global.tracing.enabled configuration setting is true, which initializes the tracing components.

You will need to provide the following advanced configuration keys to MindLink:

| Key | Value | Description |
|-----|-------|-------------|
| global.tracing.enabled | true | Determines whether tracing is turned on in the MindLink service. This is false by default. |
| debug.global.tracing.otlpendpoint | https://localhost:4317 | Specifies the OpenTelemetry Protocol (OTLP) endpoint that receives tracing data. |
| debug.global.tracing.includedbstatements | false | Determines whether traces include database statements. This is false by default. |

Zipkin dashboard

Zipkin traces are integrated into Grafana, so you can jump to a trace from a log message. Zipkin also provides its own dashboard, accessible at http://localhost:9411.
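Outside Grafana, you can make the same jump by hand. This Python sketch pulls the TraceId out of a MindLink log line and builds the matching Zipkin UI and v2 API URLs. It assumes Zipkin runs at http://localhost:9411 as above, and that the TraceId written to the log is the id Zipkin stores, which should hold when the collector forwards spans without rewriting them. The function name is hypothetical.

```python
import re
from typing import Optional

# Matches the 32-hex-digit trace id in a "[TraceId=... SpanId=...]" log suffix.
_TRACE_ID = re.compile(r"\bTraceId=([0-9a-f]{32})\b")


def zipkin_links(log_line: str, zipkin_base: str = "http://localhost:9411") -> Optional[dict[str, str]]:
    """Return Zipkin UI and API URLs for the trace a log line belongs to, or None."""
    match = _TRACE_ID.search(log_line)
    if match is None:
        return None
    trace_id = match.group(1)
    base = zipkin_base.rstrip("/")
    return {
        "ui": f"{base}/zipkin/traces/{trace_id}",  # trace view in the Zipkin web UI
        "api": f"{base}/api/v2/trace/{trace_id}",  # Zipkin v2 API: JSON list of spans
    }
```

Opening the "ui" link shows the whole distributed operation that produced the log message, across every server that took part in it.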

Grafana dashboard

Grafana is a framework for visualizing the data that deployments produce. It reads this data from multiple "data sources", such as log stores (Loki) and metric stores (Prometheus). For more information, see the documentation at https://grafana.com/docs/.

Data Sources

Grafana collects application data via configured data sources. Below you'll find some information on how the MindLink host can be configured to provide information to these data sources.

It is also possible to configure cross-references so that you can jump to traces from logs and vice-versa.