Apache SkyWalking – eBPF

Blog: eBPF enhanced HTTP observability - L7 metrics and tracing

Thu, 12 Jan 2023 00:00:00 +0000

Background

Apache SkyWalking is an open-source Application Performance Management system that helps users collect and aggregate logs, traces, metrics, and events for display on a UI. In the previous article, we introduced how to use Apache SkyWalking Rover to analyze the network performance issue in the service mesh environment. However, in business scenarios, users often rely on mature layer 7 protocols, such as HTTP, for interactions between systems. In this article, we will discuss how to use eBPF techniques to analyze performance bottlenecks of layer 7 protocols and how to enhance the tracing system using network sampling.

This article will show how to use Apache SkyWalking with eBPF to enhance metrics and traces in HTTP observability.

HTTP Protocol Analysis

HTTP is one of the most common Layer 7 protocols and is usually used to provide services to external parties and for inter-system communication. In the following sections, we will show how to identify and analyze HTTP/1.x protocols.

Protocol Identification

In HTTP/1.x, the client and server communicate through a single file descriptor (FD) on each side. Figure 1 shows the process of communication involving the following steps:

Connect/accept: The client establishes a connection with the HTTP server, or the server accepts a connection from the client.
Read/write (multiple times): The client or server reads and writes HTTPS requests and responses. A single request-response pair occurs within the same connection on each side.
Close: The client and server close the connection.

To obtain HTTP content, it’s necessary to read it from the second step of this process. As defined in the RFC, the content is contained within the data of the Layer 4 protocol and can be obtained by parsing the data. The request and response pair can be correlated because they both occur within the same connection on each side.

Figure 1: HTTP communication timeline.

HTTP Pipeline

HTTP pipelining is a feature of HTTP/1.1 that enables multiple HTTP requests to be sent over a single TCP connection without waiting for the corresponding responses. This feature is important because it ensures that the order of the responses on the server side matches the order of the requests.

Figure 2 illustrates how this works. Consider the following scenario: an HTTP client sends multiple requests to a server, and the server responds by sending the HTTP responses in the same order as the requests. This means that the first request sent by the client will receive the first response from the server, the second request will receive the second response, and so on.

When designing HTTP parsing, we should follow this principle by adding request data to a list and removing the first item when parsing a response. This ensures that the responses are processed in the correct order.

Figure 2: HTTP/1.1 pipeline.

Metrics

Based on the identification of the HTTP content and process topology diagram mentioned in the previous article, we can combine these two to generate process-to-process metrics data.

Figure 3 shows the metrics that currently support the analysis between the two processes. Based on the HTTP request and response data, we can analyze the following data:

Metrics Name	Type	Unit	Description
Request CPM(Call Per Minute)	Counter	count	The HTTP request count
Response Status CPM(Call Per Minute)	Counter	count	The count of per HTTP response status code
Request Package Size	Counter/Histogram	Byte	The request package size
Response Package Size	Counter/Histogram	Byte	The response package size
Client Duration	Counter/Histogram	Millisecond	The duration of single HTTP response on the client side
Server Duration	Counter/Histogram	Millisecond	The duration of single HTTP response on the server side

Figure 3: Process-to-process metrics.

HTTP and Trace

During the HTTP process, if we unpack the HTTP requests and responses from raw data, we can use this data to correlate with the existing tracing system.

Trace Context Identification

In order to track the flow of requests between multiple services, the trace system usually creates a trace context when a request enters a service and passes it along to other services during the request-response process. For example, when an HTTP request is sent to another server, the trace context is included in the request header.

Figure 4 displays the raw content of an HTTP request intercepted by Wireshark. The trace context information generated by the Zipkin Tracing system can be identified by the “X-B3” prefix in the header. By using eBPF to intercept the trace context in the HTTP header, we can connect the current request with the trace system.

Figure 4: View of HTTP headers in Wireshark.

Trace Event

We have added the concept of an event to traces. An event can be attached to a span and consists of start and end times, tags, and summaries, allowing us to attach any desired information to the Trace.

When performing eBPF network profiling, two events can be generated based on the request-response data. Figure 5 illustrates what happens when a service performs an HTTP request with profiling. The trace system generates trace context information and sends it in the request. When the service executes in the kernel, we can generate an event for the corresponding trace span by interacting with the request-response data and execution time in the kernel space.

Previously, we could only observe the execution status in the user space. However, by combining traces and eBPF technologies, we can now also get more information about the current trace in the kernel space, which would impact less performance for the target service if we do similar things in the tracing SDK and agent.

Figure 5: Logical view of profiling an HTTP request and response.

Sampling

To ensure efficient data storage and minimize unnecessary data sampling, we use a sampling mechanism for traces in our system. This mechanism triggers sampling only when certain conditions are met. We also provide a list of the top N traces, which allows users to quickly access the relevant request information for a specific trace.

To help users easily identify and analyze relevant events, we offer three different sampling rules:

Slow Traces: Sampling is triggered when the response time for a request exceeds a specified threshold.
Response Status [400, 500): Sampling is triggered when the response status code is greater than or equal to 400 and less than 500.
Response Status [500, 600): Sampling is triggered when the response status code is greater than or equal to 500 and less than 600.

In addition, we recognize that not all request or response raw data may be necessary for analysis. For example, users may be more interested in requesting data when trying to identify performance issues, while they may be more interested in response data when troubleshooting errors. As such, we also provide configuration options for request or response events to allow users to specify which type of data they would like to sample.

Profiling in a Service Mesh

The SkyWalking and SkyWalking Rover projects have already implemented the HTTP protocol analyze and trace associations. How do they perform when running in a service mesh environment?

Deployment

Figure 6 demonstrates the deployment of SkyWalking and SkyWalking Rover in a service mesh environment. SkyWalking Rover is deployed as a DaemonSet on each machine where a service is located and communicates with the SkyWalking backend cluster. It automatically recognizes the services on the machine and reports metadata information to the SkyWalking backend cluster. When a new network profiling task arises, SkyWalking Rover senses the task and analyzes the designated processes, collecting and aggregating network data before ultimately reporting it back to the SkyWalking backend service.

Figure 6: SkyWalking rover deployment topology in a service mesh.

Tracing Systems

Starting from version 9.3.0, the SkyWalking backend fully supports all functions in the Zipkin server. Therefore, the SkyWalking backend can collect traces from both the SkyWalking and Zipkin protocols. Similarly, SkyWalking Rover can identify and analyze trace context in both the SkyWalking and Zipkin trace systems. In the following two sections, network analysis results will be displayed in the SkyWalking and Zipkin UI respectively.

SkyWalking

When SkyWalking performs network profiling, similar to the TCP metrics in the previous article, the SkyWalking UI will first display the topology between processes. When you open the dashboard of the line representing the traffic metrics between processes, you can see the metrics of HTTP traffic from the “HTTP/1.x” tab and the sampled HTTP requests with tracing in the “HTTP Requests” tab.

As shown in Figure 7, there are three lists in the tab, each corresponding to a condition in the event sampling rules. Each list displays the traces that meet the pre-specified conditions. When you click on an item in the trace list, you can view the complete trace.

Figure 7: Sampled HTTP requests within tracing context.

When you click on an item in the trace list, you can quickly view the specified trace. In Figure 8, we can see that in the current service-related span, there is a tag with a number indicating how many HTTP events are related to that trace span.

Since we are in a service mesh environment, each service involves interacting with Envoy. Therefore, the current span includes Envoy’s request and response information. Additionally, since the current service has both incoming and outgoing requests, there are events in the corresponding span.

Figure 8: Events in the trace detail.

When the span is clicked, the details of the span will be displayed. If there are events in the current span, the relevant event information will be displayed on a time axis. As shown in Figure 9, there are a total of 6 related events in the current Span. Each event represents a data sample of an HTTP request/response. One of the events spans multiple time ranges, indicating a longer system call time. It may be due to a blocked system call, depending on the implementation details of the HTTP request in different languages. This can also help us query the possible causes of errors.

Figure 9: Events in one trace span.

Finally, we can click on a specific event to see its complete information. As shown in Figure 10, it displays the sampling information of a request, including the SkyWalking trace context protocol contained in the request header from the HTTP raw data. The raw request data allows you to quickly re-request the request to solve any issues.

Figure 10: The detail of the event.

Zipkin

Zipkin is one of the most widely used distributed tracing systems in the world. SkyWalking can function as an alternative server to provide advanced features for Zipkin users. Here, we use this way to bring the feature into the Zipkin ecosystem out-of-box. The new events would also be treated as a kind of Zipkin’s tags and annotations.

To add events to a Zipkin span, we need to do the following:

Split the start and end times of each event into two annotations with a canonical name.
Add the sampled HTTP raw data from the event to the Zipkin span tags, using the same event name for corresponding purposes.

Figures 11 and 12 show annotations and tags in the same span. In these figures, we can see that the span includes at least two events with the same event name and sequence suffix (e.g., “Start/Finished HTTP Request/Response Sampling-x” in the figure). Both events have separate timestamps to represent their relative times within the span. In the tags, the data content of the corresponding event is represented by the event name and sequence number, respectively.

Figure 11: Event timestamp in the Zipkin span annotation.

Figure 12: Event raw data in the Zipkin span tag.

Demo

In this section, we demonstrate how to perform network profiling in a service mesh and complete metrics collection and HTTP raw data sampling. To follow along, you will need a running Kubernetes environment.

Deploy SkyWalking Showcase

SkyWalking Showcase contains a complete set of example services and can be monitored using SkyWalking. For more information, please check the official documentation.

In this demo, we only deploy service, the latest released SkyWalking OAP, and UI.

export SW_OAP_IMAGE=apache/skywalking-oap-server:9.3.0
export SW_UI_IMAGE=apache/skywalking-ui:9.3.0
export SW_ROVER_IMAGE=apache/skywalking-rover:0.4.0

export FEATURE_FLAGS=mesh-with-agent,single-node,elasticsearch,rover
make deploy.kubernetes

After deployment is complete, please run the following script to open SkyWalking UI: http://localhost:8080/.

kubectl port-forward svc/ui 8080:8080 --namespace default

Start Network Profiling Task

Currently, we can select the specific instances that we wish to monitor by clicking the Data Plane item in the Service Mesh panel and the Service item in the Kubernetes panel.

In figure 13, we have selected an instance with a list of tasks in the network profiling tab.

Figure 13: Network Profiling tab in the Data Plane.

When we click the Start button, as shown in Figure 14, we need to specify the sampling rules for the profiling task. The sampling rules consist of one or more rules, each of which is distinguished by a different URI regular expression. When the HTTP request URI matches the regular expression, the rule is used. If the URI regular expression is empty, the default rule is used. Using multiple rules can help us make different sampling configurations for different requests.

Each rule has three parameters to determine if sampling is needed:

Minimal Request Duration (ms): requests with a response time exceeding the specified time will be sampled.
Sampling response status code between 400 and 499: all status codes in the range [400-499) will be sampled.
Sampling response status code between 500 and 599: all status codes in the range [500-599) will be sampled.

Once the sampling configuration is complete, we can create the task.

Figure 14: Create network profiling task page.

Done!

After a few seconds, you will see the process topology appear on the right side of the page.

When you click on the line between processes, you can view the data between the two processes, which is divided into three tabs:

TCP: displays TCP-related metrics.
HTTP/1.x: displays metrics in the HTTP 1 protocol.
HTTP Requests: displays the analyzed request and saves it to a list according to the sampling rule.

Figure 16: TCP metrics in a network profiling task.

Figure 17: HTTP/1.x metrics in a network profiling task.

Figure 18: HTTP sampled requests in a network profiling task.

Conclusion

In this article, we detailed the overview of how to analyze the Layer 7 HTTP/1.x protocol in network analysis, and how to associate it with existing trace systems. This allows us to extend the scope of data we can observe from just user space to also include kernel-space data.

In the future, we will delve further into the analysis of kernel data, such as collecting information on TCP packet size, transmission frequency, network card, and help on enhancing distributed tracing from another perspective.

Additional Resources

Zh: 使用 eBPF 提升 HTTP 可观测性 - L7 指标和追踪

Thu, 12 Jan 2023 00:00:00 +0000

背景

Apache SkyWalking 是一个开源应用性能管理系统，帮助用户收集和聚合日志、追踪、指标和事件，并在 UI 上显示。在上一篇文章中，我们介绍了如何使用 Apache SkyWalking Rover 分析服务网格环境中的网络性能问题。但是，在商业场景中，用户通常依靠成熟的第 7 层协议（如 HTTP）来进行系统之间的交互。在本文中，我们将讨论如何使用 eBPF 技术来分析第 7 层协议的性能瓶颈，以及如何使用网络采样来增强追踪系统。

本文将演示如何使用 Apache SkyWalking 与 eBPF 来增强 HTTP 可观察性中的指标和追踪。

HTTP 协议分析

HTTP 是最常用的 7 层协议之一，通常用于为外部方提供服务和进行系统间通信。在下面的章节中，我们将展示如何识别和分析 HTTP/1.x 协议。

协议识别

在 HTTP/1.x 中，客户端和服务器通过两端的单个文件描述符（File Descriptor）进行通信。图 1 显示了涉及以下步骤的通信过程：

Connect/Accept：客户端与 HTTP 服务器建立连接，或者服务器接受客户端的连接。
Read/Write（多次）：客户端或服务器读取和写入 HTTPS 请求和响应。单个请求 - 响应对在每边的同一连接内发生。
Close：客户端和服务器关闭连接。

为了获取 HTTP 内容，必须从此过程的第二步读取它。根据 RFC 定义，内容包含在 4 层协议的数据中，可以通过解析数据来获取。请求和响应对可以相关联，因为它们都在两端的同一连接内发生。

图 1：HTTP 通信时间线。

HTTP 管线化

HTTP 管线化（Pipelining）是 HTTP/1.1 的一个特性，允许在等待对应的响应的情况下在单个 TCP 连接上发送多个 HTTP 请求。这个特性很重要，因为它确保了服务器端的响应顺序必须与请求的顺序匹配。

图 2 说明了这是如何工作的，考虑以下情况：HTTP 客户端向服务器发送多个请求，服务器通过按照请求的顺序发送 HTTP 响应来响应。这意味着客户端发送的第一个请求将收到服务器的第一个响应，第二个请求将收到第二个响应，以此类推。

在设计 HTTP 解析时，我们应该遵循这个原则，将请求数据添加到列表中，并在解析响应时删除第一个项目。这可以确保响应按正确的顺序处理。

图 2： HTTP/1.1 管道。

指标

根据前文提到的 HTTP 内容和流程拓扑图的识别，我们可以将这两者结合起来生成进程间的指标数据。

图 3 显示了目前支持两个进程间分析的指标。基于 HTTP 请求和响应数据，可以分析以下数据：

指标名称	类型	单位	描述
请求 CPM（Call Per Minute）	计数器	计数	HTTP 请求计数
响应状态 CPM (Call Per Minute)	计数器	计数	每个 HTTP 响应状态码的计数
请求包大小	计数器 / 直方图	字节	请求包大小
响应包大小	计数器 / 直方图	字节	响应包大小
客户端持续时间	计数器 / 直方图	毫秒	客户端单个 HTTP 响应的持续时间
服务器持续时间	计数器 / 直方图	毫秒	服务器端单个 HTTP 响应的持续时间

图 3：进程到进程指标。

HTTP 和追踪

在 HTTP 过程中，如果我们能够从原始数据中解包 HTTP 请求和响应，就可以使用这些数据与现有的追踪系统进行关联。

追踪上下文标识

为了追踪多个服务之间的请求流，追踪系统通常在请求进入服务时创建追踪上下文，并在请求 - 响应过程中将其传递给其他服务。例如，当 HTTP 请求发送到另一个服务器时，追踪上下文包含在请求头中。

图 4 显示了 Wireshark 拦截的 HTTP 请求的原始内容。由 Zipkin Tracing 系统生成的追踪上下文信息可以通过头中的 “X-B3” 前缀进行标识。通过使用 eBPF 拦截 HTTP 头中的追踪上下文，可以将当前请求与追踪系统连接起来。

图 4：Wireshark 中的 HTTP Header 视图。

Trace 事件

我们已经将事件这个概念加入了追踪中。事件可以附加到跨度上，并包含起始和结束时间、标签和摘要，允许我们将任何所需的信息附加到追踪中。

在执行 eBPF 网络分析时，可以根据请求 - 响应数据生成两个事件。图 5 说明了在带分析的情况下执行 HTTP 请求时发生的情况。追踪系统生成追踪上下文信息并将其发送到请求中。当服务在内核中执行时，我们可以通过与内核空间中的请求 - 响应数据和执行时间交互，为相应的追踪跨度生成事件。

以前，我们只能观察用户空间的执行状态。现在，通过结合追踪和 eBPF 技术，我们还可以在内核空间获取更多关于当前追踪的信息，如果我们在追踪 SDK 和代理中执行类似的操作，将对目标服务的性能产生较小的影响。

图 5：分析 HTTP 请求和响应的逻辑视图。

抽样

该机制仅在满足特定条件时触发抽样。我们还提供了前 N 条追踪的列表，允许用户快速访问特定追踪的相关请求信息。为了帮助用户轻松识别和分析相关事件，我们提供了三种不同的抽样规则：

慢速追踪：当请求的响应时间超过指定阈值时触发抽样。
响应状态 [400,500)：当响应状态代码大于或等于 400 且小于 500 时触发抽样。
响应状态 [500,600)：当响应状态代码大于或等于 500 且小于 600 时触发抽样。

此外，我们认识到分析时可能并不需要所有请求或响应的原始数据。例如，当试图识别性能问题时，用户可能更感兴趣于请求数据，而在解决错误时，他们可能更感兴趣于响应数据。因此，我们还提供了请求或响应事件的配置选项，允许用户指定要抽样的数据类型。

服务网格中的分析

SkyWalking Rover 项目已经实现了 HTTP 协议的分析和追踪关联。当在服务网格环境中运行时它们的表现如何？

部署

图 6 演示了 SkyWalking 和 SkyWalking Rover 在服务网格环境中的部署方式。SkyWalking Rover 作为一个 DaemonSet 部署在每台服务所在的机器上，并与 SkyWalking 后端集群通信。它会自动识别机器上的服务并向 SkyWalking 后端集群报告元数据信息。当出现新的网络分析任务时，SkyWalking Rover 会感知该任务并对指定的进程进行分析，在最终将数据报告回 SkyWalking 后端服务之前，收集和聚合网络数据。

图 6：服务网格中的 SkyWalking rover 部署拓扑。

追踪系统

从版本 9.3.0 开始，SkyWalking 后端完全支持 Zipkin 服务器中的所有功能。因此，SkyWalking 后端可以收集来自 SkyWalking 和 Zipkin 协议的追踪。同样，SkyWalking Rover 可以在 SkyWalking 和 Zipkin 追踪系统中识别和分析追踪上下文。在接下来的两节中，网络分析结果将分别在 SkyWalking 和 Zipkin UI 中显示。

SkyWalking

当 SkyWalking 执行网络分析时，与前文中的 TCP 指标类似，SkyWalking UI 会首先显示进程间的拓扑图。当打开代表进程间流量指标的线的仪表板时，您可以在 “HTTP/1.x” 选项卡中看到 HTTP 流量的指标，并在 “HTTP Requests” 选项卡中看到带追踪的抽样的 HTTP 请求。

如图 7 所示，选项卡中有三个列表，每个列表对应事件抽样规则中的一个条件。每个列表显示符合预先规定条件的追踪。当您单击追踪列表中的一个项目时，就可以查看完整的追踪。

图 7：Tracing 上下文中的采样 HTTP 请求。

当您单击追踪列表中的一个项目时，就可以快速查看指定的追踪。在图 8 中，我们可以看到在当前的服务相关的跨度中，有一个带有数字的标签，表示与该追踪跨度相关的 HTTP 事件数。

由于我们在服务网格环境中，每个服务都涉及与 Envoy 交互。因此，当前的跨度包括 Envoy 的请求和响应信息。此外，由于当前的服务有传入和传出的请求，因此相应的跨度中有事件。

图 8：Tracing 详细信息中的事件。

当单击跨度时，将显示跨度的详细信息。如果当前跨度中有事件，则相关事件信息将在时间轴上显示。如图 9 所示，当前跨度中一共有 6 个相关事件。每个事件代表一个 HTTP 请求 / 响应的数据样本。其中一个事件跨越多个时间范围，表示较长的系统调用时间。这可能是由于系统调用被阻塞，具体取决于不同语言中的 HTTP 请求的实现细节。这也可以帮助我们查询错误的可能原因。

图 9：一个 Tracing 范围内的事件。

最后，我们可以单击特定的事件查看它的完整信息。如图 10 所示，它显示了一个请求的抽样信息，包括从 HTTP 原始数据中的请求头中包含的 SkyWalking 追踪上下文协议。原始请求数据允许您快速重新请求以解决任何问题。

图 10：事件的详细信息。

Zipkin

Zipkin 是世界上广泛使用的分布式追踪系统。SkyWalking 可以作为替代服务器，提供高级功能。在这里，我们使用这种方式将功能无缝集成到 Zipkin 生态系统中。新事件也将被视为 Zipkin 的标签和注释的一种。

为 Zipkin 跨度添加事件，需要执行以下操作：

将每个事件的开始时间和结束时间分别拆分为两个具有规范名称的注释。
将抽样的 HTTP 原始数据从事件添加到 Zipkin 跨度标签中，使用相同的事件名称用于相应的目的。

图 11 和图 12 显示了同一跨度中的注释和标签。在这些图中，我们可以看到跨度包含至少两个具有相同事件名称和序列后缀的事件（例如，图中的 “Start/Finished HTTP Request/Response Sampling-x”）。这两个事件均具有单独的时间戳，用于表示其在跨度内的相对时间。在标签中，对应事件的数据内容分别由事件名称和序列号表示。

图 11：Zipkin span 注释中的事件时间戳。

图 12：Zipkin span 标签中的事件原始数据。

演示

在本节中，我们将演示如何在服务网格中执行网络分析，并完成指标收集和 HTTP 原始数据抽样。要进行操作，您需要一个运行中的 Kubernetes 环境。

部署 SkyWalking Showcase

SkyWalking Showcase 包含一套完整的示例服务，可以使用 SkyWalking 进行监控。有关详细信息，请参阅官方文档。

在本演示中，我们只部署了服务、最新发布的 SkyWalking OAP 和 UI。

export SW_OAP_IMAGE=apache/skywalking-oap-server:9.3.0
export SW_UI_IMAGE=apache/skywalking-ui:9.3.0
export SW_ROVER_IMAGE=apache/skywalking-rover:0.4.0

export FEATURE_FLAGS=mesh-with-agent,single-node,elasticsearch,rover
make deploy.kubernetes

部署完成后，运行下面的脚本启动 SkyWalking UI：http://localhost:8080/。

kubectl port-forward svc/ui 8080:8080 --namespace default

启动网络分析任务

目前，我们可以通过单击服务网格面板中的 Data Plane 项和 Kubernetes 面板中的 Service 项来选择要监视的特定实例。

在图 13 中，我们已在网络分析选项卡中选择了一个具有任务列表的实例。

图 13：数据平面中的网络分析选项卡。

当我们单击 “开始” 按钮时，如图 14 所示，我们需要为分析任务指定抽样规则。抽样规则由一个或多个规则组成，每个规则都由不同的 URI 正则表达式区分。当 HTTP 请求的 URI 与正则表达式匹配时，将使用该规则。如果 URI 正则表达式为空，则使用默认规则。使用多个规则可以帮助我们为不同的请求配置不同的抽样配置。

每个规则都有三个参数来确定是否需要抽样：

最小请求持续时间（毫秒）：响应时间超过指定时间的请求将被抽样。
在 400 和 499 之间的抽样响应状态代码：范围 [400-499) 中的所有状态代码将被抽样。
在 500 和 599 之间的抽样响应状态代码：范围 [500-599) 中的所有状态码将被抽样。

抽样配置完成后，我们就可以创建任务了。

图 14：创建网络分析任务页面。

完成

几秒钟后，你会看到页面的右侧出现进程拓扑结构。

图 15：网络分析任务中的流程拓扑。

当您单击进程之间的线时，您可以查看两个过程之间的数据，它被分为三个选项卡：

TCP：显示与 TCP 相关的指标。
HTTP/1.x：显示 HTTP 1 协议中的指标。
HTTP 请求：显示已分析的请求，并根据抽样规则保存到列表中。

图 16：网络分析任务中的 TCP 指标。

图 17：网络分析任务中的 HTTP/1.x 指标。

图 18：网络分析任务中的 HTTP 采样请求。

总结

在本文中，我们详细介绍了如何在网络分析中分析 7 层 HTTP/1.x 协议，以及如何将其与现有追踪系统相关联。这使我们能够将我们能够观察到的数据从用户空间扩展到内核空间数据。

在未来，我们将进一步探究内核数据的分析，例如收集 TCP 包大小、传输频率、网卡等信息，并从另一个角度提升分布式追踪。

其他资源

Blog: Diagnose Service Mesh Network Performance with eBPF

Tue, 27 Sep 2022 00:00:00 +0000

Background

This article will show how to use Apache SkyWalking with eBPF to make network troubleshooting easier in a service mesh environment.

Apache SkyWalking is an application performance monitor tool for distributed systems. It observes metrics, logs, traces, and events in the service mesh environment and uses that data to generate a dependency graph of your pods and services. This dependency graph can provide quick insights into your system, especially when there’s an issue.

However, when troubleshooting network issues in SkyWalking’s service topology, it is not always easy to pinpoint where the error actually is. There are two reasons for the difficulty:

Traffic through the Envoy sidecar is not easy to observe. Data from Envoy’s Access Log Service (ALS) shows traffic between services (sidecar-to-sidecar), but not metrics on communication between the Envoy sidecar and the service it proxies. Without that information, it is more difficult to understand the impact of the sidecar.
There is a lack of data from transport layer (OSI Layer 4) communication. Since services generally use application layer (OSI Layer 7) protocols such as HTTP, observability data is generally restricted to application layer communication. However, the root cause may actually be in the transport layer, which is typically opaque to observability tools.

Access to metrics from Envoy-to-service and transport layer communication can make it easier to diagnose service issues. To this end, SkyWalking needs to collect and analyze transport layer metrics between processes inside Kubernetes pods - a task well suited to eBPF. We investigated using eBPF for this purpose and present our results and a demo below.

Monitoring Kubernetes Networks with eBPF

With its origins as the Extended Berkeley Packet Filter, eBPF is a general purpose mechanism for injecting and running your own code into the Linux kernel and is an excellent tool for monitoring network traffic in Kubernetes Pods. In the next few sections, we'll provide an overview of how to use eBPF for network monitoring as background for introducing Skywalking Rover, a metrics collector and profiler powered by eBPF to diagnose CPU and network performance.

How Applications and the Network Interact

Interactions between the application and the network can generally be divided into the following steps from higher to lower levels of abstraction:

User Code: Application code uses high-level network libraries in the application stack to exchange data across the network, like sending and receiving HTTP requests.
Network Library: When the network library receives a network request, it interacts with the language API to send the network data.
Language API: Each language provides an API for operating the network, system, etc. When a request is received, it interacts with the system API. In Linux, this API is called syscalls.
Linux API: When the Linux kernel receives the request through the API, it communicates with the socket to send the data, which is usually closer to an OSI Layer 4 protocol, such as TCP, UDP, etc.
Socket Ops: Sending or receiving the data to/from the NIC.

Our hypothesis is that eBPF can monitor the network. There are two ways to implement the interception: User space (uprobe) or Kernel space (kprobe). The table below summarizes the differences.

	Pros	Cons
uprobe	• Get more application-related contexts, such as whether the current request is HTTP or HTTPS. • Requests and responses can be intercepted by a single method	• Data structures can be unstable, so it is more difficult to get the desired data. • Implementation may differ between language/library versions. • Does not work in applications without symbol tables.
kprobe	• Available for all languages. • The data structure and methods are stable and do not require much adaptation. • Easier correlation with underlying data, such as getting the destination address of TCP, OSI Layer 4 protocol metrics, etc.	• A single request and response may be split into multiple probes. • Contextual information is not easy to get for stateful requests. For example header compression in HTTP/2.

For the general network performance monitor, we chose to use the kprobe (intercept the syscalls) for the following reasons:

It’s available for applications written in any programming language, and it’s stable, so it saves a lot of development/adaptation costs.
It can be correlated with metrics from the system level, which makes it easier to troubleshoot.
As a single request and response are split into multiple probes, we can use technology to correlate them.
For contextual information, It’s usually used in OSI Layer 7 protocol network analysis. So, if we just monitor the network performance, then they can be ignored.

Kprobes and network monitoring

Following the network syscalls of Linux documentation, we can implement network monitoring by intercepting two types of methods: socket operations and send/receive methods.

Socket Operations

When accepting or connecting with another socket, we can get the following information:

Connection information: Includes the remote address from the connection which helps us to understand which pod is connected.
Connection statics: Includes basic metrics from sockets, such as round-trip time (RTT), lost packet count in TCP, etc.
Socket and file descriptor (FD) mapping: Includes the relationship between the Linux file descriptor and socket object. It is useful when sending and receiving data through a Linux file descriptor.

Send/Receive

The interface related to sending or receiving data is the focus of performance analysis. It mainly contains the following parameters:

Socket file descriptor: The file descriptor of the current operation corresponding to the socket.
Buffer: The data sent or received, passed as a byte array.

Based on the above parameters, we can analyze the following data:

Bytes: The size of the packet in bytes.
Protocol: The protocol analysis according to the buffer data, such as HTTP, MySQL, etc.
Execution Time: The time it takes to send/receive the data.

At this point (Figure 1) we can analyze the following steps for the whole lifecycle of the connection:

Connect/Accept: When the connection is created.
Transform: Sending and receiving data on the connection.
Close: When the connection is closed.

Figure 1

Protocol and TLS

The previous section described how to analyze connections using send or receive buffer data. For example, following the HTTP/1.1 message specification to analyze the connection. However, this does not work for TLS requests/responses.

Figure 2

When TLS is in use, the Linux Kernel transmits data encrypted in user space. In the figure above, The application usually transmits SSL data through a third-party library (such as OpenSSL). For this case, the Linux API can only get the encrypted data, so it cannot recognize any higher layer protocol. To decrypt inside eBPF, we need to follow these steps:

Read unencrypted data through uprobe: Compatible multiple languages, using uprobe to capture the data that is not encrypted before sending or after receiving. In this way, we can get the original data and associate it with the socket.
Associate with socket: We can associate unencrypted data with the socket.

OpenSSL Use case

For example, the most common way to send/receive SSL data is to use OpenSSL as a shared library, specifically the SSL_read and SSL_write methods to submit the buffer data with the socket.

Following the documentation, we can intercept these two methods, which are almost identical to the API in Linux. The source code of the SSL structure in OpenSSL shows that the Socket FD exists in the BIO object of the SSL structure, and we can get it by the offset.

In summary, with knowledge of how OpenSSL works, we can read unencrypted data in an eBPF function.

Introducing SkyWalking Rover, an eBPF-based Metrics Collector and Profiler

SkyWalking Rover introduces the eBPF network profiling feature into the SkyWalking ecosystem. It’s currently supported in a Kubernetes environment, so must be deployed inside a Kubernetes cluster. Once the deployment is complete, SkyWalking Rover can monitor the network for all processes inside a given Pod. Based on the monitoring data, SkyWalking can generate the topology relationship diagram and metrics between processes.

Topology Diagram

The topology diagram can help us understand the network access between processes inside the same Pod, and between the process and external environment (other Pod or service). Additionally, it can identify the data direction of traffic based on the line flow direction.

In Figure 3 below, all nodes within the hexagon are the internal process of a Pod, and nodes outside the hexagon are externally associated services or Pods. Nodes are connected by lines, which indicate the direction of requests or responses between nodes (client or server). The protocol is indicated on the line, and it’s either HTTP(S), TCP, or TCP(TLS). Also, we can see in this figure that the line between Envoy and Python applications is bidirectional because Envoy intercepts all application traffic.

Figure 3

Metrics

Once we recognize the network call relationship between processes through the topology, we can select a specific line and view the TCP metrics between the two processes.

The diagram below (Figure 4) shows the metrics of network monitoring between two processes. There are four metrics in each line. Two on the left side are on the client side, and two on the right side are on the server side. If the remote process is not in the same Pod, only one side of the metrics is displayed.

Figure 4

The following two metric types are available:

Counter: Records the total number of data in a certain period. Each counter contains the following data: a. Count: Execution count. b. Bytes: Packet size in bytes. c. Execution time: Execution duration.
Histogram: Records the distribution of data in the buckets.

Based on the above data types, the following metrics are exposed:

Name	Type	Unit	Description
Write	Counter and histogram	Millisecond	The socket write counter.
Read	Counter and histogram	Millisecond	The socket read counter.
Write RTT	Counter and histogram	Microsecond	The socket write round trip time (RTT) counter.
Connect	Counter and histogram	Millisecond	The socket connect/accept with another server/client counter.
Close	Counter and histogram	Millisecond	The socket with other socket counter.
Retransmit	Counter	Millisecond	The socket retransmit package counter.
Drop	Counter	Millisecond	The socket drop package counter.

Demo

In this section, we demonstrate how to perform network profiling in the service mesh. To follow along, you will need a running Kubernetes environment.

NOTE: All commands and scripts are available in this GitHub repository.

Install Istio

Istio is the most widely deployed service mesh, and comes with a complete demo application that we can use for testing. To install Istio and the demo application, follow these steps:

Install Istio using the demo configuration profile.
Label the default namespace, so Istio automatically injects Envoy sidecar proxies when we’ll deploy the application.
Deploy the bookinfo application to the cluster.
Deploy the traffic generator to generate some traffic to the application.

export ISTIO_VERSION=1.13.1

# install istio
istioctl install -y --set profile=demo
kubectl label namespace default istio-injection=enabled

# deploy the bookinfo applications
kubectl apply -f https://raw.githubusercontent.com/istio/istio/$ISTIO_VERSION/samples/bookinfo/platform/kube/bookinfo.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/$ISTIO_VERSION/samples/bookinfo/networking/bookinfo-gateway.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/$ISTIO_VERSION/samples/bookinfo/networking/destination-rule-all.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/$ISTIO_VERSION/samples/bookinfo/networking/virtual-service-all-v1.yaml

# generate traffic
kubectl apply -f https://raw.githubusercontent.com/mrproliu/skywalking-network-profiling-demo/main/resources/traffic-generator.yaml

Install SkyWalking

The following will install the storage, backend, and UI needed for SkyWalking:

git clone https://github.com/apache/skywalking-kubernetes.git
cd skywalking-kubernetes
cd chart
helm dep up skywalking
helm -n istio-system install skywalking skywalking \
 --set fullnameOverride=skywalking \
 --set elasticsearch.minimumMasterNodes=1 \
 --set elasticsearch.imageTag=7.5.1 \
 --set oap.replicas=1 \
 --set ui.image.repository=apache/skywalking-ui \
 --set ui.image.tag=9.2.0 \
 --set oap.image.tag=9.2.0 \
 --set oap.envoy.als.enabled=true \
 --set oap.image.repository=apache/skywalking-oap-server \
 --set oap.storageType=elasticsearch \
 --set oap.env.SW_METER_ANALYZER_ACTIVE_FILES='network-profiling'

Install SkyWalking Rover

SkyWalking Rover is deployed on every node in Kubernetes, and it automatically detects the services in the Kubernetes cluster. The network profiling feature has been released in the version 0.3.0 of SkyWalking Rover. When a network monitoring task is created, the SkyWalking rover sends the data to the SkyWalking backend.

kubectl apply -f https://raw.githubusercontent.com/mrproliu/skywalking-network-profiling-demo/main/resources/skywalking-rover.yaml

Start the Network Profiling Task

Once all deployments are completed, we must create a network profiling task for a specific instance of the service in the SkyWalking UI.

To open SkyWalking UI, run:

kubectl port-forward svc/skywalking-ui 8080:80 --namespace
istio-system

Currently, we can select the specific instances that we wish to monitor by clicking the Data Plane item in the Service Mesh panel and the Service item in the Kubernetes panel.

In the figure below, we have selected an instance with a list of tasks in the network profiling tab. When we click the start button, the SkyWalking Rover starts monitoring this instance’s network.

Figure 5

Done!

After a few seconds, you will see the process topology appear on the right side of the page.

Figure 6

When you click on the line between processes, you can see the TCP metrics between the two processes.

Figure 7

Conclusion

In this article, we detailed a problem that makes troubleshooting service mesh architectures difficult: lack of context between layers in the network stack. These are the cases when eBPF begins to really help with debugging/productivity when existing service mesh/envoy cannot. Then, we researched how eBPF could be applied to common communication, such as TLS. Finally, we demo the implementation of this process with SkyWalking Rover.

For now, we have completed the performance analysis for OSI layer 4 (mostly TCP). In the future, we will also introduce the analysis for OSI layer 7 protocols like HTTP.

Get Started with Istio

To get started with service mesh today, Tetrate Istio Distro is the easiest way to install, manage, and upgrade Istio. It provides a vetted upstream distribution of Istio that’s tested and optimized for specific platforms by Tetrate plus a CLI that facilitates acquiring, installing, and configuring multiple Istio versions. Tetrate Istio Distro also offers FIPS certified Istio builds for FedRAMP environments.

For enterprises that need a unified and consistent way to secure and manage services and traditional workloads across complex, heterogeneous deployment environments, we offer Tetrate Service Bridge, our flagship edge-to-workload application connectivity platform built on Istio and Envoy.

Additional Resources

Blog: Pinpoint Service Mesh Critical Performance Impact by using eBPF

Tue, 05 Jul 2022 00:00:00 +0000

Content

Background

Apache SkyWalking observes metrics, logs, traces, and events for services deployed into the service mesh. When troubleshooting, SkyWalking error analysis can be an invaluable tool helping to pinpoint where an error occurred. However, performance problems are more difficult: It’s often impossible to locate the root cause of performance problems with pre-existing observation data. To move beyond the status quo, dynamic debugging and troubleshooting are essential service performance tools. In this article, we’ll discuss how to use eBPF technology to improve the profiling feature in SkyWalking and analyze the performance impact in the service mesh.

Trace Profiling in SkyWalking

Since SkyWalking 7.0.0, Trace Profiling has helped developers find performance problems by periodically sampling the thread stack to let developers know which lines of code take more time. However, Trace Profiling is not suitable for the following scenarios:

Thread Model: Trace Profiling is most useful for profiling code that executes in a single thread. It is less useful for middleware that relies heavily on async execution models. For example Goroutines in Go or Kotlin Coroutines.
Language: Currently, Trace Profiling is only supported in Java and Python, since it’s not easy to obtain the thread stack in the runtimes of some languages such as Go and Node.js.
Agent Binding: Trace Profiling requires Agent installation, which can be tricky depending on the language (e.g., PHP has to rely on its C kernel; Rust and C/C++ require manual instrumentation to make install).
Trace Correlation: Since Trace Profiling is only associated with a single request it can be hard to determine which request is causing the problem.
Short Lifecycle Services: Trace Profiling doesn’t support short-lived services for (at least) two reasons:
1. It’s hard to differentiate system performance from class code manipulation in the booting stage.
2. Trace profiling is linked to an endpoint to identify performance impact, but there is no endpoint to match these short-lived services.

Fortunately, there are techniques that can go further than Trace Profiling in these situations.

Introduce eBPF

We have found that eBPF — a technology that can run sandboxed programs in an operating system kernel and thus safely and efficiently extend the capabilities of the kernel without requiring kernel modifications or loading kernel modules — can help us fill gaps left by Trace Profiling. eBPF is a trending technology because it breaks the traditional barrier between user and kernel space. Programs can now inject bytecode that runs in the kernel, instead of having to recompile the kernel to customize it. This is naturally a good fit for observability.

In the figure below, we can see that when the system executes the execve syscalls, the eBPF program is triggered, and the current process runtime information is obtained by using function calls.

Using eBPF technology, we can expand the scope of Skywalking’s profiling capabilities:

Global Performance Analysis: Before eBPF, data collection was limited to what agents can observe. Since eBPF programs run in the kernel, they can observe all threads. This is especially useful when you are not sure whether a performance problem is caused by a particular request.
Data Content: eBPF can dump both user and kernel space thread stacks, so if a performance issue happens in kernel space, it’s easier to find.
Agent Binding: All modern Linux kernels support eBPF, so there is no need to install anything. This means it is an orchestration-free vs an agent model. This reduces friction caused by built-in software which may not have the correct agents installed, such as Envoy in a Service Mesh.
Sampling Type: Unlike Trace Profiling, eBPF is event-driven and, therefore, not constrained by interval polling. For example, eBPF can trigger events and collect more data depending on a transfer size threshold. This can allow the system to triage and prioritize data collection under extreme load.

eBPF Limitations

While eBPF offers significant advantages for hunting performance bottlenecks, no technology is perfect. eBPF has a number of limitations described below. Fortunately, since SkyWalking does not require eBPF, the impact is limited.

Linux Version Requirement: eBPF programs require a Linux kernel version above 4.4, with later kernel versions offering more data to be collected. The BCC has documented the features supported by different Linux kernel versions, with the differences between versions usually being what data can be collected with eBPF.
Privileges Required: All processes that intend to load eBPF programs into the Linux kernel must be running in privileged mode. As such, bugs or other issues in such code may have a big impact.
Weak Support for Dynamic Language: eBPF has weak support for JIT-based dynamic languages, such as Java. It also depends on what data you want to collect. For Profiling, eBPF does not support parsing the symbols of the program, which is why most eBPF-based profiling technologies only support static languages like C, C++, Go, and Rust. However, symbol mapping can sometimes be solved through tools provided by the language. For example, in Java, perf-map-agent can be used to generate the symbol mapping. However, dynamic languages don’t support the attach (uprobe) functionality that would allow us to trace execution events through symbols.

Introducing SkyWalking Rover

SkyWalking Rover introduces the eBPF profiling feature into the SkyWalking ecosystem. The figure below shows the overall architecture of SkyWalking Rover. SkyWalking Rover is currently supported in Kubernetes environments and must be deployed inside a Kubernetes cluster. After establishing a connection with the SkyWalking backend server, it saves information about the processes on the current machine to SkyWalking. When the user creates an eBPF profiling task via the user interface, SkyWalking Rover receives the task and executes it in the relevant C, C++, Golang, and Rust language-based programs.

Other than an eBPF-capable kernel, there are no additional prerequisites for deploying SkyWalking Rover.

CPU Profiling with Rover

CPU profiling is the most intuitive way to show service performance. Inspired by Brendan Gregg‘s blog post, we’ve divided CPU profiling into two types that we have implemented in Rover:

On-CPU Profiling: Where threads are spending time running on-CPU.
Off-CPU Profiling: Where time is spent waiting while blocked on I/O, locks, timers, paging/swapping, etc.

Profiling Envoy with eBPF

Envoy is a popular proxy, used as the data plane by the Istio service mesh. In a Kubernetes cluster, Istio injects Envoy into each service’s pod as a sidecar where it transparently intercepts and processes incoming and outgoing traffic. As the data plane, any performance issues in Envoy can affect all service traffic in the mesh. In this scenario, it’s more powerful to use eBPF profiling to analyze issues in production caused by service mesh configuration.

Demo Environment

If you want to see this scenario in action, we’ve built a demo environment where we deploy an Nginx service for stress testing. Traffic is intercepted by Envoy and forwarded to Nginx. The commands to install the whole environment can be accessed through GitHub.

On-CPU Profiling

On-CPU profiling is suitable for analyzing thread stacks when service CPU usage is high. If the stack is dumped more times, it means that the thread stack occupies more CPU resources.

When installing Istio using the demo configuration profile, we found there are two places where we can optimize performance:

Zipkin Tracing: Different Zipkin sampling percentages have a direct impact on QPS.
Access Log Format: Reducing the fields of the Envoy access log can improve QPS.

Zipkin Tracing

Zipkin with 100% sampling

In the default demo configuration profile, Envoy is using 100% sampling as default tracing policy. How does that impact the performance?

As shown in the figure below, using the on-CPU profiling, we found that it takes about 16% of the CPU overhead. At a fixed consumption of 2 CPUs, its QPS can reach 5.7K.

Disable Zipkin tracing

At this point, we found that if Zipkin is not necessary, the sampling percentage can be reduced or we can even disable tracing. Based on the Istio documentation, we can disable tracing when installing the service mesh using the following command:

istioctl install -y --set profile=demo \
   --set 'meshConfig.enableTracing=false' \
   --set 'meshConfig.defaultConfig.tracing.sampling=0.0'

After disabling tracing, we performed on-CPU profiling again. According to the figure below, we found that Zipkin has disappeared from the flame graph. With the same 2 CPU consumption as in the previous example, the QPS reached 9K, which is an almost 60% increase.

Tracing with Throughput

With the same CPU usage, we’ve discovered that Envoy performance greatly improves when the tracing feature is disabled. Of course, this requires us to make trade-offs between the number of samples Zipkin collects and the desired performance of Envoy (QPS).

The table below illustrates how different Zipkin sampling percentages under the same CPU usage affect QPS.

Zipkin sampling %	QPS	CPUs	Note
100% (default)	5.7K	2	16% used by Zipkin
1%	8.1K	2	0.3% used by Zipkin
disabled	9.2K	2	0% used by Zipkin

Access Log Format

Default Log Format

In the default demo configuration profile, the default Access Log format contains a lot of data. The flame graph below shows various functions involved in parsing the data such as request headers, response headers, and streaming the body.

Simplifying Access Log Format

Typically, we don’t need all the information in the access log, so we can often simplify it to get what we need. The following command simplifies the access log format to only display basic information:

istioctl install -y --set profile=demo \
   --set meshConfig.accessLogFormat="[%START_TIME%] \"%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%\" %RESPONSE_CODE%\n"

After simplifying the access log format, we found that the QPS increased from 5.7K to 5.9K. When executing the on-CPU profiling again, the CPU usage of log formatting dropped from 2.4% to 0.7%.

Simplifying the log format helped us to improve the performance.

Off-CPU Profiling

Off-CPU profiling is suitable for performance issues that are not caused by high CPU usage. For example, when there are too many threads in one service, using off-CPU profiling could reveal which threads spend more time context switching.

We provide data aggregation in two dimensions:

Switch count: The number of times a thread switches context. When the thread returns to the CPU, it completes one context switch. A thread stack with a higher switch count spends more time context switching.
Switch duration: The time it takes a thread to switch the context. A thread stack with a higher switch duration spends more time off-CPU.

Write Access Log

Enable Write

Using the same environment and settings as before in the on-CPU test, we performed off-CPU profiling. As shown below, we found that access log writes accounted for about 28% of the total context switches. The “__write” shown below also indicates that this method is the Linux kernel method.

Disable Write

SkyWalking implements Envoy’s Access Log Service (ALS) feature which allows us to send access logs to the SkyWalking Observability Analysis Platform (OAP) using the gRPC protocol. Even by disabling the access logging, we can still use ALS to capture/aggregate the logs. We’ve disabled writing to the access log using the following command:

istioctl install -y --set profile=demo --set meshConfig.accessLogFile=""

After disabling the Access Log feature, we performed the off-CPU profiling. File writing entries have disappeared as shown in the figure below. Envoy throughput also increased from 5.7K to 5.9K.

Conclusion

In this article, we’ve examined the insights Apache Skywalking’s Trace Profiling can give us and how much more can be achieved with eBPF profiling. All of these features are implemented in skywalking-rover. In addition to on- and off-CPU profiling, you will also find the following features:

Continuous profiling, helps you automatically profile without manual intervention. For example, when Rover detects that the CPU exceeds a configurable threshold, it automatically executes the on-CPU profiling task.
More profiling types to enrich usage scenarios, such as network, and memory profiling.