Link™ Capture Software
for Napatech FPGA-based SmartNICs
Link™ Capture Software is ideal for performing high-speed packet capture with nanosecond timestamping and replay with precise inter-frame gap control, which is critical when replaying captured traffic for troubleshooting or simulation of traffic flows.
The software supports a broad range of applications and use cases – and can immediately improve an organization’s ability to monitor and react to events that occur within its network infrastructure.
• Zero packet loss under all conditions
• Full throughput up to 100 Gbps bi-directional
• Nanosecond timestamping and packet merge
• 50 million flows with stateful match/action
• Flow records with metrics for both directions
• PCAP and DPDK API support
• Achieve complete network visibility and limit massive costs from cyberattacks or infrastructure issues
• Increase application performance from every server by offloading heavy workloads
• Reduce system costs by using fewer servers to achieve target performance
• Limit OPEX by cutting rack space, power, cooling and management
Multiplied Performance for 3rd Party Applications
Link™ Capture Software has been benchmarked across a wide range of third-party, commercial and open source networking and cybersecurity applications. Common to these is the unconditional requirement for line rate throughput for all packet sizes, with 100% lossless packet forwarding and capture, for a multitude of sessions, users and flows. With Link™ Capture Software, the performance improvements are outstanding, delivering more than triple the performance of servers with standard NIC configurations. This means that only a third of the server resources are needed to run the same application.
With guaranteed zero packet loss and deterministic performance under all conditions, Link™ Capture Software allows enterprises to develop and deploy their own applications based on low-cost servers.
Intrusion detection and prevention
Link™ Capture Software is uniquely suited for lossless acceleration of Suricata. Optimized to capture all traffic at full line rate with almost no CPU load on the host server, the solution demonstrates outstanding performance advantages.
Network security monitor
For network security monitors like Zeek, missing the slightest fraction of traffic is unacceptable. Link™ Capture Software provides 100% lossless packet forwarding and capture, ensuring complete traffic visibility for the application.
Network traffic recorder
An unconditional prerequisite for the n2disk™ network traffic recorder application to be successful is that all network packets are captured with zero loss. This is where Link™ Capture can help.
Intrusion detection and prevention
Snort is an ideal example of the type of security app that can achieve better performance with Link™ Capture. The tool is designed to keep up with line rates on commodity hardware, but this requires 100% packet capture.
Optimized for lossless transmit and receive, the Link™ Capture Software offers substantial performance advantages for TRex: up to 4x traffic transmit performance and 16x reception performance.
To decode all traffic, it is a fundamental requirement that Wireshark sees everything. If the capture server is overburdened, packets are discarded and information is lost forever. Link™ Capture gives Wireshark flawless vision.
Napatech Link™ Capture Software transfers data to your application using a non-blocking data delivery mechanism that ensures efficient utilization of the PCIe bus and maximum throughput for all packet sizes.
Link™ Capture Software for Napatech
Napatech Link™ Capture Software provides the Napatech SmartNIC family with a common feature set and driver software architecture allowing plug-and-play support for any SmartNIC combination. Key features are listed below.
Stateful Flow Management
Multi-port packet sequence
CPU Socket Load Balancer
The intelligent feature set offloads processing and analysis of Ethernet data from application software while ensuring optimal use of the standard server’s resources leading to effective application acceleration. See below for details.
Stateful Flow Management (NT200A02 only)
As network speeds continue to rise, the challenge for CPU-bound monitoring and security applications is keeping up with the massive volumes of traffic they need to process. Stateful flow management can help alleviate this challenge by enabling performance improvements at two levels:
– Reducing the load on the application by shunting irrelevant traffic flows
– Accelerating the application using per-flow match/action processing
As offloaded flows are processed entirely by the SmartNIC, stateful flow management enables applications to significantly increase their throughput and save valuable compute cycles. Per-flow match/action in hardware gives control back to the user, providing additional computation to the application by reducing the amount of data needed for processing, as certain flows or protocols no longer need monitoring and can be blocked in hardware.
For every packet, the stateful flow management feature can perform a lookup in the flow table and perform any needed updates, like changing timestamp, incrementing metrics, changing state, and writing the updated flow record back to the flow table. This significantly offloads CPU-bound applications, such as application/network monitoring apps, telecom subscriber analytics apps, or intrusion detection systems (IDS) with a similar requirement for forwarding or dropping packets based on a flow table lookup.
In this diagram, the stateful flow management feature is used to offload an application, in this instance a NetFlow generator. When the first packet arrives, it is not recognized by the flow management and the packet is therefore forwarded to the host application (slow path). The host application saves the needed meta data and metrics from the first packet and offloads the following flows to the hardware by configuring the flow management to shunt similar flows. If more packets arrive before the feature has been configured for flow shunting, these will also be forwarded to the application which will collect the meta data and handle the packet as configured.
Next, the flow is terminated through timeout or TCP flow termination, and a flow termination event is generated by the flow management and sent to the application. This record will contain information about the flow and the meta data collected. The application can then combine the meta data collected by the flow management with the meta data collected by the application, build the NetFlow record and send it off to an external flow collector.
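The slow path / shunt sequence described above can be sketched as a small simulation. This is an illustrative model only, not the Napatech API: the `FlowTable` class, its method names and the record fields are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    packets: int = 0
    byte_count: int = 0
    last_seen_ns: int = 0


class FlowTable:
    """Toy model of a hardware flow table with a host slow path (hypothetical)."""

    def __init__(self):
        self.offloaded = {}      # flow key -> FlowRecord, handled "in hardware"
        self.slow_path_log = []  # packets that were forwarded to the host

    def offload(self, key):
        # The host application asks the flow manager to shunt this flow.
        self.offloaded.setdefault(key, FlowRecord())

    def receive(self, key, length, timestamp_ns):
        record = self.offloaded.get(key)
        if record is None:
            # Unknown flow: forward the packet to the host (slow path).
            self.slow_path_log.append((key, length, timestamp_ns))
            return "host"
        # Known flow: update the record without involving the host.
        record.packets += 1
        record.byte_count += length
        record.last_seen_ns = timestamp_ns
        return "shunted"

    def terminate(self, key):
        # Flow ended (timeout or TCP termination): emit the final record.
        return self.offloaded.pop(key)


table = FlowTable()
flow = ("10.0.0.1", "10.0.0.2", 40000, 443, "tcp")
first = table.receive(flow, 74, 1_000)     # first packet goes to the host
table.offload(flow)                        # host configures shunting
second = table.receive(flow, 1500, 2_000)  # handled by the flow table
third = table.receive(flow, 1500, 3_000)
final = table.terminate(flow)              # flow termination record
print(first, second, third, final)
```

In this model the host would combine `final` with the metadata it collected from the slow-path packet before building the NetFlow record.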
Napatech FPGA SmartNICs are highly optimized to capture network traffic at full line-rate, with almost no CPU load on the host server, for all frame sizes. Zero-loss packet capture is critical for applications that need to analyze all the network traffic. If anything needs to be discarded, it is a matter of choice by the application, not a limitation of the SmartNIC.
Standard network interface cards (NICs) are not designed for analysis applications where all traffic on a connection or link needs to be analyzed. NICs are designed for communication where data that is not addressed to the sender or receiver is simply discarded. This means that NICs are not designed to have the capacity to handle the amount of data that is regularly transmitted in bursts on Ethernet connections. In these burst situations, all of the bandwidth of a connection is used, requiring the capacity to analyze all Ethernet frames. Napatech FPGA SmartNICs are designed specifically for this task and provide the maximum theoretical packet capture capacity.
Napatech FPGA SmartNICs provide on-board memory for buffering of Ethernet frames. Buffering assures guaranteed delivery of data, even when there is congestion in the delivery of data to the application. There are three potential sources of congestion: the PCI interface, the server platform, and the analysis application.
PCI interfaces provide a fixed bandwidth for transfer of data from the SmartNIC to the application. This limits the amount of data that can be continuously transferred from the network to the application. For example, a 16-lane PCIe Gen3 interface can transfer up to 115 Gbps of data to the application. If the network speed is 2×100 Gbps, a burst of data cannot be transferred over the PCIe Gen3 interface in real time, since the data rate is nearly twice the maximum PCIe bandwidth. In this case, the onboard packet buffering on the Napatech SmartNIC can absorb the burst and ensure that none of the data is lost, allowing the frames to be transferred once the burst has passed.
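As a rough worked example of the burst case above, the time a buffer can absorb a sustained burst is its size divided by the excess of ingress over drain rate. The sketch assumes the 12 GB burst buffer listed for the NT200A02 in the feature table, decimal gigabytes, and that the whole buffer is usable for packet data; the function name is hypothetical.

```python
def burst_absorb_seconds(buffer_bytes, ingress_gbps, drain_gbps):
    """Seconds a buffer can absorb a sustained burst before it overflows."""
    excess_bps = (ingress_gbps - drain_gbps) * 1e9
    if excess_bps <= 0:
        return float("inf")  # the host link keeps up; the buffer never fills
    return buffer_bytes * 8 / excess_bps


# 2x100 Gbps arriving, 115 Gbps leaving over PCIe Gen3 x16, 12 GB buffer.
seconds = burst_absorb_seconds(12e9, ingress_gbps=200, drain_gbps=115)
print(f"{seconds:.2f} s")
```

Under these assumptions a full-rate burst on both ports can be absorbed for a little over one second before any frame would have to be dropped.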
Servers and applications can be configured in such a way that congestion can occur in the server infrastructure or in the application itself. The CPU cores can be busy processing or retrieving data from remote caches and memory locations, which means that new Ethernet frames cannot be transferred from the SmartNIC.
In addition, the application can be configured with only one or a few processing threads, which can result in the application being overloaded, meaning that new Ethernet frames cannot be transferred. With onboard packet buffering, the Ethernet frames can be delayed until the server or the application is ready to accept them. This ensures that no Ethernet frames are lost and that all the data is made available for analysis when needed.
Modern servers provide unprecedented processing power with multi-core CPU implementations. This makes standard servers an ideal platform for appliance development. But, to fully harness the processing power of modern servers, it is important that the analysis application is multi-threaded and that the right Ethernet frames are provided to the right CPU core for processing. Not only that, but the frames must be provided at the right time to ensure that analysis can be performed in real time.
Napatech Multi-CPU distribution is built and optimized from our extensive knowledge of server architecture, as well as real life experience from our customers.
Napatech FPGA SmartNICs ensure that identified flows of related Ethernet frames are distributed in an optimal way to the available CPU cores. This ensures that the processing load is balanced across the available processing resources, and that the right frames are being processed by the right CPU cores.
With flow distribution to multiple CPU cores, the throughput performance of the analysis application can be increased linearly with the number of cores, up to 128. Not only that, but the performance can also be scaled by faster processing cores. This highly flexible mechanism enables many different ways of designing a solution and provides the ability to optimize for cost and/or performance.
Napatech FPGA SmartNICs support different distribution schemes that are fully configurable:
• Distribution per port: all frames captured on a physical port are transferred to the same CPU or a range of CPU cores for processing
• Distribution per traffic type: frames of the same protocol type are transferred to the same CPU or a range of CPU cores for processing
• Distribution by flows: frames with the same hash value are sent to the same CPU or a range of CPU cores for processing
• Combinations of the above
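Distribution by flows can be illustrated with a symmetric hash, so that both directions of a conversation reach the same core. This is a minimal sketch: CRC32 and the key layout are assumptions chosen for illustration, not the hash the SmartNIC computes, and `core_for_flow` is a hypothetical name.

```python
import zlib


def core_for_flow(src_ip, dst_ip, src_port, dst_port, protocol, num_cores):
    """Map a 5-tuple to a core index; both directions map to the same core."""
    # Sort the two endpoints so that A->B and B->A produce the same key.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{protocol}".encode()
    return zlib.crc32(key) % num_cores


up = core_for_flow("10.0.0.1", "192.0.2.7", 40000, 443, 6, num_cores=8)
down = core_for_flow("192.0.2.7", "10.0.0.1", 443, 40000, 6, num_cores=8)
print(up, down)  # the two values are equal
```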
The ability to establish the precise time when frames have been captured is critical to many applications.
To achieve this, all Napatech FPGA SmartNICs are capable of providing a high-precision time stamp, sampled with 1 nanosecond resolution, for every frame captured and transmitted.
At 10 Gbps, an Ethernet frame can be received and transmitted every 67 nanoseconds. At 100 Gbps, this time is reduced to 6.7 nanoseconds. This makes nanosecond-precision time-stamping essential for uniquely identifying when a frame is received. This incredible precision also enables you to sequence and merge frames from multiple ports on multiple FPGA SmartNICs into a single, time-ordered analysis stream.
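The 67 ns and 6.7 ns figures follow from the minimum Ethernet frame: 64 bytes of frame plus 8 bytes of preamble and start-of-frame delimiter and a 12-byte inter-frame gap, or 84 bytes on the wire. A short check, with a hypothetical helper name:

```python
def min_frame_time_ns(link_gbps, frame_bytes=64):
    """Time one frame occupies the wire, including preamble and inter-frame gap."""
    wire_bytes = frame_bytes + 8 + 12  # preamble/SFD + minimum inter-frame gap
    return wire_bytes * 8 / link_gbps  # bits divided by Gbit/s gives nanoseconds


for speed in (10, 100):
    print(f"{speed} Gbps: {min_frame_time_ns(speed):.2f} ns")
```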
In order to work smoothly in the different operating systems supported, Napatech FPGA SmartNICs support a range of industry standard time stamp formats, and also offer a choice of resolution to suit different types of applications.
64-bit time stamp formats:
• 2 Windows formats with 10-ns or 100-ns resolution
• Native UNIX format with 10-ns resolution
• 2 PCAP formats with 1-ns or 1000-ns resolution
Napatech FPGA SmartNICs use a buffering strategy that allocates a number of large memory buffers where as many packets as possible are placed back-to-back in each buffer. Using this implementation, only the first access to a packet in the buffer is affected by the access time to external memory. Thanks to cache pre-fetch, the subsequent packets are already in the level 1 cache before the CPU needs them. As hundreds or even thousands of packets can be placed in a buffer, a very high CPU cache performance can be achieved leading to application acceleration.
Buffer configuration can have a dramatic effect on the performance of analysis applications. Different applications have different requirements when it comes to latency or processing. It is therefore extremely important that the number and size of buffers can be optimized for the given application. Napatech FPGA SmartNICs make this possible.
The flexible server buffer structure supported by Napatech FPGA SmartNICs can be optimized for different application requirements. For example, applications needing short latency can have frames delivered in small chunks, optionally with a fixed maximum latency. Applications without latency requirements can benefit from data delivered in large chunks, which makes server CPU processing more efficient. Applications that need to correlate information distributed across packets can configure larger server buffers (up to 128 GB).
Up to 128 buffers can be configured and combined with Napatech multi-CPU distribution (see “Multi-CPU distribution”).
Multi-port packet sequence
Napatech FPGA SmartNICs typically provide multiple ports. Ports are usually paired, with one port receiving upstream packets and another port receiving downstream packets. Since these two flows going in different directions need to be analyzed as one, packets from both ports must be merged into a single analysis stream. Napatech FPGA SmartNICs can sequence and merge packets received on multiple ports in hardware using the precise time stamps of each Ethernet frame. This is highly efficient and offloads a significant and costly task from the analysis application.
There is a growing need for analysis appliances that are able to monitor and analyze multiple points in the network, and even provide a network-wide view of what is happening. Not only does this require multiple FPGA SmartNICs to be installed in a single appliance, but it also requires that the analysis data from all ports on every accelerator be correlated.
With the Napatech Software Suite, it is possible to sequence and merge the analysis data from multiple FPGA SmartNICs into a single analysis stream. The merging is based on the nanosecond precision time stamps of each Ethernet frame, allowing a time-ordered merge of individual data streams.
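A time-ordered merge of per-port or per-SmartNIC streams can be sketched in software with a k-way merge on the timestamp. This only illustrates the principle, since the SmartNIC performs the merge in hardware; the `(timestamp_ns, label)` tuples and the `merge_streams` helper are assumptions for the example.

```python
import heapq


def merge_streams(*streams):
    """Merge streams of (timestamp_ns, packet) that are each already time-ordered."""
    return list(heapq.merge(*streams, key=lambda pkt: pkt[0]))


port0 = [(100, "p0-a"), (250, "p0-b"), (400, "p0-c")]  # e.g. upstream
port1 = [(120, "p1-a"), (260, "p1-b")]                 # e.g. downstream
merged = merge_streams(port0, port1)
print(merged)
```

The merge is only correct if every stream carries timestamps from a common, synchronized clock, which is why the precision of the time stamps matters.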
In mobile networks, all subscriber Internet traffic is carried in GTP (GPRS Tunneling Protocol) or IP-in-IP tunnels between nodes in the mobile core. IP-in-IP tunnels are also used in enterprise networks. Monitoring traffic over interfaces between these nodes is crucial for assuring Quality of Service (QoS).
Napatech FPGA SmartNICs decode these tunnels, providing the ability to correlate and load balance based on flows inside the tunnels. Analysis applications can use this capability to test, secure, and optimize mobile networks and services. To effectively analyze the multiple services associated with each subscriber, it is important to separate them and analyze each one individually. Napatech FPGA SmartNICs have the capability to identify the contents of tunnels, allowing for analysis of each service used by a subscriber. This quickly provides the needed information to the application, and allows for efficient analysis of network and application traffic. The Napatech features for frame classification, flow identification, filtering, coloring, slicing, and intelligent multi-CPU distribution can thus be applied to the contents of the tunnel rather than the tunnel itself, leading to a more balanced processing and a more efficient analysis.
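To show what looking inside the tunnel means, the sketch below reads a GTPv1-U header and returns the tunnel ID and the inner IPv4 addresses, which is the information a flow hash would then use. It is a simplified software illustration under stated limits: only G-PDU messages with an inner IPv4 packet are handled, extension headers are rejected, and the function name is hypothetical.

```python
import ipaddress
import struct

GTPU_G_PDU = 0xFF  # message type that carries a user packet


def inner_ipv4_endpoints(gtpu):
    """Return (teid, inner_src, inner_dst) for a GTPv1-U G-PDU, else None."""
    if len(gtpu) < 8:
        return None
    flags, msg_type, _length, teid = struct.unpack_from("!BBHI", gtpu, 0)
    if flags >> 5 != 1 or msg_type != GTPU_G_PDU:
        return None             # not GTP version 1 or not a G-PDU
    offset = 8
    if flags & 0x07:            # E, S or PN flag set: 4 optional bytes follow
        offset = 12
        if len(gtpu) < 12 or (flags & 0x04 and gtpu[11] != 0):
            return None         # extension headers are not handled in this sketch
    inner = gtpu[offset:]
    if len(inner) < 20 or inner[0] >> 4 != 4:
        return None             # only inner IPv4 is handled here
    src = ipaddress.IPv4Address(inner[12:16])
    dst = ipaddress.IPv4Address(inner[16:20])
    return teid, str(src), str(dst)


# Build a test packet: 8-byte GTP-U header followed by a bare IPv4 header.
inner_ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 17, 0,
                       bytes([10, 0, 0, 1]), bytes([192, 0, 2, 7]))
packet = struct.pack("!BBHI", 0x30, GTPU_G_PDU, len(inner_ip), 0x1234) + inner_ip
result = inner_ipv4_endpoints(packet)
print(result)
```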
GTP and IP-in-IP tunneling are powerful features for telecom equipment vendors who need to build mobile network monitoring products. With this feature, Napatech can off-load and accelerate data analysis, allowing customers to focus on optimizing the application, and thereby maximizing the processing resources in standard servers.
IP fragmentation occurs when larger Ethernet frames need to be broken into several fragments in order to be transmitted across the network. This can be due to limitations in certain parts of the network, typically when GTP tunneling protocols are used. Fragmented frames are a challenge for analysis applications, as all fragments must be identified and potentially reassembled before analysis can be performed. Napatech FPGA SmartNICs can identify fragments of the same frame and ensure that these are associated and sent to the same CPU core for processing. This significantly reduces the processing burden for analysis applications.
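Associating fragments can be sketched as grouping on the fields that all fragments of one IPv4 datagram share: source, destination, protocol and identification. The dictionary layout, byte-based offsets and function names below are assumptions for illustration, and overlapping or duplicated fragments are simply treated as incomplete.

```python
from collections import defaultdict


def group_fragments(fragments):
    """Group fragments by the fields shared by all fragments of one datagram."""
    groups = defaultdict(list)
    for frag in fragments:
        key = (frag["src"], frag["dst"], frag["proto"], frag["ip_id"])
        groups[key].append(frag)
    return groups


def is_complete(frags):
    """True if the fragments cover the datagram contiguously up to the last one."""
    expected = 0
    saw_last = False
    for frag in sorted(frags, key=lambda f: f["offset"]):
        if frag["offset"] != expected:
            return False  # gap or overlap
        expected += frag["length"]
        saw_last = not frag["more_fragments"]
    return saw_last


base = {"src": "10.0.0.1", "dst": "10.0.0.2", "proto": 17}
fragments = [
    {**base, "ip_id": 7, "offset": 1480, "length": 1480, "more_fragments": True},
    {**base, "ip_id": 7, "offset": 0, "length": 1480, "more_fragments": True},
    {**base, "ip_id": 7, "offset": 2960, "length": 540, "more_fragments": False},
    {**base, "ip_id": 8, "offset": 0, "length": 1480, "more_fragments": True},
]
groups = group_fragments(fragments)
status = {key[3]: is_complete(frags) for key, frags in groups.items()}
print(status)
```

Steering every fragment in a group to the same CPU core means a single thread sees the whole datagram and no cross-core state is needed.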
For network security purposes, different traffic scenarios need to be recreated and simulated to toughen the infrastructure. The packets also need to be replayed to understand delays and disruptions caused by traffic bursts/peaks to improve Quality of Service (QoS). With Napatech FPGA SmartNICs, it is easy to set up and specify the test scenario to replay the same PCAP files from real network events at 10G, 40G and 100G link speeds.
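Replaying traffic as captured means preserving the gaps between frames. A minimal sketch of that scheduling step, assuming a list of capture timestamps in nanoseconds and a hypothetical `replay_schedule` helper:

```python
def replay_schedule(capture_timestamps_ns, start_ns=0):
    """Return transmit times that preserve the captured inter-frame gaps."""
    if not capture_timestamps_ns:
        return []
    first = capture_timestamps_ns[0]
    return [start_ns + (ts - first) for ts in capture_timestamps_ns]


captured = [1_000, 1_068, 1_500]  # timestamps from a capture file
schedule = replay_schedule(captured, start_ns=50_000)
print(schedule)
```

Only the offsets relative to the first frame are kept, so the same capture can be replayed at any start time with identical spacing.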
Get the highest-precision timestamping for traffic that needs to be redistributed to multiple network devices. Napatech FPGA SmartNIC systems can forward and/or split traffic captured on a single tapping point to a cluster of servers for processing, without using additional equipment. This is achieved by the Napatech FPGA SmartNICs acting as both Smart Taps and packet capture devices, which suits multi-box solutions with single tapping points. This feature eliminates the need to implement expensive Smart Taps, time stamping switches, packet brokers and other time sync components.
Access control and authentication solutions can now run at full line rate, even with small packets, using a SmartNIC that delivers packets robustly at high network loads. Session control forwards traffic in and out of the SmartNIC at low latency (<5 us), while simultaneously copying a subset to the host CPU for analysis. With the session control feature, inline use cases can benefit from low latency at speeds from 1G to 100G.
The Napatech SmartNIC family supports 100 Gbps in-line applications enabling customers to create powerful, yet flexible in-line solutions on standard servers. The more CPU-demanding the application is, and the higher the speeds of links, the higher the value of this solution. Features include:
• Full throughput bidirectional Rx/Tx up to 100G link speed for any packet size
• Multi-core processing support with up to 128 Rx/Tx streams per SmartNIC
• Customizable hash-based load distribution
• Efficient zero copy roundtrip from Rx to Tx
• Single bit flip selection to discard or forward each individual packet
• Typical 50 us roundtrip latency from Rx to Tx fiber
CPU Socket Load Balancer
Further enhance your CPU utilization with the CPU Socket Load Balancer capability offered by Napatech NT40E3 FPGA SmartNICs. Improve CPU performance by up to 30% per server for 4x10G analysis with Napatech FPGA SmartNICs that can efficiently distribute traffic to 2 CPU sockets, making the packets available to multiple analysis threads on both CPU sockets, simultaneously. This frees up CPU resources needed for copying data between the two sockets and eliminates the need for expensive QPI bus transfers.
The Napatech correlation key makes it possible to identify and trace the packet propagation through the entire network. The feature adds a unique ‘fingerprint’ to each packet and performs intelligent comparison, taking into consideration potential changes in the header information and checksums.
The correlation key is extremely valuable for applications that need insight into packet latency, timing or route for a variety of purposes, e.g. to analyze and remedy Quality of Experience (QoE) issues caused by peaks or bursts. With this feature, network service providers can perform powerful and cost-effective measurements of QoE characteristics in real time, at all link speeds up to 100G. It enables configuration of up to 16 different conditions depending on traffic category and can be based on packet type or IP address as needed by the application.
The correlation key is also extremely useful for deduplication, e.g. to offload applications by efficiently identifying and discarding duplicate packets.
Whether you are in the business of application performance monitoring, network monitoring, telecom subscriber analytics or network recording, incorrect switch span port configuration is common, which can result in up to 50% duplicate packets. Duplicates can cause a lot of issues. The obvious issue is that double the amount of data requires double the amount of processing power, memory, power, etc. However, the main issue is false positives: errors that are not really errors or threats that are not really threats. Debugging these issues takes a lot of time.
With deduplication built in via a SmartNIC in the appliance, it is possible to detect up to 99.99% of duplicate packets produced by SPAN ports. By filtering out irrelevant packets and discarding redundant data, you benefit by offloading the application as well as saving valuable disk space. Similar functionality is available on packet brokers, but for a sizeable extra license fee. On Napatech SmartNICs, this is just one of several powerful features delivered at no extra charge. Napatech SmartNICs provide the following deduplication features:
• Deduplication in hardware up to 2x100G
• Deduplication key is calculated as a hash over configurable sections of the frame
• Dynamic header information (e.g. TTL) can be masked out from the key calculation
• Enable/disable deduplication per network port or per network port group
• Configurable action per port group: Discard or pass duplicates
• Duplicate counters per port group
• Configurable deduplication window from 10 microseconds to 2 seconds
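The deduplication features listed above can be modelled in a few lines: hash the frame with dynamic fields masked out, and treat a repeat of the same key inside the window as a duplicate. This is a software illustration only. The `Deduplicator` class, the choice of BLAKE2b as the hash and the masked offsets (TTL at byte 22 and header checksum at bytes 24 and 25 of an untagged Ethernet II/IPv4 frame) are assumptions for the example.

```python
import hashlib


class Deduplicator:
    """Toy time-windowed deduplicator keyed on a masked frame hash."""

    def __init__(self, window_ns):
        self.window_ns = window_ns
        self.last_seen = {}  # key -> timestamp of the most recent sighting

    @staticmethod
    def key(frame, masked_offsets=()):
        data = bytearray(frame)
        for offset in masked_offsets:
            data[offset] = 0  # ignore fields that change hop by hop
        return hashlib.blake2b(bytes(data), digest_size=8).digest()

    def is_duplicate(self, frame, timestamp_ns, masked_offsets=()):
        k = self.key(frame, masked_offsets)
        previous = self.last_seen.get(k)
        self.last_seen[k] = timestamp_ns
        return previous is not None and timestamp_ns - previous <= self.window_ns


MASK = (22, 24, 25)               # IPv4 TTL and header checksum
frame = bytes(range(64))
routed = bytearray(frame)
routed[22] -= 1                   # TTL decremented by a router
routed[24] ^= 0xFF                # checksum changed accordingly

dedup = Deduplicator(window_ns=10_000)                        # 10 microseconds
seen_first = dedup.is_duplicate(frame, 0, MASK)               # new frame
seen_copy = dedup.is_duplicate(bytes(routed), 5_000, MASK)    # copy inside window
seen_late = dedup.is_duplicate(frame, 1_000_000, MASK)        # outside window
print(seen_first, seen_copy, seen_late)
```

A real implementation would also age entries out of the table; this sketch lets the dictionary grow without bound.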
Compatible Napatech FPGA-based SmartNICs
The Link™ Capture Software is available for our family of FPGA-based SmartNICs.
Link™ NT200A02 SmartNIC
8x10G, 2×10/25G, 2x40G, 2x100G
The Link™ NT200A02 SmartNIC is based on Xilinx’s powerful UltraScale+ VU5P FPGA architecture and enables 8x10G, 2×10/25G, 2x40G or 2x100G applications. The QSFP28 form factor offers flexibility to create high-performance solutions in 1U server platforms for existing 40G network infrastructures, with the freedom to repurpose the solution for 100G installations when necessary. Also available in NEBS variants.
Link™ NT40E3 SmartNIC
The Link™ NT40E3 SmartNIC provides full packet capture and analysis of Ethernet LAN at 40 Gbps with zero packet loss for all frame sizes. Intelligent features accelerate application performance with extremely low CPU load. Flexible time synchronization support is included with a dedicated PPS/PTP port. Also available in a NEBS level 3 compliant variant.
Link™ NT20E3 SmartNIC
The Link™ NT20E3 SmartNIC provides full packet capture and analysis of Ethernet LAN at 20 Gbps with zero packet loss for all frame sizes. Intelligent features accelerate application performance with extremely low CPU load. Flexible time synchronization support is included with a dedicated PTP port. Also available in a NEBS level 3 compliant variant.
Link™ NT40A01 SmartNIC
The Link™ NT40A01 SmartNIC provides full packet capture and analysis of network data at 4 Gbps with zero packet loss. The Napatech SmartNIC will capture all frames, including erroneous frames normally discarded by standard NICs. Also available in a NEBS level 3 compliant variant.
|FEATURES||Link™ Capture Software for Napatech FPGA SmartNICs|
|Rx Packet Processing||• Line rate Rx up to 100 Gbps for packet size 64 – 10,000 bytes|
• Zero packet loss
• HW time stamping with 1 ns resolution
• Multi-port packet merge sequenced in time stamp order
|L2, L3 and L4 protocol classification||• L2: Ether II, IEEE 802.3 LLC, IEEE 802.3/802.2 SNAP|
• L2: PPPoE Discovery, PPPoE Session, Raw Novell
• L2: ISL, 3x VLAN, 7x MPLS
• L3: IPv4, IPv6
• L4: TCP, UDP, ICMP, SCTP
|Tunneling support||GTP, IP-in-IP, GRE, NVGRE, VxLAN, Pseudowire|
|Filter match conditions||• Network port, protocol, length check and error condition filters|
• Configurable flow definitions, based on 2, 3, 4 or 5-tuple
• Up to 36,000 IPv4 or up to 7,500 IPv6 2-tuple flows
|Filter actions||• Drop|
• Forward to port
• Forward to specific host Rx queue
• Load distribute over host Rx queues
• Select packet descriptor type
• Optional flow ID in packet descriptor
|Hash keys||• Custom 2 x 128 bits and 2 x 32 bits with separate bit masks|
• Symmetric hash keys
• Protocol field from inner or outer headers
|CPU load distribution||Hash key and filter-based|
|Packet descriptors||• PCAP and Napatech descriptor formats|
• Time stamp and network port ID
• Header offsets
• Hash key
• Correlation key 64 bit maskable fields (packet finger print)
• Protocol and error information
|IP fragment handling||• First level IP fragmentation|
• Filter actions on inner header fields applied to all fragments
|Slicing||Slicing at dynamic offset or fixed offset from start or end of packet|
|Tx Packet Processing||• Line rate Tx up to 100 Gbps for packet size 64 – 10,000 bytes|
• Replay as captured with nanoseconds precision
• Per port traffic shaping
• Port to any port forwarding
|Rx burst buffer capacity||• NT20E3-2, NT40E3-4, NT40A01: 4GB|
• NT100E3-1: 8GB
• NT200A02: 12GB
|Host Buffers and Queues||• Rx queues: 128|
• Tx queues: 128
• Rx buffer size: 16 MB – 1 TB
• Tx buffer size: 4 MB
|Advanced Statistics||• Extended RMON1 per port|
• Packets and bytes per filter/color
• Packets and bytes per stream/queue
|Time Synchronization||• OS time|
• IEEE 1588-2008 PTP V2
• NT-TS synchronization between Napatech SmartNICs
|Time stamp formats||Unix 10 ns, Unix 1 ns, PCAP 1 us, PCAP 1 ns|
|Monitoring sensors||• PCB temperature level with alarm|
• FPGA temperature level with alarm and automatic shutdown
• Temperature of critical components
• Individual optical port temperature or light level with alarm
• Voltage or current overrange with alarm
• Cooling fan speed with alarm
|Supported OS||• Linux kernel 3.0 through 3.19 64-bit|
• Linux kernel 4.3 through 4.18 64-bit
• Windows Server 2016 64-bit and Server 2019 64-bit
|Supported APIs||• PCAP v. 1.8.1 and WinPcap 4.1.3|
• DPDK v. 18.08
• NTAPI (Napatech API)
|Supported Hardware and Transceivers||NT200A02:|
• 8x 10 Gbps: QSFP+ breakout to 10GBASE-SR, 10GBASE-CR
• 2x 40 Gbps: QSFP+ 40GBASE-SR4, QSFP+ 40GBASE-CR4, 40GBASE-LR4, 40GBASE-BiDi
• 2x 100 Gbps: QSFP28 100GBASE-SR4, 100GBASE-LR4
• 4x 1 Gbps: SFP 100/1000BASE-T, 1000BASE-T, 1000BASE-SX, 1000BASE-LX, 1000BASE-ZX
• 4x 10 Gbps: SFP+ 10GBASE-SR, 10GBASE-CR, 10GBASE-LR, 10GBASE-ER
• 4x 1/10 Gbps: SFP+ 1000BASE-SX/10GBASE-SR, 1000BASE-LX/10GBASE-LR
• 2x 1 Gbps: SFP 100/1000BASE-T, 1000BASE-T, 1000BASE-SX, 1000BASE-LX, 1000BASE-ZX
• 2x 10 Gbps: SFP+ 10GBASE-SR, 10GBASE-CR, 10GBASE-LR, 10GBASE-ER
• 2x 1/10 Gbps: SFP+ 1000BASE-SX/10GBASE-SR, 1000BASE-LX/10GBASE-LR
• 4x 1 Gbps: SFP 100/1000BASE-T, 1000BASE-T, 1000BASE-SX, 1000BASE-LX, 1000BASE-ZX