Napatech Link-Capture™ Software for Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA
Napatech’s Link-Capture™ Software is now available for the Intel® Programmable Acceleration Card with Intel Arria® 10 GX FPGA. With this solution, it is possible to build high-performance platforms based on low-cost, off-the-shelf servers.
While standard Network Interface Cards (NICs) repeatedly suffer from intolerable packet loss for demanding cybersecurity and networking applications, Napatech guarantees line rate throughput with zero packet loss for all packet sizes, which is essential for reliable network performance and security analysis.
For more information on the Intel® Programmable Acceleration Card with Intel Arria® 10 GX FPGA, see Intel's product page.
The hardware solution (server plus Intel® PAC with Arria® 10 GX FPGA) can be purchased through Dell, Fujitsu and HP.
Watch video: Improving Performance and Reducing CPU Utilization For Server Based Applications and Services.
Presented by Napatech CMO Jarrod J.S. Siket.
• Increased application performance from every server
• Reduced system costs using fewer servers to achieve target performance
• Reduced operational costs of rack space, power, cooling and management
• Reduced time spent on complex tasks thanks to the additional computing power
• Full network visibility due to guaranteed lossless packet capture and forwarding
Watch video: 100% Lossless Packet Capture Solution by Intel.
The solution has been benchmarked across a wide range of third-party, commercial and open-source networking and cybersecurity applications, delivering more than triple the performance of servers with standard NIC configurations. This means only a third of the server resources are needed to run the same application.
Examples of leading security applications that greatly benefit are listed below. For these applications, it is imperative to have all data available as even a single packet lost could represent a blind spot for the security team. Napatech Link-Capture™ Software provides complete network visibility, ensuring that no traffic goes unnoticed.
4x Suricata boost
Intrusion Detection System
The combined Napatech and Intel solution is uniquely suited for lossless acceleration of Suricata. Optimized to capture all network traffic at full line rate, with almost no CPU load on the host server, the solution demonstrates outstanding performance advantages for Suricata.
Network security monitor
For network security monitors like Zeek, missing the slightest fraction of traffic is unacceptable. Combined, the Napatech software and Intel hardware provide 100% lossless packet forwarding and capture, ensuring complete traffic visibility for the application.
BENCHMARK COMING SOON
Intrusion Detection System
Snort is an ideal example of the type of enterprise security application that can achieve better performance with the Link-Capture™ Software. Snort is designed to keep up with network line rates on commodity hardware, but this requires that all traffic can be reliably captured.
BENCHMARK COMING SOON
3x n2disk boost
As capable as n2disk™ is at recording network traffic, it will only be as effective as its implementation. An unconditional prerequisite for n2disk™ to be successful is that all network packets are captured with zero loss. This is where Link-Capture™ Software can help.
Boosting Network Test & Measurement
The solution provides efficient support for traffic generator and analyzer applications such as TRex and Wireshark. The exact capture and replay capabilities delivered by the Napatech Link-Capture™ Software are essential for performing fully reliable network tests and troubleshooting, enabling optimal quality of service and avoiding network and equipment overload.
4x TRex boost
Optimized for lossless transmit and receive, the Link-Capture™ Software on Intel PAC offers substantial performance advantages for TRex: 2x traffic generation performance and 4x traffic reception performance.
7x Wireshark boost
To decode all traffic, it is a fundamental requirement that Wireshark “sees everything”. If the capture server is overburdened, packets are discarded and information lost forever. Link-Capture™ Software changes the game.
The Napatech Link-Capture™ Software for Intel® PAC also delivers measurable, repeatable and outstanding benchmark results for leading third-party, commercial and home-grown applications, demonstrating as much as 60% throughput improvement.
Third-party, commercial, home-grown
The Napatech Link-Capture™ Software transfers data from the Intel® PAC to the application through a non-blocking delivery mechanism, ensuring efficient utilization of the PCIe bus and maximum throughput for all packet sizes.
Link-Capture™ for Intel® key features
1x100G Solution Features
Full line-rate packet capture
Multi-port packet sequence
Multi-port packet sequence and merge
Napatech FPGA SmartNICs typically provide multiple ports. Ports are usually paired, with one port receiving upstream packets and another port receiving downstream packets. Since these two flows going in different directions need to be analyzed as one, packets from both ports must be merged into a single analysis stream. Napatech FPGA SmartNICs can sequence and merge packets received on multiple ports in hardware using the precise time stamps of each Ethernet frame. This is highly efficient and offloads a significant and costly task from the analysis application.
There is a growing need for analysis appliances that are able to monitor and analyze multiple points in the network, and even provide a network-wide view of what is happening. Not only does this require multiple accelerators to be installed in a single appliance, but it also requires that the analysis data from all ports on every accelerator be correlated.
With the Napatech Software Suite, it is possible to sequence and merge the analysis data from multiple accelerators into a single analysis stream. The merging is based on the nanosecond precision time stamps of each Ethernet frame, allowing a time-ordered merge of individual data streams.
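As an illustration, the time-ordered merge of per-port streams can be sketched in software as a heap-based k-way merge. The port data below is hypothetical; on Napatech SmartNICs the sequencing and merging happens in the FPGA.

```python
import heapq

# Hypothetical per-port capture streams as (timestamp_ns, port, frame)
# records, each already time-ordered, as a capture queue would deliver them.
port0 = [(100, 0, b"frame-a"), (250, 0, b"frame-c"), (400, 0, b"frame-e")]
port1 = [(180, 1, b"frame-b"), (300, 1, b"frame-d")]

# Merge the ordered streams into one analysis stream, ordered by the
# nanosecond timestamp (the first tuple element).
merged = list(heapq.merge(port0, port1, key=lambda rec: rec[0]))

for ts, port, frame in merged:
    print(ts, port, frame)
```

Because each input stream is already ordered, the merge only ever compares the heads of the streams, which is what makes the equivalent hardware operation cheap.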
Intelligent Multi-CPU distribution
Modern servers provide unprecedented processing power with multi-core CPU implementations. This makes standard servers an ideal platform for appliance development. But, to fully harness the processing power of modern servers, it is important that the analysis application is multi-threaded and that the right Ethernet frames are provided to the right CPU core for processing. Not only that, but the frames must be provided at the right time to ensure that analysis can be performed in real time.
Napatech Multi-CPU distribution is built and optimized from our extensive knowledge of server architecture, as well as real-life experience from our customers.
Napatech FPGA SmartNICs ensure that identified flows of related Ethernet frames are distributed in an optimal way to the available CPU cores. This ensures that the processing load is balanced across the available processing resources, and that the right frames are being processed by the right CPU cores.
With flow distribution to multiple CPU cores, the throughput performance of the analysis application can be increased linearly with the number of cores, up to 128. Not only that, but the performance can also be scaled by faster processing cores. This highly flexible mechanism enables many different ways of designing a solution and provides the ability to optimize for cost and/or performance.
Napatech FPGA SmartNICs support different distribution schemes that are fully configurable:
- Distribution per port: all frames captured on a physical port are transferred to the same CPU or a range of CPU cores for processing
- Distribution per traffic type: frames of the same protocol type are transferred to the same CPU or a range of CPU cores for processing
- Distribution by flows: frames with the same hash value are sent to the same CPU or a range of CPU cores for processing
- Combinations of the above
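The flow-based scheme can be sketched in software as a symmetric hash over the 5-tuple. The 8-core target range and the use of SHA-256 here are illustrative assumptions; the actual hardware hash keys are configurable (see "Hash keys" in the tech specs).

```python
import hashlib

NUM_CORES = 8  # size of the target core range (assumption for this sketch)

def flow_core(src_ip, dst_ip, src_port, dst_port, proto, num_cores=NUM_CORES):
    """Map a 5-tuple to a CPU core index.

    Sorting the two (address, port) endpoints makes the hash symmetric,
    so both directions of a flow land on the same core.
    """
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a}|{b}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_cores

# Both directions of the same TCP flow map to the same core:
up = flow_core("10.0.0.1", "10.0.0.2", 40000, 443, 6)
down = flow_core("10.0.0.2", "10.0.0.1", 443, 40000, 6)
assert up == down
```

Symmetry matters for stateful analysis: a thread that sees the request must also see the reply.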
Hardware Time Stamp
The ability to establish the precise time when frames have been captured is critical to many applications.
To achieve this, all Napatech FPGA SmartNICs are capable of providing a high-precision time stamp, sampled with 1 nanosecond resolution, for every frame captured and transmitted.
At 10 Gbps, an Ethernet frame can be received and transmitted every 67 nanoseconds. At 100 Gbps, this time is reduced to 6.7 nanoseconds. This makes nanosecond-precision time-stamping essential for uniquely identifying when a frame is received. This incredible precision also enables you to sequence and merge frames from multiple ports on multiple accelerators into a single, time-ordered analysis stream.
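The 67 ns and 6.7 ns figures follow directly from the minimum on-wire spacing of back-to-back 64-byte frames, once the fixed per-frame overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap) is included:

```python
def min_frame_time_ns(frame_bytes, rate_gbps):
    """Minimum time between successive frames on the wire: the frame
    itself plus 20 bytes of fixed overhead (preamble + SFD + IFG),
    converted to nanoseconds at the given line rate."""
    return (frame_bytes + 20) * 8 / rate_gbps

print(min_frame_time_ns(64, 10))   # 64-byte frames at 10 Gbps
print(min_frame_time_ns(64, 100))  # 64-byte frames at 100 Gbps
```

At 10 Gbps this gives 67.2 ns per minimum-size frame, and at 100 Gbps just 6.72 ns, which is why sub-10-ns timestamp resolution is needed to order frames uniquely.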
In order to work smoothly in the different operating systems supported, Napatech FPGA SmartNICs support a range of industry standard time stamp formats, and also offer a choice of resolution to suit different types of applications.
64-bit time stamp formats:
- 2 Windows formats with 10-ns or 100-ns resolution
- Native UNIX format with 10-ns resolution
- 2 PCAP formats with 1-ns or 1000-ns resolution
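As an illustrative sketch, assuming the native UNIX 10-ns format counts 10-ns ticks since the Unix epoch (an assumption of this example, not a documented encoding), converting such a 64-bit timestamp into the (seconds, nanoseconds) pair most PCAP tooling expects is a single division:

```python
def split_unix_10ns(ts):
    """Convert a 64-bit timestamp in 10-ns ticks since the epoch
    (assumed encoding for this sketch) into (seconds, nanoseconds)."""
    ns = ts * 10
    return divmod(ns, 1_000_000_000)

# e.g. 1 second + 670 ns expressed in 10-ns ticks:
sec, ns = split_unix_10ns(100_000_067)
print(sec, ns)
```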
Optimum Cache Utilization
Napatech FPGA SmartNICs use a buffering strategy that allocates a number of large memory buffers where as many packets as possible are placed back-to-back in each buffer. Using this implementation, only the first access to a packet in the buffer is affected by the access time to external memory. Thanks to cache pre-fetch, the subsequent packets are already in the level 1 cache before the CPU needs them. As hundreds or even thousands of packets can be placed in a buffer, a very high CPU cache performance can be achieved leading to application acceleration.
Buffer configuration can have a dramatic effect on the performance of analysis applications. Different applications have different requirements when it comes to latency or processing. It is therefore extremely important that the number and size of buffers can be optimized for the given application. Napatech FPGA SmartNICs make this possible.
The flexible server buffer structure supported by Napatech FPGA SmartNICs can be optimized for different application requirements. For example, applications needing short latency can have frames delivered in small chunks, optionally with a fixed maximum latency. Applications without latency requirements can benefit from data delivered in large chunks, which makes server CPU processing more efficient. Applications that need to correlate information distributed across packets can configure larger server buffers (up to 128 GB).
Up to 128 buffers can be configured and combined with Napatech multi-CPU distribution (see “Multi-CPU distribution”).
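The latency cost of chunked delivery is easy to quantify: the worst-case added latency is the time it takes to fill one chunk at line rate. A small helper with illustrative figures (the chunk sizes are examples, not product defaults):

```python
def chunk_fill_time_us(chunk_bytes, line_rate_gbps):
    """Worst-case time to fill one host-buffer chunk at line rate,
    i.e. the maximum delivery latency chunked transfer adds."""
    return chunk_bytes * 8 / (line_rate_gbps * 1e3)  # microseconds

# Small chunks keep latency low; large chunks amortize per-chunk cost:
print(chunk_fill_time_us(64 * 1024, 40))     # 64 KB chunk at 40 Gbps
print(chunk_fill_time_us(1024 * 1024, 40))   # 1 MB chunk at 40 Gbps
```

This is the trade-off behind the configurable buffer structure: a 64 KB chunk at 40 Gbps adds at most about 13 µs of latency, while a 1 MB chunk adds about 210 µs but costs far fewer per-chunk wakeups.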
On-Board Packet Buffering
Napatech FPGA SmartNICs provide on-board memory for buffering of Ethernet frames. Buffering guarantees delivery of data even when there is congestion in the delivery path to the application. There are three potential sources of congestion: the PCI interface, the server platform, and the analysis application.
PCI interfaces provide a fixed bandwidth for transfer of data from the accelerator to the application. This limits the amount of data that can be continuously transferred from the network to the application. For example, a 16-lane PCIe Gen3 interface can transfer up to 115 Gbps of data to the application. If the network speed is 2×100 Gbps, a burst of data cannot be transferred over the PCIe Gen3 interface in real time, since the data rate is twice the maximum PCIe bandwidth. In this case, the onboard packet buffering on the Napatech accelerator can absorb the burst and ensure that none of the data is lost, allowing the frames to be transferred once the burst has passed.
Servers and applications can be configured in such a way that congestion can occur in the server infrastructure or in the application itself. The CPU cores can be busy processing or retrieving data from remote caches and memory locations, which means that new Ethernet frames cannot be transferred from the accelerator.
In addition, the application can be configured with only one or a few processing threads, which can result in the application being overloaded, meaning that new Ethernet frames cannot be transferred. With onboard packet buffering, the Ethernet frames can be delayed until the server or the application is ready to accept them. This ensures that no Ethernet frames are lost and that all the data is made available for analysis when needed.
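How long a burst the on-board buffer can absorb follows from the difference between the ingress rate and the PCIe drain rate. A sketch using the 2x100 Gbps ingress and 115 Gbps PCIe Gen3 figures from above; the buffer size used here is an illustrative assumption, not a Napatech specification:

```python
def max_burst_ms(buffer_gbit, ingress_gbps, drain_gbps):
    """How long an on-board buffer can absorb a burst whose ingress
    rate exceeds the PCIe drain rate before it overflows."""
    excess = ingress_gbps - drain_gbps  # net fill rate during the burst
    if excess <= 0:
        return float("inf")  # the drain keeps up; the buffer never fills
    return buffer_gbit / excess * 1000  # milliseconds

# e.g. an assumed 8 Gbit buffer, 2x100 Gbps ingress, 115 Gbps PCIe drain:
print(max_burst_ms(8, 200, 115))
```

Once the burst passes, the buffer drains at the full PCIe rate and no frames are lost, which is the behavior described above.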
In mobile networks, all subscriber Internet traffic is carried in GTP (GPRS Tunneling Protocol) or IP-in-IP tunnels between nodes in the mobile core. IP-in-IP tunnels are also used in enterprise networks. Monitoring traffic over interfaces between these nodes is crucial for assuring Quality of Service (QoS).
Napatech FPGA SmartNICs decode these tunnels, providing the ability to correlate and load balance based on flows inside the tunnels. Analysis applications can use this capability to test, secure, and optimize mobile networks and services. To effectively analyze the multiple services associated with each subscriber, it is important to separate them and analyze each one individually. Napatech FPGA SmartNICs have the capability to identify the contents of tunnels, allowing for analysis of each service used by a subscriber. This quickly provides the needed information to the application, and allows for efficient analysis of network and application traffic. The Napatech features for frame classification, flow identification, filtering, coloring, slicing, and intelligent multi-CPU distribution can thus be applied to the contents of the tunnel rather than the tunnel itself, leading to a more balanced processing and a more efficient analysis.
GTP and IP-in-IP tunneling are powerful features for telecom equipment vendors who need to build mobile network monitoring products. With this feature, Napatech can off-load and accelerate data analysis, allowing customers to focus on optimizing the application, and thereby maximizing the processing resources in standard servers.
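To make the tunnel decode concrete, the following is a minimal software sketch of parsing a plain GTPv1-U (G-PDU) header to recover the tunnel endpoint identifier (TEID) and the inner packet. It is not Napatech's implementation, and the optional sequence/extension fields are deliberately not handled:

```python
import struct

def parse_gtpu(payload):
    """Parse a plain 8-byte GTPv1-U (G-PDU) header.

    Returns (teid, inner_packet). Only the mandatory header is
    handled; E/S/PN option flags are rejected in this sketch.
    """
    flags, msg_type, length, teid = struct.unpack_from("!BBHI", payload, 0)
    if flags >> 5 != 1 or msg_type != 0xFF:
        raise ValueError("not a GTPv1-U G-PDU")
    if flags & 0x07:
        raise ValueError("optional GTP-U fields not handled here")
    return teid, payload[8:8 + length]

# Build a G-PDU carrying a dummy 4-byte inner packet and parse it back:
hdr = struct.pack("!BBHI", 0x30, 0xFF, 4, 0xDEADBEEF) + b"ipv4"
teid, inner = parse_gtpu(hdr)
print(hex(teid), inner)
```

Once the inner packet is exposed like this, the flow identification, filtering and multi-CPU distribution described above can operate on the subscriber traffic inside the tunnel rather than on the tunnel endpoints.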
IP fragment handling
Napatech FPGA SmartNICs handle first-level IP fragmentation in hardware, so that filter actions based on inner header fields can be applied to all fragments of a packet.

Capture and replay
For network security purposes, different traffic scenarios need to be recreated and simulated to harden the infrastructure. Packets also need to be replayed to understand the delays and disruptions caused by traffic bursts and peaks, improving Quality of Service (QoS). With Napatech FPGA SmartNICs, it is easy to set up and specify a test scenario that replays the same PCAP files from real network events at 10G, 40G and 100G link speeds.
Napatech FPGA SmartNICs provide highest-precision time stamping for traffic that needs to be redistributed to multiple network devices. They can forward and/or split traffic captured at a single tapping point to a cluster of servers for processing, without additional equipment. This is achieved by the SmartNICs acting as both smart taps and packet capture devices, which makes them well suited to multi-box solutions with single tapping points. The feature eliminates the need for expensive smart taps, time-stamping switches, packet brokers and other time-sync components.
Access control and authentication solutions can now implement full line-rate solutions that cope with small packets, using a SmartNIC that delivers packets robustly at high network loads. Session control moves traffic in and out of the SmartNIC at low latency (<5 µs), while simultaneously copying a subset to the host CPU for analysis. With the session control feature, inline use cases can benefit from low latency at speeds from 1 to 100G.
With Napatech Link-Capture™ Software it is possible to generate a correlation key that can be used to monitor individual packets at multiple points in the network. The correlation key is a unique identifier for individual packets and can be used as an alternative to IP source and destination addresses in cases where network address translation changes IP addresses in the monitored network. With correlation keys it is possible to measure latency at multiple points in the network on a packet-by-packet basis. The correlation key can also be used for hardware acceleration of packet deduplication in application software. The 64-bit correlation key is generated in hardware and delivered to the application in the packet descriptor. It is calculated as a hash over configurable sections of the packet, and dynamic header information (e.g. TTL) can be masked out.
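The idea can be sketched in software as a hash over the packet with the mutable IPv4 fields zeroed out, so the same packet yields the same key at every observation point. The field offsets and the use of SHA-256 here are illustrative; the real key is computed in hardware over configurable sections.

```python
import hashlib

def correlation_key(ipv4_packet):
    """64-bit packet fingerprint: hash over the packet with the
    per-hop fields (TTL and, consequently, the IPv4 header checksum)
    masked out. A software sketch of the hardware mechanism."""
    pkt = bytearray(ipv4_packet)
    pkt[8] = 0                 # TTL: decremented at every hop
    pkt[10:12] = b"\x00\x00"   # checksum: changes when TTL changes
    digest = hashlib.sha256(bytes(pkt)).digest()
    return int.from_bytes(digest[:8], "big")

# The same packet seen before and after a router hop (TTL 64 -> 63,
# checksum recomputed) produces the same key:
before = bytearray(20); before[8] = 64; before[10:12] = b"\x12\x34"
after = bytearray(20); after[8] = 63; after[10:12] = b"\x12\x35"
assert correlation_key(before) == correlation_key(after)
```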
TECH SPECS: Link-Capture™ Software

Network Port Support
• Link speeds: 1x 40 Gbps, 4x 10 Gbps

Pluggable Modules
• QSFP+ 40GBASE-LR4
• QSFP+ 40GBASE-SR4
• QSFP+ 40GBASE-CR4
• QSFP+ 40GBASE-BiDi
• QSFP+ breakout to 4x 10GBASE-SR
• QSFP+ breakout to 4x 10GBASE-CR

Performance
• Line-rate Rx at 40 Gbps for packet sizes 64–10,000 bytes, zero packet loss
• Line-rate Tx at 40 Gbps for packet sizes 64–10,000 bytes
• Rx burst buffer capacity: 600 ms at 40 Gbps

Host Buffers and Queues
• Rx queues: 64
• Tx queues: 128
• Rx buffer size: 1 MB – 1 TB
• Tx buffer size: 4 MB

Rx Packet Processing
• HW time stamping with 1 ns resolution
• Multi-port packet merge, sequenced in time stamp order
• L2, L3 and L4 protocol classification:
– L2: Ether II, IEEE 802.3 LLC, IEEE 802.3/802.2 SNAP
– L2: PPPoE Discovery, PPPoE Session, Raw Novell
– L2: ISL, 3x VLAN, 7x MPLS
– L3: IPv4, IPv6
– L4: TCP, UDP, ICMP, SCTP
• Tunneling support: GTP, IP-in-IP, GRE, NVGRE, VXLAN, Pseudowire
• Filter match conditions:
– Network port, protocol, length check and error condition filters
– Configurable flow definitions based on 2-, 3-, 4- or 5-tuple
– Up to 36,000 IPv4 or up to 7,500 IPv6 2-tuple flows
• Filter actions:
– Forward to port
– Forward to specific host Rx queue
– Load distribute over host Rx queues
– Select packet descriptor type
– Optional flow ID in packet descriptor
• Hash keys:
– Custom 2x 128 bits and 2x 32 bits with separate bit masks
– Symmetric hash keys
– Protocol field from inner or outer headers
• CPU load distribution: hash key and filter based
• Packet descriptors:
– PCAP and Napatech descriptor formats
– Time stamp and network port ID
– Header offsets
– Hash key
– Correlation key
– Protocol and error information
• IP fragment handling:
– First-level IP fragmentation
– Filter actions on inner header fields applied to all fragments
• Correlation key (packet fingerprint)
• Slicing at dynamic offset or fixed offset from start or end of packet

Tx Packet Processing
• Replay as captured with nanosecond precision
• Per-port traffic shaping
• Port-to-any-port forwarding

Advanced Statistics
• Extended RMON1 per port
• Packets and bytes per filter/color
• Packets and bytes per stream/queue

Time Precision
• OS time synchronization
• Time stamp formats: Unix 10 ns, Unix 1 ns, PCAP 1 µs, PCAP 1 ns

Monitoring Sensors
• FPGA temperature level with alarm and software shutdown

Supported OS
• Linux kernel 3.10 through 4.7

Supported APIs
• PCAP v1.8.1
• DPDK v18.08
• NTAPI (Napatech API)

Supported Hardware
• Intel® Programmable Acceleration Card with Intel Arria® 10 GX FPGA