Amit Cohen edited this page Dec 30, 2024 · 125 revisions
Table of Contents
  1. Motivation
  2. mlxsw
    1. Features by Version
  3. Reporting Issues

Motivation

switchdev is an infrastructure in the Linux kernel which facilitates the offloading of the kernel's forwarding plane to capable ASICs.

switchdev allows users and developers to utilize current ASICs by using a standardized and well-known API exposed by the Linux kernel instead of relying on proprietary APIs implemented in binary user space blobs.

By using the Linux kernel to configure the hardware, users can configure both their servers and their switches with the same familiar tools. The sole difference is the performance gained by offloading the kernel's forwarding plane to the switch's ASIC.

switch$ ip link add name br0 type bridge
switch$ ip link set dev br0 type bridge vlan_filtering 1
switch$ ip link set dev sw1p1 master br0
switch$ ip link set dev sw1p2 master br0
switch$ ip link set dev br0 up

hostA$ iperf -s -i1
hostB$ iperf -c hostA -i1 -P 8
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
...
[SUM]  1.0- 2.0 sec  10.5 GBytes  90.6 Gbits/sec
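
Because the bridge above has VLAN filtering enabled, a natural next step is to assign VLANs to its ports and then inspect the forwarding database. The following is only a sketch under stated assumptions: it reuses the names br0, sw1p1 and sw1p2 from the example above, the VLAN ID 10 is arbitrary, and the `run` and `assign_vlan` helpers are hypothetical names introduced for illustration. By default the script only prints the commands; set DRY_RUN=0 to execute them as root on a switch.

```shell
#!/bin/sh
# Sketch: assign a VLAN to bridge ports using standard iproute2 commands.
# Assumed names (from the example above): br0, sw1p1, sw1p2. VLAN 10 is arbitrary.

# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: make VLAN $1 the untagged PVID of every listed port.
assign_vlan() {
    vid="$1"
    shift
    for port in "$@"; do
        run bridge vlan add vid "$vid" dev "$port" pvid untagged
    done
}

assign_vlan 10 sw1p1 sw1p2

# List the FDB entries of the bridge. On drivers that report it, entries
# programmed into hardware are expected to carry the "offload" flag.
run bridge fdb show br br0
```

The same `bridge` commands work on a software-only bridge on a server, which is the point made above: the tooling does not change, only where the forwarding happens.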

For more information about switchdev please refer to the kernel's Switchdev documentation.

mlxsw

Mellanox Technologies is the first hardware vendor to use the switchdev API to offload the kernel's forwarding plane to a real ASIC. The current Mellanox/Nvidia switchdev-based solution focuses on Spectrum ASICs; the supported generations are detailed below.

This is achieved by using an upstream driver in the Linux kernel. A user can simply buy a switch, install Linux on it like any other server and benefit from the underlying hardware.

Features by Version

Kernel Version  Features
4.3 SwitchX-2 driver submission. Slow path only. Not in active development
4.4 Spectrum driver submission. VLAN-aware bridge offload
4.5 LAG and VLAN-unaware bridges
4.6 devlink infrastructure and port splitter
4.7 Quality of Service: DCB and shared buffers
4.8 IPv4 unicast router, port mirroring, extended ethtool statistics
4.9 Extended ethtool support, HW stats query via iproute
4.10 SwitchIB support, I2C support, CPU policer
4.11 tc-flower offload, enhanced L3 offload, packet sampling
4.12 tc-vlan offload, VRFs, ACL activity and stats dumping, OVS offload
4.13 Match on TCP flags, firmware flashing, trap action, port module info
4.14 IPv6 unicast router, match on IP TTL and TOS, tc-multichain offload, GRE tunnels
4.15 IPv4 multicast router, IPv4 non-equal-cost multi-path, multi-path hash policy, RED queueing discipline
4.16 IPv6 non-equal-cost multi-path, PRIO scheduler, flow based mirroring
4.17 RED as a child of PRIO, IPv6 multicast router, ERSPAN, Physical ports in VLAN-unaware bridges
4.18 RSPAN, ERSPAN mirroring with bridge, VLAN or LAG in underlay
4.19 Virtual Router Redundancy Protocol (VRRP), TC chain templates, Initial Spectrum-2 support, QoS Trust-DSCP and DSCP rewrite
4.20 QoS MC-awareness, VXLAN with VLAN-unaware bridges
5.0 One-armed router support, VXLAN with VLAN-aware bridges, VXLAN routing, Ad-hoc firmware upgrade, Spectrum-2 IPv4/IPv6 multicast router
5.1 Spectrum-2 GRE, VXLAN and QoS support, VXLAN FDB vetoing, ethtool discard counters, devlink info command support
5.2 SN3700C in GA
5.3 SN3700 in GA, PTP support for Spectrum based switches, match on ingress device
5.4 Packet drops monitoring: Layer 2, CPU port's shared buffer occupancy monitoring
5.5 Packet drops monitoring: Layer 3 and exceptions, extended ACK for EMADs, SN3800 in GA, Initial Spectrum-3 support, Initial 400G support
5.6 Packet drops monitoring: Tunnel and exceptions. ETS and TBF qdisc offloads. Default port priority
5.7 Packet drops monitoring: ACL. FIFO stats offload, RED nodrop mode, ACL actions hardware stats types, packet trap policers, TC skbedit priority and pedit TOS / traffic_class
5.8 Control Plane Policing (CoPP), TC pedit TCP / UDP sport / dport
5.9 TC police action, monitoring shared buffer drops, link down reason
5.10 Firmware fatal events using devlink health, Support for DCB buffer commands, Critical and emergency alarms, transceiver_overheat counter
5.11 Nexthop objects support, Q-in-Q, Q-in-VNI
5.12 Route offload notifications, ethtool lanes support
5.13 Egress and flow-based sampling with extra metadata, Resilient next-hop groups
5.14 Inner layer 3 and custom multi-path hash policies, transceiver module EEPROM full read access
5.15 None
5.16 GRE6 for Spectrum-2 and above, Mirroring ECN-marked packets, Multi-level qdisc offload, Port shaper, transceiver module reset and power mode policy, Support for multiple router interface MAC prefixes
5.17 VXLAN with IPv6 underlay, Initial Spectrum-4 support
5.18 TC pedit IPv4 / IPv6 src / dst, L3 HW Statistics
6.0 PTP support for Spectrum-2 and above, Scalability improvements: more router interfaces (RIFs) are supported and fewer {Port, VID}→FID mappings are used
6.2 GRE6 for Spectrum-1, Spectrum-4 800Gb/s support
6.5 tc-flower matching on layer 2 miss
6.6 tc-flower port range matching
6.7 None
6.8 PCI reset, Compressed-FID Flooding mode
6.9 Nexthop group statistics
6.10 Improved event processing performance
6.11 Hash seed configuration for multipath routing, Transceiver module firmware flashing, Improved buffer allocation and reduced memory footprint

Reporting Issues

To report issues in the Wiki please send an email to: mlxsw [at] nvidia [dot] com
