dpdk/rte_flow: dynamic bypass v1#13760
Conversation
Added pattern based drop-filter. Supported patterns are listed in "util-dpdk-rte-flow-pattern.c" in "enum index next_item[]" and their corresponding attributes in "enum index item_<pattern>[]". The syntax of the pattern used in suricata.yaml is taken from test-pmd application. Other attributes, such as actions, are omitted, and only pattern is accepted, e.g., a valid entry in suricata.yaml is: "pattern eth / ipv4 src is 192.168.11.12 / end". The user needs to validate the pattern used for a specific driver first, otherwise, Suricata shuts down if the pattern is unsupported. Source for supported patterns with different drivers: https://doc.dpdk.org/guides/nics/overview.html More patterns can be added in the future if needed. This new entry needs to be specified in the `enum index`. This pattern then needs to be present in `enum index next_item[]` and its corresponding attributes in `enum index item_<pattern>[]`. Also, new entry needs to be provided to `struct token token_list[]` with appropriate attributes. The feature was tested on NIC drivers mlx5, ice, i40e, and ixgbe. mlx5 accepts multiple rules with different patterns and various corresponding specifications of pattern items. The driver does not have any additional restrictions on the pattern. ice does not support broad patterns; some pattern item has to have specification, e.g., pattern "eth / ipv4 / end" raises an error but "eth / ipv4 src is x / end" or "eth / ipv4 / tcp src is x" works fine. i40e does not support different item sets on the same pattern item type, e.g., if the first rule has the pattern `eth / ipv4 src is x / end`, then if any other rule contains an ipv4 pattern type, it needs to have exclusively attribute src. ixgbe does not support this feature.
Added counter for packets filtered by the drop filter. These statistics are only available on mlx5 and ice drivers. Additionally, ice driver supports gathering statistic only when any of the rules does not contain range of any kind (mask on IP address, last on ports etc.). These statistics are available in eve.json under "dpdk.rte_flow_filtered".
Add function LiveGetDeviceByIdx() to retrieve LiveDev instance by numerical index. Fix description of LiveGetDevice as it does not use a numerical index, but a string name of the livedivce to retrieve the LiveDev instance.
This feature brings capture offload into the DPDK Suricata runmode. The offload is based on the DPDK's rte_flow rules, mainly the drop and count actions. The bypass utilizes the BypassManager thread and API. The number of flows offloaded to the NIC differs based on the used NIC, the current version was tested on only Mellanox ConnectX-6 and supports offload of around 32 768 flows. The real number of rte_flow rules inserted into the NIC is double of the offloaded flows (so around 65 536 rules in this case), because rte_flow needs 2 separate rules to match both directions of a flow. All of the flows that cannot be bypassed in the NIC are offloaded locally in Suricata. The process of applying the bypass is as follows: the workers enqueue a flow_key created from the bypassed flow into a rte_ring, the BypassManager constantly polls the ring, tries to create and insert the respective rte_flow rule into the NIC. The Flow Manager is in the meantime responsible for checking the activity of the flows, performed by reading the counters attached to the rte_flow rules, and potentially destroying the rules in the case the flow timeouts. The statistics from the flows are collected periodically (managed by Flow Manager) and in the shutdown stage.
|
@lukashino can you please take a look at the DPDK part of the feature? |
| return true; | ||
| } | ||
|
|
||
| const bool shutdown = (SC_ATOMIC_GET(flow_flags) & FLOW_SHUTDOWN); |
There was a problem hiding this comment.
nit: should be closer to where it is actually used, so L246/247
| uint32_t checked = 0; | ||
| Flow *prev_f = NULL; | ||
|
|
||
| const bool shutdown = (SC_ATOMIC_GET(flow_flags) & FLOW_SHUTDOWN); |
| { | ||
| uint32_t cnt = 0; | ||
| const int emergency = ((SC_ATOMIC_GET(flow_flags) & FLOW_EMERGENCY)); | ||
| const bool shutdown = (SC_ATOMIC_GET(flow_flags) & FLOW_SHUTDOWN); |
|
|
||
| /** Flow engine is in shutdown mode. This means it will not accept new flows | ||
| * and will only process existing flows. | ||
| * todo*/ |
There was a problem hiding this comment.
Leftover comment, will delete.
| SCReturnInt(0); | ||
| } | ||
|
|
||
| // if possible, configure dynamic bypass with rte_flow -> can return error, ASK ABOUT THIS |
There was a problem hiding this comment.
I think the behavior is ok at this point (fail on error), we might change it in the future if we find a mlx/ice NIC that fails but so far so good. (if that was the intention of the question).
There was a problem hiding this comment.
Okay, I will delete the note and keep the behaviour.
| SCReturnInt(-1); | ||
| } | ||
| BypassedFlowManagerRegisterCheckFunc( | ||
| RteFlowBypassRuleLoad, RteFlowBypassCheckFlowInit, (void *)rte_flow_bypass_data); |
There was a problem hiding this comment.
can't you NULL the RteFlowBypassCheckFlowInit ?
There was a problem hiding this comment.
Not quite, because there is a check for at least one non-NULL FuncInit function in flow-bypass.c in BypassedFlowManager. I could theoretically remove the check and pass NULL into the BypassedFlowManagerRegisterCheckFunc.
`
bool found = false;
for (i = 0; i < g_bypassed_func_max_index; i++) {
if (bypassedfunclist[i].FuncInit) {
found = true;
break;
}
}
if (!found)
return TM_ECODE_OK;
`
|
|
||
| #define INITIAL_RTE_FLOW_RULE_COUNT_CAPACITY 5 | ||
| #define DATA_BUFFER_SIZE 1024 | ||
| #define COUNT_ACTION_ID 128 |
There was a problem hiding this comment.
I think it is is little weird to set it to 128 specifically, is there any reason to do that, and not 0, 1 or something else?
| @@ -827,6 +827,11 @@ dpdk: | |||
| # - ips: the same as tap mode but it also drops packets that are flagged by rules to be dropped | |||
| copy-mode: none | |||
| copy-iface: none # or PCIe address of the second interface | |||
There was a problem hiding this comment.
Here you miss the buffer size setting
There was a problem hiding this comment.
Okay, noted, I've added that in.
There was a problem hiding this comment.
Do you have any recommendations for a default size?
There was a problem hiding this comment.
not really, perhaps something that comes out as reasonable from your testing.
|
|
||
| int retval = rte_ring_mp_enqueue(rte_flow_bypass_data->bypass_ring, flow_key); | ||
| /* If ring is full, continue with local bypass */ | ||
| if (retval < 0) { |
There was a problem hiding this comment.
Important: I would appreciate stats such as the number of failed dequeues, failed rte_flow creations, failed stats queries, etc.
| if (fc->bypass_data != NULL) { | ||
| SCReturnInt(0); | ||
| } | ||
| RteFlowHandlerToFlow *flow_handler_info = SCCalloc(1, sizeof(RteFlowHandlerToFlow)); |
There was a problem hiding this comment.
This should be obtained from a mempool
|
Is this to be closed in favor of #13950 ? |
|
friendly ping @adaki4 ? |
|
Yes, the draft can be closed for now. Sorry for being inactive. |
This is a draft of a new feauture that enables Suricata to use capture bypass in DPDK
runmode utilizing features available on modern SmartNICs.
The offload is based on the DPDK's rte_flow rules, mainly the drop
and count actions. The bypass utilizes the BypassManager thread and API.
The number of flows offloaded to the NIC differs based on the used NIC,
the current version was tested only with Mellanox ConnectX-6 and
supports offload of around 32 768 flows. The real number of rte_flow
rules inserted into the NIC is double of the offloaded flows (so around
65 536 rules in this case), because rte_flow needs 2 separate rules
to match both directions of a flow. All of the flows that cannot be
bypassed in the NIC are offloaded locally in Suricata.
The process of applying the bypass is as follows: the workers enqueue a
flow_key created from the bypassed flow into a rte_ring,
the BypassManager constantly polls the ring, tries to create and insert
the respective rte_flow rule into the NIC. The Flow Manager is in the
meantime responsible for checking the activity of the flows,
performed by reading the counters attached to the rte_flow rules, and
potentially destroying the rules in the case the flow timeouts.
The statistics from the flows are collected periodically (managed by
Flow Manager) and in the shutdown stage.
This branch is branched from a different unmerged feature branch,
https://github.com/adaki4/suricata/tree/dpdk-rte-flow-drop-filter-v2
Link to ticket: https://redmine.openinfosecfoundation.org/issues/7871
Describe changes: