Skip to content

dpdk/rte_flow: dynamic bypass v1#13760

Closed
adaki4 wants to merge 5 commits into
OISF:mainfrom
adaki4:dpdk-rte-flow-dynamic-bypass-draft-v1
Closed

dpdk/rte_flow: dynamic bypass v1#13760
adaki4 wants to merge 5 commits into
OISF:mainfrom
adaki4:dpdk-rte-flow-dynamic-bypass-draft-v1

Conversation

@adaki4

@adaki4 adaki4 commented Aug 27, 2025

Copy link
Copy Markdown
Contributor

This is a draft of a new feauture that enables Suricata to use capture bypass in DPDK
runmode utilizing features available on modern SmartNICs.
The offload is based on the DPDK's rte_flow rules, mainly the drop
and count actions. The bypass utilizes the BypassManager thread and API.

The number of flows offloaded to the NIC differs based on the used NIC,
the current version was tested only with Mellanox ConnectX-6 and
supports offload of around 32 768 flows. The real number of rte_flow
rules inserted into the NIC is double of the offloaded flows (so around
65 536 rules in this case), because rte_flow needs 2 separate rules
to match both directions of a flow. All of the flows that cannot be
bypassed in the NIC are offloaded locally in Suricata.

The process of applying the bypass is as follows: the workers enqueue a
flow_key created from the bypassed flow into a rte_ring,
the BypassManager constantly polls the ring, tries to create and insert
the respective rte_flow rule into the NIC. The Flow Manager is in the
meantime responsible for checking the activity of the flows,
performed by reading the counters attached to the rte_flow rules, and
potentially destroying the rules in the case the flow timeouts.

The statistics from the flows are collected periodically (managed by
Flow Manager) and in the shutdown stage.

This branch is branched from a different unmerged feature branch,
https://github.com/adaki4/suricata/tree/dpdk-rte-flow-drop-filter-v2

Link to ticket: https://redmine.openinfosecfoundation.org/issues/7871

Describe changes:

  • Add function LiveGetDeviceByIdx() to retrieve LiveDev instance by numerical index
  • Add capture bypass for Suricata in DPDK runmode

adaki4 added 5 commits August 28, 2025 00:45
Added pattern based drop-filter.
Supported patterns are listed in "util-dpdk-rte-flow-pattern.c" in
"enum index next_item[]" and their corresponding attributes
in "enum index item_<pattern>[]".

The syntax of the pattern used in suricata.yaml is taken
from test-pmd application.
Other attributes, such as actions, are omitted, and only
pattern is accepted, e.g., a valid entry in suricata.yaml is:
"pattern eth / ipv4 src is 192.168.11.12 / end".

The user needs to validate the pattern used for a specific driver first,
otherwise, Suricata shuts down if the pattern is unsupported.
Source for supported patterns with different drivers:
https://doc.dpdk.org/guides/nics/overview.html

More patterns can be added in the future if needed. This new entry
needs to be specified in the `enum index`. This pattern then needs to be
present in `enum index next_item[]` and its corresponding attributes in
`enum index item_<pattern>[]`. Also, new entry needs to be provided to
`struct token token_list[]` with appropriate attributes.

The feature was tested on NIC drivers mlx5, ice, i40e, and ixgbe.

mlx5 accepts multiple rules with different patterns and various
corresponding specifications of pattern items. The driver does not have
any additional restrictions on the pattern.

ice does not support broad patterns; some pattern item has to have
specification, e.g., pattern "eth / ipv4 / end" raises an error but
"eth / ipv4 src is x / end" or "eth / ipv4 / tcp src is x" works fine.

i40e does not support different item sets on the same pattern item type,
e.g., if the first rule has the pattern `eth / ipv4 src is x / end`,
then if any other rule contains an ipv4 pattern type, it needs to have
exclusively attribute src.

ixgbe does not support this feature.
Added counter for packets filtered by the drop filter.
These statistics are only available on mlx5 and ice drivers.
Additionally, ice driver supports gathering statistic only when any
of the rules does not contain range of any kind (mask on IP address,
last on ports etc.). These statistics are available in eve.json under
"dpdk.rte_flow_filtered".
Add function LiveGetDeviceByIdx() to retrieve LiveDev instance by
numerical index.

Fix description of LiveGetDevice as it does not use a numerical index,
but a string name of the livedivce to retrieve the LiveDev instance.
This feature brings capture offload into the DPDK Suricata runmode.
The offload is based on the DPDK's rte_flow rules, mainly the drop
and count actions. The bypass utilizes the BypassManager thread and API.

The number of flows offloaded to the NIC differs based on the used NIC,
the current version was tested on only Mellanox ConnectX-6 and
supports offload of around 32 768 flows. The real number of rte_flow
rules inserted into the NIC is double of the offloaded flows (so around
65 536 rules in this case), because rte_flow needs 2 separate rules
to match both directions of a flow. All of the flows that cannot be
bypassed in the NIC are offloaded locally in Suricata.

The process of applying the bypass is as follows: the workers enqueue  a
flow_key created from the bypassed flow into a rte_ring,
the BypassManager constantly polls the ring, tries to create and insert
the respective rte_flow rule into the NIC. The Flow Manager is in the
meantime responsible for checking the activity of the flows,
performed by reading the counters attached to the rte_flow rules, and
potentially destroying the rules in the case the flow timeouts.

The statistics from the flows are collected periodically (managed by
Flow Manager) and in the shutdown stage.
@adaki4

adaki4 commented Aug 27, 2025

Copy link
Copy Markdown
Contributor Author

@lukashino can you please take a look at the DPDK part of the feature?

@lukashino lukashino self-requested a review August 28, 2025 07:45
Comment thread src/flow-manager.c
return true;
}

const bool shutdown = (SC_ATOMIC_GET(flow_flags) & FLOW_SHUTDOWN);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should be closer to where it is actually used, so L246/247

Comment thread src/flow-manager.c
uint32_t checked = 0;
Flow *prev_f = NULL;

const bool shutdown = (SC_ATOMIC_GET(flow_flags) & FLOW_SHUTDOWN);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Comment thread src/flow-manager.c
{
uint32_t cnt = 0;
const int emergency = ((SC_ATOMIC_GET(flow_flags) & FLOW_EMERGENCY));
const bool shutdown = (SC_ATOMIC_GET(flow_flags) & FLOW_SHUTDOWN);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Comment thread src/flow-private.h

/** Flow engine is in shutdown mode. This means it will not accept new flows
* and will only process existing flows.
* todo*/

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's left here to do?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover comment, will delete.

Comment thread src/runmode-dpdk.c
SCReturnInt(0);
}

// if possible, configure dynamic bypass with rte_flow -> can return error, ASK ABOUT THIS

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the behavior is ok at this point (fail on error), we might change it in the future if we find a mlx/ice NIC that fails but so far so good. (if that was the intention of the question).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will delete the note and keep the behaviour.

Comment thread src/util-dpdk-rte-flow.c
SCReturnInt(-1);
}
BypassedFlowManagerRegisterCheckFunc(
RteFlowBypassRuleLoad, RteFlowBypassCheckFlowInit, (void *)rte_flow_bypass_data);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't you NULL the RteFlowBypassCheckFlowInit ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite, because there is a check for at least one non-NULL FuncInit function in flow-bypass.c in BypassedFlowManager. I could theoretically remove the check and pass NULL into the BypassedFlowManagerRegisterCheckFunc.
`

bool found = false;
for (i = 0; i < g_bypassed_func_max_index; i++) {
    if (bypassedfunclist[i].FuncInit) {
        found = true;
        break;
    }
}
if (!found)
    return TM_ECODE_OK;

`

Comment thread src/util-dpdk-rte-flow.c

#define INITIAL_RTE_FLOW_RULE_COUNT_CAPACITY 5
#define DATA_BUFFER_SIZE 1024
#define COUNT_ACTION_ID 128

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is is little weird to set it to 128 specifically, is there any reason to do that, and not 0, 1 or something else?

Comment thread suricata.yaml.in
@@ -827,6 +827,11 @@ dpdk:
# - ips: the same as tap mode but it also drops packets that are flagged by rules to be dropped
copy-mode: none
copy-iface: none # or PCIe address of the second interface

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you miss the buffer size setting

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bypass-ring-size

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, noted, I've added that in.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any recommendations for a default size?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really, perhaps something that comes out as reasonable from your testing.

Comment thread src/util-dpdk-rte-flow.c

int retval = rte_ring_mp_enqueue(rte_flow_bypass_data->bypass_ring, flow_key);
/* If ring is full, continue with local bypass */
if (retval < 0) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: I would appreciate stats such as the number of failed dequeues, failed rte_flow creations, failed stats queries, etc.

Comment thread src/util-dpdk-rte-flow.c
if (fc->bypass_data != NULL) {
SCReturnInt(0);
}
RteFlowHandlerToFlow *flow_handler_info = SCCalloc(1, sizeof(RteFlowHandlerToFlow));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be obtained from a mempool

@catenacyber

Copy link
Copy Markdown
Contributor

Is this to be closed in favor of #13950 ?

@catenacyber catenacyber added the needs rebase Needs rebase to main label Oct 30, 2025
@catenacyber

Copy link
Copy Markdown
Contributor

friendly ping @adaki4 ?

@adaki4

adaki4 commented Dec 18, 2025

Copy link
Copy Markdown
Contributor Author

Yes, the draft can be closed for now. Sorry for being inactive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs rebase Needs rebase to main

Development

Successfully merging this pull request may close these issues.

3 participants