T8382: Stats monitoring support for VPP via Prometheus#5048

Open
ritika0313 wants to merge 1 commit into vyos:current from ritika0313:vpp-prometheus-exporter

Conversation

@ritika0313
Contributor

@ritika0313 ritika0313 commented Mar 13, 2026

Change summary

A VyOS VPP exporter has been implemented to make VPP stats/metrics available via Prometheus. It integrates with the Prometheus exporter binary exposed by VPP and can be configured through the VyOS CLI to selectively fetch stats/metrics from VPP.

The following functionality has been implemented:

  • Stat-group stats are available on the Prometheus web UI
  • Regex-pattern-based filtered stats are available on the Prometheus web UI
  • The vpp-exporter service restarts automatically upon:
    - any change in the vpp-exporter config
    - a VPP service restart
    - a system reboot
  • The vpp-exporter service cannot be started unless the VPP service exists
  • The vpp-exporter service stops if the VPP service becomes unavailable
  • The vpp-exporter service can co-exist with node-exporter (both running independently) to provide VPP and kernel stats respectively
  • Added CLI support for enabling VPP per-node-counters, which is required to export node-level statistics

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes)
  • Migration from an old Vyatta component to vyos-1x, please link to related PR inside obsoleted component
  • Other (please describe):

Related Task(s)

https://vyos.dev/T8382

Related PR(s)

How to test / Smoketest result

Testing done:

  • Verified stat-group stats and regex-pattern-based filtered stats on the Prometheus web UI against the VPP CLI output

  • Verified the vpp-exporter service restarts automatically upon the actions below, and that correct/updated stats become available again on Prometheus:
    - any change in the vpp-exporter config
    - a VPP service restart
    - a system reboot

  • Verified that changing the node_exporter config only restarts node_exporter, and likewise for vpp_exporter

  • Verified the vpp-exporter vrf endpoint option

CONFIGURE METRICS FOR BUFFER-POOL AND SYS GROUPS & CUSTOM STAT-PATTERN FOR INTERFACES

vyos@vyos# set service monitoring prometheus vpp-exporter stat-group buffer-pools
vyos@vyos# commit
vyos@vyos# set service monitoring prometheus vpp-exporter stat-group sys
vyos@vyos# commit
vyos@vyos# set service monitoring prometheus vpp-exporter stat-pattern ^/interfaces/.*/rx$
vyos@vyos# commit

a@Mac vyos-build % curl -fsS http://192.168.65.18:9482/metrics
sys_heartbeat 2694.00
sys_last_stats_clear 0.00
sys_boottime 1773122098.00
sys_uptime 44219.00
sys_vector_rate 0.00
sys_vector_rate_per_worker{index="0",thread="0"} 0
sys_vector_rate_per_worker{index="1",thread="0"} 0
sys_loops_per_worker{index="0",thread="0"} 777854
sys_loops_per_worker{index="1",thread="0"} 0
buffer_pools_cached{pool="default-numa-0"} 344.00
buffer_pools_used{pool="default-numa-0"} 712.00
buffer_pools_available{pool="default-numa-0"} 15728.00
sys_num_worker_threads 0.00
sys_last_update 44220.00
sys_input_rate 0.00
interfaces_rx_packets{interface="local0",index="0",thread="0"} 0
interfaces_rx_bytes{interface="local0",index="0",thread="0"} 0
interfaces_rx_packets{interface="eth1",index="0",thread="0"} 585
interfaces_rx_bytes{interface="eth1",index="0",thread="0"} 139158
interfaces_rx_packets{interface="tap4096",index="0",thread="0"} 12
interfaces_rx_bytes{interface="tap4096",index="0",thread="0"} 1196

vyos@vyos:~$ sudo vpp_get_stats dump '^/buffer-pools'
344.00 /buffer-pools/default-numa-0/cached
712.00 /buffer-pools/default-numa-0/used
15728.00 /buffer-pools/default-numa-0/available

vyos@vyos:~$ sudo vpp_get_stats dump '^/sys'
2694.00 /sys/heartbeat
0.00 /sys/last_stats_clear
1773122098.00 /sys/boottime
0.00 /sys/vector_rate
[0 @ 0]: 0 packets /sys/vector_rate_per_worker
[1 @ 0]: 0 packets /sys/vector_rate_per_worker
[0 @ 0]: 777854 packets /sys/loops_per_worker
[1 @ 0]: 0 packets /sys/loops_per_worker
0.00 /sys/num_worker_threads
44220.00 /sys/last_update
0.00 /sys/input_rate

vyos@vyos:~$ sudo vpp_get_stats dump '^/interfaces/.*/rx$'
[0 @ 0]: 0 packets, 0 bytes /interfaces/local0/rx
[0 @ 0]: 585 packets, 139158 bytes /interfaces/eth1/rx
[0 @ 0]: 12 packets, 1196 bytes /interfaces/tap4096/rx
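
The exporter output above can be checked mechanically against the `vpp_get_stats` dumps. As a minimal sketch (not part of this PR), a Prometheus exposition line can be parsed into name, labels, and value with only the Python standard library:

```python
import re

# Parse one line of the Prometheus text exposition format, e.g.
#   interfaces_rx_packets{interface="eth1",index="0",thread="0"} 585
# into (metric_name, labels_dict, value). Deliberately naive: it assumes
# label values contain no commas or escaped quotes, which holds for the
# VPP stats shown above.
METRIC_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>[-+0-9.eE]+)$'
)

def parse_metric_line(line):
    m = METRIC_RE.match(line.strip())
    if m is None:
        return None  # comment, blank line, or malformed input
    labels = {}
    if m.group('labels'):
        for pair in m.group('labels').split(','):
            key, _, val = pair.partition('=')
            labels[key] = val.strip('"')
    return m.group('name'), labels, float(m.group('value'))
```

With this, each `interfaces_rx_packets{...}` sample can be matched against the corresponding `/interfaces/<name>/rx` line from `vpp_get_stats`.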

VRF TESTING

set interfaces ethernet eth2 address 100.64.0.8/24
set vrf name TEST table 1001
set interfaces ethernet eth2 vrf TEST
set ser moni prom vpp vrf TEST
commit

vyos@vyos:~$ sudo ip vrf exec TEST curl -s http://127.0.0.1:9482/metrics |head
sys_heartbeat 168.00
sys_last_stats_clear 0.00
sys_boottime 1773260614.00
sys_uptime 6183.00
sys_vector_rate 0.00
sys_vector_rate_per_worker{index="0",thread="0"} 0
sys_vector_rate_per_worker{index="1",thread="0"} 0
sys_loops_per_worker{index="0",thread="0"} 1135441
sys_loops_per_worker{index="1",thread="0"} 0
buffer_pools_cached{pool="default-numa-0"} 206.00
vyos@vyos:~$ 
vyos@vyos:~$ sudo ip vrf exec TEST curl -s http://100.64.0.8:9482/metrics |head
sys_heartbeat 173.00
sys_last_stats_clear 0.00
sys_boottime 1773260614.00
sys_uptime 6241.00
sys_vector_rate 0.00
sys_vector_rate_per_worker{index="0",thread="0"} 0
sys_vector_rate_per_worker{index="1",thread="0"} 0
sys_loops_per_worker{index="0",thread="0"} 1166346
sys_loops_per_worker{index="1",thread="0"} 0
buffer_pools_cached{pool="default-numa-0"} 207.00

CLI VALIDATION

  • No VPP
vyos@vyos# set serv moni prom vpp stat-group nodes
[edit]
vyos@vyos# commit
[ service monitoring prometheus ]
No VPP configuration exists!  Configure VPP before VPP-exporter
configuration.
[[service monitoring prometheus]] failed
Commit failed
  • Invalid Stat-group
vyos@vyos# set service monitoring prometheus vpp-exporter stat-group abd

  Invalid stat-group. Allowed values: interfaces, err, buffer-pools, sys, workers, nodes, mem
  Value validation failed
  Set failed
  • Invalid regex pattern
vyos@vyos# set service monitoring prometheus vpp-exporter stat-pattern mem
[edit]
vyos@vyos# commit
[ service monitoring prometheus ]
Invalid stat-pattern "mem". Pattern must start with "^/"

[[service monitoring prometheus]] failed
Commit failed
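
The commit-time check above can be sketched as a small validator; `validate_stat_pattern` is a hypothetical helper name, not the actual vyos-1x code. It enforces the `^/` anchor shown in the error message and additionally confirms the pattern compiles as a regex:

```python
import re

def validate_stat_pattern(pattern):
    # VPP stat paths all start with '/', so the exporter only accepts
    # patterns anchored as '^/...'; this mirrors the error shown above.
    if not pattern.startswith('^/'):
        raise ValueError(
            f'Invalid stat-pattern "{pattern}". Pattern must start with "^/"'
        )
    # Also reject strings that are not valid regular expressions.
    try:
        re.compile(pattern)
    except re.error as exc:
        raise ValueError(f'Invalid stat-pattern "{pattern}": {exc}') from exc
```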
  • Per-node-counters not enabled triggers a one-time warning while configuring nodes stats
vyos@vyos# set serv moni prom vpp stat-group nodes
[edit]
vyos@vyos# commit
[ service monitoring prometheus ]

WARNING: VPP node metrics requested but per-node-counters setting is
not enabled. Enable it using below cmd for "nodes" metrics to be 
available:
"set vpp settings resource-allocation memory stats per-node-counters".

SMOKETESTS:

vyos@vyos:~$ sudo /usr/libexec/vyos/tests/smoke/cli/test_vpp.py TestVPP.test_11_4_statseg_per_node_counters
test_11_4_statseg_per_node_counters (__main__.TestVPP.test_11_4_statseg_per_node_counters) ... ok

----------------------------------------------------------------------
Ran 1 test in 89.569s

OK
vyos@vyos:~$  sudo /usr/libexec/vyos/tests/smoke/cli/test_service_monitoring_prometheus.py 
test_01_node_exporter (__main__.TestMonitoringPrometheus.test_01_node_exporter) ... ok
test_02_frr_exporter (__main__.TestMonitoringPrometheus.test_02_frr_exporter) ... ok
test_03_blackbox_exporter (__main__.TestMonitoringPrometheus.test_03_blackbox_exporter) ... ok
test_04_blackbox_exporter_with_config (__main__.TestMonitoringPrometheus.test_04_blackbox_exporter_with_config) ... ok
test_05_vpp_exporter (__main__.TestMonitoringPrometheus.test_05_vpp_exporter) ... ok
test_06_vpp_exporter_group_and_custom_patterns (__main__.TestMonitoringPrometheus.test_06_vpp_exporter_group_and_custom_patterns) ... ok

----------------------------------------------------------------------
Ran 6 tests in 159.910s

OK

Checklist:

  • I have read the CONTRIBUTING document
  • I have linked this PR to one or more Phabricator Task(s)
  • I have run the components SMOKETESTS if applicable
  • My commit headlines contain a valid Task id
  • My change requires a change to the documentation
  • I have updated the documentation accordingly

@github-actions

github-actions bot commented Mar 13, 2026

👍
No issues in PR Title / Commit Title

@ritika0313 ritika0313 force-pushed the vpp-prometheus-exporter branch from 7225fdd to 31c5cfb Compare March 13, 2026 09:03
@ritika0313 ritika0313 changed the title VD-2176:VPP vyos-1x: Prometheus monitoring support - VPP-exporter T9999: VD-2176:VPP vyos-1x: Prometheus monitoring support - VPP-exporter Mar 13, 2026
@ritika0313 ritika0313 changed the title T9999: VD-2176:VPP vyos-1x: Prometheus monitoring support - VPP-exporter T99999: VD-2176:VPP vyos-1x: Prometheus monitoring support - VPP-exporter Mar 13, 2026
Comment on lines +155 to +156
if is_node_changed(conf, exporter_base):
monitoring.update({f'{exporter_name}_restart_required': {}})
Contributor Author

@ritika0313 ritika0313 Mar 13, 2026


Earlier, only changes in the vrf sub-tree of an exporter restarted the service.
Now, any change under that exporter's subtree sets restart-required, which I believe is the intended behavior.
This code block needs review: confirm whether this change looks OK or whether we need to restore the previous handling.

Comment on lines +138 to +142
<leafNode name="socket-name">
<properties>
<help>VPP stats socket path</help>
</properties>
<defaultValue>/run/vpp/stats.sock</defaultValue>
Member

@sever-sever sever-sever Mar 13, 2026


Do we really want to allow changing the path to the socket?

Contributor Author


Yes, we do need this, as VPP supports a configurable stats socket name under statseg.

## Statistics Segment
# statseg {
    # socket-name <filename>, name of the stats segment socket
    #     defaults to /run/vpp/stats.sock
    # size <nnn>[KMG], size of the stats segment, defaults to 32mb
    # page-size <nnn>, page size, ie. 2m, defaults to 4k
    # per-node-counters on | off, defaults to none
    # update-interval <f64-seconds>, sets the segment scrape / update interval
# }

This CLI option will allow the user to change the socket name and keep it consistent with the one used by VPP.
This option could alternatively be exposed alongside the other stats segment parameters under:
"set vpp settings resource-allocation memory stats"

I kept it under vpp-exporter to stay consistent with the other exporter CLIs, which take connection port/address info under their own subtrees.

vyos@vyos# set service monitoring prometheus node-exporter 
Possible completions:
 > collectors           Collectors specific configuration
+  listen-address       Local IP addresses to listen on
   port                 Port number used by connection (default: 9100)
   vrf                  VRF instance name

      
[edit]
vyos@vyos# set service monitoring prometheus frr-exporter 
Possible completions:
+  listen-address       Local IP addresses to listen on
   port                 Port number used by connection (default: 9342)
   vrf                   VRF instance name     

I can change option location if grouping with the other stats segment parameters seems more logical/required, please let me know.

Contributor Author


Since the VPP socket name will not be changed outside the VyOS CLI, I have removed the socket-name config option.
I also added a check that the configured port is available for binding.
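
The port-availability check mentioned here could look roughly like the following; this is a hedged sketch and `port_is_free` is a hypothetical helper, not the code in this PR:

```python
import socket

def port_is_free(port, host=''):
    # Try to bind a TCP socket to the exporter's listen address; if bind()
    # raises EADDRINUSE the port is already taken by another service.
    # SO_REUSEADDR keeps sockets in TIME_WAIT from causing false negatives.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True
```

A commit-time validator built on this would reject the vpp-exporter `port` value whenever another daemon (e.g. node_exporter on 9100) already occupies it.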

Member

@sever-sever sever-sever left a comment


There must be a Phorge task on https://vyos.dev/
You have to refer to the task number in the PR title and commit message.
Remove vyos-1x from the PR title:
Txxxx: Add VPP-exporter for service monitoring prometheus

Comment on lines +182 to +185
<leafNode name="stat-pattern">
<properties>
<help>Regex pattern to export filtered VPP stats. Examples: ^/interfaces, ^/interfaces/.*/rx$ </help>
<multi/>
Member

@sever-sever sever-sever Mar 13, 2026


Can we add some "predefined patterns"?
I'm a user who doesn't know anything about those patterns but wants an easy setup.

Contributor Author

@ritika0313 ritika0313 Mar 13, 2026


To provide a user-friendly CLI, I added support for a separate CLI option, 'stat-group', alongside stat-pattern. Since it isn't practical to expose CLI support for every possible regex pattern, this option lets users select top-level groups by specifying a simple name string without writing a full regex, keeping CLI usage simple while preserving the flexibility of stat-pattern.

vyos@vyos# set service monitoring prometheus vpp-exporter stat-group 
Possible completions:
   interfaces           Interface counters
   err                  Error counters
   buffer-pools         Buffer pool counters
   sys                  System counters
   workers              Worker counters
   nodes                Node counters
   mem                  Memory counters
                        

I can change the help string examples for stat-pattern if there is a preference, please let me know.
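
A sketch of what the stat-group handling might look like: the group names come from the completion help above, the regexes mirror the patterns checked in the smoketests, and `patterns_for` is a hypothetical helper, not the actual vyos-1x implementation.

```python
# Map each stat-group name to the anchored regex the exporter would use.
STAT_GROUPS = {
    'interfaces':   '^/interfaces',
    'err':          '^/err',
    'buffer-pools': '^/buffer-pools',
    'sys':          '^/sys',
    'workers':      '^/workers',
    'nodes':        '^/nodes',
    'mem':          '^/mem',
}

def patterns_for(groups, extra_patterns=()):
    # Combine the regexes implied by stat-group names with any raw
    # stat-pattern values the user configured.
    unknown = [g for g in groups if g not in STAT_GROUPS]
    if unknown:
        raise ValueError(
            'Invalid stat-group. Allowed values: ' + ', '.join(STAT_GROUPS)
        )
    return [STAT_GROUPS[g] for g in groups] + list(extra_patterns)
```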

@sever-sever sever-sever requested a review from zdc March 13, 2026 11:04
@ritika0313 ritika0313 changed the title T99999: VD-2176:VPP vyos-1x: Prometheus monitoring support - VPP-exporter T8382: Monitoring support for VPP via Prometheus Mar 13, 2026
@ritika0313 ritika0313 force-pushed the vpp-prometheus-exporter branch from 31c5cfb to 54ba4cb Compare March 13, 2026 18:02
@ritika0313 ritika0313 changed the title T8382: Monitoring support for VPP via Prometheus T8382: Stats monitoring support for VPP via Prometheus Mar 13, 2026
@ritika0313 ritika0313 marked this pull request as draft March 13, 2026 19:44
@ritika0313 ritika0313 force-pushed the vpp-prometheus-exporter branch 4 times, most recently from 7a3b4ec to f1761c4 Compare March 14, 2026 15:27
@ritika0313 ritika0313 marked this pull request as ready for review March 14, 2026 15:29
@sever-sever sever-sever self-requested a review March 17, 2026 16:00
@ritika0313 ritika0313 marked this pull request as draft March 17, 2026 20:13
@ritika0313 ritika0313 force-pushed the vpp-prometheus-exporter branch from d3f5f31 to 1c1d8c2 Compare March 17, 2026 22:04
@ritika0313 ritika0313 marked this pull request as ready for review March 17, 2026 23:47
CLI added: set service monitoring prometheus vpp-exporter

- Implement Vyos Prometheus VPP exporter
- Integrate with upstream VPP-exposed Prometheus metrics
- Allow metric selection via stat group or regex
- Ensure exporter survives reboot and VPP restarts
- Add CLI for enabling per-node-counters
@ritika0313 ritika0313 force-pushed the vpp-prometheus-exporter branch from 1c1d8c2 to a8524c6 Compare March 18, 2026 20:51
@github-actions

CI integration ❌ failed!

Details

CI logs

  • CLI Smoketests 👍 passed
  • CLI Smoketests (interfaces only) 👍 passed
  • Config tests 👍 passed
  • RAID1 tests 👍 passed
  • CLI Smoketests VPP ❌ failed
  • Config tests VPP 👍 passed
  • TPM tests 👍 passed

Member

@sever-sever sever-sever left a comment


There are conflicts at the moment:

<<<<<<< vpp-prometheus-exporter
    def test_22_vpp_exporter(self):
        self.cli_set(prometheus_base_path + ['vpp-exporter'])

        # commit changes
        self.cli_commit()

        file_content = read_file(vpp_exporter_service_file)
        self.assertIn('port 9482', file_content)
        self.assertIn('socket-name /run/vpp/stats.sock', file_content)
        self.assertIn('^/interfaces', file_content)
        self.assertIn('^/err', file_content)
        self.assertIn('^/buffer-pools', file_content)
        self.assertIn('^/sys', file_content)
        self.assertIn('^/workers', file_content)
        self.assertIn('^/mem', file_content)
        self.assertIn('PartOf=vpp.service', file_content)
        self.assertIn('BindsTo=vpp.service', file_content)
        self.assertIn('WantedBy=vpp.service', file_content)

        # Check for running process
        self.assertTrue(process_named_running(VPP_EXPORTER_PROCESS_NAME))

    def test_23_vpp_exporter_group_and_custom_patterns(self):
        self.cli_set(prometheus_base_path + ['vpp-exporter'])
        self.cli_set(
            prometheus_base_path + ['vpp-exporter', 'stat-group', 'interfaces']
        )
        self.cli_set(prometheus_base_path + ['vpp-exporter', 'stat-pattern', '^/sys'])

        # commit changes
        self.cli_commit()

        file_content = read_file(vpp_exporter_service_file)
        self.assertIn('^/interfaces', file_content)
        self.assertIn('^/sys', file_content)

        # Check for running process
        self.assertTrue(process_named_running(VPP_EXPORTER_PROCESS_NAME))
=======
    def test_22_no_vpp_kernel_bridge_cross_membership(self):
        vlan = '123'
        member = f'{interface}.{vlan}'
        bridge_iface = 'br1'

        self.cli_commit()

        # Ensure that VPP process is active
        self.assertTrue(process_named_running(PROCESS_NAME))

        # Attempt to add a VPP interface VLAN as a bridge member
        self.cli_set(['interfaces', 'ethernet', interface, 'vif', vlan])
        self.cli_set(
            ['interfaces', 'bridge', bridge_iface, 'member', 'interface', member]
        )

        # Adding a VPP interface (or its VLAN) as a bridge member is not allowed
        # expect raise ConfigError
        with self.assertRaises(ConfigSessionError):
            self.cli_commit()

        self.cli_delete(base_path)
        self.cli_commit()

        # Ensure interface is a member of bridge
        self.assertTrue(os.path.isdir(f'/sys/class/net/{bridge_iface}/lower_{member}'))

        # Adding a bridge member as a VPP interface is not allowed
        # expect raise ConfigError
        self.cli_set(base_path + ['settings', 'interface', interface])
        with self.assertRaises(ConfigSessionError):
            self.cli_commit()

        self.cli_delete(['interfaces', 'bridge'])
        self.cli_commit()

        # Ensure that VPP process is active
        self.assertTrue(process_named_running(PROCESS_NAME))
>>>>>>> current

Contributor

@alexandr-san4ez alexandr-san4ez left a comment


I think my suggestions can improve the convenience of debugging the systemd service.

if rc != 0:
if service:
raise ConfigError(
f'Failed to {action} {service}. Check "journalctl -xe" for details.'
Contributor


Suggested change
f'Failed to {action} {service}. Check "journalctl -xe" for details.'
f'Failed to {action} {service}. Check "journalctl -u {service} -xe" for details.'

return
sleep(0.5)
raise ConfigError(
f'{service} failed to reach running state after {action}. Check "journalctl -xe" for details.'
Contributor


Suggested change
f'{service} failed to reach running state after {action}. Check "journalctl -xe" for details.'
f'{service} failed to reach running state after {action}. Check "journalctl -u {service} -xe" for details.'
