T8382: Stats monitoring support for VPP via Prometheus#5048
ritika0313 wants to merge 1 commit into vyos:current
Force-pushed from 7225fdd to 31c5cfb
    if is_node_changed(conf, exporter_base):
        monitoring.update({f'{exporter_name}_restart_required': {}})
Earlier, only changes in the vrf subtree of an exporter restarted the service.
Now, any change under that exporter's subtree sets restart-required, which I believe is the intended behavior.
This code block needs review: does this change look OK, or do we need to restore the previous logic?
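To make the difference between the two behaviors concrete, here is a minimal, self-contained sketch. It does not use the real `is_node_changed` helper; `get_subtree`, `subtree_changed`, `restart_flags`, and the plain-dict configs are hypothetical stand-ins chosen only for illustration.

```python
def get_subtree(config, path):
    """Return the value found at `path`, or None if any key is missing."""
    node = config
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def subtree_changed(old, new, path):
    """True if the subtree at `path` differs between the two configs."""
    return get_subtree(old, path) != get_subtree(new, path)


def restart_flags(old, new, exporters, whole_subtree=True):
    """Collect '<exporter>_restart_required' flags.

    whole_subtree=True  -> any change under the exporter triggers a restart
    whole_subtree=False -> only a change under its 'vrf' node triggers one
    """
    flags = {}
    for name in exporters:
        path = [name] if whole_subtree else [name, 'vrf']
        if subtree_changed(old, new, path):
            flags[f'{name}_restart_required'] = {}
    return flags


old = {'node_exporter': {'port': '9100'}, 'vpp_exporter': {'port': '9482'}}
new = {'node_exporter': {'port': '9101'}, 'vpp_exporter': {'port': '9482'}}
exporters = ['node_exporter', 'vpp_exporter']

print(restart_flags(old, new, exporters))
print(restart_flags(old, new, exporters, whole_subtree=False))
```

With the whole-subtree check, the port change flags only `node_exporter`; with the vrf-only check, the same change flags nothing.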
    <leafNode name="socket-name">
      <properties>
        <help>VPP stats socket path</help>
      </properties>
      <defaultValue>/run/vpp/stats.sock</defaultValue>
Do we really want to allow changing the path to the socket?
Yes, we do need this, as VPP supports a configurable stats socket name under statseg.
## Statistics Segment
# statseg {
# socket-name <filename>, name of the stats segment socket
# defaults to /run/vpp/stats.sock
# size <nnn>[KMG], size of the stats segment, defaults to 32mb
# page-size <nnn>, page size, ie. 2m, defaults to 4k
# per-node-counters on | off, defaults to none
# update-interval <f64-seconds>, sets the segment scrape / update interval
# }
This CLI will allow the user to change the socket name and keep it consistent with the one being used by VPP.
This option can alternatively be exposed with the other stats segment parameters under:
"set vpp settings resource-allocation memory stats"
I kept it under vpp-exporter to keep it consistent with the other exporter CLIs taking connection port/address info under their subtree.
vyos@vyos# set service monitoring prometheus node-exporter
Possible completions:
> collectors Collectors specific configuration
+ listen-address Local IP addresses to listen on
port Port number used by connection (default: 9100)
vrf VRF instance name
[edit]
vyos@vyos# set service monitoring prometheus frr-exporter
Possible completions:
+ listen-address Local IP addresses to listen on
port Port number used by connection (default: 9342)
vrf VRF instance name
I can move the option if grouping it with the other stats segment parameters seems more logical or is required; please let me know.
Since the VPP socket name will not be changed outside of the VyOS CLI, I have removed the socket-name config option.
I also added a check that the port given in the port option is available for binding.
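The PR's actual port check is not shown in this conversation. As a rough illustration of what such a check can look like, the sketch below tries to bind a TCP socket and reports whether that succeeds; the function name `port_is_free` and its default address are assumptions made for this example.

```python
import socket


def port_is_free(port, address='0.0.0.0'):
    """Return True if a TCP socket can be bound to (address, port)."""
    # Pick the address family from the address literal (hypothetical heuristic)
    family = socket.AF_INET6 if ':' in address else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, port))
        except OSError:
            # Typically EADDRINUSE or EACCES: the port cannot be used
            return False
    return True


if __name__ == '__main__':
    # 9482 is the vpp-exporter port asserted in the smoketest in this PR
    print(port_is_free(9482, '127.0.0.1'))
```

A check like this is inherently racy (another process can take the port between the check and the service start), so it is best treated as an early, friendly validation error rather than a guarantee.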
sever-sever left a comment
There must be a Phorge task on https://vyos.dev/.
You have to reference the task number in the PR title and commit message.
Remove vyos-1x from the PR title.
Txxxx: Add VPP-exporter for service monitoring prometheus
    <leafNode name="stat-pattern">
      <properties>
        <help>Regex pattern to export filtered VPP stats. Examples: ^/interfaces, ^/interfaces/.*/rx$ </help>
        <multi/>
Can we add some "predefined patterns"?
I'm a user who doesn't know anything about those patterns but wants an easy setup.
To provide a user-friendly CLI, I added a separate CLI option, 'stat-group', alongside stat-pattern. Since it isn't practical to expose CLI support for every possible regex pattern, stat-group lets users select top-level groups by a simple name instead of a full regex pattern. This keeps the CLI simple while retaining the flexibility of stat-pattern.
vyos@vyos# set service monitoring prometheus vpp-exporter stat-group
Possible completions:
interfaces Interface counters
err Error counters
buffer-pools Buffer pool counters
sys System counters
workers Worker counters
nodes Node counters
mem Memory counters
I can change the help string examples for stat-pattern if there is a preference, please let me know.
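As a rough sketch of how stat-group names and stat-pattern regexes could be combined into one pattern list, the example below maps each group to `^/<group>`. That mapping matches what the smoketest in this PR asserts for `interfaces`, but applying it to every group is an assumption; `STAT_GROUPS` and `build_patterns` are hypothetical names.

```python
# Group names as listed in the CLI completion output
STAT_GROUPS = (
    'interfaces', 'err', 'buffer-pools', 'sys', 'workers', 'nodes', 'mem',
)


def build_patterns(groups=(), patterns=()):
    """Merge stat-group names and raw stat-pattern regexes into one list."""
    result = []
    for group in groups:
        if group not in STAT_GROUPS:
            raise ValueError(f'unknown stat-group: {group}')
        pattern = f'^/{group}'  # assumed group-to-regex mapping
        if pattern not in result:
            result.append(pattern)
    for pattern in patterns:
        # Skip custom patterns already covered by an identical group pattern
        if pattern not in result:
            result.append(pattern)
    return result


print(build_patterns(['interfaces'], ['^/sys']))
```

Keeping the group list in one place means the CLI completion help and the generated patterns cannot drift apart.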
Force-pushed from 31c5cfb to 54ba4cb
Force-pushed from 7a3b4ec to f1761c4
Force-pushed from d3f5f31 to 1c1d8c2
CLI added: set service monitoring prometheus vpp-exporter
- Implement Vyos Prometheus VPP exporter
- Integrate with upstream VPP-exposed Prometheus metrics
- Allow metric selection via stat group or regex
- Ensure exporter survives reboot and VPP restarts
- Add CLI for enabling per-node-counters
Force-pushed from 1c1d8c2 to a8524c6
CI integration ❌ failed!
sever-sever left a comment
There are conflicts at the moment:
<<<<<<< vpp-prometheus-exporter
    def test_22_vpp_exporter(self):
        self.cli_set(prometheus_base_path + ['vpp-exporter'])
        # commit changes
        self.cli_commit()
        file_content = read_file(vpp_exporter_service_file)
        self.assertIn('port 9482', file_content)
        self.assertIn('socket-name /run/vpp/stats.sock', file_content)
        self.assertIn('^/interfaces', file_content)
        self.assertIn('^/err', file_content)
        self.assertIn('^/buffer-pools', file_content)
        self.assertIn('^/sys', file_content)
        self.assertIn('^/workers', file_content)
        self.assertIn('^/mem', file_content)
        self.assertIn('PartOf=vpp.service', file_content)
        self.assertIn('BindsTo=vpp.service', file_content)
        self.assertIn('WantedBy=vpp.service', file_content)
        # Check for running process
        self.assertTrue(process_named_running(VPP_EXPORTER_PROCESS_NAME))

    def test_23_vpp_exporter_group_and_custom_patterns(self):
        self.cli_set(prometheus_base_path + ['vpp-exporter'])
        self.cli_set(
            prometheus_base_path + ['vpp-exporter', 'stat-group', 'interfaces']
        )
        self.cli_set(prometheus_base_path + ['vpp-exporter', 'stat-pattern', '^/sys'])
        # commit changes
        self.cli_commit()
        file_content = read_file(vpp_exporter_service_file)
        self.assertIn('^/interfaces', file_content)
        self.assertIn('^/sys', file_content)
        # Check for running process
        self.assertTrue(process_named_running(VPP_EXPORTER_PROCESS_NAME))
=======
    def test_22_no_vpp_kernel_bridge_cross_membership(self):
        vlan = '123'
        member = f'{interface}.{vlan}'
        bridge_iface = 'br1'
        self.cli_commit()
        # Ensure that VPP process is active
        self.assertTrue(process_named_running(PROCESS_NAME))
        # Attempt to add a VPP interface VLAN as a bridge member
        self.cli_set(['interfaces', 'ethernet', interface, 'vif', vlan])
        self.cli_set(
            ['interfaces', 'bridge', bridge_iface, 'member', 'interface', member]
        )
        # Adding a VPP interface (or its VLAN) as a bridge member is not allowed
        # expect raise ConfigError
        with self.assertRaises(ConfigSessionError):
            self.cli_commit()
        self.cli_delete(base_path)
        self.cli_commit()
        # Ensure interface is a member of bridge
        self.assertTrue(os.path.isdir(f'/sys/class/net/{bridge_iface}/lower_{member}'))
        # Adding a bridge member as a VPP interface is not allowed
        # expect raise ConfigError
        self.cli_set(base_path + ['settings', 'interface', interface])
        with self.assertRaises(ConfigSessionError):
            self.cli_commit()
        self.cli_delete(['interfaces', 'bridge'])
        self.cli_commit()
        # Ensure that VPP process is active
        self.assertTrue(process_named_running(PROCESS_NAME))
>>>>>>> current
alexandr-san4ez left a comment
I think my suggestions would make the systemd service easier to debug.
    if rc != 0:
        if service:
            raise ConfigError(
                f'Failed to {action} {service}. Check "journalctl -xe" for details.'
Suggested change:
-               f'Failed to {action} {service}. Check "journalctl -xe" for details.'
+               f'Failed to {action} {service}. Check "journalctl -u {service} -xe" for details.'
            return
        sleep(0.5)
    raise ConfigError(
        f'{service} failed to reach running state after {action}. Check "journalctl -xe" for details.'
Suggested change:
-       f'{service} failed to reach running state after {action}. Check "journalctl -xe" for details.'
+       f'{service} failed to reach running state after {action}. Check "journalctl -u {service} -xe" for details.'
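For context, the polling pattern these suggestions touch can be sketched as follows. This is a standalone illustration, not the PR's code: `wait_until_running`, its `is_running` callback, the `attempts` and `delay` parameters, and the local `ConfigError` class are all stand-ins so the example runs without VyOS or systemd.

```python
from time import sleep


class ConfigError(Exception):
    """Stand-in for the VyOS ConfigError so the sketch runs on its own."""


def wait_until_running(service, action, is_running, attempts=10, delay=0.5):
    """Poll `is_running(service)` until it returns True or attempts run out."""
    for _ in range(attempts):
        if is_running(service):
            return
        sleep(delay)
    # Point the user at the unit's own journal, as suggested in the review
    raise ConfigError(
        f'{service} failed to reach running state after {action}. '
        f'Check "journalctl -u {service} -xe" for details.'
    )


# Example with a fake checker that never reports the unit as running
try:
    wait_until_running('vpp_exporter.service', 'restart',
                       lambda name: False, attempts=2, delay=0)
except ConfigError as exc:
    print(exc)
```

Filtering with `-u` limits the journal output to the failing unit, which is the practical benefit of the suggested message.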
Change summary
This PR adds a VyOS VPP exporter that makes VPP stats/metrics available via Prometheus. The exporter integrates with the Prometheus exporter binary provided by VPP and can be configured through the VyOS CLI to fetch stats/metrics selectively from VPP.
The following functionality has been implemented:
any change in vpp-exporter config
VPP service restart
system reboot
Types of changes
Related Task(s)
https://vyos.dev/T8382
Related PR(s)
How to test / Smoketest result
Testing done:
Verified stat-group stats and regex pattern based filtered stats on Prometheus web UI against the VPP CLI output
Verified that the VPP exporter service restarts automatically after each of the actions below, and that correct, updated stats are available again in Prometheus:
- any change in vpp-exporter config
- VPP service restart
- system reboot
Verified that changing the node_exporter config restarts only node_exporter, and likewise for vpp_exporter.
Verified vpp-exporter vrf endpoint option
CONFIGURE METRICS FOR BUFFER-POOL AND SYS GROUPS & CUSTOM STAT-PATTERN FOR INTERFACES
VRF TESTING
CLI VALIDATION
SMOKETESTS:
Checklist: