Skip to content

Runtime error handling API#212

Merged
Acuadros95 merged 14 commits intogalacticfrom
feature/memory_error_handling
Dec 17, 2021
Merged

Runtime error handling API#212
Acuadros95 merged 14 commits intogalacticfrom
feature/memory_error_handling

Conversation

@pablogs9
Copy link
Copy Markdown
Member

@pablogs9 pablogs9 commented Nov 29, 2021

This PR is a proposal for an API that allows the micro-ROS user to handle static memory errors such as the unavailability of static buffer for receiving new topic data.

This is implemented as a settable (through API) function pointer that is triggered when an error (not available memory) in middleware callbacks is found. A buffer pointer is provided with the topic data that is not going to be stored.

  • Should it also be triggered in other static memory allocations such as for example entity creation?
  • Should the callback need a void * args?
  • Should the callback be global or specific for each subscriber / requester / replier ?
  • Should the deserializing function pointer be passed as an argument to the callback to allow the user just-in-time deserialization of the data? (In this case, the user should know with is the related rcl_subscription).
  • Should this feature be optional? Have a macro for not building it?

CC: @Acuadros95 @bjv-capra

@pablogs9 pablogs9 force-pushed the feature/memory_error_handling branch from 8c8dcf5 to e83c6b7 Compare November 29, 2021 11:50
@micro-ROS micro-ROS deleted a comment from codecov-commenter Nov 29, 2021
@micro-ROS micro-ROS deleted a comment from github-actions bot Nov 29, 2021
@pablogs9
Copy link
Copy Markdown
Member Author

@mergify backport main foxy

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Nov 29, 2021

backport main foxy

🟠 Waiting for conditions to match

Details
  • merged [:pushpin: backport requirement]

Hey, I reacted but my real name is @Mergifyio

@micro-ROS micro-ROS deleted a comment from github-actions bot Nov 29, 2021
@micro-ROS micro-ROS deleted a comment from codecov-commenter Nov 29, 2021
@micro-ROS micro-ROS deleted a comment from github-actions bot Nov 29, 2021
@micro-ROS micro-ROS deleted a comment from github-actions bot Nov 29, 2021
Comment thread rmw_microxrcedds_c/src/callbacks.c Outdated
Comment thread rmw_microxrcedds_c/src/callbacks.c Outdated
@bjv-capra
Copy link
Copy Markdown

Should it also be triggered in other static memory allocations such as for example entity creation?

As a user, I'd like to be able to enable some error callbacks and use them. i.e. static memory, but perhaps in the future, there could be also other interesting errors I'd like to get a callback when triggered. To be able to i.e. push them over diagnostics.

Should the callback need a void * args?

Any specific use case? For it to have different structs (i.e. per entity)?

Should the callback be global or specific for each subscriber / requester / replier ?

Maybe I'm wrong, but wouldn't it be possible to make the callback global and pass the id of the entity who triggered it as an argument to the user. ! callback per entity seems like it will bloat the code.

Should the deserializing function pointer be passed as an argument to the callback to allow the user just-in-time deserialization of the data? (In this case, the user should know with is the related rcl_subscription).

I don't know about this, I think that if I'd like to debug, then I can attach a debugger. This is mostly to escalate a run-time warning.

Should this feature be optional? Have a macro for not building it?

I think it'd be preferable for this functionality to be optional.

Lastly, yet important, I think that it should be error code-based rather than string comparisons sent to the user.

@pablogs9
Copy link
Copy Markdown
Member Author

As a user, I'd like to be able to enable some error callbacks and use them. i.e. static memory, but perhaps in the future, there could be also other interesting errors I'd like to get a callback when triggered. To be able to i.e. push them over diagnostics.

Agree. I'm going to make this generic not only for the memory errors.

Any specific use case? For it to have different structs (i.e. per entity)?

No specific use case. I guess that if this is a global callback, maybe it does not need context or args.

Maybe I'm wrong, but wouldn't it be possible to make the callback global and pass the id of the entity who triggered it as an argument to the user. ! callback per entity seems like it will bloat the code.

Sure, let pass the error context as callback arguments.

I think it'd be preferable for this functionality to be optional.

Sure.

Lastly, yet important, I think that it should be error code-based rather than string comparisons sent to the user.

+1

@pablogs9 pablogs9 changed the title Static memory handling API Runtime error handling API Nov 30, 2021
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

Uncrustify

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

Uncrus

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>
@pablogs9 pablogs9 force-pushed the feature/memory_error_handling branch from dc64011 to 5a12fc4 Compare December 3, 2021 07:51
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>
@github-actions
Copy link
Copy Markdown

github-actions bot commented Dec 3, 2021

Static memory analysis

Default configuration

MTU: 512 B
Input buffer size: 2048 B
Input history: 4
Output buffer size: 2048 B
Output history: 4

Entity Qty Size per unit
Context 2 5616 B
Topic 8 56 B
Service 4 248 B
Client 4 248 B
Subscription 4 272 B
Publisher 4 288 B
Node 4 208 B
Static input buffer 8 2136 B
Init options 6 64 B
Wait sets 4 56 B
Guard Condition 4 64 B

TOTAL: 34468 B

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Dec 3, 2021

Codecov Report

❌ Patch coverage is 0% with 166 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (galactic@bb581b6). Learn more about missing BASE report.

Files with missing lines Patch % Lines
rmw_microxrcedds_c/src/rmw_publisher.c 0.00% 20 Missing and 2 partials ⚠️
rmw_microxrcedds_c/src/rmw_subscription.c 0.00% 17 Missing and 2 partials ⚠️
rmw_microxrcedds_c/src/rmw_service.c 0.00% 15 Missing and 1 partial ⚠️
rmw_microxrcedds_c/src/rmw_client.c 0.00% 14 Missing and 1 partial ⚠️
rmw_microxrcedds_c/src/rmw_node.c 0.00% 5 Missing and 5 partials ⚠️
rmw_microxrcedds_c/src/rmw_take.c 0.00% 8 Missing and 1 partial ⚠️
rmw_microxrcedds_c/src/rmw_microros/init_options.c 0.00% 7 Missing ⚠️
rmw_microxrcedds_c/src/rmw_publish.c 0.00% 7 Missing ⚠️
rmw_microxrcedds_c/src/rmw_init.c 0.00% 6 Missing ⚠️
rmw_microxrcedds_c/src/rmw_microros/time_sync.c 0.00% 5 Missing ⚠️
... and 26 more
Additional details and impacted files
@@             Coverage Diff             @@
##             galactic     #212   +/-   ##
===========================================
  Coverage            ?   49.39%           
===========================================
  Files               ?       43           
  Lines               ?     1660           
  Branches            ?      478           
===========================================
  Hits                ?      820           
  Misses              ?      617           
  Partials            ?      223           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@pablogs9
Copy link
Copy Markdown
Member Author

pablogs9 commented Dec 3, 2021

Two macros are included in the RMW codebase: RMW_UROS_TRACE_ERROR and RMW_UROS_TRACE_MESSAGE. The former takes one string description and the first takes arguments for identifiying entity, error source, description and rmw_uros_error_context_t.

Both of them will print a message using RMW_SET_ERROR_MSG and will call a callback of type rmw_uros_error_handling if it has been set with rmw_uros_set_error_handling_callback.

An entity can be:

typedef enum
{
  RMW_UROS_ERROR_ON_UNKNOWN = 0,
  RMW_UROS_ERROR_ON_NODE,
  RMW_UROS_ERROR_ON_SERVICE,
  RMW_UROS_ERROR_ON_CLIENT,
  RMW_UROS_ERROR_ON_SUBSCRIPTION,
  RMW_UROS_ERROR_ON_PUBLISHER,
  RMW_UROS_ERROR_ON_GRAPH,
  RMW_UROS_ERROR_ON_GUARD_CONDITION,
  RMW_UROS_ERROR_ON_TOPIC
} rmw_uros_error_entity_type_t;

in the case of the basic RMW_UROS_TRACE_MESSAGE entity will be RMW_UROS_ERROR_ON_UNKNOWN

An source can be:

typedef enum
{
  RMW_UROS_ERROR_ENTITY_CREATION = 0,
  RMW_UROS_ERROR_ENTITY_DESTRUCTION,
  RMW_UROS_ERROR_CHECK,
  RMW_UROS_ERROR_NOT_IMPLEMENTED,
  RMW_UROS_ERROR_MIDDLEWARE_ALLOCATION,
} rmw_uros_error_source_t;

in the case of the basic RMW_UROS_TRACE_MESSAGE entity will be RMW_UROS_ERROR_CHECK.

When using RMW_UROS_TRACE_ERROR the context will be:

typedef struct
{
  const char * node;
  const char * namespace;
  const char * topic_name;
  const ucdrBuffer * ucdr;
  const size_t size;
  union {
    const message_type_support_callbacks_t * message_callbacks;
    const service_type_support_callbacks_t * service_callbacks;
  } type_support;
  const char * description;
} rmw_uros_error_context_t;

in the case of the basic RMW_UROS_TRACE_MESSAGE context will be (rmw_uros_error_context_t) {0}.

Additionally, the callback will receive char * file and int line for identifying where is the origin of the trace.

Thoughts @bjv-capra @Acuadros95?

@pablogs9 pablogs9 requested a review from Acuadros95 December 3, 2021 09:07
@pablogs9 pablogs9 marked this pull request as ready for review December 3, 2021 09:08
Comment thread rmw_microxrcedds_c/include/rmw_microxrcedds_c/rmw_c_macros.h
@bjv-capra
Copy link
Copy Markdown

Overall, I think it's really well aligned with what we discussed.
Some general comments

  • w.r.t. RMW_UROS_TRACE_ERROR not being used. Aren't most of the errors known? At least based on the text on the errors, it'd seem so. So can't RMW_UROS_TRACE_ERROR be used instead in most of the cases where RMW_UROS_TRACE_MESSAGE is being used?
  • w.r.t error codes. Wouldn't this be an excellent moment to standardize the error codes/messages? Besides, All the messages have different lengths and I'm not a huge fan of having to do strcmp (or alike) on an embedded system. Could it be possible to return error codes, and if text debug is enabled (or some other special thing) actually use the text?

@pablogs9
Copy link
Copy Markdown
Member Author

pablogs9 commented Dec 9, 2021

w.r.t. RMW_UROS_TRACE_ERROR not being used. Aren't most of the errors known? At least based on the text on the errors, it'd seem so. So can't RMW_UROS_TRACE_ERROR be used instead in most of the cases where RMW_UROS_TRACE_MESSAGE is being used?

Basically, I have replaced RMW_SET_ERROR_MSG with RMW_UROS_TRACE_MESSAGE along with all the codebase in order to, not only set the error but also call the callback with some information. The only places where the new approach has been implemented are in callbacks.c. In other places, we are just leveraging the previous error check messages.

w.r.t error codes. Wouldn't this be an excellent moment to standardize the error codes/messages? Besides, All the messages have different lengths and I'm not a huge fan of having to do strcmp (or alike) on an embedded system. Could it be possible to return error codes, and if text debug is enabled (or some other special thing) actually use the text?

In general you should expect the error to be defined by rmw_uros_error_entity_type_t and rmw_uros_error_source_t. In the case of data loss in the middleware callbacks due to a lack of history space, the error should be identified by an RMW_UROS_ERROR_MIDDLEWARE_ALLOCATION in RMW_UROS_ERROR_ON_SUBSCRIPTION for example. The text string is only for "printf purposes" and it is not intended to be an identifier of the error.

@bjv-capra
Copy link
Copy Markdown

In such a case, everything seems to be addressed. It'll be a matter of testing to be able to provide better feedback. But we'll wait until it's merged to start using it.
LGTM

Comment thread rmw_microxrcedds_c/include/rmw_microros/error_handling.h
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>
@pablogs9
Copy link
Copy Markdown
Member Author

Please re-review, regarding resetting the error. Why is the error being overwritten? @Acuadros95

@pablogs9 pablogs9 requested a review from Acuadros95 December 17, 2021 10:01
Signed-off-by: Pablo Garrido <pablogs9@gmail.com>
@github-actions
Copy link
Copy Markdown

Static memory analysis

Default configuration

MTU: 512 B
Input buffer size: 2048 B
Input history: 4
Output buffer size: 2048 B
Output history: 4

Entity Qty Size per unit
Context 2 5616 B
Topic 8 56 B
Service 4 248 B
Client 4 248 B
Subscription 4 272 B
Publisher 4 288 B
Node 4 208 B
Static input buffer 8 2136 B
Init options 6 64 B
Wait sets 4 56 B
Guard Condition 4 64 B

TOTAL: 34468 B

@Acuadros95
Copy link
Copy Markdown
Contributor

@pablogs9 The error string needs to be cleared manually, so the user can have access to it after a function fails.
But it we notify the error to user space, I think we can clear it safely after the callback.

@pablogs9 pablogs9 self-assigned this Dec 17, 2021
@pablogs9
Copy link
Copy Markdown
Member Author

Ok, could you add the clearing line?

@Acuadros95
Copy link
Copy Markdown
Contributor

Done

@github-actions
Copy link
Copy Markdown

Static memory analysis

Default configuration

MTU: 512 B
Input buffer size: 2048 B
Input history: 4
Output buffer size: 2048 B
Output history: 4

Entity Qty Size per unit
Context 2 5616 B
Topic 8 56 B
Service 4 248 B
Client 4 248 B
Subscription 4 272 B
Publisher 4 288 B
Node 4 208 B
Static input buffer 8 2136 B
Init options 6 64 B
Wait sets 4 56 B
Guard Condition 4 64 B

TOTAL: 34468 B

@pablogs9
Copy link
Copy Markdown
Member Author

Ok, so LGTM, let's merge when you are done @Acuadros95

@Acuadros95 Acuadros95 merged commit 266248d into galactic Dec 17, 2021
@Acuadros95 Acuadros95 deleted the feature/memory_error_handling branch December 17, 2021 10:40
mergify bot pushed a commit that referenced this pull request Dec 17, 2021
* Initial

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

Uncrustify

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

Uncrus

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Refactor

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Remove include old error handler

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Client

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Uncrustify

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Fix

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Fix

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Fix

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update rmw_microxrcedds_c/src/rmw_microros_internal/error_handling_internal.h

Co-authored-by: Antonio Cuadros <49162117+Acuadros95@users.noreply.github.com>
(cherry picked from commit 266248d)
@pablogs9
Copy link
Copy Markdown
Member Author

@mergify backport foxy main

mergify bot pushed a commit that referenced this pull request Dec 17, 2021
* Initial

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

Uncrustify

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

Uncrus

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Refactor

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Remove include old error handler

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Client

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Uncrustify

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Fix

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Fix

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Fix

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update rmw_microxrcedds_c/src/rmw_microros_internal/error_handling_internal.h

Co-authored-by: Antonio Cuadros <49162117+Acuadros95@users.noreply.github.com>
(cherry picked from commit 266248d)

# Conflicts:
#	rmw_microxrcedds_c/src/rmw_get_endpoint_network_flow.c
#	rmw_microxrcedds_c/src/rmw_qos_profile_check_compatible.c
#	rmw_microxrcedds_c/src/rmw_wait.c
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Dec 17, 2021

backport main foxy

✅ Backports have been created

Details

@pablogs9 pablogs9 removed their assignment Dec 17, 2021
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Dec 17, 2021

backport foxy main

✅ Backports have been created

Details

Hey, I reacted but my real name is @Mergifyio

pablogs9 added a commit that referenced this pull request Dec 17, 2021
* Initial

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

Uncrustify

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

Uncrus

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Refactor

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Remove include old error handler

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Client

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Uncrustify

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Fix

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Fix

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Fix

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update rmw_microxrcedds_c/src/rmw_microros_internal/error_handling_internal.h

Co-authored-by: Antonio Cuadros <49162117+Acuadros95@users.noreply.github.com>
(cherry picked from commit 266248d)

Co-authored-by: Pablo Garrido <pablogs9@gmail.com>
pablogs9 added a commit that referenced this pull request Dec 17, 2021
* Runtime error handling API (#212)

* Initial

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

Uncrustify

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

Uncrus

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Refactor

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Remove include old error handler

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Client

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Uncrustify

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Fix

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Fix

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Fix

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Update rmw_microxrcedds_c/src/rmw_microros_internal/error_handling_internal.h

Co-authored-by: Antonio Cuadros <49162117+Acuadros95@users.noreply.github.com>
(cherry picked from commit 266248d)

# Conflicts:
#	rmw_microxrcedds_c/src/rmw_get_endpoint_network_flow.c
#	rmw_microxrcedds_c/src/rmw_qos_profile_check_compatible.c
#	rmw_microxrcedds_c/src/rmw_wait.c

* Fix conflicts

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

* Revert "Fix conflicts"

This reverts commit 1db99e0.

* Fix conflicts

Signed-off-by: Pablo Garrido <pablogs9@gmail.com>

Co-authored-by: Pablo Garrido <pablogs9@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants