Skip to content

operator threading#90

Draft
nopeslide wants to merge 1 commit into
alrevuelta:masterfrom
nopeslide:parallel_operators
Draft

operator threading#90
nopeslide wants to merge 1 commit into
alrevuelta:masterfrom
nopeslide:parallel_operators

Conversation

@nopeslide

Copy link
Copy Markdown
Collaborator

With the prepare_* functions we now have the possibility to multi-thread operator execution.
I propose a minimal change in the node_context to make this possible and would like to discuss this:
Instead of making the executer_context inside the node_context a list as discussed in #56 I would add a single pointer and flag to the node_context:

--- a/include/operators/operator.h
+++ b/include/operators/operator.h
@@ -20,6 +20,7 @@ struct node_context {
   Onnx__TensorProto  **outputs;
   operator_executer    executer;
   void                *executer_context;
+  node_context        *next;
+  bool                 threadsafe; 
 };

By doing so we can introduce new nodes to the inference which are explicitly marked as threadsafe and may be executed in parallel to the current node.
The prepare_* functions can inject as many nodes as necessary and may even specify a different executer and context for each new node.
When a node is encountered that is not threadsafe, the inference function needs to wait for all previous jobs to finish before executing the next:

--- a/src/inference.c
+++ b/src/inference.c
@@ -77,7 +77,14 @@ Onnx__TensorProto** inference(Onnx__ModelProto *model, Onnx__TensorProto **input
   for (int nodeIdx = 0; nodeIdx < model->graph->n_node; nodeIdx++)
   {
     TRACE(1, true, "Running node %d, operator=%s", nodeIdx, model->graph->node[nodeIdx]->op_type);
-    all_context[nodeIdx].executer(&all_context[nodeIdx]);
+    for (node_context *ctx = &all_context[nodeIdx]; ctx; ctx=ctx->next) {
+      if (!ctx->threadsafe) {
+        // wait for all threads to finish
+      }
+      // issue new thread
+      ctx->executer(ctx);
+    }
+    // wait for all threads to finish
   }

@nopeslide nopeslide force-pushed the parallel_operators branch from 60c8140 to 010bc95 Compare March 8, 2021 16:56
@Coderitter-GmbH

Copy link
Copy Markdown
Contributor

Its a good Idea to support threading or something else to support parallel execution. But the difficulty is, that different platforms use different libraries. Not every platform support pthread for example. How can we solve this?

@nopeslide

nopeslide commented Mar 9, 2021

Copy link
Copy Markdown
Collaborator Author

Its a good Idea to support threading or something else to support parallel execution. But the difficulty is, that different platforms use different libraries. Not every platform support pthread for example. How can we solve this?

Why should we solve this? We can provide a reference implementation with pthread and if someone wants to use sth else, he/she has to implement it. How should we know what other people are gonna use? The only thing we can do is to abstract it a little and handle the threading in one place so its is more easy to port.

@Coderitter-GmbH

Copy link
Copy Markdown
Contributor

What I meant is that i hope there will be an option to disable pthreads and run the code in sequence, or something like that. So that this lib is ready to use even when pthread is missing.

@nopeslide

nopeslide commented Mar 9, 2021

Copy link
Copy Markdown
Collaborator Author

Ah sorry, I misunderstood you there.
I would extend the inference function to take 2 function pointer.
one for issuing a thread and one for waiting for all threads to finish.
so you can pass your own thread implementation. if these are nulled they will be skipped.

@nopeslide

Copy link
Copy Markdown
Collaborator Author

@alrevuelta following your emoji reaction, you seem quite happy with this minimal approach? :D

@alrevuelta

Copy link
Copy Markdown
Owner

@alrevuelta following your emoji reaction, you seem quite happy with this minimal approach? :D

I haven't looked much into parallelising the operators but as longs as it doesn't increase much the complexity I'm fine with it :D

@nopeslide

Copy link
Copy Markdown
Collaborator Author

@alrevuelta
another option would be to actually build the dependency graph of the operators, this would also solve our static node_context array problem.

--- a/include/operators/operator.h
+++ b/include/operators/operator.h
@@ -20,6 +20,7 @@ struct node_context {
   Onnx__TensorProto  **outputs;
   operator_executer    executer;
   void                *executer_context;
+  node_context       **next;
+  node_context       **prev;
+  bool                 threadsafe; 
 };

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants