Wb data iss1 #16
Added solution to issue #17; currently running tests on cluster.
… prevents unpickling errors when too large chunks of data are sent between nodes.
jakobkolb left a comment
Nice work!!
I would like to see minor changes (cleanup and more extensive documentation) before merge to master.
# METHOD 0) hopefully even faster (incl. restructuring of `tasks`)
# # # --- obtaining computed tasks (ct)
with SafeHDFStore(self.path_raw) as store:
This needs more comprehensive inline documentation to be comprehensible.
rt = rt[rt["__computed"] == False]
tasks = rt.drop("__computed", axis=1)
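As a reading aid for the two lines above, here is a minimal sketch of the filtering step. It assumes that `rt` is the table of all tasks joined with the already computed tasks, and that `__computed` flags the finished ones. The column names `a` and `b` and both example frames are hypothetical and not taken from pymofa.

```python
import pandas as pd

# Hypothetical parameter combinations (column names are assumptions).
all_tasks = pd.DataFrame({"a": [1, 1, 2, 2], "b": [10, 20, 10, 20]})

# Hypothetical tasks that already have stored results.
computed = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
computed["__computed"] = True

# Left join keeps every task; tasks without a stored result get NaN.
rt = all_tasks.merge(computed, on=["a", "b"], how="left")
rt["__computed"] = rt["__computed"].notna()

# Keep only the tasks that still have to run, then drop the helper column.
rt = rt[~rt["__computed"]]
tasks = rt.drop("__computed", axis=1)
print(tasks)
```

Using `~rt["__computed"]` is equivalent to the `== False` comparison in the hunk once the column holds plain booleans.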
# method 1: task dataframe
This needs cleanup before merging to master.
# store results
# completed runs send their (task, result) as return
self._obtain_store_function(n_return[0])(n_return[1])
I suggest moving this step to the slave nodes, since in my experience sending large chunks of data between nodes can lead to unpickling errors. I will open an issue and commit a solution.
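A minimal sketch of this suggestion, using only the standard library. The names `run_func` and `run_and_store` and the per-task JSON files are hypothetical stand-ins, not pymofa's API. The idea is that the node that computed a result also stores it and returns only a small status, so the message pickled between nodes stays small.

```python
import json
import os
import pickle
import tempfile


def run_func(a, b):
    """Hypothetical model run that produces a large result."""
    return [a * b] * 100_000


def run_and_store(task, out_dir):
    """Hypothetical worker-side wrapper: compute, store, return a status."""
    result = run_func(*task)
    # The worker writes the result itself instead of sending it to the master.
    path = os.path.join(out_dir, "result_{}_{}.json".format(*task))
    with open(path, "w") as fh:
        json.dump(result, fh)
    # Only the task and an exit status travel back to the master.
    return task, 0


out_dir = tempfile.mkdtemp()
task = (2, 3)

# What the master would receive if the full result were sent back ...
full_payload = pickle.dumps((task, run_func(*task)))
# ... compared with what it receives when the worker stores the result.
small_payload = pickle.dumps(run_and_store(task, out_dir))
print(len(small_payload) < len(full_payload))
```

Each task writes its own file here to sidestep concurrent writes. With a single shared store, the workers would need some form of locking around the write.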
return mix_names
def _obtain_store_function(self, task):
This function needs more inline documentation to be comprehensible.
Also, I approve of the deletion of the resaving function and everything related to it.
…ction to the save function itself in _get_store_function() to prevent duplicate entries in computed tasks, which would lead to errors in the _get_computed_tasks() routine. 2) Changed the typecasting in the construction of the dataframe of all tasks in the _get_computed_tasks() routine to correct for unexpected typecasting in pandas' dataframe construction from dict.
…o wb_data_iss1 # Conflicts: # pymofa/experiment_handling.py
…un and suppressing further access to the database to make sure it is closed when the cluster finally terminates the run
change log:
To keep in mind:
Resaving can be replaced by a second computation of a new handle (see tutorial 1 for an example).
==> From my side, the resaving functions and tutorial 3 can be removed.