Loading batcher extension #305
```python
    start = obs.start
if end == None:
    end = obe.stop
samprate = 4e3 / (status.downsample_enabled * status.downsample_factor)
```
Doesn't this divide by 0 if status.downsample_enabled is False?
Yeah, that's definitely what it does. Will fix
I expect status.downsample_factor is 1 when it isn't downsampled. But I may be wrong.
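The fix being discussed could look something like the sketch below. This is a hypothetical helper, not the PR's actual code; it only illustrates guarding the `downsample_enabled=False` case, where multiplying by the boolean flag would make the denominator zero.

```python
def sample_rate(downsample_enabled, downsample_factor, base_rate=4e3):
    """Effective sample rate in Hz (hypothetical sketch of the fix).

    Avoids the divide-by-zero that occurs when the boolean
    downsample_enabled flag is False and is multiplied into the
    denominator.
    """
    if downsample_enabled:
        return base_rate / downsample_factor
    return base_rate
```

Note this assumes the base rate is 4 kHz as in the quoted snippet; if `downsample_factor` really is 1 whenever downsampling is off, the two branches agree.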
```python
    if specified, each entry in the list is successively passed to load the
    AxisManagers as `load_file(... samples = list[i] ... )`
start: Datetime or timestamp. Begin batching at this time.
end: Datetime or timestamp. End batching at this time.
```
```diff
 Arguments
 ----------
-obs_id : string
+obs_id : string, or Observation object
```
```python
if start == None:
    start = obs.start
if end == None:
    end = obe.stop
```
kmharrington left a comment:
I started reviewing this piecemeal, but then I realized I don't like the strategy of guessing samples vs. time with all those downsample assumptions. We already know that information; we don't have to guess.

The Files table has start, stop, sample_start, and sample_stop. (I see the docstring says the sample columns aren't implemented, but they actually are; I missed that in an update.)

The Frames table (files have lists of frames) has time and n_samples. Each frame is only a couple of seconds long, so you should be able to use frame edges very effectively to determine which sample ranges to load for a specific time range.
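The frame-edge strategy suggested above could be sketched as follows. This is a hypothetical illustration, not the PR's implementation: it assumes the frame metadata has already been read out of the Frames table into a sorted list of `(frame_start_time, n_samples)` pairs.

```python
def sample_range(frames, t_start, t_stop):
    """Map a [t_start, t_stop) time range onto a (samp_s, samp_e) sample
    range using per-frame start times and sample counts.

    `frames` is a time-sorted list of (frame_start_time, n_samples) pairs,
    standing in for rows of the Frames table described above.
    """
    samp_s, samp_e, offset = None, 0, 0
    for i, (t0, n) in enumerate(frames):
        # End time of this frame = start of the next (or +inf for the last).
        t1 = frames[i + 1][0] if i + 1 < len(frames) else float("inf")
        if samp_s is None and t_start < t1:
            samp_s = offset          # first frame overlapping the range
        if t0 < t_stop:
            samp_e = offset + n      # extend through frames starting before t_stop
        offset += n
    return (samp_s if samp_s is not None else 0), samp_e
```

Because frames are only a couple of seconds long, the worst-case slop at each edge is one frame of samples, with no downsample-rate guessing involved.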
```python
status = SmurfStatus.from_file(filenames[0])

if start == None:
    start = obs.start
```
`if start is None:` (I don't think `==` is standard for checking for None in Python.)

Also, if start is None, can we just set samp_s to 0 and skip the math? That will avoid rounding errors and missed edge cases by default. Same if stop is None: just set samp_e to obs_samps.
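The defaulting logic proposed above might look like this minimal sketch. The function and the `time_to_sample` placeholder are hypothetical (the real code would use the Frames/Files tables for the conversion); the names `samp_s`, `samp_e`, and `obs_samps` come from the comment.

```python
def time_to_sample(t, samprate=200.0, t0=0.0):
    # Placeholder conversion for illustration only; the real code would
    # look the sample index up from the frame/file metadata.
    return int((t - t0) * samprate)

def resolve_sample_bounds(start, stop, obs_samps):
    """Default missing bounds to the full observation, skipping the
    time-to-sample math entirely (no rounding, no edge cases).

    Uses `is None` rather than `== None`, per the review comment.
    """
    samp_s = 0 if start is None else time_to_sample(start)
    samp_e = obs_samps if stop is None else time_to_sample(stop)
    return samp_s, samp_e
```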
Oh, yeah, I like that better too. I had started by trying to just load all the timestamps and no data, which I thought (and still think) should be fast, but it was actually not fast at all. Using the tables is even better, good idea. I'll update to that strategy when I get a chance.
@jeffiuliano what is the status of finishing this up? Also, the option to only give you one in every n batches would be useful for when we want to take a quick look at a long timestream (i.e. grab the first 10 minutes of each hour in an overnight TOD).
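The "one in every n batches" option requested above could be as simple as thinning an existing batch iterator. This is a hypothetical sketch, not part of the PR; `batches` stands in for whatever `get_batch` yields.

```python
def thin_batches(batches, n):
    """Yield only the first batch of every group of n.

    E.g. with 10-minute batches and n=6, this yields the first 10 minutes
    of each hour, as in the overnight-TOD example above.
    """
    for i, batch in enumerate(batches):
        if i % n == 0:
            yield batch
```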
Hi folks -- can we retreat this to a "Draft PR", or is anyone passionate about whipping it into shape for final review in the near term?
It will be useful to be able to batch subsets of an observation, so this extends get_batch to take start and end times and loop only through that piece of the observation.
Other changes:
- `get_batch` can now accept an `Observation` object and not just an `obs_id` string (for convenience)
- used the `make_datetime` tool in `get_batch` too; it seemed better to make it its own function that can be imported rather than a static method of `G3tSmurf`.
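A standalone `make_datetime` helper as described above might look like the sketch below. This is an illustrative guess at its behavior (accepting either a `datetime` or a unix timestamp, matching the "Datetime or timestamp" docstring earlier in this thread); the function in the PR may handle more input types.

```python
from datetime import datetime, timezone

def make_datetime(t):
    """Normalize a datetime or unix timestamp to a datetime object.

    Hypothetical sketch of the standalone helper described above, usable
    as a free function instead of a static method of G3tSmurf.
    """
    if isinstance(t, datetime):
        return t
    return datetime.fromtimestamp(float(t), tz=timezone.utc)
```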