Loading batcher extension #305

Draft

jeffiuliano wants to merge 8 commits into master from loading_batcher_extension
Conversation

@jeffiuliano (Contributor)

It will be useful to be able to batch subsets of an observation, so this extends get_batch to take start and end times and loop over only that piece of the observation.

Other changes:

  • get_batch can take an Observation object, not just an obs_id string (for convenience)
  • Since I'm now using the make_datetime tool in get_batch too, it seemed better to make it its own function that can be imported, rather than a static method of G3tSmurf.
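For illustration, a standalone time normalizer along these lines could then be imported anywhere. This is a hedged sketch only; the real make_datetime in sotodlib may differ in signature and behavior:

```python
from datetime import datetime, timezone

def make_datetime(x):
    # Hypothetical sketch of a module-level make_datetime: accept either a
    # datetime or a Unix timestamp and return a timezone-aware datetime.
    if x is None:
        return None
    if isinstance(x, datetime):
        return x
    if isinstance(x, (int, float)):
        return datetime.fromtimestamp(x, tz=timezone.utc)
    raise TypeError(f"cannot interpret {type(x).__name__} as a time")
```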

start = obs.start
if end == None:
end = obe.stop
samprate = 4e3 / (status.downsample_enabled * status.downsample_factor)
Member

Doesn't this divide by 0 if status.downsample_enabled is False?

Contributor Author

Yeah, that's definitely what it does. Will fix

Member

I expect status.downsample_factor is 1 when it isn't downsampled. But I may be wrong.
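Either way, guarding the computation sidesteps the division by zero regardless of what downsample_factor holds when downsampling is off. A minimal sketch, assuming the 4 kHz base rate from the snippet above (function name hypothetical):

```python
def effective_sample_rate(downsample_enabled, downsample_factor, base_rate=4e3):
    # Guard: when downsampling is disabled, the effective rate is just the
    # base rate, regardless of whatever value downsample_factor holds.
    if not downsample_enabled:
        return base_rate
    return base_rate / downsample_factor
```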

if specified, each entry in the list is successively passed to load the
AxisManagers as `load_file(... samples = list[i] ... )`
start: Datetime or timestamp. Begin batching at this time.
end: Datetime or timestamp. End betching at this time.
Member

spelling

Arguments
----------
obs_id : string
obs_id : string, or Observatin object
Member

spelling

if start == None:
start = obs.start
if end == None:
end = obe.stop
Member

typo. did you test this?

@kmharrington (Member) left a comment

I started reviewing this piecemeal, but then I realized I don't like the strategy of guessing samples vs. time with all those downsample assumptions. We already know that information; we don't have to guess.

The Files table has start, stop, sample_start, and sample_stop. (The docstring says the sample ones aren't implemented, but they actually are; I missed that in an update.)

The Frames table (files have lists of frames) has time and n_samples. Each frame is only a couple of seconds long, so you should be able to use frame edges very effectively to determine which sample ranges to load for a specific time range.
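As a pure-Python illustration of that frame-edge strategy (the actual Frames rows live in the G3tSmurf database; the tuple representation here is an assumption made for the sketch):

```python
def sample_range_for_times(frames, t_start, t_end):
    """Given frames as (frame_time, n_samples) tuples in time order, where
    frame_time marks the start of each frame, return (samp_start, samp_stop)
    covering the interval [t_start, t_end]."""
    samp_start, samp_stop, offset = None, None, 0
    for i, (t, n) in enumerate(frames):
        t_next = frames[i + 1][0] if i + 1 < len(frames) else float("inf")
        if samp_start is None and t_next > t_start:
            samp_start = offset          # first frame overlapping the range
        if t < t_end:
            samp_stop = offset + n       # last frame starting before t_end
        offset += n
    return samp_start, samp_stop
```

Because each frame spans only a couple of seconds, the sample bounds recovered this way are at worst one frame wider than the requested time range.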

status = SmurfStatus.from_file(filenames[0])

if start == None:
start = obs.start
Member

if start is None: (I don't think == is standard for getting Nones in python)

If start is None, can we just set samp_s to 0 and skip the math? That will avoid rounding errors and missed edge cases by default. Same if stop is None: just set samp_e to obs_samps.
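The suggestion amounts to defaulting straight to the full sample range and only converting explicitly supplied times. A minimal sketch (all names hypothetical, not the PR's actual code):

```python
def resolve_sample_bounds(start, stop, obs_samps, time_to_sample):
    # "is None" is the idiomatic None check in Python; "== None" can be
    # fooled by objects that override __eq__.
    samp_s = 0 if start is None else time_to_sample(start)
    samp_e = obs_samps if stop is None else time_to_sample(stop)
    return samp_s, samp_e
```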

@jeffiuliano (Contributor Author)

Oh yeah, I like that better too. I had started by trying to load just the timestamps and no data, which I thought (and still think) should be fast, but it was actually not fast at all. Using the tables is even better, good idea. I'll update to that strategy when I get a chance.

@skhrg (Member)

skhrg commented Apr 7, 2023

@jeffiuliano what is the status of finishing this up? Also, the option to only give you one in every n batches would be useful when we want to take a quick look at a long timestream (i.e., grab the first 10 minutes of each hour in an overnight TOD).
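That one-in-every-n option could sit on top of any batch iterator. A minimal sketch of what it might look like (names hypothetical, not part of this PR):

```python
from itertools import islice

def every_nth_batch(batches, n, offset=0):
    """Yield one batch out of every n from an iterable of batches, e.g. the
    first 10-minute batch of each hour when batches are 10 minutes long
    and n = 6."""
    yield from islice(batches, offset, None, n)
```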

@mhasself (Member)

mhasself commented Oct 6, 2023

Hi folks -- can we retreat this to a "Draft PR", or is anyone passionate about whipping it into shape for final review in the near term?

@mhasself mhasself marked this pull request as draft December 5, 2023 14:38
