Loading batcher extension #305

Draft

jeffiuliano wants to merge 8 commits into master from loading_batcher_extension
Conversation

@jeffiuliano (Contributor)

It will be useful to be able to batch subsets of an observation, so this extends get_batch to take start and end times and loop over only that piece of the observation.

Other changes:

  • get_batch can take an Observation object, not just an obs_id string (for convenience)
  • Since I'm now using the make_datetime tool in get_batch too, it seemed better to make it its own function that can be imported, rather than a static method of G3tSmurf.
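For illustration, a standalone time normalizer along these lines could then be imported anywhere. This is a hedged sketch only; the real make_datetime in sotodlib may differ in signature and behavior:

```python
from datetime import datetime, timezone

def make_datetime(x):
    # Hypothetical sketch of a module-level make_datetime: accept either a
    # datetime or a Unix timestamp and return a timezone-aware datetime.
    if x is None:
        return None
    if isinstance(x, datetime):
        return x
    if isinstance(x, (int, float)):
        return datetime.fromtimestamp(x, tz=timezone.utc)
    raise TypeError(f"cannot interpret {type(x).__name__} as a time")
```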

start = obs.start
if end == None:
end = obe.stop
samprate = 4e3 / (status.downsample_enabled * status.downsample_factor)
Member

Doesn't this divide by 0 if status.downsample_enabled is False?

Contributor Author

Yeah, that's definitely what it does. Will fix

Member

I expect status.downsample_factor is 1 when it isn't downsampled. But I may be wrong.
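Either way, guarding the computation sidesteps the division by zero regardless of what downsample_factor holds when downsampling is off. A minimal sketch, assuming the 4 kHz base rate from the snippet above (function name hypothetical):

```python
def effective_sample_rate(downsample_enabled, downsample_factor, base_rate=4e3):
    # Guard: when downsampling is disabled, the effective rate is just the
    # base rate, regardless of whatever value downsample_factor holds.
    if not downsample_enabled:
        return base_rate
    return base_rate / downsample_factor
```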

if specified, each entry in the list is successively passed to load the
AxisManagers as `load_file(... samples = list[i] ... )`
start: Datetime or timestamp. Begin batching at this time.
end: Datetime or timestamp. End betching at this time.
Member

spelling

Arguments
----------
obs_id : string
obs_id : string, or Observatin object
Member

spelling

if start == None:
start = obs.start
if end == None:
end = obe.stop
Member

typo. did you test this?

@kmharrington (Member) left a comment

I started reviewing this piecemeal, but then I realized I don't like the strategy of guessing samples vs. time with all those downsample assumptions. We already know that information; we don't have to guess.

The Files table has start, stop, sample_start, and sample_stop. (The docstring says the sample ones aren't implemented, but they actually are; I missed that in an update.)

The Frames table (files have lists of frames) has time and n_samples. Each frame is only a couple of seconds long, so you should be able to use frame edges very effectively to determine which sample ranges to load for a specific time range.
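As a pure-Python illustration of that frame-edge strategy (the actual Frames rows live in the G3tSmurf database; the tuple representation here is an assumption made for the sketch):

```python
def sample_range_for_times(frames, t_start, t_end):
    """Given frames as (frame_time, n_samples) tuples in time order, where
    frame_time marks the start of each frame, return (samp_start, samp_stop)
    covering the interval [t_start, t_end]."""
    samp_start, samp_stop, offset = None, None, 0
    for i, (t, n) in enumerate(frames):
        t_next = frames[i + 1][0] if i + 1 < len(frames) else float("inf")
        if samp_start is None and t_next > t_start:
            samp_start = offset          # first frame overlapping the range
        if t < t_end:
            samp_stop = offset + n       # last frame starting before t_end
        offset += n
    return samp_start, samp_stop
```

Because each frame spans only a couple of seconds, the sample bounds recovered this way are at worst one frame wider than the requested time range.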

status = SmurfStatus.from_file(filenames[0])

if start == None:
start = obs.start
Member

if start is None: (I don't think == is standard for getting Nones in python)

If start is None, can we just set samp_s to 0 and skip the math? That will avoid rounding errors and missed edge cases by default. Same if stop is None: just set samp_e to obs_samps.
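The suggestion amounts to defaulting straight to the full sample range and only converting explicitly supplied times. A minimal sketch (all names hypothetical, not the PR's actual code):

```python
def resolve_sample_bounds(start, stop, obs_samps, time_to_sample):
    # "is None" is the idiomatic None check in Python; "== None" can be
    # fooled by objects that override __eq__.
    samp_s = 0 if start is None else time_to_sample(start)
    samp_e = obs_samps if stop is None else time_to_sample(stop)
    return samp_s, samp_e
```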

@jeffiuliano (Contributor Author)

Oh yeah, I like that better too. I had started by trying to load just the timestamps and no data, which I thought (and still think) should be fast, but it was actually not fast at all. Using the tables is even better, good idea. I'll update to that strategy when I get a chance.

@skhrg (Member)

skhrg commented Apr 7, 2023

@jeffiuliano what is the status of finishing this up? Also, the option to only give you one in every n batches would be useful when we want to take a quick look at a long timestream (i.e., grab the first 10 minutes of each hour in an overnight TOD).
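That one-in-every-n option could sit on top of any batch iterator. A minimal sketch of what it might look like (names hypothetical, not part of this PR):

```python
from itertools import islice

def every_nth_batch(batches, n, offset=0):
    """Yield one batch out of every n from an iterable of batches, e.g. the
    first 10-minute batch of each hour when batches are 10 minutes long
    and n = 6."""
    yield from islice(batches, offset, None, n)
```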

@mhasself (Member)

mhasself commented Oct 6, 2023

Hi folks -- can we retreat this to a "Draft PR", or is anyone passionate about whipping it into shape for final review in the near term?

@mhasself mhasself marked this pull request as draft December 5, 2023 14:38
