Skip to content

Add Calibration.from_data()#373

Open
philsmt wants to merge 8 commits intomasterfrom
feat/caldata-from-data
Open

Add Calibration.from_data()#373
philsmt wants to merge 8 commits intomasterfrom
feat/caldata-from-data

Conversation

@philsmt
Copy link
Copy Markdown
Collaborator

@philsmt philsmt commented Aug 22, 2025

One of the original ideas behind the CalibrationData API is to integrate the determination of detector condition from data, i.e. the various properties of control devices and/or the data itself. Due to various changes over the years to hardware and/or software, this can unfortunately be somewhat messy and is nowadays encoded in calibration notebooks, amongst other places.

This is an attempt to bring this functionality to the CalibrationData API in EXtra. Ideally, it makes obtaining calibration data as simply as:

run = open_run(...)
cd = CalibrationData.from_data(run)

As it happens, the new design lends itself to this quite nicely by separating the detector-specific and -agnostic parts as before across the CalibrationData class and ConditionsBase implementation:

  • Each detector-specific ConditionsBase object gains a .from_data() class method taking an extra_data.DataCollection as well as the CalCat detector name. From that, it attempts to initialize itself by inferring the necessary fields from the data. For AGIPDConditions e.g., its signature is:

    @classmethod
    def from_data(cls, data, detector_name, modules=None,
                  fpga_comp=None, mpod=None, fpga_control=None, xtdf=None,
                  client=None, **params):

    fpga_comp, mpod, fpga_control, xtdf are inferred from the detector entry in CalCat and what's in data, or can be overwritten manually. The same applies to the modules list, which allows taking only the given modules into account (e.g. when a module is powered down, but part of the data).

    **params can be any of the regular parameter conditions, and prevents it from being inferred from data. The object attempts to be as greedy as possible and determine it from any applicable source, e.g. acquisition rate can be taken from the AGIPD FPGA comp device or the difference of pulse IDs in the data stream. If the sources in data are insufficient to construct the condition, an exception is thrown. This can be prevented by passing the missing parameters manually.

    For diagnostics, the actual extraction of any given parameter from data is implemented in its own static method, but that is an optional pattern.

  • The CalibrationData class itself gains a .from_data() class method taking an extra_data.DataCollection and optionally a detector name. From that, it attempts to infer the detector name if missing via EXtra-data components, its detector type via CalCat and this pick the corresponding ConditionsBase type. Also, it infers begin_at from the data and thus attempt to initialize a proper .CalibrationData object via .from_condition. Any additional keyword arguments are passed to the conditions object, e.g. to overwrite parameters or sources.

For now, I implemented the necessary code to infer the condition for AGIPD, LPD and Jungfrau based on what we use in pycalibration, including (most) of its idiosyncracies. For AGIPD, I isolated the exact pycalibration code and ran it against this new API for a sample run of every proposal an AGIPD correction ever ran on with full agreement on all of them.

You can find a minimal example showcasing several situations here.

@philsmt philsmt requested a review from takluyver August 22, 2025 15:25
@philsmt philsmt changed the title Add Calibration.from_data() Draft: Add Calibration.from_data() Aug 22, 2025
@philsmt philsmt changed the title Draft: Add Calibration.from_data() Add Calibration.from_data() Aug 22, 2025
@philsmt philsmt marked this pull request as draft August 22, 2025 18:37
CALCAT_PROXY_URL = "http://exflcalproxy.desy.de:8080/"


def any_is_none(*values):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not remotely important to the overall PR, but this can be one line with the builtin any() function.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly, but explicitly checking for is None seemed to become fairly clunky: any([x is None for x in [a, b, c]])

As I was already writing a function for it, the allocation cost for the comprehension seemed unnecessary.

self.constant_groups = constant_groups
self.module_details = module_details
self.detector_name = detector_name
self.condition = condition
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a use case in mind, or is this a general 'don't throw away information'?

If we don't have a concrete use case, I might be inclined to get rid of it, or make it ._condition, just for debugging. Making it easy to assume constants always come with a condition object could make it harder to use the flexibility of finding constants in other ways - like from a report in CalCat, from the correction metadata of a run, or combining different groups of constants.

It should also be possible to look up the conditions associated with each individual constant in CalCat, at least where we have the IDs, which is a more general option.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to expose the entire condition inferred from data, i.e. access the generated ConditionsBase object. I didn't consider looking into individual CCVs for their condition. Technically one could ConditionsBase.from_data() before to obtain it, but that means leaving out on inferring detector type and creation date from data.

self.sources = sources

def __str__(self):
msg = 'required parameters could not be inferred from data: ' + \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our other error messages generally start with a capital letter (although admittedly this isn't as general a convention as I thought it was - Python itself seems to prefer lowercase)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter is indeed the reason I generally try to stick with lowercase, but you're right EXtra-data mostly uses upper cases with EXtra being somewhere in the middle.

Would you prefer to set a general rule?

@philsmt philsmt force-pushed the feat/caldata-from-data branch from f6e8ecc to a8e0ca1 Compare September 26, 2025 12:38
@philsmt philsmt added the enhancement New feature or request label Oct 27, 2025
@philsmt philsmt force-pushed the feat/caldata-from-data branch from a8e0ca1 to 33a54f5 Compare October 28, 2025 15:21
@philsmt
Copy link
Copy Markdown
Collaborator Author

philsmt commented Dec 10, 2025

Rebased on top of DetectorData, old HEAD was 9914ee67428c63d9d0f2219e4a5164c18f2ee38e

@philsmt philsmt force-pushed the feat/caldata-from-data branch from 9914ee6 to c793eee Compare December 10, 2025 13:49
@philsmt philsmt changed the base branch from master to feat/detectordata December 10, 2025 13:49
@codecov
Copy link
Copy Markdown

codecov bot commented Dec 10, 2025

Codecov Report

❌ Patch coverage is 65.26316% with 99 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.74%. Comparing base (6c1c1b5) to head (feef5da).

Files with missing lines Patch % Lines
src/extra/calibration.py 65.26% 99 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #373      +/-   ##
==========================================
- Coverage   73.14%   72.74%   -0.40%     
==========================================
  Files          35       35              
  Lines        6248     6521     +273     
==========================================
+ Hits         4570     4744     +174     
- Misses       1678     1777      +99     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@philsmt philsmt force-pushed the feat/detectordata branch 3 times, most recently from b7cd956 to 427cceb Compare December 23, 2025 11:59
@philsmt philsmt force-pushed the feat/detectordata branch from 93146aa to caeac0c Compare March 20, 2026 14:22
@philsmt philsmt force-pushed the feat/caldata-from-data branch from 6753085 to f42f922 Compare March 20, 2026 14:25
@philsmt philsmt force-pushed the feat/detectordata branch from caeac0c to 7868e3e Compare March 23, 2026 13:36
@takluyver takluyver force-pushed the feat/detectordata branch from 7868e3e to 830bfc7 Compare April 2, 2026 08:36
@philsmt philsmt force-pushed the feat/detectordata branch from 830bfc7 to 8767ba3 Compare April 9, 2026 13:49
Base automatically changed from feat/detectordata to master April 9, 2026 15:25
@philsmt philsmt force-pushed the feat/caldata-from-data branch from 61867de to e1524a2 Compare April 9, 2026 16:27
@philsmt
Copy link
Copy Markdown
Collaborator Author

philsmt commented Apr 9, 2026

Rebased on top of the main branch after merging DetectorData, now to add some tests.

@philsmt philsmt force-pushed the feat/caldata-from-data branch from e1524a2 to 4c02cdd Compare April 10, 2026 08:57
@philsmt philsmt force-pushed the feat/caldata-from-data branch from e382416 to 4b10e0c Compare April 10, 2026 09:16
@philsmt philsmt force-pushed the feat/caldata-from-data branch from 4b10e0c to feef5da Compare April 10, 2026 10:15
@philsmt
Copy link
Copy Markdown
Collaborator Author

philsmt commented Apr 10, 2026

I've started adding tests replicating the ones done in pycalibration for operating condition detection from data. This has presented complications:

  • The mock data framework always uses the current time as train timestamps, which ends up in the CalCat query and thus breaks the cassettes.
  • All relevant parts of the mock data code is in EXtra-data, which makes adjusting it to cover more test cases difficult.
  • EXtra-data itself does not actually use that mock data for its tests

We have encountered the latter issue already in pycalibration. The proper solution to me would be to move all that mock data code to EXtra, including the mock devices only used in those. I would prefer to not grow this MR even more however, so I'd like to leave it with these tests for now and then move stuff over.

@philsmt philsmt marked this pull request as ready for review April 10, 2026 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants