Add Calibration.from_data() by philsmt · Pull Request #373 · European-XFEL/EXtra

philsmt · 2025-08-22T15:25:22Z

One of the original ideas behind the CalibrationData API is to integrate the determination of detector condition from data, i.e. the various properties of control devices and/or the data itself. Due to various changes over the years to hardware and/or software, this can unfortunately be somewhat messy and is nowadays encoded in calibration notebooks, amongst other places.

This is an attempt to bring this functionality to the CalibrationData API in EXtra. Ideally, it makes obtaining calibration data as simply as:

run = open_run(...)
cd = CalibrationData.from_data(run)

As it happens, the new design lends itself to this quite nicely by separating the detector-specific and -agnostic parts as before across the CalibrationData class and ConditionsBase implementation:

Each detector-specific ConditionsBase object gains a .from_data() class method taking an extra_data.DataCollection as well as the CalCat detector name. From that, it attempts to initialize itself by inferring the necessary fields from the data. For AGIPDConditions e.g., its signature is:
```
@classmethod
def from_data(cls, data, detector_name, modules=None,
              fpga_comp=None, mpod=None, fpga_control=None, xtdf=None,
              client=None, **params):
```
fpga_comp, mpod, fpga_control, xtdf are inferred from the detector entry in CalCat and what's in data, or can be overwritten manually. The same applies to the modules list, which allows taking only the given modules into account (e.g. when a module is powered down, but part of the data).

**params can be any of the regular parameter conditions, and prevents it from being inferred from data. The object attempts to be as greedy as possible and determine it from any applicable source, e.g. acquisition rate can be taken from the AGIPD FPGA comp device or the difference of pulse IDs in the data stream. If the sources in data are insufficient to construct the condition, an exception is thrown. This can be prevented by passing the missing parameters manually.

For diagnostics, the actual extraction of any given parameter from data is implemented in its own static method, but that is an optional pattern.
The CalibrationData class itself gains a .from_data() class method taking an extra_data.DataCollection and optionally a detector name. From that, it attempts to infer the detector name if missing via EXtra-data components, its detector type via CalCat and this pick the corresponding ConditionsBase type. Also, it infers begin_at from the data and thus attempt to initialize a proper .CalibrationData object via .from_condition. Any additional keyword arguments are passed to the conditions object, e.g. to overwrite parameters or sources.

For now, I implemented the necessary code to infer the condition for AGIPD, LPD and Jungfrau based on what we use in pycalibration, including (most) of its idiosyncracies. For AGIPD, I isolated the exact pycalibration code and ran it against this new API for a sample run of every proposal an AGIPD correction ever ran on with full agreement on all of them.

You can find a minimal example showcasing several situations here.

src/extra/calibration.py

takluyver · 2025-09-25T12:03:50Z

src/extra/calibration.py

 CALCAT_PROXY_URL = "http://exflcalproxy.desy.de:8080/"


+def any_is_none(*values):


Not remotely important to the overall PR, but this can be one line with the builtin any() function.

Certainly, but explicitly checking for is None seemed to become fairly clunky: any([x is None for x in [a, b, c]])

As I was already writing a function for it, the allocation cost for the comprehension seemed unnecessary.

src/extra/calibration.py

takluyver · 2025-09-25T16:25:51Z

src/extra/calibration.py

        self.constant_groups = constant_groups
        self.module_details = module_details
        self.detector_name = detector_name
+        self.condition = condition


Do we have a use case in mind, or is this a general 'don't throw away information'?

If we don't have a concrete use case, I might be inclined to get rid of it, or make it ._condition, just for debugging. Making it easy to assume constants always come with a condition object could make it harder to use the flexibility of finding constants in other ways - like from a report in CalCat, from the correction metadata of a run, or combining different groups of constants.

It should also be possible to look up the conditions associated with each individual constant in CalCat, at least where we have the IDs, which is a more general option.

The idea was to expose the entire condition inferred from data, i.e. access the generated ConditionsBase object. I didn't consider looking into individual CCVs for their condition. Technically one could ConditionsBase.from_data() before to obtain it, but that means leaving out on inferring detector type and creation date from data.

takluyver · 2025-09-25T16:58:01Z

src/extra/calibration.py

+        self.sources = sources
+
+    def __str__(self):
+        msg = 'required parameters could not be inferred from data: ' + \


Our other error messages generally start with a capital letter (although admittedly this isn't as general a convention as I thought it was - Python itself seems to prefer lowercase)

The latter is indeed the reason I generally try to stick with lowercase, but you're right EXtra-data mostly uses upper cases with EXtra being somewhere in the middle.

Would you prefer to set a general rule?

src/extra/calibration.py

philsmt · 2025-12-10T13:49:08Z

Rebased on top of DetectorData, old HEAD was 9914ee67428c63d9d0f2219e4a5164c18f2ee38e

codecov · 2025-12-10T13:53:37Z

Codecov Report

❌ Patch coverage is 65.26316% with 99 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.74%. Comparing base (6c1c1b5) to head (feef5da).

Files with missing lines	Patch %	Lines
src/extra/calibration.py	65.26%	99 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #373      +/-   ##
==========================================
- Coverage   73.14%   72.74%   -0.40%     
==========================================
  Files          35       35              
  Lines        6248     6521     +273     
==========================================
+ Hits         4570     4744     +174     
- Misses       1678     1777      +99

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

src/extra/calibration.py

philsmt · 2026-04-09T16:28:16Z

Rebased on top of the main branch after merging DetectorData, now to add some tests.

philsmt · 2026-04-10T10:57:24Z

I've started adding tests replicating the ones done in pycalibration for operating condition detection from data. This has presented complications:

The mock data framework always uses the current time as train timestamps, which ends up in the CalCat query and thus breaks the cassettes.
All relevant parts of the mock data code is in EXtra-data, which makes adjusting it to cover more test cases difficult.
EXtra-data itself does not actually use that mock data for its tests

We have encountered the latter issue already in pycalibration. The proper solution to me would be to move all that mock data code to EXtra, including the mock devices only used in those. I would prefer to not grow this MR even more however, so I'd like to leave it with these tests for now and then move stuff over.

philsmt requested a review from takluyver August 22, 2025 15:25

philsmt changed the title ~~Add Calibration.from_data()~~ Draft: Add Calibration.from_data() Aug 22, 2025

philsmt changed the title ~~Draft: Add Calibration.from_data()~~ Add Calibration.from_data() Aug 22, 2025

philsmt marked this pull request as draft August 22, 2025 18:37

takluyver reviewed Aug 26, 2025

View reviewed changes

src/extra/calibration.py Outdated Show resolved Hide resolved

takluyver reviewed Sep 25, 2025

View reviewed changes

philsmt force-pushed the feat/caldata-from-data branch from f6e8ecc to a8e0ca1 Compare September 26, 2025 12:38

philsmt added the enhancement New feature or request label Oct 27, 2025

philsmt force-pushed the feat/caldata-from-data branch from a8e0ca1 to 33a54f5 Compare October 28, 2025 15:21

philsmt force-pushed the feat/caldata-from-data branch from 9914ee6 to c793eee Compare December 10, 2025 13:49

philsmt changed the base branch from master to feat/detectordata December 10, 2025 13:49

takluyver mentioned this pull request Dec 12, 2025

Add CalibrationData.reports_info() method #430

Draft

philsmt force-pushed the feat/detectordata branch 3 times, most recently from b7cd956 to 427cceb Compare December 23, 2025 11:59

turkot reviewed Feb 25, 2026

View reviewed changes

src/extra/calibration.py Outdated Show resolved Hide resolved

turkot reviewed Feb 25, 2026

View reviewed changes

src/extra/calibration.py Outdated Show resolved Hide resolved

philsmt force-pushed the feat/detectordata branch from 93146aa to caeac0c Compare March 20, 2026 14:22

philsmt force-pushed the feat/caldata-from-data branch from 6753085 to f42f922 Compare March 20, 2026 14:25

philsmt force-pushed the feat/detectordata branch from caeac0c to 7868e3e Compare March 23, 2026 13:36

takluyver force-pushed the feat/detectordata branch from 7868e3e to 830bfc7 Compare April 2, 2026 08:36

philsmt force-pushed the feat/detectordata branch from 830bfc7 to 8767ba3 Compare April 9, 2026 13:49

Base automatically changed from feat/detectordata to master April 9, 2026 15:25

philsmt force-pushed the feat/caldata-from-data branch from 61867de to e1524a2 Compare April 9, 2026 16:27

Add source names to DetectorData

fbd029c

philsmt force-pushed the feat/caldata-from-data branch from e1524a2 to 4c02cdd Compare April 10, 2026 08:57

philsmt force-pushed the feat/caldata-from-data branch from e382416 to 4b10e0c Compare April 10, 2026 09:16

philsmt added 7 commits April 10, 2026 12:08

Set default source_energy for AGIPD condition to 9.2

d3cbcff

Add AGIPDConditions.from_data()

9877499

Add LPDConditions.from_data()

c751915

Add JUNGFRAUConditions.from_data()

76058b3

Add CalibrationData.from_data()

85dd4b5

Change AGIPDConditions to use voltage setpoint rather than measurement

7921e64

Add changelog entry for CalibrationData.from_data

feef5da

philsmt force-pushed the feat/caldata-from-data branch from 4b10e0c to feef5da Compare April 10, 2026 10:15

philsmt marked this pull request as ready for review April 10, 2026 10:57

		CALCAT_PROXY_URL = "http://exflcalproxy.desy.de:8080/"


		def any_is_none(*values):

Conversation

philsmt commented Aug 22, 2025

Uh oh!

Uh oh!

takluyver Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

philsmt Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

takluyver Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

philsmt Sep 26, 2025

Choose a reason for hiding this comment

Uh oh!

takluyver Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

philsmt Sep 26, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

philsmt commented Dec 10, 2025

Uh oh!

codecov bot commented Dec 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

philsmt commented Apr 9, 2026

Uh oh!

philsmt commented Apr 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov bot commented Dec 10, 2025 •

edited

Loading