Confused about the concepts of model bias and loss function in DP-GEN #1872
Hi @AlexSU1001! I'm Dosu and I'm helping the dpgen team. Great question! Your understanding is essentially correct: these are two different concepts serving different purposes in the DP-GEN workflow.

The loss function is used during training to evaluate a single model's accuracy against known DFT data. It is a weighted combination of errors in energy, force, and virial predictions: L = p_e·L_e + p_f·L_f + p_v·L_v. Minimizing this loss optimizes one model's parameters and produces metrics such as the force RMSE reported during training.

Model deviation is used during exploration to evaluate prediction uncertainty for new structures. It is the standard deviation of the predictions from multiple models trained on the same data but with different random initializations. When the models disagree significantly, it signals extrapolation: the structure lies in a region the training data did not cover well. (Model deviation docs)

You're right that model deviation evaluates structures rather than models directly. It answers: "Is this specific structure well covered by the training data?" Indirectly, though, this does tell us about model quality: if the ensemble disagrees, the models have not learned that part of configuration space reliably.

The two concepts connect through the trust-level thresholds: structures whose maximum force deviation falls between the lower and upper trust levels (`model_devi_f_trust_lo` and `model_devi_f_trust_hi` in the DP-GEN parameters) are collected as candidates, labeled with DFT, and added to the training data, so the models in the next iteration agree better in that region.
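For intuition, here is a minimal sketch of the exploration-side logic, assuming ensemble force predictions stored in a NumPy array. The array shapes, threshold values, and the helper `max_force_deviation` are illustrative, not DP-GEN's actual implementation:

```python
import numpy as np

# Hypothetical ensemble predictions: 4 models, each predicting forces for the
# same structure of 128 atoms (shape: n_models x n_atoms x 3).
forces = np.random.default_rng(0).normal(size=(4, 128, 3))

def max_force_deviation(forces: np.ndarray) -> float:
    """Maximum over atoms of the force deviation across the model ensemble.

    Simplified analogue of the 'max_devi_f' quantity DP-GEN uses to rank
    explored structures.
    """
    mean_f = forces.mean(axis=0)  # ensemble-mean force per atom
    # Per-atom deviation: sqrt of the mean (over models) squared distance
    # of each model's force vector from the ensemble mean.
    dev = np.sqrt(((forces - mean_f) ** 2).sum(axis=-1).mean(axis=0))
    return float(dev.max())

# Illustrative trust levels (in DP-GEN these come from param.json,
# e.g. model_devi_f_trust_lo / model_devi_f_trust_hi).
trust_lo, trust_hi = 0.05, 0.20  # eV/Å

devi = max_force_deviation(forces)
if devi < trust_lo:
    label = "accurate"   # models agree: structure already well covered
elif devi < trust_hi:
    label = "candidate"  # moderate disagreement: send to DFT labeling
else:
    label = "failed"     # large disagreement: likely unphysical structure
print(f"max force deviation = {devi:.3f} eV/Å -> {label}")
```

Note that nothing here evaluates a single model against DFT data; that is the job of the loss function during training. The ensemble spread only measures how consistently the models predict a given structure.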
In DP-GEN, unlike traditional machine learning workflows, the differences between multiple models are used to evaluate model accuracy. I am a bit confused about how this relates to my understanding of the loss function. Since neural network fitting is described as minimizing the loss function, i.e., the deviation between predicted data and DFT data, does that mean the loss function evaluates the accuracy of a single model? The deviation between multiple models, on the other hand, seems to assess whether the models' predictions for a given structure/conformation are reliable, i.e., whether that structure requires a DFT calculation. To me, calculating the deviation between multiple models feels more like an evaluation of the prediction results for a structure rather than an evaluation of the models themselves. Or can this evaluation of the predictions for a structure be considered an evaluation of the models? This part confuses me, so I would like some clarification. Thank you all for your help!