I noticed that with the default psi4 configuration, the analytical gradient might have large errors. See psi4/psi4#3161.
Then I looked at whether the SPICE dataset has such a problem. I found yes. Here is an example:
>> import h5py
>> import numpy as np
>> f=h5py.File("SPICE-1.1.3.hdf5")
>>> f["[cl-] [k+]"]["dft_total_gradient"][:]
array([[[-0.11935428, -0.05381549, -0.04868505],
[ 0.11906447, 0.05368482, 0.04856684]],
[[-0.07920539, -0.03571265, -0.03230853],
[ 0.07944967, 0.03582279, 0.03240817]],
[[-0.05042415, -0.02273561, -0.02056828],
[ 0.05006452, 0.02257346, 0.02042158]],
[[-0.02906785, -0.01310628, -0.01185692],
[ 0.02893096, 0.01304456, 0.01180108]],
[[-0.01390282, -0.00626859, -0.00567105],
[ 0.01385674, 0.00624782, 0.00565226]],
[[-0.00304641, -0.00137359, -0.00124264],
[ 0.00291706, 0.00131527, 0.00118988]],
[[ 0.00452754, 0.0020414 , 0.0018468 ],
[-0.00475733, -0.00214501, -0.00194053]],
[[ 0.00991673, 0.00447132, 0.00404509],
[-0.00998873, -0.00450378, -0.00407446]],
[[ 0.01348626, 0.00608075, 0.00550111],
[-0.01354828, -0.00610872, -0.00552641]],
[[ 0.01583315, 0.00713894, 0.00645845],
[-0.01597644, -0.00720355, -0.0065169 ]],
[[ 0.01726558, 0.00778484, 0.00704274],
[-0.01739115, -0.00784146, -0.00709396]]], dtype=float32)
>>> np.sum(f["[cl-] [k+]"]["dft_total_gradient"][:],axis=1)
array([[-2.8980523e-04, -1.3067201e-04, -1.1821464e-04],
[ 2.4427474e-04, 1.1014193e-04, 9.9644065e-05],
[-3.5963207e-04, -1.6215444e-04, -1.4669634e-04],
[-1.3688765e-04, -6.1720610e-05, -5.5837445e-05],
[-4.6082772e-05, -2.0778272e-05, -1.8797349e-05],
[-1.2934324e-04, -5.8319303e-05, -5.2759773e-05],
[-2.2978242e-04, -1.0360568e-04, -9.3729002e-05],
[-7.1995892e-05, -3.2461714e-05, -2.9367395e-05],
[-6.2023290e-05, -2.7965289e-05, -2.5299378e-05],
[-1.4329515e-04, -6.4610038e-05, -5.8451667e-05],
[-1.2556836e-04, -5.6618359e-05, -5.1220413e-05]], dtype=float32)
The sum gradients over atoms indicate that some gradients have at least the error of 2e-4 Hatree/Bohr (~0.23 kcal/mol/A), which looks quite large for machine learning training.
I suggest rerunning these points using a tighter integration grid, as suggested in psi4/psi4#3161.
I noticed that with the default psi4 configuration, the analytical gradient might have large errors. See psi4/psi4#3161.
Then I looked at whether the SPICE dataset has such a problem. I found yes. Here is an example:
The sum gradients over atoms indicate that some gradients have at least the error of 2e-4 Hatree/Bohr (~0.23 kcal/mol/A), which looks quite large for machine learning training.
I suggest rerunning these points using a tighter integration grid, as suggested in psi4/psi4#3161.