93 changes: 53 additions & 40 deletions dptb/nn/energy.py
@@ -163,47 +163,60 @@ def solve(self, h_container, s_container, kpoints: Union[list, torch.Tensor, np
eigvals_list = []
eigvecs_list = []

for k in kpoints:
hk = h_container.sample_k(k, symm=True)

if s_container is not None:
sk = s_container.sample_k(k, symm=True)
hk -= self.sigma * sk
A = hk.to_scipy(format="csr")
M = sk
else:
hk.shift(-self.sigma)
A = hk.to_scipy(format="csr")
M = None

A.sort_indices()
A.sum_duplicates()
N = A.shape[0]

# Try PARDISO first, fall back to scipy SuperLU if PARDISO fails
# (MKL PARDISO has a known bug with certain block-structured patterns)
solver = PyPardisoSolver(mtype=13)
solver.factorize(A)

def matvec(b):
return solver.solve(A, b)
# Create a single solver instance and reuse it across k-points.
# PARDISO stores LU factors in opaque `pt` handles; creating a new
# solver per k-point without calling free_memory() leaks all that
# internal MKL memory.
solver = PyPardisoSolver(mtype=13)

try:
for k in kpoints:
hk = h_container.sample_k(k, symm=True)

Op = LinearOperator((N, N), matvec=matvec, dtype=A.dtype)

try:
# Use larger NCV to help convergence, especially for clustered eigenvalues
ncv = max(2*self.neig + 1, 20)
vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
except Exception:
# Retry with larger NCV if ARPACK fails (e.g. error 3: No shifts could be applied)
# This often happens when eigenvalues are clustered near the shift
ncv = max(5*self.neig, 50)
vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)

eigvals_list.append(vals + self.sigma)
if return_eigenvectors:
eigvecs_list.append(vecs)

if s_container is not None:
sk = s_container.sample_k(k, symm=True)
hk -= self.sigma * sk
A = hk.to_scipy(format="csr")
M = sk
else:
hk.shift(-self.sigma)
A = hk.to_scipy(format="csr")
M = None

A.sort_indices()
A.sum_duplicates()
N = A.shape[0]

solver.factorize(A)

def matvec(b):
return solver.solve(A, b)

Op = LinearOperator((N, N), matvec=matvec, dtype=A.dtype)

try:
# Use larger NCV to help convergence, especially for clustered eigenvalues
ncv = max(2*self.neig + 1, 20)
vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
except Exception:
# Retry with larger NCV if ARPACK fails (e.g. error 3: No shifts could be applied)
# This often happens when eigenvalues are clustered near the shift
ncv = max(5*self.neig, 50)
vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
Comment on lines +197 to +205
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain (summarized)

Per the scipy.sparse.linalg.eigsh documentation, ncv (the number of Lanczos vectors) must satisfy k < ncv < n, with ncv > 2k recommended and a default of min(n, max(2*k + 1, 20)). ARPACK-backed solvers raise scipy.sparse.linalg.ArpackNoConvergence when the Lanczos iteration fails to converge within maxiter; that exception carries any partially converged eigenvalues/eigenvectors, whereas invalid parameters surface as a ValueError or ArpackError instead. So only ArpackNoConvergence indicates a genuine convergence failure worth retrying.

Cap ncv by N and only retry ARPACK non-convergence.

Lines 199 and 204 can set ncv above N on small systems. According to scipy.sparse.linalg.eigsh documentation, ncv must satisfy k < ncv < N (where N is the matrix dimension). Because line 201 catches all Exception types, scipy's validation error (when ncv > N) gets masked and retried with an even larger ncv instead of surfacing immediately. Only ArpackNoConvergence (the actual convergence failure) should be caught for retry.

🐛 Proposed fix
+                if not 0 < self.neig < N:
+                    raise ValueError(f"`neig` must satisfy 0 < neig < N, got neig={self.neig}, N={N}")
+
                 try:
                     # Use larger NCV to help convergence, especially for clustered eigenvalues
-                    ncv =  max(2*self.neig + 1, 20)
+                    ncv = min(N, max(2 * self.neig + 1, 20))
                     vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
-                except Exception:
+                except ArpackNoConvergence:
                     # Retry with larger NCV if ARPACK fails (e.g. error 3: No shifts could be applied)
                     # This often happens when eigenvalues are clustered near the shift
-                    ncv =  max(5*self.neig, 50)
+                    ncv = min(N, max(5 * self.neig, 50))
                     vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)

Update the import on line 17 to include ArpackNoConvergence:

from scipy.sparse.linalg import eigsh, LinearOperator, ArpackNoConvergence
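
To make the proposed retry logic concrete, here is a minimal standalone sketch of the pattern (scipy only; `capped_eigsh` is a hypothetical helper for illustration, not DeePTB code, and it uses plain `which="SA"` rather than the shift-invert setup in `energy.py`):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh, ArpackNoConvergence

def capped_eigsh(A, k):
    # Cap ncv by the matrix dimension N and retry only on a genuine
    # ARPACK non-convergence, so scipy's own parameter-validation
    # errors (e.g. ncv > N) still surface immediately.
    N = A.shape[0]
    if not 0 < k < N:
        raise ValueError(f"k must satisfy 0 < k < N, got k={k}, N={N}")
    try:
        ncv = min(N, max(2 * k + 1, 20))
        return eigsh(A, k=k, which="SA", ncv=ncv)
    except ArpackNoConvergence:
        ncv = min(N, max(5 * k, 50))
        return eigsh(A, k=k, which="SA", ncv=ncv)

# 12x12 example: without the min(N, ...) cap the first call would
# request ncv=20 > N=12, and scipy would reject it with a ValueError
# that a bare `except Exception` would mask and pointlessly "retry".
A = sp.diags(np.arange(1.0, 13.0)).tocsr()
vals, vecs = capped_eigsh(A, k=3)
```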
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 201-201: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dptb/nn/energy.py` around lines 197 - 205, The try/except should only catch
ARPACK convergence failures and must cap ncv to the matrix dimension N so we
never call eigsh with ncv >= N; update the import to include ArpackNoConvergence
and change the except to except ArpackNoConvergence: (so other validation errors
propagate), and before calling eigsh compute N = hk.shape[0] (or use M.shape[0])
and set ncv = min(ncv, N-1) (and similarly for the retry ncv) so ncv satisfies k
< ncv < N when calling eigsh with hk, M, Op, self.neig, sigma=0.0,
mode=self.mode, which="LM".

Comment on lines +200 to +205

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain (summarized)

Per the scipy.sparse.linalg.eigsh documentation, return_eigenvectors (default True) controls whether the call returns a (w, v) tuple of eigenvalues and eigenvectors or only the eigenvalue array w. It also affects ordering: for which='LM' or 'SA', eigenvalues are sorted by algebraic value when return_eigenvectors=True and by absolute value when False.


Pass return_eigenvectors parameter to both eigsh calls to avoid unnecessary eigenvector allocation.

Lines 200 and 205 always retrieve and unpack both eigenvalues and eigenvectors from eigsh, even when callers only request eigenvalues. Since scipy.sparse.linalg.eigsh defaults to return_eigenvectors=True, the N × neig eigenvector matrix is always allocated. This allocation is only conditionally used at line 214, defeating the purpose of the return_eigenvectors parameter.

Pass return_eigenvectors=return_eigenvectors to both eigsh calls and conditionally unpack the result to avoid the unnecessary allocation when eigenvectors aren't needed.

Proposed fix
-                    vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
+                    result = eigsh(
+                        A=hk,
+                        M=M,
+                        k=self.neig,
+                        sigma=0.0,
+                        OPinv=Op,
+                        mode=self.mode,
+                        which="LM",
+                        ncv=ncv,
+                        return_eigenvectors=return_eigenvectors,
+                    )

Apply the same change to line 205.

Then update the unpacking logic:

-                eigvals_list.append(vals + self.sigma)
-                if return_eigenvectors:
-                    eigvecs_list.append(vecs)
+                if return_eigenvectors:
+                    vals, vecs = result
+                    eigvecs_list.append(vecs)
+                else:
+                    vals = result
+                eigvals_list.append(vals + self.sigma)
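
A quick standalone check of the scipy behavior this comment relies on (independent of DeePTB; a small diagonal test matrix stands in for the sampled Hamiltonian):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

A = sp.diags(np.arange(1.0, 11.0)).tocsr()  # 10x10, eigenvalues 1..10

# Default: a (w, v) tuple, with an N x k eigenvector matrix allocated.
vals, vecs = eigsh(A, k=2, which="SA", return_eigenvectors=True)

# With False, only the length-k eigenvalue array comes back, so the
# N x k eigenvector storage is never allocated.
vals_only = eigsh(A, k=2, which="SA", return_eigenvectors=False)
```

Note that for which='LM'/'SA' the sort order of the returned eigenvalues differs between the two modes (algebraic vs absolute value), so callers comparing results should sort first.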
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 201-201: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dptb/nn/energy.py` around lines 200 - 205, The eigsh calls in energy.py
always allocate eigenvectors because they unpack vals, vecs regardless of
whether callers requested eigenvectors; modify both eigsh invocations (the calls
with A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM",
ncv=...) to pass return_eigenvectors=return_eigenvectors, and change the
unpacking so you only unpack vecs when return_eigenvectors is True (e.g., if
return_eigenvectors: vals, vecs = eigsh(... ) else: vals = eigsh(... )); ensure
this change is applied to both the initial call and the retry call that
increases ncv and keep references to hk, M, Op, and self.neig intact.


# Release PARDISO's internal LU factorization memory for this
# k-point before moving to the next one. Without this, the
# factorization buffers from *every* k-point accumulate.
solver.free_memory()
del Op

eigvals_list.append(vals + self.sigma)
if return_eigenvectors:
eigvecs_list.append(vecs)
finally:
# Guarantee all internal PARDISO memory is released even on error.
solver.free_memory(everything=True)

if return_eigenvectors:
return eigvals_list, eigvecs_list
else:
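
For readers without MKL/pypardiso, the shift-invert pattern this diff implements can be sketched with scipy's splu as a stand-in for PyPardisoSolver (a sketch only: the 50x50 diagonal matrix and sigma value are illustrative, not DeePTB data; as in the diff, the shift is applied manually, sigma=0.0 is passed to eigsh, and sigma is added back afterwards):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu, LinearOperator, eigsh

sigma = 25.3
H = sp.diags(np.arange(1.0, 51.0)).tocsc()          # stand-in Hamiltonian
Hs = (H - sigma * sp.identity(50)).tocsc()          # manual shift, as in the diff

lu = splu(Hs)                                       # factorize once per k-point
Op = LinearOperator(Hs.shape, matvec=lu.solve, dtype=Hs.dtype)

neig = 4
ncv = min(Hs.shape[0], max(2 * neig + 1, 20))       # cap ncv by N
vals, vecs = eigsh(A=Hs, k=neig, sigma=0.0, OPinv=Op, which="LM", ncv=ncv)
vals = vals + sigma                                 # undo the manual shift
# vals now holds the eigenvalues of H closest to sigma
```

With PARDISO the factorization buffers live inside the solver's opaque handles rather than in a Python object like `lu`, which is why the diff calls free_memory() after each k-point and wraps the loop in try/finally.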