-
Notifications
You must be signed in to change notification settings - Fork 31
fix the memory issue #322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix the memory issue #322
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -163,47 +163,60 @@ def solve(self, h_container, s_container, kpoints: Union[list, torch.Tensor, np | |
| eigvals_list = [] | ||
| eigvecs_list = [] | ||
|
|
||
| for k in kpoints: | ||
| hk = h_container.sample_k(k, symm=True) | ||
|
|
||
| if s_container is not None: | ||
| sk = s_container.sample_k(k, symm=True) | ||
| hk -= self.sigma * sk | ||
| A = hk.to_scipy(format="csr") | ||
| M = sk | ||
| else: | ||
| hk.shift(-self.sigma) | ||
| A = hk.to_scipy(format="csr") | ||
| M = None | ||
|
|
||
| A.sort_indices() | ||
| A.sum_duplicates() | ||
| N = A.shape[0] | ||
|
|
||
| # Try PARDISO first, fall back to scipy SuperLU if PARDISO fails | ||
| # (MKL PARDISO has a known bug with certain block-structured patterns) | ||
| solver = PyPardisoSolver(mtype=13) | ||
| solver.factorize(A) | ||
|
|
||
| def matvec(b): | ||
| return solver.solve(A, b) | ||
| # Create a single solver instance and reuse it across k-points. | ||
| # PARDISO stores LU factors in opaque `pt` handles; creating a new | ||
| # solver per k-point without calling free_memory() leaks all that | ||
| # internal MKL memory. | ||
| solver = PyPardisoSolver(mtype=13) | ||
|
|
||
| try: | ||
| for k in kpoints: | ||
| hk = h_container.sample_k(k, symm=True) | ||
|
|
||
| Op = LinearOperator((N, N), matvec=matvec, dtype=A.dtype) | ||
|
|
||
| try: | ||
| # Use larger NCV to help convergence, especially for clustered eigenvalues | ||
| ncv = max(2*self.neig + 1, 20) | ||
| vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv) | ||
| except Exception: | ||
| # Retry with larger NCV if ARPACK fails (e.g. error 3: No shifts could be applied) | ||
| # This often happens when eigenvalues are clustered near the shift | ||
| ncv = max(5*self.neig, 50) | ||
| vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv) | ||
|
|
||
| eigvals_list.append(vals + self.sigma) | ||
| if return_eigenvectors: | ||
| eigvecs_list.append(vecs) | ||
|
|
||
| if s_container is not None: | ||
| sk = s_container.sample_k(k, symm=True) | ||
| hk -= self.sigma * sk | ||
| A = hk.to_scipy(format="csr") | ||
| M = sk | ||
| else: | ||
| hk.shift(-self.sigma) | ||
| A = hk.to_scipy(format="csr") | ||
| M = None | ||
|
|
||
| A.sort_indices() | ||
| A.sum_duplicates() | ||
| N = A.shape[0] | ||
|
|
||
| solver.factorize(A) | ||
|
|
||
| def matvec(b): | ||
| return solver.solve(A, b) | ||
|
|
||
| Op = LinearOperator((N, N), matvec=matvec, dtype=A.dtype) | ||
|
|
||
| try: | ||
| # Use larger NCV to help convergence, especially for clustered eigenvalues | ||
| ncv = max(2*self.neig + 1, 20) | ||
| vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv) | ||
| except Exception: | ||
| # Retry with larger NCV if ARPACK fails (e.g. error 3: No shifts could be applied) | ||
| # This often happens when eigenvalues are clustered near the shift | ||
| ncv = max(5*self.neig, 50) | ||
| vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv) | ||
|
Comment on lines
+200
to
+205
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: find . -name "energy.py" -path "*/nn/*" | head -5Repository: deepmodeling/DeePTB Length of output: 82 🏁 Script executed: cat -n ./dptb/nn/energy.py | sed -n '190,225p'Repository: deepmodeling/DeePTB Length of output: 1994 🏁 Script executed: cat -n ./dptb/nn/energy.py | sed -n '150,230p'Repository: deepmodeling/DeePTB Length of output: 4139 🏁 Script executed: cat -n ./dptb/nn/energy.py | sed -n '1,50p'Repository: deepmodeling/DeePTB Length of output: 2261 🏁 Script executed: python3 << 'EOF'
from scipy.sparse.linalg import eigsh
import inspect
# Get the signature of eigsh
sig = inspect.signature(eigsh)
print("eigsh signature:")
print(sig)
print("\nParameters:")
for param_name, param in sig.parameters.items():
print(f" {param_name}: {param.default if param.default != inspect.Parameter.empty else 'required'}")
EOFRepository: deepmodeling/DeePTB Length of output: 180 🌐 Web query:
💡 Result: The return_eigenvectors parameter in scipy.sparse.linalg.eigsh is a boolean option that determines whether to return eigenvectors in addition to eigenvalues. It defaults to True. When True (default), the function returns a tuple (w, v) where w is an array of k eigenvalues and v is an array of k corresponding eigenvectors (v[:, i] is the eigenvector for eigenvalue w[i]). When False, it returns only the array of k eigenvalues w. This parameter also affects eigenvalue sorting: for which='LM' or 'SA', eigenvalues are sorted by algebraic value if return_eigenvectors=True, or by absolute value if False. For other which options, sorting is by algebraic value regardless. Citations:
🏁 Script executed: cat -n ./dptb/nn/energy.py | sed -n '120,165p'Repository: deepmodeling/DeePTB Length of output: 2031 Pass Lines 200 and 205 always retrieve and unpack both eigenvalues and eigenvectors from Pass Proposed fix- vals, vecs = eigsh(A=hk, M=M, k=self.neig, sigma=0.0, OPinv=Op, mode=self.mode, which="LM", ncv=ncv)
+ result = eigsh(
+ A=hk,
+ M=M,
+ k=self.neig,
+ sigma=0.0,
+ OPinv=Op,
+ mode=self.mode,
+ which="LM",
+ ncv=ncv,
+ return_eigenvectors=return_eigenvectors,
+ )Apply the same change to line 205. Then update the unpacking logic: - eigvals_list.append(vals + self.sigma)
- if return_eigenvectors:
- eigvecs_list.append(vecs)
+ if return_eigenvectors:
+ vals, vecs = result
+ eigvecs_list.append(vecs)
+ else:
+ vals = result
+ eigvals_list.append(vals + self.sigma)🧰 Tools🪛 Ruff (0.15.6)[warning] 201-201: Do not catch blind exception: (BLE001) 🤖 Prompt for AI Agents |
||
|
|
||
| # Release PARDISO's internal LU factorization memory for this | ||
| # k-point before moving to the next one. Without this, the | ||
| # factorization buffers from *every* k-point accumulate. | ||
| solver.free_memory() | ||
| del Op | ||
|
|
||
| eigvals_list.append(vals + self.sigma) | ||
| if return_eigenvectors: | ||
| eigvecs_list.append(vecs) | ||
| finally: | ||
| # Guarantee all internal PARDISO memory is released even on error. | ||
| solver.free_memory(everything=True) | ||
|
|
||
| if return_eigenvectors: | ||
| return eigvals_list, eigvecs_list | ||
| else: | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
Repository: deepmodeling/DeePTB
Length of output: 77
🏁 Script executed:
cat -n dptb/nn/energy.py | head -50Repository: deepmodeling/DeePTB
Length of output: 2261
🏁 Script executed:
Repository: deepmodeling/DeePTB
Length of output: 4994
🏁 Script executed:
rg "import.*eigsh|from.*eigsh" dptb/nn/energy.py -A 2 -B 2Repository: deepmodeling/DeePTB
Length of output: 279
🏁 Script executed:
Repository: deepmodeling/DeePTB
Length of output: 177
🌐 Web query:
scipy.sparse.linalg.eigsh ncv parameter documentation constraints💡 Result:
The ncv parameter in scipy.sparse.linalg.eigsh specifies the number of Lanczos vectors generated. Its constraints are: ncv must be greater than k (number of desired eigenvalues) and smaller than n (matrix dimension). It is recommended that ncv > 2k. The default value is min(n, max(2k + 1, 20)).
Citations:
🌐 Web query:
scipy.sparse.linalg ArpackNoConvergence exception💡 Result:
The scipy.sparse.linalg.ArpackNoConvergence exception is raised by ARPACK-based functions like eigsh and eigs when the iterative Arnoldi/Lanczos process fails to converge to the requested eigenvalues within the specified maximum iterations (default maxiter=n10) and tolerance (default tol=0, machine precision). Key details from official SciPy documentation (v1.17.0): - Exception class: scipy.sparse.linalg.ArpackNoConvergence(msg, eigenvalues, eigenvectors) - Attributes: eigenvalues (ndarray of converged eigenvalues), eigenvectors (ndarray of converged eigenvectors) - use these partial results if available. - Raised explicitly when convergence criteria are not met. Common causes: - Targeting non-extremal eigenvalues (e.g., which='SM' for small magnitude) - ARPACK excels at extreme eigenvalues (large magnitude). - Insufficient maxiter, tight tol, or small ncv (number of Lanczos vectors; default min(n, max(2k+1,20)); recommend ncv > 2k). - Clustered eigenvalues or ill-conditioned matrices. Handling strategies (from SciPy ARPACK tutorial and eigsh docs): 1. Increase maxiter (e.g., maxiter=5000). 2. Relax tolerance (e.g., tol=1e-2), but at precision cost. 3. Increase ncv (e.g., ncv=2k + extra). 4. Use shift-invert mode: Set sigma near target eigenvalues (e.g., sigma=0 for near-zero), which='LM' to transform interior eigenvalues to extreme ones. Requires solving linear systems internally. 5. Provide good initial vector v0. 6. Catch exception and use partial eigenvalues/eigenvectors: try: vals, vecs = eigsh(A, k=6) except scipy.sparse.linalg.ArpackNoConvergence as err: print("Converged:", len(err.eigenvalues), "of 6") vals, vecs = err.eigenvalues, err.eigenvectors 7. Alternatives: lobpcg for symmetric problems, or dense solvers for small matrices. Example from tutorial (fails with which='SM', succeeds with shift-invert): from scipy.sparse.linalg import eigsh evals_small, evecs_small = eigsh(X, 3, which='SM') # May raise ArpackNoConvergence evals_small, evecs_small = eigsh(X, 3, sigma=0, which='LM') # Succeeds faster Distinguish from ArpackError (other ARPACK issues like invalid params).
Citations:
Cap
ncvbyNand only retry ARPACK non-convergence.Lines 199 and 204 can set
ncvaboveNon small systems. According to scipy.sparse.linalg.eigsh documentation,ncvmust satisfyk < ncv < N(whereNis the matrix dimension). Because line 201 catches allExceptiontypes, scipy's validation error (whenncv > N) gets masked and retried with an even largerncvinstead of surfacing immediately. OnlyArpackNoConvergence(the actual convergence failure) should be caught for retry.🐛 Proposed fix
Update the import on line 17 to include
ArpackNoConvergence:🧰 Tools
🪛 Ruff (0.15.6)
[warning] 201-201: Do not catch blind exception:
Exception(BLE001)
🤖 Prompt for AI Agents