Port all of this code to CUDA and minimize CPU <-> GPU communication: data should be transferred between CPU and GPU only once.
Start from the "commit to quotient polys" step: PolynomialBatch::<F, C, D>::from_coeffs
All data should remain on the GPU. After proving finishes, transfer the final proof back to the CPU.
https://github.com/okx/plonky2/blob/b74c2ac48bd25df0cbdee2e4b3871f63b631178d/plonky2/src/plonk/prover.rs#L281-L337
This is a meta issue.
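
To make the goal concrete, here is a minimal host-side sketch of the intended data flow. It is not the plonky2 API and it does not touch a real GPU. `MockDevice`, `DeviceBuf`, `commit_stage`, `opening_stage`, and `prove_on_device` are hypothetical names, and the arithmetic in each stage is a placeholder for the real kernels. The only point is the shape: one upload, every stage consumes and produces device-resident buffers, and one download for the final proof.

```rust
/// Hypothetical stand-in for a GPU context. It only counts host <-> device
/// copies so the "transfer once" property can be checked.
#[derive(Default)]
pub struct MockDevice {
    pub uploads: usize,
    pub downloads: usize,
}

/// Hypothetical handle to memory that is assumed to live on the device.
/// The field is private so host code cannot read it without `download`.
pub struct DeviceBuf {
    data: Vec<u64>,
}

impl MockDevice {
    /// Host -> device copy (a real backend would issue one memcpy here).
    pub fn upload(&mut self, host: &[u64]) -> DeviceBuf {
        self.uploads += 1;
        DeviceBuf { data: host.to_vec() }
    }

    /// Device -> host copy.
    pub fn download(&mut self, buf: &DeviceBuf) -> Vec<u64> {
        self.downloads += 1;
        buf.data.clone()
    }
}

/// Placeholder for the "commit to quotient polys" step. The arithmetic is
/// made up; what matters is that input and output are both device buffers.
fn commit_stage(input: &DeviceBuf) -> DeviceBuf {
    let data = input
        .data
        .iter()
        .map(|x| x.wrapping_mul(3).wrapping_add(1))
        .collect();
    DeviceBuf { data }
}

/// Placeholder for the later proving steps: folds the buffer into a
/// single value, again without leaving the device.
fn opening_stage(input: &DeviceBuf) -> DeviceBuf {
    let sum = input.data.iter().fold(0u64, |acc, x| acc.wrapping_add(*x));
    DeviceBuf { data: vec![sum] }
}

/// Runs the whole (placeholder) pipeline with exactly one upload and one
/// download; intermediate results never return to the host.
pub fn prove_on_device(dev: &mut MockDevice, quotient_coeffs: &[u64]) -> Vec<u64> {
    let coeffs = dev.upload(quotient_coeffs);
    let committed = commit_stage(&coeffs);
    let proof = opening_stage(&committed);
    dev.download(&proof)
}
```

Calling `prove_on_device` with a fresh `MockDevice` leaves `uploads` and `downloads` both at 1. A real implementation would replace `DeviceBuf` with a device pointer and the stage bodies with kernel launches, but the function signatures (device buffer in, device buffer out) are the part worth preserving.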