Add a callback to show elapsed and remaining time#2082

Open
kyouma wants to merge 3 commits into lululxvi:master from kyouma:add_time_tracker_callback

Conversation

@kyouma
Contributor

@kyouma kyouma commented Apr 2, 2026

Hello.

There is currently no estimate of the remaining training time, so I have made a callback that prints it in tqdm style. Integrating tqdm itself looks very easy (a one-line change in _train_sgd()), but is actually impossible, because the multiple outputs printed during training break the progress bar even when they go through tqdm.tqdm.write().
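
A minimal sketch of how a tqdm-style line with elapsed time, remaining time, and rate could be built from a step counter. The names `format_interval` and `progress_line` are hypothetical and are not taken from this PR; the remaining time is assumed to be estimated from the average rate so far.

```python
def format_interval(seconds):
    """Format a duration as MM:SS, or H:MM:SS once it exceeds one hour."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours:d}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def progress_line(step, total, elapsed):
    """Build 'step [elapsed<remaining, rate it/s]' from the average rate so far."""
    rate = step / elapsed if elapsed > 0 else 0.0
    remaining = (total - step) / rate if rate > 0 else 0.0
    return (
        f"{step} [{format_interval(elapsed)}<{format_interval(remaining)}, "
        f"{rate:.2f}it/s]"
    )


# 200 of 1000 steps done after 2 seconds: 100 it/s, so 8 seconds remain.
print(progress_line(200, 1000, 2.0))  # 200 [00:02<00:08, 100.00it/s]
```

Because each report is a complete line rather than an in-place bar, other output printed between reports cannot corrupt it.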

@echen5503
Contributor

When you say "multiple outputs during training", are you referring to things like TensorFlow's C-level printing, or just normal printing from DeepXDE?

@kyouma
Contributor Author

kyouma commented Apr 3, 2026

I tried replacing all print() calls that write to stdout (except the ones made before training begins) with tqdm.tqdm.write(). Even the VariableValue output broke the tqdm progress bar, although I had used tqdm.tqdm.write() there, too.

Also, I think we can't guarantee that no future stderr output, warnings, or output from third-party optimization libraries will break the tqdm progress bar, so I decided to add a callback instead.

@echen5503
Contributor

In that case, your design choice of a custom class is correct, in my opinion.

Contributor

@echen5503 echen5503 left a comment

Can you also show some logs from running a few of the DeepXDE examples, to make sure this displays properly?

Comment thread deepxde/callbacks.py
        self.starting_epoch = self._get_iteration()
        self.last_display_epoch = self._get_iteration()

    def _get_iteration(self):
Contributor

If you can have _get_iteration, why not add _get_iterations (total iterations) as well, and reduce the overhead in the model.py code?

Contributor Author

The total number of iterations is an argument of the train() and _train_sgd() methods of Model, not a property of Model, so callbacks do not have access to it.

A similar way of setting a callback parameter is already used in the _train_tensorflow_compat_v1_scipy() method, so I followed the same approach.
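
A rough sketch of the pattern described here: the iteration count exists only as an argument of the training function, so the loop hands it to any callback that wants it before training starts. `TimeTrackerSketch`, `train_sketch`, and the `total_iterations` attribute are hypothetical names for illustration, not the code in this PR or in model.py.

```python
import time


class TimeTrackerSketch:
    """Hypothetical stand-in for a time-tracking callback."""

    def __init__(self, period=200):
        self.period = period
        self.total_iterations = None  # unknown until the training loop sets it
        self.reports = []  # tuples of (step, steps_left, elapsed_seconds)

    def on_train_begin(self):
        self.step = 0
        self.start = time.perf_counter()

    def on_epoch_end(self):
        self.step += 1
        if self.step % self.period == 0:
            elapsed = time.perf_counter() - self.start
            steps_left = self.total_iterations - self.step
            self.reports.append((self.step, steps_left, elapsed))


def train_sketch(iterations, callbacks):
    """Toy training loop: `iterations` is only an argument, so pass it on explicitly."""
    for cb in callbacks:
        if hasattr(cb, "total_iterations"):
            cb.total_iterations = iterations
        cb.on_train_begin()
    for _ in range(iterations):
        # A real loop would run one optimizer step here.
        for cb in callbacks:
            cb.on_epoch_end()


tracker = TimeTrackerSketch(period=2)
train_sketch(5, [tracker])
```

The trade-off is a small amount of wiring in the training loop in exchange for not storing the iteration count on the model.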

Contributor

Ah, I see. You are right.

@kyouma
Contributor Author

kyouma commented Apr 3, 2026

Here it is. The training log (display_every is 250), the variable tracker (period is 200), and the time tracker (period is 200) are all active.

Step      Train loss                                  Test loss                                   Test metric
0         [1.59e+01, 9.45e-02, 4.36e-01, 1.62e-01]    [1.80e+01, 9.45e-02, 4.36e-01, 1.62e-01]    [4.03e-01, 1.39e+00]
0 [1.00e+00]
200 [00:01<00:29, 130.01it/s]
250       [1.39e-01, 2.20e-02, 1.12e-01, 1.95e-02]    [1.09e-01, 2.20e-02, 1.12e-01, 1.95e-02]    [5.13e-03, 1.57e-01]
250 [1.08e+00]
400 [00:03<00:27, 129.96it/s]
500       [2.32e-02, 9.74e-03, 8.38e-02, 6.00e-03]    [2.14e-02, 9.74e-03, 8.38e-02, 6.00e-03]    [3.25e-03, 1.25e-01]
500 [1.30e+00]
600 [00:04<00:25, 131.01it/s]
750       [8.03e-03, 7.10e-03, 7.58e-02, 1.07e-03]    [1.06e-02, 7.10e-03, 7.58e-02, 1.07e-03]    [3.83e-03, 1.35e-01]
750 [1.39e+00]
800 [00:06<00:24, 132.58it/s]
1000      [4.69e-03, 7.54e-03, 7.32e-02, 5.79e-04]    [6.32e-03, 7.54e-03, 7.32e-02, 5.79e-04]    [4.16e-03, 1.41e-01]
1000 [1.46e+00]
1000 [00:07<00:22, 133.28it/s]
1200 [00:08<00:20, 133.98it/s]
1250      [3.54e-03, 7.10e-03, 6.57e-02, 7.26e-04]    [3.55e-03, 1.11e-02, 6.57e-02, 9.88e-02]    [5.39e-03, 1.61e-01]
1250 [1.58e+00]
1400 [00:10<00:19, 134.47it/s]
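
As a sanity check on the time-tracker lines above, the printed elapsed and remaining times can be recomputed from the step and the rate. The sketch below assumes a total of 4000 iterations, which is not stated in the thread and is only inferred from the numbers; it also assumes the times are truncated to whole seconds.

```python
# (step, elapsed, remaining, rate in it/s) copied from the time-tracker lines above.
LOG = [
    (200, "00:01", "00:29", 130.01),
    (400, "00:03", "00:27", 129.96),
    (600, "00:04", "00:25", 131.01),
    (800, "00:06", "00:24", 132.58),
    (1000, "00:07", "00:22", 133.28),
    (1200, "00:08", "00:20", 133.98),
    (1400, "00:10", "00:19", 134.47),
]
ASSUMED_TOTAL = 4000  # assumption: inferred from the log, not stated in the thread


def mmss(seconds):
    """Truncate to whole seconds and format as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def mismatches(log, total):
    """Return the steps whose printed times disagree with step/rate arithmetic."""
    bad = []
    for step, elapsed, remaining, rate in log:
        if mmss(step / rate) != elapsed or mmss((total - step) / rate) != remaining:
            bad.append(step)
    return bad


print(mismatches(LOG, ASSUMED_TOTAL))  # [] means every line is self-consistent
```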

@echen5503
Contributor

OK, looks good. We should think about improving the clarity of multiple logging callbacks in a future PR, #2084.
