After reading your paper and code, I have a few questions. I would really appreciate it if someone could explain them.
First, the paper states that Swin-T is used, but the model definition code appears to use Swin-L.
Second, your evaluation code computes the top-5 result in the function _average_top_k_result (eval.py) using the outputs of all modules (select, drop, FPN layers, combiner, and the original output), and concatenates those outputs to get the final metrics. Is this reasonable?
Third, the highest-5 accuracies reported by your code do not match the accuracy of any single layer, as printed in this picture. So what do highest 1-5 mean, and how are they computed?
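
To make the second and third questions concrete, here is a minimal sketch of how I currently read the evaluation. It is not the code from eval.py: the helper names `top_k_accuracy` and `combine_outputs`, the toy logits, and the choice to combine outputs by summing them are all my own assumptions for illustration. It only shows that a metric computed on combined outputs need not equal the metric of any single output.

```python
from typing import Dict, List

Logits = List[List[float]]  # shape: (batch, classes)


def top_k_accuracy(logits: Logits, labels: List[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(logits, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


def combine_outputs(outputs: Dict[str, Logits]) -> Logits:
    """Sum the logits of every module output (an assumption, not the repo's rule)."""
    heads = list(outputs.values())
    # zip(*heads) groups the rows for one sample across all heads;
    # zip(*rows) then groups the scores for one class across all heads.
    return [[sum(scores) for scores in zip(*rows)] for rows in zip(*heads)]


# Hypothetical toy data: two module outputs, two samples, three classes.
outputs = {
    "original": [[2.0, 1.0, 0.0], [0.0, 0.5, 0.4]],
    "combiner": [[0.0, 0.5, 0.4], [0.0, 1.0, 2.0]],
}
labels = [0, 2]

per_output = {name: top_k_accuracy(l, labels, k=1) for name, l in outputs.items()}
combined = top_k_accuracy(combine_outputs(outputs), labels, k=1)
print(per_output, combined)
```

In this toy case each output alone gets one of the two samples right, while the summed output gets both right. If the highest 1-5 values are produced in a similar way, that would explain why they differ from every per-layer number, but I would like the authors to confirm what is actually combined and why.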
