[Feature] Support Multi-Scale Training for Text Detection #1714

Open
Mountchicken wants to merge 3 commits into open-mmlab:dev-1.x from Mountchicken:jq/ms_train

@Mountchicken

Collaborator

Multi-scale training is an attractive trick for text detection, since the scale of text varies widely.

Supporting multi-scale training is simple: we only need to modify text target generation to use data_sample.batch_input_shape instead of data_sample.img_shape. This change does not affect the existing detectors in MMOCR, because their input size is fixed, i.e. data_sample.img_shape == data_sample.batch_input_shape.
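To illustrate why the target shape matters, here is a minimal NumPy sketch. It assumes that images of different sizes are padded at the bottom/right to a common size when batched; `pad_batch` and `make_text_mask` are hypothetical helpers written for this example, not MMOCR functions.

```python
import numpy as np


def pad_batch(images, pad_value=0):
    """Pad HxWxC images at the bottom/right to a common size and stack them.

    Hypothetical helper: stands in for whatever batching step produces
    the padded batch and its shape.
    """
    max_h = max(img.shape[0] for img in images)
    max_w = max(img.shape[1] for img in images)
    channels = images[0].shape[2]
    batch = np.full((len(images), max_h, max_w, channels), pad_value,
                    dtype=images[0].dtype)
    for i, img in enumerate(images):
        h, w = img.shape[:2]
        batch[i, :h, :w] = img
    return batch, (max_h, max_w)


def make_text_mask(box, shape):
    """Rasterize an axis-aligned box (x1, y1, x2, y2) into a binary map."""
    mask = np.zeros(shape, dtype=np.uint8)
    x1, y1, x2, y2 = box
    mask[y1:y2, x1:x2] = 1
    return mask


# Two images resized to different scales end up in one padded batch.
images = [np.zeros((800, 1280, 3), np.uint8),
          np.zeros((1024, 1280, 3), np.uint8)]
batch, batch_input_shape = pad_batch(images)

box = (100, 50, 300, 90)
img_shape = images[0].shape[:2]

# A target built from img_shape is smaller than the padded input,
# so it would not line up with the network output for the first image.
target_from_img_shape = make_text_mask(box, img_shape)
# A target built from batch_input_shape matches the padded input.
target_from_batch_shape = make_text_mask(box, batch_input_shape)
```

When every image has the same size, both shapes coincide and the two targets are identical, which is why fixed-size pipelines are unaffected.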

To use multi-scale training, here is a simple config:

train_pipeline = [
    dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
    dict(
        type='LoadOCRAnnotations',
        with_bbox=True,
        with_polygon=True,
        with_label=True,
    ),
    dict(
        type='RandomResize',
        scale=[(1280, 800), (1280, 1024)],
        keep_ratio=True),
    dict(
        type='PackTextDetInputs',
        meta_keys=('img_path', 'ori_shape', 'img_shape'))
]
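As a rough sketch of what this config is expected to do per image, the snippet below assumes that the two `scale` tuples are treated as the endpoints of a range from which a target scale is sampled, and that `keep_ratio=True` rescales the image to fit inside that target while preserving its aspect ratio. Both functions are hypothetical stand-ins written for this example; check the `RandomResize` documentation for the exact behavior.

```python
import random


def sample_scale(scales, rng=random):
    """Sample (width, height) between two endpoint scales (assumed behavior)."""
    (w0, h0), (w1, h1) = scales
    width = rng.randint(min(w0, w1), max(w0, w1))    # inclusive on both ends
    height = rng.randint(min(h0, h1), max(h0, h1))
    return width, height


def rescale_keep_ratio(width, height, scale):
    """Fit an image of size (width, height) inside `scale`, keeping its ratio."""
    max_long, max_short = max(scale), min(scale)
    factor = min(max_long / max(width, height),
                 max_short / min(width, height))
    return int(width * factor + 0.5), int(height * factor + 0.5)


# With the config above, the long side stays at 1280 and the short-side
# limit varies between 800 and 1024 from one sample to the next.
target = sample_scale([(1280, 800), (1280, 1024)])
new_size = rescale_keep_ratio(2560, 1440, target)
```

Because the sampled target differs between images, images in one batch can end up with different sizes, which is the case that requires batch_input_shape for target generation.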
