DINOv2 cannot run on older GPUs (e.g. Nvidia Titan)

Running UniCeption inference on the Titan partition of the cluster results in this error:

![Screenshot 2024-11-19 at 7 48 27 PM](https://github.com/user-attachments/assets/450748b5-bbd4-48b1-b78d-b889de2c8ce5)

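This might be related to the automatic `bf16` casting used in UniCeption: the GPU reports compute capability (6, 1), and every xformers attention operator rejects the `torch.bfloat16` inputs. UniCeption should probably provide a config switch, or auto-detect hardware compatibility, to decide whether to use mixed-precision inference.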
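A rough sketch of what the hardware auto-detection could look like. This is not UniCeption code: the function names `pick_inference_dtype` and `detect_inference_dtype` are hypothetical, and the thresholds are assumptions. The (8, 0) cutoff for `bf16` follows the "A100+" message from xformers, the (7, 0) cutoff for `fp16` assumes that only GPUs with tensor cores benefit from half precision, and anything older (such as the Titan at (6, 1)) falls back to full precision.

```python
from typing import Optional, Tuple


def pick_inference_dtype(capability: Optional[Tuple[int, int]]) -> str:
    """Map a CUDA compute capability to a dtype name (hypothetical helper).

    `capability` is a (major, minor) tuple, or None when no GPU is available.
    The thresholds below are assumptions, not values taken from UniCeption.
    """
    if capability is None:
        return "float32"   # CPU or unknown device: stay in full precision
    if capability >= (8, 0):
        return "bfloat16"  # Ampere and newer ("A100+" in the xformers message)
    if capability >= (7, 0):
        return "float16"   # Volta / Turing: tensor cores, but no bf16
    return "float32"       # Pascal and older, e.g. capability (6, 1)


def detect_inference_dtype() -> str:
    """Query the current device through PyTorch, if it is installed."""
    try:
        import torch
    except ImportError:
        return pick_inference_dtype(None)
    if not torch.cuda.is_available():
        return pick_inference_dtype(None)
    # get_device_capability() returns a (major, minor) tuple
    return pick_inference_dtype(torch.cuda.get_device_capability())


if __name__ == "__main__":
    print(detect_inference_dtype())
```

The returned name could then be resolved with `getattr(torch, name)` and passed to `torch.autocast(device_type="cuda", dtype=...)`, with autocast disabled (`enabled=False`) when the result is `float32`. A config switch could simply override the detected value.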

Full stderr is provided below:

```
Scripts/UnitTest/test_matching.py:27: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Module/Frontend/Matching.py:468: in estimate
    result = self.context["model"](view1, view2)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/torch/nn/modules/module.py:1553: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/torch/nn/modules/module.py:1562: in _call_impl
    return forward_call(*args, **kwargs)
Module/Network/UniCeption/uniception/models/factory/match_anything.py:344: in forward
    feat1, feat2 = self._encode_symmetrized(view1, view2)
Module/Network/UniCeption/uniception/models/factory/match_anything.py:298: in _encode_symmetrized
    feat1, feat2 = self._encode_image_pairs(img1, img2, data_norm_type=view1["data_norm_type"])
Module/Network/UniCeption/uniception/models/factory/match_anything.py:277: in _encode_image_pairs
    encoder_output = self.encoder(encoder_input)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/torch/nn/modules/module.py:1553: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/torch/nn/modules/module.py:1562: in _call_impl
    return forward_call(*args, **kwargs)
Module/Network/UniCeption/uniception/models/encoders/dinov2.py:109: in forward
    features = self.model.forward_features(encoder_input.image)["x_norm_patchtokens"]
../.cache/torch/hub/facebookresearch_dinov2_main/dinov2/models/vision_transformer.py:261: in forward_features
    x = blk(x)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/torch/nn/modules/module.py:1553: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/torch/nn/modules/module.py:1562: in _call_impl
    return forward_call(*args, **kwargs)
../.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/block.py:254: in forward
    return super().forward(x_or_x_list)
../.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/block.py:112: in forward
    x = x + attn_residual_func(x)
../.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/block.py:91: in attn_residual_func
    return self.ls1(self.attn(self.norm1(x)))
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/torch/nn/modules/module.py:1553: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/torch/nn/modules/module.py:1562: in _call_impl
    return forward_call(*args, **kwargs)
../.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/attention.py:84: in forward
    x = memory_efficient_attention(q, k, v, attn_bias=attn_bias)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/xformers/ops/fmha/__init__.py:276: in memory_efficient_attention
    return _memory_efficient_attention(
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/xformers/ops/fmha/__init__.py:395: in _memory_efficient_attention
    return _memory_efficient_attention_forward(
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/xformers/ops/fmha/__init__.py:414: in _memory_efficient_attention_forward
    op = _dispatch_fw(inp, False)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/xformers/ops/fmha/dispatch.py:119: in _dispatch_fw
    return _run_priority_list(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

name = 'memory_efficient_attention_forward', priority_list = deque([<class 'xformers.ops.fmha.decoder.FwOp'>, <class 'xformers.ops.fmha.flash.FwOp'>, <class 'xformers.ops.fmha.cutlass.FwOp'>, <class 'xformers.ops.fmha.small_k.FwOp'>])
inp = Inputs(query=tensor([[[[ 2.7969e+00, -6.8750e-01,  9.1016e-01,  ..., -1.1094e+00,
           -6.9336e-02,  3.0625e+00]...02]]]], device='cuda:0', dtype=torch.bfloat16), attn_bias=None, p=0.0, scale=None, output_dtype=None, is_partial=False)

    def _run_priority_list(name: str, priority_list: Sequence[T], inp: Inputs) -> T:
        not_supported_reasons: List[List[str]] = []
        for op in priority_list:
            not_supported = op.not_supported_reasons(inp)
            if not not_supported:
                return op
            not_supported_reasons.append(not_supported)
    
        # Let's write a nice message explaining what we tried and why it's not supported
        msg = f"""No operator found for `{name}` with inputs:
    {textwrap.indent(_format_inputs_description(inp), '     ')}"""
        for op, not_supported in zip(priority_list, not_supported_reasons):
            msg += "\n" + _format_not_supported_reasons(op, not_supported)
>       raise NotImplementedError(msg)
E       NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
E            query       : shape=(2, 257, 16, 64) (torch.bfloat16)
E            key         : shape=(2, 257, 16, 64) (torch.bfloat16)
E            value       : shape=(2, 257, 16, 64) (torch.bfloat16)
E            attn_bias   : <class 'NoneType'>
E            p           : 0.0
E       `decoderF` is not supported because:
E           requires device with capability > (7, 0) but your GPU has capability (6, 1) (too old)
E           attn_bias type is <class 'NoneType'>
E           bf16 is only supported on A100+ GPUs
E       `flshattF@2.5.6-pt` is not supported because:
E           requires device with capability > (8, 0) but your GPU has capability (6, 1) (too old)
E           bf16 is only supported on A100+ GPUs
E       `cutlassF-pt` is not supported because:
E           bf16 is only supported on A100+ GPUs
E       `smallkF` is not supported because:
E           max(query.shape[-1] != value.shape[-1]) > 32
E           dtype=torch.bfloat16 (supported: {torch.float32})
E           bf16 is only supported on A100+ GPUs
E           unsupported embed per head: 64

/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/xformers/ops/fmha/dispatch.py:55: NotImplementedError
--------------------------------------------------------------------------------------------------------------------------- Captured stdout call ----------------------------------------------------------------------------------------------------------------------------
Warning, cannot find cuda-compiled version of RoPE2D, using a slow pytorch version instead
Loading pretrained dinov2_vitl14 from torch hub
--------------------------------------------------------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------------------------------------------------------
Using cache found in /home/yutianch/.cache/torch/hub/facebookresearch_dinov2_main
```

