Trying to run UniCeption inference on the Titan partition of the cluster results in this error:
Scripts/UnitTest/test_matching.py:27:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Module/Frontend/Matching.py:468: in estimate
result = self.context["model"](view1, view2)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/torch/nn/modules/module.py:1553: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/torch/nn/modules/module.py:1562: in _call_impl
return forward_call(*args, **kwargs)
Module/Network/UniCeption/uniception/models/factory/match_anything.py:344: in forward
feat1, feat2 = self._encode_symmetrized(view1, view2)
Module/Network/UniCeption/uniception/models/factory/match_anything.py:298: in _encode_symmetrized
feat1, feat2 = self._encode_image_pairs(img1, img2, data_norm_type=view1["data_norm_type"])
Module/Network/UniCeption/uniception/models/factory/match_anything.py:277: in _encode_image_pairs
encoder_output = self.encoder(encoder_input)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/torch/nn/modules/module.py:1553: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/torch/nn/modules/module.py:1562: in _call_impl
return forward_call(*args, **kwargs)
Module/Network/UniCeption/uniception/models/encoders/dinov2.py:109: in forward
features = self.model.forward_features(encoder_input.image)["x_norm_patchtokens"]
../.cache/torch/hub/facebookresearch_dinov2_main/dinov2/models/vision_transformer.py:261: in forward_features
x = blk(x)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/torch/nn/modules/module.py:1553: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/torch/nn/modules/module.py:1562: in _call_impl
return forward_call(*args, **kwargs)
../.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/block.py:254: in forward
return super().forward(x_or_x_list)
../.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/block.py:112: in forward
x = x + attn_residual_func(x)
../.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/block.py:91: in attn_residual_func
return self.ls1(self.attn(self.norm1(x)))
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/torch/nn/modules/module.py:1553: in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/torch/nn/modules/module.py:1562: in _call_impl
return forward_call(*args, **kwargs)
../.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/attention.py:84: in forward
x = memory_efficient_attention(q, k, v, attn_bias=attn_bias)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/xformers/ops/fmha/__init__.py:276: in memory_efficient_attention
return _memory_efficient_attention(
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/xformers/ops/fmha/__init__.py:395: in _memory_efficient_attention
return _memory_efficient_attention_forward(
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/xformers/ops/fmha/__init__.py:414: in _memory_efficient_attention_forward
op = _dispatch_fw(inp, False)
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/xformers/ops/fmha/dispatch.py:119: in _dispatch_fw
return _run_priority_list(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
name = 'memory_efficient_attention_forward', priority_list = deque([<class 'xformers.ops.fmha.decoder.FwOp'>, <class 'xformers.ops.fmha.flash.FwOp'>, <class 'xformers.ops.fmha.cutlass.FwOp'>, <class 'xformers.ops.fmha.small_k.FwOp'>])
inp = Inputs(query=tensor([[[[ 2.7969e+00, -6.8750e-01, 9.1016e-01, ..., -1.1094e+00,
-6.9336e-02, 3.0625e+00]...02]]]], device='cuda:0', dtype=torch.bfloat16), attn_bias=None, p=0.0, scale=None, output_dtype=None, is_partial=False)
def _run_priority_list(name: str, priority_list: Sequence[T], inp: Inputs) -> T:
not_supported_reasons: List[List[str]] = []
for op in priority_list:
not_supported = op.not_supported_reasons(inp)
if not not_supported:
return op
not_supported_reasons.append(not_supported)
# Let's write a nice message explaining what we tried and why it's not supported
msg = f"""No operator found for `{name}` with inputs:
{textwrap.indent(_format_inputs_description(inp), ' ')}"""
for op, not_supported in zip(priority_list, not_supported_reasons):
msg += "\n" + _format_not_supported_reasons(op, not_supported)
> raise NotImplementedError(msg)
E NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
E query : shape=(2, 257, 16, 64) (torch.bfloat16)
E key : shape=(2, 257, 16, 64) (torch.bfloat16)
E value : shape=(2, 257, 16, 64) (torch.bfloat16)
E attn_bias : <class 'NoneType'>
E p : 0.0
E `decoderF` is not supported because:
E requires device with capability > (7, 0) but your GPU has capability (6, 1) (too old)
E attn_bias type is <class 'NoneType'>
E bf16 is only supported on A100+ GPUs
E `flshattF@2.5.6-pt` is not supported because:
E requires device with capability > (8, 0) but your GPU has capability (6, 1) (too old)
E bf16 is only supported on A100+ GPUs
E `cutlassF-pt` is not supported because:
E bf16 is only supported on A100+ GPUs
E `smallkF` is not supported because:
E max(query.shape[-1] != value.shape[-1]) > 32
E dtype=torch.bfloat16 (supported: {torch.float32})
E bf16 is only supported on A100+ GPUs
E unsupported embed per head: 64
/data2/datasets/yutianch/.conda/envs/AirVIO/lib/python3.11/site-packages/xformers/ops/fmha/dispatch.py:55: NotImplementedError
--------------------------------------------------------------------------------------------------------------------------- Captured stdout call ----------------------------------------------------------------------------------------------------------------------------
Warning, cannot find cuda-compiled version of RoPE2D, using a slow pytorch version instead
Loading pretrained dinov2_vitl14 from torch hub
--------------------------------------------------------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------------------------------------------------------
Using cache found in /home/yutianch/.cache/torch/hub/facebookresearch_dinov2_main
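This might be related to the automatic bf16 casting used in UniCeption: every xformers attention operator in the log rejects bf16 on this GPU (compute capability 6.1), and `cutlassF-pt` lists bf16 as its only blocker. We should probably provide a switch in the config, or auto-detect hardware compatibility, to decide whether to use mixed-precision inference. The full traceback and the captured stdout/stderr are shown above.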
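A rough sketch of what the hardware auto-detection could look like, in case it helps the discussion. This is not UniCeption's actual code: `pick_inference_dtype`, `detect_capability`, and the `allow_mixed_precision` flag are hypothetical names, and the thresholds are my assumptions (bf16 only from capability 8.0 up, matching the "A100+" message in the log; fp16 from 7.0 up; plain fp32 otherwise).

```python
from typing import Optional, Tuple


def pick_inference_dtype(
    capability: Optional[Tuple[int, int]],
    allow_mixed_precision: bool = True,  # hypothetical config switch
) -> str:
    """Return the name of the dtype to run inference in.

    `capability` is the CUDA compute capability as (major, minor),
    or None when no CUDA device is available.
    """
    if capability is None or not allow_mixed_precision:
        return "float32"
    if capability >= (8, 0):
        # Ampere and newer: xformers reports bf16 support as "A100+".
        return "bfloat16"
    if capability >= (7, 0):
        # Assumption: Volta/Turing can use fp16 autocast instead of bf16.
        return "float16"
    # Older GPUs (e.g. capability 6.1 in the log above): stay in fp32.
    return "float32"


def detect_capability() -> Optional[Tuple[int, int]]:
    """Query the current CUDA device, or return None if there is none."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    dtype_name = pick_inference_dtype(detect_capability())
    print(f"selected inference dtype: {dtype_name}")
    # The result could then drive autocast, e.g.
    #   torch.autocast("cuda", dtype=getattr(torch, dtype_name),
    #                  enabled=dtype_name != "float32")
```

With something like this, the Titan partition would fall back to fp32 and the config switch could still force fp32 on newer GPUs when needed.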