VideoLLaMA on CPU Server (without GPU or CUDA Support) #169

@harshmoothat

Issue 1: FlashAttention Compatibility

The first issue we encountered was related to FlashAttention, which is not available on a CPU-only machine. It can be resolved by disabling FlashAttention explicitly:

Wherever use_flash_attention is referenced, disable it (or, where the attention implementation is selected by name, set it to "eager") to ensure compatibility and prevent errors on systems where FlashAttention is not supported.

Changes made in the config.json file:

- Changed "mm_vision_tower": "google/siglip-so400m-patch14-384" to "mm_vision_tower": "openai/clip-vit-base-patch32"
- Set "use_flash_attention": false
- Set "sliding_window": 0
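
The config.json changes listed above could also be applied with a small script rather than by hand. This is only a sketch: the three keys and values are the ones reported in this issue, while the helper names (apply_cpu_overrides, patch_config_file) and the file path in the usage note are hypothetical.

```python
import json
from pathlib import Path

# Overrides reported in this issue; nothing else in the config is touched.
CPU_OVERRIDES = {
    "mm_vision_tower": "openai/clip-vit-base-patch32",
    "use_flash_attention": False,
    "sliding_window": 0,
}


def apply_cpu_overrides(config: dict) -> dict:
    """Return a copy of `config` with the CPU-friendly overrides applied."""
    patched = dict(config)
    patched.update(CPU_OVERRIDES)
    return patched


def patch_config_file(path: str) -> None:
    """Rewrite the JSON config at `path` in place (hypothetical helper)."""
    config_path = Path(path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    patched = apply_cpu_overrides(config)
    config_path.write_text(json.dumps(patched, indent=2) + "\n", encoding="utf-8")
```

For example, patch_config_file("path/to/checkpoint/config.json") would update a local copy of the checkpoint config; the path is a placeholder. Keeping a backup of the original file is advisable, since the vision tower swap changes which weights are loaded.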

Issue 2: No CUDA GPUs Available

Installed the CPU-only versions of PyTorch, TorchVision, and TorchAudio using:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
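
After installing, it may help to confirm that the CPU-only build is the one actually being imported. A minimal check, assuming only that PyTorch is importable (the function name describe_torch_build is hypothetical):

```python
def describe_torch_build() -> dict:
    """Report whether PyTorch is installed and whether it sees CUDA."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,
        # torch.version.cuda is None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_build())
```

On a CPU-only install, "cuda_build" should be None and "cuda_available" should be False.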

Replaced the hardcoded .cuda() calls in videollama2/__init__.py and videollama2/model/__init__.py:

input_ids = tokenizer_multimodal_token(prompt, tokenizer, modal_token, return_tensors='pt').unsqueeze(0).long()
attention_masks = input_ids.ne(tokenizer.pad_token_id).long()

if device != "cpu":
    kwargs['device_map'] = {"": device}
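
The device check above can be isolated into a small helper so the device choice lives in one place. This is a sketch, not the repository's code: the function name build_load_kwargs and the "auto" default for device_map are assumptions, and only the `device != "cpu"` condition is taken from the snippet in this issue.

```python
def build_load_kwargs(device: str = "cpu", device_map="auto", **kwargs) -> dict:
    """Build model-loading keyword arguments for the requested device.

    Mirrors the condition used in this issue: the whole model is pinned to
    `device` only when it is not "cpu"; otherwise `device_map` is left as given.
    """
    load_kwargs = {"device_map": device_map, **kwargs}
    if device != "cpu":
        # An empty-string key maps every module of the model to one device.
        load_kwargs["device_map"] = {"": device}
    return load_kwargs
```

The same idea applies to the input tensors: instead of calling .cuda(), they can be moved with .to(device), which leaves them where they are when device is "cpu".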
