
Model loses information very quickly #25

@Lazy3valuation


Hi! I trained the model with LoRA in 8-bit precision down to a training loss of 1.5/2.5. Generation is segment-wise, but the model does not seem to generate correct text. It cannot pass a needle-in-a-haystack test even on small inputs (fewer tokens than the segment size, which is 400 for me), and it starts to spit out nonsense very quickly. For example:
I've tried a NIAH test with this pattern:
"There is an important info hidden inside a lot of irrelevant text. Find it and memorize it. I will quiz you about the important information there."
Then a loop of "\nThe grass is green. The sky is blue. The sun is yellow. Here we go. There and back again.\nThe grass is green. The sky is blue. The sun is yellow. Here we go. There and back again.\nThe grass is green. The sky is blue. The sun is yellow. Here we go. There and back again." is repeated many times (I repeated it until the prompt reached 400 tokens, 3600 tokens, and 10k tokens).
Inside the loop, at a random position, there's a "\nThe pass key is 72498. Remember it. 72498 is the pass key.". At the end of the prompt is "What is the pass key? The pass key is ", and the base model completes it correctly with 72498 up to 3600 tokens (beyond that my GPU goes OOM).
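
For anyone who wants to reproduce this, here is a minimal sketch of how such a prompt could be assembled. The function names, the `n_filler` parameter, the fixed seed, and the newline before the final question are assumptions made for illustration, not the exact script used. The number of filler lines needed to reach a given token count depends on the tokenizer, so it has to be tuned separately.

```python
import random

INTRO = ("There is an important info hidden inside a lot of irrelevant text. "
         "Find it and memorize it. I will quiz you about the important information there.")
FILLER = ("\nThe grass is green. The sky is blue. The sun is yellow. "
          "Here we go. There and back again.")
# Assumption: the question is placed on its own line at the end of the prompt.
QUESTION = "\nWhat is the pass key? The pass key is "


def build_passkey_prompt(n_filler, pass_key=72498, seed=0):
    """Return (prompt, position) with the needle inserted at a random filler position."""
    rng = random.Random(seed)
    needle = f"\nThe pass key is {pass_key}. Remember it. {pass_key} is the pass key."
    lines = [FILLER] * n_filler
    # randint is inclusive on both ends, so the needle may land first or last.
    position = rng.randint(0, n_filler)
    lines.insert(position, needle)
    return INTRO + "".join(lines) + QUESTION, position


def is_correct(completion, pass_key=72498):
    """True if the completion starts with the pass key, ignoring leading whitespace."""
    return completion.strip().startswith(str(pass_key))
```

Calling `build_passkey_prompt(20)` gives a short prompt; increasing `n_filler` and measuring the result with the model's tokenizer would give the 400, 3600, and 10k token variants.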

With Infini-attention, the model can't complete it correctly even once. Moreover, the repeated pattern gets "broken". Here's a completion example:
" The sun is yellow. Here we go. There and back again.\nThe grass is green. The sky is bluer. The sun is yellow. Here we go. There and back again.\nThe grass is green. The sky is bluer. The sun is yellow. Here we go. There and back again.\nThe grass is green. The sky is blue. The sun is yellow. Here we will. They will be a bit of the distance, at least we"

It behaves as if the model can't keep information at all, or only for a very short time. Has anyone tested how well these models perform? I sadly noticed that the repo has not been updated in a month :-(
