Index out of range error when corpus size is in the thousands #57

Hi, thank you for your work on this.
I am running into an index out of range error when my corpus is around 1k-2k documents.

I have precomputed embeddings stored in my data.
```
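import numpy as np
import umap

# Stack per-row embeddings into one (n_documents, dim) array, then reduce with UMAP
document_vectors = np.stack(df['EMBEDDINGS'].values)
document_map = umap.UMAP(metric='cosine').fit_transform(document_vectors)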
```

```
from toponymy import ToponymyClusterer
clusterer = ToponymyClusterer(min_clusters=6)
clusterer.fit(clusterable_vectors=document_map, embedding_vectors=document_vectors)
for i, layer in enumerate(clusterer.cluster_layers_):
    print(f'{len(np.unique(layer.cluster_labels))-1} clusters in layer {i}')
```
Output:
111 clusters in layer 0
36 clusters in layer 1
9 clusters in layer 2
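
A side note on the counts above: the `- 1` only gives the right number if every layer actually contains a noise label. A minimal sketch of a count that does not depend on that, assuming noise points are labelled `-1` (an assumption about the clusterer's output) and using `count_clusters`, a hypothetical helper that is not part of toponymy:

```python
import numpy as np


def count_clusters(labels, noise_label=-1):
    """Count distinct cluster labels, skipping the noise label only if it occurs."""
    unique_labels = np.unique(np.asarray(labels))
    return int(np.sum(unique_labels != noise_label))


# Toy label vectors (made up for illustration)
print(count_clusters([-1, 0, 0, 1, 2]))  # 3
print(count_clusters([0, 0, 1, 2]))      # 3, whereas len(np.unique(...)) - 1 gives 2
```

For the layers printed above this would presumably give the same numbers, as long as each layer has at least one unclustered document.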

So far, all good.


```
from sentence_transformers import SentenceTransformer
from toponymy import Toponymy, KeyphraseBuilder
from toponymy.llm_wrappers import HuggingFace

embedding_model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
llm = HuggingFace("Qwen/Qwen2.5-1.5B-Instruct")

```


```
text = df['TITLE_TEXT'].values

topic_model = Toponymy(
    llm_wrapper=llm,
    text_embedding_model=embedding_model,
    clusterer=clusterer,
)
topic_model.fit(text, document_vectors, document_map)

topic_names = topic_model.topic_names_
```

IndexError: list index out of range
File <command-5422088315880380>, line 8
      1 text = df['TITLE_TEXT'].values
      3 topic_model = Toponymy(
      4     llm_wrapper=llm,
      5     text_embedding_model=embedding_model,
      6     clusterer=clusterer,
      7 )
----> 8 topic_model.fit(text, document_vectors, document_map)
     10 topic_names = topic_model.topic_names_
     11 topics_per_document = [cluster_layer.topic_name_vector for cluster_layer in topic_model.cluster_layers_]
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-bf30105a-4657-4047-8f9f-0e0ae989b858/lib/python3.11/site-packages/toponymy/cluster_layer.py:229, in ClusterLayer._update_topic_names(self, new_topic_names, topic_indices)
    225 """
    226 Update the topic names for the specified indices.
    227 """
    228 for i, topic_index in enumerate(topic_indices):
--> 229     self.topic_names[topic_index] = new_topic_names[i]
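
The wording of the message may help narrow this down. The sketch below copies the two-line loop from the traceback into a standalone function that works on plain Python lists. `update_topic_names` is a hypothetical stand-in, not toponymy's actual method, and the data is made up. With plain lists, a failing read and a failing write produce different messages:

```python
def update_topic_names(topic_names, new_topic_names, topic_indices):
    """Stand-in for the loop in the traceback, using plain lists only."""
    for i, topic_index in enumerate(topic_indices):
        topic_names[topic_index] = new_topic_names[i]
    return topic_names


def error_message(topic_names, new_topic_names, topic_indices):
    """Run the loop and return the IndexError text, or None if it succeeds."""
    try:
        update_topic_names(topic_names, new_topic_names, topic_indices)
    except IndexError as exc:
        return str(exc)
    return None


# Fewer new names than indices: the read new_topic_names[i] fails.
print(error_message(["", "", ""], ["a", "b"], [0, 1, 2]))
# list index out of range

# An index beyond the end of topic_names: the write fails instead.
print(error_message(["", ""], ["a", "b", "c"], [0, 1, 2]))
# list assignment index out of range
```

If `new_topic_names` is a plain list inside the library (an assumption, I have not checked), the message I get matches the first case. That would suggest fewer topic names came back than there were topic indices to fill, rather than a bad index into `topic_names`.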

I am running this on Databricks with the following cluster config:
Databricks runtime: 15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12)
Node type: Standard_DS3_v2

