implement GetAllDocuments() by dnnspaul · Pull Request #110 · philippgille/chromem-go

dnnspaul · 2025-04-25T10:06:41Z

Hey @philippgille, thanks for this great package!

I had a hard time finding out that there is no way to merge two existing collections. I appreciate your focus on keeping the package simple (as I've read in other issues and pull requests), so I avoided extending the Import.. functions with enableMerge attributes and instead implemented the simplest approach I could come up with: getting all existing documents of a collection. This way the end user (or developer?) is at least able to fetch all documents and import them into another collection in their own way.

I'm interested in your feedback and send happy greetings from Hamburg.

philippgille

Hi Dennis 👋, Thanks for contributing!

I think the method is useful and makes sense 👍.

But there's a DB.ListCollections(), so for consistency I'd prefer to name the new method Collection.ListDocuments().

And can you please move it between the Collection.GetByID() and Collection.Delete()?

Thanks!

dnnspaul · 2025-04-29T09:47:03Z

Hi @philippgille, the changes you requested make absolute sense and are applied now. ✌️

philippgille

Hello, sorry for the long delay!

First of all, thanks for implementing the requested changes! 🙇‍♂️

I had another more thorough look and found some more things that can be improved.

Due to my delayed review I'd understand if improving the PR doesn't fit your schedule anymore, so just let me know if you prefer me to make those changes on my own.

philippgille · 2025-05-24T14:33:38Z

+// The returned documents are a copy of the original documents, so they can be safely
+// modified without affecting the collection.


The slice is new and can be modified without affecting the internal c.documents, but the documents themselves are not full copies. When you do x := y, x gets its own copies of y's simple fields (int, string), but maps and slices are only shallow copies: x and y still share the same underlying data.

Demonstration: https://go.dev/play/p/0OccI4ibtS2

See the above GetByID where the Metadata and Embedding fields are cloned separately to create an entirely new document.
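The difference can be illustrated with a small, self-contained sketch. The Document type below is a hypothetical stand-in with the fields mentioned in this review, not chromem-go's actual definition, and the helper names shallowCopy and deepCopy are made up for illustration. It assumes Go 1.21 or later for the standard library maps and slices packages.

```go
package main

import (
	"fmt"
	"maps"
	"slices"
)

// Document is a hypothetical stand-in, not chromem-go's actual type.
type Document struct {
	ID        string
	Metadata  map[string]string
	Embedding []float32
	Content   string
}

// shallowCopy copies the struct by plain assignment. Simple fields are
// copied, but the map and slice still point at the original's data.
func shallowCopy(d Document) Document {
	return d
}

// deepCopy additionally clones the map and the slice, so the result
// shares no mutable state with the original.
func deepCopy(d Document) Document {
	c := d
	c.Metadata = maps.Clone(d.Metadata)
	c.Embedding = slices.Clone(d.Embedding)
	return c
}

func main() {
	orig := Document{
		ID:        "1",
		Metadata:  map[string]string{"foo": "bar"},
		Embedding: []float32{0.1, 0.2},
		Content:   "hello world",
	}

	s := shallowCopy(orig)
	s.Content = "changed"     // orig.Content is unaffected
	s.Metadata["foo"] = "baz" // orig.Metadata is modified as well
	fmt.Println("after shallow copy edit:", orig.Content, orig.Metadata["foo"])

	d := deepCopy(orig)
	d.Metadata["foo"] = "qux" // orig.Metadata is unaffected
	fmt.Println("after deep copy edit:", orig.Metadata["foo"])
}
```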

So here we have two options:

1. Change the Godoc to clarify that the documents are shallow copies and only the slice is new. The receiver can still work with the slice, such as iterating over it and reading the documents, without concurrency issues during regular operations. For example, chromem-go can still add new documents to its c.documents map, or delete them, without affecting the returned slice. Here's an example in chromem-go where something similar is done:

chromem-go/db.go, lines 517 to 522 in 8311eb0:

// The returned map is a copy of the internal map, so it's safe to directly modify
// the map itself. Direct modifications of the map won't reflect on the DB's map.
// To do that use the DB's methods like [DB.CreateCollection] and [DB.DeleteCollection].
// The map is not an entirely deep clone, so the collections themselves are still
// the original ones. Any methods on the collections like Add() for adding documents
// will be reflected on the DB's collections and are concurrency-safe.

2. Or create a deep copy of the documents. This can be done either by calling GetByID for each document or by copying the code from that method. The former leads to less code, but costs one extra operation per document (the c.documents lookup).
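As a rough sketch of the deep-copy option, the code below clones each document's map and slice while holding a read lock. Everything here is an assumption for illustration: the Collection and Document types, the mu field, and the []Document return type are hypothetical stand-ins and do not reproduce chromem-go's internals or the signature used in this PR. It assumes Go 1.21 or later.

```go
package main

import (
	"fmt"
	"maps"
	"slices"
	"sync"
)

// Document and Collection are hypothetical stand-ins for illustration.
type Document struct {
	ID        string
	Metadata  map[string]string
	Embedding []float32
	Content   string
}

type Collection struct {
	mu        sync.RWMutex
	documents map[string]*Document
}

// ListDocuments returns deep copies of all documents, so callers can
// modify the result without touching the collection's internal state.
func (c *Collection) ListDocuments() []Document {
	c.mu.RLock()
	defer c.mu.RUnlock()

	docs := make([]Document, 0, len(c.documents))
	for _, d := range c.documents {
		cp := *d // copies the simple fields
		cp.Metadata = maps.Clone(d.Metadata)
		cp.Embedding = slices.Clone(d.Embedding)
		docs = append(docs, cp)
	}
	return docs
}

func main() {
	c := &Collection{documents: map[string]*Document{
		"1": {ID: "1", Metadata: map[string]string{"foo": "bar"}, Content: "hello world"},
	}}

	docs := c.ListDocuments()
	docs[0].Metadata["foo"] = "changed" // only the copy is modified

	fmt.Println(c.documents["1"].Metadata["foo"])
}
```

Cloning inline avoids the extra map lookup per document that calling a GetByID-style method would incur, at the cost of duplicating the copy logic.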

philippgille · 2025-05-24T14:35:10Z

+	ids := []string{"1", "2", "3", "4"}
+	metadatas := []map[string]string{{"foo": "bar"}, {"a": "b"}, {"foo": "bar"}, {"e": "f"}}
+	contents := []string{"hello world", "hallo welt", "bonjour le monde", "hola mundo"}
+	c.Add(context.Background(), ids, nil, metadatas, contents)


Here the error returned by c.Add should be checked.

philippgille · 2025-05-24T14:36:23Z

+	for _, doc := range docs {
+		if doc.Content == "hello world" {
+			break
+		}
+	}


Here the test doesn't assert whether the content was found or not. You can introduce a new variable found := false before the loop, set found = true just before the break, and after the loop assert that its value is true.
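A minimal sketch of that pattern follows. The Document type and the containsContent helper are hypothetical and only serve to keep the example self-contained; in the test itself the loop would sit inline and the final check would report a failure through the testing package.

```go
package main

import "fmt"

// Document is a hypothetical stand-in with only the field needed here.
type Document struct {
	Content string
}

// containsContent reports whether any document has the wanted content,
// using the found-flag pattern described above.
func containsContent(docs []Document, want string) bool {
	found := false
	for _, doc := range docs {
		if doc.Content == want {
			found = true
			break
		}
	}
	return found
}

func main() {
	docs := []Document{{Content: "hallo welt"}, {Content: "hello world"}}

	// The assertion the original loop was missing.
	if !containsContent(docs, "hello world") {
		fmt.Println("expected document with content 'hello world' not found")
		return
	}
	fmt.Println("document found")
}
```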

philippgille · 2025-10-10T09:57:11Z

There was a later PR from another contributor, which I think supersedes this: #118

Can you check if that enables you to do the merge of collections?

implement GetAllDocuments()

d75b78a

philippgille reviewed Apr 27, 2025


applied recommendations by philipp

6b12ae6

philippgille reviewed May 24, 2025

dnnspaul wants to merge 2 commits into philippgille:main from dnnspaul:main
