implement GetAllDocuments()#110
Conversation
philippgille
left a comment
There was a problem hiding this comment.
Hi Dennis 👋, Thanks for contributing!
I think the method is useful and makes sense 👍.
But there's a DB.ListCollections(), so for consistency I'd prefer to name the new method Collection.ListDocuments().
And can you please move it between the Collection.GetByID() and Collection.Delete()?
Thanks!
|
Hi @philippgille, your wished changes make absolute sense and are applied now. ✌️ |
philippgille
left a comment
There was a problem hiding this comment.
Hello, sorry for the long delay!
First of all, thanks for implementing the requested changes! 🙇♂️
I had another more thorough look and found some more things that can be improved.
Due to my delayed review I'd understand if improving the PR doesn't fit your schedule anymore, so just let me know if you prefer me to make those changes on my own.
| // The returned documents are a copy of the original documents, so they can be safely | ||
| // modified without affecting the collection. |
There was a problem hiding this comment.
The slice is new and can be modified without affecting the internal c.documents, but the documents themselves are not full copies. When you do x := y, then y's simple fields (int, string) are entirely new, but maps and slices are only_shallow_ copies.
Demonstration: https://go.dev/play/p/0OccI4ibtS2
See the above GetByID where the Metadata and Embedding fields are cloned separately to create an entirely new document.
So here we have two options:
- Change the Godoc to clarify that the documents are shallow copies, and only the slice is new. This still allows the receiver to work with the slice, like iterating over it and reading the documents, without concurrency issues during regular operations. For example chromem-go can still add new documents to its
c.Documentsmap, or delete them, and it doesn't affect the returned slice. Here's an example in chromem-go where something similar is done:Lines 517 to 522 in 8311eb0
- Or create a deep copy of documents. This can either be done by calling the
GetByIDfor each document, or by copying the code from that method. The former leads to less code, but one extra operation per document (thec.Documentslookup).
| ids := []string{"1", "2", "3", "4"} | ||
| metadatas := []map[string]string{{"foo": "bar"}, {"a": "b"}, {"foo": "bar"}, {"e": "f"}} | ||
| contents := []string{"hello world", "hallo welt", "bonjour le monde", "hola mundo"} | ||
| c.Add(context.Background(), ids, nil, metadatas, contents) |
There was a problem hiding this comment.
Here the returned error should be checked
| for _, doc := range docs { | ||
| if doc.Content == "hello world" { | ||
| break | ||
| } | ||
| } |
There was a problem hiding this comment.
Here the test doesn't assert whether the content was found or not. You can introduce a new variable found := false before the loop, set it found = true just before the break, and after the loop assert that its value is true.
|
There was a later PR from another contributor, which I think supersedes this: #118 Can you check if that enables you to do the merge of collections? |
Hey @philippgille, thanks for this great package!
I've had a hard time finding out that there is no possibility to kinda merge 2 existing collections. I appreciate your focus on staying as a simple package (as I've read in other Issues and Pull requests), so I avoided to extend the Import.. functions with
enableMergeattributes - but instead implemented the simplest approach I could come up with: getting all existing documents of a collection. This way the end-user (or -developer?) is at least able to fetch all documents and import them into another collection on their own way.I'm interested in your feedback and leave behind some happy greetings from Hamburg