Update Hubspot Source to use API Incrementally#662
kevin-induro wants to merge 7 commits into dlt-hub:master
Review Summary
Reviewed and confirmed schema-level backward compatibility: table names, columns, and keys remain unchanged.
Behaviorally, the switch to the Search API introduces a few differences:
- Associations are fetched per record → may increase latency and rate-limit risk for large datasets.
- The 10k record limit → may trigger a recursive fetch, or a full reload when timestamp overlaps occur.
- Archived records always reload in full → gradual performance slowdown as the archive grows.
- Overall performance becomes variable (faster for small deltas, slower for heavy associations).
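The 10k-limit point can be illustrated with a minimal sketch of timestamp-windowed paging. It is not the PR's code. It assumes that results come back sorted ascending by last-modified time, that each query matches records modified at or after a cursor, and that a query stops at a fixed cap. `search`, `fetch_since`, `make_search`, `Record`, and `WindowStuckError` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Set

# Per-query result cap; the review cites a 10k limit for the Search API.
SEARCH_RESULT_CAP = 10_000


@dataclass(frozen=True)
class Record:
    id: str
    modified: int  # last-modified timestamp, e.g. epoch milliseconds


class WindowStuckError(Exception):
    """A full window shares one timestamp, so the cursor cannot advance."""


def fetch_since(
    search: Callable[[int, int], List[Record]],
    start: int,
    cap: int = SEARCH_RESULT_CAP,
) -> Iterator[Record]:
    """Yield every record modified at or after `start`, re-querying from the
    last timestamp seen whenever a query hits the cap.

    `search(since, cap)` is a hypothetical stand-in for one search query: it
    returns at most `cap` records with modified >= since, sorted ascending.
    """
    since = start
    seen_at_since: Set[str] = set()  # ids already yielded at timestamp `since`
    while True:
        batch = search(since, cap)
        for rec in batch:
            if rec.modified == since and rec.id in seen_at_since:
                continue  # overlap with the previous window
            yield rec
        if len(batch) < cap:
            return  # the query was not truncated, so nothing is left
        last = batch[-1].modified
        if last == since:
            # The whole window shares one timestamp; moving the cursor past it
            # could skip records, so the caller should fall back to a full load.
            raise WindowStuckError(f"at least {cap} records at timestamp {since}")
        since = last
        seen_at_since = {r.id for r in batch if r.modified == last}


def make_search(records: List[Record]) -> Callable[[int, int], List[Record]]:
    """Build an in-memory fake of the search endpoint for experimentation."""
    ordered = sorted(records, key=lambda r: (r.modified, r.id))

    def search(since: int, cap: int) -> List[Record]:
        return [r for r in ordered if r.modified >= since][:cap]

    return search
```

For example, with seven records at timestamps 1, 1, 2, 2, 3, 4, 5 and `cap=3`, `list(fetch_since(make_search(recs), 0, cap=3))` returns all seven records once each after four queries. With four records that share one timestamp and `cap=3`, the generator raises `WindowStuckError`. That case corresponds to the full-reload fallback the review mentions.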
Everything else looks solid. There is room for small improvements, mainly in the documentation: it should clearly describe these changes and set expectations.
PR looks good to approve.
@anuunchin, could you please have a passing look in case there’s anything else to refine or clarify?
Thanks for the great work @kevin-induro! 🙌
@dat-a-man I appreciate the vote of confidence! This has been pending for a while now. I have a series of small improvements I've made beyond this PR, but I didn't want to add to this while it was pending. Is there anything I should do to push this through? Should I add my additional fixes to this? Would you like me to update the documentation with the comments you made?
Tell us what you do here
Short description
The primary motivation is to switch the basic resources (i.e. those created from `ALL_OBJECTS` with the `crm_objects` function) from fetching all the Hubspot objects to fetching only those needed based on the incremental load date.
Additional Context
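To make the motivation concrete, here is a hedged sketch of the kind of request body an incremental fetch could send to HubSpot's CRM search endpoint (`POST /crm/v3/objects/{objectType}/search`). The body only matches records modified at or after the stored cursor. The property name `hs_lastmodifieddate`, the page size of 100, and the helper `build_incremental_search_body` are assumptions for illustration, not taken from the PR. The exact last-modified property may differ per object type.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

# Assumed last-modified property; the exact name may differ per object type.
LAST_MODIFIED_PROPERTY = "hs_lastmodifieddate"


def build_incremental_search_body(
    last_value: datetime,
    properties: List[str],
    after: Optional[str] = None,
    page_size: int = 100,
) -> Dict[str, Any]:
    """Build a JSON body for a CRM search request that only matches records
    modified at or after `last_value` (the stored incremental cursor)."""
    millis = int(last_value.timestamp() * 1000)  # epoch milliseconds
    body: Dict[str, Any] = {
        "filterGroups": [
            {
                "filters": [
                    {
                        "propertyName": LAST_MODIFIED_PROPERTY,
                        "operator": "GTE",
                        "value": str(millis),
                    }
                ]
            }
        ],
        # Ascending order lets the cursor advance monotonically between runs.
        "sorts": [{"propertyName": LAST_MODIFIED_PROPERTY, "direction": "ASCENDING"}],
        "properties": properties,
        "limit": page_size,
    }
    if after is not None:
        body["after"] = after  # paging token from the previous response
    return body


if __name__ == "__main__":
    cursor = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(json.dumps(build_incremental_search_body(cursor, ["email"]), indent=2))
```

Sorting ascending on the same property used in the filter means the last record of each run can become the next run's cursor. That ordering is also what exposes the 10k-limit and timestamp-overlap behavior discussed in the review.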
There are still a few things that can be improved:
- `*_property_history` tables have a column for every property of the object, and all of those columns are null. They should only have the columns `_dlt_id`, `_dlt_load_id`, `object_id`, `property_name`, `source_id`, `source_type`, `timestamp`, `updated_by_user_id`, and `value`.
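A minimal sketch of that cleanup projects each history row onto the listed columns before it is yielded. `_dlt_id` and `_dlt_load_id` are left out because dlt adds them itself during normalization. The helper names and the input row shape are hypothetical. In a dlt pipeline, a function like `trim_history_row` could be attached to the history resource with `add_map`, though the PR may structure this differently.

```python
from typing import Any, Dict, Iterable, Iterator

# Source-side columns for *_property_history rows. _dlt_id and _dlt_load_id
# are omitted because dlt adds them when it normalizes the data.
HISTORY_COLUMNS = (
    "object_id",
    "property_name",
    "source_id",
    "source_type",
    "timestamp",
    "updated_by_user_id",
    "value",
)


def trim_history_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the history columns, dropping the per-property columns
    that are always null in history rows (missing keys become None)."""
    return {col: row.get(col) for col in HISTORY_COLUMNS}


def trim_history(rows: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Lazily trim a stream of history rows."""
    for row in rows:
        yield trim_history_row(row)
```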