Force TSV Column Order + Process AfusOwner Data #224
… of columns for a given table. Refactored the code that maintains the list of columns, switching from a set to a list so that order is preserved. Simplified the logic that decides when to move PK columns to the front of a file, since that should only happen when running with the Firecloud option. For default output, the column order in each table's data model already places the PKs at the front.
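A minimal sketch of the two ideas described above: tracking columns in a list so first-seen order survives, and moving the PK to the front only when the Firecloud option is set. The function names, the `firecloud` flag, and the single `primary_key` argument are illustrative assumptions, not the actual code in this PR.

```python
def collect_columns(records):
    """Return column names in first-seen order, without duplicates."""
    columns = []
    seen = set()  # used only for fast membership checks; order lives in the list
    for record in records:
        for name in record:
            if name not in seen:
                seen.add(name)
                columns.append(name)
    return columns


def order_columns(columns, primary_key, firecloud=False):
    """Move the primary key to the front only for Firecloud-style output.

    Hypothetical helper: for default output the incoming order is kept as-is.
    """
    if firecloud and primary_key in columns:
        return [primary_key] + [c for c in columns if c != primary_key]
    return list(columns)


records = [{"cslb_date": "2020-01-01", "dog_id": "1"}, {"cslb_score": "3", "dog_id": "2"}]
columns = collect_columns(records)
default_order = order_columns(columns, "dog_id")
firecloud_order = order_columns(columns, "dog_id", firecloud=True)
```

A plain `set` would deduplicate the names but gives no ordering guarantee, which is why the list carries the order here.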
Codecov Report
@@           Coverage Diff           @@
##           master     #224   +/-   ##
=======================================
  Coverage   98.11%   98.11%
=======================================
  Files          57       57
  Lines        2179     2179
  Branches      191      191
=======================================
  Hits         2138     2138
  Misses         41       41
aherbst-broad
left a comment
I think we need to think about the column lists.
# DATA MODELS - Correct Ordering of Columns
def getColumnOrder(table_name):
    if (table_name == "cslb"):
        column_order = ['dog_id', 'cslb_date', 'cslb_year', 'cslb_pace', 'cslb_stare', 'cslb_stuck', 'cslb_recognize', 'cslb_walk_walls', 'cslb_avoid', 'cslb_find_food', 'cslb_pace_6mo', 'cslb_stare_6mo', 'cslb_defecate_6mo', 'cslb_food_6mo', 'cslb_recognize_6mo', 'cslb_active_6mo', 'cslb_score', 'cslb_other_changes']
🔧 I think these column lists are going to be unmaintainable. They should be externalized to a file and parsed from there. I also worry a bit about what happens if there are new columns etc., and how we remember to keep these updated.
We may be able to kill two birds with one stone by using the schema files as the source of truth for column ordering and parsing those instead. The tricky bit is that those files are external to the dagster Docker container context, so we'd need to figure out a way to get them included in the built image.
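As a rough illustration of the suggestion above, column order could be read from a schema file rather than hardcoded. The JSON layout assumed here (a `columns` array of objects with a `name` key) and both function names are hypothetical; the real schema files may be structured differently.

```python
import json


def parse_column_order(schema_text):
    """Return column names in the order they appear in a schema document.

    Assumed layout (hypothetical):
    {"name": "<table>", "columns": [{"name": "<column>"}, ...]}
    """
    schema = json.loads(schema_text)
    # JSON arrays keep their order, so the list mirrors the schema definition.
    return [column["name"] for column in schema["columns"]]


def load_column_order(schema_path):
    """Read a schema file from disk and return its ordered column names."""
    with open(schema_path, encoding="utf-8") as schema_file:
        return parse_column_order(schema_file.read())


example_schema = '{"name": "cslb", "columns": [{"name": "dog_id"}, {"name": "cslb_date"}, {"name": "cslb_year"}]}'
cslb_order = parse_column_order(example_schema)
```

With something like this, a new column added to the schema would flow into the TSV order without a second list to update, provided the schema files are available inside the image.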
Why
We would like to maintain the order of the original data model (schema) for columns in the final TSVs.
We also need to process afus_owner data and generate a TSV.
Relevant ticket
This PR
Extends the tsv_convert script to process afus_owner.
Forces the TSV column order to follow hardcoded per-table lists.
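For reference, forcing a fixed column order when writing a TSV can be sketched with the standard library's `csv.DictWriter`. This is not the `tsv_convert` implementation; `write_tsv` and the sample rows are assumptions for illustration.

```python
import csv
import io


def write_tsv(rows, column_order, out):
    """Write dict rows as TSV with columns in exactly the given order."""
    writer = csv.DictWriter(
        out,
        fieldnames=column_order,  # header and cell order both follow this list
        delimiter="\t",
        restval="",               # columns missing from a row are left empty
        extrasaction="ignore",    # keys not in column_order are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_tsv(
    [{"cslb_date": "2020-01-01", "dog_id": "1"}, {"dog_id": "2"}],
    ["dog_id", "cslb_date"],
    buffer,
)
```

Because the order comes from `fieldnames`, the key order of each input row does not affect the output.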
Checklist