Skip to content

Force TSV Column Order + Process AfusOwner Data#224

Open
quazi-h wants to merge 3 commits intomasterfrom
qh-force-tsv-column-order
Open

Force TSV Column Order + Process AfusOwner Data#224
quazi-h wants to merge 3 commits intomasterfrom
qh-force-tsv-column-order

Conversation

@quazi-h
Copy link
Copy Markdown
Contributor

@quazi-h quazi-h commented Dec 14, 2021

Why

We would like to maintain the order of the original data model (schema) for columns in the final TSVs.
We also need to process afus_owner data and generate a TSV.
Relevant ticket

This PR

Extends the tsv_convert script to process afus_owner.
Forces the column order to hardcoded lists for tsv column order.

Checklist

  • Documentation has been updated as needed.

… of columns for a given table.

Refactored the code that maintains the list of columns - switched from a set to a list to maintain order.
Simplified the logic that was determining when to pull out PK columns to the front of a file since we should only do that when running with the Firecloud option.
For default output, the column order in the data models for each table already has the PKs at the front of the table.
@codecov
Copy link
Copy Markdown

codecov bot commented Dec 14, 2021

Codecov Report

Merging #224 (603184c) into master (aecff27) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #224   +/-   ##
=======================================
  Coverage   98.11%   98.11%           
=======================================
  Files          57       57           
  Lines        2179     2179           
  Branches      191      191           
=======================================
  Hits         2138     2138           
  Misses         41       41           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aecff27...603184c. Read the comment docs.

Copy link
Copy Markdown
Contributor

@aherbst-broad aherbst-broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to think about the column lists.

# DATA MODELS - Correct Ordering of Columns
def getColumnOrder(table_name):
if (table_name == "cslb"):
column_order = ['dog_id', 'cslb_date', 'cslb_year', 'cslb_pace', 'cslb_stare', 'cslb_stuck', 'cslb_recognize', 'cslb_walk_walls', 'cslb_avoid', 'cslb_find_food', 'cslb_pace_6mo', 'cslb_stare_6mo', 'cslb_defecate_6mo', 'cslb_food_6mo', 'cslb_recognize_6mo', 'cslb_active_6mo', 'cslb_score', 'cslb_other_changes']
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 I think these column lists are going to be unmaintainable. They should be externalized to a file and parsed from there. I also worry a bit about what happens if there are new columns etc., and how we remember to keep these updated.

We may be able to kill two birds with one stone by using the schema files as the source of truth for column ordering and parsing those instead. The tricky bit there is those are external to the dagster docker container context and we'd need to figure out a way to get those included in the built image.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants