transpose=true returns plausible but incorrect results #1172

@LilithHafner

I was doing some data handling with CSV and at the end of my pipeline, the graphs looked plausible, but results were unexpected. I tracked it down to this correctness bug in CSV.jl:

julia> using CSV; CSV.File(IOBuffer("""
       Alpha,,3982,16603,,,,"40*",95,4027,,,
       Beta,,,2664,2716,,,"0*",15,833,,,
       Gamma,,,,1641,1707,1762,1814,1861,1913,,,
       """), transpose=true)
12-element CSV.File:
 (Alpha = String7("40*"), Beta = String3("0*"), Gamma = missing)
 (Alpha = missing, Beta = missing, Gamma = missing)
 (Alpha = missing, Beta = missing, Gamma = missing)
 (Alpha = missing, Beta = missing, Gamma = 1641)
 (Alpha = missing, Beta = missing, Gamma = 1707)
 (Alpha = missing, Beta = missing, Gamma = 1762)
 (Alpha = missing, Beta = missing, Gamma = 1814)
 (Alpha = String7("95"), Beta = String3("15"), Gamma = 1861)
 (Alpha = String7("4027"), Beta = String3("833"), Gamma = 1913)
 (Alpha = missing, Beta = missing, Gamma = missing)
 (Alpha = missing, Beta = missing, Gamma = missing)
 (Alpha = missing, Beta = missing, Gamma = missing)

Or, more readably but with more dependencies:

julia> using CSV, DataFrames; CSV.read(IOBuffer("""
       Alpha,,3982,16603,,,,"40*",95,4027,,,
       Beta,,,2664,2716,,,"0*",15,833,,,
       Gamma,,,,1641,1707,1762,1814,1861,1913,,,
       """), DataFrame, transpose=true)
12×3 DataFrame
 Row │ Alpha     Beta      Gamma   
     │ String7?  String7?  Int64?  
─────┼─────────────────────────────
   1 │ 40*       0*        missing 
   2 │ 95        15        missing 
   3 │ 4027      833       missing 
   4 │ missing   missing      1641
  ⋮  │    ⋮         ⋮         ⋮
  10 │ missing   missing   missing 
  11 │ missing   missing   missing 
  12 │ missing   missing   missing 
                     5 rows omitted

For comparison, the same input with transpose=false:

julia> using CSV, DataFrames; CSV.read(IOBuffer("""
       Alpha,,3982,16603,,,,"40*",95,4027,,,
       Beta,,,2664,2716,,,"0*",15,833,,,
       Gamma,,,,1641,1707,1762,1814,1861,1913,,,
       """), DataFrame, transpose=false)
2×13 DataFrame
 Row │ Alpha    Column2  3982     16603    Column5  Column6  Column7  40*      95     4027   Column11  Column12  Column13 
     │ String7  Missing  Missing  Int64?   Int64    Int64?   Int64?   String7  Int64  Int64  Missing   Missing   Missing  
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Beta     missing  missing     2664     2716  missing  missing  0*          15    833   missing   missing   missing 
   2 │ Gamma    missing  missing  missing     1641     1707     1762  1814      1861   1913   missing   missing   missing 

This also segfaults from time to time.

It didn't happen before I introduced the *s.
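
For reference, the layout one would expect from transpose=true can be worked out by hand from the raw text. The sketch below uses only Base Julia (no CSV.jl) and a naive split on commas, which is an assumption that holds for this input because no quoted field contains a comma. The helper names `cells`, `colnames`, `clean`, and `expected` are made up for illustration and are not part of any library.

```julia
# Hand-rolled transpose of the reproducer, using Base only.
# Assumption: no field contains an embedded comma, so a plain split is enough.
data = """
Alpha,,3982,16603,,,,"40*",95,4027,,,
Beta,,,2664,2716,,,"0*",15,833,,,
Gamma,,,,1641,1707,1762,1814,1861,1913,,,
"""

# One vector of raw fields per input line; empty fields are kept.
cells = [split(line, ',') for line in split(strip(data), '\n')]

# With transpose=true, the first field of each line is the column name.
colnames = [String(first(r)) for r in cells]

# Empty field -> missing; otherwise drop surrounding double quotes.
clean(s) = isempty(s) ? missing : String(strip(s, '"'))

# expected[i] holds the values of transposed row i, in column order.
expected = [[clean(r[i]) for r in cells] for i in 2:length(first(cells))]

for (i, row) in enumerate(expected)
    println(i, ": ", join((string(n, " = ", v) for (n, v) in zip(colnames, row)), ", "))
end
```

Going by this count, "40*" and "0*" belong in row 7 (next to Gamma = 1814), and 3982, 16603, 2664, and 2716 belong in rows 2 to 4. In the transpose=true outputs above, the quoted values show up in row 1 instead and those four numbers are absent, while the Gamma column, which has no quoted field, appears in the expected positions in the CSV.File output.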
