CSV header row is repeated when field is not matched #599

Description

@memen45
~/my-venv/bin/invoice2data -t . i*.pdf --debug --input-reader pdftotext --output-format csv

I am processing multiple PDF files with the command above. For one of the input PDFs, some fields are not matched.

Actual Output

This results in a broken CSV in which the header row is repeated partway through the file, once before the invoice with the missing fields and again after it:

issuer,amount,amount_tax,date,invoice_number,vat,partner_name,country_code,partner_coc,iban,bic,currency,desc
"Company B.V.",1.00,0.21,2024/01/31,1,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/02/29,2,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/03/31,3,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/04/30,4,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/05/31,5,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/06/30,6,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
issuer,amount,amount_tax,date,invoice_number,vat,partner_name,country_code,partner_coc,currency,desc,,
"Company B.V.",1.00,0.21,2024/07/31,7,XX123456789XXX,"Company B.V.",NL,12345678,EUR,"Invoice from Company B.V.",,
issuer,amount,amount_tax,date,invoice_number,vat,partner_name,country_code,partner_coc,iban,bic,currency,desc
"Company B.V.",1.00,0.21,2024/08/31,8,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/09/30,9,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/10/31,10,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/11/30,11,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/12/31,12,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."

This makes the exported CSV file unusable for import into other software, because it is no longer a valid CSV document.
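To illustrate why an importer struggles with this output, here is a minimal sketch using only Python's standard `csv` module. The sample data is a shortened, hypothetical stand-in for the output above (fewer columns), not the real file:

```python
import csv
import io

# Shortened stand-in for the broken output (hypothetical, fewer columns).
broken = """issuer,date,iban,bic,currency
"Company B.V.",2024/06/30,"NLXX XXXX",XXXXNLXX,EUR
issuer,date,currency,,
"Company B.V.",2024/07/31,EUR,,
"""

# A typical consumer takes the first row as the header for the whole file.
rows = list(csv.DictReader(io.StringIO(broken)))

# The repeated header comes back as an ordinary data row ...
header_like = [r for r in rows if r["issuer"] == "issuer"]

# ... and the row after it has its currency value under the "iban" column.
shifted = rows[2]
print(len(header_like), shifted["iban"], repr(shifted["currency"]))
```

A reader that trusts the first header row silently shifts `currency` into the `iban` column for the affected invoice, so the import either fails or produces wrong data.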

Expected Output

I would expect the repeated header rows to be omitted and the non-critical fields to remain empty for invoices where they could not be matched, as shown below:

issuer,amount,amount_tax,date,invoice_number,vat,partner_name,country_code,partner_coc,iban,bic,currency,desc
"Company B.V.",1.00,0.21,2024/01/31,1,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/02/29,2,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/03/31,3,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/04/30,4,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/05/31,5,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/06/30,6,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/07/31,7,XX123456789XXX,"Company B.V.",NL,12345678,,,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/08/31,8,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/09/30,9,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/10/31,10,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/11/30,11,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
"Company B.V.",1.00,0.21,2024/12/31,12,XX123456789XXX,"Company B.V.",NL,12345678,"NLXX XXXX XXXX XXXX XX",XXXXNLXX,EUR,"Invoice from Company B.V."
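As a rough sketch of the behaviour I have in mind (this is not invoice2data's actual output code, just an illustration with Python's standard `csv` module), the header could be built from the union of all field names and missing fields filled with empty strings. The function name `write_uniform_csv` and the sample records are hypothetical:

```python
import csv
import io


def write_uniform_csv(records, out):
    """Write dicts with differing keys under a single header row."""
    # Collect every field name across all records, in first-seen order.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval is written for any field that a record does not contain.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return fieldnames


# Hypothetical extraction results: the second invoice lacks iban and bic.
records = [
    {"issuer": "Company B.V.", "invoice_number": 6,
     "iban": "NLXX XXXX XXXX XXXX XX", "bic": "XXXXNLXX", "currency": "EUR"},
    {"issuer": "Company B.V.", "invoice_number": 7, "currency": "EUR"},
]

buffer = io.StringIO()
write_uniform_csv(records, buffer)
print(buffer.getvalue())
```

Note that with this approach the column order follows the order in which fields are first seen, so a field that only appears in a later invoice would end up as a trailing column. A fixed field list taken from the template would avoid that, if one is available.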

Is there any setting that would fix this or is this a bug that should be fixed?
