Add date_order locale option and flexible date separator parsing #623
Open
hidekoji wants to merge 4 commits into
235789b to 4f2aa0a
`locale()` gains a `date_order` argument so dates and date-times can be
parsed with an explicit component order ("mdy", "dmy", "ymd_hms", etc.).
This makes year-last formats such as 10/02/2024 readable, which the
automatic type guesser would otherwise treat as character.
Date and date-time auto-detection now also accepts any non-alphanumeric
separator between components and falls back to a year-last heuristic that
disambiguates D/M/YYYY vs M/D/YYYY (defaulting to MDY when ambiguous).
Adds end-to-end vroom() tests covering explicit date_order, auto MDY/DMY
detection, separator variants, and YMD backward compatibility.
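The year-last heuristic described in this commit message can be illustrated with a minimal C++ sketch. It is not the PR's code: the names `disambiguate_day_month`, `parse_year_last`, `DayMonth` and `SimpleDate` are hypothetical, the separator is assumed to be a single non-alphanumeric character, the year is assumed to be exactly four digits, and day-of-month validity per month (e.g. 31/02) is not checked.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical types for this sketch; they are not vroom types.
struct DayMonth { int day; int month; };
struct SimpleDate { int year; int month; int day; };

// Decide which leading component is the day and which is the month.
// A component above 12 cannot be a month. If both could be months,
// fall back to month-first (MDY), as the commit message describes.
std::optional<DayMonth> disambiguate_day_month(int first, int second) {
  if (first < 1 || second < 1 || first > 31 || second > 31) return std::nullopt;
  if (first > 12 && second > 12) return std::nullopt;  // neither can be a month
  if (first > 12) return DayMonth{first, second};      // D/M/YYYY
  return DayMonth{second, first};                      // M/D/YYYY, or ambiguous -> MDY
}

// Read 1..max_digits decimal digits starting at pos and advance pos.
static std::optional<int> read_int(const std::string& s, std::size_t& pos,
                                   int max_digits) {
  int value = 0;
  int n = 0;
  while (pos < s.size() && n < max_digits &&
         std::isdigit(static_cast<unsigned char>(s[pos]))) {
    value = value * 10 + (s[pos] - '0');
    ++pos;
    ++n;
  }
  if (n == 0) return std::nullopt;
  return value;
}

// Accept any single non-alphanumeric character as a separator (assumption).
static bool read_separator(const std::string& s, std::size_t& pos) {
  if (pos >= s.size() || std::isalnum(static_cast<unsigned char>(s[pos]))) {
    return false;
  }
  ++pos;
  return true;
}

// Parse "<a><sep><b><sep><yyyy>" where the year comes last.
std::optional<SimpleDate> parse_year_last(const std::string& s) {
  std::size_t pos = 0;
  auto a = read_int(s, pos, 2);
  if (!a || !read_separator(s, pos)) return std::nullopt;
  auto b = read_int(s, pos, 2);
  if (!b || !read_separator(s, pos)) return std::nullopt;
  const std::size_t year_start = pos;
  auto y = read_int(s, pos, 4);
  // Require exactly four year digits and no trailing characters.
  if (!y || pos - year_start != 4 || pos != s.size()) return std::nullopt;
  auto dm = disambiguate_day_month(*a, *b);
  if (!dm) return std::nullopt;
  return SimpleDate{*y, dm->month, dm->day};
}
```

Under these assumptions, `10/02/2024` resolves to month 10, day 2 (the month-first default for ambiguous values), `25.12.2024` resolves to day 25, month 12, and a year-first value such as `2024-10-02` is rejected by this function and left to the ISO8601 path.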
4f2aa0a to 7ffee32
utils::unzip(list=TRUE) on Windows R 4.2.x garbles non-ASCII entry names (e.g. ä → <84>), causing unz() to fail to locate the entry. When the archive package is installed, route zip reading through archive::archive_read(), which handles UTF-8 entry names correctly. Preserves existing behaviour when archive is not installed (falls back to the utils::unzip / unz() path).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…time heuristic

The year-last date heuristic only accepted 4-digit years (it read the year with consumeInteger(4, exact=true) and rejected year < 1000), so values like 5/29/26 were guessed as character. Datetime auto-detection only handled ISO8601.

- Add consumeYearFlexible(): consume a 2- or 4-digit year, applying the same pivot as the %y format specifier (00-68 -> 2000s, 69-99 -> 1900s) and rejecting implausible 3-digit values. Route parseYearLastHeuristic() and the parseDateOrder() year component through it.
- Add parseYearLastHeuristicDateTime() (year-last date + T/space + HH[:MM[:SS]] with optional tz) and wire it into guess_type isDateTime() and vroom_dttm.cc materialization as an ISO8601 fallback, so MDY/DMY datetimes are recognized.
- Extract disambiguateDayMonth() shared by both heuristics.
- Tests for 2-digit MDY/DMY/ambiguous dates, the %y pivot, invalid/3-digit rejection, 2- and 4-digit MDY datetimes, and explicit date_order with 2-digit years.
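The 2-digit year pivot named in this commit message can be sketched as follows. This is an illustration only, not the PR's `consumeYearFlexible()`: the function name `parse_year_flexible` is hypothetical, it takes the already-isolated run of year characters rather than consuming from a stream, and it is assumed here that 4-digit years are returned unchanged.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper: accept exactly 2 or exactly 4 digits.
// 2-digit years use the %y pivot from the commit message:
// 00-68 -> 2000-2068, 69-99 -> 1969-1999.
std::optional<int> parse_year_flexible(const std::string& digits) {
  // Reject 1-, 3- and 5+-digit runs (e.g. "300").
  if (digits.size() != 2 && digits.size() != 4) return std::nullopt;

  int value = 0;
  for (char c : digits) {
    if (!std::isdigit(static_cast<unsigned char>(c))) return std::nullopt;
    value = value * 10 + (c - '0');
  }

  if (digits.size() == 4) return value;  // assumption: kept as-is
  return value <= 68 ? 2000 + value : 1900 + value;
}
```

With this pivot, the `26` in `5/29/26` becomes 2026, which matches the expected dates in the tests quoted in the review comments, while a 3-digit component such as `300` yields no year.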
  csv <- "id,date\n1,5/29/26\n2,5/31/26\n3,12/25/26"
  result <- vroom::vroom(I(csv), show_col_types = FALSE)
  expect_s3_class(result$date, "Date")
  expect_equal(result$date, as.Date(c("2026-05-29", "2026-05-31", "2026-12-25")))
[air] reported by reviewdog 🐶
Suggested change
-  expect_equal(result$date, as.Date(c("2026-05-29", "2026-05-31", "2026-12-25")))
+  expect_equal(
+    result$date,
+    as.Date(c("2026-05-29", "2026-05-31", "2026-12-25"))
+  )

test_that("vroom() does not treat invalid or 3-digit-year values as year-last dates", {
  for (v in c("13/25/26", "100/200/300")) {
    result <- vroom::vroom(I(paste0("x\n", v, "\n")), delim = ",", show_col_types = FALSE)
[air] reported by reviewdog 🐶
Suggested change
-    result <- vroom::vroom(I(paste0("x\n", v, "\n")), delim = ",", show_col_types = FALSE)
+    result <- vroom::vroom(
+      I(paste0("x\n", v, "\n")),
+      delim = ",",
+      show_col_types = FALSE
+    )
})

test_that("vroom() reads 2-digit-year dates with explicit date_order", {
  res_mdy <- vroom::vroom(I("id,date\n1,5/29/26\n2,3/15/26"), locale = locale(date_order = "mdy"), show_col_types = FALSE)
[air] reported by reviewdog 🐶
Suggested change
-  res_mdy <- vroom::vroom(I("id,date\n1,5/29/26\n2,3/15/26"), locale = locale(date_order = "mdy"), show_col_types = FALSE)
+  res_mdy <- vroom::vroom(
+    I("id,date\n1,5/29/26\n2,3/15/26"),
+    locale = locale(date_order = "mdy"),
+    show_col_types = FALSE
+  )
  expect_s3_class(res_mdy$date, "Date")
  expect_equal(res_mdy$date, as.Date(c("2026-05-29", "2026-03-15")))

  res_dmy <- vroom::vroom(I("id,date\n1,29/5/26\n2,15/3/26"), locale = locale(date_order = "dmy"), show_col_types = FALSE)
[air] reported by reviewdog 🐶
Suggested change
-  res_dmy <- vroom::vroom(I("id,date\n1,29/5/26\n2,15/3/26"), locale = locale(date_order = "dmy"), show_col_types = FALSE)
+  res_dmy <- vroom::vroom(
+    I("id,date\n1,29/5/26\n2,15/3/26"),
+    locale = locale(date_order = "dmy"),
+    show_col_types = FALSE
+  )
Author
Added a commit extending the year-last date heuristic to 2-digit years (e.g.
Summary
Adds a `date_order` argument to `locale()` and makes date / date-time auto-detection more forgiving, so columns of year-last dates (e.g. `10/02/2024`) can be read as dates instead of being guessed as character.

- `locale(date_order =)`: new optional argument accepting an explicit component order: `"ymd"`, `"mdy"`, `"dmy"`, etc., optionally with a time suffix (`"mdy_hms"`, `"dmy_hm"`, `"ymd_h"`). `NULL` (default) keeps the current automatic behaviour. Validated in R with a clear `cli_abort()` message.
- `DateTimeParser::parseDateOrder()`: parses a value against an explicit order, including an optional `T`/space-separated time part.
- `DateTimeParser::parseYearLastHeuristic()`: recognises unambiguous `D/M/YYYY` vs `M/D/YYYY` (`part > 12` disambiguates; defaults to MDY when ambiguous, the US convention). Used as an auto-detection fallback in `isDate()` / `parse_date()`.
- `parseISO8601()` / `parseDate()` now accept any non-alphanumeric separator between date components (`2024.10.02`, `2024/10/02`, …), similar to lubridate's `ymd()`.
- `parse_date()` / `parse_dttm()` now receive the `LocaleInfo*` so they can honour `date_order`.

When `date_order` is set, `guess_type()` routes date-only orders to `isDate()` and time-suffixed orders to `isDateTime()`, and will not cross-match the other kind.

Test plan
`tests/testthat/test-datetime.R` gains end-to-end `vroom()` coverage:
- `date_order` for MDY / DMY dates and `mdy_hms` / `dmy_hms` date-times
- `date_order`