Add date_order locale option and flexible date separator parsing #623

Open
hidekoji wants to merge 4 commits into tidyverse:main from hidekoji:feature/date-order-multi-format

Conversation

@hidekoji

Summary

Adds a date_order argument to locale() and makes date / date-time auto-detection more forgiving, so columns of year-last dates (e.g. 10/02/2024) can be read as dates instead of being guessed as character.

  • locale(date_order =) — new optional argument accepting an explicit component order: "ymd", "mdy", "dmy", etc., optionally with a time suffix ("mdy_hms", "dmy_hm", "ymd_h"). NULL (default) keeps the current automatic behaviour. Validated in R with a clear cli_abort() message.
  • DateTimeParser::parseDateOrder() — parses a value against an explicit order, including an optional T/space-separated time part.
  • DateTimeParser::parseYearLastHeuristic() — recognises unambiguous D/M/YYYY vs M/D/YYYY (part > 12 disambiguates; defaults to MDY when ambiguous, the US convention). Used as an auto-detection fallback in isDate() / parse_date().
  • Flexible separators: parseISO8601() / parseDate() now accept any non-alphanumeric separator between date components (2024.10.02, 2024/10/02, …), similar to lubridate's ymd().
  • parse_date() / parse_dttm() now receive the LocaleInfo* so they can honour date_order.
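
The day/month rule described for the year-last heuristic can be illustrated with a small C++ sketch. This is not the PR's code: the `DayMonth` struct and the `disambiguateDayMonthSketch()` name are hypothetical, and the sketch only range-checks the two leading fields (it does not check days per month or parse the year).

```cpp
#include <cassert>

// Hypothetical result type for this sketch: ok is false when neither
// reading of the two fields is a plausible day/month pair.
struct DayMonth {
  bool ok;
  int day;
  int month;
};

static bool validMonthSketch(int v) { return v >= 1 && v <= 12; }
static bool validDaySketch(int v) { return v >= 1 && v <= 31; }

// For a value shaped like A/B/YYYY: a field above 12 cannot be a month, so it
// must be the day. When both fields are <= 12 the value is ambiguous and the
// sketch falls back to month-first (MDY), as the PR description states.
inline DayMonth disambiguateDayMonthSketch(int first, int second) {
  if (first > 12 && validDaySketch(first) && validMonthSketch(second)) {
    return DayMonth{true, first, second};  // D/M/YYYY, e.g. 25/12/2024
  }
  if (validMonthSketch(first) && validDaySketch(second)) {
    return DayMonth{true, second, first};  // M/D/YYYY, including ambiguous input
  }
  return DayMonth{false, 0, 0};  // e.g. 13/25: no valid reading
}
```

Under these assumptions, `disambiguateDayMonthSketch(10, 2)` is read month-first (October 2), while `disambiguateDayMonthSketch(25, 12)` can only be day-first.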

When date_order is set, guess_type() routes date-only orders to isDate() and time-suffixed orders to isDateTime(), and will not cross-match the other kind.
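
As a rough illustration of that routing, the sketch below classifies a date_order string as date-only or date-time. It is written under assumptions: the `OrderKind` enum and `classifyDateOrderSketch()` are hypothetical names, the accepted date parts are taken to be the permutations of y, m and d, and the accepted time suffixes are taken to be exactly h, hm and hms, as in the examples in the summary. The PR itself validates the argument in R, which this does not reproduce.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical classification of a date_order value.
enum class OrderKind { Invalid, Date, DateTime };

inline OrderKind classifyDateOrderSketch(const std::string& order) {
  const std::string::size_type sep = order.find('_');

  // The date part must use each of y, m and d exactly once.
  std::string datePart = order.substr(0, sep);
  std::sort(datePart.begin(), datePart.end());
  if (datePart != "dmy") {
    return OrderKind::Invalid;
  }

  // No suffix: a date-only order, which would go to the date check only.
  if (sep == std::string::npos) {
    return OrderKind::Date;
  }

  // With a suffix: a date-time order, which would go to the date-time check only.
  const std::string timePart = order.substr(sep + 1);
  if (timePart == "h" || timePart == "hm" || timePart == "hms") {
    return OrderKind::DateTime;
  }
  return OrderKind::Invalid;
}
```

With this sketch, "dmy" classifies as `OrderKind::Date` and "mdy_hms" as `OrderKind::DateTime`, so a guesser built on it would never try both kinds for the same order.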

Test plan

tests/testthat/test-datetime.R gains end-to-end vroom() coverage:

  • explicit date_order for MDY / DMY dates and mdy_hms / dmy_hms date-times
  • auto-detection of unambiguous DMY year-last dates without date_order
  • ambiguous year-last dates default to MDY
  • NA handling and dot/slash/dash separator variants
  • existing YMD / ISO-8601 behaviour unchanged (backward compatibility)

`locale()` gains a `date_order` argument so dates and date-times can be
parsed with an explicit component order ("mdy", "dmy", "ymd_hms", etc.).
This makes year-last formats such as 10/02/2024 readable, which the
automatic type guesser would otherwise treat as character.

Date and date-time auto-detection now also accepts any non-alphanumeric
separator between components and falls back to a year-last heuristic that
disambiguates D/M/YYYY vs M/D/YYYY (defaulting to MDY when ambiguous).

Adds end-to-end vroom() tests covering explicit date_order, auto MDY/DMY
detection, separator variants, and YMD backward compatibility.
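
A minimal sketch of what "any non-alphanumeric separator" can mean in practice, assuming a simple split into fields. `splitDateFieldsSketch()` is a hypothetical helper for illustration only; the PR's parser consumes components in place rather than building a vector.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Split a candidate date into fields at every non-alphanumeric character, so
// that '.', '/' and '-' are all treated the same way.
inline std::vector<std::string> splitDateFieldsSketch(const std::string& s) {
  std::vector<std::string> fields(1);
  for (unsigned char c : s) {
    if (std::isalnum(c)) {
      fields.back().push_back(static_cast<char>(c));
    } else {
      fields.emplace_back();  // any other character starts a new field
    }
  }
  return fields;
}
```

Here 2024.10.02, 2024/10/02 and 2024-10-02 all split into the same three fields. Two separators in a row yield an empty field, which a caller would reject.
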
hidekoji force-pushed the feature/date-order-multi-format branch from 4f2aa0a to 7ffee32 on May 19, 2026 19:23
hidekoji and others added 3 commits May 25, 2026 11:23
utils::unzip(list=TRUE) on Windows R 4.2.x garbles non-ASCII
entry names (e.g. ä → <84>), causing unz() to fail to locate
the entry. When the archive package is installed, route zip
reading through archive::archive_read() which handles UTF-8
entry names correctly.

Preserves existing behaviour when archive is not installed
(falls back to the utils::unzip / unz() path).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…time heuristic

The year-last date heuristic only accepted 4-digit years (it read the year
with consumeInteger(4, exact=true) and rejected year < 1000), so values like
5/29/26 were guessed as character. Datetime auto-detection only handled ISO8601.

- Add consumeYearFlexible(): consume a 2- or 4-digit year, applying the same
  pivot as the %y format specifier (00-68 -> 2000s, 69-99 -> 1900s) and rejecting
  implausible 3-digit values. Route parseYearLastHeuristic() and the parseDateOrder()
  year component through it.
- Add parseYearLastHeuristicDateTime() (year-last date + T/space + HH[:MM[:SS]]
  with optional tz) and wire it into guess_type isDateTime() and vroom_dttm.cc
  materialization as an ISO8601 fallback, so MDY/DMY datetimes are recognized.
- Extract disambiguateDayMonth() shared by both heuristics.
- Tests for 2-digit MDY/DMY/ambiguous dates, the %y pivot, invalid/3-digit
  rejection, 2- and 4-digit MDY datetimes, and explicit date_order with 2-digit years.
csv <- "id,date\n1,5/29/26\n2,5/31/26\n3,12/25/26"
result <- vroom::vroom(I(csv), show_col_types = FALSE)
expect_s3_class(result$date, "Date")
expect_equal(result$date, as.Date(c("2026-05-29", "2026-05-31", "2026-12-25")))

[air] reported by reviewdog 🐶

Suggested change
- expect_equal(result$date, as.Date(c("2026-05-29", "2026-05-31", "2026-12-25")))
+ expect_equal(
+   result$date,
+   as.Date(c("2026-05-29", "2026-05-31", "2026-12-25"))
+ )


test_that("vroom() does not treat invalid or 3-digit-year values as year-last dates", {
for (v in c("13/25/26", "100/200/300")) {
result <- vroom::vroom(I(paste0("x\n", v, "\n")), delim = ",", show_col_types = FALSE)

[air] reported by reviewdog 🐶

Suggested change
- result <- vroom::vroom(I(paste0("x\n", v, "\n")), delim = ",", show_col_types = FALSE)
+ result <- vroom::vroom(
+   I(paste0("x\n", v, "\n")),
+   delim = ",",
+   show_col_types = FALSE
+ )

})

test_that("vroom() reads 2-digit-year dates with explicit date_order", {
res_mdy <- vroom::vroom(I("id,date\n1,5/29/26\n2,3/15/26"), locale = locale(date_order = "mdy"), show_col_types = FALSE)

[air] reported by reviewdog 🐶

Suggested change
- res_mdy <- vroom::vroom(I("id,date\n1,5/29/26\n2,3/15/26"), locale = locale(date_order = "mdy"), show_col_types = FALSE)
+ res_mdy <- vroom::vroom(
+   I("id,date\n1,5/29/26\n2,3/15/26"),
+   locale = locale(date_order = "mdy"),
+   show_col_types = FALSE
+ )

expect_s3_class(res_mdy$date, "Date")
expect_equal(res_mdy$date, as.Date(c("2026-05-29", "2026-03-15")))

res_dmy <- vroom::vroom(I("id,date\n1,29/5/26\n2,15/3/26"), locale = locale(date_order = "dmy"), show_col_types = FALSE)

[air] reported by reviewdog 🐶

Suggested change
- res_dmy <- vroom::vroom(I("id,date\n1,29/5/26\n2,15/3/26"), locale = locale(date_order = "dmy"), show_col_types = FALSE)
+ res_dmy <- vroom::vroom(
+   I("id,date\n1,29/5/26\n2,15/3/26"),
+   locale = locale(date_order = "dmy"),
+   show_col_types = FALSE
+ )

@hidekoji

hidekoji commented Jun 1, 2026

Author

Added a commit extending the year-last date heuristic to 2-digit years (e.g. 5/29/26), plus a year-last datetime heuristic:

  • consumeYearFlexible() consumes a 2- or 4-digit year and applies the standard %y pivot (00–68 → 2000s, 69–99 → 1900s), rejecting implausible 3-digit values; the year-last date heuristic and the date_order year component route through it.
  • parseYearLastHeuristicDateTime() recognizes M/D/Y / D/M/Y date + HH[:MM[:SS]] (optional tz); wired into the guesser and the materialization path so MDY/DMY datetimes are detected.
  • New tests: 2-digit MDY/DMY/ambiguous dates, the %y pivot, invalid/3-digit rejection, 2- and 4-digit MDY datetimes, and explicit date_order with 2-digit years. Existing date/datetime tests unchanged (incl. "%Y requires 4 digits" — explicit %Y/%y format paths are untouched). DESCRIPTION not bumped.
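
The %y pivot described above can be sketched as follows. This is an illustration, not the PR's consumeYearFlexible(): `yearFromDigitsSketch()` is a hypothetical function that takes the already-isolated year digits as a string and returns -1 on rejection, whereas the real parser consumes digits from its input position.

```cpp
#include <cassert>
#include <string>

// Accept only 2- or 4-digit years. Two-digit years use the %y pivot quoted in
// the comment above: 00-68 map to 2000-2068 and 69-99 map to 1969-1999.
// Returns -1 for anything else (for example a 3-digit value such as "300").
inline int yearFromDigitsSketch(const std::string& digits) {
  if (digits.size() != 2 && digits.size() != 4) {
    return -1;
  }
  int value = 0;
  for (char c : digits) {
    if (c < '0' || c > '9') {
      return -1;  // not a digit
    }
    value = value * 10 + (c - '0');
  }
  if (digits.size() == 4) {
    return value;  // 4-digit years are taken as written
  }
  return value <= 68 ? 2000 + value : 1900 + value;
}
```

Under this sketch the year field of 5/29/26 becomes 2026, while a 3-digit field such as the 300 in 100/200/300 is rejected.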
