Perf: Use hackyOffset if it parses date strings in the expected format#1580
Perf: Use hackyOffset if it parses date strings in the expected format#1580schleyfox wants to merge 2 commits intomoment:masterfrom
Conversation
The time zone offset of a date can be computed on platforms that support it (anything that's not super ancient) by using Intl.DateTimeFormat.formatToParts with en-US to output an array of the date components. For legacy reasons, you can also generate a date string using Intl.DateTimeFormat.format which can be parsed into an array using regexes. The string/regex approach (hackyOffset) is way faster (2-4x), but much more susceptible to weird client configurations. This detects whether hackyOffset is able to parse a known date correctly, and uses it if it does.
| */ | ||
| static hackyOffsetParsesCorrectly() { | ||
| if (hackyOffsetParsesCorrectly === undefined) { | ||
| const dtf = makeDTF("UTC"); |
There was a problem hiding this comment.
This is a useless DTF to cache since we use fixed offset zones for UTC. Unclear if there's a real zone that would make more sense to use here (I'm biased towards America/New_York)
|
/easycla |
|
This is a clever optimization and I'm not against it. But one question I have is how many calls it takes to amortize the cost of the upfront check. Do you have a sense of that? For all I know, it just takes one. |
|
that's a very good point import * as b from "benny"
const ts = 1710020705493
const jsDate = new Date(ts)
const utcTestDate = new Date(Date.UTC(1969, 11, 31, 15, 45, 55))
function makeDtf(zone: string) {
return new Intl.DateTimeFormat("en-US", {
hour12: false,
timeZone: zone,
year: "numeric",
month: "2-digit",
day: "2-digit",
hour: "2-digit",
minute: "2-digit",
second: "2-digit",
era: "short",
})
}
// from luxon
function hackyOffset(dtf: any, date: any) {
const formatted = dtf.format(date).replace(/\u200E/g, ""),
parsed = /(\d+)\/(\d+)\/(\d+) (AD|BC),? (\d+):(\d+):(\d+)/.exec(formatted),
[, fMonth, fDay, fYear, fadOrBc, fHour, fMinute, fSecond] = parsed ?? []
return [fYear, fMonth, fDay, fadOrBc, fHour, fMinute, fSecond]
}
const typeToPos = {
year: 0,
month: 1,
day: 2,
era: 3,
hour: 4,
minute: 5,
second: 6,
}
function partsOffset(dtf: any, date: any) {
const formatted = dtf.formatToParts(date)
const filled = []
for (let i = 0; i < formatted.length; i++) {
const { type, value } = formatted[i]
// @ts-expect-error[implicit-any-index-incorrect] hidden by suppressImplicitAnyIndexErrors
const pos = typeToPos[type]
if (type === "era") {
filled[pos] = value
} else if (pos !== undefined) {
filled[pos] = parseInt(value, 10)
}
}
return filled
}
const dtf = makeDtf("America/New_York")
void b.suite(
"Intl.DateTimeFormat",
b.add("hackyOffset", () => {
hackyOffset(dtf, jsDate)
}),
b.add("partsOffset", () => {
partsOffset(dtf, jsDate)
}),
b.add("makeDtf(UTC) no-cache", () => {
makeDtf("UTC")
}),
b.add("uncached partsOffset", () => {
const dtf = makeDtf("America/New_York")
partsOffset(dtf, jsDate)
}),
b.add("uncached hackyOffset", () => {
const utcDtf = makeDtf("UTC")
hackyOffset(utcDtf, utcTestDate)
const dtf = makeDtf("America/New_York")
hackyOffset(dtf, jsDate)
}),
b.cycle(),
b.complete()
)DTF construction is very slow, so doing it twice is quite bad. I don't know why the error bars are so high for it, however. I think this math works, but let me know if it makes sense to you. Analytically, we can see that constructing a DTF takes 1MM/27,116 = 36.9us . hackyOffset takes 1MM/589,933 = 1.7us, while partsOffset takes 1MM/234,358= 4.3us we then expect a single uncached partsOffset to take 36.9us + 4.3us = 41.2us (observed is 44us) 77.2 - 41.2 = 36us to make up. We would save 4.3 - 1.7 = 2.6us per call, so 36/2.6 = 13.8 calls before our optimization pays off. Note that we may have to call |
|
How about just adding Settings.setUseHackyOffset()? Seems like that would allow specific applications that do a ton of formats to speed up while sacrificing a few locales, and others to not worry about it. |
This is part of a series of PRs based on performance work we have done to
improve a use-case involving parsing/formatting hundreds of thousands of dates
where luxon was the bottleneck.
This includes the commit from #1574 to establish the benchmark
This is the sketchy optimization of the bunch and I would understand not wanting to support this, but it is much, much faster. In our observations, about 1.5% of web traffic outputs in a format that doesn't parse correctly, so we fall back to
formatToParts.The time zone offset of a date can be computed on platforms that support
it (anything that's not super ancient) by using
Intl.DateTimeFormat.formatToParts with en-US to output an array of the
date components. For legacy reasons, you can also generate a date string
using Intl.DateTimeFormat.format which can be parsed into an array using
regexes. The string/regex approach (hackyOffset) is way faster (2-4x),
but much more susceptible to weird client configurations.
This detects whether hackyOffset is able to parse a known date
correctly, and uses it if it does.
Benchmark Comparison (
name | before | after | after/before):