Skip to content

Improved intersection#13

Open
psy-man wants to merge 3 commits into
bcherny:masterfrom
psy-man:master
Open

Improved intersection#13
psy-man wants to merge 3 commits into
bcherny:masterfrom
psy-man:master

Conversation

@psy-man

@psy-man psy-man commented Jan 16, 2018

Copy link
Copy Markdown

Hi @bcherny.

There is a little issue in intersection solution: if the right array has any duplicates, they will appear in the result.
intersection([1, 5, 4, 2], [8, 91, 4, 1, 3, 8, 91]); // [ 4, 1, 8, 91 ]

I rewrote this function to es6 using Set.

@bcherny

bcherny commented Jan 18, 2018

Copy link
Copy Markdown
Owner

@psy-man thanks for the contribution! Please add a test that would not have passed before, but does pass now.

@bcherny

bcherny commented Jan 21, 2018

Copy link
Copy Markdown
Owner

Thanks for updating the test. The bug you found is a real bug, but the solution could be improved.

If we're taking the intersection of 2 arrays of lengths M and N, the current solution runs in O(M+N) time. Your solution is substantially slower, and runs in O(M+M*N²) time. The reason is that Set#has and Set#set run in linear time (see https://www.ecma-international.org/ecma-262/6.0/#sec-set.prototype.has).

A more efficient solution might avoid Sets for their poor performance. If you want to update, I'll be happy to merge.

@psy-man

psy-man commented Jan 21, 2018

Copy link
Copy Markdown
Author

Hi @bcherny

I've created a few possible solutions and tested with 2 arrays: 4646 and 1103 lengths.
Could you check please Benchmarks

intersectionNew - fix for current solution
intersectionNewWithSet - solution with Set
intersectionNewSimple - solution without reduce and concat

For me, the solution with Sets was the most faster.
We also can improve performance via comparing the lengths of arrays and start with the smaller one.

I'll add a commit after your decision.

@DarkPurple141

Copy link
Copy Markdown

Is this a case of the average case actually nullifying the worse case big O?

I did read the link you provided but if it truly was the case that Set was outperformed on its most standard ops by {} it's hard to understand why ECMA would have introduced Map and Set at all.

From ECMA also:

Set objects must be implemented using either hash tables or other mechanisms that, on average, provide access times that are sublinear on the number of elements in the collection. The data structures used in this Set objects specification is only intended to describe the required observable semantics of Set objects. It is not intended to be a viable implementation model.

Here's a benchmark from three years ago: https://github.com/anvaka/set-vs-object

It seems that most browsers have optimised them beyond the performance of old school key in Object checks.

I ran across across a few different browsers, mostly seems like with the length optimisation (ie choose the smaller array to iterate through), Set performs best, with the IntersectionNewSimple second best (but not by much).

@givehug

givehug commented Jul 26, 2019

Copy link
Copy Markdown

You don't need second table/set

function intersection(left, right) {
	let seen = left.reduce((seen, item) => {
		seen[item] = true
		return seen
	}, {})

	return right.filter(item => {
		const include = item in seen
		delete seen[item]
		return include
	})
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants