
An algorithm for pairing #174

@giuliogcantone


Consider the following:

library(tidyverse)

obs <- tibble(
  id = 1:8,
  g = rep(c("A", "B"), 4),
  v1 = rnorm(8),
  v2 = rnorm(8),
  v3 = rnorm(8)
)

I'd like a method that produces something that would look as follows:

# Random pairing within each group g (4 rows per group, hence 2 pairs)
obs %>%
  mutate(cluster = sample(1:2, replace = FALSE) %>% rep(2),
         .by = g) %>%
  mutate(pair = str_c(g, cluster)) %>%
  arrange(pair)

As you can see, these pairs are random. Instead, I want the pairs to be formed by minimising, even if not optimally, a multivariate distance across the v variables, for example the Mahalanobis distance.
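To make the distance concrete, here is a minimal base R sketch of a pairwise Mahalanobis distance matrix. The helper name `mahalanobis_matrix` is hypothetical and not part of any package; the sketch assumes the covariance matrix of the input columns is invertible.

```r
# Pairwise Mahalanobis distances between the rows of a numeric matrix.
# `mahalanobis_matrix` is a hypothetical helper name, used for illustration only.
mahalanobis_matrix <- function(x) {
  x <- as.matrix(x)
  # chol() returns upper-triangular R with cov(x) = t(R) %*% R.
  # Euclidean distance between rows of x %*% solve(R) equals the
  # Mahalanobis distance between the corresponding rows of x.
  R <- chol(cov(x))
  z <- x %*% solve(R)
  as.matrix(dist(z))
}

set.seed(1)
x <- matrix(rnorm(24), nrow = 8)  # 8 observations, 3 variables
d <- mahalanobis_matrix(x)        # 8 x 8 symmetric matrix, zero diagonal
```

Note that `stats::mahalanobis()` returns squared distances, so `d[1, 2]^2` should agree with `mahalanobis(x[1, ], x[2, ], cov(x))`.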

I would then expect an interface like this:

obs %>% mutate(cluster = f(input_cols = starts_with("v"),dist="mahalanobis"))

Finally, if the number of observations is odd, one element is left unpaired.
Note that there is no supervision and no cl parameter.
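The behaviour described above could be sketched with a greedy rule: repeatedly pair the two closest unpaired rows until fewer than two remain. This is only one possible, non-optimal approach, and `pair_greedy` is a hypothetical name, not an existing function of this package. It takes any symmetric distance matrix or `dist` object and returns pair ids, with `NA` for the leftover row when the count is odd.

```r
# Greedy pairing on a symmetric distance matrix (or `dist` object) `d`.
# Returns an integer vector of pair ids; an unpaired row gets NA.
# `pair_greedy` is a hypothetical name, used for illustration only.
pair_greedy <- function(d) {
  d <- as.matrix(d)
  n <- nrow(d)
  pair <- rep(NA_integer_, n)
  diag(d) <- Inf                  # a row cannot be paired with itself
  id <- 0L
  while (sum(is.na(pair)) >= 2) {
    # Closest pair among the rows that are still unpaired
    ij <- which(d == min(d), arr.ind = TRUE)[1, ]
    id <- id + 1L
    pair[ij] <- id
    # Exclude both rows from further consideration
    d[ij, ] <- Inf
    d[, ij] <- Inf
  }
  pair
}

set.seed(42)
x <- matrix(rnorm(21), nrow = 7)  # 7 rows, so one stays unpaired
z <- x %*% solve(chol(cov(x)))    # whitened rows: Euclidean on z = Mahalanobis on x
pairs <- pair_greedy(dist(z))
```

Because the function returns one id per row, it could in principle be called inside `mutate()` with `.by = g` to pair within groups, although estimating a covariance matrix inside very small groups may be unstable.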

According to ChatGPT, a pairing algorithm like f does not exist, at least not among the options of this package.
I am still doubtful, so I am asking here whether that answer is correct and, if it is, proposing that the feature be developed.

This is quite similar to how the MatchIt package works, with the difference that MatchIt always pairs ("matches") discordant rows of a binary x variable, so the formula would be x ~ starts_with("v").
That does not apply here, since there is no x variable in obs.


Labels: feature (a feature request or enhancement)
