Support for ScalarSubqueryExec and ScalarSubqueryExpr #471

@LiaCastaneda


PR apache/datafusion#21240 introduced ScalarSubqueryExec, a node that executes scalar subqueries. Its struct holds shared state:

pub struct ScalarSubqueryExec {
    ...
    results: ScalarSubqueryResults,
}
pub struct ScalarSubqueryResults {
    slots: Arc<Vec<Mutex<Option<ScalarValue>>>>,
}

where a given subquery has an index it uses to fill its corresponding slot at runtime:

pub struct ScalarSubqueryExpr {
    ...
    results: ScalarSubqueryResults,
}
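
To make the sharing concrete, here is a minimal sketch of the slot mechanics described above. It is not DataFusion's code: `Results`, `set`, and `get` are hypothetical names, and `i64` stands in for `ScalarValue` so the example needs only the standard library.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for ScalarSubqueryResults (assumption: i64 replaces
// ScalarValue to keep the sketch std-only).
#[derive(Clone)]
pub struct Results {
    slots: Arc<Vec<Mutex<Option<i64>>>>,
}

impl Results {
    /// Creates `n` empty slots, one per subquery.
    pub fn new(n: usize) -> Self {
        Self {
            slots: Arc::new((0..n).map(|_| Mutex::new(None)).collect()),
        }
    }

    /// Writer side (the exec node): fill the slot for subquery `index`.
    pub fn set(&self, index: usize, value: i64) {
        *self.slots[index].lock().unwrap() = Some(value);
    }

    /// Reader side (the expr): read the slot for subquery `index`.
    pub fn get(&self, index: usize) -> Option<i64> {
        *self.slots[index].lock().unwrap()
    }
}

fn main() {
    let exec_side = Results::new(2);
    // Cloning copies the Arc pointer, so both handles see the same slots.
    let expr_side = exec_side.clone();

    assert_eq!(expr_side.get(1), None);
    exec_side.set(1, 42);
    assert_eq!(expr_side.get(1), Some(42));
}
```

The point is that the reader only sees the value because both handles point at the same allocation, which holds within a single process only.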

If ScalarSubqueryExec and the node holding the ScalarSubqueryExpr end up in different stages, the query fails during deserialization with the error Internal("ScalarSubqueryExpr can only be deserialized as part of a surrounding ScalarSubqueryExec"). Even if we edited the decoder to create a fresh ScalarSubqueryResults on the worker, I don't think the result would be correct: the worker's slots would live in a different Arc from the coordinator's, so the writer would never fill them. This looks similar to the Dynamic Filtering problem (where HashJoinExec shares the dynamic filter with the probe side/consumer through an Arc). The difference is that DynamicFilterPhysicalExpr does not fail on serialization, and because it is only an optimization, we will never see correctness issues from it.
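
The "fresh results on the worker" failure mode can be sketched as follows. This is only an illustration under assumptions: `Slots`, `fresh_slots`, and `decode_on_worker` are hypothetical names, `i64` replaces `ScalarValue`, and both "sides" run in one process purely to show that two separately created Arcs never observe each other's writes.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical alias mirroring the shape of the shared slots.
type Slots = Arc<Vec<Mutex<Option<i64>>>>;

/// Allocates `n` empty slots behind a new Arc.
fn fresh_slots(n: usize) -> Slots {
    Arc::new((0..n).map(|_| Mutex::new(None)).collect())
}

/// Simulates a decoder that rebuilds the slots from scratch on a worker
/// instead of reusing the coordinator's allocation.
fn decode_on_worker(n: usize) -> Slots {
    fresh_slots(n)
}

/// Reads slot `i`, returning None if nothing has been written yet.
fn read(slots: &Slots, i: usize) -> Option<i64> {
    *slots[i].lock().unwrap()
}

fn main() {
    let coordinator = fresh_slots(1);
    let worker = decode_on_worker(coordinator.len());

    // The writer fills the coordinator's slot.
    *coordinator[0].lock().unwrap() = Some(7);

    // The two handles point at different allocations...
    assert!(!Arc::ptr_eq(&coordinator, &worker));
    // ...so the write is visible on one side only.
    assert_eq!(read(&coordinator, 0), Some(7));
    assert_eq!(read(&worker, 0), None);
}
```

A reader on the worker would see an empty slot, which for a scalar subquery is a wrong result rather than a missed optimization.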

For now we will be gating the new node behind a session property (apache/datafusion#22530). The gate will probably stay in place for a couple more releases (until distributed execution engines adapt), so we need to add support for this eventually.

Some raw ideas:

  • Implement a communication mechanism that works for both use cases: ScalarSubqueryExec/ScalarSubqueryExpr and DynamicFilterPhysicalExpr.
  • Do not have network boundaries between ScalarSubqueryExec and the node containing the ScalarSubqueryExpr.
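
As a rough illustration of the first idea, one possible shape is for the producing stage to send slot updates as messages and for the consuming stage to apply them to its local slots. This is a sketch under heavy assumptions, not a proposed design: `SlotUpdate`, `apply_updates`, and `run_demo` are hypothetical, `i64` replaces `ScalarValue`, and a `std::sync::mpsc` channel stands in for the network boundary.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

type Slots = Arc<Vec<Mutex<Option<i64>>>>;

/// Hypothetical message a producing stage would send across the boundary.
struct SlotUpdate {
    index: usize,
    value: i64,
}

/// Applies incoming updates to the consumer's local slots until the
/// sender side is dropped.
fn apply_updates(local: &Slots, rx: mpsc::Receiver<SlotUpdate>) {
    for update in rx {
        *local[update.index].lock().unwrap() = Some(update.value);
    }
}

/// Runs a producer thread and a consumer in one process and returns
/// what the consumer ends up seeing in slot 0.
fn run_demo() -> Option<i64> {
    let local: Slots = Arc::new(vec![Mutex::new(None)]);
    let (tx, rx) = mpsc::channel();

    // Producer: computes the subquery result and publishes it.
    let producer = thread::spawn(move || {
        tx.send(SlotUpdate { index: 0, value: 7 }).unwrap();
        // `tx` is dropped here, which ends the consumer's loop.
    });

    apply_updates(&local, rx);
    producer.join().unwrap();

    let value = *local[0].lock().unwrap();
    value
}

fn main() {
    assert_eq!(run_demo(), Some(7));
}
```

A message of this kind carries only an index and a value, so the same transport could in principle carry dynamic filter updates as well; whether that fits the actual serialization layer is an open question.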

cc @jayshrivastava @adriangb
