DataFusion Comet 0.3.0 Changelog

This release consists of 57 commits from 12 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: Support type coercion for ScalarUDFs #865 (Kimahriman)
  • fix: CometTakeOrderedAndProjectExec native scan node should use child operator's output #896 (viirya)
  • fix: Fix various memory leak problems #890 (Kontinuation)
  • fix: Add output to Comet operators equal and hashCode #902 (viirya)
  • fix: Fall back to Spark when AttributeReference cannot be resolved #926 (viirya)
  • fix: Fix memory bloat caused by holding too many unclosed ArrowReaderIterators #929 (Kontinuation)
  • fix: Normalize NaN and zeros for floating number comparison #953 (viirya)
  • fix: window function range offset should be long instead of int #733 (huaxingao)
  • fix: CometScanExec on Spark 3.5.2 #915 (Kimahriman)
  • fix: div and rem by negative zero #960 (kazuyukitanimura)

Performance related:

  • perf: Optimize CometSparkToColumnar for columnar input #892 (mbutrovich)
  • perf: Fall back to Spark if query uses DPP with v1 data sources #897 (andygrove)
  • perf: Report accurate total time for scans #916 (andygrove)
  • perf: Add metric for time spent casting in native scan #919 (andygrove)
  • perf: Add criterion benchmark for aggregate expressions #948 (andygrove)
  • perf: Add metric for time spent in CometSparkToColumnarExec #931 (mbutrovich)
  • perf: Optimize decimal precision check in decimal aggregates (sum and avg) #952 (andygrove)

Implemented enhancements:

  • feat: Add config option to enable converting CSV to columnar #871 (andygrove)
  • feat: Implement basic version of string to float/double/decimal #870 (andygrove)
  • feat: Implement to_json for subset of types #805 (andygrove)
  • feat: Add ShuffleQueryStageExec to direct child node for CometBroadcastExchangeExec #880 (viirya)
  • feat: Support sort merge join with a join condition #553 (viirya)
  • feat: Array element extraction #899 (Kimahriman)
  • feat: date_add and date_sub functions #910 (mbutrovich)
  • feat: implement scripts for binary release build #932 (parthchandra)
  • feat: Publish artifacts to maven #946 (parthchandra)

Documentation updates:

  • doc: Documenting Helm chart for Comet Kube execution #874 (comphead)
  • doc: Update native code path in development #921 (viirya)
  • docs: Add more detailed architecture documentation #922 (andygrove)

Other:

  • chore: Update installation.md #869 (haoxins)
  • chore: Update versions to 0.3.0 / 0.3.0-SNAPSHOT #868 (andygrove)
  • chore: Add documentation on running benchmarks with Microk8s #848 (andygrove)
  • chore: Improve CometExchange metrics #873 (viirya)
  • chore: Add spilling metrics of SortMergeJoin #878 (viirya)
  • chore: change shuffle mode default from jvm to auto #877 (andygrove)
  • chore: Enable shuffle by default #881 (andygrove)
  • chore: print Comet native version to logs after Comet is initialized #900 (SemyonSinchenko)
  • chore: Revise batch pull approach to more closely follow C Data Interface semantics #893 (viirya)
  • chore: Close dictionary provider when iterator is closed #904 (andygrove)
  • chore: Remove unused function #906 (viirya)
  • chore: Upgrade to latest DataFusion revision #909 (andygrove)
  • build: fix build #917 (andygrove)
  • chore: Revise array import to more closely follow C Data Interface semantics #905 (viirya)
  • chore: Address reviews #920 (viirya)
  • chore: Enable Comet shuffle for Spark core-1 test #924 (viirya)
  • build: Add maven-compiler-plugin for java cross-build #911 (viirya)
  • build: Disable upload-test-reports for macos-13 runner #933 (viirya)
  • minor: cast timestamp test #468 #923 (himadripal)
  • build: Set Java version arg for scala-maven-plugin #934 (viirya)
  • chore: Remove redundant RowToColumnar from CometShuffleExchangeExec for columnar shuffle #944 (viirya)
  • minor: rename CometMetricNode add to set and update documentation #940 (andygrove)
  • chore: Add config for enabling SMJ with join condition #937 (andygrove)
  • chore: Change maven group ID to org.apache.datafusion #941 (andygrove)
  • chore: Upgrade to DataFusion 42.0.0 #945 (andygrove)
  • build: Fix regression in jar packaging #950 (andygrove)
  • chore: Show reason for falling back to Spark when SMJ with join condition is not enabled #956 (andygrove)
  • chore: clarify tarball installation #959 (comphead)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    22	Andy Grove
    18	Liang-Chi Hsieh
     3	Adam Binford
     3	Matt Butrovich
     2	Kristin Cowalcijk
     2	Oleks V
     2	Parth Chandra
     1	Himadri Pal
     1	Huaxin Gao
     1	KAZUYUKI TANIMURA
     1	Semyon
     1	Xin Hao

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.