DataFusion Comet 0.4.0 Changelog

This release consists of 51 commits from 10 contributors. See credits at the end of this changelog for more information.

Fixed bugs:

  • fix: Use the number of rows from underlying arrays instead of logical row count from RecordBatch #972 (viirya)
  • fix: The spilled_bytes metric of CometSortExec should be size instead of time #984 (Kontinuation)
  • fix: Properly handle Java exceptions without error messages; fix loading of comet native library from java.library.path #982 (Kontinuation)
  • fix: Fallback to Spark if scan has meta columns #997 (viirya)
  • fix: Fallback to Spark if named_struct contains duplicate field names #1016 (viirya)
  • fix: Make comet-git-info.properties optional #1027 (andygrove)
  • fix: TopK operator should return correct results on dictionary column with nulls #1033 (viirya)
  • fix: need default value for getSizeAsMb(EXECUTOR_MEMORY.key) #1046 (neyama)

Performance related:

  • perf: Remove one redundant CopyExec for SMJ #962 (andygrove)
  • perf: Add experimental feature to replace SortMergeJoin with ShuffledHashJoin #1007 (andygrove)
  • perf: Cache jstrings during metrics collection #1029 (mbutrovich)

Implemented enhancements:

  • feat: Support GetArrayStructFields expression #993 (Kimahriman)
  • feat: Implement bloom_filter_agg #987 (mbutrovich)
  • feat: Support more types with BloomFilterAgg #1039 (mbutrovich)
  • feat: Implement CAST from struct to string #1066 (andygrove)
  • feat: Use official DataFusion 43 release #1070 (andygrove)
  • feat: Implement CAST between struct types #1074 (andygrove)
  • feat: support array_append #1072 (NoeB)
  • feat: Require offHeap memory to be enabled (always use unified memory) #1062 (andygrove)

Documentation updates:

  • doc: add documentation interlinks #975 (comphead)
  • docs: Add IntelliJ documentation for generated source code #985 (mbutrovich)
  • docs: Update tuning guide #995 (andygrove)
  • docs: Various documentation improvements #1005 (andygrove)
  • docs: clarify that Maven central only has jars for Linux #1009 (andygrove)
  • doc: fix K8s links and doc #1058 (comphead)
  • docs: Update benchmarking.md #1085 (rluvaton-flarion)

Other:

  • chore: Generate changelog for 0.3.0 release #964 (andygrove)
  • chore: fix publish-to-maven script #966 (andygrove)
  • chore: Update benchmarks results based on 0.3.0-rc1 #969 (andygrove)
  • chore: update rem expression guide #976 (kazuyukitanimura)
  • chore: Enable additional CreateArray tests #928 (Kimahriman)
  • chore: fix compatibility guide #978 (kazuyukitanimura)
  • chore: Update for 0.3.0 release, prepare for 0.4.0 development #970 (andygrove)
  • chore: Don't transform the HashAggregate to CometHashAggregate if Comet shuffle is disabled #991 (viirya)
  • chore: Make parquet reader options Comet options instead of Hadoop options #968 (parthchandra)
  • chore: remove legacy comet-spark-shell #1013 (andygrove)
  • chore: Reserve memory for native shuffle writer per partition #988 (viirya)
  • chore: Bump arrow-rs to 53.1.0 and datafusion #1001 (kazuyukitanimura)
  • chore: Revert "chore: Reserve memory for native shuffle writer per partition (#988)" #1020 (viirya)
  • minor: Remove hard-coded version number from Dockerfile #1025 (andygrove)
  • chore: Reserve memory for native shuffle writer per partition #1022 (viirya)
  • chore: Improve error handling when native lib fails to load #1000 (andygrove)
  • chore: Use twox-hash 2.0 xxhash64 oneshot api instead of custom implementation #1041 (NoeB)
  • chore: Refactor Arrow Array and Schema allocation in ColumnReader and MetadataColumnReader #1047 (viirya)
  • minor: Refactor binary expr serde to reduce code duplication #1053 (andygrove)
  • chore: Upgrade to DataFusion 43.0.0-rc1 #1057 (andygrove)
  • chore: Refactor UnaryExpr and MathExpr in protobuf #1056 (andygrove)
  • minor: use defaults instead of hard-coding values #1060 (andygrove)
  • minor: refactor UnaryExpr handling to make code more concise #1065 (andygrove)
  • chore: Refactor binary and math expression serde code #1069 (andygrove)
  • chore: Simplify CometShuffleMemoryAllocator to use Spark unified memory allocator #1063 (viirya)
  • test: Restore one test in CometExecSuite by adding COMET_SHUFFLE_MODE config #1087 (viirya)

Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

    19	Andy Grove
    13	Matt Butrovich
     8	Liang-Chi Hsieh
     3	KAZUYUKI TANIMURA
     2	Adam Binford
     2	Kristin Cowalcijk
     1	NoeB
     1	Oleks V
     1	Parth Chandra
     1	neyama

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.