[Feature] TrinoSplitManager calls dropStats() before scan planning, preventing manifest-level file pruning

### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.


### Motivation

### `metadata.stats-mode` rendered useless for Trino

Users who configure `metadata.stats-mode = 'full'` or `metadata.stats-mode = 'truncate(16)'` receive no benefit in Trino. Statistics are written to manifests correctly but discarded before they can be used for pruning.



### Solution

## Summary

`TrinoSplitManager` calls `.dropStats()` before `newScan().plan()`, which strips manifest-level column statistics from `DataFileMeta` entries before predicate evaluation can use them. As a result, Trino cannot use `metadata.stats-mode` min/max statistics for file-level skipping — all files become splits regardless of predicate filters.

`.dropStats()` was added deliberately in commit `5144ad9` when upgrading from Paimon 0.8.0 to 1.0-SNAPSHOT, likely to reduce split serialisation overhead or fix a serialisation error. The fix is not to remove it, but to move it to **after** predicate evaluation — so statistics are used for pruning during `plan()` and then stripped before splits are sent to workers.

---

## Root Cause

**File:** `src/main/java/org/apache/paimon/trino/TrinoSplitManager.java`, line 86

```java
// Current — dropStats() called BEFORE scan planning; stats unavailable for predicate evaluation
List<Split> splits = readBuilder.dropStats().newScan().plan().splits();
```

`.dropStats()` flags the scan to zero out `SimpleStats` (min/max/null-count per column per file) on each `DataFileMeta` entry. Because it is called before `newScan().plan()`, the predicate filter wired via `readBuilder.withFilter()` has no statistics to evaluate against during `plan()`. Every file passes the statistics check (vacuously, since stats are empty) and becomes a split.
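To make the vacuous pass concrete, here is a minimal, self-contained sketch of min/max file skipping. It deliberately does not use Paimon's classes: `FileStats`, `mayContain`, `plan`, and `sampleFiles` are hypothetical stand-ins, and absent statistics are modelled as `null` bounds rather than Paimon's `EMPTY_STATS`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StatsPruningSketch {

    /** Hypothetical stand-in for one file's min/max on a single column; null bounds mean "no stats". */
    static final class FileStats {
        final String file;
        final Long min;
        final Long max;

        FileStats(String file, Long min, Long max) {
            this.file = file;
            this.min = min;
            this.max = max;
        }

        /** Returns a copy with the bounds removed, modelling dropped statistics. */
        FileStats withoutStats() {
            return new FileStats(file, null, null);
        }
    }

    /** Evaluates the predicate "column = value" against one file's bounds. */
    static boolean mayContain(FileStats f, long value) {
        if (f.min == null || f.max == null) {
            // Without stats the file cannot be ruled out, so it must be kept.
            return true;
        }
        return f.min <= value && value <= f.max;
    }

    /** Returns the names of the files that survive the statistics check. */
    static List<String> plan(List<FileStats> files, long value) {
        List<String> kept = new ArrayList<>();
        for (FileStats f : files) {
            if (mayContain(f, value)) {
                kept.add(f.file);
            }
        }
        return kept;
    }

    /** Three files with disjoint value ranges (illustrative data only). */
    static List<FileStats> sampleFiles() {
        return Arrays.asList(
                new FileStats("a.parquet", 0L, 99L),
                new FileStats("b.parquet", 100L, 199L),
                new FileStats("c.parquet", 200L, 299L));
    }

    public static void main(String[] args) {
        List<FileStats> files = sampleFiles();
        System.out.println(plan(files, 150L)); // prints [b.parquet]

        List<FileStats> stripped = new ArrayList<>();
        for (FileStats f : files) {
            stripped.add(f.withoutStats());
        }
        System.out.println(plan(stripped, 150L)); // prints [a.parquet, b.parquet, c.parquet]
    }
}
```

With bounds present, only the one file whose range covers the value is kept; once the bounds are removed, the same predicate keeps all three files.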

---

## History

`.dropStats()` was introduced in Paimon core in `#4506` (November 2024) as an explicit optimisation to reduce the size of split objects sent to workers: splits carry `DataFileMeta` entries, which include column statistics that workers do not need after planning. The Spark integration added a `dropStats()` call in `#5093`, on a secondary path. When `paimon-trino` upgraded to 1.0-SNAPSHOT in `5144ad9`, `.dropStats()` was added to `TrinoSplitManager`. The commit message ("Update Paimon core to 1.0-SNAPSHOT / fix") gives no further explanation, so the intent is presumably the same serialisation optimisation.

The problem is placement: `.dropStats()` must be called **after** `plan()` completes, not before. Calling it before `plan()` eliminates the serialisation overhead but also eliminates all statistics-based file pruning.

---

## What `dropStats()` does

`ReadBuilder.dropStats()` sets a flag that causes `AbstractFileStoreScan` to call `DataFileMeta.copyWithoutStats()` on each entry, replacing column statistics with `EMPTY_STATS` before returning results. The flag is evaluated during `plan()`. If the flag is set before `plan()`, stats are zeroed before predicate evaluation. If stats are instead stripped from the returned splits after `plan()`, they are zeroed only for serialisation, after pruning has already occurred.
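The two orderings can be compared side by side in a small sketch. Everything here is hypothetical and independent of Paimon: `FileMeta` only borrows the name `copyWithoutStats()` from the description above, and `dropThenPrune` / `pruneThenDrop` are illustrative names for the behaviour this issue describes and the behaviour it proposes, not existing methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DropStatsOrderingSketch {

    /** Hypothetical file entry carrying optional min/max bounds for one column. */
    static final class FileMeta {
        final String name;
        final Long min;
        final Long max;

        FileMeta(String name, Long min, Long max) {
            this.name = name;
            this.min = min;
            this.max = max;
        }

        boolean hasStats() {
            return min != null && max != null;
        }

        /** Models stripping statistics from an entry. */
        FileMeta copyWithoutStats() {
            return new FileMeta(name, null, null);
        }
    }

    /** Predicate "column = value"; entries without stats can never be skipped. */
    static boolean matches(FileMeta f, long value) {
        return !f.hasStats() || (f.min <= value && value <= f.max);
    }

    /** Ordering described in this issue: stats are stripped before the predicate runs. */
    static List<FileMeta> dropThenPrune(List<FileMeta> files, long value) {
        List<FileMeta> out = new ArrayList<>();
        for (FileMeta f : files) {
            FileMeta stripped = f.copyWithoutStats();
            if (matches(stripped, value)) {
                out.add(stripped);
            }
        }
        return out;
    }

    /** Proposed ordering: the predicate sees the stats, then survivors are stripped. */
    static List<FileMeta> pruneThenDrop(List<FileMeta> files, long value) {
        List<FileMeta> out = new ArrayList<>();
        for (FileMeta f : files) {
            if (matches(f, value)) {
                out.add(f.copyWithoutStats());
            }
        }
        return out;
    }

    /** Illustrative data: three files with disjoint ranges. */
    static List<FileMeta> sampleFiles() {
        return Arrays.asList(
                new FileMeta("f1", 0L, 9L),
                new FileMeta("f2", 10L, 19L),
                new FileMeta("f3", 20L, 29L));
    }

    public static void main(String[] args) {
        List<FileMeta> files = sampleFiles();
        System.out.println(dropThenPrune(files, 15L).size()); // prints 3
        System.out.println(pruneThenDrop(files, 15L).size()); // prints 1
    }
}
```

In both orderings the returned entries carry no statistics, so the serialisation saving is the same; only the number of entries that reach the workers differs.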


### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!
