feat: support HDFS object storage#5472

Open
hfutatzhanghb wants to merge 10 commits into lance-format:main from hfutatzhanghb:dev-hdfs-support

Conversation

@hfutatzhanghb hfutatzhanghb commented Dec 15, 2025

Contributor

What changed

  • add an optional hdfs feature for lance-io backed by OpenDAL
  • register hdfs:// with the object store registry
  • map HDFS URLs, storage options, and supported environment variables into OpenDAL configuration
  • add unit coverage for HDFS path and configuration handling
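
As a rough illustration of the URL-mapping bullet above, the sketch below splits an `hdfs://` URL into a name node setting and an object path. It is not the code from this PR: the function name `split_hdfs_url` is hypothetical, and the `name_node` key is assumed to match the option name that OpenDAL's HDFS service expects. The authority may be a `host:port` pair or an HDFS nameservice ID.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not taken from the PR): split an `hdfs://` URL into
/// an OpenDAL-style option map and the object path inside the file system.
fn split_hdfs_url(url: &str) -> Result<(HashMap<String, String>, String), String> {
    let rest = url
        .strip_prefix("hdfs://")
        .ok_or_else(|| format!("not an hdfs:// URL: {}", url))?;

    // The authority is everything before the first '/': either `host:port`
    // or a nameservice ID such as `mycluster`.
    let (authority, path) = match rest.find('/') {
        Some(idx) => (&rest[..idx], &rest[idx..]),
        None => (rest, "/"),
    };
    if authority.is_empty() {
        return Err(format!("missing name node or nameservice in {}", url));
    }

    let mut config = HashMap::new();
    // Assumed key name; check it against the OpenDAL HDFS service options.
    config.insert("name_node".to_string(), format!("hdfs://{}", authority));
    Ok((config, path.to_string()))
}
```

Under these assumptions, `split_hdfs_url("hdfs://mycluster/data/t.lance")` would yield a `name_node` of `hdfs://mycluster` and the path `/data/t.lance`, while a URL with another scheme is rejected with an error.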

Why

This allows Lance datasets to be accessed through HDFS URLs, including deployments using HDFS nameservices.

Validation

  • cargo fmt --all -- --check
  • cargo clippy -p lance-io --all-targets --features hdfs --locked -- -D warnings
  • cargo clippy --all --tests --benches -- -D warnings
  • cargo test -p lance-io --features hdfs --locked object_store::providers::hdfs::tests --no-fail-fast
  • cargo +1.91.0 check -p lance-io --features hdfs --locked

@github-actions github-actions Bot added enhancement New feature or request A-java Java bindings + JNI labels Dec 15, 2025
@hfutatzhanghb

Contributor Author

Hi @jackye1995 @wojiaodoubao @majin1102. Could you please help review this PR when you have free time? Thanks a lot.

Comment thread rust/lance-io/Cargo.toml Outdated

[features]
- default = ["aws", "azure", "gcp"]
+ default = ["aws", "azure", "gcp", "hdfs"]

Collaborator

I think we shouldn't enable HDFS by default. It introduces many new dependencies in Lance and also requires users to have a Java setup. Without it, Lance will fail to start.
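
One way to picture the opt-in behaviour requested here is compile-time gating, sketched below. The function `registered_schemes` and the `file` and `memory` entries are hypothetical and are not the registry code in lance-io; the only point is that the `hdfs` entry exists solely in builds made with `--features hdfs`.

```rust
/// Hypothetical sketch: list the URL schemes a given build would register.
fn registered_schemes() -> Vec<&'static str> {
    // Placeholder schemes that are assumed to be always available.
    let mut schemes = vec!["file", "memory"];

    // This statement is compiled only when the optional `hdfs` feature is on,
    // so default builds carry neither the entry nor its dependencies.
    #[cfg(feature = "hdfs")]
    schemes.push("hdfs");

    schemes
}
```

With `hdfs` left out of `default`, a plain `cargo build` would not report the scheme, and users who need it would opt in explicitly.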

}
} else {
// Fall back to system username
config_map.insert("user".to_string(), whoami::username());

Collaborator

I prefer having users fill this out instead of using whoami.
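
A minimal sketch of what this suggestion could look like, assuming the user comes from a storage option first and an environment-provided value second, with an error instead of a `whoami` fallback. The function name `resolve_hdfs_user` and the `user` option key are illustrative only; the environment value is passed in as a parameter (for example, read by the caller from `HADOOP_USER_NAME`, which is an assumption here) so the logic stays deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical helper: pick the HDFS user from explicit configuration only.
/// Blank values are treated as unset; there is no system-username fallback.
fn resolve_hdfs_user(
    storage_options: &HashMap<String, String>,
    env_user: Option<&str>,
) -> Result<String, String> {
    // An explicit storage option takes precedence.
    let explicit = storage_options
        .get("user")
        .map(|s| s.trim())
        .filter(|s| !s.is_empty());

    // Otherwise use the value the caller read from the environment.
    let from_env = env_user.map(str::trim).filter(|s| !s.is_empty());

    explicit
        .or(from_env)
        .map(str::to_owned)
        .ok_or_else(|| "HDFS user is not set; provide it explicitly".to_string())
}
```

Failing fast here keeps the effective identity visible to the user rather than silently depending on whichever account runs the process.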

Comment thread java/pom.xml Outdated
<copyTo>${project.build.directory}/classes/nativelib</copyTo>
<copyWithPlatformDir>true</copyWithPlatformDir>
<environmentVariables>
<CARGO_FEATURES>hdfs</CARGO_FEATURES>

Collaborator

I feel the same. It doesn’t seem like a good idea to enable hdfs by default.

Contributor Author

@Xuanwo Thanks for reviewing. Got it, I will push a new version based on your advice later.

@github-actions

Contributor

Thank you for your contribution. This PR has been inactive for a while, so we're closing it to free up bandwidth. Feel free to reopen it if you still find it useful.

@github-actions github-actions Bot closed this May 16, 2026
@BubbleCal BubbleCal reopened this May 28, 2026
@github-actions github-actions Bot removed the Stale label May 29, 2026
@github-actions github-actions Bot added the A-encoding Encoding, IO, file reader/writer label Jun 9, 2026
@hfutatzhanghb hfutatzhanghb changed the title feat: lance supports hdfs scheme feat: support HDFS object storage Jun 9, 2026
@github-actions github-actions Bot added A-deps Dependency updates A-ci CI / build workflows labels Jun 9, 2026
@codecov

codecov Bot commented Jun 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@hfutatzhanghb

Contributor Author

@Xuanwo @BubbleCal @jiaoew1991 Hi, could you please review this PR when you are free? Thanks very much. We have used HDFS as backend storage for a long time.

zhanghaobo@kanzhun.com and others added 7 commits June 10, 2026 13:55
The hdfs feature requires Hadoop native libraries (libhdfs.so) which are not
available on CI runners. Exclude it from ALL_FEATURES computation, following
the same pattern as protoc and slow_tests.

Also update Cargo.lock to include hdfs-related dependencies (hdrs, hdfs-sys,
java-locator, opendal-service-hdfs) so --locked builds don't fail when hdfs
is accidentally enabled.
@github-actions github-actions Bot added the A-docs Documentation label Jun 10, 2026
@github-actions github-actions Bot added the A-python Python bindings label Jun 10, 2026

Labels

A-ci CI / build workflows A-deps Dependency updates A-docs Documentation A-encoding Encoding, IO, file reader/writer A-java Java bindings + JNI A-python Python bindings enhancement New feature or request

3 participants