feat: support HDFS object storage #5472
Hi @jackye1995 @wojiaodoubao @majin1102, could you please help review this PR when you have free time? Thanks a lot.
  [features]
- default = ["aws", "azure", "gcp"]
+ default = ["aws", "azure", "gcp", "hdfs"]
I think we shouldn't enable HDFS by default. It introduces many new dependencies in Lance and also requires users to have a Java setup. Without a Java setup, Lance will fail to start.
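To make the opt-in idea concrete, here is a minimal sketch of gating the scheme behind a Cargo feature, assuming `hdfs` is declared as a feature but left out of `default`. The function names and the `file`/`memory` placeholder schemes are hypothetical and are not taken from this PR.

```rust
/// URI schemes this build can serve. `hdfs` is listed only when the
/// `hdfs` Cargo feature was requested explicitly (assumed to be absent
/// from `default`). The other two entries are placeholders.
pub fn supported_schemes() -> Vec<&'static str> {
    let mut schemes = vec!["file", "memory"];
    if cfg!(feature = "hdfs") {
        schemes.push("hdfs");
    }
    schemes
}

/// Returns true when the given scheme is compiled into this build.
pub fn is_supported(scheme: &str) -> bool {
    supported_schemes().iter().any(|known| *known == scheme)
}
```

With this shape, a default build never links the HDFS client, so users without a Java setup are unaffected unless they pass `--features hdfs`.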
    }
} else {
    // Fall back to system username
    config_map.insert("user".to_string(), whoami::username());
I prefer having users fill this out instead of using whoami.
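One way this could look, as a sketch only: require the `user` option and return an error when it is missing, rather than guessing from the OS account. The helper name `hdfs_user` and the error text are hypothetical; only the `user` key comes from the snippet under review.

```rust
use std::collections::HashMap;

/// Hypothetical helper: read the HDFS user from caller-supplied options
/// and fail loudly when it is absent or blank, with no OS-level fallback.
pub fn hdfs_user(options: &HashMap<String, String>) -> Result<String, String> {
    match options.get("user").map(|u| u.trim()) {
        Some(user) if !user.is_empty() => Ok(user.to_string()),
        _ => Err("HDFS option `user` must be set explicitly".to_string()),
    }
}
```

This also removes the need for the `whoami` dependency in this code path.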
<copyTo>${project.build.directory}/classes/nativelib</copyTo>
<copyWithPlatformDir>true</copyWithPlatformDir>
<environmentVariables>
<CARGO_FEATURES>hdfs</CARGO_FEATURES>
I feel the same. It doesn’t seem like a good idea to enable hdfs by default.
@Xuanwo Thanks for reviewing. Got it, I will push a new version based on your advice later.
84a92c5 to c6087b6
Thank you for your contribution. This PR has been inactive for a while, so we're closing it to free up bandwidth. Feel free to reopen it if you still find it useful.
c6087b6 to 467c486
467c486 to e646087
Codecov Report: All modified and coverable lines are covered by tests.
@Xuanwo @BubbleCal @jiaoew1991 Hi, could you please review this PR when you are free? Thanks very much. We have used HDFS as backend storage for a long time.
07a46bc to a48caed
The hdfs feature requires Hadoop native libraries (libhdfs.so) which are not available on CI runners. Exclude it from ALL_FEATURES computation, following the same pattern as protoc and slow_tests. Also update Cargo.lock to include hdfs-related dependencies (hdrs, hdfs-sys, java-locator, opendal-service-hdfs) so --locked builds don't fail when hdfs is accidentally enabled.
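The exclusion logic described above can be sketched as follows. This is an illustration in Rust of the filtering step only, not the actual CI script: the function name `all_features` and the constant are hypothetical, and how the workflow really computes `ALL_FEATURES` is not shown in this conversation.

```rust
/// Features that need tools or native libraries CI runners lack.
/// `hdfs` joins `protoc` and `slow_tests`, per the commit message.
const EXCLUDED: [&str; 3] = ["protoc", "slow_tests", "hdfs"];

/// Build a comma-separated `--features` value from every declared
/// feature, skipping the excluded ones and keeping the input order.
pub fn all_features(declared: &[&str]) -> String {
    declared
        .iter()
        .copied()
        .filter(|name| !EXCLUDED.iter().any(|skip| *skip == *name))
        .collect::<Vec<_>>()
        .join(",")
}
```

Given `["aws", "hdfs", "gcp", "protoc", "slow_tests"]`, the sketch yields `aws,gcp`, which could then be passed to `cargo` via `--features`.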
What changed
- hdfs feature for lance-io, backed by OpenDAL
- hdfs:// registered with the object store registry
Why
This allows Lance datasets to be accessed through HDFS URLs, including deployments using HDFS nameservices.
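As a rough illustration of what handling such URLs involves, the sketch below splits an `hdfs://` URL into its authority (either a `host:port` name node or a logical nameservice) and the path inside the filesystem. The function `split_hdfs_url` and the example names are hypothetical and do not reflect the provider code in this PR. As a simplification, URLs with an empty authority such as `hdfs:///path` are rejected.

```rust
/// Hypothetical split of an HDFS URL into the name node or nameservice
/// part (kept with its `hdfs://` prefix) and the path within it.
/// Returns None for other schemes or for an empty authority.
pub fn split_hdfs_url(url: &str) -> Option<(String, String)> {
    let rest = url.strip_prefix("hdfs://")?;
    let (authority, path) = match rest.find('/') {
        Some(idx) => (&rest[..idx], &rest[idx..]),
        None => (rest, "/"),
    };
    if authority.is_empty() {
        return None;
    }
    Some((format!("hdfs://{authority}"), path.to_string()))
}
```

For example, `hdfs://nameservice1/data/table.lance` splits into `hdfs://nameservice1` and `/data/table.lance`, so a nameservice is treated the same way as a `host:port` pair and its resolution is left to the Hadoop client configuration.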
Validation
cargo fmt --all -- --check
cargo clippy -p lance-io --all-targets --features hdfs --locked -- -D warnings
cargo clippy --all --tests --benches -- -D warnings
cargo test -p lance-io --features hdfs --locked object_store::providers::hdfs::tests --no-fail-fast
cargo +1.91.0 check -p lance-io --features hdfs --locked