Hadoop can't create S3 client with latest AWS SDK #7459

@patchwork01

Description / Background

Dependabot has failed to upgrade to the latest AWS SDK because of a failure in Hadoop. Hadoop can't create an S3Client because AWSClientConfig.createHttpClientBuilder refers directly to the AWS SDK class software.amazon.awssdk.http.apache.ApacheHttpClient, and that class no longer exists in the same package in the latest version.

Steps to reproduce

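  1. Upgrade to the latest AWS SDK
  2. Run the tests that access S3 through Hadoop, e.g. JavaCompactionRunnerLocalStackIT
  3. See a java.lang.NoClassDefFoundError for software/amazon/awssdk/http/apache/ApacheHttpClient (full stack trace under Screenshots/Logs)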

Expected behaviour

The system should still be able to interact with S3 via Hadoop.

Technical Notes / Implementation Details

One option may be to give Hadoop an alternative factory for the S3 client. We can look into the documentation and the source code for the classes listed in the stack trace.

Looking at the source code, Hadoop appears to read the property "fs.s3a.s3.client.factory.impl" to decide which client factory to create. The value needs to be the name of a class that has a public constructor taking no arguments and implements the interface org.apache.hadoop.fs.s3a.S3ClientFactory. We should be able to implement that interface and set the property.
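
To illustrate why the factory class needs a public no-argument constructor, here is a minimal, self-contained sketch of loading a factory reflectively from a property value. It does not use Hadoop's classes or reproduce Hadoop's loading code: `ClientFactory`, `CustomClientFactory` and `loadFactory` are hypothetical stand-ins, and a plain `Map` stands in for the Hadoop configuration. Only the property name comes from this issue.

```java
import java.util.Map;

/** Generic illustration of reflective factory loading; not Hadoop's actual code. */
public class FactoryLoadingSketch {

    /** Hypothetical stand-in for org.apache.hadoop.fs.s3a.S3ClientFactory. */
    public interface ClientFactory {
        String describe();
    }

    /** Hypothetical replacement factory: a public class with a public no-arg constructor. */
    public static class CustomClientFactory implements ClientFactory {
        public CustomClientFactory() {
        }

        @Override
        public String describe() {
            return "custom";
        }
    }

    // Property name taken from the issue text.
    static final String FACTORY_PROPERTY = "fs.s3a.s3.client.factory.impl";

    /** Reads the class name from the property and instantiates it reflectively. */
    static ClientFactory loadFactory(Map<String, String> properties) throws ReflectiveOperationException {
        String className = properties.get(FACTORY_PROPERTY);
        // asSubclass throws ClassCastException if the class does not implement the interface.
        Class<? extends ClientFactory> type = Class.forName(className).asSubclass(ClientFactory.class);
        // getConstructor() only finds public constructors, so a non-public
        // or missing no-arg constructor fails with NoSuchMethodException.
        return type.getConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> properties = Map.of(FACTORY_PROPERTY, CustomClientFactory.class.getName());
        System.out.println(loadFactory(properties).describe()); // prints "custom"
    }
}
```

The real implementation would implement Hadoop's S3ClientFactory interface instead of the stand-in; the methods that interface requires should be checked against the Hadoop version in use.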

We should explain in the Javadoc of the implementation why this is necessary.

We'll need to set the property in LocalStackHadoopConfigurationProvider, in WiremockHadoopConfigurationProvider, and in HadoopConfigurationProvider in both the Parquet module and the Trino module.
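
Since the same property has to be set in four places, one possible approach is a small shared helper. The sketch below is an assumption, not existing Sleeper code: the class name `S3ClientFactoryProperty`, the method `applyTo` and the factory class name `sleeper.example.CustomS3ClientFactory` are all hypothetical. It takes a generic setter so it can run here against a plain `Map` without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical helper that sets the S3 client factory property in one place. */
public class S3ClientFactoryProperty {

    // Property name taken from the issue text.
    static final String PROPERTY = "fs.s3a.s3.client.factory.impl";

    // Hypothetical fully qualified name of the replacement factory class.
    static final String FACTORY_CLASS = "sleeper.example.CustomS3ClientFactory";

    /** Passes the property name and factory class name to the given setter. */
    static void applyTo(BiConsumer<String, String> setter) {
        setter.accept(PROPERTY, FACTORY_CLASS);
    }

    public static void main(String[] args) {
        // A plain map stands in for the Hadoop configuration in this sketch.
        Map<String, String> conf = new HashMap<>();
        applyTo(conf::put);
        System.out.println(conf.get(PROPERTY)); // prints "sleeper.example.CustomS3ClientFactory"
    }
}
```

With a Hadoop `Configuration` object named `conf`, each provider could call `S3ClientFactoryProperty.applyTo(conf::set)`, which uses Hadoop's `Configuration.set(String, String)`.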

Screenshots/Logs

Stack trace from a test:

java.lang.NoClassDefFoundError: software/amazon/awssdk/http/apache/ApacheHttpClient
  at org.apache.hadoop.fs.s3a.impl.AWSClientConfig.createHttpClientBuilder(AWSClientConfig.java:147)
  at org.apache.hadoop.fs.s3a.DefaultS3ClientFactory.createS3Client(DefaultS3ClientFactory.java:129)
  at org.apache.hadoop.fs.s3a.impl.ClientManagerImpl.lambda$createS3Client$0(ClientManagerImpl.java:118)
  at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
  at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
  at org.apache.hadoop.util.functional.LazyAtomicReference.eval(LazyAtomicReference.java:94)
  at org.apache.hadoop.util.functional.LazyAutoCloseableReference.eval(LazyAutoCloseableReference.java:54)
  at org.apache.hadoop.fs.s3a.impl.ClientManagerImpl.getOrCreateS3Client(ClientManagerImpl.java:148)
  at org.apache.hadoop.fs.s3a.impl.S3AStoreImpl.getOrCreateS3Client(S3AStoreImpl.java:232)
  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:796)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3615)
  at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:172)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3716)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3667)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:557)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:366)
  at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:415)
  at sleeper.compaction.job.execution.JavaCompactionRunner.createInputIterators(JavaCompactionRunner.java:140)
  at sleeper.compaction.job.execution.JavaCompactionRunner.compact(JavaCompactionRunner.java:78)
  at sleeper.compaction.core.task.CompactionTask.compact(CompactionTask.java:245)
  at sleeper.compaction.core.task.CompactionTask.processCompactionMessage(CompactionTask.java:195)
  at sleeper.compaction.core.task.CompactionTask.handleMessages(CompactionTask.java:157)
  at sleeper.compaction.core.task.CompactionTask.run(CompactionTask.java:133)
  at sleeper.compaction.core.task.CompactionTaskTestHelper.runTask(CompactionTaskTestHelper.java:96)
  at sleeper.compaction.core.task.CompactionTaskTestHelper.runTask(CompactionTaskTestHelper.java:86)
  at sleeper.compaction.job.execution.testutils.CompactionRunnerTestBase.runTask(CompactionRunnerTestBase.java:105)
  at sleeper.compaction.job.execution.testutils.CompactionRunnerTestBase.runTask(CompactionRunnerTestBase.java:98)
  at sleeper.compaction.job.execution.JavaCompactionRunnerLocalStackIT.shouldRunCompactionJob(JavaCompactionRunnerLocalStackIT.java:96)
  at java.base/java.lang.reflect.Method.invoke(Method.java:569)
  at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
  at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
Caused by: java.lang.ClassNotFoundException: software.amazon.awssdk.http.apache.ApacheHttpClient
  at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
  at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
  at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  ... 31 more

Labels

bug (Something isn't working), version-upgrades (Issues to upgrade dependencies)
