There are two non-blocking issues with the above template, both of which seem addressable in other parts of the stack.
Parquet Files Sample 0: 0% 0/1 [00:02<?, ?it/s]
2025-10-20 09:21:06,687 INFO streaming_executor.py:112 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2025-10-20_08-27-54_702051_216/logs/ray-data
2025-10-20 09:21:06,688 INFO streaming_executor.py:113 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadParquet] -> LimitOperator[limit=5] -> ActorPoolMapOperator[MapBatches(SDTransformer)] -> ActorPoolMapOperator[MapBatches(SDLatentEncoder)] -> TaskPoolMapOperator[Write]
(_MapWorker pid=826, ip=10.0.192.52) The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
(_MapWorker pid=826, ip=10.0.192.52) 0it [00:00, ?it/s]
(_MapWorker pid=826, ip=10.0.192.52) Initialized SDLatentEncoder.
(_MapWorker pid=826, ip=10.0.192.52) Device: cpu
(_MapWorker pid=826, ip=10.0.192.52) Resolution: 512
Running: 1/24.0 CPU, 0/3.0 GPU, 44.2MB/17.8GB object_store_memory: 0% 0/1 [01:14<?, ?it/s]
✓ Successfully uploaded checkpoints to abfss://...
But then ... TypeError: object NoneType can't be used in 'await' expression
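
For context, this kind of TypeError generally appears when code awaits the return value of a plain (non-async) function that returns None. The sketch below only illustrates that general Python behavior and one defensive pattern. It is not taken from the template or from Ray, and the names upload_sync, upload_async and call_maybe_async are hypothetical.

```python
import asyncio
import inspect


def upload_sync():
    """Hypothetical synchronous callback; it returns None."""
    return None


async def upload_async():
    """Hypothetical asynchronous callback."""
    return "done"


async def call_maybe_async(fn):
    """Call fn and await its result only if that result is awaitable."""
    result = fn()
    if inspect.isawaitable(result):
        result = await result
    return result


async def reproduce():
    """Await the None returned by a sync function to trigger the TypeError."""
    try:
        await upload_sync()
    except TypeError as exc:
        return str(exc)
    return None


# The exact wording of the message varies between Python versions.
error_message = asyncio.run(reproduce())

# The guarded helper handles both kinds of callback without raising.
sync_result = asyncio.run(call_maybe_async(upload_sync))
async_result = asyncio.run(call_maybe_async(upload_async))
```

If the failing call site is in library code rather than in the template, the equivalent fix would have to land there; the guard above is only one possible shape for it.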