E4S Data & Visualization Libraries and Tools Guide
Introduction
Selecting the right data and visualization libraries or tools is an important step in building efficient, scalable, and maintainable workflows for scientific computing and high-performance data analysis. The E4S 25.06 release includes a rich collection of data management, I/O, and visualization libraries designed for parallel performance, portability, and ecosystem integration.
A newcomer should think about attributes related to their data structures, access patterns, parallelism model, visualization scale, supported hardware, and compatibility with programming environments. Understanding these attributes helps identify tools that meet the technical and scientific needs of a given project—whether the goal is to efficiently store and move simulation output, visualize multi-terabyte datasets, or integrate AI-assisted analytics into scientific workflows.
When defining selection attributes, it helps to consider both broadly meaningful characteristics (like portability or API support) and situation-specific attributes (like in-situ visualization, streaming data, or exascale readiness).
These attributes can then be described in a structured form that allows a chatbot or decision assistant to guide the selection of appropriate E4S products such as ADIOS2, HDF5, ParaView, Ascent, VisIt, VTK-m, or Catalyst2.
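Here is a minimal Python sketch of such a structured form. The `ToolProfile` and `Requirements` records, their field names, and the `lib_a`/`lib_b`/`lib_c` catalog entries are hypothetical placeholders. They are not an E4S schema or real capability data. The sketch only illustrates one reasonable design: hard constraints filter the candidates, and weighted preferences rank whatever remains.

```python
from dataclasses import dataclass, field

@dataclass
class ToolProfile:
    """Attribute record for one candidate tool (hypothetical field names)."""
    name: str
    parallel_io: bool
    apis: set
    data_models: set
    in_situ: bool = False
    streaming: bool = False

@dataclass
class Requirements:
    """Hard constraints must all hold; soft weights only affect the ranking."""
    need_parallel_io: bool = False
    need_apis: set = field(default_factory=set)
    need_data_models: set = field(default_factory=set)
    prefer_in_situ: float = 0.0    # added to the score if the tool supports in-situ use
    prefer_streaming: float = 0.0  # added to the score if the tool supports streaming

def meets_hard_constraints(tool: ToolProfile, req: Requirements) -> bool:
    if req.need_parallel_io and not tool.parallel_io:
        return False
    # Every requested API binding and data model must be available (subset test).
    return req.need_apis <= tool.apis and req.need_data_models <= tool.data_models

def soft_score(tool: ToolProfile, req: Requirements) -> float:
    score = 0.0
    if tool.in_situ:
        score += req.prefer_in_situ
    if tool.streaming:
        score += req.prefer_streaming
    return score

def rank(catalog, req):
    """Drop tools that fail a hard constraint, then sort by descending soft score."""
    eligible = [t for t in catalog if meets_hard_constraints(t, req)]
    return sorted(eligible, key=lambda t: (-soft_score(t, req), t.name))

# Placeholder catalog: names and capabilities are invented for illustration.
catalog = [
    ToolProfile("lib_a", True, {"C", "C++", "Python"}, {"structured"}, in_situ=True),
    ToolProfile("lib_b", True, {"C", "Fortran"}, {"structured", "hierarchical"}),
    ToolProfile("lib_c", False, {"Python"}, {"tabular"}, streaming=True),
]
req = Requirements(need_parallel_io=True, need_apis={"C"},
                   need_data_models={"structured"}, prefer_in_situ=2.0)
print([t.name for t in rank(catalog, req)])  # ['lib_a', 'lib_b']
```

Keeping hard constraints separate from soft weights means an assistant can explain two things independently: why a tool was excluded, and why one eligible tool ranked above another.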
Example Prompt
I need a parallel data I/O library that can handle structured mesh output from a multi-GPU simulation on Frontier. The workflow runs in MPI with C++ and uses Kokkos for parallelism. The output should be compatible with ParaView Catalyst for in-situ visualization. I prefer an open-source library with strong E4S support and minimal code changes for integration.
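One way to hand a prompt like this to a decision assistant is to restate it as a structured request that separates firm requirements from preferences. The sketch below rests on a few assumptions. The `SelectionRequest` record and all key names are hypothetical. The split into `hard` and `soft` is also a judgment call: the prompt says it prefers open source, strong E4S support, and minimal code changes, so those items are treated as soft preferences.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class SelectionRequest:
    """Hypothetical request format for a selection assistant."""
    category: str
    hard: dict = field(default_factory=dict)   # requirements a candidate must meet
    soft: list = field(default_factory=list)   # preferences used only for ranking

# The example prompt, restated as structured data.
request = SelectionRequest(
    category="parallel data I/O library",
    hard={
        "parallel_model": "MPI",
        "language": "C++",
        "node_parallelism": "Kokkos",
        "data_model": "structured mesh",
        "target_system": "Frontier (multi-GPU)",
        "in_situ_compatibility": "ParaView Catalyst",
    },
    soft=["open source", "strong E4S support", "minimal code changes"],
)

# Serialize to JSON so the request can be logged or passed to another service.
payload = json.dumps(asdict(request), indent=2, sort_keys=True)
print(payload)
```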
Attributes for Data Libraries and Tools
Broadly Meaningful Attributes
| Attribute | Description |
| --- | --- |
| Parallel I/O support | Whether the library supports scalable I/O for distributed applications using MPI or similar models |
| Data model | Types of data supported (structured, unstructured, tabular, hierarchical, key-value, etc.) |
| Supported APIs | Available language bindings, such as C, C++, Fortran, and Python |
| Portability | Extent to which the library is portable across CPU and GPU architectures and HPC platforms |
| Metadata management | Ability to manage and query metadata efficiently at scale |
| Compression | Support for data compression and decompression, in lossless or lossy modes |
| Checkpoint/restart capabilities | Whether the library supports writing state checkpoints and restarting from them for resilience |
| File format compatibility | Interoperability with common file formats such as HDF5, NetCDF, or BP5 (the native ADIOS2 format) |
| Data streaming | Ability to handle real-time or asynchronous data streams |
| E4S integration | Availability as part of E4S releases, with verified Spack package support |
| License | Software license type and its compatibility with project policies |
| Community support | Availability of documentation, tutorials, and active maintenance |
| Performance tuning | Options for optimizing data layout, buffering, and I/O scheduling |
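A small Python sketch makes the difference between lossless and lossy modes concrete. The lossless path uses `zlib` from the standard library and restores every bit. The lossy path uses a toy uniform quantizer with an absolute error bound. This quantizer illustrates the error-bounded idea behind scientific lossy compressors such as SZ, but it is not the algorithm of SZ, ZFP, or any other E4S product.

```python
import math
import zlib
from array import array

def lossless_roundtrip(values):
    """Compress float64 values with zlib and decompress them again."""
    raw = array("d", values).tobytes()
    packed = zlib.compress(raw, 6)
    restored = array("d")
    restored.frombytes(zlib.decompress(packed))
    return list(restored), len(raw), len(packed)

def quantize(values, abs_error):
    """Toy error-bounded quantizer: bin width 2*abs_error keeps |x - x'| <= abs_error."""
    step = 2.0 * abs_error
    return [round(v / step) for v in values]

def dequantize(codes, abs_error):
    step = 2.0 * abs_error
    return [c * step for c in codes]

signal = [math.sin(0.01 * i) for i in range(10_000)]

restored, raw_size, lossless_size = lossless_roundtrip(signal)
assert restored == signal  # lossless: bit-for-bit identical

abs_error = 1e-3
codes = quantize(signal, abs_error)
approx = dequantize(codes, abs_error)
# Small, slowly varying integer codes typically compress much better than raw doubles.
lossy_size = len(zlib.compress(array("q", codes).tobytes(), 6))
max_err = max(abs(a - b) for a, b in zip(signal, approx))

print(f"raw={raw_size} B, lossless={lossless_size} B, lossy={lossy_size} B, max error={max_err:.2e}")
```

The trade-off is the same at scale. Lossless modes guarantee exact restarts and reproducibility. Error-bounded lossy modes give up a user-chosen amount of accuracy in exchange for much smaller output.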
Situation-Specific Attributes for Data Libraries and Tools
For Simulation Output
| Attribute | Description |
| --- | --- |
| In-situ I/O | Direct output to visualization or analysis frameworks during simulation runtime |
| Temporal data management | Support for time-series and multi-timestep data |
| Large-file scalability | Ability to handle multi-terabyte output efficiently |
| Domain decomposition mapping | Handling of partitioned data in structured or unstructured domains |
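Domain decomposition mapping usually comes down to telling the I/O layer where each rank's local block sits in the global array. A common convention is a per-dimension start offset and count for each rank. ADIOS2 variables describe blocks this way (shape, start, count), and so do HDF5 hyperslab selections (start, count). The sketch below computes a balanced block decomposition for a structured mesh. The function names are hypothetical, and it calls no I/O library.

```python
def block_partition(n_global: int, n_ranks: int, rank: int) -> tuple:
    """Return (start, count) of the contiguous slab owned by `rank` along one axis.

    The first n_global % n_ranks ranks get one extra element, so counts differ
    by at most one and the slabs tile [0, n_global) without gaps or overlap.
    """
    if not 0 <= rank < n_ranks:
        raise ValueError("rank out of range")
    base, extra = divmod(n_global, n_ranks)
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, count

def partition_2d(shape, proc_grid, coords):
    """Block-decompose a 2-D global array over a px x py process grid.

    Returns (start, count) tuples for this process, in (x, y) order.
    """
    (nx, ny), (px, py), (ix, iy) = shape, proc_grid, coords
    sx, cx = block_partition(nx, px, ix)
    sy, cy = block_partition(ny, py, iy)
    return (sx, sy), (cx, cy)

# Example: a 10 x 7 global mesh split over a 3 x 2 process grid.
global_shape = (10, 7)
for ix in range(3):
    for iy in range(2):
        start, count = partition_2d(global_shape, (3, 2), (ix, iy))
        print(f"proc ({ix},{iy}): shape={global_shape} start={start} count={count}")
```

In an MPI code, each rank would compute only its own `(start, count)` from its rank or Cartesian coordinates and pass the result, together with the global shape, to the I/O library.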
For Machine Learning Data Pipelines
| Attribute | Description |
| --- | --- |
| Tensor data support | Ability to read/write multi-dimensional tensor data efficiently |
| Streaming ingestion | Support for high-frequency or batched input data |
| Integration with AI frameworks | Compatibility with TensorFlow, PyTorch, or ONNX data formats |
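For streaming ingestion of tensor data, the core pattern has two steps. First, group an incoming stream into fixed-size batches. Second, pack each batch into a contiguous row-major buffer with an explicit shape, which is the default layout in frameworks such as PyTorch and NumPy. The Python sketch below uses only the standard library. The helpers `batched`, `to_row_major`, and the synthetic `sensor_stream` are hypothetical. A real pipeline would pass the buffer to an AI framework rather than keep Python lists.

```python
from itertools import islice

def batched(stream, batch_size):
    """Group a possibly unbounded stream into lists of batch_size (last may be short)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def to_row_major(batch):
    """Pack equal-length samples into one flat row-major buffer plus its shape.

    Element (i, j) of the batch ends up at flat[i * width + j].
    """
    width = len(batch[0])
    if any(len(sample) != width for sample in batch):
        raise ValueError("all samples in a batch must have the same length")
    flat = [x for sample in batch for x in sample]
    return flat, (len(batch), width)

def sensor_stream(n):
    """Synthetic stand-in for a high-frequency source: n samples of 3 features."""
    for t in range(n):
        yield [float(t), 0.5 * t, float(t * t)]

shapes = [to_row_major(b)[1] for b in batched(sensor_stream(10), 4)]
print(shapes)  # [(4, 3), (4, 3), (2, 3)]
```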
For Exascale and Heterogeneous Systems
| Attribute | Description |
| --- | --- |
| GPU-direct I/O | Capability to move data directly between GPU memory and storage without staging it through host (CPU) memory |
| Burst buffer support | Ability to exploit fast intermediate storage tiers (burst buffers) on supercomputers |
| Resilience features | Fault-tolerant I/O and restart mechanisms at extreme scale |
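Resilient checkpoint/restart rests on two habits: never overwrite the last good checkpoint in place, and verify a checkpoint before restarting from it. The single-process Python sketch below shows both. It writes each checkpoint to a temporary file, fsyncs it, and atomically renames it into place with `os.replace`. On restart, it loads the newest checkpoint whose SHA-256 checksum verifies. The file naming and JSON layout are assumptions. At scale, the I/O library or a dedicated checkpointing tool plays this role, coordinated across MPI ranks.

```python
import hashlib
import json
import os
import tempfile

def write_checkpoint(directory, step, state):
    """Write `state` for `step` so a crash never leaves a half-written checkpoint visible."""
    body = json.dumps({"step": step, "state": state}, sort_keys=True)
    digest = hashlib.sha256(body.encode()).hexdigest()
    final_path = os.path.join(directory, f"ckpt_{step:08d}.json")
    # Write to a temporary file in the same directory first.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"sha256": digest, "body": body}, f)
        f.flush()
        os.fsync(f.fileno())  # push file data to storage before publishing it
    # Atomic rename on POSIX (same filesystem). Fsyncing the directory as well
    # would make the rename itself durable; that step is omitted here.
    os.replace(tmp_path, final_path)
    return final_path

def load_latest_checkpoint(directory):
    """Return (step, state) from the newest checkpoint that verifies, else None."""
    names = sorted(
        (n for n in os.listdir(directory) if n.startswith("ckpt_") and n.endswith(".json")),
        reverse=True,  # zero-padded step numbers sort newest-first
    )
    for name in names:
        try:
            with open(os.path.join(directory, name)) as f:
                record = json.load(f)
            body = record["body"]
            if hashlib.sha256(body.encode()).hexdigest() != record["sha256"]:
                continue  # checksum mismatch: fall back to an older checkpoint
            data = json.loads(body)
            return data["step"], data["state"]
        except (OSError, ValueError, KeyError):
            continue  # unreadable or truncated file: try the next one
    return None

with tempfile.TemporaryDirectory() as d:
    write_checkpoint(d, 10, {"t": 1.0})
    write_checkpoint(d, 20, {"t": 2.0})
    # Simulate a torn write of a newer checkpoint.
    with open(os.path.join(d, "ckpt_00000030.json"), "w") as f:
        f.write("{not json")
    print(load_latest_checkpoint(d))  # (20, {'t': 2.0})
```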
Attributes for Visualization Libraries and Tools
Broadly Meaningful Attributes
Attribute
Description
Rendering model
Type of rendering (rasterization, ray tracing, volume rendering) supported
Parallel rendering
Ability to distribute rendering across multiple nodes or GPUs