Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces Base64 encoding/decoding support to stdlib, integrating it into the strings component and adding a dedicated test suite (per issue #916).
Changes:
- Added new
stdlib_base64module implementingbase64_encode(generic over numeric/logical types) andbase64_decode. - Re-exported
base64_encode/base64_decodeviastdlib_strings. - Added Base64 unit tests and wired them into the string test CMake configuration.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/strings/stdlib_base64.fypp |
New Base64 implementation (encode/decode). |
src/strings/stdlib_strings.fypp |
Re-exports Base64 APIs from stdlib_base64. |
src/strings/CMakeLists.txt |
Adds stdlib_base64.fypp to strings build sources. |
test/string/test_base64.f90 |
Adds unit tests for known vectors, whitespace handling, invalid input, and roundtrips. |
test/string/CMakeLists.txt |
Registers the new base64 test target. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@jvdp1 Could you please have a look? I have tackled all possible issues. Working on a performance improvement using avx2 specific instructions which was the ultimate goal of the resources shared in the issue. Upon completion, would increase speeds drastically. Since fortran is depndent on autovectorizers, it may not always lead to the optimall register management as can be controlled using C intrinsics. Will still have a architecture fallback. Architecture specific kernels can be added to increase performance in future. |
jvdp1
left a comment
There was a problem hiding this comment.
thank you @RatanKokal for this PR. Overall it looks already very good to me.
Maybe some general comments:
- I suggest to use
state_type(provided bystdlib_errorto handle errors - I think that some procedures could be declared
pure - Is there any reason to prefer
moduleinstead of usingsubmodule?
src/strings/stdlib_base64.fypp
Outdated
| use stdlib_base64_encode, only: base64_encode, base64_encode_into | ||
| use stdlib_base64_decode, only: base64_decode, base64_decode_into | ||
|
|
||
| implicit none | ||
| private | ||
|
|
||
| public :: base64_encode, base64_encode_into | ||
| public :: base64_decode, base64_decode_into |
There was a problem hiding this comment.
Why did you use this approach instead of using submodules?
There was a problem hiding this comment.
There was no specific technical reason. I originally used the separate module approach for organization. However, after looking into the benefits of submodules for reducing compilation cascades and maintaining a cleaner public interface, I have refactored the implementation to use them.
| public :: base64_decode | ||
| public :: base64_decode_into | ||
|
|
||
| integer(int8), parameter :: DT(0:255) = int( [ & |
There was a problem hiding this comment.
What are these number? Please provide some info.
There was a problem hiding this comment.
DT is an O(1) branchless lookup table that maps the 256 ASCII characters directly to their 6-bit Base64 integer values. By deliberately mapping all invalid characters to -1, we can safely decode the input without any if statements and validate the entire stream at the end using a single bitwise OR reduction, which significantly boosts loop vectorization.
I have added a short line about it in the file too.
| ], int8) | ||
|
|
||
| ! 0 for whitespace/control chars, 1 for valid chars | ||
| integer(int32), parameter :: IS_VAL(0:255) = int([ & |
There was a problem hiding this comment.
IS_VAL is a branchless stream-compaction mask used during the despacing step, mapping whitespace/control characters to 0 and valid Base64 characters to 1. This allows the pointer to advance mathematically (j = j + IS_VAL(c)) rather than conditionally, keeping the CPU pipeline full by avoiding branch mispredictions on unpredictable whitespace.
I have added a short line about it in the file too.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1171 +/- ##
==========================================
+ Coverage 68.04% 68.86% +0.82%
==========================================
Files 404 409 +5
Lines 12948 13713 +765
Branches 1395 1551 +156
==========================================
+ Hits 8810 9443 +633
- Misses 4138 4270 +132 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: RatanKokal <ratanskokal@gmail.com>
…ulator Replaced O(N) string searches with an instant Look-Up Table and implemented a 32-bit accumulator to process data in a single pass. Eliminated multi-pass string packing, removed helper function overhead (dead code elimination), and pre-allocated memory to prevent expensive reallocations. Signed-off-by: RatanKokal <ratanskokal@gmail.com>
This commit addresses reviewer feedback regarding memory safety, type portability, and documentation for the pure Fortran implementation. Key changes: - API Alignment: Updated base64_encode_into to mirror the decoder API by returning encoded_len and error_flag. - Memory Safety: Added explicit blank-filling and strict bounds checking to the encoder to prevent silent buffer overflows. - Branchless Bounds Checking: Implemented an iand(..., 255_int32) mask in the decoder to safely handle non-ASCII codepoints. - Strict Portability: Enforced int32 type consistency across all bitwise intrinsics and literal masks. - Documentation: Added FORD headers to the umbrella module. Signed-off-by: RatanKokal <ratanskokal@gmail.com>
This commit hardens the base64 implementation by addressing standard compliance issues and optimizing the hot paths for the zero-copy API.
Key changes:
* Compliance: Replaced non-conforming 'c_loc' and 'c_f_pointer' usage with 'transfer' and 'select rank' blocks to ensure compatibility with non-C-interoperable types.
* Correctness: Fixed a Unicode aliasing bug in the decoder by implementing a bitwise-OR 'unicode_accum' check to reject non-ASCII characters (>255).
* Performance: Optimized 'base64_encode_into' by moving string blank-filling ('str = ""') off the hot path, preventing unnecessary memory operations during high-throughput encoding.
* API Refinement: Restricted the 'base64_encode_into' power-user API to accept only 'integer(int8)' arrays to guarantee zero-copy performance while remaining standard-compliant.
* Ergonomics: Added an optional 'error_flag' to 'base64_decode' to improve error handling for the allocating API.
* Testing: Expanded unit tests to include coverage for the new subroutine signatures and error states.
Signed-off-by: RatanKokal <ratanskokal@gmail.com>
This commit addresses the architectural and stylistic feedback for the Base64 implementation: * Architecture: Consolidated interfaces into a single stdlib_base64 parent module and moved the high-performance logic into encode and decode submodules to prevent compilation cascades. * Standards: Replaced the logical error flag with type(state_type) from stdlib_error for consistent library-wide error handling. * Optimization: Marked the core _into subroutines as pure to guarantee no side effects and allow for aggressive compiler optimizations. * Dependency: Imported base64_alphabet directly from stdlib_ascii to reduce redundancy. * Documentation: Added explanatory comments for the branchless DT and IS_VAL lookup tables. * Formatting: Expanded semicolon-separated multi-statement lines into single lines to improve debuggability and prevent CI truncation errors. Signed-off-by: RatanKokal <ratanskokal@gmail.com>
Adds support for base64
Solves issue: #916
This PR provides a portable, branchless scalar implementation. A follow-up PR is in development to add an AVX2-accelerated backend for supported x86_64 architectures.