Skip to content

feat: add base64 support#1171

Open
RatanKokal wants to merge 5 commits intofortran-lang:masterfrom
RatanKokal:wip_base64
Open

feat: add base64 support#1171
RatanKokal wants to merge 5 commits intofortran-lang:masterfrom
RatanKokal:wip_base64

Conversation

@RatanKokal
Copy link
Copy Markdown

@RatanKokal RatanKokal commented Mar 31, 2026

Adds support for base64

Solves issue: #916

This PR provides a portable, branchless scalar implementation. A follow-up PR is in development to add an AVX2-accelerated backend for supported x86_64 architectures.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces Base64 encoding/decoding support to stdlib, integrating it into the strings component and adding a dedicated test suite (per issue #916).

Changes:

  • Added new stdlib_base64 module implementing base64_encode (generic over numeric/logical types) and base64_decode.
  • Re-exported base64_encode/base64_decode via stdlib_strings.
  • Added Base64 unit tests and wired them into the string test CMake configuration.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/strings/stdlib_base64.fypp New Base64 implementation (encode/decode).
src/strings/stdlib_strings.fypp Re-exports Base64 APIs from stdlib_base64.
src/strings/CMakeLists.txt Adds stdlib_base64.fypp to strings build sources.
test/string/test_base64.f90 Adds unit tests for known vectors, whitespace handling, invalid input, and roundtrips.
test/string/CMakeLists.txt Registers the new base64 test target.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@RatanKokal
Copy link
Copy Markdown
Author

@jvdp1 Could you please have a look? I have tackled all possible issues.

Working on a performance improvement using avx2 specific instructions which was the ultimate goal of the resources shared in the issue. Upon completion, would increase speeds drastically.

Since fortran is depndent on autovectorizers, it may not always lead to the optimall register management as can be controlled using C intrinsics.

Will still have a architecture fallback. Architecture specific kernels can be added to increase performance in future.

Copy link
Copy Markdown
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @RatanKokal for this PR. Overall it looks already very good to me.

Maybe some general comments:

  • I suggest to use state_type (provided by stdlib_error to handle errors
  • I think that some procedures could be declared pure
  • Is there any reason to prefer module instead of using submodule?

Comment on lines +13 to +20
use stdlib_base64_encode, only: base64_encode, base64_encode_into
use stdlib_base64_decode, only: base64_decode, base64_decode_into

implicit none
private

public :: base64_encode, base64_encode_into
public :: base64_decode, base64_decode_into
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you use this approach instead of using submodules?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was no specific technical reason. I originally used the separate module approach for organization. However, after looking into the benefits of submodules for reducing compilation cascades and maintaining a cleaner public interface, I have refactored the implementation to use them.

public :: base64_decode
public :: base64_decode_into

integer(int8), parameter :: DT(0:255) = int( [ &
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these number? Please provide some info.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DT is an O(1) branchless lookup table that maps the 256 ASCII characters directly to their 6-bit Base64 integer values. By deliberately mapping all invalid characters to -1, we can safely decode the input without any if statements and validate the entire stream at the end using a single bitwise OR reduction, which significantly boosts loop vectorization.

I have added a short line about it in the file too.

], int8)

! 0 for whitespace/control chars, 1 for valid chars
integer(int32), parameter :: IS_VAL(0:255) = int([ &
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are these values?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IS_VAL is a branchless stream-compaction mask used during the despacing step, mapping whitespace/control characters to 0 and valid Base64 characters to 1. This allows the pointer to advance mathematically (j = j + IS_VAL(c)) rather than conditionally, keeping the CPU pipeline full by avoiding branch mispredictions on unpredictable whitespace.

I have added a short line about it in the file too.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 5, 2026

Codecov Report

❌ Patch coverage is 97.87234% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.86%. Comparing base (dae9d4b) to head (2e7fad9).
⚠️ Report is 52 commits behind head on master.

Files with missing lines Patch % Lines
test/string/test_base64.f90 97.87% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1171      +/-   ##
==========================================
+ Coverage   68.04%   68.86%   +0.82%     
==========================================
  Files         404      409       +5     
  Lines       12948    13713     +765     
  Branches     1395     1551     +156     
==========================================
+ Hits         8810     9443     +633     
- Misses       4138     4270     +132     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: RatanKokal <ratanskokal@gmail.com>
…ulator

Replaced O(N) string searches with an instant Look-Up Table and implemented a 32-bit accumulator to process data in a single pass. Eliminated multi-pass string packing, removed helper function overhead (dead code elimination), and pre-allocated memory to prevent expensive reallocations.

Signed-off-by: RatanKokal <ratanskokal@gmail.com>
This commit addresses reviewer feedback regarding memory safety, type portability, and documentation for the pure Fortran implementation.

Key changes:
- API Alignment: Updated base64_encode_into to mirror the decoder API by returning encoded_len and error_flag.
- Memory Safety: Added explicit blank-filling and strict bounds checking to the encoder to prevent silent buffer overflows.
- Branchless Bounds Checking: Implemented an iand(..., 255_int32) mask in the decoder to safely handle non-ASCII codepoints.
- Strict Portability: Enforced int32 type consistency across all bitwise intrinsics and literal masks.
- Documentation: Added FORD headers to the umbrella module.

Signed-off-by: RatanKokal <ratanskokal@gmail.com>
This commit hardens the base64 implementation by addressing standard compliance issues and optimizing the hot paths for the zero-copy API.

Key changes:
* Compliance: Replaced non-conforming 'c_loc' and 'c_f_pointer' usage with 'transfer' and 'select rank' blocks to ensure compatibility with non-C-interoperable types.
* Correctness: Fixed a Unicode aliasing bug in the decoder by implementing a bitwise-OR 'unicode_accum' check to reject non-ASCII characters (>255).
* Performance: Optimized 'base64_encode_into' by moving string blank-filling ('str = ""') off the hot path, preventing unnecessary memory operations during high-throughput encoding.
* API Refinement: Restricted the 'base64_encode_into' power-user API to accept only 'integer(int8)' arrays to guarantee zero-copy performance while remaining standard-compliant.
* Ergonomics: Added an optional 'error_flag' to 'base64_decode' to improve error handling for the allocating API.
* Testing: Expanded unit tests to include coverage for the new subroutine signatures and error states.

Signed-off-by: RatanKokal <ratanskokal@gmail.com>
This commit addresses the architectural and stylistic feedback for the Base64 implementation:

* Architecture: Consolidated interfaces into a single stdlib_base64 parent module and moved the high-performance logic into encode and decode submodules to prevent compilation cascades.
* Standards: Replaced the logical error flag with type(state_type) from stdlib_error for consistent library-wide error handling.
* Optimization: Marked the core _into subroutines as pure to guarantee no side effects and allow for aggressive compiler optimizations.
* Dependency: Imported base64_alphabet directly from stdlib_ascii to reduce redundancy.
* Documentation: Added explanatory comments for the branchless DT and IS_VAL lookup tables.
* Formatting: Expanded semicolon-separated multi-statement lines into single lines to improve debuggability and prevent CI truncation errors.

Signed-off-by: RatanKokal <ratanskokal@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants