fix: use wrong sample_rate #5

Merged
Moskize91 merged 3 commits into main from sample
Sep 25, 2025

Conversation

@Moskize91
Contributor

No description provided.

@coderabbitai

coderabbitai bot commented Sep 25, 2025

Summary by CodeRabbit

  • New Features
    • Automatic sample-rate alignment across all generated audio segments for consistent playback without manual configuration.
  • Bug Fixes
    • Resolves mismatched audio rates between segments that could cause playback artifacts.
  • Tests
    • Simplified Chapter TTS tests to run without external services.
    • Added end-to-end M4B generation tests, including multiple voice variations, executed when credentials are available.
  • Chores
    • Added a new dependency to support audio resampling.

Walkthrough

  • epub2speech/chapter_tts.py: Removed sample_rate from ChapterTTS.__init__; audio processing now infers the final sample rate from the first segment. Subsequent segments are resampled to the first segment's rate using scipy.signal.resample. audio_segments is collected as (array, rate) tuples, with typing hints added. Final output is written at the first segment's rate.
  • pyproject.toml: Added scipy (>=1.10.0,<2.0.0) dependency.
  • tests/test_chapter_tts.py: Removed Azure-backed integration test; kept and refactored CJK sentence-splitting using a MockTTSProtocol and new temp paths.
  • tests/test_m4b_generation.py: Added new integration tests for EPUB-to-M4B conversion via Azure TTS, including voice variants and file existence/size checks.
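
The alignment logic described in the chapter_tts.py bullet can be sketched as follows; the segment data here is synthetic and the variable names are illustrative, not the module's exact code:

```python
import numpy as np
from scipy.signal import resample

# Hypothetical segments at mixed rates: (samples, sample_rate) tuples
audio_segments = [
    (np.zeros(24000, dtype=np.float32), 24000),  # 1 s at 24 kHz
    (np.zeros(16000, dtype=np.float32), 16000),  # 1 s at 16 kHz
]

# The first segment's rate becomes the target for the whole chapter
_, first_sample_rate = audio_segments[0]

resampled_segments = []
for audio_data, sample_rate in audio_segments:
    if sample_rate != first_sample_rate:
        # Scale the length so the segment's duration is preserved
        new_len = int(len(audio_data) * first_sample_rate / sample_rate)
        audio_data = resample(audio_data, new_len)
    resampled_segments.append(audio_data)

final_audio = np.concatenate(resampled_segments)
print(len(final_audio), first_sample_rate)  # 48000 24000
```

Both one-second segments end up at 24 kHz, so the concatenated chapter is two seconds of audio at a single rate.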

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request has no description, so there is no author-provided context or summary of the changes to confirm their relevance or purpose. Please add a concise description of the pull request that outlines its objectives and summarizes the key changes, such as the removal of the sample_rate parameter and the introduction of dynamic resampling.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title follows the required <type>: <subject> format and references the sample_rate change, but it only partially summarizes the main fix since it does not clearly describe the removal of the sample_rate parameter and the new dynamic resampling logic.


@Moskize91 Moskize91 merged commit dd44e9e into main Sep 25, 2025
2 of 3 checks passed
@Moskize91 Moskize91 deleted the sample branch September 25, 2025 12:01

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (6)
epub2speech/chapter_tts.py (2)

7-7: Prefer resample_poly for better quality and speed.

scipy.signal.resample (FFT-based) can introduce ringing and is computationally heavier. resample_poly generally yields better quality and performance for arbitrary rate ratios.

Apply this diff to switch to polyphase resampling:

-from scipy.signal import resample
+from scipy.signal import resample_poly

And replace the resampling call inside the loop:

-                        target_len = max(1, round(len(audio_data) * first_sample_rate / sample_rate))
-                        resampled_audio = resample(audio_data, target_len, axis=0)
+                        # polyphase resampling with integer factors
+                        from math import gcd
+                        g = gcd(first_sample_rate, sample_rate)
+                        up = first_sample_rate // g
+                        down = sample_rate // g
+                        resampled_audio = resample_poly(audio_data, up, down, axis=0)
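
As a quick sanity check of the polyphase approach suggested above (the rates here are examples, not values from the PR):

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

first_sample_rate, sample_rate = 24000, 16000        # example rates
audio_data = np.zeros((16000, 2), dtype=np.float32)  # 1 s of stereo at 16 kHz

# Reduce the rate ratio to coprime integer up/down factors for resample_poly
g = gcd(first_sample_rate, sample_rate)
up, down = first_sample_rate // g, sample_rate // g  # 3, 2
resampled_audio = resample_poly(audio_data, up, down, axis=0)
print(resampled_audio.shape)  # (24000, 2)
```

16000 × 3 / 2 = 24000 samples, i.e. the same one second of audio at the target rate, with channel layout preserved via axis=0.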

54-56: Consistency: prefer builtin generics or typing generics uniformly.

The file mixes list[...] and List[...]. Consider standardizing on builtin generics (Python 3.10+).

tests/test_chapter_tts.py (2)

76-82: Assert output was created and encoded as expected.

Strengthen the test by verifying the WAV exists, is non-empty, and uses the first segment’s sample rate (24k).

Apply this diff:

             output_path = output_dir / f"mock_{test_case['name'].replace(' ', '_').lower()}.wav"
             chapter_tts.process_chapter(
                 text=test_case['text'],
                 output_path=output_path,
                 workspace_path=temp_dir,
                 voice="mock-voice"
             )
+            # Validate output
+            self.assertTrue(output_path.exists(), f"Output not created: {output_path}")
+            import soundfile as sf
+            data, sr = sf.read(output_path)
+            self.assertGreater(len(data), 0, "Output audio is empty")
+            self.assertEqual(sr, 24000, "Final output should use the first segment's sample rate")

63-67: Clean up temp artifacts in tests.

Use TemporaryDirectory or remove created files on success to avoid polluting the repo.
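
One way to do that, sketched with a stand-in for the real TTS output (the class and file names here are illustrative):

```python
import tempfile
import unittest
from pathlib import Path


class TempDirExample(unittest.TestCase):
    def test_output_is_cleaned_up(self):
        # TemporaryDirectory deletes its contents on exit, so nothing
        # is left behind in the repo even when the assertion fails.
        with tempfile.TemporaryDirectory() as tmp:
            output_path = Path(tmp) / "mock_output.wav"
            output_path.write_bytes(b"\x00")  # stand-in for real TTS output
            self.assertTrue(output_path.exists())


if __name__ == "__main__":
    unittest.main()
```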

tests/test_m4b_generation.py (2)

1-1: Remove shebang or make file executable.

Shebang in a non-executable test triggers EXE001. Easiest fix: remove it.

Apply this diff:

-#!/usr/bin/env python3

21-26: Skip messaging is fine; consider gating via env var too.

Optionally add an env flag (e.g., RUN_AZURE_TTS_TESTS) to skip quickly in CI without filesystem checks.
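
A minimal sketch of that gate; RUN_AZURE_TTS_TESTS is the flag name floated above and the test body is a placeholder, not the PR's actual test:

```python
import os
import unittest

RUN_AZURE_TTS_TESTS = os.environ.get("RUN_AZURE_TTS_TESTS") == "1"


@unittest.skipUnless(RUN_AZURE_TTS_TESTS, "set RUN_AZURE_TTS_TESTS=1 to run Azure TTS tests")
class M4BGenerationTests(unittest.TestCase):
    def test_basic_conversion(self):
        # The real test would call the EPUB-to-M4B conversion here.
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```

In CI without the flag, the whole class is reported as skipped immediately, with no filesystem or credential checks performed.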

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2daac05 and 397a110.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • epub2speech/chapter_tts.py (3 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_chapter_tts.py (1 hunks)
  • tests/test_m4b_generation.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_m4b_generation.py (3)
epub2speech/convertor.py (1)
  • convert_epub_to_m4b (121-139)
epub2speech/tts/azure_provider.py (1)
  • AzureTextToSpeech (10-71)
tests/utils/config.py (2)
  • TTSConfig (9-43)
  • get_azure_config (29-30)
🪛 Ruff (0.13.1)
tests/test_m4b_generation.py

1-1: Shebang is present but file is not executable

(EXE001)

🔇 Additional comments (2)
epub2speech/chapter_tts.py (1)

41-48: Signature change safe: no remaining sample_rate args
Verified no calls to ChapterTTS(...) include a sample_rate parameter in tests or production code.

pyproject.toml (1)

36-36: No change needed: existing SciPy constraints ensure prebuilt wheels for Python 3.12/3.13.
Pip will install the latest SciPy (<2.0.0)—SciPy 1.15.x provides wheels for both Python 3.12 and 3.13.

Comment on lines +78 to 91
if audio_segments:
_, first_sample_rate = audio_segments[0]
resampled_segments = []
for audio_data, sample_rate in audio_segments:
if sample_rate != first_sample_rate:
resampled_length = int(len(audio_data) * first_sample_rate / sample_rate)
resampled_audio = resample(audio_data, resampled_length)
resampled_segments.append(resampled_audio)
else:
resampled_segments.append(audio_data)

final_audio = np.concatenate(resampled_segments)
sf.write(output_path, final_audio, first_sample_rate)


⚠️ Potential issue | 🟠 Major

Channel mismatch can break concatenation; align channels before resampling/concat.

If some segments are mono and others stereo, np.concatenate will fail or produce inconsistent results. Align to the first segment’s channel count before resampling.

Apply this diff to make channel count consistent and specify resampling axis:

-            if audio_segments:
-                _, first_sample_rate = audio_segments[0]
-                resampled_segments = []
-                for audio_data, sample_rate in audio_segments:
-                    if sample_rate != first_sample_rate:
-                        resampled_length = int(len(audio_data) * first_sample_rate / sample_rate)
-                        resampled_audio = resample(audio_data, resampled_length)
-                        resampled_segments.append(resampled_audio)
-                    else:
-                        resampled_segments.append(audio_data)
-
-                final_audio = np.concatenate(resampled_segments)
-                sf.write(output_path, final_audio, first_sample_rate)
+            if audio_segments:
+                first_audio, first_sample_rate = audio_segments[0]
+                first_channels = 1 if first_audio.ndim == 1 else first_audio.shape[1]
+
+                resampled_segments: list[np.ndarray] = []
+                for audio_data, sample_rate in audio_segments:
+                    # Channel align to match the first segment
+                    if audio_data.ndim == 1 and first_channels > 1:
+                        audio_data = np.repeat(audio_data[:, None], first_channels, axis=1)
+                    elif audio_data.ndim == 2:
+                        ch = audio_data.shape[1]
+                        if ch != first_channels:
+                            if first_channels == 1:
+                                audio_data = audio_data.mean(axis=1)  # downmix to mono
+                            else:
+                                # upmix by duplicating the first channel
+                                audio_data = np.tile(audio_data[:, [0]], (1, first_channels))
+
+                    # Resample if needed
+                    if sample_rate != first_sample_rate:
+                        target_len = max(1, round(len(audio_data) * first_sample_rate / sample_rate))
+                        resampled_audio = resample(audio_data, target_len, axis=0)
+                        resampled_segments.append(resampled_audio.astype(np.float32, copy=False))
+                    else:
+                        resampled_segments.append(audio_data.astype(np.float32, copy=False))
+
+                final_audio = np.concatenate(resampled_segments, axis=0)
+                sf.write(output_path, final_audio, first_sample_rate)
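
A quick check of the upmix/downmix steps used in the diff above, on tiny synthetic arrays:

```python
import numpy as np

mono = np.array([0.1, 0.2, 0.3], dtype=np.float32)

# Upmix mono -> stereo by duplicating the channel, as in the fix above
stereo = np.repeat(mono[:, None], 2, axis=1)
print(stereo.shape)  # (3, 2)

# Downmix stereo -> mono by averaging channels; duplicated channels
# average back to the original signal
downmixed = stereo.mean(axis=1)
print(np.allclose(downmixed, mono))  # True
```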

