Conversation

Summary by CodeRabbit

Walkthrough

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks: ❌ Failed checks (1 warning) | ✅ Passed checks (1 passed)
Actionable comments posted: 1
🧹 Nitpick comments (6)
epub2speech/chapter_tts.py (2)
7-7: Prefer `resample_poly` for better quality and speed.

`scipy.signal.resample` (FFT-based) can introduce ringing and is heavier. `resample_poly` generally yields better quality and performance for arbitrary ratios.

Apply this diff to switch to polyphase resampling:

```diff
-from scipy.signal import resample
+from scipy.signal import resample_poly
```

And replace the resampling call inside the loop:

```diff
-                    target_len = max(1, round(len(audio_data) * first_sample_rate / sample_rate))
-                    resampled_audio = resample(audio_data, target_len, axis=0)
+                    # polyphase resampling with integer factors
+                    from math import gcd
+                    g = gcd(first_sample_rate, sample_rate)
+                    up = first_sample_rate // g
+                    down = sample_rate // g
+                    resampled_audio = resample_poly(audio_data, up, down, axis=0)
```
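As a quick, self-contained check of the polyphase approach (the 22050 → 24000 Hz rates are illustrative, not from the codebase; only numpy and `scipy.signal.resample_poly` are used):

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

# Resample a 1-second 440 Hz tone from 22050 Hz to 24000 Hz.
src_rate, dst_rate = 22050, 24000
t = np.linspace(0, 1, src_rate, endpoint=False)
audio = np.sin(2 * np.pi * 440 * t).astype(np.float32)

# Reduce the rate ratio to coprime integer up/down factors.
g = gcd(dst_rate, src_rate)
up, down = dst_rate // g, src_rate // g
resampled = resample_poly(audio, up, down, axis=0)

# Output length scales exactly by the rate ratio here.
print(len(resampled))  # 24000
```

Since the ratio reduces to 160/147, the polyphase filter runs with small integer factors rather than an FFT over the whole signal.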
54-56: Consistency: prefer builtin generics or typing generics uniformly.

The file mixes `list[...]` and `List[...]`. Consider standardizing on builtin generics (Python 3.10+).
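For illustration, the builtin-generic style needs no `typing` import (the `chunk` helper here is hypothetical, not from the codebase):

```python
def chunk(items: list[str], size: int) -> list[list[str]]:
    # Builtin generics (list[...]) work directly in annotations on modern Python.
    return [items[i:i + size] for i in range(0, len(items), size)]

print(chunk(["a", "b", "c"], 2))  # [['a', 'b'], ['c']]
```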
tests/test_chapter_tts.py (2)
76-82: Assert output was created and encoded as expected.

Strengthen the test by verifying the WAV exists, is non-empty, and uses the first segment's sample rate (24k).

Apply this diff:

```diff
             output_path = output_dir / f"mock_{test_case['name'].replace(' ', '_').lower()}.wav"
             chapter_tts.process_chapter(
                 text=test_case['text'],
                 output_path=output_path,
                 workspace_path=temp_dir,
                 voice="mock-voice"
             )
+            # Validate output
+            self.assertTrue(output_path.exists(), f"Output not created: {output_path}")
+            import soundfile as sf
+            data, sr = sf.read(output_path)
+            self.assertGreater(len(data), 0, "Output audio is empty")
+            self.assertEqual(sr, 24000, "Final output should use the first segment's sample rate")
```
63-67: Clean up temp artifacts in tests.

Use `TemporaryDirectory` or remove created files on success to avoid polluting the repo.
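A minimal sketch of the `TemporaryDirectory` pattern (the `write_bytes` call stands in for the real TTS invocation; names are illustrative):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    output_path = Path(tmp) / "mock_case.wav"
    output_path.write_bytes(b"fake wav data")  # stand-in for the TTS call
    assert output_path.exists()
# On exit, the directory and every file inside it are removed automatically,
# even if the test body raises.
```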
tests/test_m4b_generation.py (2)
1-1: Remove shebang or make file executable.

Shebang in a non-executable test triggers EXE001. Easiest fix: remove it.

Apply this diff:

```diff
-#!/usr/bin/env python3
```
21-26: Skip messaging is fine; consider gating via env var too.

Optionally add an env flag (e.g., RUN_AZURE_TTS_TESTS) to skip quickly in CI without filesystem checks.
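A sketch of such a gate, assuming the proposed `RUN_AZURE_TTS_TESTS` variable; the test class and its body are placeholders, not the repo's real tests:

```python
import os
import unittest

# Opt-in flag: the suite is skipped unless the caller sets RUN_AZURE_TTS_TESTS=1.
RUN_AZURE_TTS_TESTS = os.environ.get("RUN_AZURE_TTS_TESTS") == "1"

@unittest.skipUnless(RUN_AZURE_TTS_TESTS, "Set RUN_AZURE_TTS_TESTS=1 to run Azure TTS tests")
class TestM4bGeneration(unittest.TestCase):
    def test_placeholder(self):
        # Real Azure-backed assertions would go here.
        self.assertTrue(True)
```

In CI without the flag, the class is skipped before any filesystem or credential checks run, which keeps the skip path fast.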
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (4)
- epub2speech/chapter_tts.py (3 hunks)
- pyproject.toml (1 hunks)
- tests/test_chapter_tts.py (1 hunks)
- tests/test_m4b_generation.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_m4b_generation.py (3)
- epub2speech/convertor.py (1): convert_epub_to_m4b (121-139)
- epub2speech/tts/azure_provider.py (1): AzureTextToSpeech (10-71)
- tests/utils/config.py (2): TTSConfig (9-43), get_azure_config (29-30)
🪛 Ruff (0.13.1)
tests/test_m4b_generation.py
1-1: Shebang is present but file is not executable
(EXE001)
🔇 Additional comments (2)
epub2speech/chapter_tts.py (1)
41-48: Signature change safe: no remaining `sample_rate` args

Verified no calls to `ChapterTTS(...)` include a `sample_rate` parameter in tests or production code.

pyproject.toml (1)
36-36: No change needed: existing SciPy constraints ensure prebuilt wheels for Python 3.12/3.13.

Pip will install the latest SciPy (<2.0.0); SciPy 1.15.x provides wheels for both Python 3.12 and 3.13.
```python
        if audio_segments:
            _, first_sample_rate = audio_segments[0]
            resampled_segments = []
            for audio_data, sample_rate in audio_segments:
                if sample_rate != first_sample_rate:
                    resampled_length = int(len(audio_data) * first_sample_rate / sample_rate)
                    resampled_audio = resample(audio_data, resampled_length)
                    resampled_segments.append(resampled_audio)
                else:
                    resampled_segments.append(audio_data)

            final_audio = np.concatenate(resampled_segments)
            sf.write(output_path, final_audio, first_sample_rate)
```
Channel mismatch can break concatenation; align channels before resampling/concat.
If some segments are mono and others stereo, np.concatenate will fail or produce inconsistent results. Align to the first segment’s channel count before resampling.
Apply this diff to make channel count consistent and specify resampling axis:
```diff
-        if audio_segments:
-            _, first_sample_rate = audio_segments[0]
-            resampled_segments = []
-            for audio_data, sample_rate in audio_segments:
-                if sample_rate != first_sample_rate:
-                    resampled_length = int(len(audio_data) * first_sample_rate / sample_rate)
-                    resampled_audio = resample(audio_data, resampled_length)
-                    resampled_segments.append(resampled_audio)
-                else:
-                    resampled_segments.append(audio_data)
-
-            final_audio = np.concatenate(resampled_segments)
-            sf.write(output_path, final_audio, first_sample_rate)
+        if audio_segments:
+            first_audio, first_sample_rate = audio_segments[0]
+            first_channels = 1 if first_audio.ndim == 1 else first_audio.shape[1]
+
+            resampled_segments: list[np.ndarray] = []
+            for audio_data, sample_rate in audio_segments:
+                # Channel align to match the first segment
+                if audio_data.ndim == 1 and first_channels > 1:
+                    audio_data = np.repeat(audio_data[:, None], first_channels, axis=1)
+                elif audio_data.ndim == 2:
+                    ch = audio_data.shape[1]
+                    if ch != first_channels:
+                        if first_channels == 1:
+                            audio_data = audio_data.mean(axis=1)  # downmix to mono
+                        else:
+                            # upmix by duplicating the first channel
+                            audio_data = np.tile(audio_data[:, [0]], (1, first_channels))
+
+                # Resample if needed
+                if sample_rate != first_sample_rate:
+                    target_len = max(1, round(len(audio_data) * first_sample_rate / sample_rate))
+                    resampled_audio = resample(audio_data, target_len, axis=0)
+                    resampled_segments.append(resampled_audio.astype(np.float32, copy=False))
+                else:
+                    resampled_segments.append(audio_data.astype(np.float32, copy=False))
+
+            final_audio = np.concatenate(resampled_segments, axis=0)
+            sf.write(output_path, final_audio, first_sample_rate)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        if audio_segments:
            first_audio, first_sample_rate = audio_segments[0]
            first_channels = 1 if first_audio.ndim == 1 else first_audio.shape[1]

            resampled_segments: list[np.ndarray] = []
            for audio_data, sample_rate in audio_segments:
                # Channel align to match the first segment
                if audio_data.ndim == 1 and first_channels > 1:
                    audio_data = np.repeat(audio_data[:, None], first_channels, axis=1)
                elif audio_data.ndim == 2:
                    ch = audio_data.shape[1]
                    if ch != first_channels:
                        if first_channels == 1:
                            audio_data = audio_data.mean(axis=1)  # downmix to mono
                        else:
                            # upmix by duplicating the first channel
                            audio_data = np.tile(audio_data[:, [0]], (1, first_channels))

                # Resample if needed
                if sample_rate != first_sample_rate:
                    target_len = max(1, round(len(audio_data) * first_sample_rate / sample_rate))
                    resampled_audio = resample(audio_data, target_len, axis=0)
                    resampled_segments.append(resampled_audio.astype(np.float32, copy=False))
                else:
                    resampled_segments.append(audio_data.astype(np.float32, copy=False))

            final_audio = np.concatenate(resampled_segments, axis=0)
            sf.write(output_path, final_audio, first_sample_rate)
```
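To see the suggested alignment logic end to end, here is a standalone run with synthetic segments (one stereo at 24 kHz, one mono at 22.05 kHz); the `sf.write` call is omitted so the sketch only needs numpy and scipy:

```python
import numpy as np
from scipy.signal import resample

# Synthetic segments: (samples, channels) stereo, then plain mono.
segments = [
    (np.zeros((24000, 2), dtype=np.float32), 24000),  # 1 s stereo @ 24 kHz
    (np.zeros(22050, dtype=np.float32), 22050),       # 1 s mono @ 22.05 kHz
]

first_audio, first_sample_rate = segments[0]
first_channels = 1 if first_audio.ndim == 1 else first_audio.shape[1]

resampled_segments = []
for audio_data, sample_rate in segments:
    # Upmix mono to match the first segment's channel count.
    if audio_data.ndim == 1 and first_channels > 1:
        audio_data = np.repeat(audio_data[:, None], first_channels, axis=1)
    # Resample along the time axis to the first segment's rate.
    if sample_rate != first_sample_rate:
        target_len = max(1, round(len(audio_data) * first_sample_rate / sample_rate))
        audio_data = resample(audio_data, target_len, axis=0)
    resampled_segments.append(audio_data.astype(np.float32, copy=False))

final_audio = np.concatenate(resampled_segments, axis=0)
print(final_audio.shape)  # (48000, 2)
```

Both seconds of audio land at 24 kHz stereo, so the concatenation succeeds; with the original code the mono segment would have stayed 1-D and `np.concatenate` would raise a dimension mismatch.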
No description provided.