# Feature Proposal: Real-Time Streaming MIDI Output Support
## Summary

This feature request proposes the addition of real-time streaming MIDI output to `basic-pitch`, allowing the system to process and output MIDI data concurrently with audio input. This would significantly expand the usability of `basic-pitch` in live performance, educational, and DAW-integration contexts.

## Motivation

Currently, `basic-pitch` operates as a batch audio-to-MIDI converter, requiring the full audio file to be processed before producing MIDI output. While effective for offline applications, this architecture limits the tool’s applicability in live scenarios. Real-time audio-to-MIDI conversion has growing demand in:
- Live instrument-to-MIDI conversion for digital audio workstations (DAWs)
- Music education platforms requiring instant feedback
- Interactive composition and improvisation tools
- Low-latency MIDI controllers for experimental performance setups
Several commercial and research-grade tools provide real-time capabilities (e.g., JamOrigin MIDI Guitar, AIO MIDINet, and various ONNX-based pipelines), but few offer open-source solutions with the transcription accuracy that basic-pitch provides.
## Proposed Implementation
A modular, low-latency real-time streaming pipeline could be introduced as an extension of the existing model. Suggested steps include:
### Input Handling
- Use `pyaudio`, `sounddevice`, or other low-latency libraries to stream audio input directly from a microphone or system source.
- Implement windowed audio buffering with overlap to allow continuous model inference.
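The windowed buffering step above could be sketched as follows. This is a hypothetical sketch, not an existing basic-pitch API: `OverlapBuffer`, `FRAME_SIZE`, and `HOP_SIZE` are illustrative names and values.

```python
import numpy as np

FRAME_SIZE = 2048  # illustrative frame size in samples
HOP_SIZE = 1024    # illustrative hop; 50% overlap between frames

class OverlapBuffer:
    """Accumulates incoming audio chunks and yields fixed-size,
    overlapping frames suitable for continuous model inference."""

    def __init__(self, frame_size=FRAME_SIZE, hop_size=HOP_SIZE):
        self.frame_size = frame_size
        self.hop_size = hop_size
        self._buf = np.zeros(0, dtype=np.float32)

    def push(self, chunk):
        """Append a chunk of samples; return all frames now complete."""
        chunk = np.asarray(chunk, dtype=np.float32)
        self._buf = np.concatenate([self._buf, chunk])
        frames = []
        while len(self._buf) >= self.frame_size:
            frames.append(self._buf[: self.frame_size].copy())
            # Advance by the hop size only, keeping the overlap region
            self._buf = self._buf[self.hop_size :]
        return frames
```

In practice, a `sounddevice.InputStream` callback (or the `pyaudio` equivalent) would call `push()` with each incoming chunk and hand the returned frames to the model.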
### Inference Adaptation
- Adapt the inference loop to process fixed-size frames (e.g., 2048 or 4096 samples) in real time.
- Introduce incremental model state management to preserve performance across audio frames.
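One way to sketch the incremental state management: keep a short rolling history of recent frames and present it to the model as temporal context on each step. `StreamingInference` and the `model` callable are hypothetical; basic-pitch's actual model interface differs.

```python
import numpy as np
from collections import deque

class StreamingInference:
    """Hypothetical wrapper that preserves state across audio frames by
    keeping a short history of recent frames as temporal context.
    `model` is any callable over a 1-D array (assumption)."""

    def __init__(self, model, context_frames=4):
        self.model = model
        # Rolling history; old frames fall off automatically
        self.context = deque(maxlen=context_frames)

    def process(self, frame):
        self.context.append(frame)
        # Concatenate recent frames so each inference sees some context
        window = np.concatenate(list(self.context))
        return self.model(window)
```

A real implementation would more likely carry hidden model state (or cached activations) forward rather than re-running over concatenated raw audio, but the wrapper shape would be similar.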
### Streaming Output
- Emit MIDI note events incrementally using a ring buffer or FIFO stream.
- Optionally expose a MIDI output via `mido`, `rtmidi`, or similar libraries for live routing to DAWs or synthesizers.
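A minimal sketch of incremental event emission: track which pitches are currently sounding and queue only the note-on/note-off deltas each frame. `NoteEventEmitter` and the `(event, pitch)` tuple format are illustrative, not an existing interface.

```python
class NoteEventEmitter:
    """Turns per-frame sets of active MIDI pitches into an incremental
    FIFO of note-on / note-off events."""

    def __init__(self):
        self.active = set()  # pitches currently sounding
        self.queue = []      # FIFO of (event_type, pitch) tuples

    def update(self, pitches):
        """Compare this frame's detections against the previous state
        and queue only the changes."""
        pitches = set(pitches)
        for p in sorted(pitches - self.active):
            self.queue.append(("note_on", p))
        for p in sorted(self.active - pitches):
            self.queue.append(("note_off", p))
        self.active = pitches
```

Each queued event could then be routed live, e.g. via `mido`: `port.send(mido.Message('note_on', note=pitch))` on an output opened with `mido.open_output()`.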
### Latency and Performance Tuning
- Introduce a tunable latency buffer to balance transcription accuracy against real-time responsiveness.
- Profile model inference to determine optimal window sizes and overlaps under typical hardware constraints.
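The profiling step could start as simple as a timing harness over candidate window sizes, comparing average inference time against the real-time budget (roughly `hop_size / sample_rate` seconds per frame). `profile_window` is a hypothetical helper using only the standard library.

```python
import time

def profile_window(model, make_frame, window_sizes, trials=10):
    """Crude timing harness: average inference time per window size.
    `model` is any callable over a frame; `make_frame(n)` builds a
    dummy frame of n samples (both are assumptions for illustration)."""
    results = {}
    for n in window_sizes:
        frame = make_frame(n)
        start = time.perf_counter()
        for _ in range(trials):
            model(frame)
        # Average wall-clock seconds per inference at this window size
        results[n] = (time.perf_counter() - start) / trials
    return results
```

A window size is viable for real-time use when its measured average stays comfortably under the per-frame budget on the target hardware.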
### Optional Network Interface
- For advanced use cases, expose the real-time inference through a lightweight WebSocket or gRPC API, enabling remote control and cloud deployment.
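Much of such an interface is message framing. As a hedged sketch, the payload a WebSocket or gRPC endpoint might stream could look like the following; the JSON schema here is invented purely for illustration.

```python
import json
import time

def encode_event(event, pitch, velocity=100):
    """Serialize one note event as a JSON payload for a streaming
    endpoint (field names and defaults are illustrative)."""
    return json.dumps({
        "type": event,          # "note_on" or "note_off"
        "pitch": pitch,         # MIDI note number, 0-127
        "velocity": velocity,
        "timestamp": time.time(),  # wall-clock time of emission
    })
```

A server built on a library such as `websockets` would then broadcast each encoded event to connected clients as it is emitted.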
## Anticipated Challenges
- **Model Adaptability:** Ensuring the model performs well on partial inputs without full temporal context.
- **Latency Minimization:** Achieving real-time responsiveness while maintaining accuracy will require careful tuning.
- **False Positives:** Short, spurious notes may introduce noise in real-time environments, so adaptive thresholding or smoothing may be necessary.
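The smoothing idea in the last point could be prototyped as a simple debounce: only report a pitch once it has been detected for several consecutive frames. `NoteDebouncer` and `min_frames` are illustrative names, not part of basic-pitch.

```python
from collections import defaultdict

class NoteDebouncer:
    """Suppresses short false-positive blips by requiring a pitch to be
    detected for `min_frames` consecutive frames before reporting it."""

    def __init__(self, min_frames=3):
        self.min_frames = min_frames
        self.counts = defaultdict(int)  # consecutive-frame count per pitch

    def update(self, detected):
        """Feed one frame's raw detections; return the smoothed set."""
        detected = set(detected)
        # A missed frame resets that pitch's streak
        for p in list(self.counts):
            if p not in detected:
                del self.counts[p]
        for p in detected:
            self.counts[p] += 1
        return {p for p, c in self.counts.items() if c >= self.min_frames}
```

The trade-off is explicit: larger `min_frames` suppresses more noise but adds roughly `min_frames × hop` of onset latency.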
## Benefits to the Ecosystem
- Adds live performance capabilities to the `basic-pitch` ecosystem
- Opens opportunities for integration with VSTs, DAWs, and educational tools
- Fills a notable gap in the open-source music transcription landscape
## Conclusion
Adding real-time streaming MIDI output to `basic-pitch` would make the tool significantly more versatile and competitive with proprietary solutions. Given its high transcription accuracy and open architecture, `basic-pitch` is well-positioned to lead in this space. This feature would serve both the open-source community and professional musicians seeking reliable, low-latency audio-to-MIDI conversion.
I’d be happy to contribute or assist with prototyping this functionality.