Case Study
Acoustic Keyboard Detection
I built a sound-first classifier that detects keyboard activity from audio alone. The system converts microphone streams into MFCC features, applies a custom 1D-CNN to recognize keystroke fingerprints, and exposes everything through a Streamlit app that users reach entirely from the browser, with nothing to install.
- Role: Applied ML Engineer
- Timeline: Sep–Nov 2024
- Stack highlights: TensorFlow, Librosa, Streamlit
- Impact: 94% F1 across devices
Why it mattered
Acoustic side channels tend to be framed as security threats, accessibility enablers, or novel input modalities. I wanted to show how little friction it takes to turn raw sound into behavioral signals that are useful for secure workplaces, assistive typing, or ambient analytics. The constraints: run on commodity microphones, keep inference sub-second, and require zero local installs.
Capabilities
- Detect keyboard typing from short audio clips
- Real-time inference inside a deployed Streamlit UI
- MFCC-based feature engineering pipeline
- Custom 1D-CNN tuned for temporal-spectral cues
- Upload or microphone capture with instant feedback
Architecture snapshot
The app wraps signal processing, model scoring, and UX in a single Streamlit deployment. Librosa handles feature extraction while TensorFlow serves a frozen SavedModel. A lightweight inference cache prevents double-processing of repeated clips (sketched after the stack list below), and the UI streams probability curves to keep users engaged.
- Python
- Librosa
- NumPy
- TensorFlow / Keras
- Scikit-learn
- Streamlit
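The inference cache mentioned above can be as simple as Streamlit's built-in caching keyed on the clip bytes. A minimal sketch, assuming a SavedModel directory at `models/keystroke_cnn`, a 16 kHz mono resample, and a single typing-probability output; these names and shapes are illustrative assumptions, not the exact production code:

```python
import io

import librosa
import numpy as np
import streamlit as st
import tensorflow as tf


@st.cache_resource
def load_model(path: str = "models/keystroke_cnn"):
    # Load the SavedModel once per process and reuse it across Streamlit reruns.
    return tf.saved_model.load(path).signatures["serving_default"]


@st.cache_data(show_spinner=False)
def cached_predict(clip_bytes: bytes) -> float:
    # Streamlit hashes clip_bytes, so re-submitting an identical clip returns
    # the stored probability instead of re-running feature extraction + scoring.
    y, sr = librosa.load(io.BytesIO(clip_bytes), sr=16_000, mono=True)
    # Plain MFCCs here for brevity; the full pipeline also stacks deltas
    # (see the MFCC feature grid step below).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)        # (40, frames)
    batch = mfcc.T[np.newaxis, ...].astype(np.float32)        # (1, frames, 40)
    outputs = load_model()(tf.constant(batch))
    probs = next(iter(outputs.values())).numpy()
    return float(probs[0, -1])  # assumed layout: last column = "typing" class
```

Because `st.cache_data` hashes its arguments, re-uploading an identical clip skips both feature extraction and model scoring.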
Interactive UX
Users can upload WAV/MP3 files or record directly via the browser. The app visualizes spectrograms, MFCC heatmaps, and rolling predictions so researchers can see why the classifier triggers. A guided checklist nudges users to test multiple keyboards, distances, and ambient noise levels.
Inference latency: < 600 ms per 3s clip
Accuracy window: optimized for 1–5 ft microphone range
Accessibility toggle: optional haptic/audio cues for low-vision testers
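A stripped-down version of the upload-and-visualize flow might look like the following. The widget labels, the mel-spectrogram view, and the reuse of `cached_predict` from the caching sketch above are illustrative choices rather than the exact production UI:

```python
import io

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import streamlit as st

st.title("Acoustic Keyboard Detection")

# Upload path; recent Streamlit releases also offer in-browser recording widgets.
uploaded = st.file_uploader("Upload a short clip", type=["wav", "mp3"])

if uploaded is not None:
    clip_bytes = uploaded.getvalue()
    st.audio(clip_bytes, format=uploaded.type or "audio/wav")  # let users replay the clip

    # Log-power mel spectrogram so users can see the keystroke transients the model keys on.
    y, sr = librosa.load(io.BytesIO(clip_bytes), sr=16_000, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    fig, ax = plt.subplots(figsize=(8, 3))
    librosa.display.specshow(
        librosa.power_to_db(mel, ref=np.max),
        sr=sr, x_axis="time", y_axis="mel", ax=ax,
    )
    ax.set_title("Mel spectrogram")
    st.pyplot(fig)

    prob = cached_predict(clip_bytes)  # defined in the caching sketch above
    st.metric("Typing probability", f"{prob:.1%}")
```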
Workflow and guardrails
Each run pushes audio through the same reproducible pipeline used in model development. This keeps live traffic aligned with the training distribution and makes it easy to retrain when we collect new acoustic signatures.
Audio ingestion
Streamlit records from the microphone or accepts WAV/MP3 uploads, normalizes bit depth, and trims silence so the classifier only sees meaningful signal.
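A minimal version of this ingestion step, assuming a 16 kHz target sample rate and a 30 dB trim threshold (both are assumed values, not documented ones):

```python
import io

import librosa
import numpy as np


def ingest_clip(clip_bytes: bytes, target_sr: int = 16_000) -> tuple[np.ndarray, int]:
    """Decode an uploaded clip, resample to a fixed rate, and trim silence."""
    # librosa.load decodes the audio, converts to float32 in [-1, 1] regardless of
    # the original bit depth, downmixes to mono, and resamples to target_sr.
    y, sr = librosa.load(io.BytesIO(clip_bytes), sr=target_sr, mono=True)

    # Peak-normalize so clips recorded at different gains look alike downstream.
    peak = np.max(np.abs(y))
    if peak > 0:
        y = y / peak

    # Drop quiet lead-in/lead-out; 30 dB below peak is an assumed threshold.
    y, _ = librosa.effects.trim(y, top_db=30)
    return y, sr
```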
MFCC feature grid
Librosa extracts 40-coefficient MFCC windows plus delta accelerations, giving the CNN a structured view of spectral changes.
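The feature grid can be built with a few Librosa calls. The FFT size, hop length, and the choice to stack both first- and second-order deltas alongside the 40 MFCCs are assumptions for illustration:

```python
import librosa
import numpy as np


def mfcc_feature_grid(y: np.ndarray, sr: int = 16_000) -> np.ndarray:
    """Stack 40 MFCCs with their delta and delta-delta (acceleration) tracks."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40, n_fft=1024, hop_length=256)
    delta = librosa.feature.delta(mfcc)             # first-order differences
    delta2 = librosa.feature.delta(mfcc, order=2)   # acceleration (delta-delta)
    # Shape (frames, 120): one row per analysis frame, ready for a 1D-CNN over time.
    return np.concatenate([mfcc, delta, delta2], axis=0).T.astype(np.float32)
```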
1D-CNN classifier
Two convolutional blocks with squeeze-and-excite layers learn the rhythmic envelope of keystrokes; dropout keeps the model resilient to room noise.
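A Keras sketch of the shape of this model, assuming roughly 3 s clips at 16 kHz (about 188 MFCC frames of 120 stacked coefficients), 64/128 filters, and a single sigmoid output; the real architecture and hyperparameters may differ:

```python
import tensorflow as tf
from tensorflow.keras import layers


def se_block(x, reduction: int = 8):
    """Squeeze-and-excitation: re-weight channels using their clip-level context."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling1D()(x)                      # squeeze over time
    s = layers.Dense(channels // reduction, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)         # per-channel gates
    return layers.Multiply()([x, layers.Reshape((1, channels))(s)])


def build_model(frames: int = 188, features: int = 120) -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(frames, features))
    x = inputs
    for filters in (64, 128):                                   # two convolutional blocks
        x = layers.Conv1D(filters, kernel_size=5, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = se_block(x)
        x = layers.MaxPooling1D(pool_size=2)(x)
        x = layers.Dropout(0.3)(x)                              # resilience to room noise
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)          # P(typing)
    return tf.keras.Model(inputs, outputs)


model = build_model()
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
)
```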
Deployment + monitoring
Model weights ship with the Streamlit app; inference metrics and user feedback are logged via a lightweight FastAPI webhook for continual tuning.
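The logging webhook can be a few lines of FastAPI. The `/log` route, the `InferenceEvent` fields, and the in-memory store are hypothetical stand-ins for the actual endpoint and storage (Pydantic v2 assumed):

```python
from datetime import datetime, timezone

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
EVENTS: list[dict] = []  # stand-in store; swap for SQLite or a warehouse in practice


class InferenceEvent(BaseModel):
    clip_id: str
    typing_probability: float
    latency_ms: float
    user_feedback: str | None = None  # e.g. "false positive at 4 ft"


@app.post("/log")
def log_event(event: InferenceEvent) -> dict:
    """Receive one inference record posted by the Streamlit app."""
    record = event.model_dump()
    record["received_at"] = datetime.now(timezone.utc).isoformat()
    EVENTS.append(record)
    return {"status": "ok", "stored": len(EVENTS)}
```

On the Streamlit side, a fire-and-forget `requests.post(webhook_url, json=payload, timeout=2)` after each prediction is enough to feed this endpoint.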
Outcomes
- 94% F1 across laptops and external mics after augmenting the training data with pink noise and simulated distance shifts (sketched below).
- Sub-second feedback loop lets security researchers simulate attacks live.
- Zero-install onboarding thanks to Streamlit Cloud deployment.
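The pink-noise and distance augmentation can be approximated with NumPy alone; the SNR target and attenuation bounds below are illustrative assumptions:

```python
import numpy as np


def pink_noise(n_samples: int, rng: np.random.Generator) -> np.ndarray:
    """Approximate pink (1/f) noise by shaping white noise in the frequency domain."""
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples)
    spectrum[1:] /= np.sqrt(freqs[1:])          # amplitude ~ 1/sqrt(f) -> power ~ 1/f
    noise = np.fft.irfft(spectrum, n=n_samples)
    return noise / (np.sqrt(np.mean(noise**2)) + 1e-12)  # normalize to unit RMS


def augment(y: np.ndarray, rng: np.random.Generator,
            snr_db: float = 15.0, max_attenuation_db: float = 12.0) -> np.ndarray:
    """Attenuate the clip to mimic a farther microphone, then add pink noise at a target SNR."""
    gain_db = -rng.uniform(0.0, max_attenuation_db)     # simulated extra distance
    y = y * (10.0 ** (gain_db / 20.0))

    sig_rms = np.sqrt(np.mean(y**2) + 1e-12)
    noise_rms = sig_rms / (10.0 ** (snr_db / 20.0))
    return y + pink_noise(len(y), rng) * noise_rms
```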
Extend it
The codebase includes reusable MFCC utilities, model-training notebooks, and deployment scripts so teams can adapt the classifier to voice, Morse code, or other acoustic gestures. I am happy to pair on hardening it for your security or accessibility roadmap.