Case Study
Acoustic Keyboard Detection
I built a sound-first classifier that detects keyboard activity from audio alone. The system converts microphone streams into MFCC features, applies a custom 1D-CNN to recognize keystroke fingerprints, and exposes everything through a Streamlit app that users reach entirely from the browser, with nothing to install.
- Role: Applied ML Engineer
- Timeline: Sep–Nov 2024
- Stack highlights: TensorFlow, Librosa, Streamlit
- Impact: 94% F1 across devices
Why it mattered
Acoustic side channels tend to be framed as security threats, accessibility enablers, or novel input modalities. I wanted to show how little friction it takes to turn raw sound into behavioral signals that are useful for secure workplaces, assistive typing, or ambient analytics. The constraints: run on commodity microphones, keep inference sub-second, and require zero local installs.
Capabilities
- Detect keyboard typing from short audio clips
- Real-time inference inside a deployed Streamlit UI
- MFCC-based feature engineering pipeline
- Custom 1D-CNN tuned for temporal-spectral cues
- Upload or microphone capture with instant feedback
Architecture snapshot
The app wraps signal processing, model scoring, and UX in a single Streamlit deployment. Librosa handles feature extraction while TensorFlow serves a frozen SavedModel. A lightweight inference cache prevents double-processing of repeated clips (sketched after the stack list below), and the UI streams probability curves to keep users engaged.
- Python
- Librosa
- NumPy
- TensorFlow / Keras
- Scikit-learn
- Streamlit
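The inference cache mentioned above can be as simple as Streamlit's built-in caching keyed on the clip bytes. A minimal sketch, assuming a SavedModel directory at `models/keystroke_cnn`, a 16 kHz mono resample, and a single typing-probability output; these names and shapes are illustrative assumptions, not the exact production code:

```python
import io

import librosa
import numpy as np
import streamlit as st
import tensorflow as tf


@st.cache_resource
def load_model(path: str = "models/keystroke_cnn"):
    # Load the SavedModel once per process and reuse it across Streamlit reruns.
    return tf.saved_model.load(path).signatures["serving_default"]


@st.cache_data(show_spinner=False)
def cached_predict(clip_bytes: bytes) -> float:
    # Streamlit hashes clip_bytes, so re-submitting an identical clip returns
    # the stored probability instead of re-running feature extraction + scoring.
    y, sr = librosa.load(io.BytesIO(clip_bytes), sr=16_000, mono=True)
    # Plain MFCCs here for brevity; the full pipeline also stacks deltas
    # (see the MFCC feature grid step below).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)        # (40, frames)
    batch = mfcc.T[np.newaxis, ...].astype(np.float32)        # (1, frames, 40)
    outputs = load_model()(tf.constant(batch))
    probs = next(iter(outputs.values())).numpy()
    return float(probs[0, -1])  # assumed layout: last column = "typing" class
```

Because `st.cache_data` hashes its arguments, re-uploading an identical clip skips both feature extraction and model scoring.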
Interactive UX
Users can upload WAV/MP3 files or record directly via the browser. The app visualizes spectrograms, MFCC heatmaps, and rolling predictions so researchers can see why the classifier triggers. A guided checklist nudges users to test multiple keyboards, distances, and ambient noise levels.
Inference latency: < 600 ms per 3s clip
Accuracy window: optimized for 1–5 ft microphone range
Accessibility toggle: optional haptic/audio cues for low-vision testers
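A stripped-down version of the upload-and-visualize flow might look like the following. The widget labels, the mel-spectrogram view, and the reuse of `cached_predict` from the caching sketch above are illustrative choices rather than the exact production UI:

```python
import io

import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import streamlit as st

st.title("Acoustic Keyboard Detection")

# Upload path; recent Streamlit releases also offer in-browser recording widgets.
uploaded = st.file_uploader("Upload a short clip", type=["wav", "mp3"])

if uploaded is not None:
    clip_bytes = uploaded.getvalue()
    st.audio(clip_bytes, format=uploaded.type or "audio/wav")  # let users replay the clip

    # Log-power mel spectrogram so users can see the keystroke transients the model keys on.
    y, sr = librosa.load(io.BytesIO(clip_bytes), sr=16_000, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    fig, ax = plt.subplots(figsize=(8, 3))
    librosa.display.specshow(
        librosa.power_to_db(mel, ref=np.max),
        sr=sr, x_axis="time", y_axis="mel", ax=ax,
    )
    ax.set_title("Mel spectrogram")
    st.pyplot(fig)

    prob = cached_predict(clip_bytes)  # defined in the caching sketch above
    st.metric("Typing probability", f"{prob:.1%}")
```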
Workflow and guardrails
Each run pushes audio through the same reproducible pipeline used in model development. This keeps live traffic aligned with the training distribution and makes it easy to retrain when we collect new acoustic signatures.
Audio ingestion
Streamlit records from the microphone or accepts WAV/MP3 uploads, normalizes bit depth, and trims silence so the classifier only sees meaningful signal.
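A minimal version of this ingestion step, assuming a 16 kHz target sample rate and a 30 dB trim threshold (both are assumed values, not documented ones):

```python
import io

import librosa
import numpy as np


def ingest_clip(clip_bytes: bytes, target_sr: int = 16_000) -> tuple[np.ndarray, int]:
    """Decode an uploaded clip, resample to a fixed rate, and trim silence."""
    # librosa.load decodes the audio, converts to float32 in [-1, 1] regardless of
    # the original bit depth, downmixes to mono, and resamples to target_sr.
    y, sr = librosa.load(io.BytesIO(clip_bytes), sr=target_sr, mono=True)

    # Peak-normalize so clips recorded at different gains look alike downstream.
    peak = np.max(np.abs(y))
    if peak > 0:
        y = y / peak

    # Drop quiet lead-in/lead-out; 30 dB below peak is an assumed threshold.
    y, _ = librosa.effects.trim(y, top_db=30)
    return y, sr
```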
MFCC feature grid
Librosa extracts 40-coefficient MFCC windows plus delta accelerations, giving the CNN a structured view of spectral changes.
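The feature grid can be built with a few Librosa calls. The FFT size, hop length, and the choice to stack both first- and second-order deltas alongside the 40 MFCCs are assumptions for illustration:

```python
import librosa
import numpy as np


def mfcc_feature_grid(y: np.ndarray, sr: int = 16_000) -> np.ndarray:
    """Stack 40 MFCCs with their delta and delta-delta (acceleration) tracks."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40, n_fft=1024, hop_length=256)
    delta = librosa.feature.delta(mfcc)             # first-order differences
    delta2 = librosa.feature.delta(mfcc, order=2)   # acceleration (delta-delta)
    # Shape (frames, 120): one row per analysis frame, ready for a 1D-CNN over time.
    return np.concatenate([mfcc, delta, delta2], axis=0).T.astype(np.float32)
```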
1D-CNN classifier
Two convolutional blocks with squeeze-and-excite layers learn the rhythmic envelope of keystrokes; dropout keeps the model resilient to room noise.
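A Keras sketch of the shape of this model, assuming roughly 3 s clips at 16 kHz (about 188 MFCC frames of 120 stacked coefficients), 64/128 filters, and a single sigmoid output; the real architecture and hyperparameters may differ:

```python
import tensorflow as tf
from tensorflow.keras import layers


def se_block(x, reduction: int = 8):
    """Squeeze-and-excitation: re-weight channels using their clip-level context."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling1D()(x)                      # squeeze over time
    s = layers.Dense(channels // reduction, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)         # per-channel gates
    return layers.Multiply()([x, layers.Reshape((1, channels))(s)])


def build_model(frames: int = 188, features: int = 120) -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(frames, features))
    x = inputs
    for filters in (64, 128):                                   # two convolutional blocks
        x = layers.Conv1D(filters, kernel_size=5, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = se_block(x)
        x = layers.MaxPooling1D(pool_size=2)(x)
        x = layers.Dropout(0.3)(x)                              # resilience to room noise
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)          # P(typing)
    return tf.keras.Model(inputs, outputs)


model = build_model()
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
)
```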
Deployment + monitoring
Model weights ship with the Streamlit app; inference metrics and user feedback are logged via a lightweight FastAPI webhook for continual tuning.
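The logging webhook can be a few lines of FastAPI. The `/log` route, the `InferenceEvent` fields, and the in-memory store are hypothetical stand-ins for the actual endpoint and storage (Pydantic v2 assumed):

```python
from datetime import datetime, timezone

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
EVENTS: list[dict] = []  # stand-in store; swap for SQLite or a warehouse in practice


class InferenceEvent(BaseModel):
    clip_id: str
    typing_probability: float
    latency_ms: float
    user_feedback: str | None = None  # e.g. "false positive at 4 ft"


@app.post("/log")
def log_event(event: InferenceEvent) -> dict:
    """Receive one inference record posted by the Streamlit app."""
    record = event.model_dump()
    record["received_at"] = datetime.now(timezone.utc).isoformat()
    EVENTS.append(record)
    return {"status": "ok", "stored": len(EVENTS)}
```

On the Streamlit side, a fire-and-forget `requests.post(webhook_url, json=payload, timeout=2)` after each prediction is enough to feed this endpoint.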
Outcomes
- 94% F1 across laptops and external mics after augmenting the training data with pink noise and simulated distance shifts (sketched below).
- Sub-second feedback loop lets security researchers simulate attacks live.
- Zero-install onboarding thanks to Streamlit Cloud deployment.
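The pink-noise and distance augmentation can be approximated with NumPy alone; the SNR target and attenuation bounds below are illustrative assumptions:

```python
import numpy as np


def pink_noise(n_samples: int, rng: np.random.Generator) -> np.ndarray:
    """Approximate pink (1/f) noise by shaping white noise in the frequency domain."""
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples)
    spectrum[1:] /= np.sqrt(freqs[1:])          # amplitude ~ 1/sqrt(f) -> power ~ 1/f
    noise = np.fft.irfft(spectrum, n=n_samples)
    return noise / (np.sqrt(np.mean(noise**2)) + 1e-12)  # normalize to unit RMS


def augment(y: np.ndarray, rng: np.random.Generator,
            snr_db: float = 15.0, max_attenuation_db: float = 12.0) -> np.ndarray:
    """Attenuate the clip to mimic a farther microphone, then add pink noise at a target SNR."""
    gain_db = -rng.uniform(0.0, max_attenuation_db)     # simulated extra distance
    y = y * (10.0 ** (gain_db / 20.0))

    sig_rms = np.sqrt(np.mean(y**2) + 1e-12)
    noise_rms = sig_rms / (10.0 ** (snr_db / 20.0))
    return y + pink_noise(len(y), rng) * noise_rms
```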
Extend it
The codebase includes reusable MFCC utilities, model-training notebooks, and deployment scripts so teams can adapt the classifier to voice, Morse code, or other acoustic gestures. I am happy to pair on hardening it for your security or accessibility roadmap.