Feature-Dive: An interactive search application for audio and symbolic music features.
An interactive web application for exploring and searching songs using both audio features and symbolic (MIDI) features extracted from music. Manifold dimensionality reduction compresses the high-dimensional feature space into 3D, and the interface lets you interactively experience the similarity between songs.
Overview
While keyword search, playlists, and recommendation engines are the norm for music discovery, this system was designed around a different motivation: experiencing musical similarity as a navigable space.
A feature vector representing a song easily runs to tens of dimensions or more. By reducing it to 3D with PCA or t-SNE, the whole collection can be flown through freely in a Three.js-based 3D viewer. Users can also bring in their own music files (wav/MIDI) and place them in the space, visually exploring which existing songs are closest to their own.
Dataset
The Meta MIDI Dataset (MMD) was used as the source of song data. Published on Zenodo, this dataset contains tens of thousands of MIDI files, each accompanied by a mapping to Spotify Track IDs. Using this mapping, acoustic metadata was fetched from the Spotify Web API and integrated with symbolic features extracted from the MIDI files, then stored in PostgreSQL.
```
DATASET_PATH/
  meta_midi_dataset/
    *.mid                    # MIDI files
    MMD_audio_matches.json   # MIDI <-> Spotify Track ID mapping
    MMD_spotify_all.csv      # Song metadata fetched from Spotify
  spotify_sample/            # Spotify preview audio (mp3)
```
Two primary tables were created in the database:
| Table | Contents |
|---|---|
| song | Song metadata (artist, title, genre, release date, etc.) |
| spotify_features | Spotify Audio Features (acousticness, danceability, energy, etc., plus album artwork URL) |
Features extracted from MIDI files and features extracted from audio are stored in separate tables and joined at query time by the API.
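As a sketch of that query-time join (the table and column names here are illustrative, and SQLite stands in for PostgreSQL so the example is self-contained):

```python
import sqlite3

# Minimal stand-in schema; the real project uses PostgreSQL and many
# more columns. Assumed here: `song` links the MIDI md5 to a Spotify
# track id, and the two feature tables hang off those keys.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE song (md5 TEXT PRIMARY KEY, spotify_track_id TEXT, title TEXT);
CREATE TABLE spotify_features (spotify_track_id TEXT PRIMARY KEY, energy REAL);
CREATE TABLE midi_features (md5 TEXT PRIMARY KEY, pitch_entropy REAL);
INSERT INTO song VALUES ('abc', 'trk1', 'Example Song');
INSERT INTO spotify_features VALUES ('trk1', 0.82);
INSERT INTO midi_features VALUES ('abc', 3.1);
""")

# Join audio-side and MIDI-side features for each song at query time.
rows = con.execute("""
SELECT s.title, sf.energy, mf.pitch_entropy
FROM song s
JOIN spotify_features sf ON sf.spotify_track_id = s.spotify_track_id
JOIN midi_features mf ON mf.md5 = s.md5
""").fetchall()
```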
Feature Extraction
Audio Features (librosa)
Spotify preview clips (30-second mp3s) are analysed with librosa to extract the following features:
```python
# from audio_feature.py
AUDIO_FEATURE_ORDER = [
    "spotify_track_id",
    "tempo",
    "zero_crossing_rate",
    "harmonic_components",
    "percussive_components",
    "spectral_centroid",
    "spectral_rolloff",
    "chroma_frequencies",  # 12-dimensional chromagram
]
```
The actual extraction looks like this:
```python
import librosa

y, sr = librosa.load(path)
tempo = float(librosa.beat.tempo(y=y, sr=sr)[0])
zcr = librosa.feature.zero_crossing_rate(y=y, pad=False)[0]
y_harm, y_perc = librosa.effects.hpss(y=y)  # harmonic/percussive separation
y_harm_rms = librosa.feature.rms(y=y_harm)[0]
y_perc_rms = librosa.feature.rms(y=y_perc)[0]
spectral_centroids = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
spectral_rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)[0]
chromagram = librosa.feature.chroma_stft(
    y=y, sr=sr, hop_length=512).mean(axis=1).astype(float)
```
The chromagram is aggregated into 12 dimensions (energy per pitch class, C through B) to capture the tonal character of the music.
Symbolic Features (muspy)
Symbolic features are extracted from MIDI files using muspy:
```python
# from midi_feature.py
MIDI_FEATURE_ORDER = [
    "md5",
    "pitch_range",
    "n_pitches_used",
    "n_pitch_classes_used",
    "polyphony",
    "polyphony_rate",
    "scale_consistency",
    "pitch_entropy",
    "pitch_class_entropy",
    "empty_beat_rate",
    "drum_in_duple_rate",
    "drum_pattern_consistency",
]
```
```python
import muspy
from muspy import Music

mus: Music = muspy.read_midi(path)
pitch_range = muspy.pitch_range(mus)
n_pitches_used = muspy.n_pitches_used(mus)
n_pitch_classes_used = muspy.n_pitch_classes_used(mus)
polyphony = muspy.polyphony(mus)
polyphony_rate = muspy.polyphony_rate(mus)
scale_consistency = muspy.scale_consistency(mus)
pitch_entropy = muspy.pitch_entropy(mus)
pitch_class_entropy = muspy.pitch_class_entropy(mus)
empty_beat_rate = muspy.empty_beat_rate(mus)
```
Metrics such as scale_consistency (how closely the notes adhere to a scale) and polyphony (harmonic complexity) were chosen for their correspondence to perceptible musical richness.
These symbolic features reflect score-level structure that audio features struggle to capture. The ability to switch between the two feature spaces while exploring is the core concept of this application.
Dimensionality Reduction
To visualise high-dimensional feature vectors in 3D, several dimensionality reduction methods are implemented in dim_reduction.py:
```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def dim_reduction_pca(data: np.ndarray) -> np.ndarray:
    return PCA(n_components=3).fit_transform(data)

def dim_reduction_tsne(data: np.ndarray) -> np.ndarray:
    return TSNE(n_components=3, n_iter=1000).fit_transform(data)
```
Additionally, Hierarchical t-SNE (h-tSNE) is implemented to account for hierarchical dependencies between features. This approach builds a graph with NetworkX and incorporates path distances between features into the t-SNE distance matrix — attempting to preserve the "semantic groupings" of features that plain t-SNE tends to lose.
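The h-tSNE idea could be sketched roughly as follows. This is a hedged reconstruction of the approach described above, not the project's actual implementation: the function name `htsne`, the choice of edge list as input, the inverse-path-distance weighting, and the blending factor `alpha` are all assumptions.

```python
import networkx as nx
import numpy as np
from sklearn.manifold import TSNE

def htsne(data: np.ndarray, feature_names: list, edges: list,
          alpha: float = 0.5, perplexity: float = 5.0) -> np.ndarray:
    # Feature graph: an edge links two features in the same semantic
    # group (e.g. pitch_entropy <-> pitch_class_entropy).
    g = nx.Graph()
    g.add_nodes_from(feature_names)
    g.add_edges_from(edges)
    n = len(feature_names)
    idx = {f: i for i, f in enumerate(feature_names)}
    path = np.full((n, n), float(n))  # disconnected pairs get a max distance
    for a, lengths in nx.all_pairs_shortest_path_length(g):
        for b, d in lengths.items():
            path[idx[a], idx[b]] = d
    # Couple feature pairs by inverse path distance, so related features
    # reinforce each other in the song-to-song metric.
    w = 1.0 / (1.0 + path)
    diffs = data[:, None, :] - data[None, :, :]         # (songs, songs, feats)
    d2 = np.einsum("abi,ij,abj->ab", diffs, w, diffs)
    graph_dist = np.sqrt(np.maximum(d2, 0.0))
    eucl = np.sqrt((diffs ** 2).sum(axis=-1))
    # Blend the graph-aware metric with plain Euclidean distance.
    blended = alpha * graph_dist + (1.0 - alpha) * eucl
    np.fill_diagonal(blended, 0.0)
    return TSNE(n_components=3, metric="precomputed", init="random",
                perplexity=perplexity).fit_transform(blended)
```

Feeding the blended matrix in via `metric="precomputed"` is what lets graph structure influence the embedding without changing t-SNE itself.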
The dimensionality reduction method can be switched via the method parameter in API requests:
```json
{
  "features_name": ["pitch_entropy", "polyphony", "scale_consistency", "tempo"],
  "method": "PCA",
  "n_songs": 500,
  "genres": ["rock", "pops"],
  "year_range": [1990, 2005],
  "user_songs": []
}
```
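One way the `method` switch might be wired up server-side is a small dispatch table. This is a sketch, not the actual api.py; the reducers from dim_reduction.py are redefined here so the example is self-contained:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def dim_reduction_pca(data: np.ndarray) -> np.ndarray:
    return PCA(n_components=3).fit_transform(data)

def dim_reduction_tsne(data: np.ndarray) -> np.ndarray:
    return TSNE(n_components=3, perplexity=5.0,
                init="random").fit_transform(data)

# Map the request's "method" string onto a reducer.
REDUCERS = {"PCA": dim_reduction_pca, "TSNE": dim_reduction_tsne}

def reduce_features(data: np.ndarray, method: str = "PCA") -> np.ndarray:
    if method not in REDUCERS:
        raise ValueError(f"unknown reduction method: {method}")
    return REDUCERS[method](data)
```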
Backend API (Flask)
A Flask REST API handles the delivery of feature data. The main endpoints are:
| Endpoint | Method | Description |
|---|---|---|
| /get_3d_features | POST | Returns 3D coordinates for specified features, reduction method, genre, and year range |
| /user_data/audio | POST | Upload a user's audio file (wav, etc.) |
| /user_data/midi | POST | Upload a user's MIDI file |
| /get_features_sample | GET/POST | Fetch sample data |
When a user uploads a file, the server extracts features in real time, re-runs the dimensionality reduction including the new data, and returns the result. This lets users immediately see where their file lands in the space.
```python
# from api.py (upload handler)
from datetime import datetime

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/user_data/audio", methods=["POST"])
def user_data_audio():
    file = request.files.get("file")
    if not file:
        return jsonify({"error": "no file provided"}), 400
    file_name = datetime.now().strftime("%Y%m%d-%H%M%S") + "-" + \
        (file.filename or "user_audio.wav")
    with open(f"uploads/audio/{file_name}", "wb") as f:
        file.save(f)
    return jsonify({"fileName": file_name})
```
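The re-embedding step described above could be sketched as follows. The helper name and the PCA choice are assumptions for illustration; the assumption is only that the uploaded file's features are extracted into the same vector layout as the dataset songs:

```python
import numpy as np
from sklearn.decomposition import PCA

def embed_with_user(dataset_feats: np.ndarray,
                    user_feats: np.ndarray) -> tuple:
    # Stack the user's feature vectors onto the dataset and re-fit the
    # reduction, so new points share the same 3D coordinate frame as
    # the existing songs.
    combined = np.vstack([dataset_feats, user_feats])
    coords = PCA(n_components=3).fit_transform(combined)
    n_user = len(user_feats)
    return coords[:-n_user], coords[-n_user:]
```

Re-fitting on every upload keeps the maths simple at the cost of recomputing the whole embedding; projecting the new point into a frozen embedding would be the cheaper alternative.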
Frontend (Next.js + react-three-fiber)
The frontend is built with Next.js + TypeScript, using @react-three/fiber (a React wrapper for Three.js) and @react-three/drei for 3D rendering.
3D Space Viewer (PointsViewer)
Each point representing a song is rendered in bulk using Instances from @react-three/drei, keeping rendering cost low even with many points. Camera controls can be toggled between OrbitControls and ArcballControls, and a GizmoHelper always shows the current viewing direction.
```tsx
import { Canvas } from "@react-three/fiber";
import { ArcballControls, GizmoHelper, GizmoViewport,
         Instances, OrbitControls } from "@react-three/drei";

// Render the point cloud with per-genre colour mapping
<Instances>
  {songs.map((song) => (
    <SongPoint key={song.md5} song={song} color={genreColor(song.genre)} />
  ))}
</Instances>
```
Clicking a point opens the song's details in a side panel and lets you preview it via the embedded Spotify player.
Spotify Player (SpotifyPlayer)
```tsx
// SpotifyPlayer.tsx
export default function SpotifyPlayer({ track_id }: { track_id: string }) {
  return (
    <iframe
      className="spotify-player"
      src={`https://open.spotify.com/embed/track/${track_id}`}
      width="50%"
      height="80px"
    />
  );
}
```
Preview playback is provided through Spotify's embed iframe. Because you can listen the moment you click a point in the space, the "explore while listening" experience flows without interruption.
User File Upload (AudioTrimmer)
The AudioTrimmer component supports trimming wav and mp3 files, allowing users to cut out any segment of an audio file before sending it to the server. This opens up use cases like bringing in field recording material and exploring how it relates to existing songs in the dataset.
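The trimming itself happens client-side in AudioTrimmer, but the operation is simple enough to sketch server-side with the standard library (a hypothetical helper, not part of the project; for wav only, since mp3 needs an external decoder):

```python
import wave

def trim_wav(src: str, dst: str, start_s: float, end_s: float) -> None:
    # Copy only the frames between start_s and end_s (in seconds),
    # preserving channel count, sample width, and sample rate.
    with wave.open(src, "rb") as r:
        params = r.getparams()
        rate = r.getframerate()
        r.setpos(int(start_s * rate))
        frames = r.readframes(int((end_s - start_s) * rate))
    with wave.open(dst, "wb") as w:
        w.setparams(params)  # nframes is corrected automatically on close
        w.writeframes(frames)
```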


