Under the Hood: Building a Real-Time Chord Recognizer

The problem is not a lookup

The first intuition when building a chord recognizer is to build a dictionary. There are only 12 pitch classes, which means there are only 2^12 = 4096 possible pitch-class sets. Store a name for each set, and when a user plays C-E-G, look up {C, E, G} and return “C major.”

The problem is not memory. Four thousand entries is trivial. The problem is meaning. A pitch-class set does not contain enough information to decide what musicians will call it.

Pianists often leave out notes that a dictionary entry might expect. Extended chords add notes that no fixed dictionary entry anticipates. And the same set of pitch classes, as discussed in the companion article, can legitimately be described as multiple different chords depending on musical context.

What you actually need is a scoring model. It has to evaluate how well any given set of notes fits each chord type, rank all plausible interpretations, and apply musical judgment when scores are close. That is the approach WhatChord takes.

Overview: a four-stage pipeline

Before diving into each component, here is the overall shape of the algorithm. A snapshot of sounding notes enters at the top; a ranked list of chord interpretations comes out at the bottom.

Input: set of sounding pitch classes + lowest (bass) note
↓
Pitch-class bitmask: 12-bit integer, one bit per semitone in the octave
↓
Candidate generation: each sounding note becomes a candidate root, scored against every chord template, extensions extracted
↓
Score normalization: raw scores are normalized for fair comparison across chord complexities
↓
Ranking: musical heuristics resolve ambiguous scores; hard structural rules override when the score alone would pick the wrong answer
↓
Output: top ranked chord candidates, result cached in LRU

The rest of this article walks through each stage in detail, ending with a discussion of known limitations.

Pitch classes and bitmasks

Western music theory operates in 12-tone equal temperament (12-TET): 12 equal semitones per octave. A pitch class is a note’s position within the octave, regardless of which octave it sounds in, so middle C, the C above it, and the C three octaves below all share pitch class 0. In WhatChord, pitch classes are numbered 0 (C) through 11 (B).

A chord is a set of pitch classes. WhatChord represents this as a 12-bit integer mask where bit n is set if pitch class n is present. C major (C=0, E=4, G=7) looks like this:

11	10	9	8	7	6	5	4	3	2	1	0
B	A♯	A	G♯	G	F♯	F	E	D♯	D	C♯	C
0	0	0	0	1	0	0	1	0	0	0	1

// Pitch classes: C=0, E=4, G=7
int pcMask = (1 << 0) | (1 << 4) | (1 << 7);
// pcMask == 0b000010010001 == 0x091

This representation is surprisingly convenient. Checking whether a pitch class is present is a single bitwise AND. Counting present notes is a popcount. Rotating the voicing relative to a candidate root is a loop over bits with modular arithmetic. All of these operations are cheap.

A key design decision: only pitch classes actually present in the voicing are tested as candidate roots. There are no “ghost roots”: the algorithm never proposes an interpretation where the chord is rooted on a note that is not being played. This keeps the candidate count small (bounded by the number of sounding pitch classes, typically 3–7) and avoids obviously wrong readings.
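The mask operations described above can be sketched in a few lines. The helper names here (maskFromPitchClasses, hasPitchClass, noteCount, candidateRoots) are made up for illustration and are not taken from the WhatChord source.

```dart
/// Builds a 12-bit mask from pitch classes (0 = C ... 11 = B).
int maskFromPitchClasses(Iterable<int> pitchClasses) {
  var mask = 0;
  for (final pc in pitchClasses) {
    mask |= 1 << (pc % 12);
  }
  return mask;
}

/// True when pitch class [pc] is set in [mask]: a single bitwise AND.
bool hasPitchClass(int mask, int pc) => (mask & (1 << pc)) != 0;

/// Number of distinct pitch classes in [mask] (a popcount).
int noteCount(int mask) {
  var count = 0;
  for (var m = mask; m != 0; m &= m - 1) {
    count++; // each iteration clears the lowest set bit
  }
  return count;
}

/// Candidate roots are exactly the sounding pitch classes: no ghost roots.
List<int> candidateRoots(int mask) =>
    [for (var pc = 0; pc < 12; pc++) if (hasPitchClass(mask, pc)) pc];
```

For the C major mask 0x091, candidateRoots returns the pitch classes 0, 4, and 7, so only C, E, and G are ever tried as roots.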

This is a deliberate “solo keyboard” assumption. WhatChord is currently optimized for the common case where the same MIDI stream contains both the harmony and the bass note. A future ensemble mode could relax that rule for settings where another instrument is carrying the bass, allowing rootless voicings to imply roots that are not literally present in the keyboard part.

Chord templates

WhatChord also defines chord qualities as bitmask templates. Each one describes three sets of intervals relative to the root:

Required: tones that must be present to identify this quality. Missing more than one required tone causes the template to be skipped entirely.
Optional: tones frequently omitted in real voicings (almost always the perfect 5th). Present when played, unremarkable when absent.
Penalty: tones that actively contradict this quality. Having a major 3rd present when you are trying to identify a minor chord hurts the score.

The 22 templates, organized by complexity:

Quality	Required intervals	Optional	Key penalty tones
Major	R, M3	P5	m3, m7, M7
Minor	R, m3	P5	M3, m7, M7
Diminished	R, m3, ♭5	—	M3, P5
Augmented	R, M3, ♯5	—	m3, P5
Sus2	R, M2, P5	—	m3, M3, m7, M7
Sus4	R, P4, P5	—	m3, M3, m7, M7
Major 6	R, M3, M6	P5	m3, m7, M7
Minor 6	R, m3, M6	P5	M3, m7, M7
Dominant 7	R, M3, m7	P5	M7, m3
7sus2	R, M2, m7	P5	m3, M3, P4, M7
7sus4	R, P4, m7	P5	m3, M3, M7
7♭5	R, M3, ♭5, m7	—	P5, M7, m3
7♯5	R, M3, ♯5, m7	—	P5, M7, m3
Major 7	R, M3, M7	P5	m7, m3
Major 7sus2	R, M2, M7	P5	m3, M3, P4, m7
Major 7sus4	R, P4, M7	P5	m3, M3, M2, m7
Major 7♭5	R, M3, ♭5, M7	—	P5, m7, m3
Major 7♯5	R, M3, ♯5, M7	—	P5, m7, m3
Minor 7	R, m3, m7	P5	M7, M3
Minor-Major 7	R, m3, M7	P5	M3, m7
Half-Diminished 7	R, m3, ♭5, m7	—	P5, M3, M7
Fully Diminished 7	R, m3, ♭5, d7	—	m7, P5, M3, M7

Notice that the perfect 5th is optional for most chord families. This reflects how professional players actually voice chords, particularly in jazz where shell voicings commonly leave out the 5th. Requiring the 5th would cause the algorithm to fail on some of the most idiomatic voicings in common use.

Penalty tones are not hard rejections. The template is still scored; it just loses points. This handles cases where a note might simultaneously belong to one chord and partially fit another, and lets the score reflect the degree of fit rather than producing a binary yes/no.
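One way to picture a template is as three interval masks, each relative to the root. The following is a minimal sketch under that assumption; the class name, field names, and the intervals helper are hypothetical, and only the interval contents of the two example rows come from the table above.

```dart
/// Interval masks are relative to the root: bit n = n semitones above it.
class ChordTemplate {
  const ChordTemplate(
      this.name, this.requiredMask, this.optionalMask, this.penaltyMask);

  final String name;
  final int requiredMask;
  final int optionalMask;
  final int penaltyMask;

  /// Intervals the template neither requires, expects, nor penalizes.
  int extrasIn(int intervalMask) =>
      intervalMask & ~(requiredMask | optionalMask | penaltyMask);
}

/// Helper for this sketch: turns semitone offsets into an interval mask.
int intervals(List<int> semitones) =>
    semitones.fold<int>(0, (mask, s) => mask | (1 << s));

// Two rows from the table: R, M3 (+ optional P5) and R, M3, m7 (+ optional P5).
final major = ChordTemplate(
    'Major', intervals([0, 4]), intervals([7]), intervals([3, 10, 11]));
final dominant7 = ChordTemplate(
    'Dominant 7', intervals([0, 4, 10]), intervals([7]), intervals([3, 11]));
```

With this layout, a voicing with intervals 0, 2, 4, 7, and 10 above the root leaves only interval 2 in dominant7.extrasIn, which is the tone that would later be considered as a 9th.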

Template scoring

For each candidate root (each pitch class present in the voicing), WhatChord rotates the pitch class mask relative to that root to get an interval mask. Then it scores that interval mask against all 22 templates.

// Rotate: compute intervals above rootPc for each sounding note
int rotateMaskToRoot(int pcMask, int rootPc) {
  var rel = 0;
  for (var pc = 0; pc < 12; pc++) {
    if ((pcMask & (1 << pc)) == 0) continue;
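    // Dart's % already yields a non-negative result for a positive divisor,
    // so the interval is always in 0..11 and needs no sign correction.
    final interval = (pc - rootPc) % 12;
    rel |= 1 << interval;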
  }
  return rel;
}

The scoring formula accumulates raw points from several components:

Component	Weight	Notes
Each required tone present	+4.0	Structural foundation
Each missing required tone	-6.0	Max 1 allowed; 2+ causes the template to be rejected
Each optional tone present	+1.5	Adds color without being essential
Each penalty tone present	-3.0	Contradicts the chord quality
Each unexplained “extra” tone	-0.5	Before extension extraction; small because extensions are real
Bass is root or inversion tone	+1.0	Root position, 1st inv, 2nd inv, 3rd inv
Bass is color tone (7th-family chord)	+0.75	Upper-structure voicing, legitimate
Bass is extension (triad + slash)	+0.25	Add-chord slash notation
Bass unexplained by template	-0.25	Arbitrary slash
Alteration penalty (any altered extension)	-0.60	-0.30 for fully dim7 (see Diminished 7th section below)
Lydian-dominant 13th coherence bonus	+2.1	Applied when root-position dominant has 9, ♯11, and 13 all present
6th chord without 5th (3-note voicing)	-0.60	Disambiguates C6(no5) from Am7/C
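As a rough illustration of how the first five rows of the table combine, here is a sketch of a raw scoring function. It deliberately leaves out the bass, alteration, and special-case components, and the function and parameter names are assumptions rather than WhatChord's actual API.

```dart
/// Counts set bits in [mask].
int popCount(int mask) {
  var count = 0;
  for (var m = mask; m != 0; m &= m - 1) {
    count++;
  }
  return count;
}

/// Raw (un-normalized) score from the tone-matching rows of the table only.
/// Returns null when two or more required tones are missing (template rejected).
double? rawTemplateScore({
  required int intervalMask,
  required int requiredMask,
  required int optionalMask,
  required int penaltyMask,
}) {
  final requiredPresent = popCount(intervalMask & requiredMask);
  final requiredMissing = popCount(requiredMask) - requiredPresent;
  if (requiredMissing > 1) return null;

  final optionalPresent = popCount(intervalMask & optionalMask);
  final penaltyPresent = popCount(intervalMask & penaltyMask);
  final extras =
      popCount(intervalMask & ~(requiredMask | optionalMask | penaltyMask));

  return requiredPresent * 4.0 +
      requiredMissing * -6.0 +
      optionalPresent * 1.5 +
      penaltyPresent * -3.0 +
      extras * -0.5;
}
```

Scoring the intervals of C-E-G (mask 0x091) against the Major row gives 9.5 before any bass bonus, while the Minor row gives -3.5: one required tone missing, plus the major 3rd counted as a penalty tone.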

The raw score is then divided by sqrt(requiredToneCount) to normalize across chord complexities:

final denom = reqCount > 0 ? math.sqrt(reqCount.toDouble()) : 1.0;
final normalized = raw / denom;

Without normalization, 7th chords (which have more required tones) would consistently outscore triads just by having more opportunities to earn the +4.0 required-tone bonus. A perfectly matched C major triad would lose to a slightly mismatched C dominant 7th. Dividing by the square root of the required-tone count, rather than by the count itself, removes that bias while preserving meaningful score separation and without over-penalizing complex chords.
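A small worked example makes the effect of the square root concrete. The raw values below are computed by hand from the weight table for two fully matched root-position voicings; the constant and function names are illustrative only.

```dart
import 'dart:math' as math;

/// Divides a raw score by sqrt(required tone count), as in the snippet above.
double normalize(double raw, int requiredToneCount) {
  final denom = requiredToneCount > 0 ? math.sqrt(requiredToneCount) : 1.0;
  return raw / denom;
}

// Fully matched root-position voicings, using the weights in the table:
// triad:   2 required (+8.0)  + 5th (+1.5) + bass on root (+1.0) = 10.5
// seventh: 3 required (+12.0) + 5th (+1.5) + bass on root (+1.0) = 14.5
const triadRaw = 10.5;
const seventhRaw = 14.5;

final triadNormalized = normalize(triadRaw, 2); // about 7.42
final seventhNormalized = normalize(seventhRaw, 3); // about 8.37

// Dividing by the count itself would flip the order (5.25 vs about 4.83).
final triadLinear = triadRaw / 2;
final seventhLinear = seventhRaw / 3;
```

The raw gap of 4.0 points shrinks to roughly 0.95 after square-root normalization, so a penalty or missing tone can now outweigh the extra required tone. Linear division would go too far and rank a perfectly matched seventh chord below a perfectly matched triad.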

Extension extraction

After template scoring, any tone not accounted for by the base template (required + optional + penalty) lands in the “extras” mask. These get converted to named extensions:

Alterations (from the extras mask): flat 9 (semitone 1), sharp 9 (semitone 3), sharp 11 (semitone 6), flat 13 (semitone 8)
Natural extensions: 9 (semitone 2), 11 (semitone 5), 13 (semitone 9)

Whether natural extensions become “9/11/13” or “add9/add11/add13” depends on the stack below them. A 9 needs the 7th. An 11 or 13 needs both the 7th and 9th; the 11 itself is not required before labeling a 13, because 11ths are often omitted in chord symbols and real voicings. Without that support, the same pitch class is labeled as an add tone instead.

There is one dominant-context exception: when a dominant 7th shell has both the major third and flat seventh, interval 3 is treated as a sharp ninth rather than as the template’s minor-third penalty. That lets voicings such as G-B-D-F-A♯ score and spell as G7♯9 instead of plain G7 with an unexplained contradictory tone. The same exception applies to augmented dominant sevenths, so C-E-G♯-B♭-D♯ can be identified as C7♯5♯9.
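The naming rules for extras can be sketched as a single function. This is a simplified illustration: the function name and the hasSeventh parameter are hypothetical, the dominant-context exception is not modeled, and the sketch assumes that only the natural 9th counts as support for an 11 or 13, which the text does not spell out.

```dart
/// Names the tones left in [extrasMask] (bit n = n semitones above the root).
/// [hasSeventh] says whether the matched template includes a 7th.
List<String> nameExtensions(int extrasMask, {required bool hasSeventh}) {
  bool has(int semitone) => (extrasMask & (1 << semitone)) != 0;

  // Assumption of this sketch: only the natural 9th supports an 11 or 13.
  final hasNinth = has(2);
  final names = <String>[];

  if (has(1)) names.add('b9');
  if (has(2)) names.add(hasSeventh ? '9' : 'add9');
  if (has(3)) names.add('#9');
  if (has(5)) names.add(hasSeventh && hasNinth ? '11' : 'add11');
  if (has(6)) names.add('#11');
  if (has(8)) names.add('b13');
  if (has(9)) names.add(hasSeventh && hasNinth ? '13' : 'add13');
  return names;
}
```

With a 7th present, extras at semitones 2 and 9 come back as 9 and 13. The same semitone 9 with no 9th beneath it is labeled add13 instead.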

How the weights were tuned

The scoring weights were not established arbitrarily. They were tuned empirically against a set of golden test cases: specific voicings where the desired output was known in advance. Each golden case captures a chord a musician would name unambiguously, together with structural assertions about the winning candidate’s root, quality, and extensions.

The test suite covers major, minor, diminished, dominant, altered, and extended voicings across different inversions and genuinely ambiguous situations. The tuning loop looked like this:

Run the golden test suite.
For any case that failed, use the chord-debug CLI tool to inspect the full ranked candidate list with score breakdowns.
Adjust weights, add rules, or add scoring bonuses until the failing case passed.
Re-run the full suite to verify no regressions.

The chord-debug tool runs the full analysis pipeline on any set of notes and prints each candidate with its score, individual weight contributions, and the ranking rule that decided its position relative to the previous candidate:

$ dart run tool/chord_debug.dart F# Bb C E

pcs: Bb, C, E, F# |  bass: F#  |  key: C major

 1) F#7b5          8.50
     members: root=F#  major3=A#  flat5=C  flat7=E
     req+16  bass+1  raw=17.00 / sqrt(4) => 8.50

 2) C7b5 / Gb      8.50  Δ +0.00 ~tie
     (vs prev: Prefer root position)
     members: root=C  major3=E  flat5=Gb  flat7=Bb
     req+16  bass+1  raw=17.00 / sqrt(4) => 8.50

 3) C7#11 / F#     6.73  Δ -1.77
     (vs prev: Score outside near-tie window)

One detail worth noting: MIDI gives the app pitch classes, not note names. Pitch class 10 can be written as A♯ or B♭. In this result, the algorithm has identified F♯ as the chord root and pitch class 10 as the major third above that root. An F♯ dominant seventh is spelled with A♯, so WhatChord prints A♯ even if the same piano key could be named B♭ in another context.
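Root-relative spelling of this kind can be approximated by picking the letter name first and then deriving the accidental from the pitch class. The sketch below is one way to do that, not WhatChord's implementation; the function and parameter names are hypothetical, and it uses ASCII # and b as the debug output does.

```dart
const letterNames = ['C', 'D', 'E', 'F', 'G', 'A', 'B'];
const naturalPitchClasses = [0, 2, 4, 5, 7, 9, 11];

/// Spells the tone [semitones] above the root using the letter that lies
/// [degreeSteps] letter names above the root letter
/// (third = 2, fifth = 4, seventh = 6).
String spellChordTone({
  required String rootLetter,
  required int rootPc,
  required int degreeSteps,
  required int semitones,
}) {
  final letterIndex = (letterNames.indexOf(rootLetter) + degreeSteps) % 7;
  final targetPc = (rootPc + semitones) % 12;
  // Signed distance from the natural letter, folded into -5..6.
  var offset = (targetPc - naturalPitchClasses[letterIndex]) % 12;
  if (offset > 6) offset -= 12;
  final accidental = offset >= 0 ? '#' * offset : 'b' * -offset;
  return letterNames[letterIndex] + accidental;
}
```

A major third (4 semitones, two letters up) above F♯ comes out as A#, while a flat fifth above C comes out as Gb, matching the member lists in the debug output.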

That kind of diagnostic visibility was essential for understanding why the algorithm chose wrong answers and what needed to change. A weight that fixed one case would sometimes break another, and the only way to make progress without regressing was to have the full ranked list visible while making targeted adjustments.

The ranking problem

The debug output above shows why scoring is only the first half of the problem. Once multiple readings are plausible, the analysis engine needs a separate ranking layer that encodes musical priorities more directly than a single numeric score can.

This is not an isolated case. Several common note sets produce near-identical scores for multiple plausible interpretations, and the raw score cannot distinguish which one a musician would name:

C-E-G-A: C6 vs. Am7/C (identical scores; the 6th chord in root position should win)
B-D-F-A♭: Bdim7 vs. G♯dim7/B vs. Ddim7/C♭ vs. Fdim7/C♭ (all four readings score identically due to dim7 symmetry)

Some ambiguities are close enough for ordinary tie-breakers. Others need stronger structural rules, because a musically unlikely reading can sometimes score higher than the conventional name.

WhatChord handles all of these cases with a two-layer decision process.

Hard rules

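When their structural conditions are met, these rules decide the ordering ahead of ordinary score comparison, in some cases even when the preferred reading scores lower. Some of them apply only while the score difference stays small, as noted in each rule: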

Prefer altered dominant 7th over diminished 7th slash. When a dominant 7th in root position has its shell tones (M3 + ♭7) present and color tones (extensions or alterations), and the competing diminished reading would be a slash chord whose bass is a color tone, prefer the dominant. This fires even when the diminished reading scores higher.
Prefer conventional altered seventh over add11 slash. When a complete altered seventh chord in a normal inversion competes against a remote, non-dominant slash reading whose natural 11 clashes with a major third, prefer the conventional seventh-chord name if the score difference is close.
Prefer close root-position dominant 7th over non-dominant slash. When a dominant 7th with shell tones and at least one extension or alteration would lose to a remote, non-dominant slash chord, the dominant wins (provided the score difference is not too large).
Prefer root-position altered-fifth dominant over slash. Flat-five and sharp-five dominant sevenths are tritone-symmetric. When a root-position altered-fifth dominant with a real alteration competes against a close slash reading that only has added or natural color tones, prefer the root-position name.

The near-tie window

If no hard rule fires and the score difference is greater than 0.20 (the nearTieWindow constant), the higher-scoring candidate wins on score alone.

When scores are within the near-tie window, tie-breaker rules are applied sequentially. The first rule that produces a non-tie result decides the ordering:

Prefer root-position 6th over inverted 7th (the C6 vs. Am7/C case)
Prefer upper-structure dominant 7th slash (color bass with no other alterations)
Prefer root-position diminished 7th (symmetrical chords default to bass-as-root)
Prefer dominant 7th shell over dim7 slash
Prefer fewer altered/tension colors (including natural 11 against a major third)
Prefer diatonic chords (given the key signature)
Prefer the tonic chord (I) over other diatonic options
Prefer I when the bass is the tonic pitch class
Prefer natural extensions (9/11/13) over add-tones; then fewer total extensions
Prefer root position
Prefer 1st inversion over 2nd inversion
Prefer 7th chords over triads
Prefer fewer extensions
Avoid suspended chords

If all of these rules still have not produced a winner, there is a deterministic fallback: sort by root pitch class numerically. This ensures the output is always consistent for the same input, even for exotic voicings.
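The near-tie window, the sequential tie-breakers, and the deterministic fallback fit naturally into a pairwise comparison. The sketch below shows that shape with a single example tie-breaker; hard rules are omitted, and the Candidate class, the TieBreaker typedef, and the function names are all assumptions made for illustration. Only nearTieWindow and its value come from the text.

```dart
class Candidate {
  const Candidate(this.name, this.score, this.rootPc,
      {this.isRootPosition = false});

  final String name;
  final double score;
  final int rootPc;
  final bool isRootPosition;
}

const nearTieWindow = 0.20;

/// Returns a negative value if [a] should rank first, positive if [b]
/// should, and 0 if this rule cannot decide.
typedef TieBreaker = int Function(Candidate a, Candidate b);

int preferRootPosition(Candidate a, Candidate b) {
  if (a.isRootPosition == b.isRootPosition) return 0;
  return a.isRootPosition ? -1 : 1;
}

int compareCandidates(Candidate a, Candidate b, List<TieBreaker> tieBreakers) {
  final delta = a.score - b.score;
  // Outside the window, the higher score wins outright.
  if (delta.abs() > nearTieWindow) return delta > 0 ? -1 : 1;
  // Inside the window, the first decisive rule settles the order.
  for (final rule in tieBreakers) {
    final result = rule(a, b);
    if (result != 0) return result;
  }
  // Deterministic fallback: lower root pitch class first.
  return a.rootPc.compareTo(b.rootPc);
}
```

Applied to the debug example, F#7b5 and C7b5 / Gb tie at 8.50, so preferRootPosition puts F#7b5 first, while C7#11 / F# at 6.73 falls outside the window and loses on score. A windowed comparison like this is not transitive in general, so the sketch is best read as a pairwise decision rather than a drop-in sort comparator.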

The ordering of these rules encodes musical priorities. Structural clarity (root position, shell tones) comes before contextual preferences (diatonic, tonic). Conventional naming (fewer alterations, natural extensions) comes before complexity. Suspended chords are deprioritized last because they are valid but less common, and should only win when nothing else fits.

Diminished 7th: a special case in scoring

Fully diminished 7th chords receive a softer alteration penalty (-0.30 instead of -0.60) when competing against other chord readings. The reason is their symmetry problem.

When you add one extra note to a diminished 7th voicing, the algorithm can try to reinterpret the original dim7 notes as a different root’s dim7 chord, treating the “extra” note as an addition. Without softening the penalty, this reinterpretation often scores better than the natural reading, because the reinterpreted dim7 has no “alteration” (the extra note is now natural to its root) while the original reading does (the extra note is an alteration). Musicians do not think about diminished 7ths that way, so the penalty is halved to restore the expected behavior.

Caching for real-time performance

Running the full pipeline (up to 12 candidate roots × 22 templates = 264 template evaluations) on every MIDI state change would be wasteful. In practice, a pianist tends to produce many repeated input states throughout a musical piece.

WhatChord uses a 512-entry Least Recently Used (LRU) cache implemented as a LinkedHashMap. The cache key is a hash of three inputs:

The pitch class set
The analysis context (key signature + tonality)
The take parameter (how many candidates to return, default 8)

The context is included in the key because diatonic preference rules depend on it; a different key signature can change which candidate ranks first even for identical voicings.

final key = Object.hash(input.cacheKey, context, take);
final cached = _cache[key];
if (cached != null) {
  // Promote on hit so eviction removes LRU, not FIFO
  _cache
    ..remove(key)
    ..[key] = cached;
  return cached;
}

The LinkedHashMap preserves insertion order. On a cache hit, the entry is removed and re-inserted at the end (most recently used). On eviction, the first key is removed (least recently used). This is the standard LRU pattern in Dart without a separate doubly-linked list.
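For readers who want to see the whole pattern in one place, here is a minimal, generic LRU cache built on LinkedHashMap. It is a sketch of the technique described above rather than WhatChord's cache class; the LruCache name and its methods are hypothetical. Unlike the snippet above, the map here is keyed on whatever value the caller passes, so a caller could use a record of the inputs instead of a combined hash.

```dart
import 'dart:collection';

/// Minimal LRU cache: insertion order in the map doubles as recency order.
class LruCache<K, V extends Object> {
  LruCache(this.capacity);

  final int capacity;
  final LinkedHashMap<K, V> _entries = LinkedHashMap<K, V>();

  /// Returns the cached value and promotes it to most recently used.
  V? get(K key) {
    final value = _entries.remove(key);
    if (value != null) _entries[key] = value; // re-insert at the end
    return value;
  }

  /// Stores [value], evicting the least recently used entry if full.
  void put(K key, V value) {
    _entries.remove(key);
    _entries[key] = value;
    if (_entries.length > capacity) {
      _entries.remove(_entries.keys.first); // first key = least recent
    }
  }

  int get length => _entries.length;
}
```

With a capacity of 2, putting a and b, reading a, then putting c evicts b, because the read moved a to the most-recent end of the map.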

The 512-entry capacity was set by benchmarking four workload types: random input, bounded exhaustive input, realistic tonal progressions, and simulated live note transitions (individual notes added and removed one at a time, as a player would actually play). Random and exhaustive inputs showed little cache locality because each chord state was unique. Tonal progressions and live transitions showed high reuse: real playing revisits the same chord states repeatedly within a short window. A 512-entry cache preserved nearly all of the realistic-workload hit rate, and increasing the cache size produced no material improvement. So the capacity, like the scoring weights, was set by measurement rather than intuition.

What the algorithm does not handle

A few things are known limitations or non-goals:

Polychords. Two simultaneous independent harmonies (for example, a D♭ major triad over a C dominant 7th, the kind of stacked sonority associated with Stravinsky) are not modeled. The algorithm will find the best single-chord description of the combined note set.
Temporal context. Each snapshot of sounding notes is analyzed independently. The algorithm does not track what chord came before and does not use progression history to inform interpretation. Using temporal context to further increase accuracy is a natural direction for future improvement.
Non-12-TET tuning. Western music theory operates in 12-tone equal temperament (12-TET), and MIDI uses integer semitone values to match. Microtonal intervals, quarter tones, and just intonation systems have no representation in this model.

The scoring heuristics are tuned from experience. They encode accumulated musical convention, but they are adjustable constants, not proven axioms. Edge cases and counterexamples help improve them.

The codebase

WhatChord is written in Dart using the Flutter framework. The chord analysis engine lives entirely in lib/features/theory/domain/analysis/, three files with no platform dependencies and a unit test suite that verifies known-correct outputs across major, minor, dominant, altered, extended, and ambiguous chord types.

The project is open source and released under the Zero Clause BSD License, which means you are free to use, modify, and share the code however you like.

If you find a chord that WhatChord misidentifies, the best way to report it is to long-press the chord card to open Analysis Details, copy the diagnostic output, and open a GitHub issue. The diagnostic output includes the exact pitch classes and context that produced the result, which makes it straightforward to reproduce and debug.
