Under the Hood: Building a Real-Time Chord Recognizer

The problem is not a lookup

The first intuition when building a chord recognizer is to build a dictionary. There are only 12 pitch classes, which means there are only 2^12 = 4096 possible pitch-class sets. Store a name for each set, and when a user plays C-E-G, look up {C, E, G} and return “C major.”

The problem is not memory. Four thousand entries is trivial. The problem is meaning. A pitch-class set does not contain enough information to decide what musicians will call it.

Piano players often leave out notes that a dictionary entry might expect. Extended chords add notes that no fixed dictionary entry anticipates. And the same set of pitch classes, as discussed in the companion article, can legitimately be described as multiple different chords depending on musical context.

What you actually need is a scoring model. It has to evaluate how well any given set of notes fits each chord type, rank all plausible interpretations, and apply musical judgment when scores are close.

Overview: a four-stage pipeline

Before diving into each component, here is the overall shape of the algorithm. A snapshot of sounding notes enters at the top; a ranked list of chord interpretations comes out at the bottom.

Input: set of sounding pitch classes + lowest (bass) note

↓

Pitch-class bitmask: a 12-bit integer with one bit per semitone in the octave

↓

Candidate generation: each sounding pitch class becomes a candidate root and is scored against every chord template; extensions are extracted

↓

Score normalization: raw scores are normalized for fair comparison across chord complexities

↓

Ranking: musical heuristics resolve ambiguous scores; hard structural rules override when the score alone would pick the wrong answer

↓

Output: top-ranked chord candidates, with the result cached in an LRU cache

The rest of this article walks through each stage in detail, ending with a discussion of known limitations.

Pitch classes and bitmasks

WhatChord models the common 12-tone equal temperament (12-TET) pitch-class framework used by MIDI keyboards, which divides each octave into equal semitone positions. A pitch class is the note’s position within that octave, ignoring which octave it’s in, so middle C, the C above it, and the C three octaves below all share pitch class 0. In this engine, pitch classes are numbered 0 (C) through 11 (B).

For analysis, the engine collapses the sounding notes into a set of pitch classes plus the lowest sounding note as bass. The pitch-class set is represented as a 12-bit integer mask where bit n is set if pitch class n is present. C major (C=0, E=4, G=7) looks like this:

11	10	9	8	7	6	5	4	3	2	1	0
B	A♯	A	G♯	G	F♯	F	E	D♯	D	C♯	C
0	0	0	0	1	0	0	1	0	0	0	1

// Pitch classes: C=0, E=4, G=7
int pcMask = (1 << 0) | (1 << 4) | (1 << 7);
// pcMask == 0b000010010001 == 0x091

This representation is compact and fast. Checking whether a pitch class is present is a single bitwise AND. Counting present pitch classes is a popcount. Rotating the set relative to a candidate root is a loop over bits with modular arithmetic. All of these operations are cheap.
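As a rough illustration of those operations, here is a minimal sketch. The function names are hypothetical and are not taken from the WhatChord source; the input is assumed to be a list of MIDI note numbers.

```dart
/// Builds a 12-bit pitch-class mask from MIDI note numbers (60 = middle C).
/// Hypothetical helper, not the engine's actual API.
int pitchClassMask(Iterable<int> midiNotes) {
  var mask = 0;
  for (final note in midiNotes) {
    // The octave is discarded; only the pitch class (0..11) is kept.
    mask |= 1 << (note % 12);
  }
  return mask;
}

/// True when pitch class [pc] (0 = C ... 11 = B) is set in [mask].
bool hasPitchClass(int mask, int pc) => (mask & (1 << pc)) != 0;

/// Counts set bits (popcount) by clearing the lowest set bit on each pass.
int pitchClassCount(int mask) {
  var count = 0;
  for (var m = mask & 0xFFF; m != 0; m &= m - 1) {
    count++;
  }
  return count;
}
```

With these helpers, the notes C4, E4, G4, and C5 (MIDI 60, 64, 67, 72) collapse to the same 0x091 mask as the three-note triad, because the doubled C adds no new pitch class.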

A key design decision: only pitch classes actually present in the voicing are tested as candidate roots. There are no “ghost roots”: the algorithm never proposes an interpretation where the chord is rooted on a note that is not being played. This keeps the candidate count small (bounded by the number of distinct sounding pitch classes, typically 3–7) and avoids obviously wrong readings.
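Enumerating the candidate roots then amounts to listing the set bits of the mask. A minimal sketch, with a hypothetical function name:

```dart
/// Lists candidate roots: every pitch class that is set in [pcMask].
/// Pitch classes that are not sounding are never proposed as roots.
List<int> candidateRoots(int pcMask) => [
      for (var pc = 0; pc < 12; pc++)
        if ((pcMask & (1 << pc)) != 0) pc,
    ];
```

For the C major mask 0x091 this yields pitch classes 0, 4, and 7, so only C, E, and G are tried as roots.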

This is a deliberate “solo keyboard” assumption. The current engine is optimized for the common case where the same MIDI stream contains both the harmony and the bass note. A future ensemble mode could relax that rule for settings where another instrument is carrying the bass, allowing rootless voicings to imply roots that are not literally present in the keyboard part.

Chord templates

Chord qualities are also defined as bitmask templates. Each one describes three sets of intervals relative to the root:

Required: tones that must be present to identify this quality. Missing more than one required tone causes the template to be skipped entirely.
Optional: tones frequently omitted in real voicings (almost always the perfect 5th). They earn a small bonus when played and cost nothing when absent.
Penalty: tones that actively contradict this quality. Having a major 3rd present when you are trying to identify a minor chord hurts the score.

The 26 templates, organized by complexity:

Quality	Required intervals	Optional	Key penalties / constraints
Major	R, M3	P5	m3, m7, M7
Major (♭5)	R, M3, ♭5	—	P5, m3, m7, M7
Minor	R, m3	P5	M3, m7, M7
Minor ♯5	R, m3, ♯5	—	M3, P5, m7, M7
Diminished	R, m3, ♭5	—	M3, P5
Augmented	R, M3, ♯5	—	m3, P5
Sus2	R, M2, P5	—	m3, M3, m7, M7
Sus4	R, P4, P5	—	m3, M3, m7, M7
Double sus (Sus2sus4)	R, M2, P4, P5	—	Exact match only
Major 6	R, M3, M6	P5	m3, m7, M7
Minor 6	R, m3, M6	P5	M3, m7, M7
Dominant 7	R, M3, m7	P5	M7, m3
7sus2	R, M2, m7	P5	m3, M3, P4, M7
7sus4	R, P4, m7	P5	m3, M3, M7
7♭5	R, M3, ♭5, m7	—	P5, M7, m3
7♯5	R, M3, ♯5, m7	—	P5, M7, m3
Major 7	R, M3, M7	P5	m7, m3
Major 7sus2	R, M2, M7	P5	m3, M3, P4, m7
Major 7sus4	R, P4, M7	P5	m3, M3, m7
Major 7♭5	R, M3, ♭5, M7	—	P5, m7, m3
Major 7♯5	R, M3, ♯5, M7	—	P5, m7, m3
Minor 7	R, m3, m7	P5	M7, M3
Minor 7♯5	R, m3, ♯5, m7	—	P5, M7, M3
Minor-Major 7	R, m3, M7	P5	M3, m7
Half-Diminished 7	R, m3, ♭5, m7	—	P5, M3, M7
Fully Diminished 7	R, m3, ♭5, d7	—	m7, P5, M3, M7

Notice that the perfect 5th is optional for most chord families. Requiring it would cause the algorithm to miss many idiomatic voicings in common use.

Penalty tones are not hard rejections. The template is still scored; it just loses points. This handles cases where a note might simultaneously belong to one chord and partially fit another, and lets the score reflect the degree of fit rather than producing a binary yes/no.
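One way to picture a template is as three interval masks relative to the root. The sketch below is an assumption about the shape of the data, not the engine's actual class: the names ChordTemplate, intervalMask, requiredMask, optionalMask, and penaltyMask are hypothetical. The interval contents are copied from the Major and Minor 7 rows of the table.

```dart
/// Converts semitone offsets above the root into a 12-bit interval mask.
int intervalMask(List<int> semitones) =>
    semitones.fold<int>(0, (mask, s) => mask | (1 << s));

/// Hypothetical template shape: three interval masks relative to the root.
class ChordTemplate {
  const ChordTemplate(
    this.name, {
    required this.requiredMask,
    required this.optionalMask,
    required this.penaltyMask,
  });

  final String name;
  final int requiredMask; // tones that identify the quality
  final int optionalMask; // tones often omitted in real voicings
  final int penaltyMask; // tones that contradict the quality
}

// Semitone offsets: m3 = 3, M3 = 4, P5 = 7, m7 = 10, M7 = 11.
final majorTemplate = ChordTemplate(
  'Major',
  requiredMask: intervalMask([0, 4]), // R, M3
  optionalMask: intervalMask([7]), // P5
  penaltyMask: intervalMask([3, 10, 11]), // m3, m7, M7
);

final minor7Template = ChordTemplate(
  'Minor 7',
  requiredMask: intervalMask([0, 3, 10]), // R, m3, m7
  optionalMask: intervalMask([7]), // P5
  penaltyMask: intervalMask([11, 4]), // M7, M3
);
```

Keeping all three sets as masks means a template check reduces to a few bitwise ANDs against the rotated voicing.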

Template scoring

For each candidate root (each pitch class present in the voicing), the analyzer rotates the pitch class mask relative to that root to get an interval mask. Then it scores that interval mask against all 26 templates.

// Rotate: compute intervals above rootPc for each sounding note
int rotateMaskToRoot(int pcMask, int rootPc) {
  var rel = 0;
  for (var pc = 0; pc < 12; pc++) {
    // Skip pitch classes that are not sounding.
    if ((pcMask & (1 << pc)) == 0) continue;
    // Dart's % is already non-negative for a positive divisor, so the
    // interval always lands in 0..11 even when pc < rootPc.
    final interval = (pc - rootPc) % 12;
    rel |= 1 << interval;
  }
  return rel;
}
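The same rotation can also be written as a 12-bit circular shift. This is an alternative formulation offered for comparison, not the engine's code, and the function name is hypothetical:

```dart
/// Rotates a 12-bit pitch-class mask so that [rootPc] lands on bit 0.
/// Bits shifted off the low end wrap around to the top of the 12-bit range.
int rotateMaskByShift(int pcMask, int rootPc) {
  final r = rootPc % 12;
  return ((pcMask >> r) | (pcMask << (12 - r))) & 0xFFF;
}
```

Rotating the C major mask 0x091 to root E (pitch class 4) gives 0x109: E becomes interval 0, G becomes interval 3, and C becomes interval 8.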

The scoring formula accumulates raw points from several components:

Component	Weight	Notes
Each required tone present	+4.0	Structural foundation
Each missing required tone	-6.0	Max 1 allowed; 2+ causes the template to be rejected
Each optional tone present	+1.5	Adds color without being essential
Each penalty tone present	-3.0	Contradicts the chord quality
Each added-complexity tone	-0.5	Before extension extraction; small because extensions are real
Stable bass fit	+1.0	Root position, 1st inv, 2nd inv, 3rd inv; also ninth bass in a complete natural dominant 13th
Bass is color tone (7th-family chord)	+0.75	Upper-structure voicing, legitimate
Bass is extension (triad + slash)	+0.25	Add-chord slash notation
Bass unexplained by template	-0.25	Arbitrary slash
Alteration penalty (altered extension)	-0.30 / -0.60	Softer value applies to plain triads and fully dim7; larger penalty is the default. A ♯11 as natural Lydian color is exempt
Split-ninth color penalty	-0.05	Applied when a chord uses both ♭9 and natural 9 color
Lydian-dominant partial-stack bonus	+0.70	Rewards a dominant whose 9 and ♯11 form a partial Lydian stack, so the score prefers it over remote altered-fifth slash readings
Lydian-dominant 13♯11 coherence bonus	+2.1	Stronger version for a complete Lydian or altered dominant stack (9, ♯11, and a 13 or ♭13), which reads as one chord rather than a slash reinterpretation
Fifthless natural extension stack bonus	+2.4	Rewards a fifthless major 7th (9 plus ♯11) or dominant 7th (9 plus 13) whose upper colors stack cleanly. A natural 11 against the major 3rd is too tense to count.
Fifthless major-thirteenth stack bonus	+1.9	Same idea for a full fifthless major 13(♯11). Slightly lower so complete dominant-13th inversions can still compete.
Complete dominant flat-13 shell bonus	+0.15 / +0.70	Keeps a full dominant 7th shell with ♭13 competitive against its enharmonic maj7♯5 reading. The larger value applies when a 9th (natural or altered) is also present.
Complete add9 slash triad bonus	+3.2	Applied to complete major/minor triads whose slash bass is the added ninth, such as D/E or C♯/D♯
Sus chord with suspended tone in bass	-2.00	Demotes unusual inversions like D7sus2/E (sus2 in bass) so competing add-chord or triad readings can win via tie-breakers
6th chord without 5th (3-note voicing)	-0.60	Disambiguates C6(no5) from Am/C

The raw score is then divided by sqrt(requiredToneCount) to normalize across chord complexities:

final denom = reqCount > 0 ? math.sqrt(reqCount.toDouble()) : 1.0;
final normalized = raw / denom;

Without normalization, 7th chords (which have more required tones) would consistently outscore triads just by having more opportunities to earn the +4.0 required-tone bonus. A perfectly matched C major triad would lose to a slightly mismatched C dominant 7th. The square-root normalization (rather than linear) preserves meaningful score separation while preventing complex chords from systematically outscoring well-matched simpler ones.
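To make the arithmetic concrete, here is a deliberately simplified sketch of the scoring step. It covers only the first four rows of the weight table plus a caller-supplied bass bonus, and it ignores every other bonus and penalty. The function and parameter names are hypothetical.

```dart
import 'dart:math' as math;

/// Counts set bits in a non-negative mask.
int _popcount(int mask) {
  var count = 0;
  for (var m = mask; m != 0; m &= m - 1) {
    count++;
  }
  return count;
}

/// Simplified score for one template against one rotated voicing.
/// Returns null when two or more required tones are missing.
double? scoreTemplate({
  required int intervalMask, // voicing rotated so the candidate root is bit 0
  required int requiredMask,
  required int optionalMask,
  required int penaltyMask,
  double bassBonus = 0.0, // e.g. +1.0 for a stable bass fit
}) {
  final reqCount = _popcount(requiredMask);
  final missing = _popcount(requiredMask & ~intervalMask);
  if (missing > 1) return null;

  final raw = 4.0 * (reqCount - missing) // required tones present
      - 6.0 * missing // at most one missing required tone
      + 1.5 * _popcount(optionalMask & intervalMask)
      - 3.0 * _popcount(penaltyMask & intervalMask)
      + bassBonus;

  // Square-root normalization across chord complexities.
  final denom = reqCount > 0 ? math.sqrt(reqCount.toDouble()) : 1.0;
  return raw / denom;
}
```

For a root-position 7♭5 chord with all four required tones present, this gives 4 × 4.0 + 1.0 = 17.0 raw and 17.0 / √4 = 8.5 normalized. A root-position C major triad with its fifth scores (8.0 + 1.5 + 1.0) / √2, roughly 7.42.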

Diminished 7th penalty

Fully diminished 7th chords receive a softer alteration penalty because their symmetry makes alternate roots score unusually well. Halving that penalty helps preserve the reading musicians expect when an added tone could otherwise make a rotated diminished interpretation look artificially cleaner.

Extension extraction

During template scoring, any tone not accounted for by the base template (required + optional + penalty) lands in the “extras” mask and adds a small complexity cost. A few context-specific penalty tones can be moved into that extras mask first when they function as chord color instead of true contradictions. The tones in the extras mask are then converted to named extensions in the final chord identity:

Alterations (from the extras mask): flat 9 (semitone 1), sharp 9 (semitone 3), sharp 11 (semitone 6), flat 13 (semitone 8)
Split-third add tone: add sharp 9 (semitone 3) when a major-family triad already contains its major third
Natural extensions: 9 (semitone 2), 11 (semitone 5), 13 (semitone 9)

Whether natural extensions become “9/11/13” or “add9/add11/add13” depends on whether the chord has a 7th. With a 7th present, a 9, 11, or 13 reads as a stacked extension regardless of which lower stack members are also sounding, matching common chord-symbol practice where the inner extensions are freely omitted. Without a 7th, the same pitch class is labeled as an add tone instead, except where a triad plus a sixth is named a sixth chord (C6) rather than add13.

Interval 3 is normally a minor third, but the analyzer allows a few narrow musical exceptions where that pitch clearly functions as sharp-nine color instead: dominant 7th shells with ♯9 color, plain major seventh chords with both the major third and major seventh present, and major-family split-third voicings. These exceptions keep common blues, altered-dominant sounds, and explicit altered major-seventh colors from being misread as contradictions.
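The basic naming step can be sketched as a lookup from interval to label. This is an illustrative simplification with a hypothetical function name: it applies only the 7th versus no-7th distinction and leaves out the sixth-chord exception, the split-third add tone, and the sharp-nine special cases described above.

```dart
/// Names the tones in [extrasMask] (intervals above the root).
/// Natural extensions become add tones when the chord has no 7th.
List<String> nameExtensions(int extrasMask, {required bool hasSeventh}) {
  const alterations = {1: '♭9', 3: '♯9', 6: '♯11', 8: '♭13'};
  const naturals = {2: '9', 5: '11', 9: '13'};

  final names = <String>[];
  for (var interval = 0; interval < 12; interval++) {
    if ((extrasMask & (1 << interval)) == 0) continue;
    final natural = naturals[interval];
    final alteration = alterations[interval];
    if (natural != null) {
      names.add(hasSeventh ? natural : 'add$natural');
    } else if (alteration != null) {
      names.add(alteration);
    }
  }
  return names;
}
```

Under this sketch, extras at semitones 2 and 9 read as 9 and 13 over a 7th chord, and as add9 and add13 over a plain triad.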

How the weights were tuned

The scoring weights were not established arbitrarily. They were tuned empirically against a set of golden test cases: specific voicings where the expected output was chosen in advance. Most golden cases capture chords a musician would name unambiguously; ambiguous cases pin the intended primary reading for the current scoring and ranking model.

The test suite covers major, minor, diminished, dominant, altered, and extended voicings across different inversions and ambiguous situations. The tuning loop looked like this:

Run the golden test suite.
For any case that failed, use the chord-debug CLI tool to inspect the full ranked candidate list with score breakdowns.
Adjust weights, add rules, or add scoring bonuses until the failing case passed.
Re-run the full suite to verify no regressions.

The chord-debug tool runs the full analysis pipeline on any set of notes and prints each candidate with its score, individual weight contributions, and the ranking rule that decided its position relative to the previous candidate:

$ dart run tool/chord_debug.dart F# Bb C E

notes: F♯ B♭ C E  |  bass: F♯ (pc 6)  |  key: C major

 1) F♯7♭5          8.50
     members: root=F♯  major3=A♯  flat5=C  flat7=E
     scoring: req+16  bass+1  =>  17.00 raw, 8.50 final

 2) C7♭5 / G♭      8.50  Δ +0.00 ~alt
     (vs prev: prefer root position)
     members: root=C  major3=E  flat5=G♭  flat7=B♭
     scoring: req+16  bass+1  =>  17.00 raw, 8.50 final

 3) C7♯11 / F♯     7.01  Δ -1.49
     (vs prev: score difference beyond tie-break range)

The same diagnostic output also exposes enharmonic spelling decisions: MIDI provides pitch classes, and the engine chooses note names from the winning chord context.

That kind of diagnostic visibility was essential for understanding why the algorithm chose wrong answers and what needed to change. A weight that fixed one case would sometimes break another, and the only way to make progress without regressing was to have the full ranked list visible while making targeted adjustments.

The ranking problem

The debug output above shows why scoring is only the first half of the problem. Once multiple readings are plausible, the analysis engine needs a separate ranking layer that encodes musical priorities more directly than a single numeric score can.

This is not an isolated case. Several common note sets produce near-identical scores for multiple plausible interpretations, and the raw score cannot distinguish which one a musician would name:

C-E-G-A: C6 vs. Am7/C (identical scores; the 6th chord in root position should win)
B-E-G with B in the bass: Em/B vs. G6/B (the complete triad should beat an inverted 6th-chord spelling whose fifth is absent)
B-D-F-A♭: Bdim7 vs. G♯dim7/B vs. Ddim7/C♭ vs. Fdim7/C♭ (C♭ = B enharmonically; all four readings score identically due to dim7 symmetry)

The analyzer handles these ambiguities with two ranking paths: narrow structural overrides for cases where the conventional name should win despite score, and ordered tie-breakers for candidates whose scores are already close.

Hard rules

Hard rules are intentionally narrow guardrails for known failure modes in the scoring model. They only fire when a pitch-class-valid but misleading interpretation scores above the name musicians would normally expect. Each rule is documented in code with the concrete voicing that motivated it, and covered by focused ranking tests so the exception stays bounded.

The near-tie window

The ordered list below applies only after those hard rules have had a chance to run. If none of them fire and the score difference is greater than 0.20 (the nearTieWindow constant), the higher-scoring candidate wins on score alone.

The displayed alternatives use the same score window as a lower bound, then include every ranked candidate through the last score-window match. This keeps hard-rule ordering coherent when a higher-ranked candidate sits just outside the raw numeric window.

When scores are within the near-tie window, tie-breaker rules are applied sequentially. The first rule that produces a non-tie result decides the ordering:

Prefer a voicing-supported upper-structure slash: a complete chord stacked above an isolated bass note, when the input carries real octaves
Prefer root-position 6th over inverted 7th
Prefer a complete major/minor triad over an incomplete inverted 6th chord
Prefer upper-structure dominant 7th slash
Prefer root-position dominant sus, including flat-nine sus colors, over remote slash reinterpretations
Prefer stable extended dominant inversions over altered-fifth dominant slash
Prefer complete altered-fifth dominants over altered major-seventh reinterpretations, including sharp-nine-bass voicings
Prefer a complete sharp-nine thirteenth dominant over a heavily colored sixth chord
Prefer a complete altered dominant thirteenth over an altered minor-thirteenth reading with rarer color
Prefer a complete natural thirteenth dominant over a minor-sixth reading that needs stacked added tones
Prefer a complete flat-nine flat-thirteen dominant over a remote diminished or seventh-family spelling
Prefer a root-position altered sharp-five dominant over remote minor-major or half-diminished reinterpretations
Prefer root-position 9(♯5,♯11) dominant spelling over an equivalent 9(♭5,♭13) spelling
Prefer half-diminished flat-color spellings over equivalent minor sharp-five spellings
Prefer a complete major-triad inversion over a minor sharp-five reading
Prefer complete Lydian 6/9♯11 readings over equivalent major-13-sus4 spellings
Prefer a complete major-triad inversion over a seventh-family chord where the bass is only an add-extension
Prefer root-position diminished 7th
Prefer dominant 7th shell over dim7 slash
Prefer dominant 7th slash over non-dominant seventh-family slash
Prefer a reading that names every tone over one that drops a tone
Prefer a harmonic-minor tonic over a split-third major-triad inversion
Prefer complete Lydian major-nine spellings over close major-thirteenth inversions that include a natural eleventh against the major third
Prefer a higher-scoring major-seventh-bass inversion over a slash reading where the bass is only a remote color tone
Prefer fewer altered/tension colors
Prefer diatonic chords
Prefer a root-position relative-minor seventh over the equivalent major-sixth slash reading
Prefer the tonic chord
Prefer I when the bass is the tonic pitch class
Prefer a complete triad with add-tone extensions over a sparse seventh-family reading that turns the same pitches into remote color
Prefer a root-position minor 6/9 over the equivalent half-diminished slash reading
Prefer natural extensions (9/11/13) over add-tones, then fewer overall, unless that would reward an incomplete slash chord
Prefer Lydian major-nine spelling over equivalent major-nine flat-five spelling
Prefer root position
Prefer altered-fifth dominant-ninth inversions whose bass is the seventh over equivalent readings where the bass is the altered fifth
Prefer the more common name when the corpus shows a strong preference between otherwise equivalent spellings
Prefer cleaner spelling for otherwise tied tritone-related flat-five dominant readings
Prefer more conventional inversion
Prefer 7th chords over triads when both fit
Prefer fewer extensions
Avoid suspended chords

If all of these rules still have not produced a winner, there is a deterministic fallback: sort by root pitch class numerically. This ensures the output is always consistent for the same input, even for exotic voicings.
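The control flow of the near-tie window, the ordered tie-breakers, and the final fallback can be sketched as a single comparison function. Everything here is illustrative: the RankedCandidate class, the TieBreaker typedef, and the function names are hypothetical, hard rules are left out, and only one tie-breaker from the list is implemented.

```dart
/// Hypothetical candidate shape carrying just enough data for this sketch.
class RankedCandidate {
  const RankedCandidate(
    this.name,
    this.score, {
    required this.rootPc,
    required this.isRootPosition,
  });

  final String name;
  final double score;
  final int rootPc;
  final bool isRootPosition;
}

/// Negative: a ranks first. Positive: b ranks first. Zero: no opinion.
typedef TieBreaker = int Function(RankedCandidate a, RankedCandidate b);

const nearTieWindow = 0.20;

/// One rule from the list above: "Prefer root position".
int preferRootPosition(RankedCandidate a, RankedCandidate b) {
  if (a.isRootPosition == b.isRootPosition) return 0;
  return a.isRootPosition ? -1 : 1;
}

int compareCandidates(
  RankedCandidate a,
  RankedCandidate b,
  List<TieBreaker> rules,
) {
  // Outside the near-tie window, score alone decides.
  final delta = a.score - b.score;
  if (delta.abs() > nearTieWindow) return delta > 0 ? -1 : 1;

  // Inside the window, the first rule with an opinion decides.
  for (final rule in rules) {
    final result = rule(a, b);
    if (result != 0) return result;
  }

  // Deterministic fallback: lower root pitch class first.
  return a.rootPc.compareTo(b.rootPc);
}
```

Applied to the debug output shown earlier, F♯7♭5 and C7♭5 / G♭ tie at 8.50, so the root-position rule places F♯7♭5 first, while C7♯11 / F♯ at 7.01 falls outside the window and loses on score.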

The ordering of these rules encodes musical priorities. Structural clarity (root position, shell tones) comes before contextual preferences (diatonic, tonic). Conventional naming (fewer alterations, natural extensions, and common corpus labels) comes before complexity. The rule against suspended chords comes last because they are valid but easy to over-detect when a third is absent, so they should win only when the surrounding evidence supports them.

Turning the comparison into a stable order

Because hard rules and the near-tie window deliberately override raw score, the candidate comparison is not guaranteed to be transitive: A can beat B, B can beat C, and yet C can beat A. A generic sort is undefined on a comparison like that and can bury a strong reading below a weaker one.

So the engine linearizes the candidates rather than sorting them directly: it repeatedly takes the one that nothing else outranks, breaking any cycle in a fixed, repeatable way. The result honors every rule above and always produces the same order for a given input.
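A minimal sketch of that idea follows. The article does not say how the engine breaks cycles, so the fallback used here (take the earliest remaining candidate in input order) is an assumption, and the function name linearize is hypothetical.

```dart
/// Orders [items] using a pairwise relation that may not be transitive.
/// [outranks] returns true when a should be placed ahead of b.
List<T> linearize<T>(List<T> items, bool Function(T a, T b) outranks) {
  final remaining = List<T>.of(items);
  final ordered = <T>[];

  while (remaining.isNotEmpty) {
    // Find the first remaining item that no other remaining item outranks.
    var pick = -1;
    for (var i = 0; i < remaining.length && pick < 0; i++) {
      var beaten = false;
      for (var j = 0; j < remaining.length; j++) {
        if (j != i && outranks(remaining[j], remaining[i])) {
          beaten = true;
          break;
        }
      }
      if (!beaten) pick = i;
    }

    // Cycle: every remaining item is outranked by another one.
    // Assumed fallback: take the earliest remaining item.
    if (pick < 0) pick = 0;

    ordered.add(remaining.removeAt(pick));
  }
  return ordered;
}
```

With a three-way cycle where A beats B, B beats C, and C beats A, the input order A, B, C comes back as A, B, C on every run. The cost is cubic in the number of candidates in the worst case, which is acceptable for a list this short.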

Caching for real-time performance

Running the full pipeline (up to 12 candidate roots × 26 templates = 312 template evaluations) on every MIDI state change would be wasteful. In practice, a pianist tends to produce many repeated input states throughout a musical piece.

The engine uses a 512-entry Least Recently Used (LRU) cache implemented as a LinkedHashMap. The cache key is a hash of three inputs:

The pitch class set
The analysis context (key signature + tonality)
The take parameter (how many candidates to return, default 8)

The context is included in the key because diatonic preference rules depend on it; a different key signature can change which candidate ranks first even for identical voicings.

final key = Object.hash(input.cacheKey, context, take);
final cached = _cache[key];
if (cached != null) {
  // Promote on hit so eviction removes LRU, not FIFO
  _cache
    ..remove(key)
    ..[key] = cached;
  return cached;
}

The LinkedHashMap preserves insertion order. On a cache hit, the entry is removed and re-inserted at the end (most recently used). On eviction, the first key is removed (least recently used). This is the standard LRU pattern in Dart without a separate doubly-linked list.
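The snippet above shows only the hit path. A minimal self-contained sketch of the whole pattern, including insertion and eviction, might look like this. The LruCache class and its method names are hypothetical and not taken from the WhatChord source; values are assumed to be non-null.

```dart
import 'dart:collection';

/// Small LRU cache built on LinkedHashMap's insertion ordering.
class LruCache<K, V extends Object> {
  LruCache(this.capacity) : assert(capacity > 0);

  final int capacity;
  final _entries = LinkedHashMap<K, V>();

  int get length => _entries.length;

  /// Returns the cached value and marks it as most recently used.
  V? get(K key) {
    final value = _entries.remove(key);
    if (value != null) _entries[key] = value; // re-insert at the end
    return value;
  }

  /// Stores a value, evicting the least recently used entry when full.
  void put(K key, V value) {
    _entries.remove(key); // ensure the key moves to the end
    _entries[key] = value;
    if (_entries.length > capacity) {
      _entries.remove(_entries.keys.first); // oldest entry
    }
  }
}
```

With a capacity of 2, putting a and b, reading a, and then putting c evicts b, because the read promoted a to most recently used.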

The 512-entry capacity was chosen from benchmarks across random inputs, exhaustive inputs, tonal progressions, and simulated live note transitions. Realistic playing showed high reuse, and larger caches produced no material improvement.

What the algorithm does not handle

A few things are known limitations or non-goals:

Polychords. Two simultaneous independent sonorities (like Stravinsky's Petrushka chord, an F♯ major triad over a C major triad) are not modeled. The algorithm will find the best single-chord description of the combined note set.
Temporal context. Each snapshot of sounding notes is analyzed independently. The algorithm does not track what chord came before and does not use progression history to inform interpretation. Using temporal context to further increase accuracy is a natural direction for future improvement.
Non-12-TET tuning. This engine is built around 12 pitch classes and standard MIDI note numbers. Microtonal intervals, quarter tones, and just-intonation distinctions have no representation in this model.

The scoring heuristics are tuned from experience. They encode accumulated musical convention, but they are adjustable constants, not proven axioms. Edge cases and counterexamples help improve them.

The codebase

WhatChord is written in Dart using the Flutter framework. The chord analysis engine lives entirely in lib/features/theory/domain/analysis/, a handful of files with no platform dependencies and a unit test suite that verifies known-correct outputs across major, minor, dominant, altered, extended, and ambiguous chord types.

The project is open source and released under the Zero Clause BSD License, which means you are free to use, modify, and share the code however you like.

If you find a misidentified chord, the best way to report it is to long-press the chord card to open Analysis Details, copy the diagnostic output, and open a GitHub issue. The diagnostic output includes the exact pitch classes and context that produced the result, which makes it straightforward to reproduce and debug.
