Mel Scale Calculator for Psychoacoustic Pitch

Mel Scale Calculator

Convert between frequency and perceived pitch, then build mel-spaced bands for speech, music analysis, MFCC filters, and psychoacoustic reference work.

🎚Mel Conversion Inputs

Band mode also converts the center frequency entered below.
HTK is common in speech tooling; Slaney is common in MFCC filter banks.
Use the acoustic frequency, not MIDI note number.
Used when mode is mel to Hz, and shown for comparison otherwise.
Typical speech models start near 80 Hz or 100 Hz.
Keep this at or below the Nyquist frequency.
MFCC speech systems often use 20 to 40 mel filters.
Triangular mode reports edge count for filter-bank design.
Used to check Nyquist and FFT-bin placement.
Bin width equals sample rate divided by FFT size.
More decimals help when exporting analysis filters.
Shows mel distance against another musical or speech tone.
Frequency Position: 1000 mel
Inverse Frequency: 1000 Hz from entered mel
Local Resolution: 7.6 Hz per 10 mel near target
FFT Placement: nearest bin 43

📊Mel Scale Reference Data

1 kHz: HTK anchor near 1000 mel
700 Hz: Formula bend frequency
26: Common speech MFCC filters
40: Detailed music filter set
| Frequency | HTK mel | ERB rate | Common audio use |
| --- | --- | --- | --- |
| 80 Hz | 150.5 mel | 2.8 ERB | Low speech and bass fundamentals |
| 125 Hz | 185.2 mel | 4.1 ERB | Low male voice and room-mode checks |
| 440 Hz | 549.6 mel | 10.0 ERB | Concert A4 and musical reference tuning |
| 700 Hz | 781.2 mel | 13.0 ERB | Typical first-formant neighborhood |
| 1000 Hz | 1000.0 mel | 15.6 ERB | Reference point for classic mel matching |
| 3400 Hz | 1992.1 mel | 25.7 ERB | Telephone speech upper passband |
| 8000 Hz | 2840.0 mel | 33.3 ERB | Wideband speech and brightness cues |
| 20000 Hz | 3816.9 mel | 41.7 ERB | Nominal full-range hearing limit |
Formula comparison
| Formula | Forward mapping | Best use | Interpretation note |
| --- | --- | --- | --- |
| HTK / O'Shaughnessy | 2595 log10(1 + f / 700) | Speech recognition, quick Hz to mel work | 1000 Hz maps almost exactly to 1000 mel |
| Natural log equivalent | 1127 ln(1 + f / 700) | Codebases that prefer natural logarithms | Numerically equivalent apart from rounding |
| Slaney | Linear below 1 kHz, log above | Auditory Toolbox and many MFCC pipelines | Reports Slaney mel units, not HTK mel numbers |
| ERB-rate companion | 21.4 log10(1 + 0.00437f) | Critical-band comparison only | Included as context, not a mel replacement |
Common mel band plans
| Application | Frequency span | Typical filters | Why it works |
| --- | --- | --- | --- |
| Narrow speech MFCC | 80 to 7600 Hz | 20 to 26 | Tracks phonetic detail while avoiding wasted high bands |
| Wideband speech | 50 to 8000 Hz | 32 to 40 | Preserves consonant energy for cleaner features |
| Music timbre analysis | 20 to 20000 Hz | 40 to 64 | Gives dense low-frequency centers and broader top bands |
| Vocal formant study | 250 to 3500 Hz | 12 to 20 | Covers F1 through F4 with perceptual spacing |
| Telephone band | 300 to 3400 Hz | 16 to 24 | Matches limited-band speech intelligibility tests |
Mel spacing behavior
| Region | Perceptual behavior | Hz spacing effect | Design implication |
| --- | --- | --- | --- |
| Below 500 Hz | Pitch changes are finely resolved | Small Hz steps create noticeable mel changes | Use more low-frequency centers for bass and voice fundamentals |
| 500 to 1000 Hz | Transition toward logarithmic hearing | Mel and Hz spacing both remain easy to interpret | Good anchor region for checking formula choice |
| 1 to 5 kHz | Speech clarity and presence dominate | Equal mel steps become wider in Hz | Ideal range for formants and consonant features |
| Above 5 kHz | Brightness cues spread across broad bands | Large Hz spans may equal modest mel changes | Keep Nyquist and sample rate limits visible |
Band-design tip: Build filter banks in mel space first, then convert each center or edge back to Hz before assigning FFT bins.
Formula tip: Keep one formula through a whole project. Mixing HTK and Slaney values can shift center frequencies even when the names both say mel.

Human hearing does not perceive sound frequency linearly, which differs from the way a computer represents it. A computer measures sound in hertz, a unit of cycles per second. The human brain, however, does not perceive hertz the way a computer counts them, and it does not hear frequency as a straight line.

Instead, the brain gives some frequencies more weight than others within the sound it hears. Because perception is not linear, two pairs of tones separated by the same number of hertz will not sound equally far apart. To account for the way the brain perceives sound frequencies, engineers use the mel scale.

How the Mel Scale Matches Human Hearing

The mel scale is useful because it translates the physical unit of hertz into the pitch that the human brain perceives. Humans are more sensitive to changes at low frequencies than at high frequencies, which is why audio equalizers have wider bands at high frequencies. Several different formulas can be used to convert from hertz to mels, or vice versa.
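The difference in sensitivity can be made concrete with a small sketch. It assumes the HTK mapping from the formula comparison table, mel = 2595 log10(1 + f / 700), and uses its local slope as a linear approximation of how many hertz fit into 10 mel near a given frequency. The function name `hz_per_mel_htk` is illustrative, not taken from any library.

```python
import math


def hz_per_mel_htk(freq_hz: float) -> float:
    """Hz covered by one mel near freq_hz, assuming mel = 2595 * log10(1 + f / 700)."""
    # Differentiating the HTK mapping gives d(mel)/df = 2595 / (ln(10) * (700 + f)).
    # The local resolution in Hz per mel is the reciprocal of that slope.
    return math.log(10.0) * (700.0 + freq_hz) / 2595.0


if __name__ == "__main__":
    # Linear approximation: 10 mel is treated as 10 times the local Hz-per-mel value.
    for f in (100.0, 1000.0, 8000.0):
        print(f"{f:7.0f} Hz: about {10.0 * hz_per_mel_htk(f):.1f} Hz per 10 mel")
```

The same 10 mel step spans only a few hertz near 100 Hz but tens of hertz near 8 kHz, which is the widening described above.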

For example, speech recognition software often uses the HTK formula. In contrast, auditory toolboxes often use the Slaney scale for MFCC analysis. Either formula can serve a project, but mixing the two can shift the filter center frequencies, which makes the software's analysis of sound less accurate.
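A minimal sketch of both conversions shows why the two cannot be mixed. The HTK functions follow the formula in the comparison table. The Slaney constants are an assumption: the page only says "linear below 1 kHz, log above", so the sketch uses the values commonly associated with Slaney-style implementations (200/3 Hz per mel below 1 kHz and a log step of ln(6.4)/27 above). All function names are illustrative.

```python
import math


def hz_to_mel_htk(freq_hz: float) -> float:
    """HTK / O'Shaughnessy mapping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel: float) -> float:
    """Inverse of the HTK mapping."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Assumed Slaney-style constants (not stated on this page).
_F_SP = 200.0 / 3.0               # Hz per mel in the linear region
_BREAK_HZ = 1000.0                # linear below this frequency, log above
_BREAK_MEL = _BREAK_HZ / _F_SP    # mel value at the break point (15)
_LOGSTEP = math.log(6.4) / 27.0   # log-region step size


def hz_to_mel_slaney(freq_hz: float) -> float:
    """Slaney-style mapping: linear below 1 kHz, logarithmic above."""
    if freq_hz < _BREAK_HZ:
        return freq_hz / _F_SP
    return _BREAK_MEL + math.log(freq_hz / _BREAK_HZ) / _LOGSTEP


def mel_to_hz_slaney(mel: float) -> float:
    """Inverse of the Slaney-style mapping."""
    if mel < _BREAK_MEL:
        return mel * _F_SP
    return _BREAK_HZ * math.exp((mel - _BREAK_MEL) * _LOGSTEP)


if __name__ == "__main__":
    for f in (440.0, 1000.0, 8000.0):
        print(f"{f:6.0f} Hz  HTK {hz_to_mel_htk(f):8.2f}  Slaney {hz_to_mel_slaney(f):6.2f}")
```

The two scales report very different numbers for the same frequency, so a mel value is only meaningful together with the formula that produced it.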

Building a filter bank on the mel scale involves several considerations. For speech recognition software, the filters should be dense at low frequencies, which contain the fundamental frequencies of the speaker's voice, and broader at higher frequencies, which contain the hiss of consonants. A calculator helps translate mel values to hertz, which makes it easier for engineers to build a filter bank for a given task.
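One way to sketch this design step is to space the band edges evenly in mel and convert each edge back to hertz. The example assumes the HTK formula and triangular filters, where each filter uses three consecutive edges, so `n_filters` filters need `n_filters + 2` edge points. The helper name `mel_band_edges` and the 80 to 7600 Hz span with 26 filters are illustrative choices taken from the band plan table, not a fixed recommendation.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """HTK mapping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of the HTK mapping."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min_hz: float, f_max_hz: float, n_filters: int) -> list[float]:
    """Return n_filters + 2 edge frequencies in Hz, evenly spaced in mel."""
    mel_lo = hz_to_mel(f_min_hz)
    mel_hi = hz_to_mel(f_max_hz)
    step = (mel_hi - mel_lo) / (n_filters + 1)
    return [mel_to_hz(mel_lo + i * step) for i in range(n_filters + 2)]


if __name__ == "__main__":
    edges = mel_band_edges(80.0, 7600.0, 26)
    # For triangular filters, filter k spans edges[k] .. edges[k + 2]
    # and peaks at edges[k + 1], so the inner points are the centers.
    centers = edges[1:-1]
    print(f"{len(edges)} edges, {len(centers)} centers")
    print("first centers (Hz):", [round(c, 1) for c in centers[:3]])
    print("last centers (Hz): ", [round(c, 1) for c in centers[-3:]])
```

The centers sit close together at the low end and spread out toward the top of the range, which matches the dense-low, broad-high layout described above.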

For example, some tasks need more filter bands than others: analyzing a musical composition typically calls for more filters than recognizing a telephone signal. In addition to the mel scale, another consideration is the Nyquist frequency, which is half the rate at which the signal is sampled.

Digital audio cannot contain frequencies that are higher than the Nyquist frequency. For instance, a sample rate of 16,000 Hz results in a Nyquist frequency of 8,000 Hz. In this example, any filter that extends above 8,000 Hz is of no use to the digital audio software.
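The Nyquist check and the FFT-bin placement are both simple arithmetic, sketched below. The bin width is the sample rate divided by the FFT size, as noted in the calculator hints. The function names and the 48,000 Hz / 2048-point example are illustrative assumptions.

```python
import math


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a signal sampled at sample_rate_hz can represent."""
    return sample_rate_hz / 2.0


def nearest_fft_bin(freq_hz: float, sample_rate_hz: float, fft_size: int) -> int:
    """Index of the FFT bin whose center is closest to freq_hz."""
    bin_width_hz = sample_rate_hz / fft_size
    # floor(x + 0.5) rounds halves upward, avoiding round-half-to-even surprises.
    return math.floor(freq_hz / bin_width_hz + 0.5)


def edges_within_nyquist(edges_hz: list[float], sample_rate_hz: float) -> list[float]:
    """Keep only the band edges that the sample rate can actually represent."""
    limit = nyquist_hz(sample_rate_hz)
    return [f for f in edges_hz if f <= limit]


if __name__ == "__main__":
    print("Nyquist at 16 kHz:", nyquist_hz(16000.0), "Hz")
    print("1000 Hz bin at 48 kHz, 2048-point FFT:", nearest_fft_bin(1000.0, 48000.0, 2048))
    print("usable edges:", edges_within_nyquist([100.0, 4000.0, 8000.0, 9000.0], 16000.0))
```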

Thus, a tool that checks the FFT placement of the filters helps confirm that every filter lands on valid bins below the Nyquist frequency. Another useful value is the Equivalent Rectangular Bandwidth (ERB) rate. Like the mel scale, the ERB rate reflects the way the human ear perceives sound.

Because of this similarity, many auditory engineers use the ERB rate in medical and hearing research projects. Those who work on music or speech applications, however, tend to use the mel scale for its simplicity. Overall, the mel scale can be used for a wide variety of sound-related projects.
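For a side-by-side look, the sketch below evaluates the ERB-rate formula from the formula comparison table, 21.4 log10(1 + 0.00437f), next to the HTK mel value for the same frequencies. The function names are illustrative, and the comparison is meant only as context, not as a way to convert between the two scales.

```python
import math


def hz_to_erb_rate(freq_hz: float) -> float:
    """ERB-rate companion formula: 21.4 * log10(1 + 0.00437 * f)."""
    return 21.4 * math.log10(1.0 + 0.00437 * freq_hz)


def hz_to_mel_htk(freq_hz: float) -> float:
    """HTK mapping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


if __name__ == "__main__":
    for f in (125.0, 1000.0, 8000.0):
        print(f"{f:6.0f} Hz  {hz_to_mel_htk(f):7.1f} mel  {hz_to_erb_rate(f):5.1f} ERB")
```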

The reason for using the mel scale is to make sure that software that analyzes sound frequencies does so in the same way as the human brain and ears. Regardless of the task, engineers must stop thinking of sound as linear bands of frequencies and instead think of it as a curve of frequencies, so that audio engineering software returns accurate results. Many problems come from not looking at that curve closely enough.

It actually takes a lot of work to make sure the results are correct.
