Semantic AudiovisuaL Entertainment Reusable Objects


Audio Processing Terms

AAC (Advanced Audio Coding)
Short name for the Advanced Audio Coding specification defined in MPEG-2 and MPEG-4, published by MPEG as an international standard.
Acoustic fingerprinting
Full songs and/or segments of music or audio can be automatically recognized (and classified) by a system that automatically extracts fingerprints from unlabelled (untagged or wrongly tagged) songs/segments and matches them against a database of fingerprints associated with editorial metadata (name of artist, album and track).
Several different systems for automatic fingerprint extraction and database matching exist. They differ from each other in robustness, accuracy and size of the acoustic fingerprints, as well as in which parameters/features are extracted from the audio signal. Acoustic fingerprinting has many different use cases, including copyright protection, radio monitoring, and sound effect library management. See also Digital Watermarking. CBID (Content Based Identification) is a more general term that covers both watermarking and fingerprinting systems of identification.
Acoustics
The study of sound, covering both the physics of sound and its psychoacoustic side.
ADSR
Attack, Decay, Sustain, Release. The most common elements used to describe the time-domain envelope of a sound.
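The four stages can be illustrated with a minimal sketch of a piecewise-linear envelope. The function name `adsr_level`, the linear segment shapes, and the peak level of 1.0 are assumptions made for illustration; real synthesizers often use exponential segments.

```python
def adsr_level(t, attack, decay, sustain, release, note_off):
    """Envelope level (0.0 to 1.0) at time t, in seconds.

    Assumes attack, decay and release are > 0 and that the note is
    released (note_off) after the attack and decay stages have finished.
    """
    if t < 0:
        return 0.0
    if t < attack:                       # attack: rise from 0 to the peak
        return t / attack
    if t < attack + decay:               # decay: fall from peak to sustain level
        return 1.0 - (1.0 - sustain) * (t - attack) / decay
    if t < note_off:                     # sustain: hold while the key is down
        return sustain
    if t < note_off + release:           # release: fall from sustain level to 0
        return sustain * (1.0 - (t - note_off) / release)
    return 0.0


# Hypothetical settings: 100 ms attack, 200 ms decay, sustain at half level,
# 300 ms release, key released after 1 second.
for t in (0.05, 0.2, 0.5, 1.15, 2.0):
    print(t, adsr_level(t, 0.1, 0.2, 0.5, 0.3, 1.0))
```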
AIFF (Audio Interchange File Format)
An audio standard for storing monaural and multichannel sampled sounds at a variety of sample and bit rates.
Amplitude
The magnitude or strength of a signal. For signals of audible frequency, amplitude corresponds roughly to our perception of loudness. See also Loudness.
Audio media
The physical container of the audio data, usually representing a musical piece or a sample of a sound.
Authoring tool
A tool that allows creating content (semantic information in our case).
Bandwidth
A range of frequencies within a given band.
BPM
Beats per minute. A measure commonly used for determining the tempo or “speed” of a piece of music.
Break point function (BPF)
A function represented by an ordered set of points, where values between points are interpolated. The main difference from a Dirac list is that positions between the markers of a Dirac list are not supposed to have a value. See also Dirac List.
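The contrast between the two representations can be sketched as follows. The function names, the choice of linear interpolation, and the clamping outside the defined range are assumptions made for illustration; the glossary does not prescribe an interpolation method.

```python
from bisect import bisect_right


def bpf_value(points, x):
    """Evaluate a break point function at x by linear interpolation.

    points: list of (x, y) pairs with strictly increasing x.
    Outside the covered range the nearest end value is returned (assumption).
    """
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, x)              # index of the first point right of x
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def dirac_value(diracs, x):
    """A Dirac list only has values at its markers; elsewhere return None."""
    return dict(diracs).get(x)


points = [(0.0, 0.0), (10.0, 1.0), (20.0, 0.5)]
print(bpf_value(points, 5.0))    # interpolated between the first two points
print(dirac_value(points, 5.0))  # None: no marker at this position
```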
Cent
Unit of pitch distance (or musical interval) corresponding to 1/100th of a semitone of the equal-tempered chromatic scale. One cent corresponds to a frequency ratio of the 1200th root of 2. See also Scales.
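Since one cent is a frequency ratio of the 1200th root of 2, an interval in cents is 1200 times the base-2 logarithm of the frequency ratio. A minimal sketch, with illustrative function names:

```python
import math


def cents_between(f1, f2):
    """Interval from frequency f1 to f2 in cents (positive if f2 is higher)."""
    return 1200.0 * math.log2(f2 / f1)


def shift_by_cents(f, cents):
    """Frequency reached when f is shifted by the given number of cents."""
    return f * 2.0 ** (cents / 1200.0)


print(cents_between(440.0, 880.0))   # one octave, i.e. 1200 cents
print(shift_by_cents(440.0, 100.0))  # one equal-tempered semitone above 440 Hz
```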
Chord
In music theory: a combination of three or more pitches sounding simultaneously. A chord progression is the ordered succession of chords and harmony in music. On musical instruments, “chord” can be a synonym for a string. In a recording studio, it can refer to the cable connecting two devices (usually spelled “cord”). In mathematics, a chord is a straight line joining the two ends of an arc.
Decibel (dB)
Logarithmic measure of the level (amplitude, roughly loudness) of a sound. A level in decibels is ten times the base-10 logarithm of the ratio of two quantities proportional to power. In recording studios, 0 dB is also used as a reference point for dB ratios of signal strength.
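As a worked illustration of the logarithmic ratio: for power-like quantities the level is 10 times the base-10 logarithm of the ratio, and for amplitude-like quantities (whose square is proportional to power) the factor becomes 20. The function names below are illustrative.

```python
import math


def power_ratio_to_db(power, reference):
    """Level in dB of a power-like quantity relative to a reference."""
    return 10.0 * math.log10(power / reference)


def amplitude_ratio_to_db(amplitude, reference):
    """Level in dB of an amplitude-like quantity (power ~ amplitude squared)."""
    return 20.0 * math.log10(amplitude / reference)


print(power_ratio_to_db(100.0, 1.0))    # 100 times the power: about 20 dB
print(amplitude_ratio_to_db(0.5, 1.0))  # half the amplitude: about -6 dB
```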
Digital Audio
Any audio signal which is quantized (i.e., limited to a distinct set of values) into digits at discrete points in time. The accuracy of a digital value is dependent on the number of bits used to represent it. See also Quantization.
Digital Music
Term used in online music distribution contexts referring simply to music available (in any digital audio or video format) from internet stores or other web-sources (P2P etc).
Dirac List
An ordered list of weighted markers. Dirac lists are used, for instance, to represent note onsets with their time domain position (ms) and their relevance or to represent spectral peaks with their positions (Hz) on the spectral domain and their energy.
Discrete function
A function represented by an ordered set of equidistant abscissa values.
Distortion
Artefacts present in the output signal of an audio device that were not part of its input signal. Several types of distortion exist. In everyday speech, the term can also refer to the typical electric guitar distortion effect pedal commonly used in popular music.
DSP algorithm
A structured set of instructions and operations tailored to accomplish a signal processing task. For example, the Fast Fourier Transform (FFT) and the finite impulse response (FIR) filter are common and basic DSP algorithms.
DTS (Digital Theater Systems)
A surround sound digital audio compression format. Several different variants exist (DTS Extended, DTS consumer, DTS Cinema etc.). Dolby's AC-3 is a competing format.
Electroacoustic music
A subgenre of modern art music, associated with composers like Karlheinz Stockhausen, referring to avant-garde or experimental music which incorporates electronic instruments and techniques within the Western classical music paradigm. Older terms for, or subsets of, electroacoustic music include Computer Music and Musique Concrète.
Electronic Music
A music genre within popular music, referring to music performed mainly with electronic instruments (electro, synth-pop, techno etc.).
Envelope
Any curve joining the successive peaks of, e.g., a wave or a set of data.
Equalisation (EQ)
Signal processing device which alters the frequency response of an audio signal.
FFT (Fast Fourier Transform)
An efficient algorithm for computing the discrete Fourier transform, and a well-established method of analyzing the spectrum of sound in digital audio. Several variants exist, including decimation-in-time and decimation-in-frequency versions. See also Spectral Analysis.
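For illustration, here is a minimal sketch of one common variant, the recursive radix-2 decimation-in-time FFT. It assumes the input length is a power of two and is written for readability rather than speed; production code would normally rely on an optimized library.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 decimation-in-time FFT of a sequence of numbers."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])                  # transform of even-indexed samples
    odd = fft(x[1::2])                   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# A sine wave completing 4 cycles in 32 samples puts its energy in bin 4.
signal = [math.sin(2 * math.pi * 4 * i / 32) for i in range(32)]
magnitudes = [abs(c) for c in fft(signal)]
peak_bin = max(range(16), key=lambda k: magnitudes[k])
print(peak_bin)
```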
Filter
Various devices and techniques used to “pass what you want, reject all else” from audio signals or data streams. For audio use, the most common electronic filter is the bandpass filter, characterized by three parameters: center frequency, amplitude and bandwidth. Filter theory is a fundamental field within signal processing. In general data filtering, a multitude of methods exists, for example in statistics, where outliers are excluded so that they do not influence the computation of the mean or variance of a distribution.
Fingerprint extraction
The process of taking multilevel information from the signal content of a song to identify it and its versions.
A text that can be applied or not to a given description subject (song, sample, audio segment, etc.).
Formants
Prominent frequency bands determining the phonetic quality of a vowel.
Gain
The level of amplification of a given signal. Sound professionals tend to use this term where the layman may say "volume".
Hardwired knowledge
The knowledge an AI system designer includes in the architecture of that system. Usually, this knowledge cannot be changed, becoming sort of 'instinct' for the system in question.
Harmonic
In spectral analysis: the relationship between two frequencies is considered “harmonic” if one frequency is an integer multiple of the other (e.g., the frequencies 101 Hz and 303 Hz are harmonically related since 101 × 3 = 303).
Harmony
In music theory: the study of how simultaneous tones, chords and chord progressions are structured in pieces of music. It also relates to the study of perceived consonance or dissonance.
ID3 Tags
ID3 is a tagging format (de facto standard) for MP3 files. It allows editorial metadata such as the title, artist, album, track number, etc., to be stored in the MP3 file.
Interval
In music theory: the distance between two pitches, sounding either successively (as in melody) or simultaneously (as in harmony).
A pattern of regularity inferred from the attributes present in a set of objects.
Just Noticeable Difference (JND)
Psychoacoustic measurement of the smallest difference a human can notice in, e.g., pitch, loudness or duration.
Key
On some instruments (like the piano), it refers to the part you press when playing. In music theory, key refers to the tonal fundamental pitch of a piece of music or a section thereof.
LFO (Low Frequency Oscillations/Oscillator)
The term refers normally to frequencies below the human threshold of pitch perception. Such frequencies can be useful to model time-domain variations, such as vibrato of a tone; or it can be used in a synthesizer for modulation.
Loop
In computer science: a loop is a series of instructions that performs the same task iteratively. In sound editing and music: the term refers to a sound, a segment of a sound, or a relatively short segment of a musical phrase, which repeats itself seamlessly.
Loudness
The subjective psychological correlate of amplitude or sound intensity. Research in perception has clearly shown that perceived intensity, energy or loudness depends on many factors, including frequency, timbre, amplitude and duration. See also Amplitude.
Loudness Descriptors
Descriptors that extract, from the physical strength of a signal, the psychologically more relevant correlate of sound intensity or energy by modeling the human auditory system.
Marker
Indicates an exact position on a given dimension.
MFCC (mel-frequency cepstral coefficients)
An important timbre-analysis parameter in audio content analysis. These coefficients describe the harmonic spectrum shape, approximately as perceived by the human auditory system.
MIDI (Musical Instrument Digital Interface)
Industry standard bus and protocol for interconnection and control of electronic music instruments.
Modulation
In signal processing: altering a signal in accordance with the variations within a second signal. Amplitude modulation (AM) and frequency modulation (FM) are two common methods used in a wide range of areas, from radio broadcasting to sound synthesis. Describing the amount of variation in pitch (vibrato in singing) or in amplitude (tremolo on a string instrument) in terms of FM and AM is useful in audio content analysis. In music theory, the term refers to the ordered harmonic transitions between keys (over a significant portion of the music).
Monophonic Audio
Audio with only one “voice”, i.e. a single instrument recorded on one (mono) or two (stereo) channels. See also Polyphonic Audio.
In a synthesizer this refers to the capability for an electronic instrument to play more than one key/sound at the same time.
Music & Audio collection
A set or pool of organized music pieces or audio samples (sound effects) available through a given repository. See also Personal Audio Library.
Music Consumption
The activity of listening to certain music objects and the process of obtaining and/or transferring these items. When qualifying the noun “habit”, it is understood as that person's musical taste.
MP3 (MPEG-1 Audio Layer 3)
A type of digital audio compression format, popularized for enabling digital music to be distributed over the Internet and large collections to be stored in portable media players.
Onset/Offset
The point at which a tone/sound begins or ends. The onset part of a sound has been shown to contain important cues for instrument/sound source separation and recognition in human perception. Onset/offset detection is therefore important for automatic audio segmentation and for speech recognition/alignment algorithms.
Personal Audio Library
The audio repository owned by a certain user. These libraries are usually a collection of music files or compact discs containing music titles or full albums, but they can also include audiobooks, podcasts, sample CDs etc.
Pitch
The scale-degree of a tone as perceived by a listener. Not necessarily the same as the fundamental frequency of a tone. It has been shown in psychology that the perception of musical tones is categorical.
Pitch shift
Alteration of the pitch or frequency of a sound, with or without adjusting its duration at the same time. In psychology, the term also names an auditory phenomenon, a type of illusion.
Polyphonic Audio
Audio with multiple “voices” recorded into a single set of channels, e.g., music in stereo. Transforming a single sound source, e.g., amplifying the volume of a singer, is easy when the vocal track(s) are accessible, but a lot more difficult (though not impossible) if the music is “mixed down” to a stereo format with other sound sources sounding simultaneously. Opposite of Monophonic Audio.
Quantization
In digital signal processing: the process of approximating a continuous range of values (or a very large set of possible discrete values) by a relatively small set of discrete symbols or integer values. For example, an acoustic signal needs to be quantized in A/D conversion because of the transformation from continuous to discrete time and amplitude values. In music: quantization is commonly understood as the alignment of a set of musical events (e.g., notes) to conform to a grid. For example, in a MIDI sequencer, the dimensions of this grid are set beforehand. When one instructs the music application to quantize a certain group of MIDI notes in a song, the program automatically moves each note to the closest point on the grid. The same can be done automatically with audio. In a recording studio this is a useful tool for correcting timing errors.
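Both senses of the term can be sketched in a few lines. The function names, the grid expressed in beats, and the signed-integer amplitude range are assumptions made for illustration; sequencers and converters differ in the details, for instance in how exact ties are rounded.

```python
def quantize_to_grid(times, grid):
    """Musical quantization: move each event time to the nearest grid point.

    Note: Python's round() sends exact ties to the even neighbour.
    """
    return [round(t / grid) * grid for t in times]


def quantize_amplitude(x, bits):
    """Signal quantization: map x in [-1.0, 1.0] to a signed integer code."""
    levels = 2 ** (bits - 1)             # e.g. 32768 for 16 bits
    code = round(x * levels)
    return max(-levels, min(levels - 1, code))   # clamp to the valid range


# Slightly mistimed note onsets (in beats) snapped to a sixteenth-note grid,
# assuming a quarter-note beat so that one grid step is 0.25 beats.
print(quantize_to_grid([0.02, 0.24, 0.51, 0.77], 0.25))
print(quantize_amplitude(0.5, 16))
```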
Region
A finite interval on a given dimension.
Region/Segment List
A list of segments defined by a common segmentation criterion.
Region/Segment Tree
A region/segment list in which every region may have an associated list of subregions.
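One way to picture such a tree is as a nested data structure. The sketch below is hypothetical: the class name `Region`, its fields, and the rule that a subregion must lie within its parent's bounds are illustrative choices, not a description of any particular implementation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Region:
    """A finite interval on one dimension (here: time in seconds)."""
    start: float
    end: float
    label: str = ""
    subregions: List["Region"] = field(default_factory=list)

    def add(self, child: "Region") -> "Region":
        """Attach a subregion, which must lie inside this region's bounds."""
        if not (self.start <= child.start <= child.end <= self.end):
            raise ValueError("subregion must lie within its parent region")
        self.subregions.append(child)
        return child

    def walk(self) -> Iterator["Region"]:
        """Yield this region and all nested subregions, depth first."""
        yield self
        for child in self.subregions:
            yield from child.walk()


song = Region(0.0, 180.0, "song")
verse = song.add(Region(10.0, 40.0, "verse"))
verse.add(Region(10.0, 25.0, "phrase"))
print([r.label for r in song.walk()])
```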
Reverberation
Gradual decay of a sound due to multiple echoes reflecting from the many surfaces of an acoustic environment.
Sample
In statistics: a single item out of a larger data collection, e.g., one test person out of a population. In digital audio: when a sound is converted from analogue to digital format, the amplitude of the signal is measured at regular intervals and each measurement is stored as a binary number; the number of measurements taken per second is the sample rate (CD audio = 44,100 times/second). Samples are the correlate of the "pixels" of a visual image: in general, the more samples taken per second, the higher the sound quality. In music: a sample is a sound, a segment of a sound, or a musical phrase of varying length. See Sampling.
Sampling
In music: sampling means taking a portion of a recorded sound or song and reusing it as an element of composition in a new song. This is typically done with a sampler: a piece of hardware or software which allows you to cut out the sample and assign it to a specific key on a controller, such as a keyboard. In signal processing: the process of converting a signal from continuous to discrete time, typically through an A/D (analogue-to-digital) converter. In statistics: the selection of individuals and/or individual observations to take part in data collection, in order to yield representative knowledge of a larger population or data set of concern.
Scales
A graduated range of values forming a standard system for measuring or grading something. In psychology: scales often refer to methods of classifying observations or performing evaluations. In music theory: the organization of pitches (or sounds) into an ordered set, ranging from lowest to highest or the opposite (chromatic, diatonic etc.). It is common to use terms such as scale-degree, fundamental key, etc. This is also closely related to the theory of harmony (tonal, modal, atonal etc.) and to tuning systems (well-tempered scales, just scales etc.).
Segment
A region on the time dimension, which is the most common kind of region.
Segmentation
The process of identifying the temporal boundaries of a semantically meaningful region in a piece of music or a sequence of audio events.
Signal to Noise ratio (SNR)
In audio: The difference between the level of background noise (noise floor), and the level of signal, measured in dB. In information theory: the ratio between relevant information and “noise” in a collection of data.
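As a worked illustration of the audio sense, the sketch below estimates SNR in dB from the RMS levels of a signal and of the noise floor. The function names are illustrative, and the sketch assumes that separate sample sequences for signal and noise are available, which in practice often requires measuring the noise during silence.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from the RMS amplitude levels."""
    return 20.0 * math.log10(rms(signal) / rms(noise))


signal = [1.0, -1.0] * 100     # full-scale test signal
noise = [0.01, -0.01] * 100    # noise floor at 1/100 of that amplitude
print(snr_db(signal, noise))   # about 40 dB
```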
Spectral Analysis
In audio content analysis (and signal processing in general), the most fundamental step in extracting useful data from digital audio. Any sound, or tone, contains a spectrum of partials. The frequencies of, and relationships between, these partials are the basis for what we perceive when listening. By means of spectral analysis, using different transforms, we can decompose a sound into these fundamental elements. It has been shown that the ear and brain effectively “compute” a spectral analysis, and also that any possible sound can be resynthesized identically using spectrum analysis techniques.
Statistics and Probability Theory
Two mathematical disciplines that provide tools for modeling real-world phenomena. However, they differ greatly in their approach. Statistical theory is concerned with modeling real-world phenomena in an objective way through the study of the frequency with which certain events happen. Probability theory is an extension of logic that offers tools to deal with uncertainty and incomplete information. Both disciplines use the same, or similar, methods to build models of phenomena, such as probability distribution functions. The difference lies in the interpretation of such models: whereas statistics equates the model with the known physical properties of the described phenomenon, probability theory sees the model as an account of the available information about the phenomenon, integrating any “physical existence” level into the model itself. Both play an important role in audio content analysis.
Subregion
A region that is defined within another one. It should lie inside the bounds of the super-region, but it should also have some semantic value or meaning of its own.
Sound Synthesis
The generation of sounds by means of electronic instruments (oscillators, filters, envelope generators etc.) and/or DSP techniques. Numerous techniques and approaches are available, ranging from analogue to digital synthesis methods and from modulation-based to physical modeling approaches. Analysis-by-synthesis is also a way to gain knowledge about the key components of a sound that are important for perception.
Timbre
The character or quality of a sound as distinct from its pitch and loudness; e.g., two voices can have the same pitch while their “sound” is perceived as completely different, and a trumpet or an electric guitar can play the same pitch with a multitude of different timbres. Timbre parameters (e.g., MFCC) are an essential part of audio content analysis, such as in sound source recognition or genre classification.
Time stretching
Alteration of the duration of a sound, with or without “stretching” the perceived “articulation” or “unnaturally” altering the timbre of the sound.
Transient
A short noise burst. In sound engineering, commonly used as an overall term for artifacts that should not be present in the signal. In sound analysis, the term refers to the noisy onset part of a sound, which carries important perceptual cues about its sound source. In perception: a click is a particular case of a transient, since it may not be “noisy” at all. A click can be a short segment (less than 20 ms) of a periodic waveform, but with such a short duration that it is perceived as a transient (and not as a tone or periodic waveform). Click trains are important in studies on pitch perception and cognition.
Waveform
In an electronic instrument or audio editing environment, the term refers to the different characteristic wave shapes of the audio (sine wave, sawtooth, square wave etc.).
Zero Crossing
Both sounds and electrical signals are primarily oscillations, and they oscillate around an equilibrium or axis. The point where a negative signal crosses over into being a positive signal, or vice versa, is known as the "zero crossing point". It is also sometimes referred to as the "null point". Zero Crossing Rate (ZCR) is a valuable low-level feature, e.g., providing information about how noisy or periodic the sound is.
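A minimal sketch of a zero crossing rate computation follows. It counts sign changes between consecutive samples and divides by the number of sample pairs; the function name and the convention of treating a sample of exactly zero as positive are assumptions, and other normalizations (for example, crossings per second) are equally common.

```python
def zero_crossing_rate(samples):
    """Fraction of consecutive sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1
        for a, b in zip(samples, samples[1:])
        if (a >= 0) != (b >= 0)          # zero is treated as positive
    )
    return crossings / (len(samples) - 1)


print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))  # alternates at every step
print(zero_crossing_rate([1.0, 2.0, 3.0, 4.0]))    # never crosses zero
```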