Automatic feature extraction
The process of extracting multilevel features from a musical piece or a sound sample, using automated algorithms. Also known as Audio Content Analysis.
Automatic search
A type of search which uses a topic as input, and outputs results with no human intervention at all.
Classification Scheme
A set of classes to be considered for a classification.
Collaborative filtering
Term applied to information retrieval systems (such as Music Recommendation systems) that rely on humans to extract descriptors for profiling items.
Corpus
The set of documents a search system operates on.
Confusion matrix
A 2x2 matrix (A, A’, B, B’), computed from the results of a search system, where A is the number of retrieved documents known to be relevant, A’ the number of retrieved documents known to be irrelevant, B the number of non-retrieved documents known to be relevant, and B’ the number of non-retrieved documents known to be irrelevant.
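As a minimal sketch, the four counts can be derived from sets of document identifiers. The function name `confusion_counts` and the sample identifiers are hypothetical, and the sketch assumes that the retrieved and relevant sets are both subsets of the corpus.

```python
def confusion_counts(retrieved, relevant, corpus):
    """Return (A, A', B, B') for one query over a corpus of document ids."""
    retrieved, relevant, corpus = set(retrieved), set(relevant), set(corpus)
    a = len(retrieved & relevant)                  # retrieved and relevant
    a_prime = len(retrieved - relevant)            # retrieved but irrelevant
    b = len(relevant - retrieved)                  # relevant but not retrieved
    b_prime = len(corpus - retrieved - relevant)   # neither retrieved nor relevant
    return a, a_prime, b, b_prime


# Hypothetical six-document corpus
corpus = {"d1", "d2", "d3", "d4", "d5", "d6"}
retrieved = {"d1", "d2", "d3"}
relevant = {"d1", "d2", "d4", "d5"}

counts = confusion_counts(retrieved, relevant, corpus)
print(counts)  # (2, 1, 2, 1)
```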
Content-based filtering
Term applied to information retrieval systems that rely on machine descriptor extraction procedures for profiling items.
Data Mining
Analysis of data in a database using tools which look for trends or anomalies without knowledge of the meaning of the data. This “pattern uncovering” can be interpreted as “knowledge discovery” since it may lead humans or software components to grasp the meaning of the data.
Descriptor
(1) A set of data that compactly represents a feature.
(2) In MPEG-7, a descriptor is an atomic unit of a metadata description. Sets of related descriptors form description schemes.
Description Scheme
A set of descriptors to be considered for a concrete task.
Digital Watermarking
A technique originally developed for copyright protection of audiovisual content, which embeds hidden copyright notices or verification messages in digital audio, video or image signals. The challenge is to insert the hidden watermark signal without changing the original signal to a perceptually significant extent (this is possible in many cases thanks to perceptual masking effects). However, watermarking has been shown to be quite vulnerable to hacking, while acoustic fingerprinting is not. See also Acoustic fingerprinting.
Document
The unit in information retrieval, that is, the most elementary information addressed by a search system. Examples of documents include Web pages, images, sound tracks and videos.
DRM (Digital Rights Management)
Controlling mechanisms for exchanging intellectual property in digital form over the Internet or other electronic media. Basically, DRM is an encryption distribution scheme with built-in payment methods. Content is encoded, and to decode it a user must do something like supply a credit card, or provide an e-mail address, to gain access to the item. Content owners set the conditions.
Enumerated Label
A text-based annotation for a song, region or marker whose value is drawn from a closed set of values.
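One way to picture a closed set of values is an enumeration type. The sketch below is illustrative only: the `Mood` label, its values and the `annotate` helper are hypothetical and do not come from any particular annotation system.

```python
from enum import Enum


class Mood(Enum):
    """Hypothetical closed set of values for a 'mood' label."""
    HAPPY = "happy"
    SAD = "sad"
    CALM = "calm"


def annotate(song_id, mood_value):
    """Attach a mood label to a song; values outside the set raise ValueError."""
    return (song_id, Mood(mood_value))


label = annotate("song-001", "calm")
print(label[1].name)  # CALM
```

A value outside the set, such as `annotate("song-001", "angry")`, is rejected with a `ValueError`, which is what distinguishes an enumerated label from a free-text one.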
Extraction hint
Extra information provided to an automated feature extraction process in order to improve the results or achieve them faster.
Feature
A property derived from a part of audiovisual data that can be used to describe and represent the AV data. (Also a function applied to a document by a search system during its processing.) One usually distinguishes between low-level features (such as dominant colour, colour histogram, motion, etc.) and high-level features (such as recognized faces, people walking, etc.).
Feature level
The degree of abstraction for a given feature. Feature level scale depends on the target application and user.
Feedback
Information supplied by a user to an information system, which rates how well a retrieved document satisfies the user's expectations.
Global analysis
A reference to information retrieval methods which retrieve documents by analyzing a collection as a whole, never in independent parts. Opposite: local analysis.
Goal
In information retrieval: the purpose that guides a user while querying the system. This goal can be completely encoded in the query itself (e.g. as a list of constraints) or might be implicitly defined by the context in which the querying action takes place.
High level feature
A feature directly related to semantic information about the content and therefore meaningful for the end-user. High level features are usually extracted by combining low- and mid-level features and adding contextual information.
Indexed items
The items classified or indexed by the information retrieval system, i.e. only those both stored and classified in an item database, but not those that are stored but not (yet) indexed by the system.
Inference network
A particular information retrieval model, in which documents, queries, and features are nodes of a Bayesian network.
Information Retrieval Model
The set of rules used by a retrieval system to rank and retrieve documents. These may include some low-level or high-level features, and possibly, the user’s feedback.
Information Retrieval
The branch of computer science concerned with retrieving information from a given collection of documents, that is, with the relevance of these documents to the expectations of a user or of a set of users.
Intelligent Agent
In computer science, a software agent that exhibits some form of artificial intelligence. In data mining, intelligent agents are often referred to as bots; they are often based on fixed pre-programmed rules and/or have the ability to adapt and learn.
Interactive search
A type of search which uses a human-formulated query as input, produces results automatically, and allows the user to reformulate the query after consulting these results.
Item filtering
In data mining, the action of mapping a subset of the indexed items to the user's goals or expectations.
Item profile
Item profiles are built on descriptors. These descriptors are expected to convey the “item content” concept.
Label
A text-based annotation that can be attached to a media item, a segment or a marker with a concrete description role, e.g. the title, the discography, etc.
Local analysis
A reference to information retrieval methods able to analyze a collection part by part, and independently. Opposite: global analysis.
Local motion
Motion calculated for blocks, regions or small pixel patches, that only takes the direct neighbourhood into account. The result is typically a motion vector field, that describes the displacement of every pixel or block of pixels.
Low-level feature
Low-level features are extracted directly from the digital representation of the signal, i.e. from its most basic frequency, amplitude and time domains. Further analysis normally builds on these data, extracting information up to high-level features, where the information has semantic meaning to end-users.
Manual Annotation
The process of manually adding or modifying the description of a media item.
Manual search
A type of search which uses a human-formulated query as input and produces results automatically, with no further human intervention.
Metadata
The semantic information that is bundled with the digital medium of a given media item, e.g. ID3 tags, EXIF, the serial number of a CD or any other additional information that can be read digitally. Some metadata can also be automatically extracted by means of Content Analysis.
Mid level feature
Mid level features are extracted from one or many combined low level descriptors. They are not necessarily meaningful for the end-user although they may be interpreted depending on the user's background knowledge.
MIR (Music Information Retrieval)
The field within Information Retrieval that specifically studies how music is indexed, stored, organized, queried, retrieved, displayed, etc. in search engines and information retrieval systems. An international conference, ISMIR, is dedicated to Music Information Retrieval and takes place in a different country every year.
MPEG-7
MPEG-7 (ISO/IEC 15938, formally named “Multimedia Content Description Interface”) is a standard for describing multimedia content independently of the encoding of the content; it allows different levels of granularity in the description. MPEG-7 has been designed to support a broad range of applications. MPEG-7 descriptions can be represented either in a textual XML format (TeM) or in a binary format (BiM).
Optical Character Recognition (OCR)
The recognition and extraction of text printed in still images or video frames, and by language extension, systems that perform this task.
Playlist
An ordered set of music pieces or media files ready to be played by the user.
Precision
The proportion of relevant documents amongst all those retrieved by a search system. This corresponds to A/(A+A’) from the definition of the confusion matrix. See also: recall.
Query
The formulation of a user’s expectations in terms of material and relations supported by a search system. A query may include text, images, video sequences, or any boolean combination of them (similar to a given image but without a given text, or with a given keyframe, etc.).
Query by example
A particular type of query in which a search system formalizes the user’s expectations from one or several examples provided by the user, and possibly the related feedback.
Recall
The proportion of relevant documents retrieved by a search system amongst all relevant documents, whether retrieved or not. This corresponds to A/(A+B) from the definition of the confusion matrix. See also: precision.
Relevance
A measure which indicates how well a user thinks a document satisfies the user's expectations.
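Under the standard definitions, precision is A/(A+A’) and recall is A/(A+B), using the counts of the confusion matrix. The sketch below is one possible way to compute them; the function names are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something the definitions prescribe.

```python
def precision(a, a_prime):
    """Fraction of retrieved documents that are relevant: A / (A + A')."""
    retrieved = a + a_prime
    return a / retrieved if retrieved else 0.0  # assumed convention for no results


def recall(a, b):
    """Fraction of relevant documents that were retrieved: A / (A + B)."""
    relevant = a + b
    return a / relevant if relevant else 0.0  # assumed convention for no relevant docs


# Hypothetical counts: 2 relevant retrieved, 1 irrelevant retrieved, 2 relevant missed
a, a_prime, b = 2, 1, 2
p = precision(a, a_prime)
r = recall(a, b)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.500
```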
Search engine
The part of software responsible for processing queries in an information retrieval system.
In a machine learning context, this adjective refers to systems that are able to learn by themselves, since their performance critic is part of the system itself. The algorithms used to build this kind of system are usually labeled with the term Unsupervised Learning.
Semantic indexing
A one-to-many correspondence between each document of a collection and a set of high-level concepts.
Similarity
A statistical measurement which is inversely related to distance. For example, if comparing two patterns yields a small distance, then the patterns exhibit a large (or high degree of) similarity.
Summarization
The process of building a summarized version of a song or a sequence of audio(visual) events so that a human can recognize the original song or scene within a shorter amount of time.
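A minimal sketch of turning a distance into a similarity is given below. The mapping 1/(1+d) is only one common choice, assumed here because it stays finite when the distance is zero; the Euclidean distance and the function name `similarity` are likewise assumptions made for illustration.

```python
import math


def similarity(x, y):
    """Map the Euclidean distance d >= 0 between x and y to 1 / (1 + d)."""
    d = math.dist(x, y)  # Euclidean distance (Python 3.8+)
    return 1.0 / (1.0 + d)


print(similarity((0, 0), (0, 0)))  # 1.0  (identical patterns)
print(similarity((0, 0), (0, 1)))  # 0.5  (distance 1)
```

The smaller the distance between two patterns, the closer the value is to 1; as the distance grows, the value approaches 0.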
User profile
In information retrieval systems (e.g. music recommendation): a profile of preferences built by analyzing how a person interacts with music and audio collections, together with content analysis of the audio itself. This is an approach to conveying the “music taste” concept. Setting up a user profile is important in order to personalize information retrieval.