|
|
Towards Intelligent Assembly of Media Assets for Automated Character Animation Paper (pdf) M. Hausenblas, R. Mörzinger, P. Hofmair, W. Haas (JRS) 1st Workshop on Multimedia Annotation and Retrieval enabled by Shared Ontologies (December 2007, Genova, Italy) Creating character animations manually is an expensive and laborious task. In this work we analyse the current, manual workflow of creating character animations. We derive requirements for an automated process, and propose to utilise linked open datasets for context management, along with ontologies to assemble and reuse character animations. First experiences with the prototypical implementation of the context manager are reported. |
|
|
TRECVid 2007 - High Level Feature Extraction Experiments at JOANNEUM RESEARCH Paper (pdf) R. Mörzinger, G. Thallinger (JRS) TRECVid Evaluation Workshop (November 2007, Gaithersburg, USA) This paper describes our experiments for the high level feature extraction task in TRECVid 2007. We submitted five runs.
Our submission made use of support vector machines based on a variety of image and video features. The results of the experiments show that four out of five runs achieved a performance above the TRECVid median, including a run in which 18 out of 20 evaluated high level features were equal to or above the median in terms of inferred average precision. The mean inferred average precision of our baseline run is 0.056. Early fusion performed slightly better than late fusion on average, although the latter produced more scores above the TRECVid median. The experiment on concept correlation generally impaired the performance and outscored the baseline only for a few features. Heuristic low-level feature combinations performed rather poorly. We assume that the good baseline is due to the effective grounding of a variety of low-level visual features and the generalization capability of the SVM framework with high-dimensional feature spaces. |
|
Why Real-World Multimedia Assets Fail to Enter the Semantic Web Paper (pdf), Presentation (pdf) T. Bürger (LFUI), M. Hausenblas (JRS) Semantic Authoring, Annotation and Knowledge Markup Workshop (October 2007, Whistler, Canada) Making multimedia assets first-class objects on the Semantic Web while keeping them conformant to existing multimedia standards is a non-trivial task. Most proprietary media asset formats are binary, optimized for streaming or storage. However, the semantics carried by the media assets are not directly accessible. In addition, multimedia description standards lack the expressiveness to gain a semantic understanding of the media assets. An array of requirements regarding media assets and the Semantic Web already exists. Based on a critical review of these requirements we investigate how ontology languages fit into the picture. We finally analyse the usefulness of formal accounts to describe spatio-temporal aspects of multimedia assets in a practical context. |
|
|
The Need for Formalizing Media Semantics in the Games and Entertainment Industry Paper (pdf) T. Bürger (LFUI), H. Zeiner (JRS) I-MEDIA '07 - 1st International Conference on New Media Technology (September 2007, Graz, Austria) The digital media and games industry is one of the biggest IT based industries worldwide. Recent observations in this industry show that current production workflows could be improved, since multimedia objects are mostly created from scratch because existing tools offer insufficient support for reuse. In this paper we give reasons for this, propose a potential solution based on semantic technologies, show the potential of ontologies, and present scenarios for the application of semantic technologies in the digital media and games industry. |
|
|
Annotating Music Collections: How content-based similarity helps to propagate labels Paper (pdf) M. Sordo, C. Laurier, O. Celma (UPF) ISMIR 2007 - 8th International Conference on Music Information Retrieval (September 2007, Vienna, Austria) In this paper we present a way to annotate music collections by exploiting audio similarity. Similarity is used to propose labels (tags) for as yet unlabeled songs, based on the content-based distance between them. The main goal of our work is to ease the process of annotating huge music collections, by using content-based similarity distances as a way to propagate labels among songs. We present two different experiments. The first one propagates labels that are related to the style of the piece, whereas the second experiment deals with mood labels. On the one hand, our approach shows that using a music collection annotated at 40% with styles, the collection can be automatically annotated up to 78% (that is, 40% already annotated and the rest, 38%, only using propagation), with a recall greater than 0.4. On the other hand, for a smaller music collection annotated at 30% with moods, the collection can be automatically annotated up to 65% (i.e. 30% plus 35% using propagation). |
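The label propagation idea summarised in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the nearest-neighbour vote, the Euclidean distance, the `max_distance` threshold and the function name `propagate_labels` are all assumptions chosen for illustration, and the feature vectors stand in for whatever audio descriptors a real system would extract.

```python
import math
from collections import Counter


def propagate_labels(features, labels, k=3, max_distance=1.0):
    """Propose labels for unlabeled items from their nearest labeled neighbours.

    features: dict mapping item id -> feature vector (hypothetical audio descriptors)
    labels:   dict mapping item id -> label, for the already annotated items
    Returns a dict of proposed labels for items that had none.
    """
    proposed = {}
    for item, vec in features.items():
        if item in labels:
            continue  # already annotated, nothing to propagate
        # The k labeled items closest to this one, as (distance, id) pairs.
        neighbours = sorted(
            (math.dist(vec, features[other]), other) for other in labels
        )[:k]
        # Only neighbours within the distance threshold may vote.
        votes = [labels[other] for dist, other in neighbours if dist <= max_distance]
        if votes:
            proposed[item] = Counter(votes).most_common(1)[0][0]
    return proposed


# Toy collection: two labeled songs, three unlabeled ones.
features = {
    "a": [0.0, 0.0], "b": [0.1, 0.0],
    "c": [5.0, 5.0], "d": [5.1, 5.0],
    "e": [100.0, 100.0],
}
labels = {"a": "rock", "c": "jazz"}
proposed = propagate_labels(features, labels, k=1)
coverage = (len(labels) + len(proposed)) / len(features)
```

In this toy run, songs "b" and "d" inherit a label while "e" stays unlabeled because no annotated song is close enough, so coverage rises from 40% to 80%. The threshold trades coverage against the risk of propagating wrong labels.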
|
|
Discriminating Expressive Speech Styles by Voice Quality Parameterization Paper (pdf) C. Monzo, F. Alías, I. Iriondo, X. Gonzalvo, S. Planet (URL) ICPhS07 - International Congress of Phonetic Sciences (August 2007, Saarbrücken, Germany) In this work, the capability of voice quality parameters to discriminate among different expressive speech styles is analyzed. To that effect, the data distribution of these parameters, directly measured from the acoustic speech signal, is used to train a Linear Discriminant Analysis that conducts an automatic classification. As a result, the most relevant voice quality patterns for discriminating expressive speech styles are obtained for a diphone and triphone Spanish speech corpus with five expressive speaking styles: neutral, happy, sad, sensual and aggressive. |
|
|
Expressive Speech Corpus Validation by Mapping Subjective Perception to Automatic Classification Based on Prosody and Voice Quality Paper (pdf) I. Iriondo, S. Planet, J.C. Socoró, F. Alías, C. Monzo, E. Martínez (URL) ICPhS07 - International Congress of Phonetic Sciences (August 2007, Saarbrücken, Germany) This paper presents the validation of the expressive content of an acted corpus produced to be used in speech synthesis, since this kind of emotional speech can be rather lacking in authenticity. The goal is to obtain an automatic classifier able to prune the bad utterances from an expressiveness point of view. The results of a previous subjective test are used for training a multistage emotional identification system based on statistical features computed from the speech prosody and voice quality. Finally, the system provides a set of utterances to be checked and definitively eliminated if appropriate. |
|
Task-Based Mood Induction Procedures for the Elicitation of Natural Emotional Responses B. Vaughan, S. Kousidis, and Ch. Cullen (DIT) CCCT 2007 - The 5th International Conference on Computing, Communications and Control Technologies (July 2007, Orlando, USA) |
|
|
Validation of an Expressive Speech Corpus by Mapping Automatic Classification to Subjective Evaluation Book chapter (from Springer) I. Iriondo, S. Planet, J.C. Socoró, F. Alías, E. Martínez (URL) IWANN 2007 - 9th International Work-Conference on Artificial Neural Networks (June 2007, San Sebastián, Spain) This paper presents the validation of the expressive content of an acted corpus produced to be used in speech synthesis. Acted speech can be rather lacking in authenticity and therefore validation of its expressiveness is required. The goal is to obtain an automatic classifier able to prune the bad utterances (those with wrong expressiveness). Firstly, a subjective test has been conducted with almost ten percent of the corpus utterances. Secondly, objective techniques have been carried out by means of automatic identification of emotions using different algorithms applied to statistical features computed over the speech prosody. The relationship between both evaluations is achieved by an attribute selection process guided by a metric that measures the matching between the utterances misclassified by the users and those misclassified by the automatic process. The experiments show that this approach can be useful to provide a subset of utterances with poor or wrong expressive content. |
|
|
Extracting User Preferences by GTM for aiGA Weight Tuning in Unit Selection Text-to-Speech Synthesis Book chapter (from Springer) Ll. Formiga, F. Alías (URL) IWANN 2007 - 9th International Work-Conference on Artificial Neural Networks (June 2007, San Sebastián, Spain) Unit-selection based Text-to-Speech synthesis systems aim to obtain high quality synthetic speech by selecting previously recorded units. These units are selected by a dynamic programming algorithm guided by a weighted cost function. The weights should be tuned according to the perception of listening users to obtain proper quality. In previous works we have proposed to subjectively tune these weights through an interactive evolutionary process, also known as Active Interactive Genetic Algorithm. A problem arises when different users, although being consistent, evolve to different weight configurations. In this proof-of-principle work, we introduce GTM as a method to extract knowledge from user specific preferences. The experiments show that GTM is able to capture user preferences, thus avoiding the need to select the best evolved weight configuration by means of a new preference test. |
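The dynamic programming search guided by a weighted cost function mentioned in this abstract can be sketched as a small Viterbi search. This is an illustrative sketch only, not the system described in the paper: the function `select_units`, the two weights `w_target` and `w_join`, and the toy cost functions are assumptions, and a real synthesizer would combine many more sub-costs.

```python
def select_units(candidates, target_cost, join_cost, w_target=1.0, w_join=1.0):
    """Viterbi search for the cheapest sequence of units.

    candidates:  list with one list of candidate unit ids per target position
    target_cost: function (position, unit) -> cost of using unit at that position
    join_cost:   function (previous unit, unit) -> cost of concatenating them
    Returns (best unit sequence, total weighted cost).
    """
    # best[i][u] = (cost of the cheapest path ending in unit u at position i, predecessor)
    best = [{u: (w_target * target_cost(0, u), None) for u in candidates[0]}]
    for i in range(1, len(candidates)):
        layer = {}
        for u in candidates[i]:
            # Cheapest way to reach u from any unit at the previous position.
            prev, cost = min(
                ((p, c + w_join * join_cost(p, u)) for p, (c, _) in best[i - 1].items()),
                key=lambda pc: pc[1],
            )
            layer[u] = (cost + w_target * target_cost(i, u), prev)
        best.append(layer)
    # Backtrack from the cheapest final unit.
    unit = min(best[-1], key=lambda u: best[-1][u][0])
    total = best[-1][unit][0]
    path = [unit]
    for i in range(len(candidates) - 1, 0, -1):
        unit = best[i][unit][1]
        path.append(unit)
    return path[::-1], total


# Toy example: two positions, two candidate units each (all values hypothetical).
costs = {"a1": 1.0, "a2": 0.0, "b1": 0.0, "b2": 0.5}
candidates = [["a1", "a2"], ["b1", "b2"]]
target = lambda i, u: costs[u]
join = lambda p, u: 0.0 if p[-1] == u[-1] else 2.0  # units from the same recording join freely

with_join = select_units(candidates, target, join)
without_join = select_units(candidates, target, join, w_join=0.0)
```

With the join cost active the search keeps units from the same recording, while setting `w_join` to zero lets it pick the individually best units. This shows why the weight configuration matters for the selected sequence and hence for perceived quality.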
|
|
Enhancing CBIR Through Feature Optimization, Combination and Selection Paper (pdf, available to IEEE subscribers) X. Hilaire, J. Jose (UG) CBMI 2007. International Workshop on Content-Based Multimedia Indexing (June 2007, Bordeaux, France) We present a Content-Based Image Retrieval (CBIR) method based on the combination and selection of several image features. The novelty of our approach over existing methods is threefold: we provide a statistical optimization of the similarity distance for each feature; we replace certain features by a selection in a non-linear expansion of them; and we perform a linear combination of the features. We demonstrate superior capabilities of our method in certain cases over support vector machines (SVM) on a COREL image collection. |
|
|
Simulated testing of an adaptive multimedia information retrieval system Paper (pdf) F. Hopfgartner, J. Urban, R. Villa, J. Jose (UG) CBMI 2007. International Workshop on Content-Based Multimedia Indexing (June 2007, Bordeaux, France) The Semantic Gap is considered to be a bottleneck in image and video retrieval. One way to increase the communication between user and system is to take advantage of the user’s action with a system, e.g. to infer the relevance or otherwise of a video shot viewed by the user. In this paper we introduce a novel video retrieval system and propose a model of implicit information for interpreting the user’s actions with the interface. The assumptions on which this model was created are then analysed in an experiment using simulated users based on relevance judgements to compare results of explicit and implicit retrieval cycles. Our model seems to enhance retrieval results. Results are presented and discussed in the final section. |
|
|
HMM-Based Spanish Speech Synthesis Using CBR as F0 Estimator Paper (pdf) X. Gonzalvo, I. Iriondo, J.C. Socoró, F. Alías, C. Monzo (URL) NOLISP 2007 - An ISCA Tutorial and Research Workshop on NOn LInear Speech Processing (May 2007, Paris, France) Hidden Markov Model based text-to-speech (HMM-TTS) synthesis is a technique for generating speech from trained statistical models where spectrum, pitch and durations of basic speech units are modelled together. The aim of this work is to describe a Spanish HMM-TTS system using case-based reasoning (CBR) as an F0 estimator, analysing its performance objectively and subjectively. The experiments have been conducted on a reliably labelled speech corpus, whose units have been clustered using contextual factors according to the Spanish language. The results show that the CBR-based F0 estimation is capable of improving the HMM-based baseline performance when synthesizing nondeclarative short sentences and when reduced contextual information is available. |
|
|
Objective and Subjective Evaluation of an Expressive Speech Corpus I. Iriondo, S. Planet, J.C. Socoró, F. Alías (URL) NOLISP 2007 - An ISCA Tutorial and Research Workshop on NOn LInear Speech Processing (May 2007, Paris, France) This paper presents the validation of the expressiveness of an acted oral corpus produced to be used in speech synthesis. Firstly, an objective validation has been conducted by means of automatic emotion identification techniques using statistical features extracted from the prosodic parameters of speech. Secondly, a listening test has been performed with a subset of utterances. The relationship between both objective and subjective evaluations is analyzed and the obtained conclusions can be useful to improve the following steps related to expressive speech synthesis. |
|
VAMP: Semantic Validation for MPEG-7 Profile Descriptions Technical Report (pdf) R. Troncy (Centrum voor Wiskunde en Informatica), W. Bailer, M. Hausenblas, M. Höffernig (JRS) Technical report published by Centrum voor Wiskunde en Informatica, INS - Information Systems (April 2007, Amsterdam, Netherlands) MPEG-7 can be used to create complex and comprehensive metadata descriptions of multimedia content. Since MPEG-7 is defined in terms of an XML schema, the semantics of its elements has no formal grounding. In addition, certain features can be described in multiple ways. MPEG-7 profiles are subsets of the standard that apply to specific application areas and that aim to reduce this syntactic variability, but they still lack formal semantics. We propose an approach for expressing the semantics explicitly by formalizing the constraints of various profiles using ontologies and logical rules, thus enabling interoperability and automatic use for MPEG-7 based applications. We have implemented VAMP, a full semantic validation service that detects any inconsistencies of the semantic constraints formalized. Another contribution of this paper is an analysis of how MPEG-7 is practically used. We report on experiments about the semantic validity of MPEG-7 descriptions produced by numerous tools and projects and we categorize the most common errors found. |
|
|
Prosody Modelling of Spanish for Expressive Speech Synthesis I. Iriondo, J.C. Socoró, F. Alías (URL) ICASSP'07 - International Conference on Acoustics, Speech, and Signal Processing (April 2007, Hawaii, USA) This paper presents the use of analogical learning, in particular case-based reasoning, for the automatic generation of prosody from text, which is automatically tagged with prosodic features. This is a corpus-based method for quantitative modelling of prosody to be used in a Spanish text to speech system. The main objective is the development of a method for predicting the three main prosodic parameters: the fundamental frequency (F0) contour, the segmental duration and energy. Both objective and subjective experiments have been conducted in order to evaluate the accuracy of our proposal. |
|
Content-Based Audio Search: From Fingerprinting to Semantic Audio Retrieval Dissertation (pdf) P. Cano (UPF) Dissertation at the Pompeu Fabra University (2007, Barcelona, Spain) This dissertation is about audio content-based search. Specifically, it explores promising paths for bridging the semantic gap that currently prevents wide deployment of audio content-based search engines. Music and sound search engines rely on metadata, mostly human generated, to manage collections of audio assets. Even though time-consuming and error-prone, human labeling is a common practice. Audio content-based methods, algorithms that automatically extract descriptions from audio files, are generally not mature enough to provide the user friendly representation that users demand when interacting with audio content. Mostly, content-based methods provide low-level descriptions, while high-level or semantic descriptions are beyond current capabilities. |
|
Spectral Processing of the Singing Voice Dissertation (pdf) A. Loscos (UPF) Dissertation at the Pompeu Fabra University (2007, Barcelona, Spain) This dissertation is centered on the digital processing of the singing voice, more concretely on the analysis, transformation and synthesis of this type of voice in the spectral domain, with special emphasis on those techniques relevant for music applications. The digital signal processing of the singing voice has been a research topic in its own right since the middle of the last century, when the first synthetic singing performances were generated taking advantage of the research that was being carried out in the speech processing field. Even though both topics overlap in some areas, they differ significantly because of (a) the special characteristics of the sound source they deal with and (b) the applications that can be built around them. More concretely, while speech research concentrates mainly on recognition and synthesis, singing voice research, probably due to the consolidation of a forceful music industry, focuses on experimentation and transformation, developing countless tools that over the years have assisted and inspired most popular singers, musicians and producers. The compilation and description of the existing tools and the algorithms behind them are the starting point of this thesis. |
|
|
SALERO: Semantic Audiovisual Entertainment Reusable Objects Paper (pdf), Poster (pdf) W. Haas, G. Thallinger (JRS), P. Cano (UPF), Ch. Cullen (DIT), T. Bürger (LFUI) 1st International Conference on Semantic and Digital Media Technologies - SAMT 2006 (December 2006, Athens, Greece) The Integrated Project SALERO aims to advance the state of the art in digital media to the point where it becomes possible to create audiovisual content for cross-platform delivery using intelligent content tools, with greater quality at lower cost, to provide audiences with more engaging entertainment and information at home or on the move. SALERO will build on and extend research in media technologies, web semantics and context based image retrieval, to reverse the trend toward ever-increasing cost of creating media. |
|
Modelado y estimación de la prosodia mediante razonamiento basado en casos (Modelling and Estimation of Prosody by Means of Case-Based Reasoning) Paper (pdf, Spanish language) I. Iriondo, J.C. Socoró, L. Formiga, X. Gonzalvo, F. Alías, P. Miralles (URL) IV Jornadas en Tecnología del Habla (November 2006, Zaragoza, Spain) This paper presents the use of analogical learning, in particular case-based reasoning, for the automatic generation of prosody from text, which is automatically tagged with prosodic features. This is a corpus-based method for quantitative modeling of prosody to be used in a Spanish text to speech system. The main objective is the development of a method for predicting the three main prosodic parameters: the fundamental frequency (F0) contour, the segmental duration and energy. Both objective and subjective experiments have been conducted in order to evaluate the accuracy of our proposal. |
|
Estudio de Heurísticas para la implementación de A* en CTH basados en selección de unidades (Heuristics for Implementing the A* Algorithm for Unit Selection TTS Synthesis Systems) Paper (pdf, Spanish language) L. Formiga, F. Alías (URL) IV Jornadas en Tecnología del Habla (November 2006, Zaragoza, Spain) Unit Selection based Text to Speech Systems (USTTS) need to perform an optimal search of units in a speech corpus in order to obtain a high-quality synthesis. Until now, this search has been carried out by a Viterbi algorithm. Our work replaces the formerly used algorithm with the A* algorithm to enhance its computational efficiency. With that goal, a review of previous work that attempts this substitution is detailed. Afterwards, a benchmark is defined to score its efficiency, and results are analyzed to validate, in the last step, its theoretical argumentation. |
|
|
Generation of High Quality Audio Natural Emotional Speech Corpus using Task Based Mood Induction Paper (pdf) Ch. Cullen, B. Vaughan, S. Kousidis, Y. Wang, C. McDonnell, D. Campbell (DIT) 1st International Conference on Multidisciplinary Information Sciences and Technologies (October 2006, Mérida, Spain) Detecting emotional dimensions in speech is an area of great research interest, notably as a means of improving human computer interaction in areas such as speech synthesis. In this paper, a method of obtaining high quality emotional audio speech assets is proposed. The methods of obtaining emotional content are subject to considerable debate, with distinctions between acted and natural speech being made on the grounds of authenticity. Mood Induction Procedures (MIPs) are often employed to stimulate emotional dimensions in a controlled environment. This paper details experimental procedures based around MIP 4, using performance related tasks to engender activation and evaluation responses from the participant. Tasks are specified involving two participants, who must co-operate in order to complete a given task within the allotted time. Experiments designed in this manner also allow for the specification of high quality audio assets (notably 24-bit/192 kHz), within an acoustically controlled environment, thus providing means of reducing unwanted acoustic factors within the recorded speech signal. Once suitable assets are obtained, they will be assessed for the purposes of segregation into differing emotional dimensions. The most statistically robust method of evaluation involves the use of listening tests to determine the perceived emotional dimensions within an audio clip. In this experiment, the FeelTrace rating tool is employed within user listening tests to specify the categories of emotional dimensions for each audio clip. |
|
|
The Use of Task Based Mood-Induction Procedures to Generate High Quality Emotional Assets Poster (pdf) B. Vaughan, Ch. Cullen, S. Kousidis, Y. Wang, C. McDonnell, D. Campbell (DIT) IT&T - Information Technology and Telecommunications Conference (October 2006, Carlow, Ireland) Detecting emotion in speech is important in advancing human-computer interaction, especially in the area of speech synthesis. This poster details experimental procedures based on Mood Induction Procedure 4, using performance related tasks to engender natural emotional responses in participants. These tasks are aided or hindered by the researcher to elicit the desired emotional response. These responses will then be recorded and their emotional content graded to form the basis of an emotional speech corpus. This corpus will then be used to develop a rule-set for basic emotional dimensions in speech. |
|
Groovator - An Implementation of Real-Time Rhythm Transformations Paper (pdf) J. Janer, J. Bonada, S. Jordà (UPF) 121st AES Convention (October 2006, San Francisco, USA) This paper describes a real-time system for rhythm manipulation of polyphonic audio signals. A rhythm analysis module extracts tempo and beat location information. Based on this rhythm information, we apply different transformations: Tempo, Swing, Meter and Accent. This type of manipulation is generally referred to as content-based transformation. We address characteristics of the analysis and transformation algorithms. In addition, user interaction also plays an important role in this system. Tempo variations can be controlled either by tapping the rhythm with a MIDI interface or by using an external audio signal such as percussion or the voice as tempo control. We conclude by pointing out several use-cases, focusing on live performance situations. |
|
Esophageal Voice Enhancement by Modeling Radiated Pulses in Frequency Domain Paper (pdf) A. Loscos, J. Bonada (UPF) 121st AES Convention (October 2006, San Francisco, USA) Although esophageal speech has proven to be the most popular voice recovery method after laryngectomy surgery, it is difficult to master and shows a poor degree of intelligibility. This article proposes a new method for esophageal voice enhancement using speech digital signal processing techniques based on modeling radiated voice pulses in the frequency domain. The analysis-transformation-synthesis technique creates a non-pathological spectrum for those utterances featured as voiced and filters those unvoiced. Healthy spectrum generation implies transforming the original timbre, modeling harmonic phase coupling from the spectral shape envelope, and deriving pitch from frame energy analysis. Resynthesized speech aims to improve intelligibility, minimize artificial artifacts, and resemble the patient’s original pre-surgery voice. |
|
A Corpus with Teeth Presentation (pdf) D. Campbell, M. Meinardi, B. Richardson, C. McDonnell (DIT) EUROCALL Conference (September 2006, Granada, Spain) ReCALL Journal (Vol 19, No. 1, January 2007, University of Hull, United Kingdom) This paper outlines the ongoing construction of a speech corpus for use by applied linguists and advanced EFL/ESL students. The first section establishes the need for improvements in the teaching of listening skills and pronunciation practice for EFL/ESL students. It argues for the need to use authentic native-to-native speech in the teaching/learning process so as to promote social inclusion and contextualises this within the literature, based mainly on the work of Swan, Brown and McCarthy. The second part addresses features of native speech flow which cause difficulties for EFL/ESL students (Brown, Cauldwell) and establishes the need for improvements in the teaching of listening skills. Examples are given of reduced forms characteristic of relaxed native speech, and how these can be made accessible for study using the Dublin Institute of Technology’s slow-down technology, which gives students more time to study native speech features, without tonal distortion. The final section introduces a novel Speech Corpus being developed at DIT. It shows the limits of traditional corpora and outlines the general requirements of a Speech Corpus. This tool - which will satisfy the needs of teachers, learners and researchers - will link digitally recorded, natural, native-to-native speech so that each transcript segment will be linked to its associated sound file. Users will be able to locate desired speech strings, play, compare and contrast them - and slow them down for more detailed study. |
|
|
A Pitch Marks Filtering Algorithm based on Restricted Dynamic Programming Paper (pdf) F. Alías, C. Monzo, J.C. Socoró (URL) InterSpeech2006 - International Conference on Spoken Language Processing (ICSLP) (September 2006, Pittsburgh, USA) In this paper, a generic pitch marks filtering algorithm (PMFA) is introduced in order to achieve reliable and smooth pitch marks from any input pitch tracking or marking algorithm. The proposed PMFA is a simple yet effective filtering process based on restricted dynamic programming, which is very helpful for minimizing human intervention when creating large speech corpora. Moreover, this work introduces a novel pitch marking evaluation measure for directly comparing pitch marking algorithms with different location criteria. The experiments demonstrate that the proposed PMFA improves the results of the input state-of-the-art pitch tracking and marking algorithms dramatically. |
|
|
Current Perspectives on Music Technologies & Multimedia Presentation (pdf) G. Holmberg (UPF) ENGAGE 2006 (September 2006, Jakarta, Indonesia) In the near future, when the analogue radio & TV network is closed down, we will most probably have some kind of digital Home Entertainment Platform/Media Center in our homes. And to an even greater extent than today, we will carry portable media players & storage devices with us. A true digital revolution will radically alter our behavior with multimedia objects, such as music & audio. We will have constant access to the Internet, with music & media of all times and origins available. On the one hand, this will require new and advanced methods of search & retrieval. This is the field of MIR (Music Information Retrieval) and Audio Content Analysis. On the other hand, we have the field of Audio Transformation & Synthesis: you will no longer be restricted to downloading & passively pressing "play". You will be able to interact with media objects, for example to play a song in a different key or slower/faster, or to suppress the vocals and sing along. You will also be able to remix & play around with music, broadcast yourself and easily create new, personalized "versions" of the media object. We believe that the boundary between professional audio & media creation technology and home entertainment is about to dissolve, leading to an explosion of breath-taking technological developments & human creative power. |
|
|
Transcripción fonética de acrónimos en castellano utilizando el algoritmo C4.5 (Phonetic Transcription of Spanish Acronyms by using C4.5 algorithm) Paper (pdf, Spanish language) C. Monzo, F. Alías, J.A. Morán, X. Gonzalvo (URL) XXII Congreso de la SEPLN (September 2006, Zaragoza, Spain) This work presents an automatic acronym transcription system intended to increase the synthetic speech quality of text-to-speech systems when acronyms are present in the input text. The acronym transcription is conducted by using a decision tree (C4.5 algorithm). The work presents the results obtained for different algorithm configurations, validating its performance with respect to other learning systems. |
|
|
Letting the Corpus Speak Presentation (pdf) D. Campbell (DIT) IVACS - Inter Varietal Corpus Studies (June 2006, Limerick, Ireland) This presentation outlines the current state of development of DIT’s nascent speech corpus. This will allow a body of spoken material to be searched for features of informal native speech via a normalised transcription. Once located, the original sound files can be played at normal speed or slowed down in order to better study the speech act itself. That this aspect of language learning has been neglected for decades has frequently been lamented by natural language specialists such as Richard Cauldwell. |
|
Let the Corpus Speak! Presentation (pdf) D. Campbell (DIT) 40th IATEFL Annual Conference and Exhibition (April 2006, Harrogate, United Kingdom) This presentation contrasts existing corpora with the novel Speech Corpus being developed at DIT. It points up the limits of existing - written, and even spoken - corpora and outlines the general requirements of a Speech Corpus. This tool - which will satisfy the needs of teachers, learners and researchers - will link digitally recorded, natural, native-to-native speech acts (in WAV format) with their idealised, orthographic transcriptions. The transcriptions can be fed through a concordancer, with each transcript segment linked to its associated sound file. The segments will also be tagged for speed of delivery, which will allow users to locate the desired speech strings, play them, compare and contrast them, and - if necessary - slow them down for more detailed study. |
|
|
SALERO: Semantic Audiovisual Entertainment Reusable Objects Abstract (pdf), Poster (pdf) W. Haas, G. Thallinger (JRS) 2nd European Workshop on the Integration of Knowledge, Semantic and Digital Media Technologies (November 2005, London, United Kingdom) Ever since the idea of convergence was floated, the media industry has been talking about cross-platform exploitation as a way of producing more exciting content more cost-effectively. But while technology has helped to produce better quality sounds and images, the costs continue to rise. It is virtually impossible to re-use items from previous productions (regardless of issues of copyright) in different contexts, as the majority of sounds and images only work in the context and media type for which they were originally made. |