Funded by: Ministerio de Economía y Competitividad

ASLP-MULAN: Audio, speech and language processing for multimedia analytics


Society is moved by many influences, from passing fashions to established trends. Today this movement is also strongly shaped by the instant exchange of information that social media foster, far beyond what television and radio allow. Sharing through Internet social media now reaches a large share of the population, and it is crucial, not only for companies but for economic and administrative actors in general, to know the opinions, perceived reputation, political polarities and trends that emerge within social media. This information is relevant for driving new marketing policies and, in other contexts, is highly relevant for security and defense.

This relevance has already been recognized by some companies that offer market surveys and reputation studies, gathered from social media, about products, companies and other entities such as political parties and administrations. These studies are mostly based on costly polls and superficial manual (as opposed to automatic) analysis of a sample drawn from a limited set of sources, using simple criteria. We believe they need a technological boost to improve their capabilities.

The big data science community has begun to apply its specific skills to the analysis of this content. In parallel, several automatic audio, speech and language technologies are available or are reaching sufficient maturity to help with this objective: automatic speech translation, query by spoken example, spoken information retrieval, natural language processing, transcription and description of unstructured multimedia content, summarization of multimedia files, spoken emotion detection and sentiment analysis, speech and text understanding, etc. It seems worthwhile to combine them and put them to work on automatically captured data streams from several information sources such as YouTube, Facebook, Twitter, online newspapers, web search engines, etc., in order to automatically generate reports that include both scientifically based scores and subjective but relevant summarized statements on trend analysis and on the general population's perceived satisfaction with a product, a company or another entity.

Our intention is to work in this direction and to produce the right mixture of audio, speech and language technologies with big data technologies. We aim to offer it both to analytics companies interested in this crucial arena, improving the quality, accuracy and usability of the reports they deliver, and directly to companies or administrations that wish to obtain this information on their own by deploying our new solutions in their marketing or intelligence departments.

The research groups supporting this project have extensive experience in speech technology, text analytics and multimedia processing. Although they all have a comprehensive understanding of the different components the project requires, each group has its own specialization, which makes cooperation necessary for a project like this. This cooperation has already taken place successfully in several previous joint projects, including the coordinated project TIMPANO (TIN2011-28169-05), in which several of the technological foundations of this new proposal first appeared.