Documenting CEEMID For Open Data Use


CEEMID is a data integration system that could prove to be a model and starting point for building a European Music Observatory based on open data and open-source software, using best practices in statistics, data science and AI. CEEMID has created thousands of high-value, hard music industry indicators using open data sources, industry data sources, surveys and various APIs to other relevant data sources.

CEEMID aims to transfer thousands of indicators, and the verifiable, open-source software that creates them, to the European Music Observatory, to give the music industry, policymakers and music professionals Europe-wide access to timely, reliable, actionable statistics and indicators. (Read more about our data coverage and our pan-European geographical coverage.)

With the help of Consolidated Independent, CEEMID created the Central European Music Industry Report 2020 (available in online, PDF and EPUB versions), which gives a rich illustration of European cross-country research. It provides music industry businesses and music policymakers with rich data and high-quality indicators for analysis, and supports AI for the innovation pillar of the European Music Observatory, by creating input data on the Music Economy of Europe, on Music Diversity and Circulation, and on Music, Society & Citizenship.

CEEMID is a private initiative. We developed data products for the needs of the industry: to calculate adequate royalty price (tariff) levels, to calculate adequate compensation for damages, to design better grants, to advocate better CCI policies, to pinpoint discriminatory taxation of the industry, to monitor public performance licenses, to calculate the size of the value gap, to train algorithms that find new audiences abroad for small-nation repertoires, and so on. Because the industry is made up of microenterprises all over Europe and the world, and even larger organizations, such as collective management societies, are at most SMEs, the industry has very limited market research or research & development capacity. This puts the music industry in a particularly vulnerable position, because the biggest distributors of music, such as Alphabet’s Google and YouTube, Apple’s iTunes and streaming services, or Spotify, are data-driven companies that apply AI in all their workflows.

Why open data?

The music industry cannot get a more favourable position in disputes if it is not able to ramp up its data analytics capacities. This is only possible in a collaborative way, because the combined research capacity of the European industry is smaller than the capacity of each of its main partners individually.

We believe that CEEMID could fulfil the functions of the European Music Observatory if it found sustainable financing that makes access to all our data open. We need to avoid the tragedy of the commons, where only a few industry users contribute to the financing of thousands of indicators that could potentially benefit tens of thousands of stakeholders in the EU.

We believe that a very large segment of CEEMID’s and the European Music Observatory’s data coverage is based on public-interest data that is collected by public authorities. We have also realized that music industry stakeholders and policymakers, particularly in cultural ministries and similar functions in cities and regions, underestimate the availability of open data. The EU open data regime gives access to almost all information stored in government data systems. Because this is in fact re-use of the information, the data is usually not documented for this purpose. While the data is available for free or almost free, a high level of metadata and data processing know-how is required to access it. This know-how is usually missing from cultural policy institutions and music industry bodies.

Therefore, CEEMID was based on the regulatory framework provided by the Directive on open data and the re-use of public sector information, which provides a common legal framework for a European market for government-held data (public sector information). It is a regulatory framework built around two key pillars of the internal market: transparency and fair competition. In our view, these principles should apply to the European Music Observatory, too.

In the EU, open data is governed by the Directive on open data and the re-use of public sector information, in short the Open Data Directive (EU) 2019/1024. It entered into force on 16 July 2019. It replaces the Public Sector Information Directive, also known as the ‘PSI Directive’, which dated from 2003 and was subsequently amended in 2013^[@eu_directive_2019_1024; @eu_directive_2013_37; @eu_directive_2003_98]. The founder of CEEMID, Daniel Antal, has been involved in Open Data and PSI since 2008.

Why open collaboration in data integration?

We believe that the European Music Observatory must rely on open-source statistical software written in the R statistical language like CEEMID, and it must be funded on the principle of open collaboration with the industry, public authorities and academia.

Open-source software (OSS) is a type of computer software in which source code is released under a license in which the copyright holder grants users the rights to study, change, and distribute the software to anyone and for any purpose. Open-source software is often developed in a collaborative public manner, and is a prominent example of open collaboration.

Generally, the use of open-source software, including the open-source R language and its software packages or libraries, in national statistical offices is encouraged by four important considerations: 1. lower cost, 2. security, 3. no vendor ‘lock in’ and 4. better quality.

We believe that there are so few data scientists in the music domain that only an open collaboration can guarantee adequate data quality. (In our CEEMID documentation you find more information about the R language and the concept of open-source software.)

CEEMID uses the open-source statistical programming language R and various open-source R programs. CEEMID also releases some of the custom program code developed to create its indicators. This allows open collaboration with statisticians working in national statistics authorities and in independent research institutions in the EU and globally.

The use of open-source software and the open-source R statistical language allows continuous peer review of data ingestion, processing, corrections and indicator creation by statisticians, data scientists and academics. Statistical products of national statistical offices, sometimes of Eurostat itself, not to mention data providers that are not part of the system of national statistical offices, such as the European Audiovisual Observatory, are plagued with data errors that are corrected and amended relatively slowly. We believe that unless the European Music Observatory becomes part of Eurostat (a highly unlikely organizational scenario), it should embrace the principles of open collaboration in creating its indicators.

CEEMID has been releasing some of its data processing, integration and indicator creation code under an open-source license on the Comprehensive R Archive Network (CRAN), a collection of sites which carry identical material, consisting of the R distribution(s), the contributed extensions, documentation for R, and binaries. (More information in our documentation.) We would like to find funding to release thousands of reproducible data sets, together with the code that reproduces them from source to end result, on an open data, open collaboration basis.

Our economic impact assessment code

Our iotables package programmatically creates gross value added, employment, taxation effect and multiplier indicators from real national accounts data for all EU countries. It provides data processing and modelling software that implements the use cases of the Eurostat Manual of Supply, Use and Input-Output Tables. It was released on rOpenGov.

The package was originally designed for the economic impact assessment of film production in Hungary and for the Slovak music industry. (See related publication.) With the help of this package, we can, for example, compare the effect of a 1 million euro crisis relief package on GDP, employment and future tax receipts across car manufacturing, banking, the music industry and book publishing in all EU countries. By releasing our software code, we could compare test results on calculating economic impact indicators for the creative industries and other industries with the UK statistical office. We want to release more and more of CEEMID’s software code, so that professional statisticians can validate that it works correctly or suggest improvements.
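
The kind of comparison described above rests on standard input-output analysis. The following is a minimal sketch in plain Python, not the iotables package's own R interface: it builds a Leontief inverse from a small technical coefficients matrix and derives output, gross value added and employment multipliers. The three industries, every number, and all function and variable names are hypothetical and chosen only for illustration; real figures would come from national accounts data.

```python
"""Illustrative Leontief input-output multipliers (all figures are made up)."""


def identity(n):
    """Return an n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Work on the augmented matrix [M | I]; the right half becomes the inverse.
    aug = [list(row) + unit for row, unit in zip(matrix, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leontief_inverse(coefficients):
    """Return (I - A)^-1 for a technical coefficients matrix A."""
    n = len(coefficients)
    unit = identity(n)
    i_minus_a = [[unit[i][j] - coefficients[i][j] for j in range(n)] for i in range(n)]
    return invert(i_minus_a)


def multipliers(input_coefficients, leontief):
    """Total effect per unit of final demand in industry j: sum_i c[i] * L[i][j]."""
    n = len(leontief)
    return [sum(input_coefficients[i] * leontief[i][j] for i in range(n)) for j in range(n)]


# Hypothetical three-industry economy.
INDUSTRIES = ["car manufacturing", "banking", "music"]
# A[i][j]: euros of input from industry i needed per euro of output of industry j.
A = [
    [0.20, 0.05, 0.02],
    [0.10, 0.15, 0.08],
    [0.01, 0.02, 0.10],
]
GVA_PER_EURO = [0.35, 0.60, 0.55]      # gross value added per euro of output
JOBS_PER_MILLION = [4.0, 6.0, 12.0]    # jobs per million euro of output

L = leontief_inverse(A)
output_multipliers = multipliers([1.0] * len(A), L)
gva_multipliers = multipliers(GVA_PER_EURO, L)
employment_multipliers = multipliers(JOBS_PER_MILLION, L)

# Effect of 1 million euro of extra final demand directed at each industry.
for name, out, gva, jobs in zip(
    INDUSTRIES, output_multipliers, gva_multipliers, employment_multipliers
):
    print(f"{name}: output x{out:.3f}, GVA {gva * 1_000_000:,.0f} EUR, jobs {jobs:.1f}")
```

Because each multiplier is a weighted column sum of the Leontief inverse, the same relief amount can be compared across industries simply by reading off the corresponding column.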

Creating regional indicators

Our second major open-source release makes it possible to create regional statistical time series and data panels from Eurostat’s regional data - not only for the creative industries, although our original aim was to support scientific research into book piracy in a project with IVIR at the University of Amsterdam.

While national borders do not change frequently, the borders of more than 1000 European NUTS1, NUTS2 and NUTS3 regions change on average every three years. The change of regional boundaries makes comparison over time very difficult. Because the current statistical regulations do not make it compulsory for member states to re-cast their historical data after a boundary change, Eurostat does not contain, for example, comparable data on French, Lithuanian or Hungarian regions for the previous decade.

CEEMID’s program code does exactly this: it traces and, whenever possible, corrects boundary changes back to 2007. We released this very complicated program code as part of the eurostat R package within the rOpenGov initiative. This program package has tens of thousands of users in research institutions, industry and in the national statistical offices themselves.
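
To make the idea of tracing boundary changes concrete, here is a minimal sketch in plain Python. It is not the code released in the eurostat R package: the correspondence table, the region codes (deliberately not real NUTS codes) and the function names are all hypothetical. It distinguishes the case that can be corrected by simple relabelling from the case, such as a region being split, where historical values cannot be carried over without re-estimation.

```python
# Hypothetical correspondence table between two versions of a regional
# classification: old code -> list of codes in the newer version.
# One target means the region was only relabelled (or left unchanged);
# several targets mean the old region was split.
CORRESPONDENCE = {
    "XX11": ["XX11"],          # unchanged
    "XX24": ["XXB0"],          # recoded, same boundary
    "YY10": ["YY11", "YY12"],  # split into two regions
}


def recode_observation(code, value, correspondence):
    """Map one observation to the newer classification where that is safe."""
    targets = correspondence.get(code)
    if targets is None:
        return {"code": code, "value": value, "status": "unknown code"}
    if len(targets) == 1:
        status = "unchanged" if targets[0] == code else "recoded"
        return {"code": targets[0], "value": value, "status": status}
    # A split cannot be repaired by relabelling: the old value covers an
    # area that no longer exists as one region, so it is flagged instead.
    return {"code": code, "value": None, "status": "split: needs re-estimation"}


def harmonise(series, correspondence):
    """Recode a list of (year, code, value) observations."""
    return [
        dict(recode_observation(code, value, correspondence), year=year)
        for year, code, value in series
    ]


# Made-up observations reported under the older classification.
old_series = [
    (2012, "XX24", 10.5),
    (2012, "YY10", 7.0),
    (2012, "XX11", 3.2),
    (2012, "ZZ99", 1.0),
]

harmonised = harmonise(old_series, CORRESPONDENCE)
for row in harmonised:
    print(row)
```

Keeping an explicit status for every observation lets a reviewer see which values were carried over unchanged, which were relabelled, and which could not be made comparable.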

We released the program code because it requires special geostatistical knowledge, and we believe it is such a critical application that only open peer review can guarantee that it works without errors.

From the Centre of Europe to the European Music Observatory

Historically, CEEMID started out of necessity as the Central and Eastern European Music Industry Databases, following a CISAC Good Governance Seminar for European Societies in 2013. The adoption of European single market and copyright rules, and the increased activity of competition authorities and regulators, required a more structured approach to setting collective royalty and compensation tariffs in a region that was traditionally regarded as data-poor, with fewer industry and government data sources available.

In 2014 three societies, Artisjus, HDS and SOZA, realized that they needed to make further efforts to modernize the way they measure their own economic impact and the economic value of their licenses, in order to remain competitive in advocating their interests vis-à-vis domestic governments, international organizations like CISAC and GESAC, and the European Union. They signed a Memorandum of Understanding with their consultant to set up the CEEMID databases and to harmonize their efforts. The decentralized data integration strategy chosen by CEEMID, however, allowed a quick expansion of CEEMID into a pan-European data resource.