About data
Here are some resources I have put in place to help you find data.
You will see that I follow reproducible research principles and encourage open science. Data science helps navigate economic complexity.
Sanger W. and Warin Th. (2019) “Jaccard Similarity of 1517 European Political Manifestos across 27 Countries (1945-2017)” Data in Brief, DIS-S-18-02150 [DOI: 10.1016/j.dib.2019.103907]
de Marcellis-Warin N., Sanger W. and Warin Th. (2019) “Text-as-Data Analysis of Political Parties versus Government Parties: To Blend or not to Blend? The Appendix”, DOI: 10.6084/m9.figshare.7781051.v2, pp.1-63, Fe
To explore some of these topics further: warin.ca/publications.html
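The Sanger and Warin (2019) dataset is built on Jaccard similarity, which measures the overlap of two sets as the size of their intersection divided by the size of their union. The Python sketch below is a generic illustration of that measure applied to word sets, not the authors' pipeline. The tokenizer (lower-cased `\w+` words), the function names `tokens` and `jaccard`, and the two short "manifesto" strings are all hypothetical, chosen only to make the arithmetic visible.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-case the text and return its set of distinct word tokens.

    Assumption: a simple \\w+ tokenizer (Unicode-aware in Python 3),
    not the preprocessing used in the published dataset."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|.

    Two empty sets are treated as identical (similarity 1.0) by convention."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two hypothetical manifesto excerpts, for illustration only.
m1 = "We will invest in public health and education."
m2 = "We will cut taxes and invest in education."

# 6 shared words out of 10 distinct words overall -> 0.6
print(jaccard(tokens(m1), tokens(m2)))
```

A score of 1 means the two texts use exactly the same vocabulary and 0 means they share no words. Word order and frequency are ignored, which keeps the measure cheap to compute across many pairs of documents.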
EpiBibR GitHub. EpiBibR stands for “epidemiology-based bibliography for R.” It is the second largest dataset about global coronavirus research and the largest one in R. The R package is released under the MIT License and is therefore a free resource built on open science principles (reproducible research, open data, open code). It is intended for researchers in scientometrics but also for researchers in other disciplines. For instance, the Artificial Intelligence and Data Science communities may use it to accelerate new research insights about COVID-19. The package follows, with some differences, the methodology the Allen Institute and its partners put in place to create the CORD-19 dataset, which is accessible through downloads of subsets or through a REST API. The data provide important information such as authors, methods, data, and citations, making it easier for researchers to find contributions relevant to their research questions. Our package provides 22 features for 139,724 references (as of April 16, 2021), and access to the data has been made as simple as possible so that it integrates efficiently into almost any researcher’s pipeline (Warin T, “Global Research on Coronaviruses: An R Package”, J Med Internet Res 2020;22(8):e19615, DOI: 10.2196/19615, PMID: 32730218, PMCID: 7423387).
oxfoR GitHub. oxforR is based on the Oxford COVID-19 Government Response Tracker (OxCGRT) and lets you retrieve the tracker’s latest data in R format. The tracker records governmental responses to COVID-19 through 17 indicators for all countries.
statcanR CRAN GitHub. Easily connect to Statistics Canada’s Web Data Service with R. Open economic data, formerly known as CANSIM tables and now identified by Product IDs (PIDs), are accessible as data frames directly in the user’s R environment.
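statcanR handles the connection to the Web Data Service from R. To show what "identified by Product IDs" means in practice, the Python sketch below turns a hyphenated table number into an 8-digit PID and builds a full-table download request URL. Several parts are assumptions and should be checked against Statistics Canada's current Web Data Service documentation: the REST base URL, the `getFullTableDownloadCSV` method name, and the convention that the trailing two digits of a table number identify a view rather than the table. The helper names `table_to_pid` and `full_table_csv_url` are hypothetical, and neither is part of statcanR. The sketch deliberately stops before making any network request.

```python
import re

# Assumed base URL of Statistics Canada's Web Data Service (WDS) REST API.
WDS_BASE = "https://www150.statcan.gc.ca/t1/wds/rest"


def table_to_pid(table_number: str) -> str:
    """Convert a table number such as '27-10-0014-01' to an 8-digit Product ID.

    Assumption: the PID is the first 8 digits, and the optional trailing
    two digits only select a view of the same table."""
    digits = re.sub(r"\D", "", table_number)  # drop hyphens and spaces
    if len(digits) not in (8, 10):
        raise ValueError(f"unexpected table number: {table_number!r}")
    return digits[:8]


def full_table_csv_url(pid: str, lang: str = "en") -> str:
    """Build the (assumed) WDS request URL for a full-table CSV download."""
    if lang not in ("en", "fr"):
        raise ValueError("lang must be 'en' or 'fr'")
    return f"{WDS_BASE}/getFullTableDownloadCSV/{pid}/{lang}"


pid = table_to_pid("27-10-0014-01")  # example number in the usual format
print(pid)                           # 27100014
print(full_table_csv_url(pid))
```

In an R session, statcanR takes care of the download and returns the table as a data frame. The sketch only makes the PID-to-request mapping explicit.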
spiR CRAN GitHub. In 2015, the United Nations adopted the 17 Sustainable Development Goals. ‘spiR’ is an R wrapper for several open datasets published by the Social Progress Imperative (https://www.socialprogress.org/), including the Social Progress Index, a synthetic measure of human development across the world. The Social Progress Index offers a new perspective on social challenges and on the efforts needed to accelerate social progress in line with the Sustainable Development Goals. By providing an easy connection from R to these datasets, ‘spiR’ aims to help policymakers and researchers prioritize actions that accelerate social progress across the world and to benefit from the “power of crowds.”
EpiBibR ExploR here
statcanR ExploR here
de Marcellis-Warin, N., Marty, F., Thelison, E., Warin, Th. (2021) “Anti-Trust Index” AI Transparency Institute here
Courses, including an interactive coding platform:
Économie industrielle avec R (Industrial Economics with R) here
Machine Learning for International Business with R here
Data Pipeline with R here
Foundations in quantitative analysis for International Business with R here
Alongside these data packages, I have developed several tutorials on collecting data from them and from other APIs. Visit my API tutorials.