R

Graph Theory 101 with corruption cases in Spain

According to CIS’ barometer, political corruption is the second biggest concern in Spain, behind only unemployment, and has held that position since 2013, while Spanish news reports on open trials and new investigations on a regular basis. The European Commission estimates that corruption alone costs the EU economy 120 billion euros per year, just a little less than the annual budget of the European Union.

Herds of statistical models

Big datasets found in statistical practice often have a rich structure. Most traditional methods, including their modern counterparts, fail to use the information contained in them efficiently. Here we propose and discuss an alternative modelling strategy based on herds of simple models. Big Data: how big datasets came to be. Data has not always been big. Classical datasets, such as Anderson’s famous iris dataset, were often small. Many of the best-known statistical methods also deal with the problems posed by data scarcity rather than data abundance.

Automated testing with 'testthat' in practice

You test your code. We know you do. How else are you sure that your changes don’t break the program? But after you commit, you discard those pesky scripts and throw away the code. Don’t you think it’s a bit of a waste to dump all that effort, which took a decent chunk of your day to conjure? Well, here you are, so let’s see another way. A better way.

Spatial Data Analysis with INLA

Introduction: In this session I will focus on Bayesian inference using the integrated nested Laplace approximation (INLA) method. As described in Rue et al. (2009), INLA can be used to estimate the posterior marginal distributions of Bayesian hierarchical models. This method is implemented in the INLA package available for the R programming language. Given that INLA can fit a wide range of models, we will focus on spatial models for the analysis of lattice data.

Strange Attractors: an R experiment about maths, recursivity and creative coding

Learning to code can be quite hard. Apart from the difficulties of learning a new language, following a book can be quite boring. From my point of view, one of the best ways to become a good programmer is to choose small, fun experiments designed to train specific programming techniques. This is what I usually do on my blog, Fronkonstin. In this tutorial, we will learn to combine C++ with R to create efficient loops.

Mastering R presentations

Do you want to know how to make elegant and simple reproducible presentations? In this talk, we are going to explain how to create presentations in different output formats using one of the easiest and most comprehensive statistical software environments, R. It is now possible to create Beamer, PowerPoint, or HTML presentations, including R code, \(\LaTeX\) equations, graphics, or interactive content. After the tutorial, you will be able to create R presentations on your own with R Markdown in RStudio.

Getting from flat data a world of relationships to visualise with Gephi

Network analysis offers a perspective on the data that broadens and enriches any investigation. We often deal with data in which the elements are related, but we have them in a tabular format that is difficult to import into network analysis tools. Relationship data require a definition of nodes and connections. The two parts have different structures and cannot be held in a single table; at least two are needed.
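As a minimal sketch of that idea, the base R code below derives a node table and an edge table from a single flat table. The data, the `person` and `case` column names, and the rule "two people are connected if they share a case" are all assumptions made for illustration; the `Id`/`Label` and `Source`/`Target` headers follow the column names Gephi's spreadsheet import typically expects.

```r
# Hypothetical flat table: one row per (person, case) pair.
flat <- data.frame(
  person = c("A", "B", "C", "A", "D"),
  case   = c("case1", "case1", "case1", "case2", "case2"),
  stringsAsFactors = FALSE
)

# Node table: one row per distinct person.
nodes <- data.frame(Id = sort(unique(flat$person)), stringsAsFactors = FALSE)
nodes$Label <- nodes$Id

# Edge table: a self-join on `case` pairs up people who share a case.
pairs <- merge(flat, flat, by = "case")

# Keep each undirected pair once and drop self-pairs.
pairs <- pairs[pairs$person.x < pairs$person.y, ]

edges <- data.frame(
  Source = pairs$person.x,
  Target = pairs$person.y,
  Type   = "Undirected",
  stringsAsFactors = FALSE
)

# The two tables could then be saved separately, for example with
# write.csv(nodes, "nodes.csv", row.names = FALSE) and
# write.csv(edges, "edges.csv", row.names = FALSE).
```

The self-join keeps the example dependency-free; with larger data, the same two-table result could be produced with any join tool you prefer.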

Simple yet elegant Object-Oriented programming in R with S3

The R language is peculiar in many ways, and its approach to object-oriented (OO) programming is just one of them. Indeed, base R supports not one, but three different OO systems: S3, S4 and RC classes. And yet, probably none of them would qualify as a fully-fledged OO system before the astonished gaze of an expert in languages such as Python, C++ or Java. In this tutorial, we will review the S3 system, the simplest yet most elegant of them.
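To give a flavour of S3, here is a minimal sketch in base R. The `account` class, its fields, and the `deposit()` generic are hypothetical names chosen for illustration; only `structure()`, `UseMethod()` and the `print()` generic come from base R.

```r
# Constructor for a hypothetical "account" class: an S3 object is just
# a base object (here a list) with a class attribute.
new_account <- function(owner, balance = 0) {
  stopifnot(is.character(owner), is.numeric(balance))
  structure(list(owner = owner, balance = balance), class = "account")
}

# Method for the existing generic print(); R dispatches on class(x).
print.account <- function(x, ...) {
  cat("Account of", x$owner, "with balance", format(x$balance), "\n")
  invisible(x)
}

# A new generic: UseMethod() looks for deposit.<class> at call time.
deposit <- function(x, amount, ...) UseMethod("deposit")

# The method for "account" returns a modified copy (R has copy semantics).
deposit.account <- function(x, amount, ...) {
  x$balance <- x$balance + amount
  x
}

acc <- deposit(new_account("Ana", 100), 50)
print(acc)
```

Methods are ordinary functions named `generic.class`, which is why S3 needs no formal class definition.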

An introduction to Stan with R

Stan is a probabilistic programming language for specifying statistical models. Stan provides full Bayesian inference for continuous-variable models through Markov Chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan can be called through R using the rstan package, and through Python using the pystan package.

LSTM with Keras & TensorFlow

The aim of this tutorial is to show the use of TensorFlow with Keras for classification and prediction in time series analysis. The example implements a Long Short-Term Memory (LSTM) model, a type of recurrent neural network that avoids the vanishing gradient problem. Introduction: The code below aims to give a quick introduction to Deep Learning analysis with TensorFlow, through the Keras API, in the R environment. Keras is a high-level neural networks API, developed with a focus on enabling fast experimentation rather than final products.