2018-11-13 20:00:01 +01:00


---
title: "Skills in Statistics, Data Science and Machine Learning"
---

Statistics

  • Knowledge of Linear Models and Generalised Linear Models (including logistic regression), both in theory and in applications
  • Classical statistical inference (maximum likelihood estimation, method of moments, minimum-variance unbiased estimators) and hypothesis testing (including goodness of fit)
  • Nonparametric statistics
  • Bootstrap methods, hidden Markov models
  • Knowledge of Bayesian Analysis techniques for inference and testing: Markov Chain Monte Carlo, Approximate Bayesian Computation, Reversible Jump MCMC
  • Good knowledge of R for statistical modelling and plotting
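
As a small illustration of the bootstrap methods listed above, here is a minimal sketch of a percentile bootstrap confidence interval for a mean, using only the Python standard library. The function name `bootstrap_mean_ci`, the default parameters, and the sample values are illustrative assumptions, not taken from any particular project.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Resample the data with replacement, keeping the original size.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    # Read the interval off the empirical quantiles of the resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up sample, for illustration only.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]
low, high = bootstrap_mean_ci(sample)
print(f"95% bootstrap CI for the mean: [{low:.2f}, {high:.2f}]")
```

The percentile interval is the simplest variant; bias-corrected versions would be preferable for skewed statistics.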

Data Analysis

  • Experience with large datasets, for classification and regression
  • Descriptive statistics, plotting (with dimensionality reduction)
  • Data cleaning and formatting
  • Experience with unstructured data coming directly from embedded sensors to a microcontroller
  • Experience with large graph and network data
  • Experience with live data from APIs
  • Data analysis with pandas, xarray (Python) and the tidyverse (R)
  • Basic knowledge of SQL

Graph and Network Analysis

  • Research project on community detection and graph clustering (theory and implementation)
  • Research project on Topological Data Analysis for time-dependent networks
  • Random graph models
  • Estimation in networks (Stein's method for Normal and Poisson approximation)
  • Network Analysis with NetworkX, graph-tool (Python) and igraph (R and Python)

Time Series Analysis

  • Experience analysing inertial sensor data (accelerometer, gyroscope, magnetometer), both in real time and in post-processing
  • Use of statistical methods for step detection, gait detection, and trajectory reconstruction
  • Kalman filtering, Fourier and wavelet analysis
  • Machine Learning methods applied to time series (decision trees, SVMs and Recurrent Neural Networks in particular)
  • Experience with signal processing functions in NumPy and SciPy (Python)
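
To make the Kalman filtering item above concrete, here is a minimal sketch of a scalar Kalman filter in plain Python. It assumes a random-walk state observed with Gaussian noise; the function name `kalman_1d`, the variances, and the simulated signal are illustrative assumptions rather than a description of any real sensor pipeline.

```python
import random


def kalman_1d(measurements, process_var=1e-4, meas_var=0.25,
              init_estimate=0.0, init_var=1.0):
    """Scalar Kalman filter for a random-walk state observed with noise."""
    estimate, var = init_estimate, init_var
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, but uncertainty grows.
        var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = var / (var + meas_var)
        estimate += gain * (z - estimate)
        var *= 1.0 - gain
        estimates.append(estimate)
    return estimates


# Simulated noisy readings of a constant value (made-up data).
rng = random.Random(42)
true_value = 1.0
noisy = [true_value + rng.gauss(0.0, 0.5) for _ in range(200)]
filtered = kalman_1d(noisy)
print(f"last raw reading: {noisy[-1]:.3f}, last filtered value: {filtered[-1]:.3f}")
```

Real inertial data would need a multivariate state (position, velocity, orientation) and matrix-valued covariances; the scalar case only shows the predict/update structure.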

Machine Learning

  • Experience in Dimensionality Reduction (PCA, MDS, Kernel PCA, Isomap, spectral clustering)
  • Theoretical knowledge of and practical experience with the most common methods: random forests, SVMs, neural networks (including CNNs and RNNs)
  • Bagging and boosting estimators
  • Cross-validation
  • Kernel methods, reproducing kernel Hilbert spaces, collaborative filtering, variational Bayes, Gaussian processes
  • Machine Learning libraries: Scikit-Learn, PyTorch, TensorFlow, Keras

Simulation

  • Inversion, Transformation, Rejection, and Importance sampling
  • Gibbs sampling
  • Metropolis-Hastings
  • Reversible jump MCMC
  • Hidden Markov Models and Sequential Monte Carlo Methods
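
As an example of the samplers listed above, here is a minimal sketch of a random-walk Metropolis-Hastings sampler in plain Python. The target (a standard normal, given through its unnormalised log-density), the step size, and the names `metropolis_hastings` and `log_std_normal` are assumptions chosen for illustration.

```python
import math
import random


def metropolis_hastings(log_target, n_samples, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis-Hastings with a Gaussian proposal."""
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # The proposal is symmetric, so the acceptance ratio reduces to
        # target(proposal) / target(current), compared here in log space.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples


def log_std_normal(x):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


draws = metropolis_hastings(log_std_normal, 20000)
burned = draws[2000:]  # discard burn-in
mean = sum(burned) / len(burned)
var = sum((d - mean) ** 2 for d in burned) / len(burned)
print(f"sample mean: {mean:.3f}, sample variance: {var:.3f}")
```

Only the ratio of target densities is needed, which is why the normalising constant can be dropped from `log_std_normal`.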