Courses

The courses will be held on the first day of the EMR Conference (May 8, 2023). A course will run only if at least ten participants enroll, and each course is limited to a maximum of fifteen participants. The available courses are listed below.

  • Course 1: Introduction to explainable machine learning with examples in healthcare

    Instructor(s):

    The aim of the workshop is to introduce participants to explainable artificial intelligence (XAI) methods that can be used to build predictive models and to extract knowledge from them. The workshop will combine a discussion of the theoretical foundations with code examples that participants can run themselves. We will use real-world data for either a COVID-19 mortality prediction problem or a heart disease classification problem.

    The methods discussed are available in many programming languages and libraries, but the workshop examples will be in R, using the DALEX package. The scope of the workshop follows that of the book Explanatory Model Analysis (https://ema.drwhy.ai/).

    The first part of the workshop covers exploratory data analysis tools and data preparation for modelling. The second part focuses on tools for developing predictive models. As examples, we will discuss decision trees, random forests and techniques for automatically tuning random forests. The third part covers local model explanation techniques: SHAP (Shapley values), Break-down and LIME, the most popular methods for local model exploration. The fourth part is devoted to global model explanation techniques: permutation-based variable importance and partial dependence profiles. The workshop is based on material from https://github.com/BetaAndBit/RML.

    Why?

    Complex machine learning models are frequently used in predictive modelling. There are many examples of random forest-like and boosting-like models in medicine, finance, agriculture and other fields. But who trusts black boxes? In this workshop we will show why and how to analyse the structure of a black-box model. This is a hands-on workshop: each part consists of a short lecture followed by time for practice and discussion. Using a specific dataset as an example, we will show the basics of modelling with tree-based models. We will then show how to evaluate and analyse such models using XAI techniques. The R packages covered include randomForest, party, mlr3, DALEX, modelStudio and arenar.

  • Course 2: Removing unwanted variation from large-scale RNA sequencing data with PRPS

    Instructor(s):

    Large-scale datasets generated by different omics technologies present unique challenges for normalization and integration. This course focuses on extending biostatistical and bioinformatics methods to meet these challenges. We will concentrate on the RUV (removing unwanted variation) normalization methods, which have shown great promise in handling large-scale datasets such as those from The Cancer Genome Atlas (TCGA). RUV-PRPS, a novel strategy (Molania et al., 2023, Nat. Biotechnol., https://www.nature.com/articles/s41587-022-01440-w#code-availability), uses pseudo-replicates of pseudo-samples (PRPS) to normalize RNA-seq data when technical replicates are not available. In this course we will present the new RUV-PRPS package we have been developing. It is a user-friendly R package that enables researchers to run the RUV-PRPS method and to visualize diagnostic plots before and after normalization, so they can assess the quality and consistency of their data.

    Session 1: Introduction to large-scale RNA sequencing and RUV methods - Theoretical session

    Session 2: Identification of unwanted variation in RNA-seq data - Hands-on session

    • RNA-seq data from The Cancer Genome Atlas (TCGA) and the normalizations provided with them
    • Statistical methods in the RUV-PRPS package to identify unwanted variation:
      • Functions to identify variation associated with categorical variables: PCA, silhouette coefficient, adjusted Rand index (ARI), ANOVA, vector correlation.
      • Functions to identify variation associated with continuous variables: linear regression, correlation.

    Session 3: How to apply RUV-PRPS - Hands-on session

    • Selection of negative control genes
    • Function in the RUV-PRPS package to create pseudo-replicates of pseudo-samples (PRPS) that correct for library size, batch effects and tumour purity.
    • Function in the RUV-PRPS package to run the RUV-PRPS method.

    Session 4: Normalization performance assessment - Hands-on session

    • Statistical methods in the RUV-PRPS package to assess the performance of a normalization method:
      • Functions to identify variation associated with categorical variables: PCA, silhouette coefficient, adjusted Rand index (ARI), ANOVA, vector correlation.
      • Functions to identify variation associated with continuous variables: linear regression, correlation.
    • How unwanted variation can influence downstream analyses, including gene-gene correlation and survival analysis

  • Course 3: How to use cloud technologies for reliable and responsible Data Science projects?

    Instructor(s):

    Researchers are using cloud environments to share biomedical data, run analysis tools and collaborate. In this hands-on course, we will cover the following topics:

    • Creating a reliable and secure cloud environment, data sharing and autoscaling of your compute solutions – 60 mins
    • Deploying and using JupyterLab and VS Code Server on terra.bio – 30 mins
    • Selected topics in genomic visualization and analysis with Bioconductor in the cloud – 75 mins
    • Responsible Data Science use cases – 15 mins

    Requirements:

    • Mid-level (200) Linux OS experience
    • Mid-level (200) R programming experience
    • Mid-level (300) Python programming experience
    • Experience using Jupyter Notebook, JupyterLab or JupyterHub with different kernel types (R, Python, Julia, Spark, etc.)
    • Virtual machines will be provided during the course, but participants SHOULD BRING THEIR OWN LAPTOP OR PC.

    Quota: Max. 10 participants.
    Time: 3 hours.