Collecting RSE training towards the goal of a unified curriculum

Existing Research Software Engineering Training Material

Below is an ongoing collection of RSE training material, workshops, and resources. We are compiling this list as a starting point for future activities. For now, these are all provided “as is” to the community. They haven’t been reviewed for content or accuracy. They are presented in no particular order.

We eagerly welcome additions to this list. Links to slides, videos, notebooks, etc. that fall loosely under the Research Software Engineering umbrella are all encouraged.

We are especially seeking material that goes beyond basic research computing competency (e.g. what The Carpentries does so well) and is general enough to span multiple domains. Specific tools and technologies used only in one domain, or applicable to only one subset of computing (e.g. HPC), are typically too narrowly focused. When in doubt, submit it to be included or reach out and we’d be happy to discuss.

See below for instructions on how to add to this list.

*Note: some items may be listed more than once due to overlapping topics.

CI

  1. Software testing. CodeRefinery,
    This lesson is about why testing often needs to be part of the software development cycle and how such a cycle can be implemented. It demonstrates how automated testing works and shows how tests can be designed and implemented in different programming languages. [Tutorial]
  2. Continuous Integration. Mark C. Miller, ATPESC, 2020-08-06
    A short, broad-brush answer to the question “What is Continuous Integration?”. Several of the key concepts related to Continuous Integration testing are introduced, including the use of either local or cloud resources, as well as a brief hands-on example with Travis CI and Codecov. [Slides]
  3. Open Source Best Practices: From Continuous Integration to Static Linters. Daniel Smith and Ben Pritchard, Molecular Sciences Software Institute (MolSSI), IDEAS-Productivity, 2018-10-17
    This webinar will continue the discussion of open source software (OSS) opportunities within the scientific ecosystem to include the many cloud and local services available to OSS free of charge. The services to be discussed include continuous integration, code coverage, and static analysis. The presenters will demonstrate the usefulness of these tools and how a small time investment at the beginning is traded for long-term benefits. These services and ideas are agnostic to software language or HPC software application and should apply to any party interested in tools that help ease the burden of software maintenance. [Slides]
  4. Continuous Integration / Continuous Development (CI/CD): Introduction. Giordon Stark, HEP Software Foundation (HSF),
    The aim of this module is to explore what it means to build a CI/CD workflow and expand on concepts unique to GitLab’s CI/CD, which is essential to anyone working in ATLAS. [Hands-on] [Video]
  5. Software Testing and Continuous Integration. Kyle Niemeyer, URSSI Winter School, 2019-12-18
    This lecture covers the basics of software testing and continuous integration. [Slides] [Hands-on]

Collaboration

  1. Collaborative distributed version control with Git. CodeRefinery,
    Intermediate lesson on collaborative distributed version control with Git. Covers remotes, pull requests, code review, centralized and forking workflows, Git hooks, bare vs non-bare repositories. [Tutorial]
  2. Collaboration with Git & GitHub. Karthik Ram, URSSI Winter School, 2019-12-18
    This talk covers the basics of collaboration with Git and GitHub. [Slides]
  3. Git Exercises. James Howison, URSSI Winter School, 2019-12-18
    Collaborative Git exercises to be done in groups of three people. [Hands-on]
  4. Contemporary Peer Code Review Practices in Research Software. Jeff Carver, URSSI Winter School, 2019-12-19
    This presentation covers peer code review practices in research software. [Slides] [Hands-on]
  5. Agile Methodologies. James M. Willenbring, ATPESC, 2020-08-06
    This talk provides a basic introduction to Agile methodologies as well as how to apply them to a small computational science team. There is a focus on starting with a small number of practices and building from there, rather than adopting a heavy-weight process. [Slides]
  6. The Turing Way Collaboration guideline. The Turing Way Community,
    This guide covers topics related to effective and inclusive collaboration. [Tutorial]
  7. Intermediate Research Software Development in Python. Steve Crouch, James Graham, Sam Mangham,
    This course aims to teach a core set of established, intermediate-level software development skills and best practices for working as part of a team in a research environment using Python. [Workshop]

Git

  1. Introduction to version control with Git. CodeRefinery,
    Introductory/intermediate lesson on version control using Git. It starts from the basics but proceeds to cover branching, merging, conflict resolution, inspecting history, sharing repositories online, undoing, using the staging area, aliases/configuration, Git under the hood. [Tutorial]
  2. Collaborative distributed version control with Git. CodeRefinery,
    Intermediate lesson on collaborative distributed version control with Git. Covers remotes, pull requests, code review, centralized and forking workflows, Git hooks, bare vs non-bare repositories. [Tutorial]
  3. Collaboration with Git & GitHub. Karthik Ram, URSSI Winter School, 2019-12-18
    This talk covers the basics of collaboration with Git and GitHub. [Slides]
  4. Git Exercises. James Howison, URSSI Winter School, 2019-12-18
    Collaborative Git exercises to be done in groups of three people. [Hands-on]
  5. Intermediate Research Software Development in Python. Steve Crouch, James Graham, Sam Mangham,
    This course aims to teach a core set of established, intermediate-level software development skills and best practices for working as part of a team in a research environment using Python. [Workshop]

Licensing

  1. Social coding. CodeRefinery,
    This lesson is about how and why to share code, what kind of licenses are used in what situation and how software can be cited. [Tutorial]
  2. Introduction to Software Licensing. David E. Bernholdt, ORNL, IDEAS-Productivity, 2018-12-05
    Software licensing and related matters of intellectual property can often seem confusing or hopelessly complicated, especially when many present their opinions as dogma. This presentation takes a different approach, getting you to think about software licensing from the standpoint of what you want others to be able to do (or not do) with your software. We will start by developing a common understanding of the terminology used around software licenses. Then we’ll consider various scenarios of what you might want to accomplish with a software license, and what to look for in the license. We’ll also discuss some pragmatic issues around actually applying a license to your software. A list of resources will be provided to help with further exploration of these topics. [Slides]
  3. Open Science & Software Citation. Kyle Niemeyer, URSSI Winter School, 2019-12-19
    This lecture covers licensing, copyright, open science practices, and software citation. [Slides]

Packaging

  1. CMake hands-on workshop. ENCCS,
    CMake is a language-agnostic, cross-platform build tool and is nowadays the de facto standard, with large projects using it to reliably build, test, and deploy their codebases. You will learn how to: write a CMake build system for C, C++, and Fortran projects producing libraries and/or executables; run tests for your code with CTest; ensure your build system will work on different platforms; detect and use external dependencies in your project; and safely and effectively build mixed-language projects (Python+C/C++, Python+Fortran, Fortran+C/C++). [Tutorial]
  2. Modern CMake. Henry Schreiner, US-ATLAS Computing Bootcamp 2020,
    The aim of this tutorial is to cover the basics of using CMake. This workshop covers the basics of making and building a project, and some details of design. [Tutorial]
  3. Basics of Packaging Python Programs. Kyle Niemeyer, URSSI Winter School, 2019-12-17
    This presentation covers the basics of Python packages. [Slides] [Hands-on]
  4. Python Packages. Tomas Beuzen & Tiffany Timbers,
    Python Packages is an open source textbook that describes modern and efficient workflows for creating Python packages. [Hands-on]
  5. Best Practices in Python Package Development. The Molecular Sciences Software Institute,
    This workshop is designed for researchers in the chemical sciences. In this course, students create a Python package using the MolSSI CookieCutter. The workshop covers an introduction to version control, hosting on GitHub, project collaboration, testing, and documentation strategies. [Tutorial]
  6. Python 201: Building Better Scientific Software in Python. Geoffrey Lentner, PEARC21, 2021-07-19
    This tutorial exposes researchers to several best practices in scientific software engineering including Python packaging, automated testing, documentation management, logging, command-line interfaces, performance profiling and optimization. [Tutorial]
  7. Powerful Python Packaging for Scientific Codes. Henry Schreiner, PyHEP 2021, 2021-07-08
    This talk covers the best practices of making a highly compatible and installable Python package based on the Scikit-HEP developer guidelines and scikit-hep/cookie. There is a strong focus on compiled extensions. The latest developments in key libraries, like pybind11, cibuildwheel, and build are covered, along with potential upcoming advancements in Scikit-Build + CMake. [Hands-on]

Performance

  1. High Performance Data Analytics in Python. ENCCS,
    This lesson gives an overview of working with research data in Python using general libraries for storing, processing, analysing and sharing data. The focus is on high performance. After covering tools for performant processing on single workstations the focus shifts to profiling and optimising, parallel and distributed computing and finally GPU computing. [Tutorial]
  2. Julia for high-performance scientific computing. ENCCS,
    This lesson starts with the basics of Julia, its syntax, multiple-dispatch paradigm, package development and best practices. It then moves on to topics relevant to high-performance scientific computing, including an overview of powerful libraries for modeling and machine learning, visualization, parallelization and GPU computing. [Tutorial]
  3. High-Performance Python and Interoperability with Compiled Code. Jim Pivarski, Princeton University Workshop, 2019-04-08
    This three-day workshop examines the numerical processing ecosystem that has grown up around Python. The key library in this ecosystem is NumPy, which enables fast array programming and also provides a common data structure for sharing large, numerical datasets. We will walk through the process of restructuring “for loop” algorithms as “columnar” algorithms based on NumPy, as well as using Numba to speed up “for loop” algorithms by compiling the Python code. We’ll do the same on a GPU using CuPy (a NumPy clone written for GPUs) and Numba. We’ll also explore methods of mixing Python and C++, both for performance and for compatibility with existing libraries. Finally, I’ll introduce Pandas as a convenient front-end to NumPy for data analysis. [Workshop]
  4. Accelerating Python. Jim Pivarski, CoDaS-HEP 2019,
    Numba is an alternative that compiles Python to run as fast as C, but only if the code consists purely of numbers and arrays that don’t change type. Quite a few tools, such as pybind11, Cython, and PyROOT, call out to C++, which is another way of escaping Python for tight loops. There are also many tools to parallelize Python, though there are some pitfalls to consider. [Hands-on]
  5. High Performance Python: CPUs. Henry Schreiner, Princeton University Workshop, 2020-11-04
    This workshop will introduce participants to high performance Python using techniques such as Just In Time (JIT) compilation through Numba. We will look at several problems, and develop solutions using several different techniques, and compare the performance gained by doing so with the (potential) loss in expressivity and clarity. [Hands-on]
  6. High Performance Python: GPUs. Henry Schreiner, Princeton University Workshop, 2019-12-04
    This workshop will introduce participants to high performance Python on GPUs using tools to provide “simplified” GPU programming, as well as offer a brief look into creating custom kernels by hand. [Hands-on]

Reproducibility

  1. Reproducible research. CodeRefinery,
    Lesson on different methods and tools for better reproducibility in research software and data. It demonstrates how version control, workflows, containers, and package managers can be used to record reproducible environments and computational steps. [Tutorial]
  2. Improving Reproducibility Through Better Software Practices. David E. Bernholdt, ATPESC, 2020-08-06
    This presentation provides some background on the origins of concerns about reproducibility, some of the actions the larger community is taking to raise awareness and attention to it, and a more extensive discussion of how to make software-based research more reproducible at all stages of the R&D process. [Slides]
  3. Research Reproducibility in Theory and Practice (Examples and Focus on Biological Sciences). Daniel S. Katz, FSCI 2020, 2020-08-12
    This course will focus on issues of reproducibility in research from a broad perspective. It will include an introduction to the differing types of reproducibility, and a discussion of grant review guidelines and the philosophy that underpins them. [Slides] [Hands-on]

Software Engineering

  1. Modular code development. CodeRefinery,
    Type-along/demo on aspects of (un)modular code development. Focus is on the “why”, not on the “how”. [Tutorial]
  2. Introduction to Software Design. Jeff Carver, URSSI Winter School, 2019-12-17
    This presentation provides some background on software design. [Slides]
  3. Think Like a Programmer. Andrew Loftus, URSSI Winter School, 2019-12-17
    This presentation covers paradigms for program design. [Slides] [Hands-on]
  4. Contemporary Peer Code Review Practices in Research Software. Jeff Carver, URSSI Winter School, 2019-12-19
    This presentation covers peer code review practices in research software. [Slides] [Hands-on]
  5. Documentation. Kyle Niemeyer, URSSI Winter School, 2019-12-19
    This talk covers the basics of software documentation. [Slides] [Hands-on]
  6. Agile Methodologies. James M. Willenbring, ATPESC, 2020-08-06
    This talk provides a basic introduction to Agile methodologies as well as how to apply them to a small computational science team. There is a focus on starting with a small number of practices and building from there, rather than adopting a heavy-weight process. [Slides]
  7. Scientific Software Design. Anshu Dubey, ATPESC, 2020-08-06
    This lecture covers the basics of scientific software design methodology. [Slides]
  8. Software Testing. Anshu Dubey, ATPESC, 2020-08-06
    This presentation covers the basics of software testing and verification. [Slides]
  9. Writing Clean Scientific Software. Nick Murphy,
    This presentation discusses strategies for writing clean scientific software. This presentation encourages us to think of code as communication. [Slides]
  10. Managing Research Software Projects. Daniel Standage, Greg Wilson,
    Describes the basics of software project management with a particular focus on the sorts of projects commonly found in research settings. [Tutorial]

Python in Research

  1. High Performance Data Analytics in Python. ENCCS,
    This lesson gives an overview of working with research data in Python using general libraries for storing, processing, analysing and sharing data. The focus is on high performance. After covering tools for performant processing on single workstations the focus shifts to profiling and optimising, parallel and distributed computing and finally GPU computing. [Tutorial]
  2. High-Performance Python and Interoperability with Compiled Code. Jim Pivarski, Princeton University Workshop, 2019-04-08
    This three-day workshop examines the numerical processing ecosystem that has grown up around Python. The key library in this ecosystem is NumPy, which enables fast array programming and also provides a common data structure for sharing large, numerical datasets. We will walk through the process of restructuring “for loop” algorithms as “columnar” algorithms based on NumPy, as well as using Numba to speed up “for loop” algorithms by compiling the Python code. We’ll do the same on a GPU using CuPy (a NumPy clone written for GPUs) and Numba. We’ll also explore methods of mixing Python and C++, both for performance and for compatibility with existing libraries. Finally, I’ll introduce Pandas as a convenient front-end to NumPy for data analysis. [Workshop]
  3. Accelerating Python. Jim Pivarski, CoDaS-HEP 2019,
    Numba is an alternative that compiles Python to run as fast as C, but only if the code consists purely of numbers and arrays that don’t change type. Quite a few tools, such as pybind11, Cython, and PyROOT, call out to C++, which is another way of escaping Python for tight loops. There are also many tools to parallelize Python, though there are some pitfalls to consider. [Hands-on]
  4. Basics of Packaging Python Programs. Kyle Niemeyer, URSSI Winter School, 2019-12-17
    This presentation covers the basics of Python packages. [Slides] [Hands-on]
  5. High Performance Python: CPUs. Henry Schreiner, Princeton University Workshop, 2020-11-04
    This workshop will introduce participants to high performance Python using techniques such as Just In Time (JIT) compilation through Numba. We will look at several problems, and develop solutions using several different techniques, and compare the performance gained by doing so with the (potential) loss in expressivity and clarity. [Hands-on]
  6. High Performance Python: GPUs. Henry Schreiner, Princeton University Workshop, 2019-12-04
    This workshop will introduce participants to high performance Python on GPUs using tools to provide “simplified” GPU programming, as well as offer a brief look into creating custom kernels by hand. [Hands-on]
  7. Python Packages. Tomas Beuzen & Tiffany Timbers,
    Python Packages is an open source textbook that describes modern and efficient workflows for creating Python packages. [Hands-on]
  8. Research Software Engineering with Python. Damien Irving, Kate Hertweck, Luke Johnston, Joel Ostblom, Charlotte Wickham, and Greg Wilson,
    A semester-long course in Research Software Engineering with Python targeting researchers who are already using Python for their data analysis, but who want to take their coding and software development to the next level. [Tutorial]
  9. Best Practices in Python Package Development. The Molecular Sciences Software Institute,
    This workshop is designed for researchers in the chemical sciences. In this course, students create a Python package using the MolSSI CookieCutter. The workshop covers an introduction to version control, hosting on GitHub, project collaboration, testing, and documentation strategies. [Tutorial]
  10. Python 201: Building Better Scientific Software in Python. Geoffrey Lentner, PEARC21, 2021-07-19
    This tutorial exposes researchers to several best practices in scientific software engineering including Python packaging, automated testing, documentation management, logging, command-line interfaces, performance profiling and optimization. [Tutorial]
  11. Powerful Python Packaging for Scientific Codes. Henry Schreiner, PyHEP 2021, 2021-07-08
    This talk covers the best practices of making a highly compatible and installable Python package based on the Scikit-HEP developer guidelines and scikit-hep/cookie. There is a strong focus on compiled extensions. The latest developments in key libraries, like pybind11, cibuildwheel, and build are covered, along with potential upcoming advancements in Scikit-Build + CMake. [Hands-on]
  12. Level Up Your Python. Henry Schreiner, PyHEP 2021, 2021-07-05
    Part 1 covers class design patterns, the Python memory model, debugging, profiling, and more. Part 2 covers Python features and packaging. Part 3 covers common Python packages. [Tutorial]
  13. Intermediate Research Software Development in Python. Steve Crouch, James Graham, Sam Mangham,
    This course aims to teach a core set of established, intermediate-level software development skills and best practices for working as part of a team in a research environment using Python. [Workshop]

Documentation

  1. Code documentation. CodeRefinery,
    This lesson discusses different solutions for implementing and deploying code documentation. It shows how to build documentation with the documentation generator Sphinx (and compares it with others) and how to deploy it to Read the Docs, a service which hosts open documentation for free. It also shows how to deploy a project website or personal homepage to GitHub Pages. [Tutorial]
  2. Documentation. Kyle Niemeyer, URSSI Winter School, 2019-12-19
    This talk covers the basics of software documentation. [Slides] [Hands-on]

Design

  1. Modular code development. CodeRefinery,
    Type-along/demo on aspects of (un)modular code development. Focus is on the “why”, not on the “how”. [Tutorial]
  2. Introduction to Software Design. Jeff Carver, URSSI Winter School, 2019-12-17
    This presentation provides some background on software design. [Slides]
  3. Intermediate Research Software Development in Python. Steve Crouch, James Graham, Sam Mangham,
    This course aims to teach a core set of established, intermediate-level software development skills and best practices for working as part of a team in a research environment using Python. [Workshop]

Refactoring

  1. Refactoring. Anshu Dubey, ATPESC, 2020-08-06
    This presentation covers the basics of refactoring. [Slides] [Hands-on]

Testing

  1. Software testing. CodeRefinery,
    This lesson is about why testing often needs to be part of the software development cycle and how such a cycle can be implemented. It demonstrates how automated testing works and shows how tests can be designed and implemented in different programming languages. [Tutorial]
  2. Software Testing and Continuous Integration. Kyle Niemeyer, URSSI Winter School, 2019-12-18
    This lecture covers the basics of software testing and continuous integration. [Slides] [Hands-on]
  3. Software Testing. Anshu Dubey, ATPESC, 2020-08-06
    This presentation covers the basics of software testing and verification. [Slides]
  4. Python 201: Building Better Scientific Software in Python. Geoffrey Lentner, PEARC21, 2021-07-19
    This tutorial exposes researchers to several best practices in scientific software engineering including Python packaging, automated testing, documentation management, logging, command-line interfaces, performance profiling and optimization. [Tutorial]
  5. Intermediate Research Software Development in Python. Steve Crouch, James Graham, Sam Mangham,
    This course aims to teach a core set of established, intermediate-level software development skills and best practices for working as part of a team in a research environment using Python. [Workshop]

Other

  1. Introduction to Docker. Matthew Feickert, HEP Software Foundation (HSF),
    An opinionated introduction to using Docker as a software development tool. [Tutorial]


Links can be added by filling out this form or directly by submitting a pull request to the website’s GitHub repository.