Collecting RSE training towards the goal of a unified curriculum

Existing Research Software Engineering Training Material

Below is an ongoing collection of RSE training material, workshops, and resources. We are compiling this list as a starting point for future activities. For now, these are all provided “as is” to the community. They haven’t been reviewed for content or accuracy. They are presented in no particular order.

We eagerly welcome additions to this list. Links to slides, videos, notebooks, etc. that fall loosely under the Research Software Engineering umbrella are all encouraged.

We are especially seeking material that goes beyond basic research computing competency (e.g. what The Carpentries does so well) and is general enough to span multiple domains. Specific tools and technologies used only in one domain, or applicable to only one subset of computing (e.g. HPC), are typically too narrowly focused. When in doubt, submit it for inclusion or reach out, and we’d be happy to discuss.

See below for instructions on how to add to this list.

*Note: some items may be listed more than once due to overlapping topics.

CI

  1. Continuous Integration. Mark C. Miller, ATPESC, 2020-08-06
    A short, broad-brush answer to the question “What is Continuous Integration?”. Several of the key concepts related to Continuous Integration testing are introduced, including the use of either local or cloud resources, as well as a brief hands-on example with TravisCI and Codecov.
  2. Open Source Best Practices: From Continuous Integration to Static Linters. Daniel Smith and Ben Pritchard, Molecular Sciences Software Institute (MolSSI), IDEAS-Productivity, 2018-10-17
    This webinar will continue the discussion of open source software (OSS) opportunities within the scientific ecosystem to include the many cloud and local services available to OSS free of charge. The services to be discussed include continuous integration, code coverage, and static analysis. The presenters will demonstrate the usefulness of these tools and how a small time investment at the beginning is traded for long-term benefits. These services and ideas are agnostic to software language or HPC software application and should apply to any party interested in tools that help ease the burden of software maintenance.
  3. Continuous Integration / Continuous Development (CI/CD): Introduction. Giordon Stark, HEP Software Foundation (HSF),
    The aim of this module is to explore what it means to build a CI/CD workflow and expand on concepts unique to GitLab’s CI/CD, which is essential to anyone working in ATLAS.
  4. Software Testing and Continuous Integration. Kyle Niemeyer, URSSI Winter School, 2019-12-18

Collaboration

  1. Collaboration with Git & GitHub. Karthik Ram, URSSI Winter School, 2019-12-18
  2. Git Exercises. James Howison, URSSI Winter School, 2019-12-18
    Collaborative git exercises to be done in groups of three
  3. Contemporary Peer Code Review Practices in Research Software. Jeff Carver, URSSI Winter School, 2019-12-19
    http://carver.cs.ua.edu/Slides/2019/URSSI-WinterSchool/URSSI-WinterSchool-PeerCodeReview.pdf
  4. Agile Methodologies. James M. Willenbring, ATPESC, 2020-08-06
    This talk provides a basic introduction to Agile methodologies as well as how to apply them to a small computational science team. There is a focus on starting with a small number of practices and building from there, rather than adopting a heavy-weight process.
  5. The Turing Way Collaboration guideline. The Turing Way Community,
    This guide covers topics related to effective and inclusive collaboration.

Git

  1. Collaboration with Git & GitHub. Karthik Ram, URSSI Winter School, 2019-12-18
  2. Git Exercises. James Howison, URSSI Winter School, 2019-12-18
    Collaborative git exercises to be done in groups of three

Licensing

  1. Introduction to Software Licensing. David E. Bernholdt, ORNL, IDEAS-Productivity, 2018-12-05
    Software licensing and related matters of intellectual property can often seem confusing or hopelessly complicated, especially when many present their opinions as dogma. This presentation takes a different approach: getting you to think about software licensing from the standpoint of what you want others to be able to do (or not do) with your software. We will start by developing a common understanding of the terminology used around software licenses. Then we’ll consider various scenarios of what you might want to accomplish with a software license, and what to look for in the license. We’ll also discuss some pragmatic issues around actually applying a license to your software. A list of resources will be provided to help with further exploration of these topics.
  2. Open Science & Software Citation. Kyle Niemeyer, URSSI Winter School, 2019-12-19

Packaging

  1. Modern CMake. Henry Schreiner, US-ATLAS Computing Bootcamp 2020,
    The aim of this tutorial is to cover the basics of using CMake. This workshop covers the basics of making and building a project, and some details of design.
  2. Basics of Packaging Python Programs. Kyle Niemeyer, URSSI Winter School, 2019-12-17
  3. Python Packages. Tomas Beuzen & Tiffany Timbers,
    Python Packages is an open source textbook that describes modern and efficient workflows for creating Python packages.
  4. Best Practices in Python Package Development. The Molecular Sciences Software Institute,
    This workshop is designed for researchers in the chemical sciences. In this course, students create a Python package using the MolSSI CookieCutter. The workshop covers an introduction to version control, hosting on GitHub, project collaboration, testing, and documentation strategies.

Performance

  1. High-Performance Python and Interoperability with Compiled Code. Jim Pivarski, Princeton University Workshop, 2019-04-08
    This three-day workshop examines the numerical processing ecosystem that has grown up around Python. The key library in this ecosystem is NumPy, which enables fast array programming and also provides a common data structure for sharing large, numerical datasets. We will walk through the process of restructuring "for loop" algorithms as "columnar" algorithms based on NumPy, as well as using Numba to speed up "for loop" algorithms by compiling the Python code. We'll do the same on a GPU using CuPy (a NumPy clone written for GPUs) and Numba. We'll also explore methods of mixing Python and C++, both for performance and for compatibility with existing libraries. Finally, I'll introduce Pandas as a convenient front-end to NumPy for data analysis.
  2. Accelerating Python. Jim Pivarski, CoDaS-HEP 2019,
    Numba is an alternative that compiles Python to run as fast as C, but only if the code consists purely of numbers and arrays that don't change type. Quite a few tools, such as pybind11, Cython, and PyROOT, call out to C++, which is another way of escaping Python for tight loops. There are also many tools to parallelize Python, though there are some pitfalls to consider.
  3. High Performance Python: CPUs. Henry Schreiner, Princeton University Workshop, 2020-11-04
    This workshop will introduce participants to high performance Python using techniques such as Just In Time (JIT) compilation through Numba. We will look at several problems, and develop solutions using several different techniques, and compare the performance gained by doing so with the (potential) loss in expressivity and clarity.
  4. High Performance Python: GPUs. Henry Schreiner, Princeton University Workshop, 2019-12-04
    This workshop will introduce participants to high performance Python on GPUs using tools to provide “simplified” GPU programming, as well as offer a brief look into creating custom kernels by hand.

Reproducibility

  1. Improving Reproducibility Through Better Software Practices. David E. Bernholdt, ATPESC, 2020-08-06
    This presentation provides some background on the origins of concerns about reproducibility, some of the actions the larger community is taking to raise awareness and attention to it, and a more extensive discussion of how to make software-based research more reproducible at all stages of the R&D process.
  2. Research Reproducibility in Theory and Practice (Examples and Focus on Biological Sciences). Daniel S. Katz, FSCI 2020, 2020-08-12
    This course will focus on issues of reproducibility in research from a broad perspective. It will include an introduction to the differing types of reproducibility, and a discussion of grant review guidelines and the philosophy that underpins them.

Other

  1. Introduction to Docker. Matthew Feickert, HEP Software Foundation (HSF),
    An opinionated introduction to using Docker as a software development tool.
  2. Introduction to Software Design. Jeff Carver, URSSI Winter School, 2019-12-17
  3. Think Like a Programmer. Andrew Loftus, URSSI Winter School, 2019-12-17
  4. Documentation. Kyle Niemeyer, URSSI Winter School, 2019-12-19
  5. Scientific Software Design. Anshu Dubey, ATPESC, 2020-08-06
  6. Software Testing. Anshu Dubey, ATPESC, 2020-08-06
  7. Refactoring. Anshu Dubey, ATPESC, 2020-08-06
  8. Writing Clean Scientific Software. Nick Murphy,
    This presentation discusses strategies for writing clean scientific software. This presentation encourages us to think of code as communication.
  9. Research Software Engineering with Python. Damien Irving, Kate Hertweck, Luke Johnston, Joel Ostblom, Charlotte Wickham, and Greg Wilson,
    A semester-long course in Research Software Engineering with Python targeting researchers who are already using Python for their data analysis, but who want to take their coding and software development to the next level.


Links can be added by filling out this form or directly by submitting a pull request to the website’s GitHub repository.