Issues

How to write an issue

GitHub issues are often the first step in the PR process, since many of our PRs are reactions to bug reports and feature requests we receive. Because of this, it is worth spending a little time discussing what makes a high-quality issue.

While it might seem like opening an issue with just the error message you received is “good enough” for the package maintainer to figure it out, this typically isn’t the case in practice. Maintainers generally need enough context to reproduce the exact issue themselves before working on a solution. Crafting a great issue is a critical part of helping me (the maintainer) help you (the issue opener).

Great issues are composed of two parts:

  • Context around what you are trying to accomplish.

  • A minimal reproducible example of the problem.

Context

Issues come in all shapes and sizes, and package maintainers have to work through many of them in a single day, so providing context around exactly what was expected to happen vs what actually happened is extremely helpful. Useful context takes the form of a complete paragraph including:

  • The overarching type of problem you are experiencing - such as a bug, performance regression, or a missing feature.

  • The specific functions involved.

  • Any relevant package versions.

  • Links to any relevant issues or pull requests.

This dplyr issue is a great example of providing useful context:

Using case_when in a mutate call with a grouping variable is much, much slower in v1.1.0 compared to v1.0.10. The code works but it’s causing a tremendous slowdown in many of the packages I maintain.

The issue author goes on to provide a full before and after benchmark reprex (which is excellent!), but the context was actually so comprehensive that we could have reproduced this ourselves without it.

Reprex

A reproducible example, or reprex, is the most important part of a high-quality issue. To create a reprex, you’ll use the reprex package. To learn how to use the package, read this article. If you have the time, you should also watch Jenny Bryan’s webinar (linked from that article) that discusses how to use the package.

Without a reprex, it is typically extremely difficult for maintainers to reproduce an issue, which significantly decreases the chance of it getting resolved in a favorable way.

As you get into the habit of creating reprexes for your issues, you’ll find that the process of creating the reprex often leads you to solve the problem on your own. Because reprex forces you to run your example in a new R session, effectively giving you a “fresh start,” you’ll commonly discover that you had something “weird” in your main R session that was causing the problem to begin with.

Once you’ve gotten comfortable with creating reprexes, you should next work on making them minimal. Creating a minimal reprex is a bit of an artform, and involves stripping down an existing reprex to the minimum amount of packages and code needed to reproduce the issue. Sometimes this requires going a level deeper into the code to figure out where the underlying problem is coming from. For example, with the dplyr issue mentioned in the context section, Hadley further refined the original reprex by removing the mutate(), group_by(), and data frame from the example to show that the problem was directly related to an unexpected amount of overhead in case_when(). Creating minimal reprexes allows us to quickly focus in on the exact problem, resulting in faster PRs, like this one, to resolve them.

If your reprex involves some kind of performance regression, we recommend that you use the bench package to include a before and after reprex. This involves installing a version of the package before the regression (with devtools::install_version()), running a reprex that involves a call to bench::mark() which demonstrates the issue, and then rerunning that same reprex with the faulty version of the package installed. Hadley’s minimal reprex mentioned above is a great example of this.

New issues vs closed issues

If you find that you are having a problem that looks similar to an existing closed issue, we in the tidyteam greatly prefer that you open a new issue rather than commenting on an existing closed one. Some repositories have even set up a GitHub Actions workflow that locks old issues, preventing this. It is typically much easier for us to keep track of open issues, so the chance of your issue being resolved is much higher if you open a new one with full context, a reprex, and a link to the issue that you believe is related.

Issue first, PR second

If you are an external contributor to a package and feel that you have found a bug, it is highly recommended that you open an issue first before attempting a PR. This gives the maintainer a chance to confirm that what you are seeing is actually an issue, and that they are willing to accept a PR for it. Importantly, it avoids the tough situation of a PR author spending a bunch of time on a PR only for it to be closed by the reviewer because they didn’t believe it was a bug or didn’t agree with the approach taken by the author.

For existing issues, it is typically still worth adding a comment on the issue asking if the maintainer would accept a PR for it. They might have plans to revamp that section of the codebase entirely, which may invalidate your PR altogether. Even if they say yes, remember that you might not get an immediate review if the maintainer isn’t actively working on the package, as mentioned in the patterns of collaboration section, but we still value all contributions!