Supporting Hypothesis
In September, Stripe is supporting the development of Hypothesis, an open-source testing library for Python created by David MacIver. Hypothesis is the only project we’ve found that provides effective tooling for testing code for machine learning, a domain in which testing and correctness are notoriously difficult.
Instead of hand-writing individual test cases, Hypothesis lets you define properties of your functions that should hold true for every input. A property is a statement like “My sorting function should return a sorted list given any input list.” Every time the tests run, Hypothesis tries to prove your properties wrong by feeding in a large number of automatically generated example inputs. If any of your properties break, Hypothesis shrinks the failing input and reports a minimal example that still triggers the failure.
Here’s an example of a Hypothesis test:
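A minimal sketch of the sorting property described earlier, written in the standard Hypothesis style. The function `my_sort` is a hypothetical stand-in for the code under test, not something from Stripe's codebase:

```python
from collections import Counter

from hypothesis import given, strategies as st


def my_sort(values):
    """Hypothetical function under test; stands in for your own sorting code."""
    return sorted(values)


# Hypothesis generates many lists of integers and passes each one to the test.
@given(st.lists(st.integers()))
def test_sort_returns_sorted_list(values):
    result = my_sort(values)
    # Property 1: every element is less than or equal to its successor.
    assert all(a <= b for a, b in zip(result, result[1:]))
    # Property 2: the output contains exactly the same elements as the input.
    assert Counter(result) == Counter(values)
```

The second assertion matters because a function that always returns an empty list would otherwise pass the "is sorted" check.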
This style of testing is a natural match for machine learning workflows. We use machine learning to make products like Radar more effective; Radar helps hundreds of thousands of Stripe users fight fraud at a global scale. Testing machine learning code is especially critical when your systems can have material consequences for users. Every day, we train many models on large datasets, but hand-written unit tests alone can’t capture all of the complexity of the possible input data. For the past few months we’ve been using Hypothesis to generate input data for our tests of the models behind Radar.
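To illustrate what generating input data for model-related code can look like, here is a small sketch. It assumes a hypothetical feature-preprocessing helper, `min_max_scale`, and is not a description of how Radar's models are actually tested:

```python
from hypothesis import given, strategies as st


def min_max_scale(values):
    """Hypothetical preprocessing step: rescale features into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Assumption: features are finite and bounded; NaN and infinity are excluded.
bounded_floats = st.floats(min_value=-1e6, max_value=1e6, allow_nan=False)


@given(st.lists(bounded_floats, min_size=1))
def test_scaled_features_stay_in_unit_interval(values):
    scaled = min_max_scale(values)
    # The output has one value per input value.
    assert len(scaled) == len(values)
    # Every scaled value lies in the closed interval [0, 1].
    assert all(0.0 <= s <= 1.0 for s in scaled)
```

A handful of hand-picked inputs would be unlikely to include cases such as a single-element list or a column of identical values, both of which Hypothesis tends to try early.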
While working with Hypothesis, we found that support for property-based testing with Pandas and NumPy wasn’t yet built out. We’re excited to support the project in making concrete progress towards integrating with these two foundational, commonly used libraries in Python’s ML toolkit.
We plan to use Hypothesis more broadly at Stripe and hope that the project’s development over the next few months also helps other companies reliably integrate machine learning into more products.
At Stripe, we regularly contribute to open-source projects and rely on open-source software for developing many different parts of our stack. We have a particularly strong interest in areas where the right tooling can provide outsized leverage to the larger developer community. If you’re working on such a project, we’d love to hear from you!