Personal history of NumFOCUS
Jun 14, 2016 by Andy R. Terrel
A NumFOCUS History: Sustaining Open Source Scientific Codes
After a few of the last board meetings and discussions on board votes, I’ve been pondering a number of questions. Questions about who is NumFOCUS and why we exist. Where we are going and what problems in the world we would like to solve. I feel that our small organization has started to make a big turn in our journey. We are receiving more attention from the organizations we serve and the wider community. The PyData conference has begun to become a profitable enterprise that allows us to see a sustainable path but also requires a great amount of our time. At the same time we need to support our projects and programs in a functional way that promotes our mission.
In 2011 a group of scientific Python community members got together to find a way to sustain the tools the community was building. Two of the gentlemen, Travis Oliphant and John Hunter, the creators of NumPy and MatPlotLib, respectively, had left academia and used their tools in industry.
It was clear that many parts of the finance, oil and gas, and analytics fields were using their tools for much of their business. While many organizations would have liked to give back, it was not an easy thing to do. The communities were divided up among many universities, government labs, and companies. Who would take money for the project and who would own the results?
Travis and John brought a strong passion for helping corporations give back to the community through donations. The other three founding members, Perry Greenfield, Fernando Perez and Jarrod Millman, were still in academia, seeing firsthand how hard it had become to receive awards to develop software critical to the success of scientific innovation. For so many, the academic software business was the thing you did on the side. I remember sneaking my software by the university’s technological commercialization program so the work wouldn’t be hidden.
In that initial set of founders, we had several project leaders for AstroPy, IPython, NumPy, SciPy, and MatPlotLib. Each of these projects is free and open source, built in a bazaar model of development, meaning everything was done in the open, accepting patches and bug reports from anyone who gave enough attention to make them. These projects also strove not to place any restrictions on their use, as seen by their role as the base layer for much of the NumFOCUS ecosystem. While this may seem like a minor point, it becomes a bit of a defining feature of the tools. While many frameworks will take over a program and prevent portability, these libraries were intended to be used by anyone. Notably, the licenses for these projects were also quite liberal, preferring BSD-like licenses. These decisions made the projects viable options for the enterprise and thus are critical to their sustainability.
The initial board was formed and Leah Silen was hired as our Executive Director to help us organize the nonprofit and daily actions. I don’t have much to say about those early days as the organization tried to form itself. Looking back, it was clear there wasn’t as straight a path forward as the initial fervor of the five founders might have liked. Additionally, tragedy struck as John suddenly passed away in mid-2012. And Travis left his position at Enthought to form Continuum Analytics.
The initial struggles of the organization were clear. The board was too busy to be a working board at the level needed. Thus the board was grown to 9 (which included myself) with a very small quorum of 3. This policy didn’t change until 2016. The board built the organization just like any open source project, with GitHub accounts, mailing lists and many discussions about the minutiae of running a nonprofit. Needless to say, this was not going to be your typical nonprofit organization.
In those early years, we were sustained by three major corporate donations. Continuum Analytics, the newly formed company by Travis, gave us office space and paid Leah’s salary. JP Morgan Chase gave us a generous donation to help build our diversity program and Microsoft gave a donation to the IPython team. Additionally, we started a small conference series called PyData.
It’s not hard to see how overwhelmed we quickly became, with nine board members, one full-time staffer, and now projects, conferences, and a diversity program to run. We struggled to communicate all we did effectively, much less get it started. In hindsight we should have been a bit more cautious about diving in head first. Leah got to know the night cleaners at the office and really poured more dedication into our effort than I can ever describe.
It is worth a moment to pause and discuss why we chose these particular activities. The simple answer is that it takes much more for a community to survive than just coders committing to a repository. Just like all humans, we have to have time to gather, discuss, and collaborate. Additionally, at that time there were two very important topics our community was grappling with: diversity and data science.
I think the first surprise to most folks is that our mission is not to write software. In fact I question any person whose mission is to write software. It’s a bit like meeting a delivery driver whose mission is to consume gasoline. While that is certainly a side effect of making deliveries, it is not the essence of the activity.
Additionally, when we started we knew we wanted to support the ecosystem of numerical tools that scientists used. NumFOCUS was an acronym of this idea: “Numerical Foundation for Open Code and Usable Science”. This has also become a sticking point over the years. What about tools that are not numerical? How about those that are important but have a very small user base? Our mission became much more about promotion, education, and fiscal sponsorship. From our webpage (June 2016):
The mission of NumFOCUS is to promote sustainable high-level programming languages, open code development, and reproducible scientific research. We accomplish this mission through our educational programs and events as well as through fiscal sponsorship of open source data science projects. We aim to increase collaboration and communication within the scientific computing community.
With that as our defining goal, let’s review our activities and how they pertain to that goal.
Fiscal sponsorship program
The initial idea and structure of the organization was to provide a fiscal home to projects that fit. This is a bit of a foreign concept to most open source codes so I’ll try to explain a bit. Let’s look at a somewhat standard progression of open source scientific software.
A group of scientists determines a need in the community, usually at a conference, and pursues a grant from somewhere like the NSF or NIH. The project starts a shared repository and contributors from several different organizations start committing code. A license is added to the repo, nowadays mandated by the granting institution, and it becomes usable by those the license stipulates. If the project is successful, a user base pops up and folks start putting things on mailing lists. Decisions start to be made that impact a larger group of people and thus some form of governance will emerge. The grant ends, as they are usually only for a few years, and there is a pile of code with the start of a community.
I usually liken this to the building of a house. While the house may be built and be sound, it is far from done. Every week the house is cleaned and the grounds are tended. Every month requires minor maintenance and at least once a year major maintenance. New additions are built, rooms are rearranged, and furniture comes and goes. Even the people in the house grow and change. Eventually, people move out and new people move in. Some houses get turned into businesses, others host many overnight guests, but most serve as homes to the people who own them. That continues until an end-of-life event, such as being torn down for the progress of another more up-to-date home or even a restructuring of the entire city. But during all these years of a house’s life, it must be cleaned, tended, and maintained.
Just like a house, a software community has the same burdens of maintenance for sustainability. Communities also come and go. Few people use Numeric or NumArray, the two libraries that NumPy was based upon. LINPACK was replaced by BLAS and LAPACK. Since the first lines of FORTRAN, universities and government labs have been places to house scientists who are building these communities. During the last 10 years our academies have stopped employing most of the scientific software writers. I have seen reports that as few as 12 percent of graduate students stay in academia.
Thus as graduate students no longer become professors, they move to industry and take their tools with them. It is hard to find a software project in technical fields that doesn’t use some open source software that comes from the academy. But without a home at universities or labs, the scientific community needs a place to hold its assets. Just like the groundskeeper of a house owns a hoe to weed, software developers gather tools and computers for maintaining their code. Projects need websites for gathering information about the development. Additionally, they need meetings to gather and discuss the future of their projects.
The NumFOCUS Fiscal Sponsorship program was designed to help communities achieve these goals. We help projects to promote and organize themselves. We help them seek funding from foundations and corporations. We provide infrastructure for websites and code maintenance. We do this because we believe that giving scientists access to better tools helps everyone.
Not every project fits well with our fiscal sponsorship. For example, if you are at a single institution, then that institution should be the sponsor. They provide the overhead to keep your team working and usually want you to pay your share when you do raise money. But as we have all seen, it is sometimes quite hard to work with universities and labs whose goals do not match the project’s. We have an additional program, our Affiliate Project program, for projects that for whatever reason cannot be fiscally sponsored. Being an Affiliate Project means that the board has approved your project for applying for grants from our general funds, though each grant is subject to board approval.
There are a lot more legal details in our sponsorship projects, but we think it fills a gap for our community. As of this writing we have 15 sponsored projects and have given dozens of grants to sponsored and affiliate projects.
PyData and Conferences
In my story above about the creation of scientific software communities, one will notice that conferences and meetings come up quite a bit. It is my belief that these conferences are the most important part of our ecosystem. They allow us to promote and teach our tools. They recruit young aspiring scientists to our projects. Furthermore, they allow an open forum of thought that helps guide our future.
With this notion several people started the PyData community, notably Travis Oliphant, Peter Wang, Julie Steele, and Lynn Bender. Initially it was a one-day workshop after Strata San Jose where a group of scientists got together with Guido van Rossum. The idea was to help Guido understand our use case of the Python programming language and how the core dev team could help us, such as by providing better support for nonstandard CPython builds. The result of that meeting was a kind of admission by Guido that our community should make separate solutions, as the Python community would not be focusing on our needs.
The PyData Silicon Valley event was repeated in New York City, another place we knew our tools were being used, in the finance sector. Van Lindberg, chair of the Python Software Foundation (PSF), gave a keynote and suggested that PyData become a conference to support the NumFOCUS foundation like PyCon does for the PSF. The conference was sold out and became an instant hit, so the notion became appealing.
Currently we have 12 PyData conferences throughout North America and Europe. One of the keys to the success has been to leave a PyData Meetup group in all the cities we visit. Some have worked and those, such as London, have created a thriving community. It gives the local community a way to keep engaged year round.
PyData is only one conference we support. We have supported many others, such as JuliaCon, SciPy, EuroSciPy, SciPy Latin America, Scientific Software Days, AstroPy, PyCon, and Scikit Learn sprints. These meetings provide us the opportunity to teach the world how to use our tools, as most have tutorial sessions for beginners. They also give you an opportunity to sit down with the developers of a project and understand the problems they solve. So to me they have been a huge part of our organization.
Our field, science and code, has come to admit a major diversity problem in recent years and NumFOCUS has been trying to do our part in helping fix the problem. I, like many in the field, was blindsided by the issue. I remember distinctly the moment I saw the problem in its full breadth. Matt Davis tweeted at the SciPy 2012 Conference, where I was program chair, “More men have walked on the moon than women are at the conference.” I went back to our registration and counted a whole 3 of 200 participants who were female.
In the years since, NumFOCUS has given a fellowship to the SciPy Conference. At each conference we held a diversity event where people talked about ways to increase the diversity of our community. I think the number one takeaway for me at those events was that to attract a diverse community, one has to invite a diverse set of individuals. I have always been more interested in events that I felt invited to, and the scientific computing community wasn’t the most inviting bunch.
To that end, NumFOCUS has put on many diversity workshops and training programs. I think my favorite that is still ambling along is the training week for high school aged females. One of our board members, Cindee Madison, started the program and helped lead several workshops. It’s definitely one of the programs that is currently struggling, as volunteer help is always needed. I believe the program was going to spin out on its own but lost a bit of steam when Cindee had some family medical problems.
One of the best parts of being the President of NumFOCUS these past two years has been working with some of the brightest minds on the planet as we try to tackle the issues facing our community. Each year our board comes up with new challenges to try and address. Some programs take off and others die.
One notable program that I didn’t speak much about is the Technical Fellowship Program. We were able to give a graduate student, Olga Botvinnik, time to work on the seaborn visualization library while working in biology, an experience that is difficult to get while pursuing a PhD. I think Olga was a great candidate, but in the years since we have failed to come together and decide how we want the program to evolve. While giving a student time to develop their technical skill is worthwhile, it seems that money could also go to developers who work full time on the project. I’m sure we will revamp the program in the future.
We’ve given out grants to some interesting projects that are cross-cutting in their concerns. One example is PyPy in a Box, in which the PyPy JIT-accelerated Python implementation was built to work with scientific codes. Ultimately it didn’t take off. Another ongoing project is to help make Microsoft Windows a first-class citizen in our development community by building a tool chain with functioning Fortran compilers and compatibility with system libraries. Yes, we still use a lot of Fortran code and you are welcome to come and try to rewrite it. Many have fallen to this task.
Sustaining Our Future
Now I’ve spent the better part of 3K words to get to the part I wanted to write about. One has to know the history to understand the places we are ready to go, but I think folks only ever see one part of the organization. We have many lives at NumFOCUS these days. We are part conference organizers, part grant administrators, part project governance advisors, and always underfunded. Our staff has grown from a single Executive Director to a team of four.
Our main source of income comes from the PyData conference series. The conference is designed to be cheap enough that students can afford to go but nice enough to fly in good keynote speakers. Without the conference series we would not be able to do the work we do.
We also receive some money from the fiscally sponsored projects. Initially, we had a very low overhead fee as we wanted to give as much to the projects as possible. We still only charge about 10%, which is absurdly low in the nonprofit space, but we found there is more work than we could possibly manage without more funding.
But for the most part, this is just the beginning. We need to flesh out our corporate donations program more and start applying for institutional funding for our own programs.
For now we have some basic income and can start thinking again about better ways to support our community. I would like to see us able to hire folks to help maintain projects better. For example, they could run the infrastructure that is necessary and provide workshops for scientists to sustain their projects.