By Sarah Stevens
It seems that after three years of not attending any off-campus conferences, I’ve gone a little overboard in the last six weeks. You may have read my blog post about attending Posit Conf 2023 a few weeks ago. In October, I attended both the US Research Software Engineer (US-RSE) Association Conference and the Academic Data Science Alliance (ADSA) meeting. While both events focused on data and computing in research, they were rather different.
One question that kept coming up in my mind throughout the US-RSE conference was, “How do groups do code review?” I have long been obsessed with the idea that more research software/code should be reviewed before it is put into production or published. In theory, this sounds simple, but in practice it is much harder. Many research software engineers and data scientists are the sole developer on a project. And even when they work as part of a group, it is difficult for colleagues to share enough context to effectively evaluate each other’s code. Ideally, you would have two or more developers working on a single project, but in the academic setting this is often financially infeasible.
Luckily, the second day of the conference had a “Birds of a Feather” session to discuss code review. It seemed like a lot of the folks who attended had run into similar issues. The panelists presented several models for how they were managing code review in their communities. One was a community of practice in the digital humanities where individuals and teams can submit their work for review. This shared domain of expertise allows researchers to more easily review each other’s work, somewhat like journal submissions.
Another model presented was a service at Princeton University, where researchers can submit their work for a Repo Review by a research software engineer. What I found interesting about this model was that they use a checklist of common things to look for, and they have a time limit for the process. They start with a one-hour consultation meeting, then they spend an hour reviewing the repository of code, and they conclude with a one-hour follow-up session. Ideally, code review is focused on incremental changes to the code, with a small group working on the same project. However, I think the Repo Review checklist could be a great start to implementing code review in a group with many different projects. Some of the checklist processes could even be automated.
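To give a sense of what automating checklist items might look like, here is a minimal sketch. The checks below (README, LICENSE, a tests directory, CI configuration) are hypothetical examples of common repository-health items, not the actual Princeton Repo Review checklist.

```python
from pathlib import Path

# Hypothetical checklist items; the real Repo Review checklist may differ.
CHECKLIST = {
    "has a README": lambda repo: any(repo.glob("README*")),
    "has a LICENSE": lambda repo: any(repo.glob("LICENSE*")),
    "has a tests directory": lambda repo: (repo / "tests").is_dir(),
    "has CI config": lambda repo: (repo / ".github" / "workflows").is_dir(),
}

def repo_review(path):
    """Run each automated check against a repository and report pass/fail."""
    repo = Path(path)
    return {item: check(repo) for item, check in CHECKLIST.items()}

if __name__ == "__main__":
    import tempfile
    # Demo on a throwaway directory containing only a README.
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "README.md").write_text("# Demo project\n")
        for item, passed in repo_review(d).items():
            print(f"{'PASS' if passed else 'FAIL'}: {item}")
```

A script like this could run before the one-hour human review, leaving the reviewer's limited time for the judgment calls that can't be automated.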
This topic of conversation was very active up until the end of the session, and it was moved into the US-RSE Slack workspace for continued discussion. I’m looking forward to seeing what recommendations and resources are compiled by the group.
Academic Data Science Alliance Meeting
I was invited to speak at the ADSA meeting, and the Data Science Institute (DSI) supported my travel through their affiliate program. I presented the Collaborative Lesson Development Training, which I helped create with the Carpentries and which was recently announced as an official Carpentries program. If you are interested in creating new data science/computing lessons for researchers using best practices and an open, collaborative curriculum template, please reach out to me: email@example.com.
The ADSA meeting included a day focused on “Health, Well-being, and the Arts.” I attended a panel describing several art and data science collaborations where the art informed the data science and wasn’t just used to visualize the science. There was also a session where we explored data visualization through our individual movement around a room. I found many of the data science and art fusion examples inspiring and related to the work being done in the Illuminating Discovery Hub at the Wisconsin Institute for Discovery. My favorite exhibit was a robot that played a piano harp, with different pitches resonating directly through its rotating arms.
The keynote presentation by Dr. Dominique Duval-Diop, the Chief Data Scientist of the United States, stood out to me. Dr. Duval-Diop talked about her winding path to becoming a data scientist through her work in economics, geospatial analysis, and policy. She also discussed the loneliness of being the “first” and “only” person (for example, the first Black woman Chief Data Scientist of the US) in various spaces of her career, and how her identity was an important part of the perspective she brought to her work. Dr. Duval-Diop talked about the “superpower” of having both domain knowledge and data science skills. I am biased, as this is where I fall as a researcher and what I try to help other researchers achieve, but I wholeheartedly agree with this sentiment. Dr. Duval-Diop also gave a very interesting (and maybe controversial to the audience) answer to the question, “Do you need a data science master’s degree to be a data scientist?” She answered “no,” sharing that she doesn’t have a data science master’s degree, and that she believes skills and experience are more important than academic degrees in this field. She added that degrees were one way to get experience, but emphasized that it should not be a limiting credential when hiring.
There were many more notable sessions at ADSA, but what I will remember most is the interactivity of the attendees. There were excellent questions and lots of discussions between sessions. I also connected with a group of individuals providing research support and applying data science in unique ways that could inspire future programs at UW–Madison.