Notes: Reproducible Science IG Call Aug 17


Please reach out to jon@numfocus.org if you would like to have 20 minutes to discuss a project in a future interest group call!

Attending

  • Jonathan Starr (NumFOCUS)
    • Program manager for OSSci at NumFOCUS.
    • Part of a startup building software to address the reproduction and replication crisis.
  • Tim Bonnemann
    • Based in San Jose, California, but currently in Germany.
    • Community Lead, Open Source Science (OSSci)
    • Works at IBM Research.
  • Ian Buckley
    • Community and partnerships lead at Agnostic, a startup in Toronto, Canada.
    • Agnostic is behind Covalent, an open-source workflow orchestration platform for HPC (High-Performance Computing) and quantum computing.
    • Covalent is quantum-ready and can be used for any high-performance computing workflow orchestration, including simulations, optimizations, and ML/AI models.
  • Mike Croucher
    • Works at MathWorks.
    • Spent 20 years in academia before that.
    • Was part of the research software engineering movement, which focused on improving computing practices in research groups.
    • Joined MathWorks 3 years ago with an interest in interfacing MathWorks with open source.
  • Travis Wrightsman
    • Graduate student at Cornell University in Plant Breeding and Genetics.
    • Works on machine learning models to predict plant cell traits from DNA sequences.
    • His research tries to decipher how DNA regions near a gene can influence the gene’s expression.
    • Aims to apply these findings in an agricultural context to improve crop breeding.
  • Chris Erdmann
    • Associate Director for Open Science at Michael J. Fox Foundation.
    • Has a long-standing history in the Open Science and software community.
    • His role involves ensuring reproducibility and quality of research funded by the Michael J. Fox Foundation.
    • Working on a strategy for the organization’s approach to open science and reproducibility.
  • Alexy
    • Director of OSSci.

Major Discussion Points:

Introduction to the Reproducibility Topic:

  • Highlighted the significance of reproducibility in the scientific community.
  • Discussed the sociotechnical challenges tied to defining tools and standards for reproducibility.
  • Introduced reproducibility as a two-sided market, focusing on the differing needs based on the reproducer’s perspective.
  • Requested viewpoints on the significance of reproducibility across various sectors.

Reproducibility in Machine Learning:

  • Addressed the obstacles when non-experts try to reproduce machine learning models.
  • Underlined the value of code accessibility and its adaptability to diverse applications.
  • Noted the distinction between model sharing and genuinely achieving reproducible results.
  • Introduced a project, “MLC@Home”, aiming to close the gap between theoretical models and actual outcomes.

Sharing Models and Best Practices:

  • Emphasized the importance of information quality when disseminating models.
  • Pointed out the current inadequate model sharing practices in specific domains.
  • Presented the Open Modeling Foundation and its objectives.
  • Described collaborative efforts with platforms to set standards without resorting to proprietary tools.
  • Highlighted the importance of sharing and collaboration, particularly in Parkinson’s research.

Reproducibility’s Broad Impact and Ecosystem:

  • Defined reproducibility as ensuring data and code are accessible and usable.
  • Discussed the varied forms of reproducibility depending on user needs.
  • Stressed the need for feedback from varied stakeholders: funders, researchers, and institutions.
  • Discussed identifying and collaborating with organizations focused on reproducibility.
  • Explored the intricate relationship between model sharing and reproducibility, especially in the realm of machine learning.

Reproducibility in Industrial Settings & Basic Measures:

  • Pointed out the risks of using non-reproducible machine learning models in industries.
  • Reflected on foundational steps towards reproducibility, like sharing code and data, and on the origin and achievements of the research software engineering movement.

Metrics and Levels of Reproduction:

  • Discussed the scientific community’s progress in ensuring basic reproducibility.
  • Proposed the establishment of metrics for reproducibility across varied disciplines.
  • Discussed the complexities and categories in determining reproducibility.

Case Study & Open Science Indicators:

  • Introduced the “Aligning Science Across Parkinson’s” initiative as a valuable case study.
  • Discussed the use of expansive open science indicators to track reproducibility progress.

Complex Model Sharing:

  • Explored challenges in sharing intricate models and workflows, highlighting solutions like “Covalent”.

Role of MathWorks & Persistence Concerns:

  • Presented MathWorks’ commitment to aiding the research community and inquired about possible improvements.
  • Brought up the feature that allows MATLAB code to be opened directly from GitHub.
  • Tackled concerns about long-term accessibility and persistence of shared links and resources.

Collaboration, Education, and Cultural Shift:

  • Highlighted the value of education and feedback in advancing reproducibility.
  • Emphasized the necessity of incorporating reproducibility into the research culture and proposed indicators to signal research without associated software or data.

Resources and Links Shared During Discussion

Action items


Enjoyed being part of the call and wish I could have stayed for the whole thing!

The part on complex model sharing really resonates with me as a machine learning practitioner, because I commonly find myself spending a lot of time moving back and forth between flexible code I can iterate on rapidly and scalable code that can test lots of models quickly. Covalent looks like a really cool solution to this; I’ll have to try it out.
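
For context, here’s roughly the pattern I have in mind, sketched against Covalent’s decorator API as I understand it from the docs (I haven’t run this yet, so treat it as a sketch rather than a working example; the function bodies are placeholders):

```python
import covalent as ct

# Each electron stays a plain Python function I can iterate on locally.
@ct.electron
def load_data(path):
    # placeholder: load and preprocess a dataset
    return [1.0, 2.0, 3.0]

@ct.electron
def train_model(data, learning_rate):
    # placeholder: fit a model; here just a dummy dict standing in for one
    return {"lr": learning_rate, "n": len(data)}

@ct.electron
def evaluate(model, data):
    # placeholder: compute some metric for the trained model
    return model["lr"] * len(data)

# The lattice stitches the electrons into a workflow that can be dispatched
# to an HPC or cloud backend without rewriting the functions above.
@ct.lattice
def sweep(path, learning_rate):
    data = load_data(path)
    model = train_model(data, learning_rate)
    return evaluate(model, data)

# Dispatch a few hyperparameter settings; each run gets its own dispatch ID
# that Covalent tracks (requires the Covalent server to be running).
for lr in (1e-2, 1e-3, 1e-4):
    print(ct.dispatch(sweep)("data.csv", lr))
```

The appeal to me is that the flexible, iterate-quickly version and the scaled-out version are the same functions, just with decorators on top.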

In terms of reproducibility, it would be nice as a researcher to have some sort of score attached to a paper that publishes a model I’d like to try out, evaluating how easy it will be for me to modify it for my problem. Maybe something like an Altmetric, but for model accessibility. Maybe the Open Modeling Foundation already has guidelines for this?
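
To make that concrete, here’s the sort of toy rubric I’m picturing (the criteria and weights are entirely made up for illustration; a real standard would presumably come from somewhere like the Open Modeling Foundation):

```python
# Hypothetical "model accessibility" rubric for a paper. Every criterion
# and weight below is invented for illustration only.
CRITERIA = {
    "code_public": 0.25,             # code repository is openly available
    "data_public": 0.20,             # training/evaluation data are shared
    "environment_pinned": 0.20,      # container image, lockfile, etc.
    "reuse_docs": 0.20,              # instructions for adapting the model
    "license_permits_reuse": 0.15,   # license allows modification and reuse
}

def accessibility_score(paper: dict) -> float:
    """Weighted fraction of reuse criteria the paper satisfies (0 to 1)."""
    return sum(weight for key, weight in CRITERIA.items() if paper.get(key))

example = {
    "code_public": True,
    "data_public": False,
    "environment_pinned": True,
    "reuse_docs": True,
    "license_permits_reuse": True,
}
print(f"Accessibility score: {accessibility_score(example):.2f}")  # 0.80
```

A single number like this obviously flattens a lot, which is why per-criterion breakdowns would probably be more useful than the score alone.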


That’s a great question! @christopher.c.erdman might know more on whether OMF has existing guidelines.

I’d love to see multiple scores developed that use a variety of metrics, each satisfying a viewer’s needs in different contexts. I think a lot of the trouble with reproducibility, and with “scoring” science in general, is that reality is so much more complex than a single score. And as was brought up during the call (sad you had to drop out early!), reproducibility means different things to different “users” of science based on, among other things, the context behind their interaction with the research or knowledge. There has to be variety and choice, which would mean developing standards and tools that “score” or “impact” developers can use to experiment with.

And for Covalent, I think Ian or someone else from the team might give a presentation on it in a couple of calls, so keep an eye out!

RE: OMF, best practices/guidelines are in development. But speaking to the score comment, we showed the radar chart that is part of our DataSeer reports regarding the outputs in a paper. It is a visual at least, but otherwise we can look at indicators.
