Reproducible Science Aug 17
Please reach out to jon@numfocus.org if you would like to have 20 minutes to discuss a project in a future interest group call!
Attending
- Jonathan Starr (NumFOCUS)
- Program manager for OSSci at NumFOCUS.
- Part of a startup building software to address the reproduction and replication crisis.
- Tim Bonnemann
- Based in San Jose, California, but currently in Germany.
- Community Lead, Open Source Science (OSSci)
- Works at IBM Research.
- Ian Buckley
- Community and partnerships lead at Agnostic, a startup in Toronto, Canada.
- Agnostic is behind Covalent, an open-source workflow orchestration platform for HPC (High-Performance Computing) and quantum computing.
- Covalent is quantum-ready and can be used for any high-performance computing workflow orchestration, including simulations, optimizations, and ML/AI models.
- Mike Croucher
- Works at MathWorks.
- 20 years of prior experience in academia.
- Was part of the research software engineering movement, which focused on computing in research groups.
- Joined MathWorks 3 years ago with an interest in interfacing MathWorks products with open source.
- Travis Wrightsman
- Graduate student at Cornell University in Plant Breeding and Genetics.
- Works on machine learning models to predict plant cell traits from DNA sequences.
- His research tries to decipher how DNA regions near a gene can influence the gene’s expression.
- Aims to utilize findings in an agricultural context to improve crop breeding.
- Chris Erdmann
- Associate Director for Open Science at Michael J. Fox Foundation.
- Has a long-standing history in the Open Science and software community.
- His role involves ensuring reproducibility and quality of research funded by the Michael J. Fox Foundation.
- Working on a strategy for the organization’s approach to open science and reproducibility.
- Alexy
- Director of OSSci.
Major Discussion Points:
Introduction to the Reproducibility Topic:
- Highlighted the significance of reproducibility in the scientific community.
- Discussed the sociotechnical challenges of defining tools and standards for reproducibility.
- Introduced reproducibility as a two-sided market, focusing on the differing needs based on the reproducer’s perspective.
- A request for viewpoints on the significance of reproducibility across various sectors.
Reproducibility in Machine Learning:
- Addressed the obstacles when non-experts try to reproduce machine learning models.
- Underlined the value of code accessibility and its adaptability to diverse applications.
- Noted the distinction between model sharing and genuinely achieving reproducible results.
- Introduced a project, “MLC@Home”, aiming to close the gap between theoretical models and actual outcomes.
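As a minimal illustration of the model-sharing-versus-reproducible-results distinction raised above (this sketch is not from the discussion itself), uncontrolled randomness is one common reason a shared model fails to reproduce; fixing the seed of every random number generator is a typical first step:

```python
import random

def train_step(seed: int) -> list[float]:
    """Toy stand-in for a stochastic training step: with the seed
    fixed, the "model" produces identical numbers on every run."""
    rng = random.Random(seed)  # isolated generator, avoids global state
    return [rng.uniform(-1.0, 1.0) for _ in range(3)]

# Two runs with the same seed reproduce the same result exactly.
assert train_step(42) == train_step(42)
# A different seed gives a different (but still reproducible) run.
assert train_step(42) != train_step(7)
```

Real training pipelines have many more sources of nondeterminism (GPU kernels, data loading order, library versions), but the principle of pinning each one is the same.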
Sharing Models and Best Practices:
- Emphasized the quality of information when disseminating models.
- Pointed out the current inadequate model sharing practices in specific domains.
- Presented the Open Modeling Foundation and its objectives.
- Collaborative efforts with platforms to set standards without resorting to proprietary tools.
- Highlighted the importance of sharing and collaboration, particularly in Parkinson’s research.
Reproducibility’s Broad Impact and Ecosystem:
- Defined reproducibility as ensuring data and code are accessible and usable.
- Discussed the varied forms of reproducibility depending on user needs.
- Stressed feedback from varied stakeholders: funders, researchers, institutions.
- Worked on identifying and collaborating with organizations focusing on reproducibility.
- Explored the intricate relationship between model sharing and reproducibility, especially in the realm of machine learning.
Reproducibility in Industrial Settings & Basic Measures:
- Pointed out the risks of using non-reproducible machine learning models in industries.
- Reflected on foundational steps towards reproducibility, such as sharing code and data, and on the origins and achievements of the research software engineering movement.
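One foundational measure of the kind mentioned above can be sketched briefly (this is an illustrative example, not something presented in the call): recording a checksum for each data file alongside the interpreter version, so a later rerun can verify it starts from the same inputs. The filename here is hypothetical.

```python
import hashlib
import json
import sys

def make_manifest(files: dict[str, bytes]) -> str:
    """Record a SHA-256 checksum per data file plus the Python
    version; a rerun can recompute the hashes to confirm the
    inputs are unchanged. `files` maps filename to raw contents."""
    checksums = {
        name: hashlib.sha256(data).hexdigest()
        for name, data in files.items()
    }
    return json.dumps(
        {"python": sys.version.split()[0], "checksums": checksums},
        indent=2,
        sort_keys=True,
    )

# Hypothetical data file used only for this illustration.
manifest = make_manifest({"measurements.csv": b"a,b\n1,2\n"})
print(manifest)
```

Committing such a manifest next to the code is a low-effort complement to sharing the code and data themselves.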
Metrics and Levels of Reproduction:
- Discussed the scientific community’s progress in ensuring basic reproducibility.
- Proposed the establishment of metrics for reproducibility across varied disciplines.
- Discussed the complexities and categories in determining reproducibility.
Case Study & Open Science Indicators:
- Introduced the “Aligning Science Across Parkinson’s” initiative as a valuable case study.
- Discussed the use of expansive open science indicators to track reproducibility progress.
Complex Model Sharing:
- Explored challenges in sharing intricate models and workflows, highlighting solutions like “Covalent”.
Role of MathWorks & Persistence Concerns:
- Presented MathWorks’ commitment to aiding the research community and inquired about possible improvements.
- Brought up the feature that allows MATLAB code to be opened directly from GitHub.
- Tackled concerns about long-term accessibility and persistence of shared links and resources.
Collaboration, Education, and Cultural Shift:
- Highlighted the value of education and feedback in advancing reproducibility.
- Emphasized the necessity of incorporating reproducibility into the research culture and proposed indicators to signal research without associated software or data.
Resources and Links Shared During Discussion
- MLC@Home
- Industrial AI Symposium at Stanford
- RSECon23 in Wales
- Open Modeling Foundation
- PLOS Blog on Open Science Indicators
- French Open Science Monitor
- ACM REP ’23 Conference
- Nodes-v2 by DeSci Labs
- RAiD (Research Activity Identifier)
- Parkinson’s Roadmap Catalog
- ASAP CRN 2023 San Diego Open Science Training
- Covalent Platform
- HPC Focused Workshop on Containerization
- Open Source Science Community Forum
Action items
- Introduce yourself on Discourse!
- Check the UI mockup of the Map of Open Source Science (MOSS)
- Share links to projects similar to MOSS to avoid redundant efforts.
- Contribute to the Map of Open Source Science by identifying tools used in research
- Join the Google group for future notifications
- Add the Google Calendar
- Post any thoughts or ideas on the Discourse forum.
- Check out the other interest groups