MOSS Progress So Far: Thoughts and Questions

I’ve spent some time mapping and exploring the ins and outs of MOSS and wanted to share my current thoughts and pose some questions to hopefully spur some discussion.

Progress so far:

  • Tim created a fantastic start of the map on Kumu at SciPy.
  • Since then, the map has been steadily taking shape. I’ve added numerous projects, dependencies, contributors, domains, areas of focus, databases, and languages while playing around with potential visualizations and metadata structures. These tests have surfaced numerous questions (some mentioned below). Your input would be gold.
  • Tim is currently attending a conference and is adding new projects as he comes across them. Read his update here.
  • The engagement has been phenomenal. People are very excited to see this map come to fruition.
  • First impressions are that this initiative has enormous potential. However, you should treat the map as it stands as a series of experiments to determine the best metadata and visual representation methods. We will be publishing the latest interactive version of the map in the coming days, so you’ll be able to explore it all yourself soon.

And join us for the upcoming MOSS interest group call on Monday, September 18th to discuss everything in person! Discussions will also cover an updated concept paper and a one-page synopsis of the project. More information, including a shared Google Calendar, can be found here.

Many of the immediate questions I have revolve around the metadata structure of our database and, unfortunately, ontological definitions.

Short Term Questions:

  • Project Scope? Here’s where we currently stand on scope/goals. Everything is very much open to discussion!

    • Make it simpler for researchers to find what tools will help with their immediate needs
    • Make it simpler for researchers to find which projects have the community to help them tweak a tool to their specific needs
    • Make it simpler for supporters of OS to find which tools need funding and which funding will create the greatest impact
    • Identify gaps in the ecosystem – which domains need more tools
    • Avoid abandonware by making it simple to identify points of possible collaboration and integration
    • Map OS contributors so that contributors can get credit and recognition for their work
  • Target Audience? Here’s who we’ve identified thus far:

    • Scientists
    • Software Developers
    • Hiring Committees
    • Funders
    • Scientometrists
    • Software Administrators
  • Mapping Contributors to Open Source:
    Should people be on the map? What are the pros and cons? If people are on the map, should they be mapped by GitHub username, full name, or something else? Should every contributor be scraped from the repo, or should inclusion require active engagement with a project, or something else?

  • Defining “Domain”:
    How granular should we be?

    For example, yt works with volumetric analysis and visualization, which is used in multiple scientific domains. Do we create a “volumetric analysis and visualization” node, a “volumetric analysis” and a “visualization” node, or do we create nodes for all the scientific domains in which yt is a common tool for volumetric analysis and visualization, or some combination?

  • Defining “Project”:
    What granularity do we want for tooling - library, package, module, etc.?

  • Node Hierarchy: Thoughts?

Longer Term Considerations:

  • The possibility of users upvoting/downvoting/flagging nodes.
  • Handling PII and user profiles.
  • Displaying the community health of a project.
  • Prioritizing based on user needs.
  • Distinguishing various types of project relationships.

My Take on Some of these Questions:

  • Scope & Expected Users: I’m particularly looking forward to thoughts here.

  • Listing Contributors:

    • Pros:
      • Recognition can elevate stellar contributors
      • Mapping contributors across an ecosystem can help reduce onboarding friction to open-source in general
    • Cons: Overemphasis on recognition might skew motivations
  • Granularity – Domain, focus, etc. and Project, package, library, etc.:
    The current version of the map includes a couple of experiments with different schemas. Ultimately, however, visualization and representation are contingent on user context. There are myriad ways to visualize the map database, and ideally, in my opinion, the database should be open for anyone to craft their own visualization to share with the world.

    For example, maybe we want to focus on showing what each project is used for, to help people looking for a tool for a specific purpose. For this we would visualize things differently than if we wanted to illustrate which tools each domain tends to prefer.

    Another way to think about it: if I am just looking for a tool, I want to see a map of things I can use right now. If I am looking to contribute to software, I might want to know what’s a package, what’s a library, and so on.

    Let’s also consider the yt example: a “visualization” node would probably be useful in many contexts, but that node would be connected to tools that visualize many different things. A user looking for volumetric visualization tools would have to sort through every other visualizer to find the one they need.

  • Hierarchy:
    The hierarchy I’m leaning toward is: Language → Domain → Project → Paper → Package → Person. However, this is based only on my experiments so far and is contingent on how the other questions are resolved.
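
To make the granularity and hierarchy questions a bit more concrete, here is a rough Python sketch of one option: keep node types from the hierarchy above, and attach free-form tags to each node instead of committing to a single “domain” node. Everything here is an assumption for illustration only. The `Node` class, the `find` helper, the tag names, and the two placeholder projects are hypothetical and do not reflect the actual Kumu schema; only yt comes from the example above.

```python
from dataclasses import dataclass, field

# Node types, ordered as in the hierarchy proposed in this post.
HIERARCHY = ["Language", "Domain", "Project", "Paper", "Package", "Person"]


@dataclass
class Node:
    """Hypothetical map node: a name, a type, and free-form tags."""
    name: str
    kind: str
    tags: set = field(default_factory=set)

    def __post_init__(self):
        # Reject node types that are not part of the proposed hierarchy.
        if self.kind not in HIERARCHY:
            raise ValueError(f"unknown node kind: {self.kind!r}")


def find(nodes, kind, required_tags):
    """Return sorted names of nodes of one kind that carry all required tags."""
    required = set(required_tags)
    return sorted(
        n.name for n in nodes if n.kind == kind and required <= n.tags
    )


# yt is the example from this post; the other two projects are placeholders.
nodes = [
    Node("yt", "Project", {"volumetric", "analysis", "visualization"}),
    Node("example-plotter", "Project", {"visualization"}),
    Node("example-stats", "Project", {"analysis"}),
]

# A broad query returns every visualizer; adding a tag narrows it down.
all_visualizers = find(nodes, "Project", {"visualization"})
volumetric_visualizers = find(nodes, "Project", {"visualization", "volumetric"})
print(all_visualizers)
print(volumetric_visualizers)
```

With tags, a broad “visualization” view and a narrow “volumetric visualization” view can both come from the same data, so the granularity decision shifts from the database to whoever builds a given visualization. Whether that trade-off suits MOSS is exactly the kind of thing I’d like feedback on.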

I’m looking forward to your feedback and thoughts on MOSS as we start to accelerate the development of the map!

And again, join us for the upcoming MOSS interest group call on Monday, September 18th to discuss everything in person. More information, including a shared Google Calendar, can be found here.
