Sharing and Archiving Data Science Course Projects to Support Pedagogy for Future Cohorts
Founded in 2018, the Halıcıoğlu Data Science Institute (HDSI) is a significant new organization on the UC San Diego campus. To support their teaching, HDSI faculty wanted a way to store student capstone projects and share them with future cohorts, so that students could easily access reusable raw datasets and analytical workflows and build on the work of previous cohorts. The UC San Diego Library has managed an institutional data repository for over a decade, with established ingest workflows and tools. The Library's Digital Collections accommodates collections of objects, complex organization of data and metadata, batch ingest options, and deposit of large datasets. Curators review data submissions and work with data depositors to enrich collections with descriptive metadata, controlled vocabularies, and persistent identifiers. HDSI faculty are working with the Library's Data Science Librarian and members of the Library's Research Data Curation program to design a custom workflow that ingests an entire class's worth of projects within a relatively short time frame and lets students prepare their materials with minimal one-on-one instruction from Library staff. The first test of this workflow ingests project materials from the Data Science & Engineering Master of Advanced Study capstone course into the Library's digital repository and makes the materials available using community-standard discoverability tools. This is the first ingestion of data- and computationally intensive student projects into the repository. It is intended to provide a template for a scalable workflow that can accommodate other courses, ultimately creating a series of course data collections to support teaching and learning.