Experiential & Online Education

National Data Platform (NDP)

The National Data Platform (NDP) is a federated and extensible data ecosystem that promotes and enables collaboration, innovation, and the equitable use of data atop existing cyberinfrastructure capabilities. Since the inception of the NDP project, our team at the San Diego Supercomputer Center (SDSC) has worked with partners at the University of Utah, the University of Colorado Boulder (CU Boulder), and the EarthScope Consortium to build out a platform that provides its users with capabilities including:

Scalable Detection of Rural Schools in Africa Using Convolutional Neural Networks and Satellite Imagery

Many countries lack sufficient civic data to assess where communities face challenges and what those challenges are. High-resolution satellite images can provide honest assessments of neighborhoods and communities to guide aid workers, policy makers, the private sector, and philanthropists. Although humans are very good at detecting patterns, manually inspecting high-resolution satellite imagery at scale is costly and time consuming. Machine learning has the potential to scale this process significantly and automate the detection of regions of interest.
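
To make the scaling argument concrete, the sketch below shows what a minimal convolutional tile classifier of this kind could look like in PyTorch. It is an illustrative analogue under assumed settings (3-band RGB tiles, 64x64 pixels, binary school / not-school labels), not the model used in this work.

    # Minimal sketch of a convolutional tile classifier (illustrative only;
    # tile size, bands, and labels are assumptions, not this project's setup).
    import torch
    import torch.nn as nn

    class TileClassifier(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.head = nn.Linear(32 * 16 * 16, 2)  # 64x64 input -> 16x16 after two poolings

        def forward(self, x):
            return self.head(self.features(x).flatten(1))

    model = TileClassifier()
    tiles = torch.randn(8, 3, 64, 64)   # a batch of placeholder satellite tiles
    logits = model(tiles)               # per-tile school / not-school scores

In a pipeline like this, the classifier would be applied to a grid of tiles covering a region, with detections aggregated into candidate locations for human review.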

Biomedical Big Data Training Collaborative (BBDTC): An Effort to Bridge the Talent Gap in Biomedical Science and Research

The BBDTC (https://biobigdata.ucsd.edu) is a community-oriented platform that encourages high-quality knowledge dissemination, with the aim of growing a well-informed biomedical big data community through collaborative efforts on training and education. The BBDTC is an e-learning platform that empowers the biomedical community to develop, launch, and share open training materials. It deploys hands-on software training toolboxes through virtualization technologies such as Amazon EC2 and VirtualBox, and it facilitates the migration of courses across other course management platforms.

Workflow-Driven Distributed Machine Learning in CHASE-CI: A Cognitive Hardware and Software Ecosystem Community Infrastructure

Advances in data, computing, and networking over the last two decades have led many application domains to incorporate machine learning on big data into the scientific process, requiring new capabilities for integrated and distributed hardware and software infrastructure. This paper contributes a workflow-driven approach for dynamic, data-driven application development on top of a new kind of networked cyberinfrastructure called CHASE-CI.

Ten Simple Rules for Writing and Sharing Computational Analyses in Jupyter Notebooks

As studies grow in scale and complexity, it has become increasingly difficult to provide clear descriptions and open access to the methods and data needed to understand and reproduce computational research. Numerous papers, including several in the Ten Simple Rules collection, have highlighted the need for robust and reproducible analyses in computational research, described the difficulty of achieving these standards, and enumerated best practices.

Sharing and Archiving Data Science Course Projects to Support Pedagogy for Future Cohorts

Founded in 2018, the Halıcıoğlu Data Science Institute (HDSI) is a significant new organization on the UC San Diego campus. As part of its pedagogical process, HDSI faculty wanted a way to store and share student capstone projects with future cohorts, so that students could easily access reusable raw datasets and analytical workflows and potentially expand on work done by previous cohorts. The UC San Diego Library has been managing an institutional data repository for over a decade, with established ingest workflows and tools.

Modeling Wildfire Behavior at the Continuum of Computing

This talk will review some of our recent work on building dynamic, data-driven cyberinfrastructure and impactful application solution architectures that showcase the integration of a variety of existing technologies and collaborative expertise. The lessons learned from the development of the NSF WIFIRE cyberinfrastructure will be summarized. Open data issues, the use of edge and cloud computing on top of high-speed networks, and reproducibility through containerization and automated workflow provenance will also be discussed in the context of WIFIRE.

End-to-End Workflow-Driven Hydrologic Analysis for Different User Groups in HydroFrame

We present initial progress on the HydroFrame community platform using an automated Kepler workflow that performs end-to-end hydrology simulations involving data ingestion, preprocessing, analysis, modeling, and visualization. We will demonstrate how different modules of the workflow can be reused and repurposed for the three target user groups. Moreover, the Kepler workflow ensures complete reproducibility through a built-in provenance framework that collects workflow-specific parameters, software versions, and the hardware system configuration.
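
As a hedged illustration of the workflow-and-provenance pattern described above (a Python analogue, not actual Kepler actors; stage names and recorded fields are assumptions), a modular pipeline that captures parameters, software versions, and system configuration might look like this:

    # Illustrative provenance-aware pipeline; stages and fields are placeholders.
    import json, platform, sys, datetime

    def ingest(source):        return {"source": source}           # placeholder stage
    def preprocess(data):      return data                         # placeholder stage
    def simulate(data, dt):    return {"status": "ok", "dt": dt}   # placeholder stage

    def run_pipeline(source, dt):
        provenance = {
            "started": datetime.datetime.now().isoformat(),
            "parameters": {"source": source, "dt": dt},            # workflow-specific parameters
            "python_version": sys.version,                         # software version
            "platform": platform.platform(),                       # hardware/system configuration
        }
        result = simulate(preprocess(ingest(source)), dt)
        with open("provenance.json", "w") as f:                    # persist the provenance record
            json.dump(provenance, f, indent=2)
        return result

    run_pipeline("watershed_inputs.csv", dt=1.0)

Keeping each stage as an independent, reusable unit is what allows the same modules to be repurposed for the different user groups.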

A Demonstration of Modularity, Reuse, Reproducibility, Portability and Scalability for Modeling and Simulation of Cardiac Electrophysiology Using Kepler Workflows

Multi-scale computational modeling is a major branch of computational biology as evidenced by the US federal interagency Multi-Scale Modeling Consortium and major international projects. It invariably involves specific and detailed sequences of data analysis and simulation, often with multiple tools and datasets, and the community recognizes improved modularity, reuse, reproducibility, portability and scalability as critical unmet needs in this area. Scientific workflows are a well-recognized strategy for addressing these needs in scientific computing.

Using Dynamic Data Driven Cyberinfrastructure for Next Generation Disaster Intelligence

Wildland fires and related hazards are increasing globally. A common observation across these large events is that fire behavior is becoming more destructive, making applied fire research more important and time critical. Significant improvements in modeling the extent and dynamics of the evolving array of fire-related environmental hazards, and their socio-economic and human impacts, can be made through intelligent integration of modern data and computing technologies with techniques for data management, machine learning, and fire modeling.

Cloud Software for Enabling Community-Oriented Integrated Hydrologic Modeling

In previous work, we provided static domain and parameter datasets for the National Water Model (NWM) and ParFlow (PF-CONUS) on demand, at regional watershed scales. We extend this functionality by connecting existing cloud applications and tools into a virtual ecosystem that supports extraction of domain and parameter datasets, execution of the NWM and PF-CONUS models, and collaboration.
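
The hedged sketch below illustrates how a client might interact with such an ecosystem: request a watershed-scale domain extraction and then launch a model run against it. The endpoint URL, parameter names, and response fields are hypothetical placeholders, not the project's actual interface.

    # Hypothetical client sketch; the URL, parameters, and fields are assumptions.
    import requests

    BASE = "https://example.org/subset-api"   # placeholder service endpoint

    def extract_domain(huc_id, model):
        # Request domain and parameter datasets for a regional watershed.
        resp = requests.post(f"{BASE}/extract", json={"huc": huc_id, "model": model})
        resp.raise_for_status()
        return resp.json()["dataset_url"]

    def launch_run(dataset_url, model):
        # Submit a model execution against the extracted domain.
        resp = requests.post(f"{BASE}/run", json={"dataset": dataset_url, "model": model})
        resp.raise_for_status()
        return resp.json()["job_id"]

    dataset = extract_domain("14010001", model="PF-CONUS")   # example watershed (HUC) identifier
    job_id = launch_run(dataset, model="PF-CONUS")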

Workflows Community Summit: Bringing the Scientific Workflows Community Together

Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure.

Workflows Community Summit: Advancing the State-of-the-Art of Scientific Workflows Management Systems Research and Development

Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms.

Modular Performance Prediction for Scientific Workflows Using Machine Learning

Scientific workflows provide an opportunity for declarative computational experiment design in an intuitive and efficient way. A distributed workflow is typically executed on a variety of resources, and it uses a variety of computational algorithms or tools to achieve the desired outcomes. Such variety imposes additional complexity in scheduling these workflows on large-scale computers. As computation becomes more distributed, insights into the expected workload that a workflow presents become critical for effective resource allocation.
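
As a general illustration of this idea (not the paper's specific features or models), a per-module performance predictor can be trained to map task characteristics, such as input size and core count, to expected runtime:

    # Minimal runtime-prediction sketch; features and training data are synthetic
    # placeholders, not the paper's workloads.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform([1, 1], [1000, 64], size=(200, 2))    # [input size (MB), core count]
    y = 0.5 * X[:, 0] / X[:, 1] + rng.normal(0, 1, 200)   # synthetic runtimes (seconds)

    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    prediction = model.predict([[250.0, 16]])             # expected runtime for a new task
    print(f"predicted runtime: {prediction[0]:.1f} s")

Per-module predictions like this can then be combined across the workflow graph to estimate the overall workload a workflow presents to a scheduler.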

A Community Roadmap for Scientific Workflows Research and Development

The landscape of workflow systems for scientific applications is notoriously convoluted with hundreds of seemingly equivalent workflow systems, many isolated research claims, and a steep learning curve. To address some of these challenges and lay the groundwork for transforming workflows research and development, the WorkflowsRI and ExaWorks projects partnered to bring the international workflows community together. This paper reports on discussions and findings from two virtual “Workflows Community Summits” (January and April, 2021).

Building Cyberinfrastructure for Translational Impact: The WIFIRE Example

This paper overviews the enablers and phases for translational cyberinfrastructure for data-driven applications. In particular, it summarizes the translational process of and the lessons learned from the development of the NSF WIFIRE cyberinfrastructure. WIFIRE is an end-to-end cyberinfrastructure for real-time data fusion and data-driven simulation, prediction, and visualization of wildfire behavior. WIFIRE’s real-time data products and modeling services are routinely accessed by fire research and emergency response communities for modeling as well as the public for situational awareness.

A Science-Enabled Virtual Reality Demonstration to Increase Social Acceptance of Prescribed Burns

Increasing social acceptance of prescribed burns is an important element of ramping up these controlled burns to the scale required to effectively mitigate destructive wildfires through reduction of excessive fire fuel loads. As part of a Design Challenge, students created concept designs for physical or virtual installations that would increase public understanding and acceptance of prescribed burns as an important tool for ending devastating megafires. The proposals defined how the public would interact with the installation and the learning goals for participants.

Enabling AI Innovation via Data and Model Sharing: An Overview of the NSF Convergence Accelerator Track D

This article provides a brief overview of 18 projects funded in Track D—Data and Model Sharing to Enable AI Innovation—of the 2020 Cohort of the National Science Foundation's (NSF) Convergence Accelerator (CA) program. The NSF CA is focused on transitioning research to practice for societal impact. The projects described here were funded for one year in phase I of the program, beginning September 2020. Their focus is on delivering tools, technologies, and techniques to assist in sharing data as well as data-driven models to enable AI innovation.

WIFIRE and NESDIS User Engagement: Leveraging NOAA's Pathfinder Initiative to Develop Future Tools, Products and Services for Wildfire

WIFIRE Lab, from the University of California, San Diego, is among the first Pathfinders supporting the next generation of geostationary observations, GeoXO. NOAA plans for the Geostationary Extended Observations (GeoXO) program to follow the Geostationary Operational Environmental Satellites (GOES)-R Series and Space Weather Follow-On (SWFO) missions in the 2030-2050 timeframe. This presentation will focus on the NOAA Pathfinder WIFIRE Lab, which supported the development of synthetic data and an exercise scenario as part of the pre-development user engagement with GeoXO.

NOAA’s Pathfinder Value Chains

This presentation will describe how the NOAA Pathfinder value chains aim to increase awareness of NOAA missions, products, and services so that NESDIS can deliver maximum value to its users. It will show two value chains (fire and oceans) that demonstrate how Pathfinder value chains are used as a mechanism for incorporating user input into the development of the NOAA satellite lifecycle. This talk will also serve as an opportunity to recruit future NOAA Pathfinders.

HydroFrame

HydroFrame is a community platform that facilitates integrated hydrologic modeling across the United States. We design innovative workflow solutions that create pathways to enable hydrologic analysis for three target uses: modeling, analysis, and domain science. As part of our contribution to HydroFrame, we run HydroFrame workflows in the Kepler system, utilizing its automated workflow capabilities to perform end-to-end hydrology simulations involving data ingestion, preprocessing, analysis, modeling, and visualization.

WIFIRE: Workflows Integrating Collaborative Hazard Sciences

The WIFIRE cyberinfrastructure (CI) provides an integrated system for wildfire analysis, combining satellite and remote-sensor data with computational techniques to monitor weather patterns and predict wildfire spread in real time. The WIFIRE Lab, powered by this CI and housed at the San Diego Supercomputer Center at UC San Diego, was founded in 2013 and is composed of various platforms and efforts, including: