Distributed Computing

Left Ventricle Segmentation and Volume Estimation on Cardiac MRI Using Deep Learning

In the United States, heart disease is the leading cause of death for both men and women, accounting for 610,000 deaths each year. Physicians use Magnetic Resonance Imaging (MRI) to image the heart and non-invasively estimate its structural and functional parameters for cardiovascular diagnosis and disease management. The end-systolic volume (ESV) and end-diastolic volume (EDV) of the left ventricle (LV), together with the ejection fraction (EF), are indicators of heart disease.
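
For context, the ejection fraction is conventionally derived from the two volume measurements; the snippet below is a minimal sketch of that standard arithmetic (function name and example values are illustrative, not taken from the paper).

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction (%) from end-diastolic and end-systolic LV volumes (mL)."""
    stroke_volume = edv_ml - esv_ml           # blood ejected per beat
    return 100.0 * stroke_volume / edv_ml     # fraction of the filled LV that is ejected

# Example: EDV = 120 mL, ESV = 50 mL  ->  EF of roughly 58%
print(f"{ejection_fraction(120.0, 50.0):.1f}%")
```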

Workflow-Driven Distributed Machine Learning in CHASE-CI: A Cognitive Hardware and Software Ecosystem Community Infrastructure

Advances in data, computing, and networking over the last two decades have led to a shift in many application domains, which now include machine learning on big data as part of the scientific process and require new capabilities for integrated and distributed hardware and software infrastructure. This paper contributes a workflow-driven approach for dynamic data-driven application development on top of a new kind of networked cyberinfrastructure called CHASE-CI.

Toward a Methodology and Framework for Workflow-Driven Team Science

Scientific workflows are powerful tools for managing scalable experiments, which are often composed of complex tasks running on distributed resources. Existing cyberinfrastructure provides components that can be utilized within repeatable workflows. However, advances in data and computing continuously change the way scientific workflows are developed and executed, pushing scientific activity to become more data-driven, heterogeneous, and collaborative.

Scaling Deep Learning-Based Analysis of High-Resolution Satellite Imagery with Distributed Processing

High-resolution satellite imagery is a rich source of data applicable to a variety of domains, ranging from demographics and land use to agriculture and hazard assessment. We have developed an end-to-end analysis pipeline that uses deep learning and unsupervised learning to process high-resolution satellite imagery, and we have applied it to various applications in previous work. Because high-resolution satellite imagery is large-volume data, scalability is essential for analyzing data from large geographical areas.
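
A common way to make such analysis scale is to tile a large scene and process the tiles in parallel. The sketch below illustrates that general pattern on a synthetic array using Python's multiprocessing; it is not the paper's actual pipeline, and `analyze_tile` is a hypothetical stand-in for per-tile inference.

```python
# Illustrative tiling pattern only, not the pipeline described above.
import numpy as np
from multiprocessing import Pool

TILE = 512  # tile edge length in pixels

def tiles(scene):
    """Yield TILE x TILE chunks of a large scene."""
    height, width = scene.shape[:2]
    for row in range(0, height, TILE):
        for col in range(0, width, TILE):
            yield scene[row:row + TILE, col:col + TILE]

def analyze_tile(tile):
    """Hypothetical stand-in for per-tile model inference."""
    return float(tile.mean())

if __name__ == "__main__":
    scene = np.random.rand(2048, 2048, 3)    # synthetic "image"
    with Pool() as pool:                     # one worker per CPU core
        results = pool.map(analyze_tile, tiles(scene))
    print(f"analyzed {len(results)} tiles")
```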

PEnBayes: A Multi-Layered Ensemble Approach for Learning Bayesian Network Structure from Big Data

Discovering Bayesian network (BN) structure from big datasets containing rich causal relationships is becoming increasingly valuable for modeling and reasoning under uncertainty in many areas where big data is gathered from sensors at high volume and velocity. Most current BN structure learning algorithms have shortcomings when facing big data. First, learning a BN structure from an entire big dataset is an expensive task that often ends in failure due to memory constraints.

Modeling Wildfire Behavior at the Continuum of Computing

This talk will review some of our recent work on building dynamic data-driven cyberinfrastructure and impactful application solution architectures that showcase the integration of a variety of existing technologies and collaborative expertise. The lessons learned from the development of the NSF WIFIRE cyberinfrastructure will be summarized. Open data issues, the use of edge and cloud computing on top of high-speed networks, and reproducibility through containerization and automated workflow provenance will also be discussed in the context of WIFIRE.

The Evolution of Bits and Bottlenecks in a Scientific Workflow Trying to Keep Up with Technology: Accelerating 4D Image Segmentation Applied to NASA Data

In 2016, a team of earth scientists directly engaged a team of computer scientists to identify cyberinfrastructure (CI) approaches that would speed up an earth science workflow. This paper describes the evolution of that workflow as the two teams bridged CI and an image segmentation algorithm to conduct large-scale earth science research. The Pacific Research Platform (PRP) and the Cognitive Hardware and Software Ecosystem Community Infrastructure (CHASE-CI) resources were used to significantly decrease the earth science workflow's wall-clock time from 19.5 days to 53 minutes.
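
The reported times imply a speedup of roughly 530x; a quick back-of-the-envelope check (illustrative arithmetic only):

```python
# Speedup implied by the reported wall-clock times.
before_minutes = 19.5 * 24 * 60    # 19.5 days expressed in minutes (28,080)
after_minutes = 53
print(f"speedup ~ {before_minutes / after_minutes:.0f}x")   # ~530x
```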

A Demonstration of Modularity, Reuse, Reproducibility, Portability and Scalability for Modeling and Simulation of Cardiac Electrophysiology Using Kepler Workflows

Multi-scale computational modeling is a major branch of computational biology as evidenced by the US federal interagency Multi-Scale Modeling Consortium and major international projects. It invariably involves specific and detailed sequences of data analysis and simulation, often with multiple tools and datasets, and the community recognizes improved modularity, reuse, reproducibility, portability and scalability as critical unmet needs in this area. Scientific workflows are a well-recognized strategy for addressing these needs in scientific computing.

Scalable Workflow-Driven Hydrologic Analysis in HydroFrame

The HydroFrame project is a community platform designed to facilitate integrated hydrologic modeling across the US. As a part of HydroFrame, we seek to design innovative workflow solutions that create pathways to enable hydrologic analysis for three target user groups: the modeler, the analyzer, and the domain science educator. We present the initial progress on the HydroFrame community platform using an automated Kepler workflow. This workflow performs end-to-end hydrology simulations involving data ingestion, preprocessing, analysis, modeling, and visualization.
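
As a rough illustration of such an end-to-end chain, the sketch below strings together the five stages named above. Every function here is a hypothetical, toy placeholder rather than an actual HydroFrame or Kepler component.

```python
# Hypothetical sketch of the stages named above; not actual HydroFrame/Kepler actors.

def ingest(source):
    return {"source": source, "raw": [1.2, 0.8, 2.4, 1.6]}   # stand-in for downloaded forcing data

def preprocess(data):
    data["clean"] = [v for v in data["raw"] if v >= 0]       # e.g., drop invalid readings
    return data

def run_model(data):
    data["runoff"] = [v * 0.4 for v in data["clean"]]        # toy stand-in for a hydrologic simulation
    return data

def analyze(data):
    data["mean_runoff"] = sum(data["runoff"]) / len(data["runoff"])
    return data

def visualize(data):
    print(f"{data['source']}: mean runoff = {data['mean_runoff']:.2f}")

visualize(analyze(run_model(preprocess(ingest("example-watershed")))))
```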

NeuroKube: An Automated and Autoscaling Neuroimaging Reconstruction Framework using Cloud Native Computing and A.I.

The neuroscience domain stands out among the sciences for its dependence on the study and characterization of complex, intertwining structures. Understanding the complexity of the brain has led to widespread advances in the structure of large-scale computing resources and the design of artificially intelligent analysis systems. However, the scale of the problems and data generated continues to grow and outpace the standards and practices of neuroscience.

Workflows Community Summit: Bringing the Scientific Workflows Community Together

Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure.

Workflows Community Summit: Advancing the State-of-the-Art of Scientific Workflows Management Systems Research and Development

Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms.

Modular Performance Prediction for Scientific Workflows Using Machine Learning

Scientific workflows provide an opportunity for declarative computational experiment design in an intuitive and efficient way. A distributed workflow is typically executed on a variety of resources and uses a variety of computational algorithms or tools to achieve the desired outcomes. Such variety imposes additional complexity in scheduling these workflows on large-scale computers. As computation becomes more distributed, insights into the expected workload that a workflow presents become critical for effective resource allocation.
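
One common way to realize modular prediction is to fit a separate regression model per workflow component on its execution history. The sketch below does this with scikit-learn on entirely synthetic data (input size and core count as features, runtime as the target); it illustrates the general idea, not the method of the paper.

```python
# Illustrative per-module runtime prediction on synthetic execution history.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic history for one workflow module: [input size (GB), allocated cores]
X = rng.uniform(low=[1, 1], high=[100, 64], size=(500, 2))
# Toy runtime in seconds: roughly proportional to data per core, plus noise.
y = 5 + 60 * X[:, 0] / X[:, 1] + rng.normal(0, 2, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("R^2 on held-out runs:", round(model.score(X_test, y_test), 3))
```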

Integrated End-to-End Performance Prediction and Diagnosis for Extreme Scientific Workflows (IPPD)

This report details the accomplishments of the ASCR-funded project “Integrated End-to-end Performance Prediction and Diagnosis for Extreme Scientific Workflows” under award numbers FWP-66406 and DE-SC0012630, with a focus on the UC San Diego portion of the work (Award No. DE-SC0012630). We refer to the project as IPPD.

Expanse: Computing without Boundaries - Architecture, Deployment, and Early Operations Experiences of a Supercomputer Designed for the Rapid Evolution in Science and Engineering

We describe the design motivation, architecture, deployment, and early operations of Expanse, a 5-petaflop, heterogeneous HPC system that entered production as an NSF-funded resource in December 2020 and will be operated on behalf of the national community for five years. Expanse will serve a broad range of computational science and engineering through a combination of standard batch-oriented services and by extending the system to the broader CI ecosystem through science gateways, public cloud integration, support for high-throughput computing, and composable systems.

Building Cyberinfrastructure for Translational Impact: The WIFIRE Example

This paper overviews the enablers and phases for translational cyberinfrastructure for data-driven applications. In particular, it summarizes the translational process of and the lessons learned from the development of the NSF WIFIRE cyberinfrastructure. WIFIRE is an end-to-end cyberinfrastructure for real-time data fusion and data-driven simulation, prediction, and visualization of wildfire behavior. WIFIRE’s real-time data products and modeling services are routinely accessed by fire research and emergency response communities for modeling as well as the public for situational awareness.

Autonomous Provenance to Drive Reproducibility in Computational Hydrology

The Kepler-driven provenance framework provides an autonomous provenance collection capability for hydrologic research. The framework scales to capture model parameters, user actions, and hardware specifications, and it facilitates quick retrieval for actionable insights, whether the scientist is handling a small watershed simulation or a large continental-scale problem.
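
A hedged sketch of what a single provenance record of this kind might hold is shown below; the field names and values are hypothetical, not the framework's actual schema.

```python
# Hypothetical provenance record; field names are illustrative only.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    workflow_run_id: str
    model_parameters: dict      # e.g., grid resolution, time step
    user_actions: list          # ordered log of user edits and launches
    hardware: dict              # e.g., node type, core count, memory
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

record = ProvenanceRecord(
    workflow_run_id="run-0001",
    model_parameters={"dx_m": 1000, "dt_h": 1},
    user_actions=["set_domain", "launch_simulation"],
    hardware={"cores": 64, "memory_gb": 256},
)
print(asdict(record))
```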

Towards a Dynamic Composability Approach for Using Heterogeneous Systems in Remote Sensing

Influenced by advances in data and computing, scientific practice increasingly involves machine learning and artificial intelligence-driven methods, which require specialized capabilities at the system, science, and service levels in addition to conventional large-capacity supercomputing approaches. The latest distributed architectures, built around the composability of data-centric applications, have led to the emergence of a new ecosystem for container coordination and integration.

A Science-Enabled Virtual Reality Demonstration to Increase Social Acceptance of Prescribed Burns

Increasing social acceptance of prescribed burns is an important element of ramping up these controlled burns to the scale required to effectively mitigate destructive wildfires through reduction of excessive fire fuel loads. As part of a Design Challenge, students created concept designs for physical or virtual installations that would increase public understanding and acceptance of prescribed burns as an important tool for ending devastating megafires. The proposals defined how the public would interact with the installation and the learning goals for participants.

Machine Learning for Improved Post-Fire Debris Flow Likelihood Prediction

Timely prediction of debris flow probabilities in areas impacted by wildfires is crucial to mitigating public exposure to this hazard during post-fire rainstorms. This paper presents a machine learning approach that amends an existing dataset of post-fire debris flow events with additional features reflecting existing vegetation type and geology, and trains traditional and deep learning methods on a randomly selected subset of the data.
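
For illustration, a likelihood model of this kind can be framed as a supervised classifier over basin-level features. The scikit-learn sketch below uses synthetic data and hypothetical feature names (burn severity, rainfall intensity, encoded vegetation and geology classes); it is a sketch of the general setup, not the trained models from the paper.

```python
# Illustrative debris-flow likelihood classifier on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([
    rng.uniform(0, 1, n),       # burn severity index (hypothetical)
    rng.uniform(0, 60, n),      # peak rainfall intensity, mm/h (hypothetical)
    rng.integers(0, 5, n),      # encoded vegetation type (hypothetical)
    rng.integers(0, 8, n),      # encoded geology unit (hypothetical)
])
# Toy labels: debris flow becomes likely when severe burns meet intense rain.
y = ((X[:, 0] * X[:, 1] + rng.normal(0, 5, n)) > 20).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)
clf = RandomForestClassifier(n_estimators=300, random_state=1).fit(X_tr, y_tr)
print("Held-out accuracy:", round(clf.score(X_te, y_te), 3))
print("P(debris flow) for one basin:", round(clf.predict_proba(X_te[:1])[0, 1], 3))
```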

IPPD

IPPD: Integrated End-to-End Performance Prediction and Diagnosis for Extreme Scientific Workflows

Scientific workflows execute on a loosely connected set of distributed and heterogeneous computational resources. The Integrated End-to-End Performance Prediction and Diagnosis (IPPD) project contributes to a clear understanding of the factors that influence the performance and potential optimization of scientific workflows. IPPD addressed three core issues in order to provide insights that can be used to both explain and optimize workflow execution:

NRP

National Research Platform

The National Research Platform (NRP), formerly known as the Pacific Research Platform (PRP), is a collaborative, multi-institutional effort to create a shared national infrastructure for data-driven research. Backed by the National Science Foundation (NSF) and the Department of Energy (DOE), the NRP provides high-performance computing resources, data storage and management services, and network connectivity capabilities to researchers across various disciplines (e.g., the earth sciences and health sciences).

WIFIRE

WIFIRE: Workflows Integrating Collaborative Hazard Sciences

The WIFIRE CI (cyberinfrastructure) builds an integrated system for wildfire analysis by combining satellite and remote sensor data with computational techniques to monitor weather patterns and predict wildfire spread in real time. The WIFIRE Lab, powered by this CI and housed at the San Diego Supercomputer Center, UCSD, was founded in 2013 and is composed of various platforms and efforts, including: