Composable Systems

Left Ventricle Segmentation and Volume Estimation on Cardiac MRI Using Deep Learning

In the United States, heart disease is the leading cause of death for both men and women, accounting for 610,000 deaths each year. Physicians use Magnetic Resonance Imaging (MRI) to image the heart and non-invasively estimate its structural and functional parameters for cardiovascular diagnosis and disease management. The end-systolic volume (ESV) and end-diastolic volume (EDV) of the left ventricle (LV), and the ejection fraction (EF) derived from them, are key indicators of heart disease.
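
These three quantities are linked by a standard relation: once the LV volumes have been segmented and estimated at end-diastole and end-systole, the ejection fraction follows directly.

```latex
% Stroke volume and ejection fraction from the two LV volume estimates
% (standard cardiology definitions)
\mathrm{SV} = \mathrm{EDV} - \mathrm{ESV}, \qquad
\mathrm{EF} = \frac{\mathrm{EDV} - \mathrm{ESV}}{\mathrm{EDV}} \times 100\%
```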

Land Cover Classification at the Wildland Urban Interface Using High-Resolution Satellite Imagery and Deep Learning

Land cover classification from satellite imagery is important for monitoring change in ecosystems and urban growth over time. However, the land cover classifications widely available in the United States are generated at low spatial and temporal resolution, making the spatial distribution of vegetation and urban areas in the wildland urban interface difficult to measure. High spatial and temporal resolution analysis is essential for understanding and managing changing environments in these regions.

Workflow-Driven Distributed Machine Learning in CHASE-CI: A Cognitive Hardware and Software Ecosystem Community Infrastructure

Advances in data, computing, and networking over the last two decades have led to a shift in many application domains that incorporates machine learning on big data into the scientific process, requiring new capabilities for integrated and distributed hardware and software infrastructure. This paper contributes a workflow-driven approach for dynamic data-driven application development on top of a new kind of networked cyberinfrastructure called CHASE-CI.

Understanding a Rapidly Expanding Refugee Camp Using Convolutional Neural Networks and Satellite Imagery

In the summer of 2017, close to one million Rohingya, a Muslim ethnic minority group in Myanmar, fled to Bangladesh to escape persecution. This large influx of refugees settled around existing refugee camps. Because of this dramatic expansion, the newly established Kutupalong-Balukhali expansion site lacked basic infrastructure and public services.

Toward a Methodology and Framework for Workflow-Driven Team Science

Scientific workflows are powerful tools for the management of scalable experiments, often composed of complex tasks running on distributed resources. Existing cyberinfrastructure provides components that can be utilized within repeatable workflows. However, advances in data and computing continuously change the way scientific workflows are developed and executed, pushing scientific activity to become more data-driven, heterogeneous, and collaborative.

Scaling Deep Learning-Based Analysis of High-Resolution Satellite Imagery with Distributed Processing

High-resolution satellite imagery is a rich source of data applicable to a variety of domains, ranging from demographics and land use to agriculture and hazard assessment. We have developed an end-to-end analysis pipeline that uses deep learning and unsupervised learning to process high-resolution satellite imagery, and we have applied it to various applications in previous work. Because high-resolution satellite imagery is large-volume data, scalability is essential for analyzing data from large geographical areas.
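
As a rough sketch of the scaling pattern this describes (not the pipeline's actual implementation), the snippet below tiles a large scene and processes the tiles in parallel with Python's multiprocessing; `classify_tile` is a hypothetical stand-in for the deep learning inference step.

```python
# Hypothetical sketch: tile a large satellite scene and classify tiles in parallel.
# `classify_tile` is a placeholder for real deep learning inference.
from multiprocessing import Pool
import numpy as np

TILE = 256  # tile edge length in pixels

def classify_tile(args):
    (row, col), tile = args
    # Placeholder "model": fraction of bright pixels as a fake land-cover score.
    return (row, col), float((tile > 128).mean())

def iter_tiles(image):
    h, w = image.shape[:2]
    for r in range(0, h - TILE + 1, TILE):
        for c in range(0, w - TILE + 1, TILE):
            yield (r, c), image[r:r + TILE, c:c + TILE]

if __name__ == "__main__":
    image = np.random.randint(0, 256, (1024, 1024), dtype=np.uint8)  # stand-in scene
    with Pool() as pool:
        results = dict(pool.map(classify_tile, iter_tiles(image)))
    print(f"processed {len(results)} tiles")
```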

PEnBayes: A Multi-Layered Ensemble Approach for Learning Bayesian Network Structure from Big Data

Discovering Bayesian network (BN) structure from big datasets containing rich causal relationships is becoming increasingly valuable for modeling and reasoning under uncertainty in many areas where big data are gathered from sensors at high volume and velocity. Most current BN structure learning algorithms have shortcomings when facing big data. First, learning a BN structure from an entire big dataset is an expensive task that often ends in failure due to memory constraints.
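
One way to picture the layered-ensemble idea, under the assumption of a simple partition-and-vote scheme (an illustration, not the PEnBayes algorithm itself): learn a structure on each data chunk, then keep the edges that enough chunks agree on. `learn_local_structure` below is a hypothetical stub standing in for a real structure learner.

```python
# Hedged sketch of the partition-then-ensemble idea (not PEnBayes itself):
# learn a structure per data chunk, keep edges that a majority of chunks agree on.
from collections import Counter

def learn_local_structure(chunk):
    # Stub: a real learner (e.g., score-based search) would return directed edges
    # discovered in this chunk. Here every chunk yields a fixed toy structure.
    return {("rain", "wet_grass"), ("sprinkler", "wet_grass")}

def ensemble_structure(chunks, threshold=0.5):
    votes = Counter()
    for chunk in chunks:
        votes.update(learn_local_structure(chunk))
    cutoff = threshold * len(chunks)
    return {edge for edge, n in votes.items() if n >= cutoff}

print(ensemble_structure(chunks=[None] * 5))  # five toy data partitions
```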

Modeling Wildfire Behavior at the Continuum of Computing

This talk will review some of our recent work on building dynamic data-driven cyberinfrastructure and impactful application solution architectures that showcase the integration of a variety of existing technologies and collaborative expertise. The lessons learned from the development of the NSF WIFIRE cyberinfrastructure will be summarized. Open data issues, the use of edge and cloud computing on top of high-speed networks, and reproducibility through containerization and automated workflow provenance will also be discussed in the context of WIFIRE.

The Evolution of Bits and Bottlenecks in a Scientific Workflow Trying to Keep Up with Technology: Accelerating 4D Image Segmentation Applied to NASA Data

In 2016, a team of earth scientists directly engaged a team of computer scientists to identify cyberinfrastructure (CI) approaches that would speed up an earth science workflow. This paper describes the evolution of that workflow as the two teams bridged CI and an image segmentation algorithm to do large-scale earth science research. The Pacific Research Platform (PRP) and the Cognitive Hardware and Software Ecosystem Community Infrastructure (CHASE-CI) resources were used to decrease the earth science workflow's wall-clock time from 19.5 days to 53 minutes, a speedup of roughly 530x.

Enabling FAIR Research in Earth Science Through Research Objects

Data-intensive science communities are progressively adopting FAIR practices that enhance the visibility of scientific breakthroughs and enable reuse. At the core of this movement, research objects contain and describe scientific information and resources in a way compliant with the FAIR principles and sustain the development of key infrastructure and tools. This paper provides an account of the challenges, experiences and solutions involved in the adoption of FAIR around research objects over several Earth Science disciplines.

A Demonstration of Modularity, Reuse, Reproducibility, Portability and Scalability for Modeling and Simulation of Cardiac Electrophysiology Using Kepler Workflows

Multi-scale computational modeling is a major branch of computational biology as evidenced by the US federal interagency Multi-Scale Modeling Consortium and major international projects. It invariably involves specific and detailed sequences of data analysis and simulation, often with multiple tools and datasets, and the community recognizes improved modularity, reuse, reproducibility, portability and scalability as critical unmet needs in this area. Scientific workflows are a well-recognized strategy for addressing these needs in scientific computing.

Assessing the Rohingya Displacement Crisis Using Satellite Data and Convolutional Neural Networks

Using machine learning, we quantified an increase in built-up area from 0.4 km² in January 2016 to 9.5 km² in February 2018, replacing primarily shrub and farmland. We are further able to detect a densification, and consequently a 'browning', of the refugee camp over time and to display its heterogeneous structure. The developed method is scalable and applicable to rapidly expanding settlements across various regions.

Using Dynamic Data Driven Cyberinfrastructure for Next Generation Disaster Intelligence

Wildland fires and related hazards are increasing globally. A common observation across these large events is that fire behavior is changing to become more destructive, making applied fire research more important and time-critical. Significant improvements in modeling the extent and dynamics of the evolving plethora of fire-related environmental hazards, and their socio-economic and human impacts, can be made through intelligent integration of modern data and computing technologies with techniques for data management, machine learning, and fire modeling.

Scalable Workflow-Driven Hydrologic Analysis in HydroFrame

The HydroFrame project is a community platform designed to facilitate integrated hydrologic modeling across the US. As a part of HydroFrame, we seek to design innovative workflow solutions that create pathways to enable hydrologic analysis for three target user groups: the modeler, the analyzer, and the domain science educator. We present the initial progress on the HydroFrame community platform using an automated Kepler workflow. This workflow performs end-to-end hydrology simulations involving data ingestion, preprocessing, analysis, modeling, and visualization.
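
To make the shape of such an end-to-end workflow concrete, here is a minimal, hypothetical pipeline skeleton (plain Python, not the actual Kepler workflow): each stage is a function, and a driver chains data ingestion through visualization.

```python
# Illustrative end-to-end pipeline skeleton (hypothetical stages, toy data):
# ingestion -> preprocessing -> modeling -> analysis -> visualization.
def ingest():
    return {"precip": [1.0, 0.0, 2.5]}            # stand-in forcing data

def preprocess(d):
    d["precip"] = [max(p, 0.0) for p in d["precip"]]  # clip bad readings
    return d

def model(d):
    d["runoff"] = [0.3 * p for p in d["precip"]]  # toy rainfall-runoff relation
    return d

def analyze(d):
    d["total_runoff"] = sum(d["runoff"])
    return d

def visualize(d):
    print("total runoff:", d["total_runoff"])
    return d

data = ingest()
for stage in (preprocess, model, analyze, visualize):
    data = stage(data)
```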

NeuroKube: An Automated and Autoscaling Neuroimaging Reconstruction Framework using Cloud Native Computing and A.I.

The neuroscience domain stands out among the sciences for its dependence on the study and characterization of complex, intertwining structures. Understanding the complexity of the brain has led to widespread advances in the structure of large-scale computing resources and the design of artificially intelligent analysis systems. However, the scale of the problems and of the data generated continues to grow and outpace the standards and practices of neuroscience.

Automated Early Detection of Wildfire Smoke Using Deep Learning with Combined Spatial-Temporal Information

We propose incorporating both spatial and temporal information via a combined CNN-LSTM classification model. We theorize that the inclusion of temporal information may reduce the number of false positives and improve generalizability to new environments. The model is trained and tested on images of landscapes with and without smoke from the HPWREN tower network in southern California, part of the SAGE remote-sensing infrastructure. We use traditional CNN-based classifiers leveraged in past smoke detection literature as baselines to evaluate our model's performance.
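
A minimal PyTorch sketch of the general combined CNN-LSTM pattern follows (illustrative only; the paper's actual architecture and HPWREN data pipeline differ): a small CNN encodes each frame, an LSTM aggregates the per-frame features over time, and a linear head produces a smoke/no-smoke logit.

```python
# Minimal CNN-LSTM for sequence classification (illustrative, not the paper's model).
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(                   # per-frame feature extractor
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (batch*time, 32)
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True)  # temporal aggregation
        self.head = nn.Linear(hidden, 1)            # smoke / no-smoke logit

    def forward(self, frames):                      # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).reshape(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])                # logit from the last time step

logits = CNNLSTM()(torch.randn(2, 5, 3, 64, 64))    # 2 clips of 5 frames each
print(logits.shape)                                 # torch.Size([2, 1])
```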

Workflows Community Summit: Bringing the Scientific Workflows Community Together

Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure.

Workflows Community Summit: Advancing the State-of-the-Art of Scientific Workflows Management Systems Research and Development

Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms.

TemPredict: A Big Data Analytical Platform for Scalable Exploration and Monitoring of Personalized Multimodal Data for COVID-19

A key takeaway from the COVID-19 crisis is the need for scalable methods and systems for ingestion of big data related to the disease, such as models of the virus, health surveys, and social data, and the ability to integrate and analyze the ingested data rapidly. One specific example is the use of the Internet of Things and wearables (i.e., the Oura ring) to collect large-scale individualized data (e.g., temperature and heart rate) continuously and to create personalized baselines for detection of disease symptoms.
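
As a toy illustration of the personalized-baseline idea (made-up data; not TemPredict's actual method), the sketch below flags days whose nightly temperature deviates sharply from an individual's trailing rolling baseline.

```python
# Illustrative personalized-baseline anomaly flagging (synthetic data):
# z-score each day's temperature against the individual's trailing baseline.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
temp = pd.Series(36.6 + 0.1 * rng.standard_normal(60))  # 60 nightly readings (deg C)
temp.iloc[45:50] += 0.8                                  # simulated febrile episode

baseline = temp.rolling(21, min_periods=7).mean().shift(1)  # trailing mean, no leakage
spread = temp.rolling(21, min_periods=7).std().shift(1)
z = (temp - baseline) / spread
print(temp[z > 3].index.tolist())  # days flagged as anomalous
```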

Modular Performance Prediction for Scientific Workflows Using Machine Learning

Scientific workflows provide an opportunity for declarative computational experiment design in an intuitive and efficient way. A distributed workflow is typically executed on a variety of resources, and it uses a variety of computational algorithms or tools to achieve the desired outcomes. Such variety imposes additional complexity in scheduling these workflows on large-scale computers. As computation becomes more distributed, insights into the expected workload that a workflow presents become critical for effective resource allocation.
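
A hedged sketch of the per-module prediction idea (the feature set and data here are synthetic assumptions, not the project's actual model): train a regressor that maps a module's task features to its runtime.

```python
# Hypothetical sketch: predict a workflow module's runtime from task features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([
    rng.uniform(1, 100, n),   # input size (GB)        -- assumed feature
    rng.integers(1, 64, n),   # allocated cores        -- assumed feature
    rng.uniform(0, 1, n),     # data locality score    -- assumed feature
])
# Synthetic runtime: scales with size/cores, penalized by poor locality.
y = 5 + 2.0 * X[:, 0] / X[:, 1] + 10 * (1 - X[:, 2]) + rng.normal(0, 1, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"R^2 on held-out tasks: {model.score(X_te, y_te):.2f}")
```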

Integrated End-to-End Performance Prediction and Diagnosis for Extreme Scientific Workflows (IPPD)

This report details the accomplishments of the ASCR-funded project “Integrated End-to-End Performance Prediction and Diagnosis for Extreme Scientific Workflows” under award numbers FWP-66406 and DE-SC0012630, with a focus on the UC San Diego portion (Award No. DE-SC0012630). We refer to the project as IPPD.

Expanse: Computing without Boundaries - Architecture, Deployment, and Early Operations Experiences of a Supercomputer Designed for the Rapid Evolution in Science and Engineering

We describe the design motivation, architecture, deployment, and early operations of Expanse, a 5-petaflop, heterogeneous HPC system that entered production as an NSF-funded resource in December 2020 and will be operated on behalf of the national community for five years. Expanse will serve a broad range of computational science and engineering through a combination of standard batch-oriented services and by extending the system to the broader CI ecosystem through science gateways, public cloud integration, support for high-throughput computing, and composable systems.

Building Cyberinfrastructure for Translational Impact: The WIFIRE Example

This paper overviews the enablers and phases for translational cyberinfrastructure for data-driven applications. In particular, it summarizes the translational process of and the lessons learned from the development of the NSF WIFIRE cyberinfrastructure. WIFIRE is an end-to-end cyberinfrastructure for real-time data fusion and data-driven simulation, prediction, and visualization of wildfire behavior. WIFIRE’s real-time data products and modeling services are routinely accessed by fire research and emergency response communities for modeling as well as the public for situational awareness.

Autonomous Provenance to Drive Reproducibility in Computational Hydrology

The Kepler-driven provenance framework provides an autonomous provenance collection capability for hydrologic research. The framework scales to capture model parameters, user actions, and hardware specifications, and it facilitates quick retrieval for actionable insights, whether the scientist is handling a small watershed simulation or a large continental-scale problem.
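
The flavor of autonomous provenance capture can be sketched as a decorator that records parameters, timing, and host details alongside each model run (illustrative only; this is not the Kepler framework's implementation, and the function names are hypothetical).

```python
# Illustrative provenance capture: record which function ran, with which
# parameters, on which host, and how long it took, as one JSON line per run.
import functools, json, platform, time

def capture_provenance(log_path="provenance.jsonl"):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            result = fn(*args, **kwargs)
            record = {
                "function": fn.__name__,
                "parameters": {"args": repr(args), "kwargs": repr(kwargs)},
                "hostname": platform.node(),
                "machine": platform.machine(),
                "duration_s": round(time.time() - start, 3),
            }
            with open(log_path, "a") as f:
                f.write(json.dumps(record) + "\n")
            return result
        return wrapper
    return decorator

@capture_provenance()
def run_watershed_simulation(cell_size_m=1000, timesteps=24):  # hypothetical model
    time.sleep(0.1)  # stand-in for the actual hydrologic computation
    return "ok"

run_watershed_simulation(cell_size_m=500)
```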

Towards a Dynamic Composability Approach for Using Heterogeneous Systems in Remote Sensing

Influenced by advances in data and computing, scientific practice increasingly involves machine learning and artificial intelligence driven methods, which require specialized capabilities at the system, science, and service levels in addition to conventional large-capacity supercomputing approaches. The latest distributed architectures, built around the composability of data-centric applications, have led to the emergence of a new ecosystem for container coordination and integration.

Smart Connected Worker Edge Platform for Smart Manufacturing: Part 2—Implementation and On-Site Deployment Case Study

In this paper, we describe specific deployments of the Smart Connected Worker (SCW) Edge Platform for Smart Manufacturing through the implementation of four instructive real-world use cases. These use cases illustrate the role of people in a Smart Manufacturing paradigm in which affordable, scalable, accessible, and portable (ASAP) information technology (IT) acquires and contextualizes data into information for transmission to operational technology (OT).

Smart Connected Worker Edge Platform for Smart Manufacturing: Part 1—Architecture and Platform Design

The challenge of sustainably producing goods and services for healthy living on a healthy planet requires simultaneous consideration of the economic, societal, and environmental dimensions of manufacturing. Enabling technologies for data-driven manufacturing paradigms like Smart Manufacturing (a.k.a. Industry 4.0) serve as the technological backbone from which sustainable approaches to manufacturing can be implemented.

A Science-Enabled Virtual Reality Demonstration to Increase Social Acceptance of Prescribed Burns

Increasing social acceptance of prescribed burns is an important element of ramping up these controlled burns to the scale required to effectively mitigate destructive wildfires through reduction of excessive fire fuel loads. As part of a Design Challenge, students created concept designs for physical or virtual installations that would increase public understanding and acceptance of prescribed burns as an important tool for ending devastating megafires. The proposals defined how the public would interact with the installation and the learning goals for participants.

Responding to Emerging Wildfires through Integration of NOAA Satellites with Real-Time Ground Intelligence

This presentation discusses the process of delivering fire behavior forecasts on initial attack using earliest detections of fire from geostationary satellite data. The current GOES 16 and 17 satellites deliver rapid detections and the future GeoXO will increase the speed and accuracy of the earliest alerts. GeoXO will also deliver important information such as the radiative power of the fire detected, providing insight into the fire intensity.

Multimodal Wildland Fire Smoke Detection

Research has shown that climate change creates warmer temperatures and drier conditions, leading to longer wildfire seasons and increased wildfire risks in the United States. These factors have in turn led to increases in the frequency, extent, and severity of wildfires in recent years. Given the danger posed by wildland fires to people, property, wildlife, and the environment, there is an urgency to provide tools for effective wildfire management. Early detection of wildfires is essential to minimizing potentially catastrophic destruction.

Metrics from Wearable Devices as Candidate Predictors of Antibody Response Following Vaccination against COVID-19: Data from the Second TemPredict Study

There is significant variability in neutralizing antibody responses (which correlate with immune protection) after COVID-19 vaccination, but only limited information is available about predictors of these responses. We investigated whether device-generated summaries of physiological metrics collected by a wearable device correlated with post-vaccination levels of antibodies to the SARS-CoV-2 receptor-binding domain (RBD), the target of neutralizing antibodies generated by existing COVID-19 vaccines.

Machine Learning for Improved Post-Fire Debris Flow Likelihood Prediction

Timely prediction of debris flow probabilities in areas impacted by wildfires is crucial to mitigating public exposure to this hazard during post-fire rainstorms. This paper presents a machine learning approach that amends an existing dataset of post-fire debris flow events with additional features reflecting existing vegetation type and geology, and trains traditional and deep learning methods on a randomly selected subset of the data.
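
A hedged sketch of this kind of classification setup follows; the features and data are synthetic assumptions for illustration, not the paper's amended dataset or chosen models.

```python
# Illustrative debris-flow classification (synthetic features and labels):
# predict occurrence from storm and terrain attributes.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 400
X = np.column_stack([
    rng.uniform(0, 60, n),   # peak rainfall intensity (mm/h)      -- assumed feature
    rng.uniform(0, 1, n),    # fraction burned at high severity    -- assumed feature
    rng.uniform(5, 45, n),   # mean basin slope (degrees)          -- assumed feature
])
# Synthetic label: wetter, more severely burned, steeper basins fail more often.
p = 1 / (1 + np.exp(-(0.05 * X[:, 0] + 3 * X[:, 1] + 0.05 * X[:, 2] - 4)))
y = rng.random(n) < p

clf = GradientBoostingClassifier(random_state=0)
print("CV ROC-AUC:", cross_val_score(clf, X, y, scoring="roc_auc").mean().round(2))
```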

FIgLib & SmokeyNet: Dataset and Deep Learning Model for Real-Time Wildland Fire Smoke Detection

The size and frequency of wildland fires in the western United States have dramatically increased in recent years. On high-fire-risk days, a small fire ignition can rapidly grow and become out of control. Early detection of fire ignitions from initial smoke can assist the response to such fires before they become difficult to manage. Past deep learning approaches for wildfire smoke detection have suffered from small or unreliable datasets that make it difficult to extrapolate performance to real-world scenarios.

Estimation of Wildfire Wind Conditions via Perimeter and Surface Area Optimization

This paper shows that the prediction capability of wildfire progression can be improved by estimating a single prevailing wind vector, parametrized by a wind speed and a wind direction, to drive a wildfire simulation created by FARSITE. These wind vectors are estimated via a gradient-free optimization using a grid search that compares wildfire model simulations with measured wildfire perimeters, where noisy observations are modeled as uncertainties in the locations of the vertices of the measured perimeters.
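
The gradient-free grid search can be sketched as follows. In this sketch, `simulate_perimeter` is a self-contained toy stand-in for a FARSITE run, and the mismatch is a simple vertex-wise distance; the paper's actual simulator and objective differ.

```python
# Gradient-free grid search for a prevailing wind vector (illustrative only).
# `simulate_perimeter` is a toy fire model so the example runs without FARSITE.
import numpy as np

ANGLES = np.linspace(0, 2 * np.pi, 64, endpoint=False)

def simulate_perimeter(speed, direction):
    # Toy fire: a circle elongated downwind, stretched more at higher wind speed.
    r = 1 + 0.1 * speed * np.maximum(np.cos(ANGLES - direction), 0)
    return np.column_stack([r * np.cos(ANGLES), r * np.sin(ANGLES)])

def mismatch(simulated, observed):
    # Vertex-wise error between corresponding perimeter points.
    return np.mean(np.linalg.norm(simulated - observed, axis=1))

# "Measured" perimeter: true wind (12 m/s from 225 deg) plus vertex noise.
observed = simulate_perimeter(12.0, np.deg2rad(225)) \
    + np.random.default_rng(3).normal(0, 0.02, (64, 2))

best = min(
    ((s, d) for s in np.arange(0, 30, 1.0)
            for d in np.deg2rad(np.arange(0, 360, 5))),
    key=lambda sd: mismatch(simulate_perimeter(*sd), observed),
)
print(f"estimated speed={best[0]:.0f}, direction={np.rad2deg(best[1]):.0f} deg")
```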

Detection of COVID-19 Using Multimodal Data from a Wearable Device: Results from the First TemPredict Study

Early detection of diseases such as COVID-19 could be a critical tool in reducing disease transmission by helping individuals recognize when they should self-isolate, seek testing, and obtain early medical intervention. Consumer wearable devices that continuously measure physiological metrics hold promise as tools for early illness detection. We gathered daily questionnaire data and physiological data using a consumer wearable (Oura Ring) from 63,153 participants, of whom 704 self-reported possible COVID-19 disease.

WIFIRE and NESDIS User Engagement: Leveraging NOAA's Pathfinder Initiative to Develop Future Tools, Products and Services for Wildfire

WIFIRE Lab, from the University of California, San Diego, is among the first Pathfinders supporting the next generation of geostationary observations, GeoXO. NOAA plans for the Geostationary Extended Observations (GeoXO) Program to follow the Geostationary Operational Environmental Satellites (GOES)-R Series and Space Weather Follow-On (SWFO) missions in the 2030-2050 timeframe. This presentation will focus on NOAA's Pathfinder, the WIFIRE Lab, which supported the development of synthetic data and an exercise scenario as part of the pre-development user engagement with GeoXO.

NOAA’s Pathfinder Value Chains

This presentation will describe how the NOAA Pathfinder value chains aim to increase awareness of NOAA missions, products, and services so that NESDIS can deliver maximum value to its users. It will show two value chains (fire and oceans) that demonstrate how the Pathfinder value chains are used as a mechanism for incorporating user input into the development of the NOAA satellite lifecycle. The talk will also serve as an opportunity to recruit future NOAA Pathfinders.

Integrating Plant Physiology into Simulation of Fire Behavior and Effects

Wildfires are a global crisis, but current fire models fail to capture vegetation response to changing climate. With drought and elevated temperature increasing the importance of vegetation dynamics to fire behavior, and the advent of next generation models capable of capturing increasingly complex physical processes, we provide a renewed focus on representation of woody vegetation in fire models. Currently, the most advanced representations of fire behavior and biophysical fire effects are found in distinct classes of fine-scale models and do not capture variation in live fuel (i.e.

IPPD: Integrated End-to-End Performance Prediction and Diagnosis for Extreme Scientific Workflows

Scientific workflows execute on a loosely connected set of distributed and heterogeneous computational resources. The Integrated End-to-End Performance Prediction and Diagnosis (IPPD) project contributes to a clear understanding of the factors that influence the performance and potential optimization of scientific workflows. IPPD addressed three core issues in order to provide insights into workflow execution that can be used to both explain and optimize that execution.

National Research Platform

The National Research Platform (NRP), formerly known as the Pacific Research Platform (PRP), is a collaborative, multi-institutional effort to create a shared national infrastructure for data-driven research. Backed by the National Science Foundation (NSF) and the Department of Energy (DOE), the NRP provides high-performance computing resources, data storage and management services, and network connectivity capabilities to researchers across various disciplines (e.g., the earth sciences and health sciences).

Clean Energy Smart Manufacturing Innovation Institute (CESMII)

The Clean Energy Smart Manufacturing Innovation Institute (CESMII) is a non-profit organization driving the transformation of the manufacturing industry toward a cleaner, more sustainable future. The US Department of Energy's (DOE) Clean Energy Manufacturing Initiative declared CESMII one of its Manufacturing Innovation Institutes in 2016.

WIFIRE: Workflows Integrating Collaborative Hazard Sciences

The WIFIRE CI (cyberinfrastructure) builds an integrated system for wildfire analysis by combining satellite and remote sensor data with computational techniques to monitor weather patterns and predict wildfire spread in real-time. The WIFIRE Lab, powered by this CI and housed at the San Diego Supercomputer Center, UCSD, was founded in 2013 and is composed of various platforms and efforts, including:

Sage: A Software-Defined Sensor Network

The Sage project is a National Science Foundation (NSF)-backed endeavor, led by Northwestern University since 2019. The project focuses on harnessing the latest edge computing technologies and methods to create a programmable, reusable network of smart, AI-based sensors at the edge for various applications, e.g., tracking smoke plume dispersion during wildfires. Leveraging our expertise in cyberinfrastructure development and data architecture, we have been working towards the robust development of several pieces of the Sage sensor network, including: