ML & Deep Learning Applications

Left Ventricle Segmentation and Volume Estimation on Cardiac MRI Using Deep Learning

In the United States, heart disease is the leading cause of death for both men and women, accounting for 610,000 deaths each year. Physicians use Magnetic Resonance Imaging (MRI) to image the heart non-invasively and estimate its structural and functional parameters for cardiovascular diagnosis and disease management. The end-systolic volume (ESV) and end-diastolic volume (EDV) of the left ventricle (LV), and the ejection fraction (EF) derived from them, are key indicators of heart disease.
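
As a quick reference for how these quantities relate, here is a minimal sketch (not taken from the paper) computing EF from the two volumes; the example values are illustrative only.

```python
# Minimal illustrative sketch (not from the paper): ejection fraction (EF)
# computed from end-diastolic volume (EDV) and end-systolic volume (ESV).
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Return EF as a percentage; the stroke volume is EDV - ESV."""
    if edv_ml <= 0:
        raise ValueError("EDV must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml

# Example: EDV = 120 mL and ESV = 50 mL give an EF of roughly 58%,
# which falls in the commonly cited normal range of about 55-70%.
print(ejection_fraction(120.0, 50.0))
```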

Analytics Pipeline for Left Ventricle Segmentation and Volume Estimation on Cardiac MRI Using Deep Learning

The left ventricle (LV) is the largest chamber in the heart and plays a critical role in cardiac function. Noninvasive cardiac imaging modalities (e.g., cardiac magnetic resonance (CMR), transesophageal echocardiography (TEE), and computed tomography (CT)) are commonly used to study LV size and function in addition to other cardiac structural aspects such as valvular disease, and are invaluable tools for the diagnosis and management of heart disease. However, the process of analyzing cardiac images is time-consuming and labor-intensive.

Land Cover Classification at the Wildland Urban Interface Using High-Resolution Satellite Imagery and Deep Learning

Land cover classification from satellite imagery is important for monitoring change in ecosystems and urban growth over time. However, the land cover classifications that are widely available in the United States are generated at low spatial and temporal resolution, making the spatial distribution of vegetation and urban areas in the wildland-urban interface difficult to measure. Analysis at high spatial and temporal resolution is essential for understanding and managing changing environments in these regions.

Scalable Detection of Rural Schools in Africa Using Convolutional Neural Networks and Satellite Imagery

Many countries lack sufficient civic data to assess where and what challenges communities face. High-resolution satellite images can provide honest assessments of neighborhoods and communities to guide aid workers, policy makers, the private sector, and philanthropists. Although humans are very good at detecting patterns, manually inspecting high-resolution satellite imagery at scale is costly and time-consuming. Machine learning has the potential to scale this process significantly and to automate the detection of regions of interest.

Workflow-Driven Distributed Machine Learning in CHASE-CI: A Cognitive Hardware and Software Ecosystem Community Infrastructure

Advances in data, computing, and networking over the last two decades have led to a shift in many application domains toward including machine learning on big data as part of the scientific process, requiring new capabilities for integrated and distributed hardware and software infrastructure. This paper contributes a workflow-driven approach for dynamic, data-driven application development on top of a new kind of networked cyberinfrastructure called CHASE-CI.

Understanding a Rapidly Expanding Refugee Camp Using Convolutional Neural Networks and Satellite Imagery

In the summer of 2017, close to one million Rohingya, a predominantly Muslim ethnic minority group in Myanmar, fled to Bangladesh to escape persecution. This large influx of refugees settled around existing refugee camps. Because of this dramatic expansion, the newly established Kutupalong-Balukhali expansion site lacked basic infrastructure and public services.

Toward a Methodology and Framework for Workflow-Driven Team Science

Scientific workflows are powerful tools for the management of scalable experiments, often composed of complex tasks running on distributed resources. Existing cyberinfrastructure provides components that can be utilized within repeatable workflows. However, data and computing advances continuously change the way scientific workflows get developed and executed, pushing the scientific activity to be more data-driven, heterogeneous, and collaborative.

Scaling Deep Learning-Based Analysis of High-Resolution Satellite Imagery with Distributed Processing

High-resolution satellite imagery is a rich source of data applicable to a variety of domains, ranging from demographics and land use to agriculture and hazard assessment. We have developed an end-to-end analysis pipeline that uses deep learning and unsupervised learning to process high-resolution satellite imagery and have applied it to various applications in previous work. Because high-resolution satellite imagery is large-volume data, scalability is essential for analyzing data from large geographical areas.
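
As a rough illustration of the scaling pattern described here (not the pipeline's actual code), the sketch below splits a large scene into tiles and classifies them in parallel worker processes; classify_tile is a hypothetical placeholder for the deep learning inference step.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def classify_tile(tile):
    # Hypothetical placeholder for per-tile deep learning inference.
    return (tile.mean(axis=-1) > 0.5).astype(np.uint8)

def tiles(scene, size=512):
    # Yield non-overlapping tiles covering the scene.
    for i in range(0, scene.shape[0], size):
        for j in range(0, scene.shape[1], size):
            yield scene[i:i + size, j:j + size]

if __name__ == "__main__":
    scene = np.random.rand(2048, 2048, 3)        # stand-in for a high-resolution scene
    with ProcessPoolExecutor() as pool:          # distribute tiles across worker processes
        results = list(pool.map(classify_tile, tiles(scene)))
    print(len(results), "tiles classified")
```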

PEnBayes: A Multi-Layered Ensemble Approach for Learning Bayesian Network Structure from Big Data

Discovering Bayesian network (BN) structures from big datasets containing rich causal relationships is becoming increasingly valuable for modeling and reasoning under uncertainty in many areas where big data are gathered from sensors at high volume and velocity. Most current BN structure learning algorithms have shortcomings when facing big data. First, learning a BN structure from the entire big dataset is an expensive task that often ends in failure due to memory constraints.

Modeling Wildfire Behavior at the Continuum of Computing

This talk will review some of our recent work on building dynamic data-driven cyberinfrastructure and impactful application solution architectures that showcase the integration of a variety of existing technologies and collaborative expertise. The lessons learned from the development of the NSF WIFIRE cyberinfrastructure will be summarized. Open data issues, the use of edge and cloud computing on top of high-speed networks, and reproducibility through containerization and automated workflow provenance will also be discussed in the context of WIFIRE.

The Evolution of Bits and Bottlenecks in a Scientific Workflow Trying to Keep Up with Technology: Accelerating 4D Image Segmentation Applied to NASA Data

In 2016, a team of earth scientists directly engaged a team of computer scientists to identify cyberinfrastructure (CI) approaches that would speed up an earth science workflow. This paper describes the evolution of that workflow as the two teams bridged CI and an image segmentation algorithm to do large-scale earth science research. The Pacific Research Platform (PRP) and the Cognitive Hardware and Software Ecosystem Community Infrastructure (CHASE-CI) resources were used to decrease the earth science workflow's wall-clock time significantly, from 19.5 days to 53 minutes.

End-to-End Workflow-Driven Hydrologic Analysis for Different User Groups in HydroFrame

We present the initial progress on the HydroFrame community platform using an automated Kepler workflow that performs end-to-end hydrology simulations involving data ingestion, preprocessing, analysis, modeling, and visualization. We will demonstrate how different modules of the workflow can be reused and repurposed for the three target user groups. Moreover, the Kepler workflow ensures complete reproducibility through a built-in provenance framework that collects workflow-specific parameters, software versions, and hardware system configuration.

A Demonstration of Modularity, Reuse, Reproducibility, Portability and Scalability for Modeling and Simulation of Cardiac Electrophysiology Using Kepler Workflows

Multi-scale computational modeling is a major branch of computational biology as evidenced by the US federal interagency Multi-Scale Modeling Consortium and major international projects. It invariably involves specific and detailed sequences of data analysis and simulation, often with multiple tools and datasets, and the community recognizes improved modularity, reuse, reproducibility, portability and scalability as critical unmet needs in this area. Scientific workflows are a well-recognized strategy for addressing these needs in scientific computing.

Cardiac MRI Image Segmentation for Left Ventricle and Right Ventricle Using Deep Learning

The goal of this project is to use magnetic resonance imaging (MRI) data to provide an end-to-end analytics pipeline for left and right ventricle (LV and RV) segmentation. Another aim of the project is to find a model that generalizes across medical imaging datasets. We evaluated a variety of models, datasets, and tests to determine which models are best suited to this purpose.

Assessing the Rohingya Displacement Crisis Using Satellite Data and Convolutional Neural Networks

Through the benefits of machine learning, we could quantify an increase in built-up area from 0.4 km² in January 2016 to 9.5 km² in February 2018, replacing primarily shrub and farmland. We are further able to detect a densification, and consequently a 'browning', of the refugee camp over time and to display its heterogeneous structure. The developed method is scalable and applicable to rapidly expanding settlements across various regions.
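
For intuition on how such built-up area figures can be derived from a classified image, here is a minimal sketch under assumed inputs: it counts pixels labeled as built-up and converts them to km² using an assumed pixel footprint. The class label and pixel size are illustrative, not the study's parameters.

```python
import numpy as np

BUILT_UP = 2                     # assumed label for the built-up class
PIXEL_SIZE_M = 0.5               # assumed ground sampling distance in meters

# Stand-in for a per-pixel land-cover classification of one satellite scene.
labels = np.random.randint(0, 4, size=(4096, 4096))

built_up_km2 = (labels == BUILT_UP).sum() * PIXEL_SIZE_M ** 2 / 1e6
print(f"Estimated built-up area: {built_up_km2:.2f} km^2")
```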

Scalable Workflow-Driven Hydrologic Analysis in HydroFrame

The HydroFrame project is a community platform designed to facilitate integrated hydrologic modeling across the US. As a part of HydroFrame, we seek to design innovative workflow solutions that create pathways to enable hydrologic analysis for three target user groups: the modeler, the analyzer, and the domain science educator. We present the initial progress on the HydroFrame community platform using an automated Kepler workflow. This workflow performs end-to-end hydrology simulations involving data ingestion, preprocessing, analysis, modeling, and visualization.

Recursive Updates of Wildfire Perimeters Using Barrier Points and Ensemble Kalman Filtering

This paper shows how the wildfire simulation tool FARSITE is augmented with data assimilation capabilities that exploit the notion of barrier points and a constraint-point ensemble Kalman filtering to update wildfire perimeter predictions. Based on observations of the actual fire perimeter, stationary points on the fire perimeter are identified as barrier points and combined with a recursive update of the initial fire perimeter.
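
The update itself builds on the standard ensemble Kalman filter analysis step; the sketch below shows that generic step only, and omits the paper's barrier-point constraints and the FARSITE coupling. The observation operator H and all dimensions are assumed inputs for illustration.

```python
import numpy as np

def enkf_update(ensemble, H, y, obs_cov, rng=np.random.default_rng(0)):
    """Generic stochastic EnKF analysis step applied to an ensemble of states."""
    X = np.asarray(ensemble, float)            # (n_state, n_members) forecast states
    H = np.asarray(H, float)                   # (n_obs, n_state) observation operator
    y = np.asarray(y, float)                   # (n_obs,) observed fire-perimeter quantities
    obs_cov = np.asarray(obs_cov, float)       # (n_obs, n_obs) observation error covariance
    anomalies = X - X.mean(axis=1, keepdims=True)
    P = anomalies @ anomalies.T / (X.shape[1] - 1)          # sample forecast covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + obs_cov)      # Kalman gain
    perturbed = y[:, None] + rng.multivariate_normal(
        np.zeros(y.size), obs_cov, X.shape[1]).T            # perturbed observations per member
    return X + K @ (perturbed - H @ X)                      # analysis (updated) ensemble
```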

NeuroKube: An Automated and Autoscaling Neuroimaging Reconstruction Framework using Cloud Native Computing and A.I.

The neuroscience domain stands out among the sciences for its dependence on the study and characterization of complex, intertwining structures. Understanding the complexity of the brain has led to widespread advances in the structure of large-scale computing resources and the design of artificially intelligent analysis systems. However, the scale of the problems and the data generated continues to grow and to outpace the standards and practices of neuroscience.

Automated Early Detection of Wildfire Smoke Using Deep Learning with Combined Spatial-Temporal Information

We propose incorporating both spatial and temporal information via a combined CNN-LSTM classification model. We theorize that the inclusion of temporal information may reduce the number of false positives and improve generalizability to new environments. The model is trained and tested on images of landscapes with and without smoke from the HPWREN tower network in southern California, part of the SAGE remote-sensing infrastructure. We use traditional CNN-based classifiers leveraged in past smoke detection literature as baselines to evaluate our model's performance.
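
A hedged sketch of what such a combined CNN-LSTM classifier can look like is shown below; it is not the authors' implementation, and the backbone, layer sizes, and sequence handling are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class CNNLSTMSmokeClassifier(nn.Module):
    """Per-frame CNN encoder followed by an LSTM over the frame sequence."""

    def __init__(self, feat_dim=128, hidden_dim=64):
        super().__init__()
        self.cnn = nn.Sequential(                              # small assumed convolutional encoder
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)                   # logit: smoke vs. no smoke

    def forward(self, frames):                                 # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)  # encode each frame independently
        out, _ = self.lstm(feats)                              # aggregate temporal context
        return self.head(out[:, -1])                           # classify from the last time step

logits = CNNLSTMSmokeClassifier()(torch.randn(2, 5, 3, 64, 64))  # toy batch: 2 sequences of 5 frames
```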

Workflows Community Summit: Bringing the Scientific Workflows Community Together

Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure.

Workflows Community Summit: Advancing the State-of-the-Art of Scientific Workflows Management Systems Research and Development

Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms.

TemPredict: A Big Data Analytical Platform for Scalable Exploration and Monitoring of Personalized Multimodal Data for COVID-19

A key takeaway from the COVID-19 crisis is the need for scalable methods and systems for ingestion of big data related to the disease, such as models of the virus, health surveys, and social data, and the ability to integrate and analyze the ingested data rapidly. One specific example is the use of the Internet of Things and wearables (i.e., the Oura ring) to collect large-scale individualized data (e.g., temperature and heart rate) continuously and to create personalized baselines for detection of disease symptoms.
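
To make the personalized-baseline idea concrete, here is a minimal sketch under assumed parameters: each night's temperature is compared against that individual's own rolling baseline and large deviations are flagged. The window length and threshold are illustrative and do not represent TemPredict's actual algorithm.

```python
import numpy as np

def flag_deviations(nightly_temp_c, window=14, z_thresh=2.5):
    """Flag nights whose temperature deviates strongly from the personal baseline."""
    temps = np.asarray(nightly_temp_c, dtype=float)
    flags = np.zeros(temps.size, dtype=bool)
    for i in range(window, temps.size):
        baseline = temps[i - window:i]                   # this person's previous `window` nights
        z = (temps[i] - baseline.mean()) / (baseline.std() + 1e-6)
        flags[i] = z > z_thresh                          # unusually elevated temperature
    return flags

temps = [36.5] * 20 + [37.6]                             # toy series with one elevated night
print(flag_deviations(temps))
```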

Quantum Data Hub: A Collaborative Data and Analysis Platform for Quantum Material Science

Quantum materials research is a rapidly growing domain of materials research, seeking novel compounds whose electronic properties are born from the uniquely quantum aspects of their constituent electrons. The data from this rapidly evolving area requires a new, community-driven approach to collaboration and to sharing data across the end-to-end quantum materials process.

Perspectives on Automated Composition of Workflows in the Life Sciences

Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences.

Modular Performance Prediction for Scientific Workflows Using Machine Learning

Scientific workflows provide an opportunity for declarative computational experiment design in an intuitive and efficient way. A distributed workflow is typically executed on a variety of resources, and it uses a variety of computational algorithms or tools to achieve the desired outcomes. Such a variety imposes additional complexity in scheduling these workflows on large scale computers. As computation becomes more distributed, insights into expected workload that a workflow presents become critical for effective resource allocation.
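
One way to read "modular" prediction is per-component modeling; the sketch below is a rough illustration rather than the paper's method, fitting one regression model per workflow module and summing the per-module predictions into an end-to-end estimate. The feature choices and synthetic data are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

modules = ["ingest", "preprocess", "train"]
models = {}
for name in modules:
    # Stand-in training data: features are (input size in GB, core count); targets are runtimes (s).
    X = np.random.rand(200, 2) * [10.0, 32.0]
    y = 5 + 3 * X[:, 0] / (X[:, 1] + 1.0) + np.random.rand(200)
    models[name] = GradientBoostingRegressor().fit(X, y)

planned = np.array([[2.0, 8.0]])           # planned input size and cores for each module
total = sum(models[name].predict(planned)[0] for name in modules)
print(f"Predicted end-to-end runtime: {total:.1f} s")
```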

Integrated End-to-End Performance Prediction and Diagnosis for Extreme Scientific Workflows (IPPD)

This report details the accomplishments of the ASCR-funded project “Integrated End-to-end Performance Prediction and Diagnosis for Extreme Scientific Workflows” under award numbers FWP-66406 and DE-SC0012630, with a focus on the UC San Diego (Award No. DE-SC0012630) part of the accomplishments. We refer to the project as IPPD.

Improving Wildfire Simulations by Estimation of Wildfire Wind Conditions from Fire Perimeter Measurements

This paper shows how a gradient-free optimization method is used to improve the prediction capabilities of wildfire progression by estimating the wind conditions driving a FARSITE wildfire model. To characterize the performance of the prediction of the perimeter as a function of the wind conditions, an uncertainty weighting is applied to each vertex of the measured fire perimeter and a weighted least-squares error is computed between the predicted and measured fire perimeter.
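
As a concrete reading of the weighted least-squares error described above, here is a small sketch under simplifying assumptions (vertex correspondence is taken as given, which glosses over the actual FARSITE-based matching): each measured vertex's uncertainty sets its weight in the error.

```python
import numpy as np

def weighted_perimeter_error(predicted, measured, sigma):
    """Weighted least-squares error between corresponding perimeter vertices."""
    predicted = np.asarray(predicted, float)        # (N, 2) predicted vertex coordinates
    measured = np.asarray(measured, float)          # (N, 2) measured vertex coordinates
    weights = 1.0 / np.asarray(sigma, float) ** 2   # larger uncertainty -> smaller weight
    sq_dist = ((predicted - measured) ** 2).sum(axis=1)
    return float((weights * sq_dist).sum())

err = weighted_perimeter_error([[0, 0], [1, 0]], [[0.1, 0.0], [1.2, 0.1]], sigma=[0.5, 1.0])
```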

Expanse: Computing without Boundaries - Architecture, Deployment, and Early Operations Experiences of a Supercomputer Designed for the Rapid Evolution in Science and Engineering

We describe the design motivation, architecture, deployment, and early operations of Expanse, a 5-petaflop, heterogeneous HPC system that entered production as an NSF-funded resource in December 2020 and will be operated on behalf of the national community for five years. Expanse will serve a broad range of computational science and engineering through a combination of standard batch-oriented services and by extending the system to the broader CI ecosystem through science gateways, public cloud integration, support for high-throughput computing, and composable systems.

A Community Roadmap for Scientific Workflows Research and Development

The landscape of workflow systems for scientific applications is notoriously convoluted with hundreds of seemingly equivalent workflow systems, many isolated research claims, and a steep learning curve. To address some of these challenges and lay the groundwork for transforming workflows research and development, the WorkflowsRI and ExaWorks projects partnered to bring the international workflows community together. This paper reports on discussions and findings from two virtual “Workflows Community Summits” (January and April, 2021).

Building Cyberinfrastructure for Translational Impact: The WIFIRE Example

This paper overviews the enablers and phases for translational cyberinfrastructure for data-driven applications. In particular, it summarizes the translational process of and the lessons learned from the development of the NSF WIFIRE cyberinfrastructure. WIFIRE is an end-to-end cyberinfrastructure for real-time data fusion and data-driven simulation, prediction, and visualization of wildfire behavior. WIFIRE’s real-time data products and modeling services are routinely accessed by fire research and emergency response communities for modeling as well as the public for situational awareness.

Towards a Dynamic Composability Approach for Using Heterogeneous Systems in Remote Sensing

Influenced by advances in data and computing, scientific practice increasingly involves machine learning and artificial intelligence-driven methods, which require specialized capabilities at the system, science, and service levels in addition to conventional large-capacity supercomputing approaches. The latest distributed architectures built around the composability of data-centric applications have led to the emergence of a new ecosystem for container coordination and integration.

Multimodal Wildland Fire Smoke Detection

Research has shown that climate change creates warmer temperatures and drier conditions, leading to longer wildfire seasons and increased wildfire risks in the United States. These factors have in turn led to increases in the frequency, extent, and severity of wildfires in recent years. Given the danger posed by wildland fires to people, property, wildlife, and the environment, there is an urgency to provide tools for effective wildfire management. Early detection of wildfires is essential to minimizing potentially catastrophic destruction.

Machine Learning for Improved Post-Fire Debris Flow Likelihood Prediction

Timely prediction of debris flow probabilities in areas impacted by wildfires is crucial to mitigate public exposure to this hazard during post-fire rainstorms. This paper presents a machine learning approach that amends an existing dataset of post-fire debris flow events with additional features reflecting existing vegetation type and geology, and trains traditional and deep learning methods on a randomly selected subset of the data.
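
For a sense of what training "traditional" methods on such an amended tabular dataset can look like, here is a minimal hedged sketch; the random forest, the feature set, and the synthetic data are illustrative assumptions rather than the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in feature matrix: e.g., burn severity, rainfall intensity, slope,
# plus the added vegetation-type and geology features; labels mark debris-flow occurrence.
X = np.random.rand(500, 6)
y = np.random.randint(0, 2, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("Held-out accuracy:", clf.score(X_test, y_test))
likelihoods = clf.predict_proba(X_test)[:, 1]     # debris-flow likelihoods for new scenarios
```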

FIgLib & SmokeyNet: Dataset and Deep Learning Model for Real-Time Wildland Fire Smoke Detection

FIgLib & SmokeyNet: Dataset and Deep Learning Model for Real-Time Wildland Fire Smoke Detection

The size and frequency of wildland fires in the western United States have dramatically increased in recent years. On high-fire-risk days, a small fire ignition can rapidly grow and become out of control. Early detection of fire ignitions from initial smoke can assist the response to such fires before they become difficult to manage. Past deep learning approaches for wildfire smoke detection have suffered from small or unreliable datasets that make it difficult to extrapolate performance to real-world scenarios.

Estimation of Wildfire Wind Conditions via Perimeter and Surface Area Optimization

This paper shows that the prediction capability of wildfire progression can be improved by estimation of a single prevailing wind vector parametrized by a wind speed and a wind direction to drive a wildfire simulation created by FARSITE. Estimations of these wind vectors are achieved in this work by a gradient-free optimization via a grid search that compares wildfire model simulations with measured wildfire perimeters, where noisy observations are modeled as uncertainties on the locations of the vertices of the measured wildfire perimeters.
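
The grid search itself can be sketched in a few lines; in the illustration below, simulate_perimeter is a hypothetical placeholder for a FARSITE run and perimeter_error stands in for the uncertainty-weighted perimeter comparison, so the grid ranges and step sizes are assumptions.

```python
import numpy as np

def estimate_wind(measured_perimeter, simulate_perimeter, perimeter_error):
    """Gradient-free grid search over a single prevailing wind vector."""
    best_wind, best_err = None, np.inf
    for speed in np.arange(0.0, 20.0, 1.0):              # candidate wind speeds (m/s)
        for direction in np.arange(0.0, 360.0, 10.0):    # candidate wind directions (degrees)
            predicted = simulate_perimeter(speed, direction)
            err = perimeter_error(predicted, measured_perimeter)
            if err < best_err:
                best_wind, best_err = (speed, direction), err
    return best_wind
```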

Enabling AI Innovation via Data and Model Sharing: An Overview of the NSF Convergence Accelerator Track D

This article provides a brief overview of 18 projects funded in Track D—Data and Model Sharing to Enable AI Innovation—of the 2020 Cohort of the National Science Foundation's (NSF) Convergence Accelerator (CA) program. The NSF CA is focused on transitioning research to practice for societal impact. The projects described here were funded for one year in phase I of the program, beginning September 2020. Their focus is on delivering tools, technologies, and techniques to assist in sharing data as well as data-driven models to enable AI innovation.

Detection of COVID-19 Using Multimodal Data from a Wearable Device: Results from the First TemPredict Study

Early detection of diseases such as COVID-19 could be a critical tool in reducing disease transmission by helping individuals recognize when they should self-isolate, seek testing, and obtain early medical intervention. Consumer wearable devices that continuously measure physiological metrics hold promise as tools for early illness detection. We gathered daily questionnaire data and physiological data using a consumer wearable (Oura Ring) from 63,153 participants, of whom 704 self-reported possible COVID-19 disease.

Integrating Plant Physiology into Simulation of Fire Behavior and Effects

Wildfires are a global crisis, but current fire models fail to capture vegetation response to a changing climate. With drought and elevated temperature increasing the importance of vegetation dynamics to fire behavior, and the advent of next-generation models capable of capturing increasingly complex physical processes, we provide a renewed focus on the representation of woody vegetation in fire models. Currently, the most advanced representations of fire behavior and biophysical fire effects are found in distinct classes of fine-scale models and do not capture variation in live fuel.

Quantum Foundry

The Quantum Foundry is a collaborative research center headquartered at the University of California, Santa Barbara (UCSB), focused on advancing the field of quantum science and engineering through the development of new materials and devices for use in quantum technologies. The foundational infrastructure underlying the center's various initiatives enables smart, national-scale materials science and manufacturing.

WIFIRE: Workflows Integrating Collaborative Hazard Sciences

The WIFIRE CI (cyberinfrastructure) builds an integrated system for wildfire analysis by combining satellite and remote sensor data with computational techniques to monitor weather patterns and predict wildfire spread in real time. The WIFIRE Lab, powered by this CI and housed at the San Diego Supercomputer Center, UCSD, was founded in 2013 and is composed of various platforms and efforts.

Sage: A Software-Defined Sensor Network

The Sage project is a National Science Foundation (NSF)-backed endeavor, led by Northwestern University since 2019. The project focuses on harnessing the latest edge computing technologies and methods to create a programmable, reusable network of smart, AI-based sensors at the edge for various applications, e.g., tracking smoke plume dispersion during wildfires. Leveraging our expertise in cyberinfrastructure development and data architecture, we have been working toward the robust development of several pieces of the Sage sensor network.

TemPredict

The TemPredict initiative was spearheaded at UCSF in 2020 to bring together experts in a variety of disciplines, including machine learning and epidemiology, to forecast COVID-19 cases and track the progression and spread of the virus. The initiative received seed money from the health tech company Ōura, which also provided the wearable technology (i.e., the Ōura ring) used to collect personalized health data (e.g., body temperature, heart rate) from TemPredict study participants.