Workflow-Driven Distributed Machine Learning in CHASE-CI: A Cognitive Hardware and Software Ecosystem Community Infrastructure

Advances in data, computing, and networking over the last two decades have led to a shift in many application domains that now include machine learning on big data as part of the scientific process, requiring new capabilities for integrated and distributed hardware and software infrastructure. This paper contributes a workflow-driven approach for dynamic, data-driven application development on top of a new kind of networked cyberinfrastructure called CHASE-CI.

Toward a Methodology and Framework for Workflow-Driven Team Science

Scientific workflows are powerful tools for managing scalable experiments, often composed of complex tasks running on distributed resources. Existing cyberinfrastructure provides components that can be utilized within repeatable workflows. However, advances in data and computing continuously change the way scientific workflows are developed and executed, pushing scientific activity to become more data-driven, heterogeneous, and collaborative.

PEnBayes: A Multi-Layered Ensemble Approach for Learning Bayesian Network Structure from Big Data

Discovering the Bayesian network (BN) structure from big datasets containing rich causal relationships is becoming increasingly valuable for modeling and reasoning under uncertainty in many areas that gather big data from sensors, given the data's high volume and fast velocity. Most current BN structure learning algorithms have shortcomings when facing big data. First, learning a BN structure from an entire big dataset is an expensive task that often ends in failure due to memory constraints.
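
To make the layered-ensemble idea concrete, here is a minimal sketch (not the PEnBayes algorithm itself): partition the data, learn a local structure per partition with any off-the-shelf learner, and keep the edges that a majority of partitions agree on. The `learn_local_structure` stub is a hypothetical placeholder for a real structure learner.

```python
# Minimal sketch of layered ensemble Bayesian network structure learning.
# `learn_local_structure` is a hypothetical stand-in for any off-the-shelf
# learner (e.g., hill climbing with a BIC score); only the partition-and-vote
# aggregation idea is shown here.
from collections import Counter
from itertools import combinations
import random

def learn_local_structure(rows):
    """Placeholder: return a set of edges (parent, child) learned from
    one data partition. Swap in a real structure learner here."""
    variables = sorted(rows[0].keys())
    # Toy heuristic for illustration only: random subset of candidate edges.
    return {e for e in combinations(variables, 2) if random.random() < 0.3}

def ensemble_structure(rows, n_partitions=4, vote_threshold=0.5):
    """Partition the data, learn one structure per partition, then keep
    edges that appear in at least `vote_threshold` of the local results."""
    size = len(rows) // n_partitions
    partitions = [rows[i * size:(i + 1) * size] for i in range(n_partitions)]
    votes = Counter()
    for part in partitions:
        votes.update(learn_local_structure(part))
    return {edge for edge, c in votes.items()
            if c / n_partitions >= vote_threshold}

# Usage: rows is a list of dicts mapping variable name -> observed value.
rows = [{"A": random.randint(0, 1), "B": random.randint(0, 1),
         "C": random.randint(0, 1)} for _ in range(1000)]
print(ensemble_structure(rows))
```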

HydroFrame: A Software Framework to Enable Continental Scale Hydrologic Simulation

The goal of the HydroFrame project is to provide a community framework for sophisticated, high-resolution hydrologic simulation across the entire continental US. To accomplish this, we are building an integrated software framework for continental-scale hydrologic simulation and data analysis, composed of multi-scale configurable components. The multi-scale requirements of this domain drive the design of the proposed framework.

End-to-End Workflow-Driven Hydrologic Analysis for Different User Groups in HydroFrame

We present the initial progress on the HydroFrame community platform using an automated Kepler workflow that performs end-to-end hydrology simulations involving data ingestion, preprocessing, analysis, modeling, and visualization. We will demonstrate how different modules of the workflow can be reused and repurposed for the three target user groups. Moreover, the Kepler workflow ensures complete reproducibility through a built-in provenance framework that collects workflow-specific parameters, software versions, and hardware system configuration.

A Demonstration of Modularity, Reuse, Reproducibility, Portability and Scalability for Modeling and Simulation of Cardiac Electrophysiology Using Kepler Workflows

Multi-scale computational modeling is a major branch of computational biology as evidenced by the US federal interagency Multi-Scale Modeling Consortium and major international projects. It invariably involves specific and detailed sequences of data analysis and simulation, often with multiple tools and datasets, and the community recognizes improved modularity, reuse, reproducibility, portability and scalability as critical unmet needs in this area. Scientific workflows are a well-recognized strategy for addressing these needs in scientific computing.

Scalable Workflow-Driven Hydrologic Analysis in HydroFrame

The HydroFrame project is a community platform designed to facilitate integrated hydrologic modeling across the US. As a part of HydroFrame, we seek to design innovative workflow solutions that create pathways to enable hydrologic analysis for three target user groups: the modeler, the analyzer, and the domain science educator. We present the initial progress on the HydroFrame community platform using an automated Kepler workflow. This workflow performs end-to-end hydrology simulations involving data ingestion, preprocessing, analysis, modeling, and visualization.
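
As a rough illustration of what "end-to-end" means here, the sketch below chains the five stages in plain Python. In Kepler each stage would be an actor in a dataflow graph; every function and name in this sketch is illustrative, not the project's actual API.

```python
# A minimal sketch of the stage sequence the Kepler hydrology workflow
# automates (ingestion -> preprocessing -> modeling -> analysis ->
# visualization). All names are illustrative stand-ins.

def ingest(source):
    # Stand-in for pulling raw forcing/terrain data from a data service.
    return {"raw": f"data from {source}"}

def preprocess(data):
    # Stand-in for regridding, unit conversion, and quality checks.
    data["clean"] = data.pop("raw")
    return data

def run_model(data, params):
    # Stand-in for invoking the hydrologic model (e.g., ParFlow).
    data["output"] = {"params": params, "inputs": data["clean"]}
    return data

def analyze(data):
    # Stand-in for computing summary statistics on model output.
    data["stats"] = {"n_outputs": len(data["output"])}
    return data

def visualize(data):
    # Stand-in for rendering plots or maps of the results.
    print("plotting", data["stats"])

def pipeline(source, params):
    # Chain the stages exactly as the workflow engine would sequence them.
    visualize(analyze(run_model(preprocess(ingest(source)), params)))

pipeline("hydrodata-archive", {"timestep_hours": 1})
```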

Quantum Data Hub: A Collaborative Data and Analysis Platform for Quantum Material Science

Quantum materials research is a rapidly growing domain of materials research, seeking novel compounds whose electronic properties are born from the uniquely quantum aspects of their constituent electrons. The data from this rapidly evolving area requires a new community-driven approach for collaboration and for sharing data across the end-to-end quantum materials research process.

Modular Performance Prediction for Scientific Workflows Using Machine Learning

Scientific workflows provide an opportunity for declarative computational experiment design in an intuitive and efficient way. A distributed workflow is typically executed on a variety of resources, and it uses a variety of computational algorithms or tools to achieve the desired outcomes. Such variety imposes additional complexity in scheduling these workflows on large-scale computers. As computation becomes more distributed, insights into the expected workload that a workflow presents become critical for effective resource allocation.
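
One common way to turn such insights into estimates, sketched below with assumed feature names and toy training data, is to fit one regression model per workflow module on historical runtimes and sum the per-module predictions to estimate the whole workflow.

```python
# A minimal sketch of modular performance prediction: one regressor per
# workflow module mapping input features to observed runtime. Feature
# names and training data are invented for illustration; the paper's
# actual features and models may differ.
from sklearn.ensemble import RandomForestRegressor
import numpy as np

# Historical executions per module: features = [input_MB, n_cores].
history = {
    "preprocess": (np.array([[100, 4], [200, 4], [400, 8]]),
                   np.array([12.0, 25.0, 30.0])),    # runtimes in seconds
    "simulate":   (np.array([[100, 4], [200, 4], [400, 8]]),
                   np.array([300.0, 610.0, 700.0])),
}

models = {name: RandomForestRegressor(n_estimators=50).fit(X, y)
          for name, (X, y) in history.items()}

def predict_workflow_runtime(features):
    """Estimate total runtime as the sum of per-module predictions."""
    return sum(float(m.predict([features])[0]) for m in models.values())

print(predict_workflow_runtime([250, 4]))  # estimated seconds
```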

Integrated End-to-End Performance Prediction and Diagnosis for Extreme Scientific Workflows (IPPD)

This report details the accomplishments of the ASCR-funded project “Integrated End-to-End Performance Prediction and Diagnosis for Extreme Scientific Workflows” under award numbers FWP-66406 and DE-SC0012630, with a focus on the UC San Diego portion of the work (Award No. DE-SC0012630). We refer to the project as IPPD.

Building Cyberinfrastructure for Translational Impact: The WIFIRE Example

This paper provides an overview of the enablers and phases of translational cyberinfrastructure for data-driven applications. In particular, it summarizes the translational process of, and the lessons learned from, the development of the NSF WIFIRE cyberinfrastructure. WIFIRE is an end-to-end cyberinfrastructure for real-time data fusion and data-driven simulation, prediction, and visualization of wildfire behavior. WIFIRE’s real-time data products and modeling services are routinely accessed by fire research and emergency response communities for modeling, as well as by the public for situational awareness.

Autonomous Provenance to Drive Reproducibility in Computational Hydrology

The Kepler-driven provenance framework provides an autonomous provenance collection capability for hydrologic research. The framework scales to capture model parameters, user actions, and hardware specifications, and facilitates quick retrieval for actionable insights, whether the scientist is handling a small watershed simulation or a large continental-scale problem.
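
The sketch below illustrates the general pattern of autonomous provenance capture, assuming a simple JSON file store: a decorator that transparently records parameters, software versions, and hardware details around each model run. The real Kepler framework records much more, including user actions and workflow structure; all names here are illustrative.

```python
# A minimal sketch of autonomous provenance capture using only the
# standard library: wrap a model run so that its parameters, software
# version, and hardware details are logged as a JSON record.
import functools, json, platform, sys, time, uuid

def with_provenance(run_fn):
    @functools.wraps(run_fn)
    def wrapper(**params):
        record = {
            "run_id": str(uuid.uuid4()),
            "started": time.time(),
            "parameters": params,              # model parameters
            "python": sys.version.split()[0],  # software version
            "machine": platform.machine(),     # hardware specification
            "processor": platform.processor(),
            "os": platform.platform(),
        }
        result = run_fn(**params)
        record["finished"] = time.time()
        with open(f"provenance-{record['run_id']}.json", "w") as f:
            json.dump(record, f, indent=2)     # stored for quick retrieval
        return result
    return wrapper

@with_provenance
def run_watershed_model(cell_size_m, n_timesteps):
    return cell_size_m * n_timesteps  # stand-in for the actual simulation

run_watershed_model(cell_size_m=1000, n_timesteps=24)
```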

Smart Connected Worker Edge Platform for Smart Manufacturing: Part 2—Implementation and On-Site Deployment Case Study

In this paper, we describe specific deployments of the Smart Connected Worker (SCW) Edge Platform for Smart Manufacturing through the implementation of four instructive real-world use cases. These cases illustrate the role of people in a Smart Manufacturing paradigm in which affordable, scalable, accessible, and portable (ASAP) information technology (IT) acquires and contextualizes data into information for transmission to operational technologies (OT).
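
As a rough illustration of the "acquire and contextualize" step on the IT side, the sketch below turns a bare sensor reading into a structured record for the OT layer. The field names and transport are invented for illustration, not the SCW platform's actual schema or protocols.

```python
# A minimal sketch of IT-side data contextualization at the edge:
# acquire a raw reading, attach operational context, and serialize it
# for hand-off to the OT layer. All identifiers are hypothetical.
import json, time, random

def acquire_raw_reading():
    # Stand-in for polling an edge sensor (e.g., a power meter).
    return random.uniform(4.5, 5.5)

def contextualize(raw_value):
    # Turn a bare number into information the OT layer can act on.
    return {
        "machine_id": "press-07",   # hypothetical asset identifier
        "metric": "power_kw",
        "value": round(raw_value, 3),
        "units": "kW",
        "timestamp": time.time(),
        "shift": "day",             # worker/operational context
    }

payload = json.dumps(contextualize(acquire_raw_reading()))
print(payload)  # would be transmitted to the OT system
```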

Smart Connected Worker Edge Platform for Smart Manufacturing: Part 1—Architecture and Platform Design

The challenge of sustainably producing goods and services for healthy living on a healthy planet requires simultaneous consideration of the economic, societal, and environmental dimensions of manufacturing. Enabling technologies for data-driven manufacturing paradigms like Smart Manufacturing (a.k.a. Industry 4.0) serve as the technological backbone from which sustainable approaches to manufacturing can be implemented.

HydroFrame Infrastructure: Developments in the Software Behind a National Hydrologic Modeling Framework

The HydroFrame project combines cutting-edge environmental modeling approaches with modern software principles to build an end-to-end workflow for regional and continental-scale scientific applications, enabling modelers to extract static subdomain datasets from continental datasets and run simulations on them using high-performance computing hardware hosted at Princeton University. In prior work we have provided the capability for users to extract domain data for the ParFlow model at local scales and execute it using freely accessible cloud computing services (i.e., MyBinder.org).
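
The domain extraction step can be pictured as a bounding-box subset of a continental grid, as in the minimal numpy sketch below. The real subsetting tools additionally handle model grids, projections, and ParFlow input formats; the shapes and indices here are illustrative only.

```python
# A minimal sketch of extracting a local modeling domain from a
# continental gridded dataset via a row/column bounding box.
import numpy as np

continental = np.random.rand(3342, 1888)  # e.g., a CONUS-scale grid

def extract_domain(grid, row_range, col_range):
    """Return a copy of the subgrid covering the requested domain."""
    r0, r1 = row_range
    c0, c1 = col_range
    return grid[r0:r1, c0:c1].copy()

watershed = extract_domain(continental, (1200, 1350), (400, 560))
print(watershed.shape)  # (150, 160): a local domain ready for model setup
```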

HydroFrame

HydroFrame is a community platform that facilitates integrated hydrologic modeling across the United States. We design innovative workflow solutions that create pathways enabling hydrologic analysis for three target user groups: modelers, analyzers, and domain science educators. As part of our contribution to HydroFrame, we run HydroFrame workflows in the Kepler system, utilizing its automated workflow capabilities to perform end-to-end hydrology simulations involving data ingestion, preprocessing, analysis, modeling, and visualization.

The Kepler Project

The Kepler Project supports the use and development of the free, open source Kepler Scientific Workflow System. This system helps scientists, analysts, and computer programmers create, execute, and share models and analyses across a broad range of scientific and engineering disciplines. The Kepler Scientific Workflow System can operate on data stored locally and over the internet.

WIFIRE: Workflows Integrating Collaborative Hazard Sciences

The WIFIRE CI (cyberinfrastructure) builds an integrated system for wildfire analysis by combining satellite and remote sensor data with computational techniques to monitor weather patterns and predict wildfire spread in real time. The WIFIRE Lab, powered by this CI and housed at the San Diego Supercomputer Center, UCSD, was founded in 2013 and is composed of various platforms and efforts.

Sage: A Software-Defined Sensor Network

The Sage project is a National Science Foundation (NSF)-backed endeavor, led by Northwestern University since 2019. The project focuses on harnessing the latest edge computing technologies and methods to create a programmable, reusable network of smart, AI-based sensors at the edge for various applications, e.g., tracking smoke plume dispersion during wildfires. Leveraging our expertise in cyberinfrastructure development and data architecture, we have been working toward the robust development of several pieces of the Sage sensor network.