In this repository you will find a synthetic implementation, on the Apache Spark framework, of the Risers Fatigue Analysis (RFA) scientific workflow, based on a real case study in the Oil and Gas domain. The implementation uses the natively available Process library to call external black-box applications from Spark.
The Risers Fatigue Analysis (RFA) workflow is composed of seven activities that receive input tuples, perform complex calculations on them, and transform them into resulting output tuples (a minimal sketch of this pipeline follows the activity list below).
Activities
- Uncompress Input Dataset - split one tuple into many tuples
- Preprocessing - map
- Analyze Risers - map
- Calculate Wear and Tear - filter
- Analyze Position - filter
- Join Results - join
- Compress Results - reduce tuples
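For orientation, here is a minimal sketch of how these seven activities might map onto Spark RDD operations in Scala, with each step delegating to an external black-box program through scala.sys.process (a plausible reading of "the natively available Process library"). All executable names, the join key, and the empty-output filter convention are illustrative assumptions, not the repository's actual code:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.sys.process._

object RfaSketch {

  // Hypothetical wrapper: run an external black-box executable via
  // scala.sys.process and return its stdout. The program names used
  // below are placeholders, not the repository's actual binaries.
  def blackBox(cmd: String, args: String*): String = (cmd +: args).!!.trim

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RFA-Spark-sketch"))
    val input = sc.textFile("input.dataset")

    // 1. Uncompress Input Dataset: one tuple becomes many (flatMap).
    val tuples = input.flatMap(t => blackBox("./uncompress", t).split("\n"))

    // 2-3. Preprocessing and Analyze Risers: one tuple in, one out (map).
    val analyzed = tuples.map(t => blackBox("./preprocess", t))
                         .map(t => blackBox("./analyze_risers", t))

    // 4-5. Calculate Wear and Tear / Analyze Position: keep only tuples the
    // black-box accepts; empty stdout is read here as "discard" (filter).
    val kept = analyzed.filter(t => blackBox("./wear_and_tear", t).nonEmpty)
                       .filter(t => blackBox("./analyze_position", t).nonEmpty)

    // 6. Join Results: sketched as joining surviving tuples back to the
    // analyzed tuples on the entry ID (first field); the real join
    // semantics live inside the black-box program.
    val joined = kept.map(t => (t.split(";")(0), t))
      .join(analyzed.map(t => (t.split(";")(0), t)))
      .values.map { case (a, b) => a + ";" + b }

    // 7. Compress Results: fold many tuples into few entries (reduce).
    val compressed = joined.reduce((a, b) => blackBox("./compress", a, b))

    println(compressed)
    sc.stop()
  }
}
```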
Clone the repository:
$ git clone https://github.com/hpcdb/RFA-Spark.git
$ cd RFA-Spark
- Edit the input dataset:
$ vi input.dataset
- Example:
ID;SPLITMAP;SPLITFACTOR;MAP1;MAP2;FILTER1;F1;FILTER2;F2;REDUCE;REDUCEFACTOR
1;5;8;5;5;5;50;5;50;5;4
- Fields:
- ID: Entry identifier
- SPLITMAP: Average Task Cost in Uncompress activity (seconds)
- SPLITFACTOR: Number of entries in the input dataset after uncompression
- MAP1: Average Task Cost in Pre-Processing activity (seconds)
- MAP2: Average Task Cost in Analyze Risers activity (seconds)
- FILTER1: Average Task Cost in Calculate Wear and Tear activity (seconds)
- F1: Percentage of entries that pass the Calculate Wear and Tear filter (i.e., the percentage that continues in the flow)
- FILTER2: Average Task Cost in Analyze Position activity (seconds)
- F2: Percentage of entries that pass the Analyze Position filter (i.e., the percentage that continues in the flow)
- REDUCE: Average Task Cost in Compress Results activity (seconds)
- REDUCEFACTOR: Number of compressed output entries
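To make the layout concrete, here is a minimal, hypothetical parser for one such entry; the case class and field names are illustrative, not part of the project's code:

```scala
// Illustrative parser for a single input.dataset entry (names are assumptions).
case class RfaEntry(id: Int, splitMap: Int, splitFactor: Int,
                    map1: Int, map2: Int,
                    filter1: Int, f1: Int,
                    filter2: Int, f2: Int,
                    reduce: Int, reduceFactor: Int)

def parseEntry(line: String): RfaEntry = {
  // Fields are semicolon-separated, in the order documented above.
  val Array(id, sm, sf, m1, m2, fl1, f1, fl2, f2, r, rf) =
    line.split(";").map(_.trim.toInt)
  RfaEntry(id, sm, sf, m1, m2, fl1, f1, fl2, f2, r, rf)
}

// Example: parseEntry("1;5;8;5;5;5;50;5;50;5;4").splitFactor == 8
```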
- Start the Apache Spark cluster
- Set SPARK_HOME environment variable
$ export SPARK_HOME=/path/to/spark
- Change directory to RFA-Spark home:
$ cd RFA-Spark
- Run:
$ ./run.sh <spark-master-url> <num-executors> <total-executor-cores>
Where:
- spark-master-url: The master URL for the cluster
- num-executors: Number of Apache Spark executors requested on the cluster
- total-executor-cores: Total number of cores requested on the cluster
- Example:
$ ./run.sh spark://hostname:7077 1 2
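For reference, run.sh presumably wraps a spark-submit invocation along the following lines; the jar path, the flags, and how the two executor arguments are mapped onto Spark settings are assumptions here, so check the script itself for the authoritative command:

```sh
# Hypothetical sketch of what run.sh might submit; the jar name and the
# flag mapping are assumptions -- see run.sh for the real invocation.
"$SPARK_HOME"/bin/spark-submit \
  --master "$1" \
  --conf spark.executor.instances="$2" \
  --total-executor-cores "$3" \
  rfa-spark-project/target/rfa-spark.jar
```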
- Change directory to rfa-spark-project:
$ cd RFA-Spark/rfa-spark-project
- Build with Maven (by default, the packaged jar is written under target/):
$ mvn package