Skip to content

manvith2003/serverless-spark-iot-framework

Repository files navigation

🚀 Serverless Spark IoT Framework

Real-time IoT data processing framework using Apache Spark with serverless architecture, reinforcement learning optimization, and edge-cloud synchronization.

License: MIT Python 3.11 PySpark 3.5


🌟 Novel Contributions

This framework introduces several first-of-their-kind innovations:

  1. CRDT-based Edge-Cloud State Synchronization with LSTM predictive prefetching
  2. Reinforcement Learning (DQN/PPO) Resource Allocation for multi-objective optimization
  3. Adaptive 4-Tier Shuffle Optimization with IoT-specific data temperature prediction
  4. Cross-Cloud Portability Layer enabling seamless AWS/GCP/Azure deployment
  5. Privacy-Preserving Analytics with federated learning and homomorphic encryption

🏗️ Architecture

IoT Devices → Edge Processing → MQTT/Kafka → Serverless Spark → Output
              (CRDT Sync)      (Ingestion)   (RL Optimizer)    (Dashboard)
                                             (4-Tier Shuffle)

🛠 Tech Stack

  • Processing: Apache Spark 3.5 (Serverless)
  • Streaming: Kafka, MQTT (Mosquitto)
  • ML/RL: PyTorch, TensorFlow, Stable-Baselines3
  • Cloud: AWS EMR, GCP Dataproc, Azure Synapse
  • State: Redis, DynamoDB, Cassandra
  • Orchestration: Kubernetes, Docker, Terraform

📋 Prerequisites

  • OS: macOS (M2/M3), Linux, Windows
  • RAM: 8GB minimum, 16GB recommended
  • Python: 3.11+
  • Java: 11+
  • Docker: Latest version
  • Conda: Mambaforge/Miniconda

🚀 Quick Start

1. Clone Repository

git clone https://github.com/YOUR_USERNAME/serverless-spark-iot-framework.git
cd serverless-spark-iot-framework

2. Create Conda Environment

conda env create -f environment.yml
conda activate spark-iot

3. Start Docker Services

docker-compose up -d

4. Verify Setup

python tests/test_setup.py

📁 Project Structure

serverless-spark-iot-framework/
├── ingestion/              # MQTT/Kafka data ingestion
├── edge_processing/        # Edge computing with CRDT sync
├── spark_core/             # Serverless Spark processing
├── optimization/           # RL-based resource allocation
├── state_management/       # Distributed state handling
├── cross_cloud/            # Multi-cloud abstraction
├── dashboard/              # Real-time visualization
├── evaluation/             # Benchmarking & metrics
└── docs/                   # Documentation & research paper

🎯 Use Cases

  • Smart Cities: Real-time traffic and pollution monitoring
  • Healthcare: Patient vitals analysis with privacy preservation
  • Industry 4.0: Predictive maintenance and anomaly detection
  • Agriculture: Sensor-based crop health monitoring

📊 Performance (Preliminary)

Metric Baseline Our Framework Improvement
Cost 1.0× 0.16× 6.2× reduction
Latency 500ms 275ms 45% faster
Edge Sync N/A <10ms Novel

📝 Research Paper

This work is part of a research project at Indian Institute of Information Technology Kottayam.

Authors: Manvith M Advisor: Dr. Shajulin Benedict


📄 License

MIT License - see LICENSE file


🤝 Contributing

Contributions welcome! Please open an issue or submit a pull request.


📧 Contact

For questions or collaboration: [[email protected]]


Built with ❤️ for the IoT and Serverless Computing community

About

Serverless Spark Framework for Real-Time IoT Applications

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published