Data quality profiling and exploratory data analysis for Pandas and Spark DataFrames in one line of code.
Updated Nov 8, 2024 - Python
Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.
A Data Analysis Board in Vue.
PySpark-Tutorial provides basic algorithms using PySpark.
vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
Powerful & Easy way for big data discovery
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
A data-driven method combining symbolic regression and compressed sensing for accurate & interpretable models.
Graph Sampling is a Python package containing various approaches that sample the original graph according to different sample sizes.
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Use CH-UI to work with data from your self-hosted ClickHouse instance through a user-friendly interface. CH-UI is a modern and feature-rich user interface for ClickHouse databases. It offers an intuitive platform for querying ClickHouse databases and visualizing metrics about your instance.
Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.
This repository covers courses taken on Coursera. All the answers given were written by me.
The Pandata scalable open-source analysis stack
The course covers big data fundamentals, processes, technologies, platform ecosystem, and management for practical application development.
Big data projects implemented by Maniram Yadav
Egis - a handy Ruby interface for AWS Athena
Real-time Packet Observation Tool
Visual, interactive queries against big databases