Unified Data Processing for Batch and Streaming with Identical Syntax
DataFlow.NET is an open-source C# framework that unifies batch and streaming data processing. Write your processing logic once using the Cases/SelectCase/ForEachCase pattern, then apply it to static files, real-time streams, or any other data source without changing the processing code.
The Vision: Process CSV files during development, then deploy the same code to handle live data streams in production.
// This SAME processing logic works for both batch and streaming:
var processLogs = (data) => data
.Cases(log => log.Level == "ERROR", log => log.Level == "WARNING")
.SelectCase(error => $"🚨 {error.Message}", warning => $"⚠️ {warning.Message}")
.ForEachCase(SendAlert, LogWarning)
.AllCases();
// BATCH: Process historical logs
processLogs(Read.csv<LogEntry>("historical_logs.csv"));
// STREAMING: Process live logs (IDENTICAL CODE!)
await processLogs(liveLogStream.AsAsyncEnumerable());
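The idea of writing per-item logic once and running it over both batch and streaming inputs can be pictured in plain C#, independent of DataFlow.NET. The following is only a sketch under stated assumptions: the `LogEntry` shape, `Format`, `Process`, and `AsStream` are hypothetical names chosen for illustration, and the library's own mechanism may differ.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical types for illustration; not DataFlow.NET's API.
public record LogEntry(string Level, string Message);

public static class SharedLogicSketch
{
    // Per-item logic, written once. Returns null for entries to ignore.
    public static string? Format(LogEntry log) => log.Level switch
    {
        "ERROR" => $"error: {log.Message}",
        "WARNING" => $"warning: {log.Message}",
        _ => null
    };

    // Batch overload: ordinary LINQ over an in-memory or file-backed sequence.
    public static IEnumerable<string> Process(IEnumerable<LogEntry> logs) =>
        logs.Select(Format).OfType<string>();

    // Streaming overload: the same Format function applied as items arrive.
    public static async IAsyncEnumerable<string> Process(IAsyncEnumerable<LogEntry> logs)
    {
        await foreach (var log in logs)
        {
            if (Format(log) is string line)
                yield return line;
        }
    }

    // Helper that replays a batch as an async stream, standing in for a live source.
    public static async IAsyncEnumerable<LogEntry> AsStream(IEnumerable<LogEntry> logs)
    {
        foreach (var log in logs)
        {
            await Task.Yield();
            yield return log;
        }
    }

    // Runs both paths over the same data and returns both results.
    public static async Task<(List<string> Batch, List<string> Streamed)> Demo()
    {
        var logs = new[] { new LogEntry("ERROR", "disk full"), new LogEntry("INFO", "started") };
        var batch = Process(logs).ToList();
        var streamed = new List<string>();
        await foreach (var line in Process(AsStream(logs)))
            streamed.Add(line);
        return (batch, streamed);
    }
}
```

Both overloads delegate to the same `Format` function, so the batch and streaming paths cannot drift apart in behavior.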
DataFlow.NET's Supra Category Pattern lets you focus on what matters while gracefully handling unexpected data:
// Only process errors and warnings, ignore everything else
logs.Cases(IsError, IsWarning) // Info/Debug logs become "supra category"
.SelectCase(HandleError, HandleWarning) // Only transform what you care about
.ForEachCase(AlertError, LogWarning) // Supra category gracefully ignored
.AllCases();
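To make the supra category idea concrete, here is a minimal sketch in plain LINQ. It is not DataFlow.NET's implementation: the hypothetical `Categorize` helper tags each item with the index of the first predicate it matches, and items matching no predicate receive an index one past the last predicate, which plays the role of the supra category.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of DataFlow.NET.
public static class CategorizeSketch
{
    // Lazily pairs each item with the index of the first matching predicate.
    // Items that match nothing get index == predicates.Length (the "supra category").
    public static IEnumerable<(int Category, T Item)> Categorize<T>(
        this IEnumerable<T> source, params Func<T, bool>[] predicates)
    {
        foreach (var item in source)
        {
            int index = Array.FindIndex(predicates, p => p(item));
            yield return (index < 0 ? predicates.Length : index, item);
        }
    }

    public static List<string> Demo()
    {
        var logs = new[] { "ERROR disk full", "INFO started", "WARNING low memory" };

        return logs
            .Categorize(l => l.StartsWith("ERROR"), l => l.StartsWith("WARNING"))
            .Where(x => x.Category < 2)   // drop the supra category (index 2)
            .Select(x => x.Category == 0 ? $"alert: {x.Item}" : $"log: {x.Item}")
            .ToList();
    }
}
```

Because unmatched items still carry a category index, a caller can choose to ignore them, count them, or route them elsewhere without a second pass over the data.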
- Identical Syntax: Same fluent API for batch files and real-time streams
- Zero Migration Cost: Convert batch processing to streaming without code changes
- Async-First Design: Built on IAsyncEnumerable<T> with sync compatibility
- Stream Merging: Combine multiple data sources with AsyncEnumerableMerger<T>
- Multiple Formats: Text, CSV, JSON, YAML, and custom CFG Grammar files
- Memory Efficient: Lazy evaluation processes huge files without memory issues
- Stream-Ready: All readers work with both files and live data sources
- Cases Pattern: Categorize data with multiple predicates in one operation
- SelectCase: Transform each category with different logic
- ForEachCase: Execute side effects per category (logging, alerts, etc.)
- Supra Category: Gracefully handle unmatched data
- Single-Pass Processing: Transform, filter, and write in one iteration
- Lazy Evaluation: Process data as needed, not all at once
- Channel-Based Streaming: High-performance async data flow
- Memory Conscious: Handle massive datasets efficiently
- Intuitive Regex: Simplified pattern matching with human-friendly syntax
- Fluent API: Chain operations naturally with method chaining
- LINQ Integration: Works seamlessly with existing .NET code
- Rich Extensions: Comprehensive IEnumerable and IAsyncEnumerable extensions
// Read, transform, and write CSV in one fluent chain
Read.csv<Person>("people.csv")
.Where(p => p.Age >= 18)
.Select(p => { p.Name = p.Name.ToUpper(); return p; })
.WriteCSV("adults_uppercase.csv");
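The lazy, single-pass behavior described in the feature list can be illustrated with the standard library alone. This sketch does not use DataFlow.NET; `LazyFileSketch` and `CopyMatching` are hypothetical names, and the point is only that `File.ReadLines` yields lines on demand while `File.WriteAllLines` consumes the sequence as it writes.

```csharp
using System.IO;
using System.Linq;

// Illustration only; not part of DataFlow.NET.
public static class LazyFileSketch
{
    // Reads, filters, and writes in one pass. File.ReadLines streams lines
    // on demand, so the input file is never loaded into memory as a whole.
    // The input and output paths must differ.
    public static void CopyMatching(string input, string output, string keyword) =>
        File.WriteAllLines(output, File.ReadLines(input).Where(l => l.Contains(keyword)));
}
```

By contrast, `File.ReadAllLines` would materialize every line before filtering, which is what a lazy pipeline avoids on large files.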
// Merge multiple log sources and process in real-time
var logStream = new AsyncEnumerableMerger<LogEntry>(
webServerLogs, databaseLogs, authServiceLogs
);
await logStream
.Cases(log => log.Level == "ERROR", log => log.Level == "WARNING")
.SelectCase(
error => $"🚨 CRITICAL: {error.Message}",
warning => $"⚠️ WARNING: {warning.Message}"
)
.ForEachCase(
async critical => await alertSystem.SendAsync(critical),
async warning => await logger.LogAsync(warning)
)
.AllCases()
.WriteTextAsync("processed_logs.txt");
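For readers curious how several async sources can be merged into one stream, the following sketch uses System.Threading.Channels. It is a hypothetical `Merge` helper written for illustration, not the implementation of AsyncEnumerableMerger<T>, and it omits cancellation for brevity, so the pumps keep running if the consumer stops early.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

// Illustration only: a hypothetical merger, not DataFlow.NET's AsyncEnumerableMerger<T>.
public static class MergeSketch
{
    public static async IAsyncEnumerable<T> Merge<T>(params IAsyncEnumerable<T>[] sources)
    {
        var channel = Channel.CreateUnbounded<T>();

        // One pump per source writes items into the shared channel as they arrive.
        var pumps = sources.Select(async source =>
        {
            await foreach (var item in source)
                await channel.Writer.WriteAsync(item);
        }).ToArray();

        // Complete the channel once every source is exhausted (or one has failed).
        _ = Task.WhenAll(pumps).ContinueWith(t => channel.Writer.Complete(t.Exception));

        await foreach (var item in channel.Reader.ReadAllAsync())
            yield return item;
    }

    // Small async source used by the demo.
    static async IAsyncEnumerable<int> Range(int start, int count)
    {
        for (int i = 0; i < count; i++)
        {
            await Task.Yield();
            yield return start + i;
        }
    }

    public static async Task<List<int>> Demo()
    {
        var merged = new List<int>();
        await foreach (var n in Merge(Range(0, 3), Range(100, 3)))
            merged.Add(n);
        return merged; // six items; the interleaving order is not deterministic
    }
}
```

An unbounded channel keeps the sketch short. A bounded channel would add backpressure when the consumer is slower than the producers.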
// Process only what you care about, ignore the rest
Read.text("mixed_logs.txt")
.Cases(
line => line.Contains("ERROR"),
line => line.Contains("WARNING")
// INFO, DEBUG, TRACE logs become supra category (ignored)
)
.SelectCase(
error => $"ERROR: {error}", // Handle errors
warning => $"WARNING: {warning}" // Handle warnings
// No selector for supra category = graceful ignoring
)
.ForEachCase(
error => errorWriter.WriteLine(error),
warning => warningWriter.WriteLine(warning)
// Supra category automatically skipped
)
.AllCases()
.Where(line => line != null) // Filter out ignored items
.WriteText("filtered_logs.txt");
// Extract and process HTTP status codes from logs
Read.text("access.log")
.Map($"HTTP/1.1\" {NUMS.As("StatusCode")} {NUMS.As("ResponseSize")}")
.Cases("StatusCode")
.SelectCase(code =>
code.StartsWith("4") || code.StartsWith("5") ?
$"Error: {code}" : $"Success: {code}")
.AllCases()
.Display();
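For comparison, the same extraction can be written with the standard System.Text.RegularExpressions API. This sketch assumes that NUMS stands for a run of one or more digits and that `.As("Name")` corresponds to a named capture group. Both are assumptions about the library's simplified syntax, and `StatusCodeSketch` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

// Illustration only; not part of DataFlow.NET.
public static class StatusCodeSketch
{
    // Assumption: NUMS matches one or more digits, so each field becomes a named \d+ group.
    static readonly Regex Pattern =
        new Regex(@"HTTP/1\.1"" (?<StatusCode>\d+) (?<ResponseSize>\d+)");

    // Classifies one access-log line by its HTTP status code.
    public static string Classify(string line)
    {
        var match = Pattern.Match(line);
        if (!match.Success) return "No match";

        var code = match.Groups["StatusCode"].Value;
        return code.StartsWith('4') || code.StartsWith('5')
            ? $"Error: {code}"
            : $"Success: {code}";
    }
}
```

For example, `StatusCodeSketch.Classify("\"GET /index.html HTTP/1.1\" 404 512")` returns `"Error: 404"`.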
// Collect from multiple sources
var dataStream = new AsyncEnumerableMerger<SensorData>(
temperatureSensors, humiditySensors, pressureSensors
);
// Process with familiar syntax
await dataStream
.Cases(data => data.IsCritical(), data => data.IsWarning())
.SelectCase(ProcessCritical, ProcessWarning)
.AllCases()
.WriteCSVAsync("sensor_data.csv");
// Start with batch processing during development
var testResults = Read.csv<Order>("test_orders.csv")
.Cases(IsUrgent, IsStandard)
.SelectCase(ProcessUrgent, ProcessStandard)
.AllCases();
// Deploy with streaming (ZERO code changes!)
var liveResults = await orderStream
.Cases(IsUrgent, IsStandard) // Same predicates
.SelectCase(ProcessUrgent, ProcessStandard) // Same transformations
.AllCases(); // Same result extraction
# NuGet Package (coming soon)
dotnet add package DataFlow.NET
# Or clone from GitHub
git clone https://github.com/yourusername/DataFlow.NET.git
using DataFlow.NET;
// Read CSV, process, and write results
Read.csv<Customer>("customers.csv")
.Cases(c => c.IsVIP, c => c.IsActive)
.SelectCase(
vip => $"VIP: {vip.Name}",
active => $"Active: {active.Name}",
inactive => $"Inactive: {inactive.Name}" // Supra category: customers matching neither predicate
)
.AllCases()
.WriteText("customer_status.txt");
- Technical Documentation: Deep dive into advanced features
- API Reference: Complete method documentation
- Examples: Real-world usage scenarios
- Getting Started Guide: Step-by-step tutorials
- Issues: Bug reports and feature requests
- Discussions: Community Q&A
- Email: [email protected]
- Open Source: Apache License 2.0 for free software use; see LICENSE
- Commercial: For commercial software use, see LICENSE_NOTICE
DataFlow.NET - Where Data Processing Meets Simplicity