Releases: Alipsa/matrix
Matrix Xchart 0.2.3
- add @CompileStatic to all 17 classes for performance and type safety (100% static compilation, no @CompileDynamic needed)
- complete empty test methods in HistogramChartTest (testDensityHistogram, testFrequencyHistogramCustom)
- add comprehensive GroovyDoc to all abstract classes (AbstractChart, AbstractXYChart, AbstractCategoryChart)
- add comprehensive edge case tests for improved test coverage
- replace Math calls with NumberExtension methods for idiomatic Groovy (Math.sqrt() → .sqrt())
- expand GroovyDoc on all public methods with detailed parameter descriptions and examples
- remove commented println debug statements
- fix IndexOutOfBoundsException in AbstractChart.makeFillTransparent() method
- fix multiple calculation issues in chart rendering
- update README version references and complete StickChart description
Matrix Tablesaw 0.2.2
Build Configuration Improvements
- Added compileTestJava configuration with deprecation and unchecked warnings enabled
- Added -Xlint:unchecked flag to compileGroovy for improved Groovy code quality checks
- Build configuration now consistent with matrix-core module standards
Code Quality Improvements
- Added @SuppressWarnings("unchecked") annotation to TableUtil.createColumn() method to properly handle intentional unchecked generic casts
- Removed duplicate BigDecimalColumn type check in classForColumnType() method (dead code removal)
Bug Fixes & Improvements
- BigDecimalColumn enhancements:
- Fixed asBytes() method to use UTF-8 encoding explicitly instead of platform default charset, ensuring consistent byte representation across all platforms
- Cleaned up asBytes() method documentation and removed outdated TODO comments
- Extended toBigDecimal() method to handle additional Number subtypes: BigDecimal (now returned as-is without conversion, which prevents precision loss from an unnecessary double conversion); AtomicInteger (converted via get() for precision); AtomicLong (converted via get() for precision); DoubleAccumulator (converted via doubleValue())
- Added comprehensive Javadoc explaining conversion behavior for all Number types
- Improved test coverage to properly exercise toBigDecimal(Number) conversion path for BigDecimal inputs
- Updated test assertions to use UTF-8 encoding for deterministic byte array comparisons
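The conversion rules listed above can be illustrated with a small JDK-only sketch. The class and method names here are hypothetical and this is not the actual BigDecimalColumn source; in particular, the fallback through doubleValue() for all remaining Number types is an assumption.

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper illustrating the conversion rules; not the library's code.
final class NumberConversionSketch {

    static BigDecimal toBigDecimal(Number value) {
        if (value == null) {
            return null;
        }
        if (value instanceof BigDecimal) {
            // Returned as-is: no detour through double, so no precision is lost.
            return (BigDecimal) value;
        }
        if (value instanceof AtomicInteger) {
            return BigDecimal.valueOf(((AtomicInteger) value).get());
        }
        if (value instanceof AtomicLong) {
            // get() keeps all 64 bits; doubleValue() would round large values.
            return BigDecimal.valueOf(((AtomicLong) value).get());
        }
        // DoubleAccumulator and, as an assumption, every other Number type.
        return BigDecimal.valueOf(value.doubleValue());
    }

    static byte[] asBytes(BigDecimal value) {
        // Explicit charset: the result does not depend on the platform default.
        return value.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(toBigDecimal(new AtomicLong(Long.MAX_VALUE)));
        System.out.println(asBytes(new BigDecimal("12.50")).length);
    }
}
```

Passing a BigDecimal through unchanged matters most for values with more significant digits than a double can hold.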
Documentation
- Enhanced Javadoc/Groovydoc documentation:
- BigDecimalColumnFormatter - Added comprehensive class documentation with usage examples, documented all factory methods, constructors, and formatting methods
- GdataFrameJoiner - Added class documentation explaining join types, documented all join method variants with parameter descriptions
- Verified existing documentation in BigDecimalComparator, XlsxWriteOptions, and GdataFrameReader
- All public APIs now have production-quality documentation
Testing
- Added JaCoCo code coverage reporting infrastructure
- Current coverage: 54% instruction coverage, 58% branch coverage
- Coverage thresholds: 50% overall, 15% per class (baseline to prevent regression)
- Coverage reports available in HTML and XML formats
- Excluded low-coverage infrastructure classes from strict requirements
- Added test coverage for atomic type conversions in BigDecimalColumn
- Added test for BigDecimal precision preservation
- All 85 tests passing (2 new tests added)
Deprecations
- Deprecated OdsReadOptions.builder(Reader reader) - ODS is a binary format, not text-based
- Deprecated OdsReadOptions.builderFromString(String contents) - ODS is a binary format, not text-based
- Deprecated OdsReadOptions.builderFromUrl(String url) - ODS is a binary format, not text-based
- Note: These deprecated methods will be removed in v0.3.0
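To see why text-oriented entry points are a poor fit for a binary format, the following JDK-only sketch may help. It relies on the fact that an .ods package is a ZIP archive; the class and method names are hypothetical and unrelated to OdsReadOptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: binary content does not survive a detour through text.
final class BinaryVersusTextSketch {

    // First four bytes of a ZIP local file header ("PK" followed by 0x03 0x04).
    static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    static boolean looksLikeZip(byte[] content) {
        return content.length >= ZIP_MAGIC.length
                && Arrays.equals(Arrays.copyOf(content, ZIP_MAGIC.length), ZIP_MAGIC);
    }

    // Decodes the bytes as UTF-8 text and encodes them again, the way a
    // Reader- or String-based API implicitly would.
    static boolean survivesTextRoundTrip(byte[] content) {
        byte[] roundTripped = new String(content, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(content, roundTripped);
    }

    public static void main(String[] args) {
        byte[] zipLike = {0x50, 0x4B, 0x03, 0x04, (byte) 0xFF, (byte) 0x80};
        System.out.println(looksLikeZip(zipLike));
        System.out.println(survivesTextRoundTrip(zipLike));
    }
}
```

Bytes that are not valid UTF-8 are replaced during decoding, so the archive can no longer be reconstructed from the resulting String.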
Dependency Updates
- com.github.miachm.sods:SODS [1.6.8 -> 1.7.0]
- org.apache.poi:poi-ooxml [5.4.1 -> 5.5.1]
Matrix Sql 2.3.0
- add option to control whether column names are quoted when creating a table
- add an execute method to MatrixSql to run arbitrary sql (update, delete, insert etc.)
- MatrixSqlFactory.create attempts to infer and set the JDBC driver when missing
- MatrixSql connection lifecycle fixes (reconnect after close)
- safer, prepared-statement updates with match-column validation
- ResultSet improvements: updateRow is a no-op for detached sets, null-safe primitive/stream getters, strict unwrap contract
- close metadata ResultSets for table discovery utilities
- Dependency upgrades:
- commons-io:commons-io [2.20.0 -> 2.21.0]
- se.alipsa.groovy:data-utils [2.0.3 -> 2.0.4]
- se.alipsa:maven-3.9.11-utils [1.0.0 -> 1.1.0]
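As an illustration of what "prepared-statement updates with match-column validation" and optional column quoting could look like, here is a minimal sketch that only builds the SQL text. The class, method, and parameter names are hypothetical and not the MatrixSql API. Values are left as ? placeholders to be bound through a PreparedStatement.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical SQL builder; not the MatrixSql implementation.
final class UpdateSqlSketch {

    static String buildUpdateSql(String table, List<String> columns,
                                 List<String> matchColumns, boolean quoteColumnNames) {
        if (matchColumns.isEmpty()) {
            throw new IllegalArgumentException("At least one match column is required");
        }
        // Validation: every match column must be a known column.
        for (String match : matchColumns) {
            if (!columns.contains(match)) {
                throw new IllegalArgumentException("Unknown match column: " + match);
            }
        }
        List<String> setColumns = columns.stream()
                .filter(column -> !matchColumns.contains(column))
                .collect(Collectors.toList());
        if (setColumns.isEmpty()) {
            throw new IllegalArgumentException("No columns left to update");
        }
        String setClause = setColumns.stream()
                .map(column -> name(column, quoteColumnNames) + " = ?")
                .collect(Collectors.joining(", "));
        String whereClause = matchColumns.stream()
                .map(column -> name(column, quoteColumnNames) + " = ?")
                .collect(Collectors.joining(" AND "));
        return "UPDATE " + table + " SET " + setClause + " WHERE " + whereClause;
    }

    // Standard SQL quoting: wrap in double quotes and double any embedded quote.
    private static String name(String column, boolean quote) {
        return quote ? "\"" + column.replace("\"", "\"\"") + "\"" : column;
    }

    public static void main(String[] args) {
        System.out.println(buildUpdateSql("person", List.of("id", "name", "age"), List.of("id"), true));
    }
}
```

The sketch binds values in the order SET columns first, then match columns. The table name is passed through unchanged here, so a real implementation would need to validate it separately.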
Matrix Spreadsheet 2.3.0
Major architectural refactoring with significant performance improvements
Breaking Changes
- removed POI and SODS implementations - FastExcel is now the single XLSX backend, FastOds is the single ODS backend
- removed ExcelImplementation and OdsImplementation enums - no implementation selection needed
- explicitly reject legacy .xls files (XLSX only)
- deprecated SpreadsheetExporter in favor of SpreadsheetWriter
New Features
- add append/replace support for existing XLSX and ODS files (preserves sheets and metadata)
- add flexible start position support when writing data (e.g., write to cell B5)
- add map-based multi-sheet API: writeSheets(Map<String, Position>)
- add new ODS streaming writer/appender with table attributes/column reuse for styling
- add profiling support for ODS operations via -Dmatrix.spreadsheet.ods.profile=true
- XLSX append now inherits sheetFormatPr, column widths, page margins, and fixes relId collisions
- add comprehensive sheet name sanitization with automatic de-duplication
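Sheet name sanitization with de-duplication can be sketched as follows. This is an assumption about the general approach, not the library's implementation: it applies the well-known Excel constraints (at most 31 characters, none of the characters \ / ? * [ ] :, case-insensitive uniqueness) and appends a numeric suffix on collisions. The replacement character, the fallback name, and the suffix format are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sanitizer; suffix format and fallback name are assumptions.
final class SheetNameSketch {

    static final int MAX_LENGTH = 31; // Excel's sheet name length limit

    static String sanitize(String name) {
        // Replace the characters Excel forbids in sheet names.
        String cleaned = name == null ? "" : name.replaceAll("[\\\\/?*\\[\\]:]", "_").trim();
        if (cleaned.isEmpty()) {
            cleaned = "Sheet";
        }
        return cleaned.length() > MAX_LENGTH ? cleaned.substring(0, MAX_LENGTH) : cleaned;
    }

    static List<String> uniqueNames(List<String> requested) {
        Set<String> used = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String name : requested) {
            String base = sanitize(name);
            String candidate = base;
            int counter = 2;
            // Sheet names compare case-insensitively, so track lower-cased names.
            while (!used.add(candidate.toLowerCase(Locale.ROOT))) {
                String suffix = "_" + counter++;
                int keep = Math.min(base.length(), MAX_LENGTH - suffix.length());
                candidate = base.substring(0, keep) + suffix;
            }
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(uniqueNames(List.of("a/b", "a:b", "A_B")));
    }
}
```

Without the de-duplication step, two different requested names could sanitize to the same sheet name, so that one sheet silently overwrites the other.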
Performance Improvements
- ODS read performance: 65-80% faster (medium files: 4.86s → 1.43s, large files: 262s → 53s)
- switched to Aalto StAX parser for 64% speedup
- adaptive row capacity sizing to minimize ArrayList resizing
- type-aware value extraction with switch dispatch
- optimized trailing empty row detection
- Null Object pattern for profiling eliminates branching overhead (~14% improvement)
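The Null Object pattern mentioned in the last item can be sketched as below. Only the system property name comes from these notes; the interface, class names, and method signatures are hypothetical. The idea is that callers always invoke the profiler and never test a flag, because the disabled case is an implementation whose methods do nothing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical profiler types illustrating the Null Object pattern.
final class ProfilingSketch {

    interface Profiler {
        void record(String phase, long nanos);
        Map<String, Long> totals();
    }

    // Null Object: same interface, empty behavior, shared singleton.
    static final class NoOpProfiler implements Profiler {
        static final NoOpProfiler INSTANCE = new NoOpProfiler();

        @Override
        public void record(String phase, long nanos) {
            // intentionally empty
        }

        @Override
        public Map<String, Long> totals() {
            return Map.of();
        }
    }

    static final class RecordingProfiler implements Profiler {
        private final Map<String, Long> totals = new LinkedHashMap<>();

        @Override
        public void record(String phase, long nanos) {
            totals.merge(phase, nanos, Long::sum);
        }

        @Override
        public Map<String, Long> totals() {
            return totals;
        }
    }

    // The decision is made once, here, instead of at every call site.
    static Profiler fromSystemProperty() {
        return Boolean.getBoolean("matrix.spreadsheet.ods.profile")
                ? new RecordingProfiler()
                : NoOpProfiler.INSTANCE;
    }

    public static void main(String[] args) {
        Profiler profiler = fromSystemProperty();
        long start = System.nanoTime();
        profiler.record("demo", System.nanoTime() - start);
        System.out.println(profiler.totals());
    }
}
```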
Bug Fixes
- fix missing return statement in ValueExtractor.getDouble() (percentage parsing)
- fix 1-based sheet indexing consistency across all importers
- fix sheet name collision prevention (sanitization could cause silent data loss)
- fix invalid sheet number handling in URL imports
- fix null row guards in FExcelReader
- fix percentage parsing to be locale-independent
- fix race condition in SpreadsheetExporter static field sharing
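The two percentage-parsing fixes above (the missing return and the locale dependence) concern a kind of logic that can be sketched in a few lines. This is a hypothetical stand-in, not the ValueExtractor code: it parses with BigDecimal, which always expects a dot as the decimal separator, rather than with a formatter tied to the default locale.

```java
import java.math.BigDecimal;

// Hypothetical parser; illustrates locale-independent percentage handling.
final class PercentageSketch {

    static double parsePercentage(String text) {
        String trimmed = text.trim();
        if (!trimmed.endsWith("%")) {
            throw new NumberFormatException("Not a percentage: " + text);
        }
        String number = trimmed.substring(0, trimmed.length() - 1).trim();
        // The computed value must actually be returned; dropping this return
        // is the kind of bug the first fix above describes.
        return new BigDecimal(number).movePointLeft(2).doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(parsePercentage("12.5%"));
    }
}
```

Because the BigDecimal constructor ignores the default locale, "12.5%" yields the same result on a machine configured for a comma decimal separator.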
Code Quality
- add column count validation to all write methods
- add robust cleanup for temp files and XML stream resources
- add XXE protection with hardened XML parsing
- replace 15+ println statements with proper logging
- remove ~500 lines of dead code (POI, SODS implementations)
- extract duplicate header building logic (DRY improvements)
- comprehensive test coverage: 79.74% (105 tests passing)
- add benchmarking suite for performance validation
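For the XXE hardening item, a common way to configure a StAX parser is shown below. It uses the standard javax.xml.stream properties with the JDK's built-in factory; whether matrix-spreadsheet sets exactly these properties is an assumption, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper showing standard StAX hardening properties.
final class HardenedStaxSketch {

    static XMLInputFactory hardenedFactory() {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Do not process DTDs and never resolve external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        return factory;
    }

    static int countStartElements(String xml) {
        try {
            XMLStreamReader reader = hardenedFactory().createXMLStreamReader(new StringReader(xml));
            try {
                int count = 0;
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        count++;
                    }
                }
                return count;
            } finally {
                // Release parser resources even if parsing fails.
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countStartElements("<rows><row/><row/></rows>"));
    }
}
```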
Dependencies
- remove org.apache.poi:poi and org.apache.poi:poi-ooxml
- remove com.github.miachm.sods:SODS
- remove org.apache.logging.log4j:log4j-api (migrated to matrix-core Logger)
- add com.fasterxml:aalto-xml 1.3.4 (high-performance StAX parser)
- upgrade com.github.javaparser:javaparser-core [3.26.4 -> 3.27.0]
- migrate from log4j to matrix-core Logger (supports slf4j if present, otherwise System.out/err)
Matrix Smile 0.1.0
Initial release providing comprehensive integration between Matrix and Smile (Statistical Machine Intelligence and Learning Engine).
Core Features
- DataframeConverter: Bidirectional conversion between Matrix and Smile DataFrame with support for 18 data types
- SmileUtil: Pandas-like utility functions for data exploration and manipulation
- Statistical summary (describe), column information (info), frequency tables
- Sampling (random, by count, by fraction, with seed)
- Head/tail operations, null detection and counting
- Gsmile Extension Module: Natural Groovy syntax extensions for Matrix and DataFrame
- Matrix extensions: toSmileDataFrame(), smileDescribe(), smileSample()
- DataFrame extensions: toMatrix(), subscript operators (getAt), filtering, iteration
- Comprehensive test coverage (24 extension method tests)
Machine Learning Wrappers
- SmileClassifier: Wrappers for classification algorithms
- Logistic Regression, Decision Trees, Random Forest, Gradient Boosted Trees
- Support Vector Machines, K-Nearest Neighbors, Naive Bayes, AdaBoost
- Model training, prediction, and evaluation with confusion matrices
- SmileRegression: Wrappers for regression algorithms
- Linear Regression, Ridge Regression, LASSO, Elastic Net
- Regression Trees, Gradient Boosted Trees, Random Forest
- Model fitting, prediction, and RMSE calculation
- SmileCluster: Wrappers for clustering algorithms
- K-Means, Hierarchical Clustering, DBSCAN, DENCLUE, CLARANS
- Cluster assignment and centroids calculation
- SmileDimensionality: Dimensionality reduction techniques
- PCA (Principal Component Analysis), MDS (Multidimensional Scaling)
- t-SNE (t-Distributed Stochastic Neighbor Embedding)
Statistical Analysis (SmileStats)
- Probability Distributions:
- Discrete: Binomial, Geometric, Poisson, Hypergeometric
- Continuous: Normal, Exponential, Gamma, Beta, Chi-Squared, T, F, Weibull
- PDF, CDF, quantile, and random sample generation
- Hypothesis Testing:
- t-tests (one-sample, two-sample, paired)
- Chi-squared test, F-test, Kolmogorov-Smirnov test
- Correlation tests (Pearson, Spearman, Kendall) with significance testing
- Correlation Analysis:
- Correlation matrices with p-values
- Support for Pearson, Spearman, and Kendall correlation methods
- Random Sampling: Generate random samples from various distributions
Feature Engineering (SmileFeatures)
- Data Loading: Load datasets from Smile's built-in data repository
- Feature Scaling:
- StandardScaler (z-score normalization with fit/transform workflow)
- MinMaxScaler (range normalization)
- MaxAbsScaler (maximum absolute value scaling)
- RobustScaler (median and IQR-based scaling)
- Feature Encoding:
- One-hot encoding for categorical variables
- Label encoding for ordinal variables
- Feature Selection:
- Sum, difference, product, ratio feature creation
- Imputation: Missing value handling with mean, median, mode, or constant strategies
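The fit/transform workflow named for StandardScaler can be illustrated with a plain-Java sketch. It is not the SmileFeatures API: the class name and signatures are hypothetical, and the use of the population standard deviation (dividing by n) is an assumption.

```java
// Hypothetical z-score scaler; not the SmileFeatures implementation.
final class StandardScalerSketch {

    private double mean;
    private double std;
    private boolean fitted;

    // fit: learn mean and standard deviation from the training column.
    StandardScalerSketch fit(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("Cannot fit on an empty column");
        }
        double sum = 0;
        for (double value : values) {
            sum += value;
        }
        mean = sum / values.length;
        double squaredDeviations = 0;
        for (double value : values) {
            squaredDeviations += (value - mean) * (value - mean);
        }
        // Population standard deviation (assumption; a library may divide by n - 1).
        std = Math.sqrt(squaredDeviations / values.length);
        fitted = true;
        return this;
    }

    // transform: apply the learned parameters, possibly to new data.
    double[] transform(double[] values) {
        if (!fitted) {
            throw new IllegalStateException("Call fit() before transform()");
        }
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant column has std 0; map it to 0 instead of dividing by zero.
            scaled[i] = std == 0 ? 0 : (values[i] - mean) / std;
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] train = {1, 2, 3};
        double[] scaled = new StandardScalerSketch().fit(train).transform(new double[]{4});
        System.out.println(scaled[0]);
    }
}
```

Separating fit from transform lets the parameters learned on training data be reused unchanged on test data.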
Code Quality
- Comprehensive @CompileStatic annotation throughout for type safety and performance
- Modern Groovy 5.0+ switch expression syntax (arrow operators)
- Extensive GroovyDoc documentation (207 JavaDoc blocks)
- Comprehensive test coverage (274 tests across 10 test files, 100% test file coverage)
- Idiomatic Groovy code (as double instead of .doubleValue(), NumberExtension usage)
Dependencies
- com.github.haifengl:smile-core 4.4.2
- Requires Java 21 (Smile 4.x not compatible with Java 22+)
- Requires Groovy 5.0+ (for modern switch expression syntax)
Matrix Parquet 0.4.0
- remove parquet-carpet dependency (MatrixCarpetIO) - now using native Parquet implementation
- add support for nested structures: structs (POJOs, maps) and repeated fields (arrays)
- add URL, Path, InputStream, and byte[] input support to MatrixParquetReader (API consistency with matrix-csv and matrix-json)
- add BigDecimal precision and scale control in MatrixParquetWriter write methods
- add in-memory write support via InMemoryOutputFile and InMemoryPositionOutputStream (eliminates temporary files)
- add timezone support for timestamp handling (optional parameter in reader/writer methods)
- MatrixParquetWriter can now write to either a file or directory (using matrix name for filename)
- use matrixName as Parquet schema name if present
- add @CompileStatic to MatrixParquetReader for performance and type safety
- add comprehensive input validation to both reader and writer (null checks, empty matrix, file existence)
- add safeFileName sanitization for directory targets (strips path separators and unsafe characters)
- fix bug: time precision schema/implementation mismatch (now uses MICROS for timestamps, MILLIS for time)
- fix bug: BigDecimal schema inference incorrectly set minimum scale to 2
- extract magic strings to constants for maintainability
- add comprehensive GroovyDoc to MatrixParquetReader and MatrixParquetWriter
- add extensive test coverage including edge cases, validation, and round-trip verification
- cache reflection metadata for struct handling (performance optimization)
- upgrade dependencies
- org.apache.hadoop:hadoop-common [3.4.1 -> 3.4.2]
- org.apache.hadoop:hadoop-mapreduce-client-core [3.4.1 -> 3.4.2]
- org.apache.parquet:parquet-column [1.15.2 -> 1.16.0]
- org.apache.parquet:parquet-hadoop [1.15.2 -> 1.16.0]
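The BigDecimal schema inference fix noted above is about deriving a decimal column's precision and scale from the data instead of forcing a minimum scale. One plausible way to do that is sketched here. It is an assumption about the approach, not the MatrixParquetWriter code, and the names are hypothetical.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical inference of a DECIMAL(precision, scale) type from column values.
final class DecimalTypeSketch {

    // Returns {precision, scale} wide enough to hold every non-null value.
    static int[] inferPrecisionAndScale(List<BigDecimal> values) {
        int maxScale = 0;
        int maxIntegerDigits = 0;
        for (BigDecimal value : values) {
            if (value == null) {
                continue;
            }
            // Digits after the decimal point; negative scales (e.g. 1E+3) count as 0.
            maxScale = Math.max(maxScale, value.scale());
            // Digits before the decimal point; 0 for values such as 0.001.
            maxIntegerDigits = Math.max(maxIntegerDigits, value.precision() - value.scale());
        }
        int precision = Math.max(maxIntegerDigits + maxScale, 1);
        return new int[]{precision, maxScale};
    }

    public static void main(String[] args) {
        int[] type = inferPrecisionAndScale(
                List.of(new BigDecimal("1.5"), new BigDecimal("123.456"), new BigDecimal("10")));
        System.out.println("DECIMAL(" + type[0] + ", " + type[1] + ")");
    }
}
```

With this rule a column of whole numbers keeps scale 0 rather than being widened to two decimals.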
Matrix Json 2.1.2
- deprecate JsonImporter and JsonExporter in favor of JsonReader and JsonWriter
- change implementation to use Jackson streaming API instead of JsonSlurper for improved memory efficiency (O(1) memory regardless of JSON size)
- add duplicate key detection in flatten() to prevent silent data loss
- add URL and Path support to JsonImporter (matching matrix-csv API)
- add static export methods to JsonExporter for API consistency
- add comprehensive test coverage for edge cases (empty arrays, single rows, null handling)
- add input validation to prevent writing empty or null matrices
- fix JsonImporter mutation and iteration assumptions
- fix TOCTOU race condition in JsonWriter file creation
- replace broad exception catches with specific exception types
- upgrade dependencies
- com.fasterxml.jackson.core:jackson-core [2.20.0 -> 2.20.1]
- com.fasterxml.jackson.core:jackson-databind [2.20.0 -> 2.20.1]
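Duplicate key detection during flattening, mentioned above, guards against two different paths producing the same column name. A minimal sketch of the idea follows. It is not the matrix-json flatten() implementation; the names are hypothetical and the dot separator is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical flattener that fails loudly instead of overwriting a key.
final class FlattenSketch {

    static Map<String, Object> flatten(Map<String, ?> source) {
        Map<String, Object> target = new LinkedHashMap<>();
        flattenInto("", source, target);
        return target;
    }

    private static void flattenInto(String prefix, Map<?, ?> source, Map<String, Object> target) {
        for (Map.Entry<?, ?> entry : source.entrySet()) {
            String key = prefix.isEmpty()
                    ? String.valueOf(entry.getKey())
                    : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Nested object: recurse with the extended path as prefix.
                flattenInto(key, (Map<?, ?>) value, target);
            } else if (target.containsKey(key)) {
                // A silent put() here would discard the earlier value.
                throw new IllegalArgumentException("Duplicate key after flattening: " + key);
            } else {
                target.put(key, value);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("a", Map.of("b", 1));
        row.put("c", 2);
        System.out.println(flatten(row));
    }
}
```

A collision arises, for example, when an object contains both a nested a.b path and a literal top-level key named "a.b".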
Matrix Gsheets 0.1.1
Move actual implementation for GsheetsReader and GsheetsWriter and utility methods to GsUtil so that GsImporter and GsExporter are just empty wrappers.
Matrix Groovy Ext 0.1.0
Initial version
- Number extensions allowing for more idiomatic Groovy code.
Matrix csv 2.2.2
- deprecate CsvImporter and CsvExporter in favor of CsvReader and CsvWriter
- upgrade commons-csv from 1.14.0 to 1.14.1
- fix bug: empty CSV files now handled correctly (no more IndexOutOfBoundsException)
- fix typos in error messages ("extected" → "expected")
- add comprehensive test coverage for edge cases (empty CSV, header-only, single row/column, mismatched columns)
- add null validation to CsvExporter methods
- add GroovyDoc documentation to CsvImporter and CsvExporter classes and methods