Releases: Alipsa/matrix

Matrix Xchart 0.2.3

31 Jan 19:05

  • add @CompileStatic to all 17 classes for performance and type safety (100% static compilation, no @CompileDynamic needed)
  • complete empty test methods in HistogramChartTest (testDensityHistogram, testFrequencyHistogramCustom)
  • add comprehensive GroovyDoc to all abstract classes (AbstractChart, AbstractXYChart, AbstractCategoryChart)
  • add comprehensive edge case tests for improved test coverage
  • replace Math calls with NumberExtension methods for idiomatic Groovy (Math.sqrt() → .sqrt())
  • expand GroovyDoc on all public methods with detailed parameter descriptions and examples
  • remove commented println debug statements
  • fix IndexOutOfBoundsException in AbstractChart.makeFillTransparent() method
  • fix multiple calculation issues in chart rendering
  • update README version references and complete StickChart description

Matrix Tablesaw 0.2.2

31 Jan 18:57

Build Configuration Improvements

  • Added compileTestJava configuration with deprecation and unchecked warnings enabled
  • Added -Xlint:unchecked flag to compileGroovy for improved Groovy code quality checks
  • Build configuration now consistent with matrix-core module standards

Code Quality Improvements

  • Added @SuppressWarnings("unchecked") annotation to TableUtil.createColumn() method to properly handle intentional unchecked generic casts
  • Removed duplicate BigDecimalColumn type check in classForColumnType() method (dead code removal)

Bug Fixes & Improvements

  • BigDecimalColumn enhancements:
    • Fixed asBytes() method to use UTF-8 encoding explicitly instead of platform default charset, ensuring consistent byte representation across all platforms
    • Cleaned up asBytes() method documentation and removed outdated TODO comments
    • Extended toBigDecimal() method to handle additional Number subtypes:
      • BigDecimal - now returns the value as-is without conversion (prevents precision loss from unnecessary double conversion)
      • AtomicInteger - converted via get() for precision
      • AtomicLong - converted via get() for precision
      • DoubleAccumulator - converted via doubleValue()
    • Added comprehensive Javadoc explaining conversion behavior for all Number types
    • Improved test coverage to properly exercise toBigDecimal(Number) conversion path for BigDecimal inputs
    • Updated test assertions to use UTF-8 encoding for deterministic byte array comparisons
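The conversion rules listed above can be pictured with a minimal Java sketch. `BigDecimalConversion` is a hypothetical class, not the matrix-tablesaw source. Three parts are assumptions: the `BigInteger` branch, the fallbacks for boxed types the notes do not mention, and the idea that `asBytes()` encodes the value's `toString()` form.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.DoubleAccumulator;

// Hypothetical sketch of the conversion rules; not the matrix-tablesaw source.
final class BigDecimalConversion {

  private BigDecimalConversion() {}

  static BigDecimal toBigDecimal(Number n) {
    if (n == null) {
      return null;
    }
    if (n instanceof BigDecimal bd) {
      return bd; // returned as-is: no lossy round trip through double
    }
    if (n instanceof AtomicInteger ai) {
      return BigDecimal.valueOf(ai.get()); // exact integral value
    }
    if (n instanceof AtomicLong al) {
      return BigDecimal.valueOf(al.get()); // exact integral value
    }
    if (n instanceof DoubleAccumulator acc) {
      return BigDecimal.valueOf(acc.doubleValue());
    }
    // Assumption: the fallbacks below are illustrative, not documented behavior
    if (n instanceof BigInteger bi) {
      return new BigDecimal(bi);
    }
    if (n instanceof Integer || n instanceof Long || n instanceof Short || n instanceof Byte) {
      return BigDecimal.valueOf(n.longValue());
    }
    return BigDecimal.valueOf(n.doubleValue());
  }

  // Assumption: the value's string form is what gets encoded. Fixing UTF-8 makes the
  // bytes identical on every platform, whatever the default charset is.
  static byte[] asBytes(BigDecimal value) {
    return value.toString().getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(toBigDecimal(new BigDecimal("0.10"))); // 0.10 (scale kept)
    System.out.println(toBigDecimal(new AtomicLong(42L)));    // 42
  }
}
```

Returning a `BigDecimal` input unchanged keeps both its unscaled value and its scale (`0.10` stays `0.10`). A detour through `doubleValue()` would not guarantee that.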

Documentation

  • Enhanced Javadoc/Groovydoc documentation:
    • BigDecimalColumnFormatter - Added comprehensive class documentation with usage examples, documented all factory methods, constructors, and formatting methods
    • GdataFrameJoiner - Added class documentation explaining join types, documented all join method variants with parameter descriptions
    • Verified existing documentation in BigDecimalComparator, XlsxWriteOptions, and GdataFrameReader
  • All public APIs now have production-quality documentation

Testing

  • Added JaCoCo code coverage reporting infrastructure
    • Current coverage: 54% instruction coverage, 58% branch coverage
    • Coverage thresholds: 50% overall, 15% per class (baseline to prevent regression)
    • Coverage reports available in HTML and XML formats
    • Excluded low-coverage infrastructure classes from strict requirements
  • Added test coverage for atomic type conversions in BigDecimalColumn
  • Added test for BigDecimal precision preservation
  • All 85 tests passing (2 new tests added)

Deprecations

  • Deprecated OdsReadOptions.builder(Reader reader) - ODS is a binary format, not text-based
  • Deprecated OdsReadOptions.builderFromString(String contents) - ODS is a binary format, not text-based
  • Deprecated OdsReadOptions.builderFromUrl(String url) - ODS is a binary format, not text-based
  • Note: These deprecated methods will be removed in v0.3.0
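The reason for these deprecations can be made concrete. An .ods file is an OpenDocument package, which is a ZIP archive. Its first bytes are therefore the ZIP local-file-header signature `0x50 0x4B 0x03 0x04` ("PK" plus two control bytes), not text, and any `Reader`- or `String`-based path would first decode those bytes as characters. The sketch below is a hypothetical helper, not part of matrix-tablesaw. It checks the signature on a raw `InputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of matrix-tablesaw.
final class OdsSignature {

  // ZIP local file header signature: 0x50 0x4B 0x03 0x04 ("PK" plus two bytes)
  private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

  private OdsSignature() {}

  /** Reads the first four raw bytes and compares them with the ZIP signature. */
  static boolean looksLikeZipPackage(InputStream in) {
    try {
      byte[] head = in.readNBytes(ZIP_MAGIC.length);
      return Arrays.equals(head, ZIP_MAGIC); // false for short or non-ZIP input
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] zipStart = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
    System.out.println(looksLikeZipPackage(new ByteArrayInputStream(zipStart))); // true
    byte[] text = "a,b,c".getBytes(StandardCharsets.UTF_8);
    System.out.println(looksLikeZipPackage(new ByteArrayInputStream(text)));     // false
  }
}
```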

Dependency Updates

  • com.github.miachm.sods:SODS [1.6.8 -> 1.7.0]
  • org.apache.poi:poi-ooxml [5.4.1 -> 5.5.1]

Matrix Sql 2.3.0

31 Jan 18:49

  • add option to control whether column names are quoted when creating a table
  • add an execute method to MatrixSql to run arbitrary sql (update, delete, insert etc.)
  • MatrixSqlFactory.create attempts to infer and set the JDBC driver when missing
  • MatrixSql connection lifecycle fixes (reconnect after close)
  • safer, prepared-statement updates with match-column validation
  • ResultSet improvements: updateRow is a no-op for detached sets, null-safe primitive/stream getters, strict unwrap contract
  • close metadata ResultSets for table discovery utilities
  • Dependency upgrades:
    • commons-io:commons-io [2.20.0 -> 2.21.0]
    • se.alipsa.groovy:data-utils [2.0.3 -> 2.0.4]
    • se.alipsa:maven-3.9.11-utils [1.0.0 -> 1.1.0]
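Here is a hedged Java sketch of what prepared-statement updates with match-column validation and optional identifier quoting can look like. `UpdateSqlBuilder` and its methods are hypothetical, not the MatrixSql API. The sketch validates the match columns and quotes identifiers in SQL-standard style only when asked. Every value stays a `?` placeholder.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; not the matrix-sql implementation or API.
final class UpdateSqlBuilder {

  private UpdateSqlBuilder() {}

  /** SQL-standard identifier quoting: wrap in double quotes and double any embedded quote. */
  static String identifier(String name, boolean quoteNames) {
    return quoteNames ? "\"" + name.replace("\"", "\"\"") + "\"" : name;
  }

  /** Builds a parameterized UPDATE; values are bound later, never concatenated into the SQL. */
  static String buildUpdate(String table, List<String> columns, List<String> matchColumns,
                            boolean quoteNames) {
    if (matchColumns.isEmpty()) {
      throw new IllegalArgumentException("At least one match column is required");
    }
    for (String m : matchColumns) {
      if (!columns.contains(m)) {
        throw new IllegalArgumentException("Match column not in table: " + m);
      }
    }
    List<String> setColumns = columns.stream().filter(c -> !matchColumns.contains(c)).toList();
    if (setColumns.isEmpty()) {
      throw new IllegalArgumentException("No columns left to update");
    }
    String set = setColumns.stream()
        .map(c -> identifier(c, quoteNames) + " = ?")
        .collect(Collectors.joining(", "));
    String where = matchColumns.stream()
        .map(c -> identifier(c, quoteNames) + " = ?")
        .collect(Collectors.joining(" AND "));
    return "UPDATE " + identifier(table, quoteNames) + " SET " + set + " WHERE " + where;
  }

  public static void main(String[] args) {
    List<String> cols = List.of("id", "name", "age");
    System.out.println(buildUpdate("person", cols, List.of("id"), false));
    // UPDATE person SET name = ?, age = ? WHERE id = ?
    System.out.println(buildUpdate("person", cols, List.of("id"), true));
    // UPDATE "person" SET "name" = ?, "age" = ? WHERE "id" = ?
  }
}
```

The values would then be bound with `PreparedStatement.setObject` in placeholder order (1-based): the `SET` columns first, then the match columns. Quoting matters when column names contain spaces or reserved words. It also matters when the database folds unquoted names to a different case.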

Matrix Spreadsheet 2.3.0

31 Jan 17:31

Major architectural refactoring with significant performance improvements

Breaking Changes

  • removed POI and SODS implementations - FastExcel is now the single XLSX backend, FastOds is the single ODS backend
  • removed ExcelImplementation and OdsImplementation enums - no implementation selection needed
  • explicitly reject legacy .xls files (XLSX only)
  • deprecated SpreadsheetExporter in favor of SpreadsheetWriter

New Features

  • add append/replace support for existing XLSX and ODS files (preserves sheets and metadata)
  • add flexible start position support when writing data (e.g., write to cell B5)
  • add map-based multi-sheet API: writeSheets(Map<String, Position>)
  • add new ODS streaming writer/appender with table attributes/column reuse for styling
  • add profiling support for ODS operations via -Dmatrix.spreadsheet.ods.profile=true
  • XLSX append now inherits sheetFormatPr, column widths, page margins, and fixes relId collisions
  • add comprehensive sheet name sanitization with automatic de-duplication
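Writing to a start position such as B5 implies translating an A1-style reference into row and column numbers. Here is a minimal sketch, assuming 1-based rows and columns. `A1Reference` is a hypothetical helper, and the actual start-position API in matrix-spreadsheet may accept positions differently.

```java
// Hypothetical helper; matrix-spreadsheet's start-position API may differ.
final class A1Reference {

  /** A cell position with 1-based row and column numbers. */
  record Cell(int row, int column) {}

  private A1Reference() {}

  /** Parses an A1-style reference such as "B5"; letters are case-insensitive. */
  static Cell parse(String ref) {
    // At most three letters (XFD is the last XLSX column) followed by a row number >= 1
    if (ref == null || !ref.matches("[A-Za-z]{1,3}[1-9][0-9]{0,6}")) {
      throw new IllegalArgumentException("Not an A1 cell reference: " + ref);
    }
    int i = 0;
    int column = 0;
    while (Character.isLetter(ref.charAt(i))) {
      // Bijective base 26: A=1 ... Z=26, AA=27
      column = column * 26 + (Character.toUpperCase(ref.charAt(i)) - 'A' + 1);
      i++;
    }
    return new Cell(Integer.parseInt(ref.substring(i)), column);
  }

  /** Inverse of the column part of parse: 1 -> "A", 28 -> "AB". */
  static String columnName(int column) {
    StringBuilder sb = new StringBuilder();
    for (int c = column; c > 0; c = (c - 1) / 26) {
      sb.insert(0, (char) ('A' + (c - 1) % 26));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(parse("B5"));    // Cell[row=5, column=2]
    System.out.println(columnName(28)); // AB
  }
}
```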

Performance Improvements

  • ODS read performance: 65-80% faster (medium files: 4.86s → 1.43s, large files: 262s → 53s)
  • switched to Aalto StAX parser for 64% speedup
  • adaptive row capacity sizing to minimize ArrayList resizing
  • type-aware value extraction with switch dispatch
  • optimized trailing empty row detection
  • Null Object pattern for profiling eliminates branching overhead (~14% improvement)
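Here is a sketch of the Null Object pattern applied to optional profiling. The profiler is chosen once from the `matrix.spreadsheet.ods.profile` system property mentioned above. Hot code then calls it unconditionally instead of testing a flag on every row. The interface and class names are hypothetical, not the matrix-spreadsheet internals.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the Null Object pattern; names are not the matrix-spreadsheet API.
final class ProfilingDemo {

  interface Profiler {
    void record(String phase, long nanos);
  }

  /** Null Object: does nothing, so callers never need an "if (profiling)" branch. */
  static final Profiler NO_OP = (phase, nanos) -> { };

  /** Real implementation that accumulates elapsed time per phase. */
  static final class RecordingProfiler implements Profiler {
    final Map<String, Long> totals = new LinkedHashMap<>();

    @Override
    public void record(String phase, long nanos) {
      totals.merge(phase, nanos, Long::sum);
    }
  }

  /** Picks the implementation once, from the system property named in the release notes. */
  static Profiler fromSystemProperty() {
    return Boolean.getBoolean("matrix.spreadsheet.ods.profile") ? new RecordingProfiler() : NO_OP;
  }

  static long sumRows(int rows, Profiler profiler) {
    long start = System.nanoTime();
    long sum = 0;
    for (int i = 0; i < rows; i++) {
      sum += i;
    }
    profiler.record("sumRows", System.nanoTime() - start); // unconditional call
    return sum;
  }

  public static void main(String[] args) {
    Profiler profiler = fromSystemProperty();
    System.out.println(sumRows(1_000, profiler)); // 499500
    if (profiler instanceof RecordingProfiler r) {
      System.out.println(r.totals.keySet());
    }
  }
}
```

When profiling is off, the call site only ever sees the empty implementation. The JIT can then usually inline the call away, which is the common argument for this pattern over a boolean check.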

Bug Fixes

  • fix missing return statement in ValueExtractor.getDouble() (percentage parsing)
  • fix 1-based sheet indexing consistency across all importers
  • fix sheet name collision prevention (sanitization could cause silent data loss)
  • fix invalid sheet number handling in URL imports
  • fix null row guards in FExcelReader
  • fix percentage parsing to be locale-independent
  • fix race condition in SpreadsheetExporter static field sharing

Code Quality

  • add column count validation to all write methods
  • add robust cleanup for temp files and XML stream resources
  • add XXE protection with hardened XML parsing
  • replace 15+ println statements with proper logging
  • remove ~500 lines of dead code (POI, SODS implementations)
  • extract duplicate header building logic (DRY improvements)
  • comprehensive test coverage: 79.74% (105 tests passing)
  • add benchmarking suite for performance validation
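Here is a minimal example of hardened StAX parsing. It uses the standard `javax.xml.stream` properties that disable DTD processing and external entity resolution. The parser is whatever implementation `XMLInputFactory.newFactory()` resolves: the JDK's built-in one, unless another StAX provider such as Aalto is on the class path. That Aalto honours both properties in the same way is an assumption worth verifying, and `SafeXml` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper illustrating XXE hardening with standard StAX properties.
final class SafeXml {

  private SafeXml() {}

  static XMLInputFactory hardenedFactory() {
    XMLInputFactory factory = XMLInputFactory.newFactory();
    // Disable DTD processing and external entity resolution
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
    factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
    return factory;
  }

  /** Counts start elements, closing the reader even if parsing fails. */
  static int countElements(String xml) {
    try {
      XMLStreamReader reader = hardenedFactory().createXMLStreamReader(new StringReader(xml));
      try {
        int count = 0;
        while (reader.hasNext()) {
          if (reader.next() == XMLStreamConstants.START_ELEMENT) {
            count++;
          }
        }
        return count;
      } finally {
        reader.close();
      }
    } catch (XMLStreamException e) {
      throw new IllegalStateException("Invalid XML", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(countElements("<table><row/><row/></table>")); // 3
  }
}
```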

Dependencies

  • remove org.apache.poi:poi and org.apache.poi:poi-ooxml
  • remove com.github.miachm.sods:SODS
  • remove org.apache.logging.log4j:log4j-api (migrated to matrix-core Logger)
  • add com.fasterxml:aalto-xml 1.3.4 (high-performance StAX parser)
  • upgrade com.github.javaparser:javaparser-core [3.26.4 -> 3.27.0]
  • migrate from log4j to matrix-core Logger (supports slf4j if present, otherwise System.out/err)

Matrix Smile 0.1.0

31 Jan 17:18

Initial release providing comprehensive integration between Matrix and Smile (Statistical Machine Intelligence and Learning Engine).

Core Features

  • DataframeConverter: Bidirectional conversion between Matrix and Smile DataFrame with support for 18 data types
  • SmileUtil: Pandas-like utility functions for data exploration and manipulation
    • Statistical summary (describe), column information (info), frequency tables
    • Sampling (random, by count, by fraction, with seed)
    • Head/tail operations, null detection and counting
  • Gsmile Extension Module: Natural Groovy syntax extensions for Matrix and DataFrame
    • Matrix extensions: toSmileDataFrame(), smileDescribe(), smileSample()
    • DataFrame extensions: toMatrix(), subscript operators (getAt), filtering, iteration
    • Comprehensive test coverage (24 extension method tests)
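A seed is what makes sampling reproducible. The sketch below draws row indices without replacement using a partial Fisher-Yates shuffle driven by `java.util.Random(seed)`. It is an illustration only, not the algorithm SmileUtil or Smile actually use, and `SeededSampler` is hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch; not the SmileUtil sampling implementation.
final class SeededSampler {

  private SeededSampler() {}

  /** Returns k distinct row indices from 0..n-1; the same seed always yields the same sample. */
  static int[] sampleIndices(int n, int k, long seed) {
    if (k < 0 || k > n) {
      throw new IllegalArgumentException("k must be between 0 and n");
    }
    int[] idx = new int[n];
    for (int i = 0; i < n; i++) {
      idx[i] = i;
    }
    Random random = new Random(seed);
    // Partial Fisher-Yates: only the first k positions need to be shuffled
    for (int i = 0; i < k; i++) {
      int j = i + random.nextInt(n - i);
      int tmp = idx[i];
      idx[i] = idx[j];
      idx[j] = tmp;
    }
    return Arrays.copyOf(idx, k);
  }

  /** Sampling by fraction rounds n * fraction to the nearest whole number of rows. */
  static int[] sampleFraction(int n, double fraction, long seed) {
    if (fraction < 0.0 || fraction > 1.0) {
      throw new IllegalArgumentException("fraction must be in [0, 1]");
    }
    return sampleIndices(n, (int) Math.round(n * fraction), seed);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(sampleIndices(10, 3, 42L)));
    System.out.println(Arrays.toString(sampleFraction(10, 0.5, 42L)));
  }
}
```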

Machine Learning Wrappers

  • SmileClassifier: Wrappers for classification algorithms
    • Logistic Regression, Decision Trees, Random Forest, Gradient Boosted Trees
    • Support Vector Machines, K-Nearest Neighbors, Naive Bayes, AdaBoost
    • Model training, prediction, and evaluation with confusion matrices
  • SmileRegression: Wrappers for regression algorithms
    • Linear Regression, Ridge Regression, LASSO, Elastic Net
    • Regression Trees, Gradient Boosted Trees, Random Forest
    • Model fitting, prediction, and RMSE calculation
  • SmileCluster: Wrappers for clustering algorithms
    • K-Means, Hierarchical Clustering, DBSCAN, DENCLUE, CLARANS
    • Cluster assignment and centroids calculation
  • SmileDimensionality: Dimensionality reduction techniques
    • PCA (Principal Component Analysis), MDS (Multidimensional Scaling)
    • t-SNE (t-Distributed Stochastic Neighbor Embedding)
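The RMSE reported for regression models is the square root of the mean squared difference between observed and predicted values. Here is a plain Java version of that formula. `RegressionMetrics` is a hypothetical name, and SmileRegression may compute the value through Smile's own validation utilities instead.

```java
// Hypothetical sketch; SmileRegression may compute RMSE through Smile's own utilities.
final class RegressionMetrics {

  private RegressionMetrics() {}

  /** Root mean squared error: sqrt(mean((actual - predicted)^2)). */
  static double rmse(double[] actual, double[] predicted) {
    if (actual.length == 0 || actual.length != predicted.length) {
      throw new IllegalArgumentException("arrays must be non-empty and of equal length");
    }
    double sumSquares = 0.0;
    for (int i = 0; i < actual.length; i++) {
      double residual = actual[i] - predicted[i];
      sumSquares += residual * residual;
    }
    return Math.sqrt(sumSquares / actual.length);
  }

  public static void main(String[] args) {
    double[] y = {1, 1, 1, 1};
    double[] yHat = {3, 3, 3, 3};
    System.out.println(rmse(y, yHat)); // 2.0
  }
}
```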

Statistical Analysis (SmileStats)

  • Probability Distributions:
    • Discrete: Binomial, Geometric, Poisson, Hypergeometric
    • Continuous: Normal, Exponential, Gamma, Beta, Chi-Squared, T, F, Weibull
    • PDF, CDF, quantile, and random sample generation
  • Hypothesis Testing:
    • t-tests (one-sample, two-sample, paired)
    • Chi-squared test, F-test, Kolmogorov-Smirnov test
    • Correlation tests (Pearson, Spearman, Kendall) with significance testing
  • Correlation Analysis:
    • Correlation matrices with p-values
    • Support for Pearson, Spearman, and Kendall correlation methods
  • Random Sampling: Generate random samples from various distributions
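The Pearson coefficient behind correlation matrices is r = sum((x_i - mean_x)(y_i - mean_y)) / sqrt(sum((x_i - mean_x)^2) * sum((y_i - mean_y)^2)). Its significance test uses t = r * sqrt((n - 2) / (1 - r^2)), compared against Student's t distribution with n - 2 degrees of freedom. Spearman's rho is the same formula applied to the ranks of the values. Below is a plain Java sketch of both formulas. The class is hypothetical, not the SmileStats API.

```java
// Hypothetical sketch; not the SmileStats API.
final class Correlation {

  private Correlation() {}

  /** Pearson's r; returns NaN when either input has zero variance. */
  static double pearson(double[] x, double[] y) {
    if (x.length != y.length || x.length < 2) {
      throw new IllegalArgumentException("need two equal-length arrays with at least 2 values");
    }
    double meanX = 0.0;
    double meanY = 0.0;
    for (int i = 0; i < x.length; i++) {
      meanX += x[i];
      meanY += y[i];
    }
    meanX /= x.length;
    meanY /= y.length;
    double sxy = 0.0;
    double sxx = 0.0;
    double syy = 0.0;
    for (int i = 0; i < x.length; i++) {
      double dx = x[i] - meanX;
      double dy = y[i] - meanY;
      sxy += dx * dy;
      sxx += dx * dx;
      syy += dy * dy;
    }
    return sxy / Math.sqrt(sxx * syy);
  }

  /** t statistic for H0: rho = 0, to be compared with Student's t on n - 2 degrees of freedom. */
  static double tStatistic(double r, int n) {
    return r * Math.sqrt((n - 2) / (1.0 - r * r));
  }

  public static void main(String[] args) {
    double r = pearson(new double[] {1, 2, 3, 4}, new double[] {2, 4, 6, 8});
    System.out.println(r); // 1.0
    System.out.println(tStatistic(0.5, 10));
  }
}
```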

Feature Engineering (SmileFeatures)

  • Data Loading: Load datasets from Smile's built-in data repository
  • Feature Scaling:
    • StandardScaler (z-score normalization with fit/transform workflow)
    • MinMaxScaler (range normalization)
    • MaxAbsScaler (maximum absolute value scaling)
    • RobustScaler (median and IQR-based scaling)
  • Feature Encoding:
    • One-hot encoding for categorical variables
    • Label encoding for ordinal variables
  • Feature Selection:
    • Sum, difference, product, ratio feature creation
  • Imputation: Missing value handling with mean, median, mode, or constant strategies
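Here is a sketch of the fit/transform workflow for z-score scaling. `fit` learns the mean and standard deviation from training data, and `transform` applies (x - mean) / std using those stored values. The class is hypothetical, not SmileFeatures' StandardScaler. Two choices are assumptions: the population standard deviation (dividing by n), and mapping constant columns to 0.

```java
import java.util.Arrays;

// Hypothetical sketch of a fit/transform z-score scaler; not SmileFeatures' StandardScaler.
final class ZScoreScaler {

  private double mean;
  private double std;
  private boolean fitted;

  /** Learns mean and population standard deviation (divide by n) from training values. */
  ZScoreScaler fit(double[] values) {
    if (values.length == 0) {
      throw new IllegalArgumentException("cannot fit on an empty array");
    }
    double sum = 0.0;
    for (double v : values) {
      sum += v;
    }
    mean = sum / values.length;
    double squares = 0.0;
    for (double v : values) {
      squares += (v - mean) * (v - mean);
    }
    std = Math.sqrt(squares / values.length);
    fitted = true;
    return this;
  }

  /** Applies (x - mean) / std with the fitted statistics; assumption: constant data maps to 0. */
  double[] transform(double[] values) {
    if (!fitted) {
      throw new IllegalStateException("call fit before transform");
    }
    double[] out = new double[values.length];
    for (int i = 0; i < values.length; i++) {
      out[i] = std == 0.0 ? 0.0 : (values[i] - mean) / std;
    }
    return out;
  }

  public static void main(String[] args) {
    double[] train = {1, 2, 3, 4, 5};
    ZScoreScaler scaler = new ZScoreScaler().fit(train);
    System.out.println(Arrays.toString(scaler.transform(new double[] {3, 6})));
  }
}
```

Fit on the training split, then reuse the same fitted object to transform the test split. This keeps test-set statistics out of the scaling.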

Code Quality

  • Comprehensive @CompileStatic annotation throughout for type safety and performance
  • Modern Groovy 5.0+ switch expression syntax (arrow operators)
  • Extensive GroovyDoc documentation (207 JavaDoc blocks)
  • Comprehensive test coverage (274 tests across 10 test files, 100% test file coverage)
  • Idiomatic Groovy code (as double instead of .doubleValue(), NumberExtension usage)

Dependencies

  • com.github.haifengl:smile-core 4.4.2
  • Requires Java 21 (Smile 4.x not compatible with Java 22+)
  • Requires Groovy 5.0+ (for modern switch expression syntax)

Matrix Parquet 0.4.0

31 Jan 17:08

  • remove parquet-carpet dependency (MatrixCarpetIO) - now using native Parquet implementation
  • add support for nested structures: structs (POJOs, maps) and repeated fields (arrays)
  • add URL, Path, InputStream, and byte[] input support to MatrixParquetReader (API consistency with matrix-csv and matrix-json)
  • add BigDecimal precision and scale control in MatrixParquetWriter write methods
  • add in-memory write support via InMemoryOutputFile and InMemoryPositionOutputStream (eliminates temporary files)
  • add timezone support for timestamp handling (optional parameter in reader/writer methods)
  • MatrixParquetWriter can now write to either a file or directory (using matrix name for filename)
  • use matrixName as Parquet schema name if present
  • add @CompileStatic to MatrixParquetReader for performance and type safety
  • add comprehensive input validation to both reader and writer (null checks, empty matrix, file existence)
  • add safeFileName sanitization for directory targets (strips path separators and unsafe characters)
  • fix bug: time precision schema/implementation mismatch (now uses MICROS for timestamps, MILLIS for time)
  • fix bug: BigDecimal schema inference incorrectly set minimum scale to 2
  • extract magic strings to constants for maintainability
  • add comprehensive GroovyDoc to MatrixParquetReader and MatrixParquetWriter
  • add extensive test coverage including edge cases, validation, and round-trip verification
  • cache reflection metadata for struct handling (performance optimization)
  • upgrade dependencies
    • org.apache.hadoop:hadoop-common [3.4.1 -> 3.4.2]
    • org.apache.hadoop:hadoop-mapreduce-client-core [3.4.1 -> 3.4.2]
    • org.apache.parquet:parquet-column [1.15.2 -> 1.16.0]
    • org.apache.parquet:parquet-hadoop [1.15.2 -> 1.16.0]
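The corrected time precision follows Parquet's logical types. A `TIMESTAMP` with unit `MICROS` is stored as an INT64 count of microseconds since the Unix epoch. A `TIME` with unit `MILLIS` is stored as an INT32 count of milliseconds since midnight. The conversions can be sketched in plain Java with a hypothetical helper. How matrix-parquet applies its optional timezone parameter is an assumption here, modelled as interpreting a `LocalDateTime` in a given zone.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper; shows the units only, not the matrix-parquet writer code.
final class ParquetTimeUnits {

  private ParquetTimeUnits() {}

  /** TIMESTAMP(MICROS): INT64 microseconds since 1970-01-01T00:00:00Z, sub-microsecond part dropped. */
  static long toEpochMicros(Instant instant) {
    // getNano() is always in [0, 999_999_999], so this floors correctly for pre-epoch instants too
    return Math.addExact(Math.multiplyExact(instant.getEpochSecond(), 1_000_000L),
        instant.getNano() / 1_000);
  }

  /** Assumption: a zone-less LocalDateTime is interpreted in the supplied zone. */
  static long toEpochMicros(LocalDateTime dateTime, ZoneId zone) {
    return toEpochMicros(dateTime.atZone(zone).toInstant());
  }

  /** TIME(MILLIS): INT32 milliseconds since midnight. */
  static int toMillisOfDay(LocalTime time) {
    return (int) (time.toNanoOfDay() / 1_000_000L);
  }

  public static void main(String[] args) {
    System.out.println(toEpochMicros(Instant.parse("1970-01-01T00:00:01.000001Z"))); // 1000001
    System.out.println(toEpochMicros(LocalDateTime.of(1970, 1, 1, 1, 0), ZoneOffset.ofHours(1))); // 0
    System.out.println(toMillisOfDay(LocalTime.of(1, 0, 0, 500_000_000))); // 3600500
  }
}
```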

Matrix Json 2.1.2

31 Jan 16:57

  • deprecate JsonImporter and JsonExporter in favor of JsonReader and JsonWriter
  • change implementation to use the Jackson streaming API instead of JsonSlurper for improved memory efficiency (the streaming parser's own memory use is O(1), independent of JSON size)
  • add duplicate key detection in flatten() to prevent silent data loss
  • add URL and Path support to JsonImporter (matching matrix-csv API)
  • add static export methods to JsonExporter for API consistency
  • add comprehensive test coverage for edge cases (empty arrays, single rows, null handling)
  • add input validation to prevent writing empty or null matrices
  • fix JsonImporter mutation and iteration assumptions
  • fix TOCTOU race condition in JsonWriter file creation
  • replace broad exception catches with specific exception types
  • upgrade dependencies
    • com.fasterxml.jackson.core:jackson-core [2.20.0 -> 2.20.1]
    • com.fasterxml.jackson.core:jackson-databind [2.20.0 -> 2.20.1]
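Here is a sketch of flattening with duplicate-key detection. Nested objects become dotted keys. A collision, such as a literal `"a.b"` key next to a nested `{"a": {"b": ...}}`, is reported instead of one value silently overwriting the other. The `.` separator, the exception type and the `JsonFlatten` name are assumptions, not the matrix-json implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the matrix-json flatten() implementation.
final class JsonFlatten {

  private JsonFlatten() {}

  /** Flattens nested maps into dotted keys, failing on any key collision. */
  static Map<String, Object> flatten(Map<String, ?> source) {
    Map<String, Object> out = new LinkedHashMap<>();
    flattenInto("", source, out);
    return out;
  }

  private static void flattenInto(String prefix, Map<String, ?> node, Map<String, Object> out) {
    for (Map.Entry<String, ?> entry : node.entrySet()) {
      String key = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
      Object value = entry.getValue();
      if (value instanceof Map<?, ?> child) {
        @SuppressWarnings("unchecked")
        Map<String, ?> nested = (Map<String, ?>) child;
        flattenInto(key, nested, out);
      } else if (out.containsKey(key)) {
        // Without this check the later value would silently replace the earlier one
        throw new IllegalArgumentException("Duplicate key after flattening: " + key);
      } else {
        out.put(key, value);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Object> person = new LinkedHashMap<>();
    person.put("name", "Ada");
    person.put("address", Map.of("city", "London"));
    System.out.println(flatten(person)); // {name=Ada, address.city=London}
  }
}
```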

Matrix Gsheets 0.1.1

31 Jan 16:49

Moved the actual implementation into GsheetsReader and GsheetsWriter, and the utility methods into GsUtil, so that GsImporter and GsExporter are now just wrappers with no logic of their own.

Matrix Groovy Ext 0.1.0

31 Jan 19:31

Initial version

  • Number extensions allowing for more idiomatic Groovy code.

Matrix csv 2.2.2

31 Jan 14:47

  • deprecate CsvImporter and CsvExporter in favor of CsvReader and CsvWriter
  • upgrade commons-csv from 1.14.0 to 1.14.1
  • fix bug: empty CSV files now handled correctly (no more IndexOutOfBoundsException)
  • fix typos in error messages ("extected" → "expected")
  • add comprehensive test coverage for edge cases (empty CSV, header-only, single row/column, mismatched columns)
  • add null validation to CsvExporter methods
  • add GroovyDoc documentation to CsvImporter and CsvExporter classes and methods
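Here is a hedged sketch of the empty-file fix. It checks for zero records before reading the header row, since reading row 0 of an empty file is a typical source of `IndexOutOfBoundsException`. `CsvRows` and the generated `c1, c2, ...` names are hypothetical choices. Rejecting rows with a mismatched column count is likewise an assumption, not matrix-csv's documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the matrix-csv CsvReader implementation.
final class CsvRows {

  record Table(List<String> header, List<List<String>> rows) {}

  private CsvRows() {}

  static Table fromRecords(List<List<String>> records, boolean firstRowIsHeader) {
    if (records.isEmpty()) {
      // Empty file: return an empty table instead of calling records.get(0)
      return new Table(List.of(), List.of());
    }
    int width = records.get(0).size();
    List<String> header = new ArrayList<>();
    if (firstRowIsHeader) {
      header.addAll(records.get(0));
    } else {
      for (int i = 1; i <= width; i++) {
        header.add("c" + i); // hypothetical default column names
      }
    }
    List<List<String>> data = firstRowIsHeader ? records.subList(1, records.size()) : records;
    for (int r = 0; r < data.size(); r++) {
      if (data.get(r).size() != width) {
        // Assumption: mismatched rows are rejected rather than padded
        throw new IllegalArgumentException("Row " + (r + 1) + " has " + data.get(r).size()
            + " columns, expected " + width);
      }
    }
    return new Table(header, new ArrayList<>(data));
  }

  public static void main(String[] args) {
    System.out.println(fromRecords(List.of(), true));
    System.out.println(fromRecords(List.of(List.of("id", "name")), true));
  }
}
```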