Matrix Core Module

The matrix-core module is the foundation of the Alipsa Matrix library. It provides the essential classes and functionality for working with tabular data. In this section, we'll explore the key components of the matrix-core module and how to use them effectively.

Understanding the Matrix Class

What is a Matrix?

A Matrix in the Alipsa library is a grid with a header and defined column types. You can think of it as an in-memory ResultSet or a data frame similar to those in R or pandas in Python. It provides a structured way to work with tabular data, making operations like filtering, transformation, and analysis more intuitive.

Matrix vs. Grid

The library provides two main classes for working with tabular data:

Matrix: A typed table with headers and defined column types
Grid: A uniformly typed table without headers

The Matrix class is more feature-rich and is the primary focus of this tutorial, while Grid provides a simpler structure for working with uniform data types.

Creating a Matrix

There are several ways to create a Matrix object, depending on your data source and requirements.

Creating from Groovy Code

The most common way to create a Matrix is using the builder pattern:

import se.alipsa.matrix.core.*

// Create a Matrix with data defined inline
def employees = [
    "employee": ['John Doe', 'Peter Smith', 'Jane Doe'],
    "salary": [21000, 23400, 26800],
    "startDate": ListConverter.toLocalDates(['2013-11-01', '2018-03-25', '2017-03-14']),
    "reviewPeriod": ListConverter.toYearMonths(['2020-01', '2019-04', '2018-10'])
]

def table = Matrix.builder().data(employees).build()

// Verify the data
assert table.salary[0] == 21000 // get index 0 from the column "salary"
assert table[0, 1] == 21000 // get row 0, column 1
assert table[0, 'salary'] == 21000 // get row index 0 from the column name salary
assert table['salary'][0] == 21000 // get the column and the first index
assert table.row(0).salary == 21000 // get the first row and then the salary column
assert table.row(0)[1] == 21000 // get the first row and then the second column

You can also create a Matrix by specifying columns and their types:

import se.alipsa.matrix.core.*
import java.time.LocalDate

// Create a Matrix with column definitions
def table = Matrix.builder().columns([
    'place': [1, 2, 3],
    'firstname': ['Lorena', 'Marianne', 'Lotte'],
    'start': ['2021-12-01', '2022-07-10', '2023-05-27']
])
    .types([String]*3)
    .build()

// Convert column types after creation
def table2 = table.convert([place: Integer, start: LocalDate])
println(table2.typeNames())  // Prints: [Integer, String, LocalDate]

Creating from a Result Set

When working with databases, you can create a Matrix directly from a JDBC ResultSet:

import se.alipsa.matrix.core.Matrix
import se.alipsa.groovy.datautil.SqlUtil

// Connect to a database and execute a query
SqlUtil.withInstance(dbUrl, dbUser, dbPasswd, dbDriver, this) { sql ->
    sql.query('SELECT * FROM PROJECT') { rs ->
        project = Matrix.builder().data(rs).build()
    }
}

// Now you can work with the project Matrix
println(project.content())

Creating from a CSV File

You can easily create a Matrix from a CSV file:

import se.alipsa.matrix.core.Matrix

// Create a Matrix from a CSV file
def table = Matrix.builder().data(new File('/some/path/foo.csv'), ';').build()

Accessing Data in a Matrix

The Matrix class provides multiple ways to access and manipulate data.

Using [] Notation

Data can be referenced using [] notation. For example, to get the content of the 3rd row and 2nd column:

def value = table[3, 2]  // Row 3, Column 2

You can also access data by column name:

def priceColumn = table["price"]  // Get the entire price column
def price = table[3, "price"]     // Get the price from row 3

If you pass only one argument, you get the column:

def priceColumn = table["price"]
// or alternatively
def priceColumn = table.price

Using Column Names

Matrix provides property-like access to columns:

// These are equivalent
def priceColumn = table.price
def priceColumn2 = table["price"]
def priceColumn3 = table.getAt("price")
def priceColumn4 = table.column("price")
def priceColumn5 = table.column(table.columnNames().indexOf("price"))
def priceColumn6 = table.columns().get(table.columnNames().indexOf("price"))

Using Row Indices

You can access rows by index:

def row = table.row(3)  // Get the 4th row (0-based indexing)
def value = row.price   // Get the price value from that row

Data Manipulation

Converting Data Types

The Matrix class provides several methods to convert column types:

Using a map of column names and types:

def convertedTable = table.convert([
    place: Integer,
    start: LocalDate
])

Using a list of column types:

def convertedTable = table.convert([Integer, String, LocalDate])

Using a closure for complex conversions:

def convertedTable = table.convert('place', Integer, { 
    String val = String.valueOf(it).trim()
    if (val == 'null' || val == ',' || val.isBlank()) return null
    return Integer.valueOf(val)
})

Transforming Data

You can transform data using the apply method:

import se.alipsa.matrix.core.*
import java.time.*

def data = [
    'place': ['1', '2', '3', ','],
    'firstname': ['Lorena', 'Marianne', 'Lotte', 'Chris'],
    'start': ['2021-12-01', '2022-07-10', '2023-05-27', '2023-01-10']
]

def table = Matrix.builder('emp')
    .columns(data)
    .build()
    .convert(place: Integer, start: LocalDate)

// Add 10 days to the start dates
def table2 = table.apply("start", { startDate ->
    startDate.plusDays(10)
})

println(table2.content())

Adding a column

You can add new columns to an existing Matrix using addColumn. A common use case is to derive a new column from an existing one. Here is an example:

import se.alipsa.matrix.core.Matrix
import java.time.LocalDate

import static se.alipsa.matrix.core.ListConverter.*

import java.time.YearMonth

def empData = Matrix.builder().columns(
    emp_id: 1..5,
    emp_name: ["Rick", "Dan", "Michelle", "Ryan", "Gary"],
    salary: [623.3, 515.2, 611.0, 729.0, 843.25],
    start_date: toLocalDates("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27"))
    .types([int, String, Number, LocalDate]).build()

// Add a yearMonth column derived from the start_date column
def table = empData
    .clone() // clone to avoid modifying the original (optional)
    .addColumn("yearMonth", YearMonth, toYearMonths(empData.start_date))

println table.toMarkdown()

Which will output

emp_id	emp_name	salary	start_date	yearMonth
1	Rick	623.3	2012-01-01	2012-01
2	Dan	515.2	2013-09-23	2013-09
3	Michelle	611.0	2014-11-15	2014-11
4	Ryan	729.0	2014-05-11	2014-05
5	Gary	843.25	2015-03-27	2015-03

Subsetting Data

You can create subsets of your data using various methods:

// Using findIndexValues to select rows
def rows = table.rows(table['place'].findIndexValues { it > 1 })
assertEquals(2, rows.size())

// Using subset method
def subSet = table.subset('place', { it > 1 })
assert table.rows(1..2) == subSet.grid().rowList

There is also a filter {} alias with the same behavior, which can read more naturally for dataframe-style code:

def highPlaces = table.filter { it.place > 1 }
assert table.rows(1..2) == highPlaces.grid().rowList

Performing Calculations

Using the Apply Method

The apply method allows you to perform calculations on columns:

import se.alipsa.matrix.core.*

def account = Matrix.builder('account')
    .columns([
        'id': ['1', '2', '3', '4'],
        'balance': [12323, 23400, 45932, 77200]
    ])
    .build()

def ir = Matrix.builder('interest rates')
    .columns([
        id: [1, 2, 3, 4],
        interestRate: [0.034, 0.022, 0.019, 0.028]
    ])
    .build()

def accountAndInterest = account.clone().withMatrixName('accountAndInterest')
accountAndInterest['interestAmount', Double] = Stat.apply(account['balance'], ir['interestRate']) { b,r -> b * r }

// Result will have interest amounts calculated for each account
println(accountAndInterest.content())

Column Arithmetics

The Matrix class overrides basic mathematical operations to work on columns:

import se.alipsa.matrix.core.*

def m = Matrix.builder()
    .columns([
        id: [1, 2, 3, 4],
        balance: [10000, 1000.23, 20122, 12.1]
    ])
    .types(int, double)
    .build()

def r = Matrix.builder()
    .columns([
        id: [1, 2, 3, 4],
        ir: [0.041, 0.020, 0.035, 0.5]
    ])
    .types(int, double)
    .build()

def interest = [10000*0.041, 1000.23*0.020, 20122*0.035, 12.1*0.5]
assert interest == m.balance * r.ir

// You can also multiply all items in a column
assert [10000*0.05, 1000.23*0.05, 20122*0.05, 12.1*0.05] == m.balance * 0.05

Statistical Operations with Stat Class

The Stat class provides various statistical functions:

import se.alipsa.matrix.core.*
import static se.alipsa.matrix.core.Stat.*

def table = Matrix.builder().data([
    v0: [0.3, 2, 3],
    v1: [1.1, 1, 0.9],
    v2: [null, 'Foo', 'Foo']
])
.types(Number, Double, String)
.build()

def summary = summary(table)
println(summary)

// Calculate mean, median, standard deviation, etc.
def mean = mean(table.v0)
def median = median(table.v0)
def sd = sd(table.v0)
def variance = variance(table.v0)

Rolling, Cumulative, and Shift Operations

Matrix 3.7.0 adds time-series style helpers directly on Column and Matrix.

import se.alipsa.matrix.core.*

def prices = Matrix.builder('prices').data(
    day: [1, 2, 3, 4, 5],
    close: [100, 105, 102, 108, 110],
    volume: [10, 12, 11, 15, 18]
).types(Integer, Integer, Integer).build()

// Rolling windows on a single column
def movingAverage = prices.close.rolling(3).mean()  // [null, null, 102.33..., 105, 106.66...]

// Rolling windows across a matrix
def rollingMean = prices.rolling(window: 2).mean()

// Cumulative operations
assert prices.volume.cumsum() == [10, 22, 33, 48, 66]
assert prices.cummax().close == [100, 105, 105, 108, 110]

// Shift-style helpers
assert prices.close.lag() == [null, 100, 105, 102, 108]
assert prices.close.lead() == [105, 102, 108, 110, null]
assert prices.close.diff() == [null, 5, -3, 6, 2]

Rolling windows can also be centered, require fewer observations via minPeriods, and for matrices can be evaluated in a sorted order via by: while preserving original row order.

Indexing and Grouped Access

Matrix now supports database-style indexes on ordinary data columns:

def sales = Matrix.builder().data(
    country: ['USA', 'USA', 'UK', 'UK', 'USA'],
    quarter: ['Q1', 'Q2', 'Q1', 'Q2', 'Q1'],
    revenue: [100, 200, 150, 250, 300]
).types(String, String, Integer).build()

sales.createIndex('country', 'quarter')

assert sales.hasIndex()
assert sales.indexedColumns() == ['country', 'quarter']
assert sales.lookup('USA', 'Q1').revenue == [100, 300]
assert sales.lookup('USA').rowCount() == 3

Grouping now returns a GroupedMatrix instead of a plain map:

def grouped = sales.groupBy('country')

assert grouped.groupCount() == 2
assert grouped.get('USA').rowCount() == 3
assert grouped.level('country') == ['USA', 'UK'] as Set

def totals = grouped.agg(revenue: { Stat.sum(it) })
println totals.content()

If you need the pre-3.7.0 map style for older code, use grouped.toStringKeyMap().

Pipe-Style Chaining

For left-to-right transformation pipelines, use pipe {} or the Groovy | operator:

def result = prices
    .pipe { it.filter { row -> row.close >= 102 } }
    .pipe { it.orderBy('close', true) }

def result2 = prices | { it.selectColumns('close', 'volume') } | { it.cumsum() }

Using Groovy Integrated Queries (Ginq)

Groovy Integrated queries can be used on Matrix rows for powerful data manipulation:

import se.alipsa.matrix.core.Matrix

def stock = Matrix.builder('stock').data(
    name: ['Orange', 'Apple', 'Banana', 'Mango', 'Durian'],
    price: [11, 6, 4, 29, 32],
    stock: [2, 3, 1, 10, 9]
)
.types(String, int, int)
.build()

def result = GQ {
    from f in stock
    where f.price < 32
    orderby f.price in desc
    select f.name, f.price, f.stock
}

// You can also use Matrix's built-in query capabilities
def exp = stock.subset{it.price < 32}.orderBy('price', true)

// Create a new Matrix from the query result
def matrix2 = Matrix.builder(stock.matrixName).ginqResult(result).build()
println matrix2.content()

Will output

stock: 4 obs * 3 variables 
name  	price	stock
Mango 	   29	   10
Orange	   11	    2
Apple 	    6	    3
Banana	    4	    1

The Grid Class

A Grid is a uniformly typed table. The grid class contains static functions to operate on a 2D list structure.

Creating a Grid

import se.alipsa.matrix.core.*

def foo = [
    [12.0, 3.0, Math.PI],
    ["1.9", 2, 3],
    ["4.3", 2, 3]
] as Grid

// A typed way is usually clearer
def bar = new Grid<Number>([
    [12.0, 3.0, Math.PI],
    [1.9, 2, 3],
    [4.3, 2, 3]
])

// Calculate mean
println Stat.means(bar)

Output

[6.066666667, 2.333333333, 3.047197551]

Grid Operations

The Grid class provides several operations:

convert: Converts one column type to another
clone: Creates a deep copy of the grid
transpose: "Rotates" the grid 90 degrees
isValid: Checks if it is a proper grid or not

Elements can be accessed using the simple square bracket notation:

foo[0, 1] = 3.23
assert 3.23 == foo[0, 1]

Using Matrix from Other JVM Languages

If you're using the Matrix library from Java, you'll need to use the underlying methods instead of the shorthand notation:

// Instead of
myMatrix[2, 'id'] = 34;

// You must use
myMatrix.putAt(2, "id", 34);

// Instead of
def myGroovyVar = myMatrix[1, 2];

// You must use
var myJavaVar = myMatrix.getAt(1, 2);

There are some utility classes making Matrix creation less painful in Java:

import se.alipsa.matrix.core.Matrix
import static se.alipsa.matrix.core.ListConverter.*
import java.time.LocalDate
// In Groovy
def gEmpData = Matrix.builder()
    .data(
        emp_id: 1..5,
        emp_name: ["Rick", "Dan", "Michelle", "Ryan", "Gary"],
        salary: [623.3, 515.2, 611.0, 729.0, 843.25],
        start_date: toLocalDates("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")
    )
    .types(int, String, Number, LocalDate)
    .build()

// In Java
import se.alipsa.matrix.core.Matrix;
import static se.alipsa.matrix.core.ListConverter.*;
import java.time.LocalDate;
import se.alipsa.matrix.core.util.Columns;
import static se.alipsa.matrix.core.util.CollectionUtils.*;

var jEmpData = Matrix.builder().data( new Columns()
    .add("emp_id", r(1, 5))
    .add("emp_name", "Rick", "Dan", "Michelle", "Ryan", "Gary")
    .add("salary", 623.3, 515.2, 611.0, 729.0, 843.25)
    .add("start_date", toLocalDates("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27"))
)
.types(int.class, String.class, Number.class, LocalDate.class)
.build();

Conclusion

The matrix-core module provides a solid foundation for working with tabular data in Groovy and Java. Its intuitive API, powerful data manipulation capabilities, and statistical functions make it an excellent choice for data analysis and processing tasks.

In the next sections, we'll explore the additional modules that build upon this foundation to provide specialized functionality for various data formats and operations.

Go to previous section | Go to next section | Back to outline

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Matrix Core Module

Understanding the Matrix Class

What is a Matrix?

Matrix vs. Grid

Creating a Matrix

Creating from Groovy Code

Creating from a Result Set

Creating from a CSV File

Accessing Data in a Matrix

Using [] Notation

Using Column Names

Using Row Indices

Data Manipulation

Converting Data Types

Transforming Data

Adding a column

Subsetting Data

Performing Calculations

Using the Apply Method

Column Arithmetics

Statistical Operations with Stat Class

Rolling, Cumulative, and Shift Operations

Indexing and Grouped Access

Pipe-Style Chaining

Using Groovy Integrated Queries (Ginq)

The Grid Class

Creating a Grid

Grid Operations

Using Matrix from Other JVM Languages

Conclusion

FilesExpand file tree

2-matrix-core.md

Latest commit

History

2-matrix-core.md

File metadata and controls

Matrix Core Module

Understanding the Matrix Class

What is a Matrix?

Matrix vs. Grid

Creating a Matrix

Creating from Groovy Code

Creating from a Result Set

Creating from a CSV File

Accessing Data in a Matrix

Using [] Notation

Using Column Names

Using Row Indices

Data Manipulation

Converting Data Types

Transforming Data

Adding a column

Subsetting Data

Performing Calculations

Using the Apply Method

Column Arithmetics

Statistical Operations with Stat Class

Rolling, Cumulative, and Shift Operations

Indexing and Grouped Access

Pipe-Style Chaining

Using Groovy Integrated Queries (Ginq)

The Grid Class

Creating a Grid

Grid Operations

Using Matrix from Other JVM Languages

Conclusion