forked from HebiRobotics/MFL
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Moved `common.util` to `mat.util` * Removed unused utilities * Removed multi-release jar dependency due to lack of support for some build tools (e.g. obfuscation) * Improved streaming file sink implementation * Changed `StreamingDoubleMatrix2D` to be a test example * `MatFile` API ** Renamed `size()` to `getNumEntries()` ** Added `clear()` ** Added `getEntries()` for accessing `Iterable<NamedArray>`
- Loading branch information
Showing
153 changed files
with
14,014 additions
and
14,692 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,3 +2,4 @@ | |
*.iml | ||
/.idea | ||
settings.xml | ||
target/ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
= Mat-File IO - Changelog | ||
|
||
== 0.2 | ||
* Moved `common.util` to `mat.util` | ||
* Removed unused utilities | ||
* Removed multi-release jar dependency due to lack of support for some build tools (e.g. obfuscation) | ||
* Improved streaming file sink implementation | ||
* Changed `StreamingDoubleMatrix2D` to be a test example | ||
* `MatFile` API | ||
** Renamed `size()` to `getNumEntries()` | ||
** Added `clear()` | ||
** Added `getEntries()` for accessing `Iterable<NamedArray>` | ||
|
||
== 0.1.2 | ||
* fixed binary incompatibilities with Java 6 that were introduced by compiling with JDK 9 | ||
|
||
== 0.1.1 | ||
* added license headers to all files | ||
|
||
== 0.1 | ||
* initial release |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,13 +1,13 @@ | ||
MatlabTools - A collection of Java libraries for interfacing with MATLAB | ||
Copyright (C) 2018 HEBI Robotics. | ||
This library is free software; you can redistribute it and/or | ||
modify it under the terms of the Apache 2.0 License as published | ||
by the Apache Software Foundation. | ||
This library is distributed in the hope that it will be useful, | ||
but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
Mat-File IO - a Java library for reading and writing MAT Files | ||
|
||
Copyright (C) 2018 HEBI Robotics. | ||
|
||
This library is free software; you can redistribute it and/or | ||
modify it under the terms of the Apache 2.0 License as published | ||
by the Apache Software Foundation. | ||
|
||
This library is distributed in the hope that it will be useful, | ||
but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
|
||
If you have any questions please contact us via https://github.com/HebiRobotics/ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,39 +1,185 @@ | ||
= MatlabTools | ||
= Mat-File IO | ||
|
||
The `MatlabTools` repository is a collection of MATLAB related Java libraries maintained by link:http://www.hebirobotics.com/[HEBI Robotics]. | ||
== Introduction | ||
|
||
Migrating our internal libraries to open source is an ongoing effort, so we will be adding more over time. Depending on the interest, selected projects may be moved into dedicated repositories in the future. | ||
Mat-File IO is a Java library for reading and writing MAT Files that are compatible with MATLAB's MAT-File Format. | ||
|
||
== Modules | ||
It's overall design goals are; 1) to support working with large amounts of data on heap constrained and allocation limited environments, 2) to allow users to serialize custom data classes without having to convert to temporary objects, 3) to provide a user-friendly API that adheres to MATLAB's semantic behavior. | ||
|
||
This library currently supports the https://www.mathworks.com/help/pdf_doc/matlab/matfile_format.pdf[Level 5 MAT-File Format] which has been the default for `.mat` and `.fig` files since https://en.wikipedia.org/wiki/MATLAB#Release_history[MATLAB 5.0 (R8)] in 1996. This includes `save` flags `-v6` and `-v7`, but not `-v4` or `-v7.3`. See MAT-File https://de.mathworks.com/help/matlab/import_export/mat-file-versions.html[Versions] for more info. | ||
|
||
This library is free, written in 100% Java, and has been released under an Apache v2.0 license. It works with Java 6 and higher, including Java 9 and 10. | ||
|
||
=== Mat-File IO | ||
== Maven Central | ||
|
||
link:./mat-file-io[Mat-File IO] is a Java library for reading and writing MAT Files that are compatible with MATLAB's MAT-File Format. | ||
Mat-File IO is in the Maven central repository and can easily be added to Maven, Gradle, and similar project managers. | ||
|
||
```XML | ||
<dependency> | ||
<groupId>us.hebi.matlab</groupId> | ||
<artifactId>mat-file-io</artifactId> | ||
<version>0.1.2</version> | ||
<version>0.2</version> | ||
</dependency> | ||
``` | ||
|
||
It's overall design goals are; 1) to support working with large amounts of data on heap constrained and allocation limited environments, 2) to allow users to serialize custom data classes without having to convert to temporary objects, 3) to provide a user-friendly API that adheres to MATLAB's semantic behavior. | ||
[NOTE] | ||
==== | ||
The initial release was done as version 0.x because we are still doing final integration tests. However, it is likely that there won't be any more breaking changes, so 1.0 is expected to be fully compatible and should be released in the near future. | ||
==== | ||
|
||
This library is free, written in 100% Java, and has been released under an Apache v2.0 license. It works with Java 6 and higher, including Java 9 and 10. | ||
== Basic Usage | ||
|
||
For more information and usage examples, please refer to the link:./mat-file-io[documentation]. | ||
The **link:./src/main/java/us/hebi/matlab/mat/format/Mat5.java[Mat5]** class contains static factory methods that serve as the starting points for the public API. The basic types (e.g. `Struct`, `Cell`, `Sparse`, `Char`) follow their corresponding MATLAB semantics (e.g. a struct can't be logical) and provide a fluent API for the most common use cases. All of the numerical types (e.g. `double`, `uint8`, `int16`, `logical`, ...) are represented by the `Matrix` type which offers a unified interface for handling numeric conversions. | ||
|
||
== Building Sources | ||
Similarly to MATLAB, all types are internally represented as a multi-dimensional `Array`. For example, a scalar struct is really a `1x1` array of structs. Most types offer convenience overloads for scalar, linear, 2-dimensional, and N-dimensional access. | ||
|
||
Below are some example snippets on how to use the API. For more examples, please refer to **link:./src/test/java/us/hebi/matlab/mat/tests/Mat5Examples.java[Mat5Examples]** and the various unit tests. | ||
|
||
=== Creating and Writing MAT Files | ||
|
||
A `MatFile` is a data structure that contains a collection of named `Array` variables. | ||
|
||
```Java | ||
// Create MAT file with a scalar in a nested struct | ||
MatFile matFile = Mat5.newMatFile() | ||
.addArray("var1", Mat5.newString("Test")) | ||
.addArray("var2", Mat5.newScalar(7)) | ||
.addArray("var3", Mat5.newStruct().set("x", Mat5.newScalar(42))) | ||
``` | ||
|
||
The created `MatFile` can be written to a data `Sink`. Sinks for various outputs (e.g. buffer, stream, or file) can be created via the `Sinks` factory or by extending `AbstractSink`. | ||
|
||
```Java | ||
// Serialize to streaming file using default configuration | ||
try(Sink sink = Sinks.newStreamingFile("data.mat")){ | ||
matFile.writeTo(sink); | ||
} | ||
``` | ||
|
||
The `MatFile` can also be written to disk using a memory-mapped file. We can initialize the file with the maximum expected size, and then automatically truncate it once the `Sink` is closed. | ||
|
||
```Java | ||
// Serialize to memory-mapped file w/ custom deflate level | ||
int safetySize = 256; // some (small) arrays may become larger when compressed | ||
long maxExpectedSize = matFile.getUncompressedSerializedSize() + safetySize; | ||
try(Sink sink = Sinks.newMappedFile("data.mat", (int) maxExpectedSize)){ | ||
Mat5.newWriter(sink) | ||
.setDeflateLevel(Deflater.BEST_SPEED) | ||
.writeMat(matFile); | ||
} | ||
``` | ||
|
||
=== Reading MAT Files | ||
|
||
A `MatFile` can be read from a `Source`. Similar to `Sink`, a `Source` can represent various types of data inputs. The read API is setup such that users can navigate through known file structures without requiring casts or temporary variables. | ||
|
||
```Java | ||
// Read scalar from nested struct | ||
try(Source source = Sources.openFile("data.mat"){ | ||
double value = Mat5.newReader(source).readFile() | ||
.getStruct("var3") | ||
.getMatrix("x") | ||
.getDouble(0); | ||
} | ||
``` | ||
|
||
=== Advanced Filtering | ||
|
||
In cases where users are only interested in reading a subset of `Array` variables, unwanted entries can be ignored by specifying an `ArrayFilter`. | ||
|
||
```Java | ||
// Filter arrays that follow some criteria based on the name, type, dimension, or global/logical flags | ||
try(Source source = Sources.openFile("data.mat"){ | ||
MatFile mat = Mat5.newReader(source) | ||
.setArrayFilter(header -> header.isGlobal()) | ||
.readFile(); | ||
} | ||
``` | ||
|
||
The filter gets applied only at the root level, so arrays inside a struct or cell array won't be filtered separately. | ||
|
||
=== Concurrent Compression | ||
|
||
The created sources include Java 9's `module-info.java`, but are otherwise backwards compatible with Java 6. The contained unit tests may use Java 8 syntax, so the project needs to be compiled with at least JDK 8. | ||
Almost all of the CPU time spent on reading or writing MAT files is related to compression. Fortunately, root entries are compressed independently from one another, so it's possible to do the work multi-threaded. | ||
|
||
Building with JDK 9 and above | ||
Users can enable concurrent reading by passing an executor service into the reader. In order to activate, the `Source` must also support sub-views (slices) on the underlying data (i.e. byte buffers or memory mapped files). | ||
|
||
```Java | ||
// Concurrent Decompression | ||
ExecutorService executor = Executors.newFixedThreadPool(numThreads); | ||
try(Source source = Sources.openFile("data.mat"){ | ||
MatFile mat = Mat5.newReader(source) | ||
.enableConcurrentDecompression(executor) | ||
.readFile() | ||
} finally { | ||
executor.shutdown(); | ||
} | ||
``` | ||
|
||
Concurrent writing unfortunately requires a temporary buffer for each root variable due to the size not being known ahead of time. The buffer allocation can be customized in case users want to use buffer-pools or memory-mapped buffers. | ||
|
||
```Java | ||
// Concurrent Compression | ||
try(Sink sink = Sinks.newStreamingFile("data.mat")){ | ||
Mat5.newWriter(sink) | ||
.enableConcurrentCompression(executorService) | ||
.setDeflateLevel(Deflater.BEST_SPEED) | ||
.writeMat(mat); | ||
} | ||
``` | ||
|
||
The table below shows a rough performance comparison of working with one of our production data logs. | ||
|
||
[width="100%",options="header",cols="a,a,a,a,a"] | ||
|==================== | ||
| Compression | Size | Threads | Write Time | Read Time | ||
| BEST_COMPRESSION | 144 MB | 1 | 280 sec | 3.5 sec | ||
| BEST_COMPRESSION | 144 MB | 8 | 47 sec | 0.8 sec | ||
| BEST_SPEED | 156 MB | 1 | 7.2 sec | 3.6 sec | ||
| BEST_SPEED | 156 MB | 8 | 1.5 sec | 0.8 sec | ||
| NO_COMPRESSION | 422 MB | 1 | 0.07 sec | 0.2 sec | ||
|==================== | ||
|
||
The data set was very multi-threading friendly (33x [95946x18] double matrices on the root level) and first loaded into memory to avoid disk access bottlenecks. The tests were done on a quad core with hyper-threading (Intel NUC6i7kyk). | ||
|
||
=== Serializing Custom Classes | ||
|
||
We often encountered cases where we needed to serialize data from an existing math library. Rather than having to convert the data into an API class, we added the ability to create light-weight wrapper classes that serialize the desired data directly. | ||
|
||
In order for a class to be serializable, it needs to implement the `Array` interface (easiest way is to extend `AbstractArray`) as well as the `Mat5Serializable` interface. Below are examples: | ||
|
||
* link:./src/test/java/us/hebi/matlab/mat/tests/serialization/EjmlDMatrixWrapper.java[EjmlDMatrixWrapper] serializes link:http://ejml.org[EJML]'s `DMatrix` type | ||
|
||
* link:./src/test/java/us/hebi/matlab/mat/tests/serialization/EjmlSparseWrapper.java[EjmlSparseWrapper] serializes link:http://ejml.org[EJML]'s `DMatrixSparseCSC` sparse matrix | ||
|
||
* link:./src/test/java/us/hebi/matlab/mat/tests/serialization/StreamingDoubleMatrix2D.java[StreamingDoubleMatrix2D] streams incoming row-major data into temporary files and combines them on serialization | ||
|
||
== General Notes | ||
|
||
=== Memory Efficient Serialization | ||
|
||
The MAT 5 format stores all data fields with a header tag that contains the number of bytes and how they should be interpreted. Rather than writing into temporary buffers to determine the serialized size, we added ways to pre-compute all deterministic sizes beforehand. | ||
|
||
The only non-deterministic case is compressing data at the root level, which we can work around by writing a dummy size and overwriting it once the final size is known. Thus, enabling compression requires the root level sink to support position seeking (i.e. in-memory buffers, memory mapped files, or random access files). | ||
|
||
=== Support for Undocumented Features | ||
|
||
Unfortunately, MAT 5 files have several features that aren't covered in the official documentation. This includes most of the recently added types (`table`, `timeseries`, `string`, ...), `handle` classes, `function handles`, `.fig` files, `Simulink` outputs, etc. | ||
|
||
Our current implementation supports reading all of the `.mat` and `.fig` files we were able to generate. It also supports editing and saving of the loaded MAT files, e.g., adding entries, changing matrices, or using a different compression level. However, changes to the undocumented parts, such as setting a property on a `handle` class, will not be saved. | ||
|
||
== Building Sources | ||
|
||
The created sources include unit tests that make use of Java 7 and 8 syntax, so the project needs to be compiled with at least JDK 8. | ||
|
||
mvn package | ||
|
||
Building with JDK 8 | ||
For more information, please check the CI build-script link:Jenkinsfile[] | ||
|
||
== Acknowledgements | ||
|
||
https://github.com/diffplug/matfilerw[MatFileRW] (active fork of https://github.com/gradusnikov/jmatio[JMatIO] maintained by link:http://diffplug.com/[DiffPlug]) served as an inspiration for parts of the implementation as well as a source for test data. We ended up porting and supporting all of their unit tests with the exception of `Base64 MDL` decoding (which we couldn't figure out the use case for). | ||
|
||
mvn package -PnoJava9 | ||
The implementation for reading the undocumented `MCOS` (MATLAB Class Object System) data is based on https://github.com/mbauman[Matt Bauman]'s http://nbviewer.jupyter.org/gist/mbauman/9121961[reverse engineering efforts] as well as MatFileRW's implementation by https://github.com/MJDSys[Matthew Dawson]. | ||
|
||
For more information, please check the CI build-script link:Jenkinsfile[] | ||
`Preconditions` was copied from link:https://github.com/google/guava[Guava]. |
This file was deleted.
Oops, something went wrong.
Oops, something went wrong.