This document describes the design of the StorageLib software and how to use it.
StorageLib can be used as a replacement of a database for a single user application. All
data is stored in RAM and any data change copied to CSV (comma separated value) files. The programmer
only needs to define the data classes and their properties, StorageLib creates the code needed
for creation (new()
), update and delete (Release()
). StorageLib maintains parent child
relationships, i.e. when child.Parent
gets updated from Parent1
to Parent2
, StorageLib
removes child
from Parent1.Children
and adds child
to Parent2.Children
. StorageLib supports
transactions and data backup.
Introduction StorageLib usage
- Data Model
- Generated Code
-- Data Class
-- Data Context
- Application Code
Design principals
- Supported relationships
- Relationship notation
- Nullability indicates conditionality
- Relationship examples
- Children define its end of a relationship by having a property with the parent's type
- Parents define the details of a relationship
- Parents first
- Children control the relationship
- Keys
- Data Model
- Generated Data Model Classes
- DC Data Context
- Transactions
- Automatic data files compaction
Challenges
- What should happen when a child gets "deleted"
In a first step, the developer makes a Data Model. Here is the GetStartedDataModel
:
namespace YourNamespace {
public class Parent {
public string Name;
public List<Child> Children;
}
[StorageClass(pluralName: "Children")]
public class Child {
public string Name;
public Parent Parent;
}
}
It shows a Parent
class which can have many children, and a Child
class which must have
exactly one parent. Of course, it would be possible to make the Parent
conditional (nullable)
to cater for orphans, but the goal here is to use a very simple Data Model.
Based on the Data Model, the StorageClassGenerator creates some code like this (extremely simplified):
namespace YourNamespace {
public partial class Parent: IStorageItem<Parent> {
public int Key { get; private set; }
public string Name { get; private set; }
public IReadOnlyList<Child> Children => children;
readonly List<Child> children;
public Parent(string name, bool isStoring = true) {
Key = StorageExtensions.NoKey;
Name = name;
children = new List<Child>();
if (isStoring) {
Store();
}
}
public void Store() {DC.Data._Parents.Add(this);}
public void Update(string name) {
var isChangeDetected = false;
if (Name!=name) {
Name = name;
isChangeDetected = true;
}
if (isChangeDetected) {
if (Key>=0) {
DC.Data._Parents.ItemHasChanged(clone, this);
}
}
}
internal void AddToChildren(Child child) {children.Add(child);}
internal void RemoveFromChildren(Child child) {children.Remove(child);}
public void Release() {DC.Data._Parents.Remove(Key);}
public override string ToString() {
var returnString =
$"Key: {Key.ToKeyString()}," +
$" Name: {Name}," +
$" Children: {Children.Count};";
return returnString;
}
}
}
In reality, the Parent.base.cs
class has over 300 lines of code to support reading the
property values from and writing to a CSV file, transactions and more.
StorageClassGenerator.cs
creates for each data class (= a class defined in a Data
Model) a Key
property (gets later updated by the Data Store), a constructor and the following methods:
Store()
: Adds the instance to its Data Store in the Data ContextUpdate()
: Allows to change the property values of an instanceRelease()
: Removes the instance from its Data Store in the Data ContextToString()
: Shows the property values of the instance
If the class is a parent of another class, the following gets added:
- a
Child
collection (List
,Dictionary
orSortedList
) with the nameChildren
(can have any name) AddToChildren()
: Adds aChild
toChildren
RemoveFromToChildren()
: Removes aChild
fromChildren
Note: AddToChildren()
and RemoveFromToChildren()
are internal
. They get only called
from the Child
class.
Note: Parent.base.cs
is a partial class
. You can add your own code to Parent.cs
, which
will not be overwritten by StorageClassGenerator
. Parent.base.cs
calls many partial
methods
so that you can add additional functionality. They are not shown in the code samples here.
Child.base.cs
code:
namespace YourNamespace {
public partial class Child: IStorageItem<Child> {
public int Key { get; private set; }
public string Name { get; private set; }
public Parent Parent { get; private set; }
public Child(string name, Parent parent, bool isStoring = true) {
Key = StorageExtensions.NoKey;
Name = name;
Parent = parent;
Parent.AddToChildren(this);
if (isStoring) {
Store();
}
}
public void Store() {DC.Data._Children.Add(this)}
public void Update(string name, Parent parent) {
var hasParentChanged = Parent!=parent;
if (hasParentChanged) {
Parent.RemoveFromChildren(this);
}
var isChangeDetected = false;
if (Name!=name) {
Name = name;
isChangeDetected = true;
}
if (Parent!=parent) {
Parent = parent;
isChangeDetected = true;
}
if (hasParentChanged) {
Parent.AddToChildren(this);
}
if (isChangeDetected) {
if (Key>=0) {
DC.Data._Children.ItemHasChanged(clone, this);
}
}
}
public void Release() {DC.Data._Children.Remove(Key);}
public override string ToString() {
var returnString =
$"Key: {Key.ToKeyString()}," +
$" Name: {Name}," +
$" Parent: {Parent.ToShortString()};";
onToString(ref returnString);
return returnString;
}
partial void onToString(ref string returnString);
#endregion
}
}
The Child
code looks similar to the Parent
code, although Update()
is more complicated. The
additional lines keep the parent child relationship updated.
Finally, the StorageClassGenerator.cs
creates a Data Context, a file called DC.base.cs
. It
gives static
access to all stored data classes instances:
namespace YourNamespace {
public partial class DC: DataContextBase {
public static DC? Data {get;}
public static void DisposeData() {dataLocal?.Dispose();}
public CsvConfig? CsvConfig { get; }
public bool IsInitialised { get; private set; }
public IReadonlyDataStore<Child> Children => _Children;
internal DataStore<Child> _Children { get; private set; }
public IReadonlyDataStore<Parent> Parents => _Parents;
internal DataStore<Parent> _Parents { get; private set; }
public DC(CsvConfig? csvConfig): base(DataStoresCount: 2) {
Data = this;
CsvConfig = csvConfig;
if (csvConfig==null) {
_Parents = new DataStore<Parent>();
DataStores[0] = _Parents;
_Children = new DataStore<Child>();
DataStores[1] = _Children;
} else {
_Parents = new DataStoreCSV<Parent>();
DataStores[0] = _Parents;
_Children = new DataStoreCSV<Child>();
DataStores[1] = _Children;
}
IsInitialised = true;
}
protected override void Dispose(bool disposing) {
if (disposing) {
_Children?.Dispose();
_Children = null!;
_Parents?.Dispose();
_Parents = null!;
data = null;
}
base.Dispose(disposing);
}
}
}
The Data Context has for every Data Class a DataStore
(data only in RAM) or DataStoreCSV
(data stored in a CSV
file), depending on the CsvConfig
settings given in the constructor of DC
. A DataStore
is
like a dictionary, it gives access to a stored instance through the instance's Key
property.
Note: In C#, a child uses a reference to point to its parent. When stored in a CSV file, the
child uses the parent's Key
to link to it. During application startup, DC
reads all CSV files.
When it finds a Child
with a key value in the CSV file for a parent, it searches DC.Parents
for the parent with that key and sets Child.Parent
to that Parent
. DC
then adds the Child
to Parent.Children
. The application code usually doesn't need to read Key
, except to determine
if the instance is stored (Key>=0
) or not (Key<0
).
After the StorageClassGenerator.cs
has written all this code, the developer writes now his
application code:
using StorageLib;
using System;
using System.IO;
using YourNamespace;
namespace GetStartedConsole {
class Program {
static void Main(string[] args) {
try {
_ = new DC(new CsvConfig(...)); // 1)
var parent0 = new Parent("Parent0"); // 2)
var child0 = new Child("Child0", parent0); // 3)
var parent1 = new Parent("Parent1");
DC.Data.StartTransaction(); // 4)
try {
child0.Update(child0.Name + " updated", parent1); // 5)
parent0.Release(); // 6)
DC.Data.CommitTransaction(); // 7)
} catch (Exception exception) {
DC.Data.RollbackTransaction(); // 8)
reportException(exception);
}
} catch (Exception exception) {
reportException(exception);
} finally {
DC.DisposeData(); // 9)
}
}
}
}
Some steps in this very simple example:
- creates a new Data Context.
CsvConfig
provides configuration information like if CSV files should be used and where they are stored. - A parent gets created and stored in the Data Context, i.e. in
DC.Data.Parents
and in the CSV file. - A child gets created, stored and added to its parent's
Children
collection. - It is not mandatory to use transactions, but often helpful.
- The child's
Name
gets changed andparent1
becomes the new parent. This automatically removeschild0
fromparent0.Children
and adds it toparent1.Children
. In the child's CSV file, a new line gets added with that update information. parent0
gets removed from the Data Context (=deleted). In the parent's CSV file, a new line gets added indicating thatparent0
is removed.- During
CommitTransaction()
not much needs to be done, since the data is already written to the CSV files. - Upon a
RollbackTransaction()
, all lines written to the CSV files sinceStartTransaction()
get deleted and the instances in the RAM set back to the values they had beforeStartTransaction()
. - For performance reason, CSV file changes get written first into a RAM buffer and only written to the CSV file once the buffer is full or a short time has passed.
DC.DisposeData()
guarantees that all buffers are written to the CSV files when the application shuts down.
When the application starts again and the transaction was successful, the Data Context just
contains child0
linked to parent1
.
StorageLib supports the following relationships between 2 objects:
"one to conditional" (1:c)
"conditional to conditional" (c:c)
"one to many conditional" (1:mc)
"conditional to many conditional" (c:mc)
"one to one" (1:1) should not be needed, because in praxis this means they form together only 1 object.
"one to many" (1:m) is not possible, because in a 1:m relationship, both objects must be created precisely at the same time, one cannot exist without the other. But StorageLib allows only the creation (more precisely: storage, retrieval) of one object at a time. When an instance gets stored, all its properties must have legal values, i.e. cannot be null. Nullable indicates a 'c' relationship. Similarly, when at startup the CSV files are written back into RAM, only one file can be read at a time and only 1 object of that relationship created. A 1:m does not allow one object to exist without the other object of the relationship.
"many x to many x" (m:m), (m:mc), (mc:mc): Like with relational data tables, many to many is not supported. Instead a third child class must be designed additionally to the two parents, which need the mc:mc relationship:
Parent1 <── mc : 1 ──┐
mc │
↕ ╞══> Child
mc │
Parent2 <── mc : 1 ──┘
Child:Parent
1:c
c:c
1:mc
c:mc
Child can only be 1 or c (in this document always first)
Parent can only be c or mc (in this document always second)
1: single
c: conditional
m: multiple
StorageLib supports nullable reference types for children. A child property for a conditional child parent relationship (c:c or c:mc) is marked as nullable in the child.
If a parent can have only one child (1:c or c:c), it has a child property with the child's type. This property must be nullable, because at application start the parent is read before the child and for a short period that property is always null.
If the parent can have many children (1:mc or c:mc), it has a children collection. The collection itself cannot be nullable, but empty (= no child).
Note that the property names can be anything allowed by C#, but In these examples they are just Parent, Child and Children to highlight their role.
1:c
public class Parent {
public Child? Child;
}
public class Child {
public Parent Parent;
}
c:c
public class Parent {
public Child? Child;
}
public class Child {
public Parent? Parent;
}
1:mc
public class Parent {
public List<Child> Children;
}
public class Child {
public Parent Parent;
}
c:mc
public class Parent {
public List<Child> Children;
}
public class Child {
public Parent? Parent;
}
A child can have an unlimited number of child parent relationships
public class Parent1 {
public Child Child;
}
public class Parent2 {
public List<Child> Children;
}
public class Child {
public Parent1 Parent1;
public Parent2? Parent2;
}
Setting up a child parent relationship from a child is easy. Just add a property to the child class which has the parents property. In rare cases, the child needs to define some relationship details:
[StorageClass(areInstancesReleasable: false)]
public class Item {
public string Name;
}
public class OrderDetail {
public Order Order;
public int Quantity;
[StorageProperty(isLookupOnly: true)]
public Item Item;
}
In this example there is an Item class describing items that can be ordered and a class
OrderDetail which tracks the item belonging to a particular order. In this case, there
might be no need for the parent class Item to maitain a list with all orders which have
ordered this item. When a child property has the type of a parent, StorageClassGeneretor
will show an error if there is not corresponding property in the parent class, unless
[StorageProperty(isLookupOnly: true)]
specifies that the parent will not link back
to the child.
However, this is a rare example when further definition is needed with the child property.
Setting up a child parent relationship from the parent's end needs more information, for example if the children collection should be:
- List
- HashSet
- Dictionary<Key, Child>
- SortedList<Key, Child>
- SortedBucketCollection<Key1, Key2, Child>
Most often just a List is used. A Dictionary or SortedList allows to find a child quickly based on it's key. Example of storing company shares and their prices:
public class Share {
public string CompanyName;
public SortedList<Date, Activity> Quotes;
}
public class Quote {
public Share Share;
public Date Date;
public Decimal4 Amount;
}
It is the parent property which defines that for each date there can be only one quote. It usese a SortedList and not a Dictionary to define that. SortedList is much faster than Dictionary when new data gets always written at the end.
However, addional information might be needed, for example when the Child class has 2 properties
with the type Date
.
public class Share {
public string CompanyName;
[StorageProperty(childKeyPropertyName: "Date")]
public SortedList<Date, Activity> Quotes;
}
public class Quote {
public Share Share;
public Date Date;
public Date ReportedDate;
public Decimal2 Amount;
}
The additional information [StorageProperty(childKeyPropertyName: "Date")]
tells
StorageClassGeneretor which Date in the child should be used for the Quote Share relationship.
Sometimes it is helpful to access children by Date
, but several children might have the same
Date
. Example of storing of share sales:
public class Share {
public string CompanyName;
public SortedBucketCollection<Date, int, Sale> Sales;
}
public class Sale {
public Date Date;
public Decimal4 Amount;
}
SortedBucketCollection<Key1, Key2, Child>
sorts all children first by Key1. If several children
have the same value for Key1, they get sorted by Key2. This collection was specifically written for
StorageLib.
When 2 properties in a child link to a parent, the parent's children collection should contain the child only once, regardless if one or more than one child property link to that parent. For this reason, StorageClassGenerator replaces List with a HashSet when 2 or more child properties have the same parent class type.
The parent objects must be stored before the child or the children linking to it. StorageLib
stores all instances (objects) of one class in one file. On application startup, the file gets read
completely and all its objects created immediately, before the next file with the next
class gets read. It must be possible to create any parent without any children, meaning
a Child
property in a parent is always nullable (=conditional) and Children
is always
a collection, which can be empty (=mc).
The child controls to which parent(s) it belongs. For each relationship to a parent it has one
property. When a child property linking to a parent gets stored, the child gets added to the
Child
or Children
property of its parent. When a child changes (Update()
) the value of
its Parent
property from Parent1
to Parent2
, StorageLib
removes child from Parent1.Children
and adds child to Parent2.Children
.
The parents get written to a file without any children information. When the parent gets read first during startup, the parents have no children. The children get added to the parent once the children files gets read.
StorageLib ensures that when a child has a link to a parent, that child is also added to the children collection of the parent. "Children control the child parent relationship" is only relevant in the sense that adding or removing a child from its parent can only be achieved by changing the value of the child property linking to the parent. There is no method supporting removing the child directly form the children collection in the parent.
Each item has a key with a unique value, which is used by DataStore to retrieve, update or release (remove the item from the datstore = delete) an item. Keys are also used to establish child parent relationships (i.e. the child stores the parent's key). The key value gets assigned by DataStore and should not have any meaning for the rest of the software (except that a key of -1 shows that the item is not stored yet). A new key is always 1 higher than the highest already existing key value. This guarantees that items are always stored sorted in asscending key values, which allows for a binary search when retrieving an item in the DataStore.
There are 2 formats how a .csv file can store items:
- If the items are updatable or releasable, the first character of a .csv line indicates if it is an Add, Update or Relase record, followed by the key value.
- I the items are neiter updatable nor releasable, it contains only Add records and therefore the first character does not indicate the record type and even the key value is not written to the file. The first item has always the key 0 and all following items get their keys incremented by one.
Note: If keys are continuous, i.e if there is no key missing, a search is no longer needed to retrieve an item, DataStore can just use the item key to find the item. This works also when the key of the first item is not 0.
DataStore decides if keys are continuous like this:
AreKeysContinuous =
!AreInstancesReleasable ||
Count<2 ||
lastKeyValue - fistKeyValue + 1 == Count;
The Data Model defines all classes and their relationships among them as shown in Relationship examples. The StorageClassGenerator reads the Data Model and creates a) a Data Context class and b) for each class in the Data Model a new class supporting the creation, storing, updating and deleting (removal) of instances.
As with any other C# class, the constructor new() is called to create a new instance (object)
of that class. The
constructor has a isStoring parameter, which indicates if the object should get written
immediately to a file. Otherwise, the instance can get written later to the file by calling its
Store() method. A child can only get stored if all its parents are stored already, otherwise
new(isStoring: true)
an Store()
throw an exception.
StorageLib adds to each class a Key property. Its value is smaller 0 until the object gets stored, at which time the key gets a unique value, which will be biggest key used so far + 1.
None of the properties can be changed directly, they must be changed by calling the object's Update() method. This allows changing and writing the new content to the file of several properties at the same time. For performance reason, it is important that the writing does not get executed for every single property that changes.
If an object is no longer needed, Release() can be called. Release removes the object from the data context and adds an new entry to the file marking the object as deleted.
The DC gives access to all stored data. For each class in the Data Model, a DataStore
collection gets
added to the DC. A particular instance can get retrieved from the class' DataStore
through
the object's Key value. DataStore
implements IReadOnlyCollection for accessing
any data object. A class can indicate in the Data Model that it should be searchable through one
of its property values, like maybe Name. In that case, the StorageLib adds a
Dictionary<string, searchableClass>
to the Data Context. This Dictionary can be used like an
index in a database. StorageLib updates the Dictionary when objects get stored, updated or
released.
At startup, the DC reads all files and fills the collections with instances. When the program runs, data changes get automatically written to the files, although with a small time delay to write many changes at the same time, which improves performance. At application shutdown, the DC guarantees that all data is written (flushed) to the files.
DC has a static property Data, which holds all the collections. The application can access any data through DC.Data.CollectionName.
DC.Data.StartTransaction();
var parent = new Parent(..., isStoring: true);
var child = new Child(Parent, ..., isStoring: false);
child.Store();
child.Update(...);
if (ok){
DC.Data.CommitTransaction();
}else{
DC.Data.RollbackTransaction();
}
Any data change caused by new()
, Store()
, Update()
or Remove()
gets directly written to a
file, even before DC.Data.CommitTransaction()
is called. RollbackTransaction()
removes
whatever was written since StartTransaction()
from the files and restores the data in the RAM
to the values it had before the transaction started.
An Update()
or Remove()
changes already stored data. For performance reasons, the stored
data in the middle of the file does not get changed, but a new line gets added at the end
of the file with the update or deletion data. In RAM is always the latest data, while the
file contains a change history
A lot of disk space can be saved by writing a new data file containing only the actual data. The file with the change history changes its extension from .csv to .bak and a new .csv file gets written with the actual content as stored in RAM.
If the application does not shut down properly because of an exception, no compacted file might get written and the .csv file contains the history instead. When the application then starts again, the history will be executed, with the result that the data in RAM is the same as when the application stopped suddenly.
One challenge is that an instance cannot get truly deleted in .NET. It's only possible to remove all references to an instance.
Another challenge is that in a 1:c relationship, the child is not nullable and must always have
a parent. If a child has a parent, then StorageLib ensures that that parent has a link (reference)
back to that child. So if the child gets released, i.e. removed from the Children
DataStore
in the Data Context, the parent has still a reference to the child.
In theory, this problem could be solved by updating the Parent
property in the child
to null, which would remove the child from the Children
List
in the parent. That's not "nice"
from a design point of view, because releasing an item from the Data Context should not change the
property values of that instance. Also, in an 1:c relationship, it is not possible to set the
Parent
to null.
Even worse is a readonly 1:c relationship, where the Parent
property in the child is readonly.
When the child gets created, immediately a parent child relationship gets established. Once the
child should get "deleted", C# doesn't allow the reference to its parent to be changed, meaning
the parent will also keep a link to the child and the garbage collector can never remove that child.
As a consequence, even a deleted (released) child is still part of its parent's Children
collection.
It makes actually sense if also not yet stored children and parents can establish relationships. For example the user creates some complicated objects and only once everything is set up correctly he decides if the objects should be stored or discarded.
- Readme.md describes main features of StorageLib and gives a high level overview how StorageLib works.
- Setup.md describes how to install a local copy of StorageLib on your PC and how to setup VS for your own application using StorageLib.
- DataModel.md explains how to write your Data Model code, which defines the classes StorageClassGenerator will create for you.