How I built a full-cache ORM
Personal history
The core concepts of this ORM first appeared in 2008, when I had just started learning a new programming language called C# and needed a database. Of course, I had no intention of writing an ORM. I simply needed to store data and work with it. The most convenient approach was to map table columns to class fields and then work with the class logic.
At first, I manually unpacked data from a reader and then manually composed INSERT or UPDATE statements when it was time to persist an object. This was tedious, and I wanted to automate it. On top of that, annoying mistakes and typos crept in from time to time: sending a value to the wrong field, or forgetting to add a new field everywhere it touched the database, which led to “delayed” bugs. I decided to generate code directly from the database structure.
Much later, when I learned about the ORM concept, I discovered that this approach is called database-first, as opposed to code-first. When an application works with a database, the problem of keeping the application’s data model in sync with the database schema inevitably arises. This is a fundamental problem, and it hasn’t gone away over the years. There are exactly two options: either treat the database schema as the source of truth (which is what I did), or treat the code as the source of truth. In any case, one side has to be generated from the other.
I decided to parse the structure of MySQL tables and generate data classes for C#. In addition to the classes themselves, I generated structures responsible for materializing objects from a reader and composing INSERT/UPDATE commands. This was simpler and more reliable. I manually modified the database schema and ran the code generator. The generator fully rewrote the file with the class definitions, mapping MySQL data types to C# types. Thanks to partial classes, handwritten logic could live in a separate file, while only the accessors were subject to regeneration.
The next problem to solve was data synchronization between multiple clients connected to the same database. Today I know that the correct solution is to write an application server and route all data changes through it. Back then, that understanding didn’t exist. What did exist was MyISAM, which provided a per-table update timestamp out of the box. And that pushed me to use it.
The task was simple: when one client inserts or updates data, other clients should see those changes and reflect them. The solution was crude. Every 10 seconds, the client polled MySQL’s “meta table” that stored information about MyISAM tables. The last modification timestamp was compared to the previous value, and if it differed, the UI was told to redraw the affected list, even when that list showed not the table itself but related entities, for example, employees.
And it worked perfectly while the data volume was small and the system was simple. Later it became obvious that rereading entire tables was, to put it mildly, a bad idea, especially as the data kept growing. A 10‑second update delay was also too long. (There would also be a migration from MyISAM to InnoDB, but that came later.) Even then, it was clear this approach wouldn’t scale.
The solution was to fetch only changed data. That meant caching the data locally. I couldn’t think of anything smarter than keeping it in a Dictionary<int, T> (later replaced by List for significant memory savings). This introduced constraints: each table had to start with an int ID column, which also had to be a primary key with auto_increment. New data could be fetched using the maximum ID, but what about updated rows?
I had to add a mandatory timestamp column to every table. Now, on a timer, the client checked the meta table and for changed tables ran a SELECT like WHERE ID > %maxId% OR Updated > %maxUpdated%. With traffic drastically reduced, it was possible to poll every second, which was quite acceptable.
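The sync query itself is easy to sketch. Below is a minimal, hypothetical version of a single polling step; TableSync, Poll, and the watermark fields are illustrative names of mine, not the original code.
using System;
using MySql.Data.MySqlClient;

public class TableSync
{
    private int maxId;
    private DateTime maxUpdated = DateTime.MinValue;

    public void Poll(MySqlConnection conn, string table)
    {
        // Fetch only rows inserted (ID > maxId) or modified (Updated > maxUpdated)
        // since the previous poll; both watermarks advance as rows are read.
        var cmd = new MySqlCommand(
            $"SELECT * FROM `{table}` WHERE ID > @id OR Updated > @updated", conn);
        cmd.Parameters.AddWithValue("@id", maxId);
        cmd.Parameters.AddWithValue("@updated", maxUpdated);
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                maxId = Math.Max(maxId, reader.GetInt32("ID"));
                var updated = reader.GetDateTime("Updated");
                if (updated > maxUpdated) maxUpdated = updated;
                // ...materialize a new object or update the cached one here
            }
        }
    }
}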
Another problem surfaced. Before synchronization could begin, the data had to be cached. The issue appeared when views started receiving data that referenced other data which hadn’t been loaded yet. At first, this resulted in KeyNotFoundException, which I “solved” with what I later learned was simply the null object pattern.
Instead of querying the dictionary directly, I introduced a repository. If the repository couldn’t find an object with a given ID, it returned a “null object” — an empty instance of that type. That’s how I learned to stop worrying and love NULL: the null object allowed access to all fields without littering the code with checks. I could build dependency chains "order → customer → customer details" without worrying about nulls. They simply couldn’t exist; at worst, I’d get an empty string instead of a name.
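A minimal sketch of that repository, assuming entities with a parameterless constructor (the names here are mine, not the original ones):
using System.Collections.Generic;

public class Repository<T> where T : new()
{
    private readonly Dictionary<int, T> items = new Dictionary<int, T>();
    private static readonly T Null = new T(); // the shared "null object"

    // Lookups never fail: an unknown ID yields an empty instance
    // whose fields hold safe defaults (e.g., an empty string instead of a name).
    public T this[int id]
    {
        get
        {
            T item;
            return items.TryGetValue(id, out item) ? item : Null;
        }
    }
}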
It quickly became clear that sending an entire object for updates was a terrible idea. Two clients could modify different fields of the same object within the same second, and the second update would overwrite the first. Fields had to be sent individually, and only the ones that actually changed.
Around this time, I encountered the ORM concept. I briefly explored Entity Framework and decided that I didn’t need any ORM at all.
Ironically, by that point I already had my own ORM — very specific, but sufficient for my needs. Still, I stubbornly believed that all this complexity wasn’t needed in real development. "No time to sharpen the saw; we need to saw." To be fair, the data access syntax in my ORM was much simpler than in EF. I had a static repository with IEnumerable support, which allowed using LINQ to Objects directly on the repository. EF required declaring a context.
Using the ORM felt as if "the database didn’t exist". That was the guiding principle of the system. I simply pulled data from the repository as much as I wanted, however I wanted. It was fast. Very fast. Insanely fast. Unsurprisingly so: the data was already in memory, there were no database queries and therefore no network latency. ID access was O(1). Conditional filtering was O(N). There was no N+1 problem. If I needed a join, well, I didn’t (sorry for the pun). The entity already referenced another existing entity. Adding an object meant adding it to the repository, and it would magically receive an ID (with the caveat that inserts were not instantaneous; they involved an INSERT followed by SELECT last_insert_id). Changing a field meant simply assigning a new value to it. That change propagated to all clients within the same second.
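Day-to-day code looked roughly like this (the Orders repository and its fields are hypothetical, in the spirit of the real API):
// LINQ to Objects straight over the in-memory repository: no context,
// no round-trips, and entity references instead of joins.
var overdue = Sql.Orders
    .Where(o => o.Deadline < DateTime.Today && !o.Closed)
    .OrderBy(o => o.Deadline)
    .ToList();

var name = overdue.First().Customer.Name; // a reference, not a join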
At this point, an attentive reader has probably already identified the fatal flaw of this ORM: big data. Data that takes a long time to load and consumes large amounts of RAM on the client machine. Using a caching container for such data is, to put it mildly, irrational. It’s far more reasonable to fetch only the required data using WHERE clauses.
For this purpose, I created a low-level class that couldn’t cache yet but already knew how to map data. Query parameters included a raw SQL filter like WHERE YEAR(Orders.Created) > 2012. The container assembled the full SELECT, fetched the data, and mapped it to class fields. Its IEnumerable nature almost begged for a yield return–based implementation, resulting in a fast, pipeline-style construct that streamed freshly read data into the business logic as it came from the reader.
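The core of that container can be sketched as follows. It leans on the FieldReader shown later in this article; the surrounding class and connection handling are assumptions of mine.
// A condensed sketch of the streaming select inside a hypothetical Table<T>
// (where T : sqlobject, new()): rows are materialized one at a time and
// handed to business logic as they come off the reader.
public IEnumerable<T> Select(string filter) // filter: raw SQL such as "WHERE YEAR(Created) > 2012"
{
    var cmd = new MySqlCommand($"SELECT * FROM {tableName} {filter}", connection);
    using (var reader = cmd.ExecuteReader())
    {
        var fieldReader = new FieldReader(reader);
        while (reader.Read())
        {
            var obj = new T();
            obj.Download(fieldReader); // map columns to fields
            yield return obj;          // pipeline: no intermediate list
        }
    }
}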
After getting acquainted with EF, I really liked the idea of LINQ to DB: you write queries against the repository as if it were an IEnumerable, and internally it translates the query to SQL and returns the required objects. I honestly tried to dig into DbLinq and extract what I needed, but quickly got lost in the code. Writing my own translator turned out to be simpler—and that’s what I did.
The translator parsed expression trees and translated what it understood into SQL. It didn’t understand much: it could build simple WHERE clauses, ORDER BY (including DESC), and LIMIT. On the C# side, it supported Where, OrderBy/OrderByDescending, Take/Skip, and First/Last/FirstOrDefault/LastOrDefault. Its beauty was its size: very compact, literally two or three screens of code.
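For instance, a chain like this one (with a hypothetical Orders table) would be collapsed into a single SELECT:
// C# side:
var order = Sql.Orders
    .Where(o => o.CustomerID == 42 && !o.Closed)
    .OrderByDescending(o => o.Created)
    .FirstOrDefault();

// Resulting SQL (as produced by the QueryBuilder shown below):
// SELECT * FROM Orders
// WHERE Orders.`CustomerID` = 42 AND NOT Orders.`Closed`
// ORDER BY Orders.`Created` DESC LIMIT 1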
It also supported some very specific features, such as translating enums and application-specific types. For example, a query like Sql.Calculations.Where(c => c.State == CalculationState.New) became WHERE c.State = 1 in SQL. I cared a lot about code readability.
Architecture
This was already the fourth type of repository I had introduced into the system. All three previous ones shared roughly the same syntax. To avoid confusion, here are the repository types explicitly:
Table<T>: a low-level data access mechanism with mapping. Works with raw SQL but returns strongly typed objects.
CachedContainer<T>: a lazy repository. On first access, it loads all data from the table. On subsequent accesses, it checks data freshness by comparing last_updated. Ideal for tiny tables that rarely change and must be accessed quickly. Built on top of Table<T>.
TrackedContainer<T>: a “hot” repository. Loaded once at client startup and then kept in sync. The main workhorse of the system. Built on top of Table<T>.
LinqContainer<T>: a “cold” repository. Translates expression trees into SQL queries, fetches data, maps it to entities, and returns the result. Built on top of Table<T>.
The code generator regenerated the file with entity classes from scratch, even pulling comments from the database. It looked roughly like this:
public partial class Brand : sqlobject
{
    #region Private members
    /// <summary>
    /// Customer's organization ID
    /// </summary>
    private int customerID;
    /// <summary>
    /// Brand category (Keywords)
    /// </summary>
    private int categoryID = 0;
    /// <summary>
    /// Is Luxury brand
    /// </summary>
    private bool luxury;
    #endregion

    #region Properties
    /// <summary>
    /// Customer's organization ID
    /// </summary>
    public int CustomerID
    {
        get { return customerID; }
        set { if (customerID != value) { customerID = value; Update("CustomerID", value); } }
    }
    /// <summary>
    /// Brand category (Keywords)
    /// </summary>
    public int CategoryID
    {
        get { return categoryID; }
        set { if (categoryID != value) { categoryID = value; Update("CategoryID", value); } }
    }
    /// <summary>
    /// Is Luxury brand
    /// </summary>
    public bool Luxury
    {
        get { return luxury; }
        set { if (luxury != value) { luxury = value; Update("Luxury", value); } }
    }
    #endregion

    #region Methods
    public override bool Download(FieldReader fieldReader)
    {
        bool changed = false;
        if (fieldReader.Read(ref customerID, "CustomerID")) { changed = true; OnPropertyChanged("CustomerID"); }
        if (fieldReader.Read(ref categoryID, "CategoryID")) { changed = true; OnPropertyChanged("CategoryID"); }
        if (fieldReader.Read(ref luxury, "Luxury")) { changed = true; OnPropertyChanged("Luxury"); }
        return changed;
    }
    public Brand CopySqlFields()
    {
        var copy = new Brand();
        copy.Name = Name;
        copy.CustomerID = CustomerID;
        copy.CategoryID = CategoryID;
        copy.Luxury = Luxury;
        return copy;
    }
    #endregion
}
The Download method and Read calls include built-in INotifyPropertyChanged support, making it possible to bind entities directly to the UI grid, for example.
protected void Update(string column, object value)
{
    if (Updates == null) Updates = new Dictionary<string, object>();
    Updates[column] = value;
    OnPropertyChanged(column);
    sql.ItemChanged(GetType(), this);
}
The Update method also supports INotifyPropertyChanged, collecting changed fields into an untyped dictionary and pushing updates to the database after 30 ms. Strong typing isn’t critical here because the connector’s AddParameter method accepts object anyway.
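The flush at the other end of that 30 ms debounce can be pictured like this. The method and its parameters are assumptions of mine; only the Updates dictionary comes from the code above.
// Collapse the accumulated field changes into a single parameterized UPDATE.
private void Flush(MySqlConnection conn, string table, int id)
{
    if (Updates == null || Updates.Count == 0) return;
    var sets = string.Join(", ", Updates.Keys.Select(k => $"`{k}` = @{k}"));
    var cmd = new MySqlCommand($"UPDATE `{table}` SET {sets} WHERE ID = @id", conn);
    foreach (var pair in Updates) cmd.Parameters.AddWithValue("@" + pair.Key, pair.Value);
    cmd.Parameters.AddWithValue("@id", id);
    cmd.ExecuteNonQuery();
    Updates.Clear();
}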
The FieldReader is straightforward: overloaded Read methods for each supported type.
public class FieldReader
{
    public HashSet<string> Fields = new HashSet<string>();
    private MySqlDataReader dataReader;

    public FieldReader(MySqlDataReader dataReader)
    {
        this.dataReader = dataReader;
        for (int i = 0; i < dataReader.FieldCount; i++) Fields.Add(dataReader.GetName(i));
    }
    public int GetId()
    {
        if (Fields.Contains("ID")) return dataReader.GetInt32("ID");
        return 0;
    }
    public bool Read(ref int field, string column)
    {
        if (!Fields.Contains(column)) return false;
        var val = dataReader.GetInt32(column);
        if (field == val) return false;
        field = val;
        return true;
    }
    public bool Read(ref int? field, string column)
    {
        if (!Fields.Contains(column)) return false;
        int index = dataReader.GetOrdinal(column);
        int? val = null;
        if (!dataReader.IsDBNull(index))
            val = dataReader.GetInt32(index);
        if (field == val) return false;
        field = val;
        return true;
    }
    public bool Read(ref bool field, string column)
    {
        if (!Fields.Contains(column)) return false;
        var val = dataReader.GetBoolean(column);
        if (field == val) return false;
        field = val;
        return true;
    }
    public bool Read(ref DateTime field, string column)
    {
        if (!Fields.Contains(column)) return false;
        int index = dataReader.GetOrdinal(column);
        // a NULL DateTime is mapped to DateTime.MinValue
        DateTime val = DateTime.MinValue;
        if (!dataReader.IsDBNull(index))
            val = dataReader.GetDateTime(column);
        if (field == val) return false;
        field = val;
        return true;
    }
    public bool Read(ref string field, string column)
    {
        if (!Fields.Contains(column)) return false;
        var val = dataReader.GetString(column);
        if (field == val) return false;
        field = string.Intern(val);
        return true;
    }
    public bool Read(ref Guid field, string column)
    {
        if (!Fields.Contains(column)) return false;
        var val = dataReader.GetGuid(column);
        if (field == val) return false;
        field = val;
        return true;
    }
    public bool Read(ref decimal field, string column)
    {
        if (!Fields.Contains(column)) return false;
        var val = dataReader.GetDecimal(column);
        if (field == val) return false;
        field = val;
        return true;
    }
    public bool Read(ref Money field, string column)
    {
        if (!Fields.Contains(column)) return false;
        var val = Money.FromDecimal(dataReader.GetDecimal(column));
        if (field == val) return false;
        field = val;
        return true;
    }
    public bool Read(ref float field, string column)
    {
        if (!Fields.Contains(column)) return false;
        var val = dataReader.GetFloat(column);
        if (Math.Abs(field - val) < float.Epsilon) return false;
        field = val;
        return true;
    }
    public bool Read(ref long field, string column)
    {
        if (!Fields.Contains(column)) return false;
        var val = dataReader.GetInt64(column);
        if (field == val) return false;
        field = val;
        return true;
    }
    public bool Read(ref long? field, string column)
    {
        if (!Fields.Contains(column)) return false;
        int index = dataReader.GetOrdinal(column);
        long? val = null;
        if (!dataReader.IsDBNull(index))
            val = dataReader.GetInt64(index);
        if (field == val) return false;
        field = val;
        return true;
    }
}
One notable detail is that FieldReader tolerates missing fields. Dropping a column from a table won’t crash all connected clients.
The query translator:
public class QueryBuilder
{
    private readonly string tableName;
    private string where = "";
    private string orderby = "";
    private string orderbydesc = "";
    private int limit = 0;
    private string select = "*";
    public bool DefaultNull = false;

    public string Select => select;
    public string Conditions
    {
        get
        {
            var cond = new List<string>();
            if (where != "") cond.Add("WHERE " + where);
            if (orderby != "") cond.Add("ORDER BY " + orderby);
            if (orderbydesc != "") cond.Add($"ORDER BY {orderbydesc} DESC");
            if (limit > 0) cond.Add("LIMIT " + limit);
            return string.Join(" ", cond);
        }
    }
    public QueryBuilder(string tableName)
    {
        this.tableName = tableName;
    }
    public void AddExpression(Expression expression)
    {
        var method = expression as MethodCallExpression;
        var name = method.Method.Name;
        var args = method.Arguments;
        ParseCallExpression(args, name);
    }
    private void AddExpression(ref string expr, ReadOnlyCollection<Expression> args)
    {
        if (!args.Any()) return;
        var arg = args.Last();
        if (!(arg is UnaryExpression lambda)) return;
        if (arg.NodeType != ExpressionType.Quote) return;
        var toadd = GetSqlExpression(lambda.Operand);
        if (expr == "") expr = toadd;
        else expr = expr + " AND " + toadd;
    }
    private void ParseCallExpression(ReadOnlyCollection<Expression> args, string name)
    {
        switch (name)
        {
            case "Where":
                AddExpression(ref where, args);
                return;
            case "OrderBy":
                AddExpression(ref orderby, args);
                return;
            case "OrderByDescending":
                AddExpression(ref orderbydesc, args);
                return;
            case "First":
                AddExpression(ref where, args);
                limit = 1;
                DefaultNull = false;
                return;
            case "FirstOrDefault":
                AddExpression(ref where, args);
                limit = 1;
                DefaultNull = true;
                return;
            case "LastOrDefault":
                AddExpression(ref where, args);
                limit = 1;
                DefaultNull = true;
                return;
            default:
                throw new NotImplementedException(name);
        }
    }
    private string GetSqlExpression(Expression expr)
    {
        if (expr is LambdaExpression predicate) return GetSqlExpression(predicate.Body);
        if (expr is BinaryExpression bin)
        {
            var left = GetSqlExpression(bin.Left);
            var right = GetSqlExpression(bin.Right);
            if (bin.NodeType == ExpressionType.AndAlso) return left + " AND " + right;
            if (bin.NodeType == ExpressionType.OrElse) return left + " OR " + right;
            if (bin.NodeType == ExpressionType.Equal) return left + " = " + right;
            if (bin.NodeType == ExpressionType.GreaterThan) return left + " > " + right;
            throw new NotImplementedException(bin.NodeType.ToString());
        }
        if (expr is UnaryExpression un)
        {
            if (un.NodeType == ExpressionType.Not)
            {
                return "NOT " + GetSqlExpression(un.Operand);
            }
            if (un.NodeType == ExpressionType.Convert)
                return GetSqlExpression(un.Operand);
            throw new NotImplementedException(un.NodeType.ToString());
        }
        if (expr is MemberExpression ex)
        {
            if (ex.Expression.NodeType == ExpressionType.Parameter)
                return tableName + ".`" + ex.Member.Name + "`";
            var objectMember = Expression.Convert(ex, typeof(object));
            var getterLambda = Expression.Lambda<Func<object>>(objectMember);
            var getter = getterLambda.Compile();
            var value = getter();
            return GetValue(value);
        }
        if (expr is ConstantExpression c)
        {
            return GetValue(c.Value);
        }
        throw new NotImplementedException(expr.ToString());
    }
    private string GetValue(object obj)
    {
        if (obj is string s) return "\"" + s + "\"";
        if (obj is int i) return i.ToString();
        if (obj is Enum) return ((int)obj).ToString();
        if (obj is DateTime dt) return $"'{dt:G}'";
        throw new NotImplementedException("Unsupported argument type " + obj.GetType());
    }
}
That covers the query translator itself. Next come LinqContainer and LinqProvider, which provide IQueryable<T> support on top of it. Their role is fairly straightforward.
public class LinqContainer<T> : TableContainer<T>, IQueryable<T> where T : sqlobject, new()
{
    private LinqProvider<T> provider;

    public void Add(T obj)
    {
        obj.Upload(table.SchemaTableName);
        subscriber.Launch();
    }
    public LinqContainer(string tableName, string schemaName = sql.MainSchema) : base(tableName, schemaName)
    {
        provider = new LinqProvider<T>(table);
    }
    public IEnumerator<T> GetEnumerator()
    {
        return provider.GetEnumerator();
    }
    IEnumerator IEnumerable.GetEnumerator()
    {
        return provider.GetEnumerator();
    }
    public Expression Expression => provider.Expression;
    public Type ElementType => typeof(T);
    public IQueryProvider Provider => provider;
}
public class LinqProvider<T> : IQueryProvider, IOrderedQueryable<T>
{
    private readonly SqlTable table;
    private List<Expression> expressions = new List<Expression>();

    public LinqProvider(SqlTable table)
    {
        this.table = table;
    }
    public IQueryable CreateQuery(Expression expression)
    {
        var type = expression.Type;
        if (!type.IsGenericType)
            throw new ArgumentException($"S0066: Don't know how to handle non-generic type '{type}'");
        var genericType = type.GetGenericTypeDefinition();
        if (genericType == typeof(IQueryable<>) || genericType == typeof(IOrderedQueryable<>))
            type = type.GetGenericArguments()[0];
        else throw new ArgumentException($"S0068: Don't know how to handle type '{type}'");
        return null;
    }
    public IQueryable<TElement> CreateQuery<TElement>(Expression expression)
    {
        var provider = new LinqProvider<TElement>(table);
        provider.expressions = new List<Expression>(expressions);
        provider.expressions.Add(expression);
        return provider;
    }
    public object Execute(Expression expression)
    {
        throw new NotImplementedException();
    }
    public TResult Execute<TResult>(Expression expression)
    {
        var returnType = typeof(TResult);
        expressions.Add(expression);
        var builder = new QueryBuilder(table.TableName);
        foreach (var expr in expressions)
        {
            builder.AddExpression(expr);
        }
        expressions.Clear();
        if (returnType == typeof(IEnumerable<T>))
        {
            var list = GetList<T>(builder.Select, builder.Conditions);
            return (TResult)Convert.ChangeType(list, returnType);
        }
        if (returnType == typeof(T))
        {
            var list = GetList<T>(builder.Select, builder.Conditions);
            if (list.Count == 0)
            {
                if (builder.DefaultNull) return default;
                throw new InvalidOperationException();
            }
            return (TResult)Convert.ChangeType(list[0], returnType);
        }
        throw new NotImplementedException(returnType.ToString());
    }
    private List<TResult> GetList<TResult>(string query, string conditions)
    {
        var list = new List<TResult>();
        FieldReader fr = null;
        foreach (var dr in table.Select(query, conditions))
        {
            if (fr == null) fr = new FieldReader(dr);
            TResult obj = Activator.CreateInstance<TResult>();
            if (obj is sqlobject sqlobj)
            {
                sqlobj.FromReader(fr);
                sqlobj.Download(fr);
            }
            list.Add(obj);
        }
        return list;
    }
    public Expression Expression => Expression.Constant(this);
    public Type ElementType => typeof(T);
    public IQueryProvider Provider => this;

    public IEnumerator<T> GetEnumerator()
    {
        var builder = new QueryBuilder(table.TableName);
        foreach (var expr in expressions)
            builder.AddExpression(expr);
        var list = GetList<T>(builder.Select, builder.Conditions);
        return list.GetEnumerator();
    }
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
Why all of this
The query mechanism based on IEnumerable and IQueryable is intentionally concise and expressive. Queries no longer smear raw SQL across application layers, which significantly reduces the cognitive load of reading the code. This is a property shared by almost all ORMs.
In my case, the vast majority of repositories were TrackedContainers. Only the heaviest tables used LinqContainer. From a syntax perspective, however, they were almost indistinguishable. I could change the container type at will and get a working application with different runtime behavior.
Of course, there were nuances. Expression translation did not allow using business-logic methods as query predicates; even EF can't do that, since only constructs the translator understands are supported. With IEnumerable<T>, however, this was possible.
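The difference is easy to show (IsOverdue is a hypothetical business-logic method on the entity):
// Works against a TrackedContainer: LINQ to Objects simply calls the method.
var late = Sql.Orders.Where(o => o.IsOverdue());

// Fails against a LinqContainer: the translator meets a method call it
// cannot turn into SQL and throws NotImplementedException.
var archivedLate = Sql.ArchivedOrders.Where(o => o.IsOverdue());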
The full-cache ORM concept itself enables persistent objects. This means that once you fetch a set of objects from a repository, you don't get copies of the data; you get references to the live objects themselves. You can subscribe to PropertyChanged on an object and receive notifications about changes made elsewhere in the system, or even in another client application. For the same ID, TrackedContainer always returned a reference to the same object instance.
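In practice, that allowed code like this (the entity and the handler are hypothetical):
// The container hands out the same instance system-wide, so this
// subscription fires even when the change originates in another client.
var order = Sql.Orders.First(o => o.ID == orderId);
order.PropertyChanged += (sender, e) =>
{
    if (e.PropertyName == "State") RefreshOrderView();
};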
Conclusion
As an ORM, this system stopped evolving after LINQ support was added in 2014. That doesn’t mean it died; it means its capabilities were sufficient for further work. The ORM itself did not evolve, but the ERP built on top of it certainly did. It is still running today, with real users—around 100–150 concurrent users. It is used for data collection, analytics, forecasting, and everything else ERP systems are generally used for.
It was an interesting experience of building an ORM that I never planned to build, and even actively denied building. Like a true reinventor of the wheel, I started from the wrong end. While real ORM authors started with queries and then added caching, I started with full caching and only added queries later.
The system looks naive and wrong. I know that. But it works. And it does exactly what it was required to do.
Would I build something like this today? Probably not. I would rather extend EF with proper tracking.
Would I rewrite that system on top of EF? Definitely not. It is good in its own way.
What it always lacked was a separate application for model editing and data synchronization. I had to modify the test database and then apply the exact same changes to production manually. I also had to write my own database schema parser for code generation. If OrmFactory had existed back then, it would have been much simpler and far more convenient.