Sunday, May 10, 2020

TinyDocumentDB - Part 3 - Querying for data (the plan)

Alright, so this is completely new ground for me. I've never written any kind of dynamic query function for real before, at least not one where I also had to design the indexing functionality. I'm writing this part of the text before I've written any code at all, since I need to get a grip on the requirements first.

Later on, I do need to add some kind of schema support to separate different types of entities from each other. For now, I'm going to ignore that and treat all the data as a single entity type. From a technical perspective, it makes no difference whether the database holds one entity type or several.

The plan for upserting data

Let's start out with a simple definition of what we should accomplish:
  • As data flows into the document database (as an INSERT or an UPDATE, together known as an UPSERT), we need to pick up the values of certain fields and store them in some kind of index with a reference back to the original data. (Hmm, what about a feature for searching historical data?)
  • When we search for data, we will query this index and return results based on that.
  • It will not be optimized in any way, just make it work.
Straightforward enough.

The diagram


A plan is nothing without a diagram.


The step-by-step plan for storing data (post diagram)


After the diagram, we have enough to create a step-by-step plan:
  1. On the left-hand side, we have data. We pass that data to the Core.Client.
  2. Store the data as fast as possible in storage.
  3. The Core.Client reads any index definition associated with the type of data (we only have one type... data...) and extracts the values of the fields named in the index definition.
  4. Read the index-file from storage, update it, and write it to storage.
Of course, from a performance point of view, this will do terribly under pressure, since every write rewrites the whole index file. But we'll get to that later on.
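
To make steps 3 and 4 a bit more concrete, here is a minimal sketch of what the indexing part could look like. Nothing here is the actual implementation: `IndexDefinition`, `SimpleIndex`, and the shape of the index (field name, then field value, then document ids) are names and choices I made up for illustration. It also keeps the index in memory only, so reading and writing the index file is left out, and it only looks at top-level JSON fields.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical index definition: the names of the top-level fields to index.
public class IndexDefinition
{
    public List<string> Fields { get; } = new List<string>();
}

// Hypothetical in-memory index: field name -> field value -> document ids.
public class SimpleIndex
{
    private readonly Dictionary<string, Dictionary<string, HashSet<string>>> _entries =
        new Dictionary<string, Dictionary<string, HashSet<string>>>();

    // Meant to be called after the document itself has been stored (step 2).
    public void Upsert(string id, string json, IndexDefinition definition)
    {
        // An UPDATE must not leave values from the old version behind.
        Remove(id);

        using (var document = JsonDocument.Parse(json))
        {
            var root = document.RootElement;
            if (root.ValueKind != JsonValueKind.Object) return;

            foreach (var field in definition.Fields)
            {
                if (!root.TryGetProperty(field, out var value)) continue;

                if (!_entries.TryGetValue(field, out var values))
                {
                    values = new Dictionary<string, HashSet<string>>();
                    _entries[field] = values;
                }

                // Values are indexed by their text form (an assumption of this sketch).
                var key = value.ToString();
                if (!values.TryGetValue(key, out var ids))
                {
                    ids = new HashSet<string>();
                    values[key] = ids;
                }

                ids.Add(id); // the reference back to the original document
            }
        }
    }

    // Removes every reference to the document; a full scan, not optimized.
    public void Remove(string id)
    {
        foreach (var values in _entries.Values)
            foreach (var ids in values.Values)
                ids.Remove(id);
    }

    // Returns the ids of all documents where the field has exactly this value.
    public IReadOnlyCollection<string> Find(string field, string value)
    {
        if (_entries.TryGetValue(field, out var values) &&
            values.TryGetValue(value, out var ids))
        {
            return ids;
        }
        return Array.Empty<string>();
    }
}
```

Removing the id before re-adding it is what makes this an upsert: a document that changes its value for an indexed field should no longer be found under the old value.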


The plan for querying data


I admit, I haven't given the query language details any thought at all and it will not look like this later on. 

But I do think that it's important to have a GET query mechanism available as well as a POST query mechanism. The GET variant will be easier to use ad-hoc and the POST can offer some more fine-grained usage.
 

The step-by-step plan for querying data (post diagram)


At the top level, we would do something like this.

  1. Pass the query to the Core.Client.
  2. Read the associated index file based on the entity type (we only have one).
  3. Search the index file for matches and collect the unique document keys (perhaps the index file could also contain more information, such as where in the document the value was found?).
  4. Read the matching documents from storage and return them.
Straightforward enough. Now I have a rough plan to get started with the implementation.
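
As a rough sketch of those four steps, assuming the simplest query I can think of (one field equals one value): `EqualsQuery`, `QueryRunner`, and the index shape (field name, then field value, then a set of document ids) are all hypothetical, and the `readDocument` delegate stands in for whatever storage reader ends up being used. The real query language will most likely look nothing like this.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical query: match documents where one indexed field equals a value.
public class EqualsQuery
{
    public string Field { get; set; }
    public string Value { get; set; }
}

public static class QueryRunner
{
    // index:        field name -> field value -> document ids (assumed shape)
    // readDocument: reads one document from storage by its id
    public static async Task<List<string>> Run(
        EqualsQuery query,
        IDictionary<string, IDictionary<string, ISet<string>>> index,
        Func<string, Task<string>> readDocument)
    {
        var results = new List<string>();

        // Steps 2 and 3: look up the document keys in the index.
        // Using a set per value keeps the keys unique.
        if (!index.TryGetValue(query.Field, out var values) ||
            !values.TryGetValue(query.Value, out var ids))
        {
            return results; // nothing indexed for this field/value
        }

        // Step 4: read each matching document from storage.
        foreach (var id in ids)
        {
            results.Add(await readDocument(id));
        }

        return results;
    }
}
```

Only the documents that the index points to are read from storage, so the cost of a query depends on the number of matches and not on the total number of documents.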


Summary



Achieved so far (the blog post part number is given within [x]):

  • [2] A REST API for reading and writing
  • [2] InMemory or file-system reader/writer
  • [2] CI/CD pipeline using GitHub Actions
  • [3] A plan for indexing data and for querying it

New ideas/questions:

  • [2] How to handle schemas without making it complicated?
  • [2] Schema stitching?
  • [3] Storing more data in the index?

Next up:

  • Querying of data - simple indexing implementation


Thursday, May 7, 2020

TinyDocumentDB - Part 2 - The most simple API ever


So this is what I've got so far. On the left side, we have a REST API with support for GET using an Id and two POST implementations.

One of the POST implementations needs an explicitly defined Id in the route, the other will use reflection to get the id from the content. At the moment, it just looks for a property named 'Id', but it should be definable later on if this is something that will stand the test of time.
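
For illustration, the reflection lookup could be sketched roughly like this. This assumes the content has already been turned into a typed object; `IdResolver` and the `propertyName` parameter are names I made up here, and the parameter is only a guess at how the property name could be made definable later on.

```csharp
using System;
using System.Reflection;

public static class IdResolver
{
    // Looks for a public instance property (named "Id" by default) and
    // returns its value as a string, or null if no such property exists.
    public static string GetId(object content, string propertyName = "Id")
    {
        if (content == null) throw new ArgumentNullException(nameof(content));

        var property = content.GetType().GetProperty(
            propertyName, BindingFlags.Public | BindingFlags.Instance);

        var value = property?.GetValue(content);
        return value?.ToString();
    }
}
```

A call such as `IdResolver.GetId(new { Id = 42, Name = "duck" })` gives back the string "42". The property name match is case-sensitive in this sketch.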


The Core.Client is the coordinator at the moment for reading and writing data to some kind of storage. I've implemented two basic variants: an InMemoryReader/Writer and a FileReader/Writer.


The API

The API is created as a .NET Core WebAPI that basically serves as a runtime host for the Core.Client.


I only handle JSON at the moment and plan to keep it that way.


The reader and the writers


The reader and writer should simply read data from and store data to some kind of storage as fast as possible. They should not handle caching or any hocus pocus at all (we will get to caching later on).

  public interface IStorageReader
  {
      Task<string> Read(string id);
  }

  public interface IStorageWriter
  {
      Task Write(string id, string content);
      Task Delete(string id);
  }
To illustrate how basic this is right now, I present the Write() method of the FileStorageWriter to you:
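  public async Task Write(string id, string content)
  {
      // Build the full path: base folder + file name derived from the id
      var path = Path.Combine(FileSettings.BaseFolder, FileSettings.GenerateFileNameFromId(id));
      // Make sure the base folder exists (does nothing if it already does)
      Directory.CreateDirectory(FileSettings.BaseFolder);
      // Write (or overwrite) the whole document
      await System.IO.File.WriteAllTextAsync(path, content);
  }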

This code creates a path to the file by combining a base folder (hardcoded) with a filename generated from the SHA256 hash of the Id passed into the function. It then makes sure the base folder exists and writes all the content to disk.
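
A minimal sketch of how a filename could be generated from the SHA256 hash of an Id is shown below. This is not the actual `FileSettings.GenerateFileNameFromId`; the class name `FileNameSketch`, the lowercase hex encoding, and the `.json` extension are all assumptions made for the example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class FileNameSketch
{
    // Turns any id into a fixed-length name that is safe to use as a file name.
    public static string GenerateFileNameFromId(string id)
    {
        using (var sha256 = SHA256.Create())
        {
            // Hash the UTF-8 bytes of the id (32 bytes for SHA256).
            var hash = sha256.ComputeHash(Encoding.UTF8.GetBytes(id));

            // Hex-encode the hash: 64 lowercase hex characters.
            var hex = BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();

            // The ".json" extension is an assumption of this sketch.
            return hex + ".json";
        }
    }
}
```

Hashing the Id means that an Id containing slashes, colons, or other characters that are not allowed in file names still maps to a valid file name of a fixed length. The same Id always produces the same name, which is what lets a reader find the file again.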

Summary


Achieved so far:

  • A REST API for reading and writing
  • InMemory or file-system reader/writer
  • CI/CD pipeline using GitHub Actions

New ideas/questions:

  • How to handle schemas without making it complicated?
  • Schema stitching?

Next up:

  • Querying of data - simple indexing strategy



Wednesday, May 6, 2020

A Tiny DocumentDB-as-a-Service

It's been a while since I've blogged. About 2.5 years, to be exact. Historically, this blog was all about Xamarin, but I've decided to expand the surface area to just about anything regarding coding, electronics, and development. There will still be a lot of Xamarin, of course.

In my current assignment, I've been doing a lot of backend development and have stumbled across many different data sources.

I must admit that I haven't worked with document databases before and honestly thought that all of this NoSQL stuff was kinda weird. Now I've changed my mind and added document databases to my container of stuff that I love.

So, to pay tribute to this new love, I've decided to create the tiniest version of a document database as a service that I possibly can, and I'll add it to the TinyStuff collection of tiny stuff on GitHub in due time.

I will from time to time post about my progress here and of course, the code will be available for anyone to use.


The naive blueprint


TinyDocumentDB must (should?) support (in no order of priority):
  • REST API
  • .NET Core WebAPI (and/or Azure Functions deploy) runtime
  • Caching
  • Extendable
  • Storage independent (think file system, Azure Blob Storage, permanent marker on a duck)
  • Partial document updates
  • Eventual consistency
  • Secure


So that's that for this post. The next one will most likely be a very simple sample API.