Tech Stack: What’s running the Aidvisors engine

Aidvisors are Artificially Intelligent Advisors that evaluate risk probabilities and suggest preventive actions to help you better manage the risk of your financial investments.

Below is the current technological stack as of Q3 2021.


Why do we provide our tech stack?

  • First, it shows how transparent, serious and diligent we are in everything we do. Would you invest in something you don’t understand?
  • Second, it explains how we offer our customers a system that provides exclusive, privileged information from a unique and secure AI system.
  • Third, it provides enough information to understand how different we are from traditional algorithmic trading solutions and from financial fake news. There are just too many offerings out there, including wannabe traders who share their trades through Excel sheets or slick videos. Getting information from different independent sources is important, but be careful.
  • Fourth, it may help other AI professionals out there. It feels better to take our time building useful AI that works in practice than to waste valuable time and resources on useless libraries, tools and services. If you need any help with your own AI project, contact us.

More on our technological approach

The Aidvisors we offer on this website are the result of research and development that started years ago. Because we are in the information age, and more specifically in the age of artificial intelligence, technology is evolving rapidly. As a result, the AI engine that supports our machine learning pipelines is an amalgamation of technologies in constant evolution, and it will keep changing in the years to come.

What we do is find the best fit for the concepts we develop, and adopt new technologies only when they reach the appropriate readiness level. When the available technologies are not mature enough or do not do what we need, we don’t hesitate to develop our own proprietary technology. In such cases we keep it simple, because we know that as soon as a suitable technology becomes ready we’ll switch.

Moreover, we try to be as autonomous and independent as possible. This allows us to implement changes swiftly, to keep our operating costs low and, most importantly, to ensure Aidvisors will be available for years to come. This is our way of implementing risk management at the technological level, to ensure the durability of the Aidvisors technology stack. We could not credibly offer risk management at the investment level without managing risk at the technological level.


Application Platform

Linux is a lightweight and mature operating system, developed as a light take on the already mature UNIX family of operating systems. With Kubernetes and Docker, Linux has become even more versatile.

This application platform lets us run software on-premises and leverage cloud platforms as well. After some trial and error with the many options out there, we decided to run our own private cluster with Ubuntu and K3s, which made deployment very, very simple and reliable.

Database

There would be a lot to say about databases. Briefly: if you pick the wrong database, you end up with a slow system that doesn’t scale. Speed was key for us because there is a lot of data to process, store and restore.

We compared MongoDB with traditional relational databases and the performance gain was huge. With MongoDB and a few other enhancements we were able to increase system speed by a factor of 600. So we stuck with MongoDB and added Redis over time. Nowadays we wonder whether PostgreSQL would be a better fit, but we have not benchmarked it yet.

MongoDB benefits

What we like about MongoDB is that it supports JSON documents wonderfully, and it even supports files of any size with GridFS. That makes it easy to serialize any object or native data structure in any programming language without using an ORM framework such as Hibernate. ORM frameworks usually slow your system down in ways that can be difficult to trace, and they require maintaining a lot of code just for serialization. Serializing data to a readable JSON format, or dumping it in a native format, is fast in most programming languages.
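As a minimal sketch of that no-ORM approach, using only the Python standard library (the `TradeSignal` record is a made-up example, not our actual schema; with MongoDB you would simply pass the dict to pymongo rather than dumping a string):

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical record type, for illustration only.
@dataclass
class TradeSignal:
    symbol: str
    risk_score: float
    action: str

signal = TradeSignal(symbol="SPY", risk_score=0.42, action="reduce_exposure")

# One line from a native object to a JSON document -- no ORM mapping layer.
doc = json.dumps(asdict(signal))

# And one line back from JSON to a native object.
restored = TradeSignal(**json.loads(doc))
assert restored == signal
```

The point is that the document database stores exactly what the language already has in memory, so there is no mapping code to maintain.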

MongoDB pitfalls

The journey with MongoDB in terms of scalability was definitely not an easy one. MongoDB has a few scalability pitfalls that we struggled with:

  • First, the 16 MB maximum size per BSON document makes it hard to store the huge data structures we typically have.
  • Second, the transactions required to ensure atomicity across multiple documents suffer from the same 16 MB limit, making it even harder to store huge data structures. We ended up developing our own simple transaction system by versioning JSON documents and leveraging single-document atomicity.
  • Third, MongoDB sharding is too complex, too centralized and too slow to handle huge databases efficiently, so we ended up partitioning data into smaller databases. The wonderful thing about combining database partitioning with application containerization is that it’s possible to move computation close to where the data actually resides, and that’s what we did, in our own simple way.
  • Fourth, backup and restore is not as trivial as we would like when handling a lot of data, especially because you usually need as much RAM as the data you back up. Because we use database partitioning, we were once again able to develop simple solutions.
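The article doesn’t detail the versioning scheme, but the idea of building transactions on single-document atomicity can be sketched as an optimistic compare-and-swap. This is a standard-library mock (a plain dict stands in for MongoDB; with a real database the check-and-set would be a single atomic operation, e.g. a `find_one_and_update` filtered on the expected version):

```python
# Sketch: optimistic "transaction" built on single-document atomicity.
# The collection and field names are illustrative, not the actual system.

class VersionConflict(Exception):
    pass

collection = {}  # _id -> document; each document carries a "version" field

def atomic_update(_id, expected_version, new_fields):
    """Apply new_fields only if the stored version still matches.

    With a real database this check-and-set must itself be atomic;
    here the whole function plays that role.
    """
    doc = collection.get(_id, {"version": 0})
    if doc["version"] != expected_version:
        raise VersionConflict(_id)
    collection[_id] = {**doc, **new_fields, "version": expected_version + 1}
    return collection[_id]

collection["portfolio"] = {"version": 1, "cash": 1000.0}

# A writer that read version 1 succeeds and bumps the version...
atomic_update("portfolio", 1, {"cash": 900.0})

# ...so a stale writer that also read version 1 is rejected instead of
# silently overwriting the first update.
try:
    atomic_update("portfolio", 1, {"cash": 800.0})
except VersionConflict:
    pass
```

Conflicting writers then retry from the freshly read version, which gives multi-step updates a transactional flavor without multi-document transactions.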

General database recommendations

A general set of recommendations for AI professionals out there:

  • Containerize everything, including the database instances. Move computation close to where the data resides. That’s actually easy to do with container orchestration.
  • Use the fastest storage available, such as NVMe SSD.

Backtesting Engine

There is a great article on the question: should you build your own backtester? In our case we did, and it was a good decision because it made it easy to integrate our custom data structures and to produce any financial indicator we want. But once again, speed was key for us. We benchmarked our backtesting engine against other backtesting software and it was faster by a factor of 20.

Briefly, we developed our backtester in C# to get speed without having to manage memory ourselves. The backtester is based on vectorization, similar to AmiBroker. We developed a Python wrapper on top of Mono to create an ultra-fast, cross-platform backtester. In fact, it’s faster to rerun trading algorithms than to store their results in the database, which helps with database scalability because we store less data.
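To illustrate what vectorization means here: signals and returns are computed over the whole price series at once, instead of looping bar-by-bar inside the strategy code. A toy moving-average example in plain Python (the actual engine described above is C#/Mono; prices and parameters are made up):

```python
# Toy vectorized backtest of a "long when price > moving average" strategy.
prices = [100, 101, 103, 102, 105, 107, 106, 109]
window = 3

# Moving average as a whole vector (None until enough history exists).
ma = [None] * (window - 1) + [
    sum(prices[i - window + 1 : i + 1]) / window
    for i in range(window - 1, len(prices))
]

# Position vector: long (1) when price is above its moving average, else flat.
position = [1 if m is not None and p > m else 0 for p, m in zip(prices, ma)]

# Per-bar returns, earned only while in position (entered on the prior bar).
returns = [
    (prices[i] / prices[i - 1] - 1) * position[i - 1]
    for i in range(1, len(prices))
]

equity = 1.0
for r in returns:
    equity *= 1 + r
```

Because each step is a pass over arrays rather than per-bar callback logic, rerunning a strategy is cheap, which is what makes "recompute instead of store" practical.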

Machine Learning Engine

The most important thing for us with machine learning was to remain independent and flexible, and to be able to use the latest algorithms as soon as they become available. The best fit was the Python programming language, with its great AI library ecosystem and strong community support.

For most machine learning models we use the Python library scikit-learn. For deep learning we use TensorFlow through its Python API. We also use other Python libraries such as XGBoost. For hyperparameter optimization we use scikit-optimize.
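To show what hyperparameter optimization boils down to, here is a conceptual random-search sketch using only the standard library. This is not scikit-optimize’s API and the loss function is a stand-in for training and validating a real model; scikit-optimize replaces the random sampling below with Bayesian optimization over the same kind of search space:

```python
import random

random.seed(0)

def validation_loss(params):
    # Stand-in for "train a model with these params, score it on validation".
    # Minimized near lr=0.1, depth=5 (arbitrary illustrative optimum).
    return (params["lr"] - 0.1) ** 2 + (params["depth"] - 5) ** 2 * 1e-4

# Hypothetical search space: a continuous and an integer hyperparameter.
space = {"lr": (0.001, 0.5), "depth": (2, 10)}

best_params, best_loss = None, float("inf")
for _ in range(200):
    params = {
        "lr": random.uniform(*space["lr"]),
        "depth": random.randint(*space["depth"]),
    }
    loss = validation_loss(params)
    if loss < best_loss:
        best_params, best_loss = params, loss
```

Bayesian methods earn their keep when each `validation_loss` call costs minutes or hours of training, since they need far fewer evaluations than blind sampling.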

Machine Learning Platform

The glue between all the components above is what we call the machine learning platform. It runs the machine learning pipelines, collects the results in the database and visualizes their performance.

At the time we started there was nothing nearly as complete as Kubeflow, and we still hesitate to switch because many of our requirements are not supported by Kubeflow.

We developed a simple Python framework for creating projects, phases, tasks and operations, all of which are stored and restored as JSON. It lets us resume projects anytime, anywhere, without having to recompute everything. It also provides metrics for every micro-task that runs, covering both system performance and financial performance. These metrics were key to runtime efficiency.
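The store/restore idea can be sketched as follows, using only the standard library. The project/task structure here is hypothetical, a stand-in for the real framework; the essential trick is that completed work is recorded in JSON after every step, so a rerun picks up where the last run stopped:

```python
import json
import os
import tempfile

# Minimal resumable pipeline state persisted as JSON.
state_path = os.path.join(tempfile.gettempdir(), "pipeline_state.json")
if os.path.exists(state_path):
    os.remove(state_path)  # start fresh for this demo

def load_state():
    if os.path.exists(state_path):
        with open(state_path) as f:
            return json.load(f)  # resume: completed tasks are not redone
    return {"project": "demo", "done": [], "metrics": {}}

def run_task(state, name, fn):
    if name in state["done"]:
        return  # finished in an earlier run -- skip entirely
    state["metrics"][name] = fn()
    state["done"].append(name)
    with open(state_path, "w") as f:
        json.dump(state, f)  # persist after every micro-task

state = load_state()
run_task(state, "load_data", lambda: {"rows": 1000})
run_task(state, "train", lambda: {"accuracy": 0.93})
```

Calling `run_task(state, "train", ...)` again is a no-op, which is exactly what makes resuming a 500,000-micro-task pipeline cheap.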

The tasks are distributed through a very simple queue system called MRQ, which matched our tech stack beautifully and has so far been good enough to debug when required. Running one machine learning pipeline over 20 years of data normally produces more than 500,000 micro-tasks. Debugging that for performance and accuracy is not easy, but MRQ made it simple enough.
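A queue like MRQ is Redis-backed with workers in separate processes, but the fan-out pattern can be sketched in-process with the standard library (the squaring task is a made-up stand-in for a micro-task; this is not MRQ’s API):

```python
import queue
import threading

# In-process sketch of distributing many small tasks to a pool of workers.
tasks = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    while True:
        item = tasks.get()
        if item is None:  # poison pill: shut this worker down
            tasks.task_done()
            return
        with lock:
            results.append(item * item)  # stand-in for one micro-task
        tasks.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for i in range(1000):  # the real pipelines enqueue hundreds of thousands
    tasks.put(i)
for _ in threads:
    tasks.put(None)  # one poison pill per worker

tasks.join()
for t in threads:
    t.join()
```

In a distributed setup the queue lives in Redis and each worker runs in its own container, but the producer/consumer shape is the same.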

Business Intelligence Platform

We created our own simple BI platform to visualize all the performance and financial metrics created by the machine learning platform.

It’s a simple ReactJS application that runs in our private Kubernetes cluster and leverages the flexibility of the awesome React-Chart-Editor from Plotly. Pulling data from MongoDB and building custom charts is really easy with that React component.

This Website

And to finish the tech stack: WordPress. Yes, the website is built with that good old buddy. Using ReactJS for a simple website is just too much work for a business that has to focus on artificial intelligence.

Essentially, a bunch of JSON files are transferred automatically to the WordPress instance, one-way only. The metrics from the JSON files are displayed with PlotlyJS charts. The rest is just traditional WordPress. Thank you, WordPress.

The good news is that this makes the website entirely independent from the machine learning pipelines, which makes the whole system highly secure: our machine learning pipelines cannot be hacked through the public website. That’s how we offer our customers a system providing exclusive, privileged information from a unique and secure machine learning platform.


This is it for now!
