Big data technologies and retrieval architectures

To process and reason over large data sets, we use big data technologies, generative AI and information retrieval architectures.

Data science models

We build models from data using data science techniques to analyse, enrich, reason, predict and prescribe actions.

Generative AI

We use the latest generative AI techniques, with embeddings and transformer-based models.

Solution development 

In our solution development of decision systems we combine analytic models with computational pipelines and distributed architectures.
More on big data technologies and retrieval architectures:
  • Distributed processing (Hadoop, MapReduce architectures).
  • In-memory computing (Spark).
  • Elasticsearch / Lucene.
  • Custom information retrieval indices.
  • Custom string matching indices (suffix arrays, finite state machines); see the sketch after this list.
  • Lambda architectures.
  • Lakehouse (Databricks Delta Lake, Spark, MLflow).
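
The custom string-matching indices above can be illustrated with a suffix array: sort all suffix start offsets once, after which any substring lookup is a binary search. A minimal Python sketch (the naive construction and the `bisect` `key` parameter, Python 3.10+, are for illustration only; production indices use far more efficient construction):

```python
# Minimal suffix-array sketch for substring search (illustrative only;
# real indices use linear-time construction, not this naive sort).
import bisect

def build_suffix_array(text: str) -> list[int]:
    """Start offsets of all suffixes of `text`, sorted lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All offsets where `pattern` occurs, via binary search on the suffix array."""
    key = lambda i: text[i:i + len(pattern)]       # suffix prefix of pattern length
    lo = bisect.bisect_left(sa, pattern, key=key)  # `key` needs Python 3.10+
    hi = bisect.bisect_right(sa, pattern, key=key)
    return sorted(sa[lo:hi])

text = "banana"
sa = build_suffix_array(text)          # [5, 3, 1, 0, 4, 2]
print(find_occurrences(text, sa, "ana"))  # [1, 3]
```
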
More on data science models:
  • Machine learning, clustering, classification, pattern recognition.
  • Predictive analytics, prescriptive analytics.
  • Natural language processing.
  • Graph pattern matching and processing.
  • Recommendation systems (content-based, collaborative filtering, k-means similarity, product/delivery based); see the sketch after this list.
  • Knowledge Graphs.
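
As a toy illustration of the content-based flavour of recommendation listed above, the sketch below ranks items by cosine similarity of feature vectors; the item names and feature values are invented for illustration:

```python
# Content-based recommendation sketch: rank items by cosine similarity of
# feature vectors (items and features are hypothetical).
import numpy as np

items = {
    "item_a": np.array([1.0, 0.0, 1.0]),
    "item_b": np.array([0.9, 0.1, 0.8]),
    "item_c": np.array([0.0, 1.0, 0.1]),
}

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def recommend(liked: str, k: int = 2) -> list[str]:
    """Rank all other items by feature similarity to the liked item."""
    scores = {name: cosine(items[liked], vec)
              for name, vec in items.items() if name != liked}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("item_a"))  # item_b first: nearly parallel feature vector
```
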
More on generative AI:
  • Large Language Models, combined with knowledge graphs and vector search for retrieval-augmented generation (RAG); see the sketch after this list.
  • LLMs (LLaMA-3-8B/70B, Falcon, Mixtral 8x7B, etc.).
  • Transformer-based networks for text processing; convolutional neural networks (CNNs) for image retrieval/classification.
  • Using mainly the following platforms/tools: LangChain, MosaicML, MLflow, Hugging Face, Spark NLP / Databricks / Azure ML.
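
The RAG pattern from the first bullet can be sketched framework-free: embed documents into an in-memory vector index, retrieve the nearest ones for a query, and ground the LLM prompt in them. In the sketch below, `embed` and `generate` are hypothetical stubs standing in for a real embedding model and LLM (e.g. behind LangChain or Hugging Face APIs):

```python
# Skeleton of retrieval-augmented generation (RAG); `embed` and `generate`
# are toy stubs standing in for a real embedding model and LLM.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stub: bag-of-words over a toy vocabulary; a real system would call
    # a trained embedding model instead.
    vocab = ["policy", "fire", "water", "damage", "covers", "a", "b"]
    tokens = [t.strip(".?").lower() for t in text.split()]
    v = np.array([float(tokens.count(w)) for w in vocab])
    n = np.linalg.norm(v)
    return v / n if n else v

def generate(prompt: str) -> str:
    # Stub: a real system sends the grounded prompt to an LLM here.
    return f"[LLM answer grounded in]\n{prompt}"

docs = ["Policy A covers fire damage.", "Policy B covers water damage."]
index = np.stack([embed(d) for d in docs])       # in-memory vector index

def retrieve(query: str, k: int = 1) -> list[str]:
    scores = index @ embed(query)                # cosine similarity (unit vectors)
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    return generate(f"Answer using only this context:\n{context}\n\nQ: {query}")

print(answer("What does Policy A cover?"))
```
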
More on solution development:
  • Decision logic, predictive analytics, optimization & simulation, monitoring & learning, decision support for knowledge work, management of decision logic.
  • Computational pipelines with Big Data architectures.
  • Development of information retrieval systems and architectures (in-memory indexing and distributed architectures).
  • Model development in Java, Python, Keras/TensorFlow.
  • Development of DSL (domain-specific language) solutions; a sketch follows below.
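
To give a flavour of the DSL work in the last bullet, here is a toy decision-logic mini-language. The `IF <field> <op> <value> THEN <action>` grammar is invented for illustration, not one of our production DSLs:

```python
# Toy decision-logic DSL: rules written as "IF <field> <op> <value> THEN <action>"
# (an invented mini-grammar, purely illustrative of the DSL approach).
import operator

OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

def parse(rule: str):
    # e.g. "IF amount > 1000 THEN review"
    _, field, op, value, _, action = rule.split()
    return field, OPS[op], float(value), action

def decide(rules: list[str], record: dict) -> list[str]:
    """Return the actions of every rule whose condition matches the record."""
    actions = []
    for rule in rules:
        field, op, value, action = parse(rule)
        if op(record[field], value):
            actions.append(action)
    return actions

rules = ["IF amount > 1000 THEN review", "IF risk > 0.8 THEN escalate"]
print(decide(rules, {"amount": 2500, "risk": 0.3}))  # ['review']
```
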

Relational and NoSQL technologies

Besides traditional relational databases we use NoSQL technologies.

Natural language processing

With NLP we extract relations and meaning from text, using rule-based, statistical and deep learning techniques.

ML / Deep learning

For learning we apply artificial neural networks and deep learning models.

Software development 

In our software development of decision systems we wrap our core solutions in applications or workflows using a full software development stack.
More on relational and NoSQL technologies:
  • Document-based stores (MongoDB); see the sketch after this list.
  • Key-value stores (Redis).
  • Column-based stores (HBase, Cassandra).
  • Graph-based (Neo4j).
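
A brief sketch of the document-store model with MongoDB via pymongo, assuming a MongoDB instance reachable on localhost:27017 (connection details and data are illustrative):

```python
# Document-store sketch with pymongo; assumes a local MongoDB instance.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
col = client["demo"]["customers"]

# Documents are schema-flexible JSON-like dicts; nested fields need no DDL.
col.insert_one({"name": "ACME", "tags": ["priority"], "address": {"city": "Ghent"}})
print(col.find_one({"address.city": "Ghent"}, {"_id": 0}))
```
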
More on natural language processing:
  • Text mining, natural language processing.
  • Computational linguistics (stemming, POS tagging, phrase structure).
  • Entity & relation extraction.
  • Disambiguation.
  • Topic classification (vector space, LSI, LDA); see the sketch after this list.
  • Word embedding models (word2vec, GloVe).
  • Sentiment & opinion mining.
  • Large Language Models.
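
A minimal vector-space topic classification sketch, as referenced above: TF-IDF features plus a linear classifier (scikit-learn assumed available; the toy texts and labels are invented):

```python
# Vector-space topic classification sketch: TF-IDF features + linear model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["stock markets fell sharply", "the striker scored twice",
         "bond yields rose again", "the match ended in a draw"]
topics = ["finance", "sports", "finance", "sports"]

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(texts), topics)
print(clf.predict(vec.transform(["yields and markets"])))  # ['finance'] expected
```
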
More on ML / deep learning:
  • Transformer-based networks for text processing.
  • Convolutional neural nets (CNNs) for image retrieval/classification.
  • Feed-forward neural networks (backpropagation, projection), Support Vector Machines (SVM), SOM / Kohonen maps.
  • Ensemble learning (Random Forest, AdaBoost, Gradient Boosting); see the sketch after this list.
  • Decision trees, association rules, market basket analysis.
  • Graph reasoning (random walk).
  • Probabilistic graphical models (Bayesian).
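
As referenced in the ensemble-learning bullet, a minimal random-forest sketch on synthetic data (scikit-learn assumed; the dataset and scores are illustrative only):

```python
# Ensemble-learning sketch: random forest on a synthetic classification task.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {forest.score(X_te, y_te):.2f}")
```
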
More on software development:
  • Full stack system development and architecture services.
  • Application development - Java, Python, C++, C, etc.
  • Enterprise architectures with Java EE, client/server, SOA, micro-service architectures, orchestration.
  • NoSQL (MongoDB, etc.), ELK (ElasticSearch).
  • Core SaaS solutions tech stack: Java, Python, MongoDB, Vaadin, REST/GraphQL, OpenAPI.
Data → Solution
To model a solution from data, we use machine learning, clustering and classification techniques, pattern recognition and predictive analytics.

Knowledge → Representation
To represent explicit knowledge structures, we use domain-specific languages and knowledge graphs. Knowledge representations are not only human-engineered through knowledge acquisition, but often derived by reasoning and learning models.

Data + Knowledge → ML (Machine Learning) + MR (Machine Reasoning)
Effective solutions are often achieved by hybrid models, combining:
  • data-driven machine learning (deep ML on data, ML on data features),
  • knowledge-driven machine reasoning (MR on knowledge representations and knowledge graphs),
  • knowledge-driven machine learning (ML on knowledge graphs).
A small machine-reasoning sketch over a knowledge graph follows below.

Text → Value
To extract and identify value from text sources, we use natural language processing, text mining and linked data.

Processing
To power advanced information retrieval and algorithmic processing, we use memory-based index structures, distributed computing and big data technologies.
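
The sketch below gives a minimal flavour of machine reasoning on a knowledge graph: deriving implied facts by transitive closure over one relation (the triples and names are toy examples):

```python
# Tiny machine-reasoning sketch over a knowledge graph: derive implied
# "located_in" facts by transitive closure (toy triples, invented names).
triples = {("Ghent", "located_in", "Belgium"),
           ("Belgium", "located_in", "Europe")}

def infer_transitive(facts: set, relation: str) -> set:
    """Keep adding (a, rel, c) whenever (a, rel, b) and (b, rel, c) hold."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for a, r1, b in list(derived):
            for b2, r2, c in list(derived):
                if r1 == r2 == relation and b == b2 \
                        and (a, relation, c) not in derived:
                    derived.add((a, relation, c))
                    changed = True
    return derived

print(infer_transitive(triples, "located_in") - triples)
# {('Ghent', 'located_in', 'Europe')}
```
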
We use the latest technologies from Generative AI:
> building models with 
  • Embeddings
  • Large Language Models
  • generative AI chain-of-thought.

> building solutions iteratively through 
  • RAG (Retrieval Augmented Generation),
  • Fine-tuning (embeddings, fine-tuned feature models),
  • Pre-training (MosaicML).

> as regulatory-grade solutions:
  • airgap deployment,
  • on your private trusted data,
  • running on your on-premise/private cloud infrastructure,
  • grounding LLMs with knowledge base and knowledge graph integration,
  • including vector DB semantic search for RAG,
  • linked with your data management/governance process.

Embeddings

Vector representations of words/sentences that capture meaning from the neighbouring words they occur with.

Capture semantic relationships at the word/sentence level.

Input for deep learning models.

Improve various NLP tasks (entity recognition, relation extraction, ...).
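
A toy word2vec run with gensim (assumed installed) illustrates the idea that words sharing neighbours end up with nearby vectors; on a corpus this small the similarity value itself is noisy:

```python
# Toy word2vec sketch with gensim: embeddings are learned from neighbouring
# words, so words in similar contexts get similar vectors.
from gensim.models import Word2Vec

sentences = [["credit", "card", "payment"],
             ["debit", "card", "payment"],
             ["football", "match", "goal"]]

model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, seed=1)
# "credit" and "debit" share contexts; real corpora make this signal reliable.
print(model.wv.similarity("credit", "debit"))
```
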

Large Language Models

Deep Learning foundational models.
Transformer-based.

Process entire paragraphs, documents, corpora.

Document understanding

Strong baseline from the foundation model, with transfer learning / fine-tuning on the domain.
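
The "strong baseline from a foundation model" point can be illustrated with the Hugging Face transformers pipeline API; the default zero-shot model is downloaded on first use, and the example sentence and labels are invented:

```python
# Pretrained transformer as a strong baseline: zero-shot classification
# via Hugging Face transformers (default pretrained model, no fine-tuning).
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
print(classifier("The contract renews automatically each year.",
                 candidate_labels=["legal", "sports", "finance"]))
```
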

Generative AI

LLM + Knowledge Graphs + Vector Search
→ Chat/conversation, Retrieval Augmented Generation, Insights.

Proactive predictive and prescriptive analytics
Reactive/descriptive analytics activities and architectures are typically well supported by a linear process of data ingestion, consolidation, integration and aggregation into macro-level data models and prepared analysis dimensions. This is because analysis, and its underlying data needs, typically start from a proposed hypothesis, which one refines, confirms and supports using the underlying data. This can be supplemented by looking for trends at the macro and/or aggregated level. These processes and their underlying data typically all rely on operational data sources, which implies that there is often no distinction needed between the operational data flows and models and the purely analytical data flows and models.

Proactive/predictive-prescriptive analytics can have quite a different functional and technical architecture, often implying a more cyclic process: starting from the data, building information and patterns, which after analysis are transformed into insights, in order to take action and create value. Both actions and changes in the environment give rise to changes in the data, which in turn drives the next cycle of the process.
Technologies in our default tech stack: [logo overview]

Azure components: [logo overview]
📂 For our experience in technology projects, see our Tech Portfolio.