dtree

Very fast and flexible string searching using a DAG (Directed Acyclic Graph)

Features Installation Quick Start GitHub

Overview

dtree is a powerful C++ header-only library that enables very fast and flexible searching of strings based on multiple criteria using a directed acyclic graph (DAG) structure. It combines interval arithmetic with three-valued logic to provide sophisticated query capabilities.

Header-only library - Easy to integrate into your C++ projects

Key Features

DAG-Based Search

Utilizes directed acyclic graphs for efficient and flexible string searching across multiple criteria.

Interval Arithmetic

Advanced interval arithmetic with support for projective infinity and three-valued comparison logic.

Expression Trees

Build and evaluate complex expression trees with support for arithmetic and logical operators.

Three-Valued Logic

Implements Kleene logic (True/False/Maybe) for sophisticated query predicates.

SQL Interface

Toy SQL parser that translates simple SQL queries into native dtree queries.

Memory Comparison Ordering

Custom string and number types that maintain memcmp ordering for optimal performance.

Core Concepts

qtl::string

A string type similar to std::string_view that can contain any std::string while maintaining memcmp ordering. Can also contain out-of-band separator tokens.

qtl::number

Contains values from arithmetic types or decimal float, stored with memcmp ordering. Supports range E−6176 to E+6144 (IEEE decimal128 range).

qtl::kleen

Three-valued logic implementation: True/False/Maybe. Based on Kleene logic.

qtl::interval

Interval arithmetic with trinary logic comparisons. Intervals may contain the projective infinity, allowing sensible division by intervals containing zero.

qtl::expr

Expression trees supporting arithmetic and logical operators. Parse strings into expression trees and evaluate them with interval arithmetic.

qtl::store

Database abstraction using interval arithmetic and trinary logic to query data with expression predicates.

Requirements

  • C++ Standard: C++17 or later
  • Compiler: Clang 7.0.1+ or Clang 10.0.0+
  • Boost: Boost 1.69+ (for Spirit X3 parser)
  • Optional: boost::multiprecision::cpp_dec_float for decimal float support

Installation

Ubuntu Installation

sudo apt-get update
sudo apt install clang
sudo apt install make

# Install Boost 1.70+
wget -c 'http://sourceforge.net/projects/boost/files/boost/1.70.0/boost_1_70_0.tar.bz2'
bunzip2 boost_1_70_0.tar.bz2
tar -xvof boost_1_70_0.tar
cd boost_1_70_0/
./bootstrap.sh --with-toolset=clang
./b2 headers
sudo ./b2 install

# Clone the repository
git clone https://github.com/questrel/dtree.git
cd dtree
make

Docker Installation

# Build the Docker image
sudo docker build --tag dtree:0.1 .

# Run the container
sudo docker run -it dtree:0.1

# Inside the container, run tests
bash test.bash
Note: You will need a substantial VM to run dtree. Testing was done on n1-standard-4 (4 vCPUs, 15 GB memory) on GCP. The test may take several minutes and up to 32 GB.

Quick Start

Basic Usage

dtree is a header-only library. To use it, include the appropriate header in your C++ code:

#define TEST_H "qtl/module.h"
#include TEST_H

Building Tests

# Make all test executables
make

# Build with debug diagnostics
make CXX='clang++ -DDEBUG'

# Run fuzz testing
make module.fuzz.out
AFL_NO_ARITH=1 AFL_EXIT_WHEN_DONE=1 /usr/local/bin/afl/afl-fuzz \
  -i- -x fuzz/module.dict -o fuzz/module.out -- ./module.fuzz.out

Example SQL Query

After running bash test.bash, you can test queries:

SELECT ASCII_word, NUMERIC_TO_STRING(letter_product,letter_sum)
FROM word
WHERE letter_product*letter_sum<=letter_product+letter_sum

# Output: "A"_s,"1, 1"_s,

# Test basic query
SELECT * FROM word

Documentation

For detailed information about the concepts and architecture:

Library Modules

qtl/out.h - Templates for printing STL containers with proper formatting
qtl/string.h - String type with memcmp ordering and separator tokens
qtl/container.h - Vector and tuple types stored with memcmp ordering
qtl/number.h - Numeric types with memcmp ordering
qtl/bool.h - Three-valued Kleene logic implementation
qtl/bounds.h - Boundary types for distinguishing open and closed interval endpoints
qtl/interval.h - Interval arithmetic with trinary logic
qtl/tree.h - Generic expression tree implementation
qtl/operators.h - Operator table and precedence hierarchy
qtl/expr.h - String-to-expression-tree parser using Boost Spirit X3
qtl/store.h - Database abstraction with interval-based queries
qtl/sql.h - SQL parser for converting queries to store operations
qtl/randstream.h - Random stream transformation using arithmetic coding

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

All contributors are expected to abide by the code of conduct.