R&D Toolchain Labs

We are a team of experts that create next-gen tools for software development, bringing the latest research achievements to production. Our mission is to enhance developers productivity and efficiency, product quality and security with cutting-edge solutions. We leverage the latest technologies and methodologies to drive industry evolution.

AI for code

productivity

Boosting Large Language Models by structured code models and code analysis

#cooperation

Cooddy

quality

Source code analysis tool using Data-Flow Analysis and Static Symbolic Execution

#opensource

IdeaLS

productivity

Intellij IDEA plugin that turns IntelliJ IDEA into an LSP server and delivers the full power of IDEA's language support to LSP clients.

#opensource

JacoDB

quality

Pure Java library that allows you to get information about Java bytecode outside the JVM process and to store it in a database.

#cooperation

#opensource

KLEEF

quality

Symbolic Execution Engine.

#opensource

klogic

quality

Strongly typed embedding of relational programming language miniKanren into Kotlin.

#cooperation

#opensource

KSMT

quality

Unified Kotlin/Java API for various SMT solvers and SMT formula manipulation library

#opensource

Sudu Cloud IDE Backend

productivity

Scalable cloud native source code model and language server for IDEs and Github like services

Sudu Editor

productivity

Lightweight code editor with built-in support for C++ and Java code models.

#opensource

Sudu Cloud IDE DFS

productivity

Distributed file system (DFS) service optimized for IDE usage

USVM

quality

Universal Symbolic Virtual Machine is an ultimately powerful language-agnostic core for implementing custom symbolic execution based products.

#cooperation

UnitTestBot

quality

Automated unit test generation and precise code analysis.

#cooperation

#opensource

Micro R&D Projects

Symbolic execution + RL

Improvement of path selection strategy efficiency for symbolic execution by using Machine Learning/Reinforcement Learning

In this work, we propose a novel learning-based strategy able to effectively select promising states for symbolic execution to optimize specific metric (it might be test coverage, instruction coverage, etc.). We have used RL algorithms for learning policy, because path selection problem is naturally formulated in terms agent-environment interaction and getting reward. Also we have used sophisticated method for encoding world state - combination of hand crafted features, GNN embeddings and RNN memory.

Addition for Bear (C++)

Adding information about the project's linking structure to the result of Bear's work.

There is a JSON Compilation Database that is used in clangd and parsers to determine how files are compiled in a project. But this information does not fully describe the structure of the project, which is often needed for more accurate analysis or, in the case of UTBotCpp, for rebuilding the project in llvm bitcode, so link_commands.json was also used in UTBotCpp old description, but they were made for a very outdated python version of Bear; in this project, the export of this information is implemented on the new version of Bear, which is already written in C++.

Automatic inference of recursive invariants based on catamorphisms

Infer invariants of programs with complex data structures

Complex data structures such as binary trees are particularly challenging for program analysis techniques. For example, many symbolic execution tools have little support for them. Those that do support them quickly take up all the RAM and as a result cannot generate test coverage or reach the desired points in the code. Automatic inference of invariants for complex data structures will help such tools by taking on the burden of analyzing code fragments with complex structures. Since complex data structures are often arranged recursively, their systematic support requires the inference of recursive invariants. Catamorphisms are recursive functions with a simple schema which, on the one hand, allow systematic support of complex data structures, and, on the other hand, are simpler for automatic inference.

SMT-solver for bit vector and array theory

Performance tuning of SMT solvers for symbolic execution engines.

State-of-the-art SMT solvers are strikingly efficient in solving logical constraints of various theories: from linear integer arithmetic to algebraic data types and quantifiers. They provide rich API used by researchers in numerous domains. However, symbolic execution demands only limited theories: quantifier-free theories of integer and double bit-vectors and arrays. The goal of this project is to investigate how SMT solvers can be tuned to achieve better performance for specific needs of symbolic execution engines.

Cooddy + KLEEF

Integration of a Cooddy static analyzer with a KLEEF symbolic execution virtual machine.

The purpose of this project is to come up with a static analyzer for C/C++ which could automatically report many types of program defects while having low false positive rate. The idea is to quickly find a lot of defects by a fast imprecise approach of the Cooddy and then filter out false positives by a robust precise approach of the KLEEF.

Backup application for GitHub

Tool for backup GitHub organizations, include repositories content and metadata.

GitHub organizations provide a lot of opportunities for collaborative work. Inside the GitHub organization there is not only a code base, but also a huge amount of metadata: issues, pull requests, members and etc. Despite all the conveniences that GitHub provides, using GitHub organizations has a number of possible risks of data loss.

It is tool for backup GitHub organizations, include repositories content and metadata.

AI-guided symbolic execution

Layout for the Structure view in IntelliJ CE

Simplify IJ structure view tuning for custom languages

Structure view is one of the most-used in the IDE. Tuning it for the specific language or sub-language usually requires quite a bit of manual work by implementing a structure view adapter. But in reality it filters out the language tree and changes presentation a bit. For some simple cases it’s enough to specify custom icons and the list of language nodes to display in the IDE. It means that one can try writing some, let’s say JSON config that IDE saves to project settings and uses for enhanced displaying of various small sublangs or even big languages. Matching could be performed by, for example, class names, or code construct types. Benefit for the student: deep dive into the code model API, useful plugin as a result. At the same time, this task could be reduced to make it more simple, or made more complex to provide more customization.

KOSAT

Fast and light SAT solver

A SAT solver is an algorithm for solving the instances of the Boolean satisfiability problem (SAT). Most of the successful SAT solvers today employ the Conflict-Driven Clause Learning (CDCL) paradigm, which implements the depth-first search in the decision tree of a SAT instance, augmented with a plethora of different heuristics. Usually, the CDCL solvers are written in C or C++ in order to squeeze out all possible performance.

The goal of the KOSAT project is to implement a full-featured CDCL SAT solver in Kotlin language. Not only it allows for multiplatform usage, it also makes the source code easier to read and modify, compared to C/C++ projects. Meanwhile, thanks to the support for many relevant heuristics, Kosat shows sufficient performance for most tasks, and can be configured to suite the user needs. An additional educational feature of Kosat is the support for visualization of the CDCL process, which helps in the understanding of the algorithm and the SAT solver internals.

Domain classes generator for protobuf models

Relieve developers from writing boilerplate code for protobuf models and gRPC clients

Protogen is the tool that relieves developers from writing boilerplate code for protobuf models and gRPC clients by generating high-level domain object Java classes with concise API suitable for business logic. It’s easy to configure and integrate into a simple project with gRPC.

Many (if not the majority) of protobuf message specifications resemble domain entity descriptions very much. However, the code that is generated by the Java gRPC framework from protobuf specifications isn’t suitable for purposes other than network interaction. While domain entities still need to be expressed in the code concisely, there is a need to transform domain objects into protobuf objects and vice-versa. That implies writing straightforward, but repetitive and annoying code, which is both error-prone and hard to maintain. This is where Protogen comes to help.

Encoding java-bytecode in CIRCUIT-SAT for solving program equivalence problems

Reduce the Code Clone detection problem for java-bytecode programs

The Code Clone detection problem consists in finding the fragments of code, that implement identical functionality. It has numerous applications in practice, but is hard to solve, especially when the considered fragments are not similar to each other.

This problem has evident ties with the Logical Equivalence Checking (LEC) problem - for two Boolean circuits to decide whether they are pointwise equivalent.

The goal of the project is to reduce the Code Clone detection problem for java-bytecode programs from a certain class (with the support of operators from a restricted set) to LEC, and further encode it to Boolean satisfiability problem (SAT), in order to apply modern SAT solvers for its solving.

Site with demo repo: https://github.com/eqimd/transbyte-site