Design and Implementation of Programming Language with Generalized Sets, Types, and Functions as First-Class Values

Disclaimer: This is a very formal and old article about one of my old project of designing and making a language with mad, nearly esoteric, but still cool type system with composable gradual refinements. I felt like it should not be forgotten and put it into this blog; I will write some expansions of ideas described here soon. The implementation is of course unfinished and it’s interpreted instead of compiled, but still

Abstract

We present several design ideas for an experimental programming language featuring flexible and expressive type system that ensures error prevention while preserving readability, maintainability and convenience of writing code. We argue that this goal can be achieved by unifying syntactic approach to types, sets, and functions in order to make them composable and interchangeable, and by implementing possibility to use all of them as first class citizens. In this paper we give a high-level overview of the main language features along with some implementation details. We conclude with plans for future work.

Introduction

Many modern programming languages, such as Haskell, Scala, Agda, use rich type systems for preventing defects and specifying requirements for program entities and operations on them, increasing expressiveness and usability. There are two well-known approaches to advancing type systems, that draw much attention nowadays: dependent types and refinement types. While dependent types allow types to depend on values, refinement types provide an ability to augment types with logical predicates. Dependent types typically require higher amounts of assistance by programmers. Opposing to them, type systems based on refinement types have a high degree of automation and expressiveness of verification of program code. Refinement types do not require detailed proofs as systems with dependent types do, and they allow user to conveniently construct the constraints. In many existing implementations of systems with refinement types, the refinements are orthogonal to the base types system of the language. For example, in LiquidHaskell, which is an extension of Haskell programming language, constructions with refinement types are specified in the form of special annotations and used exclusively for static validation of the source code. In refinement type annotations specifying base types is mandatory even if they can be inferred from program code. The integration of logical predicates into the type system and building language around refinements potentially give us new ways of solving problems. In this paper, we attempt to design a language with a novel approach to type refinements.

Generalization of Sets, Types, and Functions

The syntactic and semantic possibilities of systems with refinement types can be extended by unifying the approach to the usage of types, sets, and functions. Let’s see what we can do. We use types to annotate values to limit them to some set of values. Firstly, we can do exact same thing with sets.

x : Int
y : {1, 2, 3, 4, 5}

In this example, we set x to be a variable of base type of integers and y to be in a set.

We can apply the same approach to predicates:

Even x = (x % 2 = 0)
a : Even
// suppose Lowercase 
   is predicate for lowercase strings
myString : Lowercase
x = 0

As can be seen from these examples, predicates and sets are possible to be used literally in place of types: there is no special syntax to distinguish them. To make such behaviour possible, we should introduce a program entity that generalizes predicates, types, and sets, and takes a value as an argument and returns it if the particular condition is met. If the predicate is not satisfied, value should not be returned and this case should be considered as type error.

Now let’s see how these definitions can be used inside functions.

myFunction (x : Even) = div x 2

Successful type checking when using this function implies that intersection of domain of function (expected type of argument) with actual argument is non-empty.

In the designed programming language, usage of these unified program entities is carried out by the same generalized operations: intersection, union and other, including user-defined operations. Another important feature is the ability to use arbitrary functions, including anonymous ones, at the type level.

Here we show some fragments of the program code that illustrates these features.

Nat = Int, (>=0)

In this example, we define type of natural numbers as an intersection of integer base type and positive predicate expressed with partially applied more-or-equal operator. Let’s use them in the defininition of a function.

myFunction y (x : Even, 
                  Nat {1, 2, 3, 4, 5}
                  (<y)) 
    : {Int & (>1)} 
    = {x + 1}

myFunction takes two arguments: y and x. The second argument is even natural number from the set containing concrete values. The second argument x is also less than the first argument y. Function returns a set of integers greater than 1. Return value is defined as set of all possible x increased by one. y here is constrained only by possibility to use of less-than operator.

In this language, type annotations do not forbid possibility of using functions inside them, if it is possible for them to compute their inverse functions. In some cases this is trivial and can done automatically (for example, increasing by a number, or returning a value that satisfies a condition), in other cases the behavior can be defined by user, making him able to prohibit some cases (such as division by zero for inverse of function of multiplying by number).

Here is an example of more advanced usage of inverse functions.

Person = has firstName : String
             lastName  : String
parsePerson (firstName ++ " " ++ lastName) 
    = Person firstName lastName
    
input <- read String
person : Person = 
    if parsed = parsePerson input
       then parsed
       else Person "John" "Doe"

At the first lines, we define Person record. Then, parsePerspon function is defined using pattern-matching-like syntax involving an inverse operation of concatenation function. Implementation of inverse concatenation (++) might be quite complex, but it can be made to parse more complex cases, such as regular expressions, or functions generating lists.

At the end of the listing, usage of parsePerson is shown. If input string wasn’t matched with parsePerson’s pattern, default value specified by alternative branch of the if expression.

Implementation

As a prototype implementation of the programming language, we developed static typechecker. It can also work as an interpreter and read-eval-print loop to analyze evaluation of expressions in this programming language, but we still can’t call it a full implementation of an interpreter as we still have not fully designed the input-output system and now use impure functions as placeholder for input and output in programs.

The interpreter converts the text of a program into a system of judgments about variables. The system can be checked by an SMT solver, and normalized to specific values ​​(if there is enough information to do this), dynamically evaluating expressions of the language.

We chose C++as language of implementation. Source code of implementation is available in our repository on GitHub . At the moment of writing this paper, the most actual code is in branch cast-eval.

Let’s point out main aspects of architecture of this type checker.

At current stage, implementation can be considered as naive, as it mostly involves evaluating every expression (including types and computations at the type level) using wide set of pre-defined operations, and the fact that entities of this language are based around inheritance hierarchy and utilizing C++ type system and templates.

We can divide the architecture of interpreter into two parts: “expressions” and “system”. There are also tools for parsing and running read-eval-print loop, debugging language’s behaviour, etc.

Expressions part contains around 44 classes describing logic of constructing, using and combining language’s entities. Code contained in second category is defining how basic computations and environment work.

Future Work

The usage of arbitrary functions in type annotations of a language with static typing is not possible due to the fact that static typechecker might not have enough information about dynamic variables. To retain this possibility, it is required to introduce dynamic language elements: the use of gradual typing would allow to seamlessly combine static and dynamic checks, leaving the latter for the runtime. Gradual types can also be combined with refinement types.

One of the nearest steps is to implement separating the stage of static typechecking from runtime checks by introducing gradual typing, and investigation of possible problems.

We will also explore embedding an SMT solver such as Z3 into language’s type system for solving complex systems. We also plan to carry research on other language possibilities.

References