A REFAL Interpreter

Что это такое?
The REFAL logo

REFAL (Recursive Functions Algorithmic Language) was designed by Valentin F. Turchin at the Moscow Institute of Physics and Technology in 1966, with the first implementation completed in 1968. The language was designed for artificial intelligence, natural language processing, and more natural construction of recursive algorithms (Turchin, 1989).

REFAL pioneered the concept of pattern matching as a programming paradigm and was the first languages to emphasize functional programming as the main computational model. This paradigm led to certain performance difficulties during the initial implementation of the language.

Valentin Toertsjin (1977).jpg
Valentin Turchin in 1977

Beyond his work as a physicist and cyberneticist, Turchin was also active politically (Turchin, 1978). His support for Andrei Dimitrievich Saharov, physicist and designer of Soviet thermonuclear weapons turned dissident, led to Turchin being effectively prohibited from working scientifically in the USSR. In 1977 he fled to the United States. His memoir does not mention the means through which he did this, as emigration from the Soviet Union was at that point strictly prohibited.

Language

REFAL is unique partially in its simplicity. Compared to other languages (even those of that time period) it contains far fewer features, control structures, built-in functions, or syntax. Yet the few features it has have been carefully designed so as to allow a level of expression that simply is not possible in other languages.

Symbols

Symbols as a concept in computer programming date back to the invention of LISP in 1958. LISP was the first widely used language to allow programmers to freely operate on data more complex than numbers. It was designed for artificial intelligence programming and was used at the Massachusetts Institute of Technology’s Artificial Intelligence lab for some of the pioneering research in the field.

Beyond just AI though, symbols generally allow for more flexible programming. A “symbol” abstractly can encompass almost any type of data. Building upon some of the ideas behind LISP, REFAL also incorporates symbolic programming heavily into its design. All data in REFAL is in some way a symbol, and all data structures are represented as certain combinations of symbols.

Patterns

REFAL is built around “patterns”, an encoding of certain characteristics of data. The simplest example of a pattern is a literal: a symbol or number that just matches itself. The pattern 123 matches the number 123. Similarly, hi matches the two symbols h and i in order.

Variables can be included in patterns to match arbitrary data. The pattern s.Single contains a variable named Single of type s. There are three possible types for a variable:

sA single symbol (e.g. a)
eAn expression: a series of symbols (e.g. hello)
tA single symbol or parenthesized expression (e.g. (yes) or f)

Once a variable in a pattern has matched (i.e. the input data is of a compatible type), the name given to that variable can now be used to refer to the data it matched. A variable that has already matched is known as “bound,” while one that has not yet matched is known as “free.”

Functions

A function in a REFAL program consists of a name and several associated sentences. A sentence is simply a pattern and a value to return if it matches. Functions are called using angle brackets, e.g. <Function-name arguments>.

The following example function demonstrates three simple sentences:

Example-function { 1 = One;
                   2 = Two;
                   s.Otherwise = Something-else; }

Recursion

Unlike most other languages, REFAL has no iteration constructs. Looping must be done through recursion.

For example, take the following function that reverses an expression:

Reverse { = ; /* return nothing */
         s.Start e.Rest = <Reverse e.Rest> s.Start; }

The first sentence says that if given nothing, the function should return nothing. The second sentence says that if given a symbol followed by an expression, return the reversed expression followed by the symbol.

Technical Details

I have implemented an interpreter for a subset of the REFAL language. The several features that remain missing are simply due to time constraints (I wrote it over one week); I hope to complete the implementation within the next few weeks.

In addition to the features present in the original REFAL-5 implementation, my interpreter features an interactive “shell” which allows users to type in expressions or functions and immediately see the result. The original implementation required users to save their entire program to a file and then run it, while mine allows interactive development.

Optimization

The main algorithm necessary in a REFAL implementation is a pattern matcher: the algorithm which determines if a given expression matches a given pattern. Several aspects of REFAL’s design make it difficult to implement such an algorithm efficiently.

Consider the pattern e.A e.Middle e.A e.End. Because each e variable can match any number of symbols, in order to determine if this matches we will need to try to match each against many different subsets of the input expression. In the worst case scenario, for an input n symbols long, we may need to make $n^2$ attempts before finding a way to match it to the pattern.

There is no way to improve the worst-case time complexity for the pattern matching in REFAL; it is simply a property of the fact that variables are allowed in patterns. However, it is both possible and necessary to optimize performance in non-worst-case scenarios.

Take for instance the pattern e.Start s.End. A naïve implementation might choose subsets of 0, 1, 2, etc symbols for the e.Start variable to match until one succeeds, however this is unnecessary. From a glance we can tell that the s.End variable will always take up one symbol (because it is of type s), so the first variable must match everything except that last symbol.

This property can be extended to many cases where limitations on the length of the subset that an e-variable must match are known. In non-worst-case scenarios this may greatly improve performance (in most uses with e-variables even to $O(n \log n)$). The REFAL programmer is advised (Turchin, 1989) to program with these performance characteristics in mind as this problem is impossible to avoid in implementing the language.

An optimization explicitly documented in the REFAL reference manual is the matching of parenthesis. If the data contains pairs of matching parenthesis, they are “linked” together to form one “token”.

Further Work

There remain several things to do before this implementation can be called complete. There are several dozen built-in functions (Turchin, 1999) that need to be implemented, as well as some minor language features (such as quoted symbols). The work to implement these will be fairly minor compared to what has already been completed.

Conclusion

The relative ease with which a language as abstract as REFAL can be implemented today speaks to the technological advances of over 50 years. Although I was much better equipped technologically than the Turchin, the process I went through to implement this was still similar to what he and his team did; albeit in C++ instead of Fortran and on a computer thousands of times faster.

It is a testament to the genius of its creators that a language as conceptually simple as REFAL can be so powerful. Even today it stands out among the crowd of languages for its elegance, simplicity, and expressiveness. I hope that my interpreter may be useful to somebody, even if it’s just to myself; that in a small way I may contribute to this language’s survival.

Leave a Reply

Your email address will not be published. Required fields are marked *