A Conversation with Manfred von Thun

SA: May I begin by asking you about the origins of Joy? You come to programming language design by way of philosophy. How exactly did that come about? Or are these activities unrelated?

MvT: The relation is mainly historical, and certainly not logical. My first fifteen year period in academia, first as a student and later as staff, was a comedy(?) of errors. I enrolled in Psychology and Philosophy because "I wanted to know how the human mind works". But rats and stats in Psychology and logic and scientific method in Philosophy quickly changed my directions. In particular, I became interested in the justification of induction, and I wrote my PhD in the field of inductive logic and logical probability. After my appointment here at La Trobe I had to catch up with deductive logic and have been teaching it at various levels. I have also taught courses in philosophy of science, philosophy of psychology, cybernetics, and of course various bits in first year philosophy. In 1978 two of us logicians started using the university's new DEC10 computer. My colleague soon settled on Snobol, and I on Pascal. We had no help at all, and at that time for me there was just the Pascal User Manual and Report and Wirth's Algorithms + Datastructures = Programs. I was particularly fascinated by the miniature compiler for the PL0 language. Most of my Symbolic Programming in Pascal, which I wrote over quite a number of years, was influenced by Wirth. I have given courses at all levels for the Computer Science Department.

SA: All your programming in Pascal was procedural. So how did you get into functional programming? How did Joy evolve?

MvT: In the early 1980's I came across the famous Backus paper "Can programming be liberated from the von Neumann style," and I was immediately intrigued by the higher level of programming in his FP. From my deductive logic I also knew about Quine's predicate functors and about Tarski's cylindric algebras. Like Schoenfinkel's and Curry's combinatory logic, Backus elimiminates not only assignable variables but also formal parameters, and Quine eliminates variables of quantification. I designed a strange little logic programming language based on some of these ideas, and implemented it in Prolog. It had some rather useless features - among others a "Wheatstone bridge" combinator which takes five binary relations to produce a new binary relation.

Joy then evolved from this in an entirely haphazard way: First I restricted the binary relations to unary functions, and this of of course was a dramatic change. Second, to allow the usual arithmetic operations with their two arguments, I needed a place from which the arguments were to come and where the result was to be put - and the obvious place was a stack with a few shuffling combinators, originally the four inspired by Quine. Third, it became obvious that all these combinators could be replaced by unary functions, with only function composition remaining. Finally the very different distinctively Joy combinators emerged, which take one or more quoted programs from the stack and execute them in a specific way. Along the way of course, lists had already been seen as just special cases of quoted programs. This meant that programs could be constructed using list operations and then passed on to a Joy combinator.

SA: So Joy emerged after a tortuous history. But by now it is quite stable. How would you describe its main features?

MvT: The language Joy is a purely functional programming language. Whereas all other functional programming languages are based on the application of functions to arguments, Joy is based on the composition of functions. All such functions take just a stack as argument and produce a stack as value, but there are a few which additionally take files and the file system as argument and value. Because of the stack, much of Joy looks like ordinary postfix notation. However, in Joy a function can consume any number of parameters from the stack and leave any number of results on the stack. Moreover, there are some stack functions which shuffle the top few elements of the stack around. So, although for example the program 2 3 + is an arithmetical expression in postfix notation and also in Joy notation (leaving 5 on the stack), the Joy program 5 dup * (leaving 25) is not arithmetic because dup is neither a number nor an arithmetic operator. That is why I say that much of Joy "looks like postfix". In reality this is what Billy Tanksley so appropriately called "concatenative notation". The semantics of this notation is summed up by: "The concatenation of appropriate programs denotes the composition of the functions which the programs denote". In such languages the application operation of the lambda calculus is uniformly replaced by composition.

What distinguishes Joy from (the functional subsets of) Forth and Postscript is the datatype of quoted programs. Many Joy functions expect quoted programs on top of the stack and execute them in different ways, effectively by dequoting. This is similar to Lisp's eval, but in Joy there are many of them, each performing the job of higher order functions. But whereas in other languages the higher order functions take abstractions as arguments, the combinators of Joy take quoted programs from the stack.

As a result, there are no named formal parameters, no substitution of actual for formal parameters, and no environment of name-value pairs. Consequently Joy has an exceptionally simple algebra, and its programs are easily manipulated by hand and by other programs. Many programs first construct another program which is then executed by one of the combinators.

SA: In Joy, a primitive like + operates on the stack. Should we think of this operation as replacing the top two elements x y with their sum x+y, or as two successive operations: first replace y with the unary function _+y and then replace x with x+y?

MvT: You spoke of "replacing" and "and then", which are imperative or procedural notions and hence have no place in a purely functional language. So the official answer has to be: the program 2 3 + denotes the composition of three functions, and the program 5 denotes one function and it is identical with the composition of the other three. How the addition is performed internally is a different matter, it might be by pattern matching, or by two push operations followed by an add (the obvious stack implementation), or any other (invisible) method that gives the right result.

But you also said "Should we think...", and that requires a different answer. Indeed imperative or procedural thinking can be very useful for Joy, both for explaining the meaning of the Joy primitives and also when writing programs. Here one should be eclectic and pragmatic - use whatever works. I would tend to use your first mode of thinking, and only sometimes the second. Two other modes would use message passing: a) send the parameterless addition message to the pair 2 3, or b) send the addition message with parameter 3 to the number 2. Maybe Forth programmers know more about the psychology of stack programming.

SA: You mention the combinators of Joy. What exactly is a combinator, and how do those of Joy differ from the combinators of Schonfinkel and Curry?

MvT: Combinators are second (or even higher) order functions which take first (or higher) order functions as parameters. They may return another function or instead immediately apply that function to some argument(s). When we say "John loves Mary and conversely", this might be analysed as "(loves AND CONVERSE(loves)) (John, Mary)". Here CONVERSE is a unary combinator (corresponding to the passive transform "John is loved by Mary"), and AND is a binary combinator. Similarly, "Bob admires himself" might be analysed as "SELF(admires) (Bob)", where SELF transforms the binary admires predicate into a unary one. These combinators are not restricted to predicates (functions which yield a truth value). For example, the unary squaring function might be defined with the binary multiplication function by SELF(*). In Schonfinkel's and Curry's system combinators similar to these serve to shuffle arguments of functions around, a job which in the lambda calculus is done by variables.

SA: K programmers should understand this. We would define SELF as a function which takes a binary function f and returns a unary function which applies f[x;x]:

  SELF:{[f]{f[x;x]}}
  g:SELF[*]
  g[3]
9

MvT: If a parenthesis free notation such as postfix is used, then there are two possibilities: either use SELF as a higher order function as before, or work on a stack, using a stack function dup:

5 SELF(*)
5 dup *

Joy, like Forth uses the latter. Note that dup is not a second order function at all. The same two possibilities would arise for prefix notation. In the same way the effect of CONVERSE can be achieved by a stack function swap. All the functions involved now are strictly speaking unary functions from stacks to stacks. Even multiplication is unary, although one can of course continue to think mainly in terms of how many parameters a function expects on the stack.

SA: Yes, I think a similar choice exists for K. We can define 'sqr' as either SELF[*] or as the unary function

sqr:*. 2#

which makes two copies of its argument, and then applies * "dot-wise", mapping the first copy to *'s first argument, and the second copy to *'s second argument.

MvT: But there are combinators which do not simply shuffle arguments around. One of these is the mapping function which applies a function to all members of a list to produce a list of results. In a fantasy notation:

MAP(SELF(*)) ([1 2 3 4]) = [1 4 9 16]

Or in postfix and in concatenative Joy notation:

[1 2 3 4] MAP(SELF(*)) 
[1 2 3 4] [dup *] map

Whereas MAP above is a second order function, Joy's map is just an ordinary unary first order function from stacks to stacks - it expects a list and what in Joy is called a quoted program on top of the stack, and it returns a stack with a single list on top, the list [1 4 9 16].

SA: This is where Joy and K diverge. For us, 'each' really is second-order: it takes a function and returns a function which applies to each element of a list. We are taught to understand

sqr'2 3 4

as having the structure

(sqr')2 3 4

Apply 'each' to 'sqr', and apply the resulting function to 2 3 4. It is even more obvious in classical APL, where functions such as 'sqr' are not first class. You can operate on a function only by feeding it as on operatand to one of a handful of "operators", such as /, and then immediately applying the resulting function to data. In K, +/ is a first class function, which can be assigned and passed into other functions as an argument.

MvT: Joy has a large number of such combinators which are really first order functions that do the work of second order functions because they expect one or more quoted programs on top of the stack. Consider the conditional IF-Then-Else combinator, where the if-part produces a truth value and the then-part and the else-part are again functions (and not statements).

IFTE(ifpart,then-part,else-part)

In Joy there is a combinator ifte with the most common syntax

[if-part] [then-part] [else-part] ifte

which expects three quoted programs on top of the stack. But actually quoted programs which map and ifte expect on top of the stack do not have to be pushed just before the combinator - they can be the result of stack shuffling or construction. Similar to ifte is a combinator linrec for linear recursion which can eliminate the need for many recursive definitions that would otherwise clutter up the symbol table and the programmer's mind:

[if-part] [then-part] [rec1-part] [rec2-part] linrec

The implicit recursion occurs between the [rec1-part] and the [rec2-part] quoted programs. There is a similar combinator for binary (tree-) recursion that makes quicksort a one-liner:

[small] [] [uncons [>] split] [swapd cons concat] binrec

SA: This seems quite powerful! Over the years, APL developers have experimented with a variety of recursion operators, hoping that, just as the classical adverbs such as 'reduction', 'scan', and 'each' enabled us to factor out explicit loops, so the new operators would help eliminate at least some explicit recursion. For example, in K, we can define a function 'apply_to_atoms' which takes a monadic function f and returns a g which, applied to a list, recurses to the atoms and applies f:

  apply_to_atoms:{[f]{:[@x;f x;_f'x]}}
  f:2*!:
  g:apply_to_atoms f
  A:(1 2;(3 2 4;(5 6)))

  A
(1 2
 (3 2 4
  5 6))

  g A
((,0
  0 2)
 ((0 2 4
   0 2
   0 2 4 6)
  (0 2 4 6 8
   0 2 4 6 8 10)))

But it's never been clear what the "best" set of K-recursors might be, partly because K functions can take any number of arguments, and partly because recursion can involve more than one function.

To what extent to do you think the recursive combinators of Joy will replace explicit recursion? APLers are familiar with the fact that it often takes time and one or two "aha's" before a problem breaks apart into an "array" solution, a program without loops and counters. And some problems are stubbornly, perhaps essentially loopy. It seems to me that a certain way of thinking needs to be acquired, analogous to "array thinking" in APL, which would let the programmer factor out the recursion pattern of problem such as quicksort. I'm thinking here of the way the nested recursion combinator arose as you contemplated Ackermann's function, which most programmers would have said required explicit recursion.

MvT: Yes, the recursion combinators are very powerful indeed. To have just one linear recursion combinator seems only possible in a language in which all functions have the same arity (like in Joy, where all are unary). I do not have a proof for this claim, except that I have not been able to define one for Lisp or Scheme, and if it were possible it would surely be in the literature. Maybe it can be done with clever macros. The same is true for tail recusion as a special case of linear recursion, and also for binary and nested recursion.

But probably it is not true that all recursion patterns can be captured by a small fixed menu of recursion combinators, especially the mutual recursion patterns. As an example, one might look at the intricate and highly specialised mutual recursion that occurs in the definition of the Lisp functions eval and apply. For those cases ordinary explicit recursive definitions seem unavoidable. On the other hand, I would estimate that at least 50% of recursions in Joy can be handled by the list combinators map, filter and fold, of the remainder at least 50% can be handled by the more general linear recursion combinator. What is left can often be handled by the binary and nested recursion combinators, and finally a small percentage does need explicit recursion.

As far as programming practice is concerned though, I have to confess that earlier on I sometimes caught myself first thinking recursively and only later using a combinator to remove the recursion. It does not happen much nowadays, the combinators do become second nature.

SA: In APL we are used to thinking of programs as special datatypes, distinct from lists, arrays, and quotations. But in Joy, lists and programs are the same datatype.

[10 20 30]

is a list with three elements, but it is also an example of a Joy program. Can you spend a moment explaining how this works?

MvT: If you ask primary school children to do the sum: "5 + 0 - 0 + 0" they will be perplexed and find it hard. Indeed, some problems are difficult because they really are so easy. In Joy, as in Lisp and some other list processing languages, lists can be heterogeneous, the elements can be of all sorts of types. So this is a perfectly acceptable list:

[11 22 true London Peter Santa-Claus foo dup *]

This list might have been the result of concatenating three lists:

[11 22 true] [London Peter Santa-Claus foo] [dup *]

The first list would normally be described as containing two numbers and a truth value, because that is the ordinary meaning of the symbols. Now consider the second list - are its members a city, two persons ..? No, sorry, there is no Santa-Claus, just having a symbol or name does not create a thing to which the name refers. What about foo ? It is a standard symbol for nothing in particular. Finally in the third short list, what does it contain? two symbols or the two functions which they normally stand for? If the latter, then with an appropriate definition even foo from the other list might stand for something. So, it is not at all easy to say what a list really is. But it is straightforward to understand concatenation of lists.

Much the same with all the other operations on lists: taking its first members, deleting its first member, sticking something on the stack in front of a list, writing out a list or any of its members. But the stack is as problematic as any other list: what is the result of taking the first element of [11 22] or of [London Peter] or of [Santa-Claus foo] or of [dup *]? Taking the first of a list leaves something on the stack, but what is it? the number 11, the city London (too big!), Santa-Claus (sorry), the duplication function? I think that we have to say that strictly speaking all lists and all stacks just contain representations of something that may or may not refer to something in reality.

In Joy, as you remarked, some lists like [11 22 true] can be executed by a combinator, and this one in particular results in three things being pushed onto the stack. Another that can be executed is [dup *], and, as we say, "it expects a number on the stack and replaces it by its square". But in our pedantic mode we should rephrase that.

In describing Joy I have used the term quotation to describe all of the above, because I needed a word to describe the arguments to combinators which fulfill the same role in Joy as lambda abstractions (with variables) fulfill in the more familiar functional languages. I use the term list for those quotations whose members are what I call literals: numbers, characters, truth values, sets, strings and other quotations. All these I call literals because their occurrence in code results in them being pushed onto the stack. But I also call [London Paris] a list. So, [dup *] is a quotation but not a list.

SA: It's interesting how K and Joy partition these concepts in different ways. In K, strings can be used to quote code:

"2+3"

As you would expect, it doesn't evaluate unless you apply "eval":

  ."2+3"
5

But a list evaluates immediately, so one can use K expressions to build lists "from within". For example

(!5;!3)

builds a list whose first item is 0 1 2 3 4 and whose second item is 0 1 2. But in Joy, supposing that we had the program

  3 enum
[0 1 2]

we couldn't use it in the following context:

[5 enum 3 enum]

since this is a program. Instead, we would construct the list out of its parts:

5 enum 3 enum unit cons

Of course, as you pointed out earlier, it is this very feature which allows the Joy programmer to construct programs out of simpler components.

MvT: You want the infra combinator, I think. It expects a list and above that a quotation. It then treats the list as a temporary stack and executes the quotation on that. More often than not the list that is supplied is actually an empty list. Example:

  [] [2 3 + 4 5 *] infra
[5 20]

Now if enum is already implemented as you described, producing a list, and if [5 enum 3 enum] is already on top of the stack, then all you need is just [] swap infra, which will leave [[0 1 2 3 4] [0 1 2]] on the stack. Of course if you want the two sublists concatenated, then you do an extra [concat] infra first, or in this case simply flatten.

SA: Programming in Joy over the last few years, I've come to think of the operators as falling into two classes: those which produce new information from data, such as + and cons, and those which move items on the stack into position where the informational operators can do their work, such as swap and dup. Languages with variables appear to have a productivity advantage here, since one can simply invoke data by name. Billy Tanksley has proposed closing the gap with something he calls "shuffle notation", where "before" and "after" stack patterns are specified. For example, if the stack is

.. 10 20 30

and you want:

.. 20 20 30 10 10

then you could say:

"abc-bbcaa" shuffle

which would have the same effect as:

rolldown [dupd] dip dup

What's your take on this and similar efforts to deal with the problem of "stack noise"?

MvT: Indeed, conventional languages with named formal parameters in definitions of functions can simply use the name to pick up the value of the corresponding actual parameter. For functions with many parameters this is indeed an advantage. But these languages need to carry around an environment of name-value pairs. Joy tries to avoid the latter completely, but pays a price when there are many parameters. One possibility would be to do what Forth does: when appropriate allow named formal parameters. An interesting alternative is Billy Tanksley's solution or some syntactic variant of that. It does not introduce formal parameters to be used in the body of a function, and also it need not be used just at the beginning of a definition. So it deserves very careful attention.

The proposal in effect defines an infinity of possible stack shuffling operators. Currently Joy has 12 inbuilt stack shuffling operators, a few others defined in the standard library and other, deeper shufflings can be expressed using the dip combinator. Two possiblities arise: 1) replace all the current Joy mechanisms by the new shuffle operator, or 2) allow both, and leave it up to the programmer to select what fits best. To make any decision one would want to see some programs expressed in the old and in the new notations. I have not done any experiments along these lines.

One other consideration concerns efficiency. The shuffle operator (or any syntactic variant) requires a before-after after string to be examined - and that means interpreting its pattern. A simple minded implementation would be slow. An only slightly more sophisticated one would look for common patterns, and replace for example "a-aa" shuffle by dup. It would use the interpreted form only in the less common but tricky cases, like your example. On the other hand, this decision might well be left to the programmer. So, my answer is not very definite at all, sorry.

SA: It might be interesting to explore algorithms which translate shuffle notation into a fixed vocabulary of stack-shuffling words. Are there small bases for which efficient algorithms exist? What if we allow the stack diagrams to express structural transformations, e.g.:

[ab]c -- [ac][bc]

I imagine that Brent Kerby would have opinions on this matter (and quite possibly an algorithm or two!)

MvT: Yes, Brent Kirby has done very interesting work on a combinatory algebra for concatenative languages, see his paper from the main page of Joy. For Joy itself the following seems to be an adequate base for shuffling the stack: the three simple operators swap dup and pop, together with the combinator dip. (This combinator expects a quotation on top of the stack and below that another value. It pops the two, saving the value somewhere, executes the quotation, and then restores the saved value on top. So, for example, [swap] dip will interchange the second and third element on the stack.) But I do not have a proof that these four primitives are indeed complete in the sense required.

However, if the transformations also take apart lists on the left of the transformation pattern, or construct lists on the right of the transformation pattern, then the list operations will be needed. In fact it helps to think of your transformation [ab]c -- [ac][bc] to be composed of two transformations, firstly [ab]c -- abc and secondly abc -- [ac][bc]. Then the first has to take the list apart, and in general the list operators first and rest should be adequate for that, but uncons and unswons would also help. The second transformation has to construct two lists, and in general the list constructors [] (empty list) and cons should do, but swons might also be useful. Both kinds of component transformations would also need some of the four general primitives mentioned earlier. An actual implementation would presumably collapse the two component transformations back into your original.

I have not done any work on implementing this sort of thing for Joy. If I were to do so, I would look at the by now well understood implementations of pattern matching that are used in languages such as Prolog, Miranda and now in Haskell. An older book by Peyton-Jones describes this in detail. For a while I used to think that these transformations would re-introduce something like lambda variables into Joy, and that there is a problem of what the scope of the introduced variables would be. But I was wrong, the scope is just the right side of the transformation, so the variables cannot actually occur elsewhere in the Joy program. So if somebody wants to volunteer ...

SA: I'm curious about Joy's internals. For example, in APL and its descendants, at least some of the bulk datatypes are implemented in consecutive memory locations without any links, and for that reason APL doesn't require garbage collection. I know that K uses a reference-counting mechanism. Since the typical K application uses a small number of large objects, this is quite efficient. So what choices have you made for Joy, and for what reasons?

MvT: Yes, the implementation of bulk datatypes in consecutive memory locations is in many ways ideal simply for efficiency reasons. I have often wondered what kind of a cousin of Joy might be implemented like that without sacrificing efficiency elsewhere. One worry of course concerns adding a new first element to a very large array: it requires copying the entire array. So, if this happens often, much effiency is lost. But must it happen often? Presumably the APL experience shows that it need not be a concern at all for a vast number of applications.

Then there is the possibility of a mixed implementation: mostly consecutive, but some links. This may or may not require memory management in some way, either as mark-sweep, or copying, or reference counting. The last cannot handle circular structures, but that may or may not matter. At any rate there seem to be many possibilities for languages similar to APL, J and K but with a concatenative syntax to explore. You yourself, Stevan, have done some interesting work in this field with cK, a concatenative version of K. I look forward to seeing more of it, not the least because it would offer an implementation that is more efficient than the current Joy because of the way bulk datatypes are handled. At the same time, I see no reason why a more sophisticated implementation of Joy could not use the same method wherever possible.

This is perhaps the best place to mention Billy Tanksley's work, which is inspired by the consecutive memory implementation of the stack of the Forth language. Since in Joy the stack is a sequence just like lists and other quotations, there is a possibility of using consecutive memory locations wherever possible and using links only where necessary. So I think Billy's work is a step in this direction, and I welcome it.

But you also asked about the choices made in the implemention of Joy, and for their reasons. My background had been very much in logic and other symbolic processing, and the structures one needs there tend to be trees of some sort. So I have had rather little use for arrays in consecutive locations, except for implementing trees. I had used Prolog a fair bit, but Lisp I only studied in detail from a purely theoretical perspective. It was clear to me from the outset that a uniform linked implementation for Joy would be simultaneously simple and sufficiently general. Since Joy then had no assignments or other destructive updates, and hence no possibility of circular lists, memory management by reference counting was a possibility that I did consider quite carefully. In the end I decided against it, largely on the grounds that a problem would arise if I ever wanted to allow circular lists. I knew that the problem would not be insurmountable, but for a first implementation it looked like a lot of work. It so happens that Joy is still purely functional and hence has no destructive updates of any kind, so a simple reference counting memory management would still be possible. That would have the advantage of not requiring occasional hiccups for memory management and hence better suitablitity for real time work. On the other hand, it is known that reference counting is more CPU intensive, spreading the total work more continuously. Whether Joy will remain purely functional for ever (look what they did to the original Lisp), I do not know at this stage.

SA: What is the current status of Joy? What are the plans for the future development?

MvT After several proof-of-concept implementations, the current implementation, written in unadorned C, evolved smoothly from one that was started 1995. In early 2001 John Cowan added files, floating point numbers and access to standard C functions, and he did a lot of cleaning up. He also added the option of not using my original garbage collector but using the professional BDW collector, an improvement which allowed space used for strings to be reused. This extended the functionality of Joy tremendously, and I a very grateful for all the work he did. Since then Nick Forde has also added a number of features, and I thank him for that.

The current implementation is essentially an interpreter. It translates (compiles) the external ASCII form of programs into internal tree code which is then interpreted in a rather conventional manner. No attempt is made to do any optimisation, and my earlier "clever tricks" I always came to regret fairly soon because they interfered with my garbage collector. There is a growing list of general and special purpose libraries, ranging over many kinds of possible programming applications, and most of my work in the last few years has been on those. For the immediate future I plan to examine and test the so far rather underutilised module system, by writing libraries for some specialised types such as big sets, trees and dictionaries of arbitrary basetypes.

Several other people have published other more or less complete Joy interpreters, written in ML and in Scheme, in the "concatenative" mailing group. At this point in time I have no plans to write a full compiler. A first version of such a compiler would presumably use C as an intermediate language and leave the generation of machine code to the C compiler. I would very much welcome if somebody were to take up the task.

12 December 2003