|
This episode is about compile-time metaprogramming, and specifically, about implementing DSLs via compile-time metaprogramming. Our guest, Laurence Tratt, illustrates the idea with his (research) programming language called Converge. We started by talking about the importance of a custom syntax for DSLs and took a brief look at the definition of DSLs by a chap called Paul Hudak. We then briefly covered the distinction between internal and external DSLs. More to the point of this episode, we discussed the concept of compile-time metaprogramming, and the language features necessary to achieve it: in Converge, these concepts are called splice, quasi-quote and insertion. We then looked at how the Converge compiler works, and at the additional features that are required to implement DSLs based on the metaprogramming features mentioned above. Using an example, we then walked through how to implement a simple DSL. Looking at some of the more technical details, we discussed the difference between the parse tree and the abstract syntax tree and looked at different kinds of parsers, specifically the Earley parser used by Converge. In multi-stage languages (i.e. languages that execute programs and meta programs) error reporting is important, but non-trivial. We discussed how this is done in Converge. We finally looked at how to integrate Converge's expression language into your DSL and how to package a DSL definition for later use. The last segment looks at the process of implementing a DSL in Converge and at some of the history and practical experience with Converge. Lessons learned from building Converge wrap up the episode. Transcript: This episode is on Compile-Time Metaprogramming or creating domain specific languages through syntax extension. In order to talk about this rather non-mainstream topic we have a guest. Our guest today is Laurence Tratt. 
He is a research fellow at the department of computer science at King's College in London and he is co-lead of their software and systems modeling team. He is also the creator and inventor, basically, of the Converge programming language. This language implements the concepts we are going to talk about in this session. This is once again a session about language design with a specific focus on domain specific languages. It's going to be very interesting, and again, we recorded this episode at the JAOO Conference 2006 in Aarhus in Denmark. Have fun. Welcome our guest Laurence Tratt. Welcome, thank you very much. So we are talking today about a fancy title: DSL implementation at compile-time via syntax extension, and maybe another title could be Compile-Time Metaprogramming. Yes you could say that. You could also simplify it somewhat if you wanted to and just say macros. As academics we like to choose ridiculously long titles. Yeah and also there are probably differences with regards to macros as we know them in C++ or C and we will talk about that in a moment. So, I mean we have talked in previous episodes about DSLs. So we probably don't have to fill people in about what a DSL is but if you could give us your interpretation of what you think DSLs are important for and why you chose to implement DSLs the way you do it and maybe also explain as an overview what the way you choose to implement DSLs actually is. Okay well I think the term Domain Specific Language has been around for a very, very long time and I think it's kind of been forgotten at various stages. What's interesting is that we are going through a resurgence in use of the term and if you go out to the developer on the street and say DSL, one of the things you are going to get back a lot now is talk about the use of Ruby, the way that they are doing sort of inline DSLs, some function calls using some of the little block structures they have to get things going. 
I think that's a perfectly valid use of DSL. However, I don't think that it's necessarily using the concept to its fullest potential. So what I would like to see with DSL's is people using some fairly rich languages, new languages, new syntax to express things that would be very difficult to express before. So it's more than just hacking around a little bit with some programming language stuff that you already have. It's very much about creating something new allowing you something that you couldn't do easily before. And that's primarily, syntactically something different or -- I think that if you want people to have a good chance of expressing something very useful they couldn't do before, it does involve new syntax. Yeah, I mean the Ruby folks, and we will talk to Obie Fernandez about this stuff in a subsequent episode, that if you'd have embedded syntax in or embedded DSL's in Ruby the freedom we have for adapting a syntax is limited and I guess that's one of the focus points where you think you would try to do better on that. Yes absolutely I think what they are doing is they are making very good use of things they already had but of course that only gets you so far and that may be all you need but if you restrict yourself to that you are very much constraining the possible universe of things that you can express. Okay so if you try to go back to what a DSL actually is, I think you have this nice preface stuff from this guy called Hudak, I didn't know him personally or actually didn't know this name. So does it make sense to briefly go over these characteristics of DSL's and show what you are disagree. Yup I think it's a very good thing. So my thinking behind this is although the term DSL or, maybe not always the term, but the concepts have been around for a long time. 
There is some very nice pair of papers from 1996 from a chap called Paul Hudak and I apologize if I pronounced his surname wrong, I've never met him either, from the university of Yale and what he did at that point was nailed down what it is that's interesting about DSL's, some of the relevant issues and some very high level points. So maybe I can go over those. One of the things he was saying is that outlining what a DSL should be and he has as several points. I think the ones that are the most relevant as he was saying, a DSL is an abstraction relevant to a specific domain. In other words you take a particular problem and you say well here is how I can express it in the normal programming language. I have to do this function code and this function code that's very low level, how can I move it up a level? Then you have a DSL. He also said DSL's need to lead to demonstrable increase in productivity in the sense that if it would still take you as long to do what you wanted to do as you could do before, you haven't really gained anything. You have to get something useful from productivity, from using a DSL for it to be useful. Its also basically saying that you want to try to make it no more difficult than using a normal programming language. In fact if you are lucky you might be able to make things more accessible than a normal programming language. Which is the idea of having non-programmers, program using a DSL; at least that's kind of the holy grail? Yes, that's the probably unachievable holy grail and in the end I think you have to be realistic about that these things but the more that you can lower the barrier to entry is a good thing and I think the final thing that Hudak outlined is that people who are using DSL's they need to be able to write their solutions in the DSL's fairly easily and they need to be able to maintain them. 
Well that's something that is rather critical you don't want to write a DSL on Friday afternoon in the office, come back on Monday and think "What the hell does this do? I can't understand what I've written." So you are kind of advertising the idea of integrating DSL's into a host language, which is just for our listeners, a different approach compared to what we talked about in our Model-Driven Development episodes and it's comparable to what the Ruby folks do and so why did you think its useful and important to embed it in the host languages? And what are suitable host languages? Okay it's a very good question. I think the reason that you want to embed things in the host language is the alternative is making standalone DSL and to me the best example of that is make. Most people are going to be familiar with make I guess who listen to this and make is a great tool. I still use it, as I am old fashioned. But it's also very complicated; it has own grammar. So someone somewhere, some poor person has implemented very complex tool, they have gone away and created the grammar using lex and yacc. You can always feel make is a sort of little virtual machine in a sense to make files. It's a complex standalone application and you can't reuse any of it and it has to have all this error handling and reading in files all these things at the low level. So I think the reason one would like to embed DSL in a programming language, is maybe we can get some of this low level stuff that's not very relevant to the problem we are trying to do, maybe we can get the host language to do it for us. For example, expressions could be kind of an obvious candidate for that? Yes and also you hopefully will get simple things like simpler input/output you would be able may be to use things like the error messages. There is a whole host of the things that you hopefully pickup for free. Now exactly what you consider to be a good host language is dependent on opinions. 
So Hudak for example thought that everything should forceably be in Haskell and although I have nothing personally against Haskell, I'm not sure this is an entirely realistic thing to expect of people. So I think that the sort of host languages that you need, as a bare minimum, they have to be very flexible host languages. Now I also furthermore believe that really that means you need to be able to extend their syntax and the fact that the language like a Ruby thing is very flexible language and like a Python. It's not quite the same thing as a sort of a Lisp like language perhaps, which suddenly you can start extending it in ways the original design has never thought of. That's maybe because the Lisp language basically has no syntax so its easy to it extend it by -- its trees, lists. Absolutely and so Lisp is not a good candidate because no one is going to use it. That's the odd point yes. And it's so minimalistic that you can do anything you want with it, but it doesn't give you enough at the beginning to start with. So that's a non-starter, so what you are really looking for is this sort of mix I think all of what I would call a modern programming language something that looks like a Python or Ruby or Java or even a C++, whatever you consider that a normal developer would be comfortable with, with some of this flexibility that goes back to the Lisp sort of days. So what you did then is kind of to invent your own language called Converge, I guess you will talk about it in a minute and specifically this language supports as I understood a macro system that supports DSL's. So I think its worth talking about this one. Okay so Converge is a language, a new language. It looks initially a lot like Python to people; I think that's what people will first notice about it. It looks like Python that's been a bit disguised by somebody and then on top of that it has this compile-time metaprogramming stuff. 
Now compile-time metaprogramming is a very unwieldy term, but basically it means it has a macro system and the macro system that it has is very heavily influenced by that in Template Haskell, so for anyone who thinks that I was bashing Haskell earlier, there are some genuinely good ideas that one can pluck from these languages and so what Converge has is three language features it needs to integrate a compile-time metaprogramming facility into it: You have to have a macro call. Now that's a very standard and easy thing. It goes all the way back to Lisp and then the next thing that it has is something that has stopped any language since Lisp having a decent macro system effectively and that is you need to be able to build up abstract syntax trees. So let me just pedal back one step to make sure we don't lose our listeners: the idea of compile-time metaprogramming is basically that you have a way of adding source code to the program that is not evaluated at run time but rather during the compilation process and the result of evaluating this piece of source code is a syntax tree that is then integrated into the byte or machine code that the compiler creates, right? Yes absolutely so the trick with this compile-time metaprogramming is that you are interacting with the compiler. The compiler goes around compiling a file, much as a normal compiler does, and it hits something that's called a splice in Converge but you can just call it a macro call. That's fine. And when it hits one of those it says "Oh" and it effectively evaluates that expression at compile-time and that expression has to return an abstract syntax tree which is basically an object -- Object graph. 
Yeah representing some code and its relatively well known how to do the this macro call, this splice thing and Converge has some neat little tricks that actually make it rather easier to use than even say something like Template Haskell but it's fundamentally a relatively well solved problem -- the trick is how you make these abstract syntax trees. Yeah and just again to relate this to, for example C macros it's bit different because a C macro is evaluated on source level so it returns kind of a piece of text that is replaced with a macro call right? So yes, I mean, the C macro system and I happen to enjoy C macros because they're kind of fun in their own way, a twisted little way, but they are a very much double barrel shotgun and you can't really control when you pull the trigger, the bullet can go anyway. Yeah there is no typing, no error checking It's basically just text expansion and really the C macro system, you can use it in any language. The C pre-processor doesn't really know anything much about C at all. The point to these more advanced macro systems is that they are very well integrated into the language. They can make certain safety guarantees that for example in a C like macro system or any templating type or something like that. You've got the chance of variables from two different things conflicting with each other and the pre-processor in C can make no guarantees about that. People go through all sorts of hoops adding extra brackets and crossing their fingers that nothing goes wrong. Whereas in Converge a lot of this stuff is handled for you, you don't have to worry about, well you don't have to worry as much about shooting yourself in the foot. One of the tricks of Converge is then I guess that you can program the code that creates the AST in the same language as, also in Converge, right? Yes so it's a completely homogeneous environment. Right, yeah. 
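To make the contrast between textual expansion and tree-based expansion concrete, here is a minimal sketch. It uses Python and its standard `ast` module rather than C or Converge, purely as an assumption for illustration; the helper names `expand_text` and `expand_tree` are hypothetical and not part of any macro system discussed in the episode.

```python
import ast
import copy


def expand_text(template: str, param: str, arg: str) -> str:
    """Preprocessor-style expansion: blind textual substitution."""
    return template.replace(param, arg)


class _Substitute(ast.NodeTransformer):
    """Replace every use of a parameter name with a copy of a subtree."""

    def __init__(self, param: str, arg_tree: ast.expr):
        self.param = param
        self.arg_tree = arg_tree

    def visit_Name(self, node: ast.Name):
        if node.id == self.param:
            return copy.deepcopy(self.arg_tree)
        return node


def expand_tree(template: str, param: str, arg: str) -> ast.Expression:
    """Tree-based expansion: the argument stays one unit in the tree."""
    tmpl = ast.parse(template, mode="eval")
    arg_tree = ast.parse(arg, mode="eval").body
    return ast.fix_missing_locations(_Substitute(param, arg_tree).visit(tmpl))


# Textual expansion of "x * x" with "1 + 2" yields "1 + 2 * 1 + 2".
text_result = eval(expand_text("x * x", "x", "1 + 2"))

# Tree expansion keeps (1 + 2) grouped on both sides of the multiplication.
tree_result = eval(compile(expand_tree("x * x", "x", "1 + 2"), "<macro>", "eval"))

print(text_result, tree_result)
```

The textual version silently changes the meaning because operator precedence is re-applied to the expanded text, which is why C programmers wrap macro arguments in extra brackets. The tree version cannot have that problem, since the argument is inserted as a single node.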
So during the compilation you probably, actually run some kind of virtual machine that interprets the stuff and -- So the Converge compiler is written in Converge. So the compiler is running in the virtual machine so when it comes across one of these little splice things it just temporarily compiles a temporary module and injects that in the virtual machine it's running in anyway. So there is a complete interaction between the compiler and the splice code and that's rather important because it allows the user code to interact with the compiler, get it to do things on its behalf, ask it questions and that sort of thing. Okay so in addition to the splice construct I think there are two others. One is the quasi-quote and the other is the insertion. Can you explain them briefly and then we will go on and show how they fit together. Well quasi-quote, and I am not quite sure how they came up with that name, it is from the Template Haskell people, is the mechanism that allows you to build abstract syntax trees and the way it does it is incredibly intuitive; they solved the problem that no one had solved since Lisp and what you do is you surround the normal expression that you write in the programming language with some funny brackets, the square bracket and a vertical bar on either side, and so you have an expression like two plus three surrounded by these funny brackets and that evaluates to some objects which are the abstract syntax tree. So in that case the outermost object is going to be a plus object; on the left hand side it will say integer two and on the right hand side it will say integer three and so suddenly you can create these very abstract things that are called abstract syntax trees, unsurprisingly, using the normal concrete syntax that you are familiar with. 
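The "two plus three" example can be sketched outside Converge as well. The following uses Python's standard `ast` module as a stand-in (an assumption for illustration; Converge's quasi-quote is a language feature, whereas here `ast.parse` on a string plays a roughly similar role). It builds the same tree twice: once by hand, one constructor call per node, and once from ordinary concrete syntax.

```python
import ast

# Building the tree by hand: one constructor call per node.
manual = ast.Expression(
    body=ast.BinOp(
        left=ast.Constant(value=2),
        op=ast.Add(),
        right=ast.Constant(value=3),
    )
)
ast.fix_missing_locations(manual)  # fill in line/column info so it compiles

# Building the same tree from concrete syntax, in the spirit of a quasi-quote.
quoted = ast.parse("2 + 3", mode="eval")

# The outermost node is a "plus" with integer 2 on the left and 3 on the right.
root = quoted.body
print(type(root).__name__, type(root.op).__name__, root.left.value, root.right.value)

# Both trees compile and evaluate to the same result.
manual_value = eval(compile(manual, "<manual>", "eval"))
quoted_value = eval(compile(quoted, "<quoted>", "eval"))
print(manual_value, quoted_value)
```

Even for an expression this small, the hand-built version is noticeably longer than the source text it represents, and the gap grows quickly with the size of the generated code.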
Instead of writing code that says new AST node -- blah, blah, blah, .plus add child, new AST node in that or -- blah, blah. You have hit the nail on the head as to why these systems have always failed horribly before, because that is a completely unscalable approach. Absolutely yeah, and just to divert quickly, this is the reason why in model driven development you don't use a model to model transformation to create source code, because that would go back to creating instances of the code nodes, a kind of AST of the target language, and it's just horrible. Yes, absolutely and so finally there is the third feature, which is by far the least complicated -- it's called insertion in Converge; it's called something rather different in Template Haskell and basically it allows you to merge ASTs together. So you can build ASTs in chunks and then, nicely and using the normal syntax, merge these things together. Okay so that allows us basically to hack the stuff a compiler does by modifying the output AST it creates, based on code that looks just like the other code in our source code, but that's not yet a DSL. So how does this relate? What else is there in Converge to make it a DSL friendly language? And what does this look like? Yeah you are very much right in saying that that's not all that you need for DSLs and in fact I think what's worth thinking about here is why other languages have one of these macro facilities. Well, Lisp has one because it doesn't have enough built in. It's nice to have a macro language to try to build your extra complex things on top. C has one because C is a very annoying language to program in. You have to have this pre-processor to make it at least somewhat palatable. Converge, even without the compile-time metaprogramming, is on its own a fairly rich language and has all the things that you expect. 
So really its all put there to enable DSL's and it turns out that to have DSL support that is where you can have a little bit of your source code that's a completely arbitrary syntax of your choosing you need one extra language feature and that's called the DSL block and it looks rather like the splice, a normal splice in the compile-time metaprogramming sense and that single feature allows you, which is really only about ten lines of code in the complier. It's a very simple layer on top, certainly allows you to embed these completely alien looking things and make the whole facility underneath worthwhile. So, you say that there is this piece of Converge code and then there is some kind of marker that defines what kind of DSL -- just like maybe in HTML where you have the script and then type equal Javascript or something and then there is this other code and the trick then is that this code is processed during compilation. It's pretty much the same thing and obviously there is a lot more to this that there is in the example of the outline. We will talk about that in a minute. Yeah but the principle is the same. Okay, lets talk about an example and the example you talked about in the talk this morning and again we are recording this at the JAOO conference in Denmark in October what is it? 2006, the example you showed there is a time table something to describe train time tables, so what do we have to as a DSL developer? What as a DSL user, as a guy who extends the complier, what do we have to do to make this work? Okay well the first thing that you have to be aware is that DSL blocks, this DSL input that the user writes is just really a random string the complier doesn't know anything about it. So the DSL implementation author, the person defining the DSL, you have to do several things. The first obviously is you have to design a language of some sort because the input must be coming into a format that you have told the user to conform to. 
So grammar definition or what? Probably. It's a free country, you can do it however you want. I would very much suggest that you do it in a traditional way where you define a grammar and parse it and then you get a parse tree and you write a little translator thing that takes in a parse tree and returns a Converge Abstract Syntax Tree. So can you maybe just for our not so language savvy listener subset explain the difference between a parse tree in language and AST? I can indeed. So parsing is a very fun activity, shrouded in mystery and it actually could be much simpler than it often appears. Basically it's a two stage process. The first thing that happens is you take your input and you tokenize it or lex it depending on how you like call it. And all that's doing is splitting it up into words so its like taking a random European language and saying "Well, whenever I see a space, that delimits two words and a full stop similarly." Then you take all these words that you have tokenized and then you try and actually make sense of them in terms of the grammar. So if you make a natural language example, you are trying to say "Okay, what's the verb in my sentence, what's the nouns?" and so on and then you are trying to make a tree structure out of that. It allows you to determine how the sentence that's come in is structured, at this point all we have done is determined what the user told you. So that's the parse tree? That's the parse tree and then you are converting that, however you want then into the abstract syntax tree and there may be a very close relation between the two but your translation, you are effectively writing a Mini Complier. Right, and the abstract syntax tree is basically an instance of it, has to be in terms of the host language, right? So you are converting your domain specific language syntax, parse tree, into a set of constructs, objects instances of the AST types of the host language of Converge? 
Yes, if its easier to think of, you can think of it, I think, without objects. You can imagine it just as a textual templating approach if you think of a sort of model to text translation, effectively the abstract syntax tree is like the template text. But you don't do it with templates; you actually in fact technically you do it by instantiating the AST object. Yes, because you get an awful lot of benefits that way but it's somewhat, sometimes a slightly hard concept to get hold of. So if you move in your mind from template to text to nice objects that do lots of other things for you then you are on the right path. Okay. One thing I would like to talk about before we move on in how to implement a DSL obviously it should have become clear by now we are talking about textual DSL's. There is no way of integrating some graphical notation because the editor wont make that you could maybe do some ASCII art and why not maybe for tables. So what about parsing? Do I have to write my own parser to build the parse tree? Okay. So parsing is one of these things I think I said early is absolutely shrouded in mystery. It goes back to the fact that it's in the mid 1960s that the theoretical computer science people already started looking and thinking. Well you know we doing a lot of this programming stuff now we have different programming languages, how we going to make sense of this, how we are going to come up with nice ways of understanding what the user's told us and of course computers at the time were incredibly slow. So they invented some very limited algorithms that can't parse a lot of languages. It would be a bit like saying I have a way of parsing natural language. It works very well for Luxembourgish. And Subject, Predicate, Predicate object kind of complexities. Yeah and its not going to work for English, or French, or German and you think "Well that's not entirely useful, is it?" It's very limited and we are actually still stuck with this today. 
So people have a very negative opinion of parsing because we are stuck with weird terms like LR, LL, LALR; none of these terms we should be forced to know. They really just tell us about the limitations of the underlying algorithm. The good news, not that well known, so I hope that I am spreading the good news gospel here, is that there are much better algorithms out there that allow you to parse any context-free grammar, and any sane programming language, and C++ is excluded from this, is context free, which means you can parse nearly anything that you can imagine. So if you're a DSL author, what you have to do is say okay there is a parsing algorithm built in to Converge, it's called an Earley parser, it can do anything that I want really. I just have to write a little simple grammar definition and unlike using lex or yacc or JavaCC or those sort of things I can write it in a way that makes sense to me as a human and I am not subject to the whims of the parsing algorithm. So you said before that the user has to come up with some way of transforming the string into a parse tree and then into an AST, but basically what you say is that there is a built-in library that helps you do this if you provide an Earley grammar and then you get all the magic. So the function to implement the DSL parsing is a couple of lines calling into the library I guess. Absolutely, one of the fundamental philosophies that I've had when creating Converge is that DSLs follow a fairly standard sequence of steps and one of those is that every DSL, every sensible DSL, is going to parse some text, get a parse tree and then translate that. So why not have a single function that takes this in and returns the parse tree, and that's exactly what there is there, when you write one of these little DSL implementation functions that takes in the DSL blocks. So basically the handler function for a particular kind of syntax? 
Yes, exactly you say "Okay, well here I am going to hand off to another function, here is the input I got from the user, here is the grammar and here are some extra keywords that I may need to know about and that's it. Please give me back the parse tree and it does." Yeah, so that the manual work, tedious parsing stuff has gone away and then the grammars you need to write are basically some kind of EBNF like thing where you define keywords and token stuff. Yeah, it's a fairly standard EBNF type thing of course, going back to what I said earlier it's a pure EBNF in the sense that's not like a yacc sort of thing where, if anyone ever had the misfortune to use yacc, you write what looks like a perfect EBNF notation, you run it through yacc and it says "shift reduce error," what the hell is a shift reduce error? Why do I care about it? None of that sort of stuff. You just put your normal EBNF notation and it does it. Yeah, but you don't add any side effect declarations, this stuff you do, this in a separate function that's actually the transformation from the parse tree to the AST. So in a lot of parsing tools as the parsing algorithm is parsing text you are also doing side actions but a lot of those are because of this sort of limitations again of machines 40 years ago where they couldn't hold things in memory. Nowadays it's much easier to parse the whole text -- Build the whole thing Exactly. Build the whole tree in Converge, those are represented as nice simple lists and then to go over that fully created tree and do things then and that makes an lot of things awful lot easier. It's kind of like SAX vs. DOM in the XML processing kind of -- This is this is very much a DOM view of the world and not a SAX view of the world, yes. Yeah, so then you write probably functions that transform the parse tree elements into objects in the AST and obviously you write these functions in Converge? Yes. 
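The pipeline described so far (tokenize, build a complete parse tree as nested lists, then walk it with one translation function per grammar rule) can be sketched as follows. This is an illustration in Python, not Converge: the timetable syntax, the grammar and every function name are assumptions, and a small hand-written parser stands in for Converge's built-in Earley parser.

```python
import ast
import re

# Hypothetical timetable syntax, one entry per line: "<hh:mm> <station name>".
# Characters matching none of the alternatives are silently skipped here.
TOKEN = re.compile(
    r"(?P<TIME>\d{1,2}:\d{2})|(?P<WORD>[A-Za-z.]+)|(?P<NL>\n)|(?P<WS>[ \t]+)"
)


def tokenize(text):
    """Split the raw DSL input into (kind, text) tokens, dropping spaces."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)
            if m.lastgroup != "WS"]


def parse(tokens):
    """Build the whole parse tree first, as nested lists:
    ['timetable', ['entry', time, word, ...], ...]."""
    tree = ["timetable"]
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind == "NL":
            i += 1
            continue
        if kind != "TIME":
            raise SyntaxError(f"expected a time, got {text!r}")
        entry = ["entry", text]
        i += 1
        while i < len(tokens) and tokens[i][0] == "WORD":
            entry.append(tokens[i][1])
            i += 1
        tree.append(entry)
    return tree


# Translation: one function per grammar rule, parse tree in, host AST out.
def t_entry(node):
    hours, minutes = node[1].split(":")
    station = " ".join(node[2:])
    return ast.Tuple(
        elts=[ast.Constant(value=int(hours) * 60 + int(minutes)),
              ast.Constant(value=station)],
        ctx=ast.Load(),
    )


def t_timetable(node):
    body = ast.List(elts=[t_entry(e) for e in node[1:]], ctx=ast.Load())
    return ast.fix_missing_locations(ast.Expression(body=body))


def timetable(text):
    """Handler for the DSL block: parse, translate, then run the host AST."""
    tree = parse(tokenize(text))
    return eval(compile(t_timetable(tree), "<timetable>", "eval"))


schedule = timetable("8:25 Exeter St. Davids\n10:20 Salisbury\n")
print(schedule)
```

Because the parse tree is fully built before translation starts, the translation functions can look at any part of it in any order, which is the DOM-style view mentioned in the conversation.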
And probably the nice thing is that, there you probably then use this splicing stuff to actually create the objects for the AST which is actually quite nice. Yes, so again, this is very much this idea of an homogeneous environment that's a translator, and I use the word translator because I don't necessarily like to use the term "mini complier"; it frightens people and that really, a lot of the hard work is taken out of it and it's not frightening like compiler or so and Converge, using a very, very simple framework and basically you write one function, per rule, your grammar and that takes in a little bit of a parse tree and as you said creates the objects for an abstract syntax tree and you just, using those three compile-time metaprogramming features we talked about earlier: the splice, the quasi-quote and the insertion, in the insertion that's it -- everything is built after those. Okay, so lets shift gears a little bit and talk about error reporting and this is an interesting topic because some of you listeners might have heard about using C++ templates for compile-time metaprogramming which first of all, it looks awful and second, one thing that is really cool about it, if you do something wrong you get those error messages with an error message is 25,000 characters long because some identifier has been built up through concatenating the recursive template evaluation. So that's a big problem. So one thing you talked about in your presentation is that error reporting is something you specifically take care of in Converge; I think its worth talking about. Yeah, well it was one of those things, I mean, it's obvious in the sense that error reporting is important. 
No one in their right mind is going to say its not that important but I mean when I delivered the first version of Converge it had some nice-ish error reporting things but it just didn't seem enough and what became clear to me is that both the person who writes DSL, the implementation author, and the people who use it require good error reporting and generally neither side got good error reporting. At best, maybe one or the other did and those systems were few and far between. So it became very clear to me that if you want to write to a good DSL well even if it's good, it's still going to have problems and I've never written an error free line of code yet in my life and I don't expect there are many people out there who have. So you have got to come up with a way that you can report errors to both the implementation author and the user. And that is particularly tricky because if you consider for example again C macros, if the complier reports an error then of course the error is reported based on the code that is there after the macro has been expanded but if the developer looks into the source code, of course the macro is there. So it's very hard to actually find out what's happening and I guess it's kind of the same problem here because the error is reported based on the abstract syntax tree that has been created from the transformation instead of what the DSL input was like. Well that's one of the things that I set out to tackle and I think that I have come up with a solution that seems very neat and seems to tackle the whole problem, as it were. So the basic philosophy is this: there is a concept in Converge, it doesn't really matter what it's called, called a "source info" and basically it tracks the location of some source code from the point that it's parsed until the point that it's converted into a byte code instruction run in the virtual machine. So, like the traceability stuff the MDA folks talk about? 
Yes, it is very similar to that, but it's very different from the way that these things are normally done in normal programming languages. If an error happens at run time you get a little report saying at line such and such this error occurred, but there is very much the idea that each error is just associated with one line. In Converge, because when an error happens it might be related either to the user's DSL input or to the translation the DSL implementation author created, error reports can be associated with more than one source location. So you can get an error and it says "Well, I got to this point in the stack frame and this is associated with two source code points, or three or four", because you can layer DSLs, and then if you are a DSL user... So you said you can layer DSLs, which means you can embed DSLs in other DSLs. You can indeed. Nice. So that means again you have to take a dramatically different approach, because if error reporting was just based on reporting some weird part of the C++ compiler's string concatenation, well, you wouldn't even be able to work out which layer of your embedded DSL the thing occurred in, let alone any more detail. So here the error report comes back and says, well, the error is related to the third line in your DSL input as a user and to this line in the Converge source code, which was the translation class. So both the user of the DSL and the implementation author can track down the error and work out which one of them is responsible.
Another thing that's maybe interesting is that you talked before about how having DSLs integrated into host languages is particularly useful because you can reuse parts of the host language, and one specific aspect is probably the expression language, so am I right there? You are very right, because one of the things that, again going back to this nice chap Mr. Hudak, he noticed is that DSLs always tend to evolve their requirements, and one of the things where he was right on the money is that as they evolve, they always tend to evolve features that people want to bring in from programming languages. In other words you start with a small DSL and then you think, I really need a for loop in here, oh, then I really need an if statement, and then something that looks like a function call. They always start to grow features of that ilk. So if you are going to add in an expression language you don't want to do what most DSLs do, which is put in a badly designed one that's been created from scratch, I mean badly implemented, badly debugged, and that tends to fall over in horrible ways. So Converge allows you to embed its own expression language within DSLs; that way you are taking advantage of the fact that somebody, in this case me, has designed (hopefully designed well, but that's another issue), implemented, tested and debugged that thing. And in fact to add this expression language into your DSL is two lines of code for a DSL implementation author. So suddenly, rather than having to write hundreds and hundreds of lines of code to make their own one, that's basically a bit crappy, they use someone else's for two lines of code. So these two lines of code are probably in your DSL grammar, or -- is it in the grammar? And does it say you can use any kind of Converge code here, or only expressions, or only function calls? Can you restrict what you can use from the host language? You could in theory embed any part of Converge you wanted. In practice I would very much suggest that you stick to just embedding the expression language; fortunately the expression language in Converge includes things like function definitions because it's an entirely expression-based language. So functions evaluate to function objects that you can then call.
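As a rough analogy for reusing the host's expression language instead of writing one from scratch, here is a minimal sketch in Python rather than Converge. The toy DSL format `name = <expression>` and the helper names `translate_rule` and `run_rules` are assumptions made for illustration; the only real APIs used are Python's `ast.parse`, `compile` and `eval`. The host parser does all the expression work in a couple of lines, and because Python's `lambda` is itself an expression, function definitions come along for free.

```python
import ast


def translate_rule(line):
    """Translate one DSL line 'name = <expr>' into (name, code object).

    The expression part is handed to the host language's own parser,
    so the DSL author writes no expression grammar at all."""
    name, _, expr_src = line.partition("=")
    tree = ast.parse(expr_src.strip(), mode="eval")  # reuse the host parser
    return name.strip(), compile(tree, "<dsl>", "eval")


def run_rules(lines, env=None):
    """Evaluate the rules in order; later rules can use earlier names."""
    scope = {"__builtins__": {}}  # hide builtins; this is not a real sandbox
    scope.update(env or {})
    for line in lines:
        name, code = translate_rule(line)
        scope[name] = eval(code, scope)
    return {k: v for k, v in scope.items() if k != "__builtins__"}
```

For example, `run_rules(["base = 10", "double = lambda x: x * 2", "total = double(base) + 1"])` binds `total` to 21. Restricting what may be embedded, as discussed above, would correspond to inspecting the parsed tree before compiling it.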
So with that you can probably do nearly everything you're likely to want to do, and if you are particularly masochistic maybe you can do some other things as well. Okay, so maybe not yet directly to wrap up, but to summarize what we talked about: one thing you kind of emphasize is that you don't just have a language but rather a kind of little mini process that suggests to people -- it's a free country and everybody can do what they want -- a process of how to build the DSL, as part of Converge. Yeah, I think this is something that's very important. One of the problems if you do a standalone DSL, say you go out and make something that looks a little bit like the make program, is that you have to start from scratch. There's no obvious order in which you need to do things, and so the problem is you have a slightly random sequence of steps every time you do it, and as we all know, when you do a slightly random sequence of steps the result might be rather random as well. So Converge has a sort of implicit process, as I think of it, and the way that it does it is that there are several features built into the language that say, well, if you do this step then you might as well do this step next. It really starts off with the fact that, as I said earlier, you can use the normal Converge tokenizer to tokenize things. Now, you can write your own tokenizer if you want, but don't -- I mean we are pretty -- Why? I mean, what's the point? Exactly. So if you start with that and then you write a grammar, those are steps one and two; then you write a translation class, then you test and debug it, then you deploy it. Each one follows very naturally from the other and you don't have to put a great deal of brain power into thinking "how am I going to create this thing"; that's pretty much mapped out for you. So deploying means you can wrap this as a kind of library and make it available to your users or...
Oh well, deploying means whatever you want it to mean; that's the one thing that it doesn't really give you any great help with, because deployment means shoving things off to different people, different architectures. Okay, but is there a way of packaging up such a DSL definition? There is, because one of the things I haven't mentioned is that Converge is a little unusual. If you think it's just a Python-style language, it's a bit unusual in that every single file is compiled to a bytecode file and all those are linked together into a binary file; obviously it's not a machine code executable, but it might as well be. So you can just ship that single linked executable off to somebody, and if they've got the Converge virtual machine at the other end they can just run it. Okay, so you are from a university, I think we didn't talk about that before, and you are from King's College in London, I got that right? I interviewed someone from University College this morning, and University College seems to be the biggest competitor. So has this stuff actually been used, or is it purely a research prototype thing? Oh, it's a very good question, and yes, I'm from King's. The way this started was I was involved in a research project and we were working with industry, in fact all to do with models, UML, and so on, and we were creating DSLs and we were really in trouble trying to create these things rapidly. So I developed all this stuff trying to find a way around my problem, and I kind of did it in isolation, which is the standard academic way of doing things, and then after I had a semi-useful product I shipped it off to some of my industrial collaborators and they looked at it and said it looks like it could be vaguely useful. They put a couple of people on it and gradually there has been a handful of people who have been creating real DSLs. So I have created a couple of fairly decent-sized ones, mostly to do with model transformations.
I did a system that has three interlocking DSLs: one to do with making little graphical languages, one for model transformations, and they are related, and those were some thousand lines of code. It's not a big system, but if you tried to do that with the traditional DSL creation mechanisms it could be very difficult. And the industrial collaborators consultancy services who have funded a lot of this research have gone off and done some really interesting DSL work of their own, and of course they are not academics; they have chosen a whole load of very different things than I would ever have thought were practical to do. Okay, so before we wrap up, do you want to give us a couple of points of wisdom that you learned from your academic DSL creation experience that people should consider when building their own DSL infrastructure, maybe apart from using Converge? Well obviously, it would be very nice if people used Converge, and that's an almost guaranteed route to success. I think there are some things that I have noticed: the first thing is that if you use an approach like Converge uses you get some fairly easy-to-use DSLs. In other words, if someone can program, they don't have to know anything more to actually use the DSL; the barrier to entry here is very low. You mean that's from a DSL user's point of view. Yeah, the DSL user's perspective, exactly. Because it fits nicely into a generic programming environment. Exactly, so the barrier to using these things is very low and that's very important. However, someone still has to go away and design these languages, and design is a matter of taste: some people have good taste, some people have bad taste, some people have no taste at all, and there you are on your own. For whoever implements the DSL, there's nothing Converge or anything else can do to force good design in the end. Right, yeah.
One of the things that I think I have become particularly aware of is where the advantage comes in a DSL, and what's become clear to me is something that in a sense is very obvious. If you think of textual regular expressions, they are ways of expressing really complex constraints over text, and we use them on the command line in grep, or Perl or whatever, and in a handful of characters you can express something that, if you had to write a program, would sometimes take pages to write and involve horrible things like backtracking. So you get a lot of the benefit from DSLs when a bit of terse syntax allows you to express something very complicated. That said, if you just add new syntax to your DSL willy-nilly you end up with all sorts of problems, because it makes things rather complicated to use, and also once you have added syntax to a language you can never take syntax away; the users become very attached to syntax. So it's better to start off in a cautious fashion, I have realized -- Make the language as small as possible. Start small, always start small; you can't predict which way these things will go. And there's another little psychological oddity of DSL users: if you give someone a library they might say I don't like the name of that function or make some very minor comments, but you don't tend to get that much pushback. When you give someone a language, they suddenly feel as if they have the right to all sorts of opinions, and you get back all sorts of feedback that you have to take account of. So the less you give them to begin with, the less you end up having to unpick, and you can't predict what they will say; that's the one thing I am sure of. One point I would like to touch on briefly at the end is the issue of editor support. I mean, one reason why Ruby can be used for embedded DSLs is, first of all, there is a quite flexible syntax, and second, it's not very strongly typed; actually, it's not statically typed at all.
So you don't have the typical editors with code completion, syntax highlighting and all that stuff. So I guess you somehow have the same issue here: if you have a piece of Converge code, regular Converge code, you might have an Emacs, whatever it's called, configuration to do syntax highlighting, but you can't easily, I mean, it's conceivable but it's additional work to actually make the DSL-specific code completion and syntax highlighting stuff; that's not in the scope of the project at this time? It's certainly not something that I would personally be able to get involved in, mostly because I am a very old-fashioned and primitive person. On some days I will program in five or six different programming languages, so I use a bog-standard text editor, but I do understand that people do want this rich text editing support and I very much support that; code completion is very nice. And in the end this takes us to the intentional programming folks, right, with whom we might talk at some later stage, maybe, I don't know, but they have this full support including editor, debugger and stuff, so what about debugging in your case? There is no built-in debugger in Converge; again, I am a little bit old school. Printf was given to me and it's done me since I found out about it. But going back to the editing issue, I think one of the good things about this is that the barrier to entry here is very low. If, say, I said to you that when you create your DSL you will also have to tell me how to do the completion or something, and there are approaches which force that, then you suddenly raise the barrier a lot. Yeah, it could be optional, right?
Yeah, but one of the things that Converge has going in its favor is that although it is, like Ruby, a dynamically typed language, it's slightly more static; things like variable references and multi references are entirely determined at compile time, so actually you can probably make a much better stab at completion than you can in similar languages, and I think if you design your DSLs in the correct fashion you can make a similarly good stab at those, but as you said that's not my main focus. Okay, is there anything else you want to say at this time? Well, obviously, thank you very much for the interview, and Converge -- Yeah, thank you. Well, Converge is still in its early days. I wouldn't want anyone to bet their business or their life on it, but it is something I would very much encourage people to go and have a look at and to give comments on; I think it's a really nice way of developing DSLs. It gives us something that we didn't really have before, a very nice mechanism. So please feel free to go to the website and download the version; it's convergepl.org. And we will put it in the show notes. Thanks very much. Thank you, and I think we are going to some party at the conference now, so we have to wrap this up; we have ten minutes left. I am sure it will be a wild time. Thank you. Yeah okay, bye. |