A Domain-Specific Language is a programming language that mimics the terms, idioms, and expressions used among experts in the targeted domain. Code written in a DSL reads like structured prose for the domain. Ideally, a domain expert with little experience in programming can read, understand, and validate this code. Sometimes, a domain expert might be able to write DSL code, even if he isn’t a professional programmer.
DSLs are a large topic. We’ll only touch the surface of DSLs and Scala’s impressive support for them. For more information on DSLs in general, see [Fowler2009], [Ford2009], and [Deursen]. The basic build tool we used for the book’s examples, sake
, uses a DSL similar to the venerable make
and its Ruby cousin rake
. (See the README in the code download archive for details.) For other examples of a Scala “internal” and “external” DSLs, see [Ghosh2008a] and [Ghosh2008b]. For some advanced work on DSLs using Scala, [Hofer2008] explores polymorphic substitution of alternative implementations for DSL abstractions, which is useful for analysis, optimization, composition, etc.
Well-crafted DSLs offer several benefits.
However, DSLs also have several drawbacks.
However, when a DSL is appropriate for an application, e.g., when it would be used frequently to implement and change functionality, a well-designed DSL can be a powerful tool for building flexible and robust applications.
From the implementation point of view, DSLs are often classified as internal and external.
An internal (sometimes called embedded) DSL is an idiomatic way of writing code in a general-purpose programming language, like Scala. No special-purpose parser is necessary for internal DSLs. Instead, they are parsed just like any other code written in the language. In contrast, an external DSL is a custom language with its own custom grammar and parser.
Internal DSLs are easier to create because they don’t require a special purpose parser. On the other hand, the constraints of the underlying language limit the options for expressing domain concepts. External DSLs remove this constraint. You can design the language any way you want, as long as you can write a reliable parser for it. The downside of external DSLs is the requirement to write and use a custom parser.
DSLs have been around a long time. For example, internal DSLs written in Lisp are as old as Lisp itself. Interest in DSLs has surged recently, driven in part by the Ruby community, because they are very easy to implement in Ruby. As we’ll see, Scala provides excellent support for the creation of internal and external DSLs.
Let’s create an internal DSL for a payroll application that computes an employee’s paycheck every pay period, which will be two weeks long. The paycheck will include the employee’s net salary, which is the gross salary minus the deductions for taxes, insurance premiums (at least in some countries), retirement fund contributions, etc.
To better understand the constrasts between code that makes use of DSLs and code that does not, let’s try both techniques on the same problem. Here’s how the paycheck might be calculated for two employees, without the help of a DSL.
// code-examples/DSLs/payroll/api/payroll-api-script.scala import payroll.api._ import payroll.api.DeductionsCalculator._ import payroll._ import payroll.Type2Money._ val buck = Employee(Name("Buck", "Trends"), Money(80000)) val jane = Employee(Name("Jane", "Doe"), Money(90000)) List(buck, jane).foreach { employee => // Assume annual is based on 52 weeks. val biweeklyGross = employee.annualGrossSalary / 26. val deductions = federalIncomeTax(employee, biweeklyGross) + stateIncomeTax(employee, biweeklyGross) + insurancePremiums(employee, biweeklyGross) + retirementFundContributions(employee, biweeklyGross) val check = Paycheck(biweeklyGross, biweeklyGross - deductions, deductions) format("%s %s: %s\n", employee.name.first, employee.name.last, check) }
For each employee, the script calculates the gross pay for the pay period, the deductions, and the resulting net. These values are placed into a Paycheck
, which is printed out. Before we describe the types we are using, notice a few things about the foreach
loop that does the work.
First, it is noisy. For example, it mentions employee
and biweeklyGross
incessantly. A DSL will help us minimize that “noise” and focus on what’s really going on.
Second, notice that the code is imperative. It says “divide this, add that”, and so forth. We’ll see that our DSLs look similar, but they are more declarative, hiding the work from the user.
Here is the simple Paycheck
class used in the script.
// code-examples/DSLs/payroll/paycheck.scala package payroll /** We're ignoring invalid (?) cases like a negative net * when deductions exceed the gross. */ case class Paycheck(gross: Money, net: Money, deductions: Money) { def plusGross (m: Money) = Paycheck(gross + m, net + m, deductions) def plusDeductions (m: Money) = Paycheck(gross, net - m, deductions + m) }
The Employee
type uses a Name
type.
// code-examples/DSLs/payroll/employee.scala package payroll case class Name(first: String, last: String) case class Employee(name: Name, annualGrossSalary: Money)
The Money
type handles arithmetic, rounding to four decimal places, etc. It ignores currency, except for the toString
method. Proper financial arithmetic is notoriously difficult to do correctly for real-world transactions. This implementation is not perfectly accurate, but it’s close enough for our purposes. [MoneyInJava] provides useful information on doing real money calculations.
Interested readers might like this reference: http://www.javaranch.com/journal/2003/07/MoneyInJava.html
Thanks for the link, Steve. Will add a reference to it.
// code-examples/DSLs/payroll/money.scala package payroll import java.math.{BigDecimal => JBigDecimal, MathContext => JMathContext, RoundingMode => JRoundingMode} /** Most arithmetic is done using JBigDecimals for tighter control. */ class Money(val amount: BigDecimal) { def + (m: Money) = Money(amount.bigDecimal.add(m.amount.bigDecimal)) def - (m: Money) = Money(amount.bigDecimal.subtract(m.amount.bigDecimal)) def * (m: Money) = Money(amount.bigDecimal.multiply(m.amount.bigDecimal)) def / (m: Money) = Money(amount.bigDecimal.divide(m.amount.bigDecimal, Money.scale, Money.jroundingMode)) def < (m: Money) = amount < m.amount def <= (m: Money) = amount <= m.amount def > (m: Money) = amount > m.amount def >= (m: Money) = amount >= m.amount override def equals (o: Any) = o match { case m: Money => amount equals m.amount case _ => false } override def hashCode = amount.hashCode * 31 // Hack: Must explicitly call the correct conversion: double2Double override def toString = String.format("$%.2f", double2Double(amount.doubleValue)) } object Money { def apply(amount: BigDecimal) = new Money(amount) def apply(amount: JBigDecimal) = new Money(scaled(new BigDecimal(amount))) def apply(amount: Double) = new Money(scaled(BigDecimal(amount))) def apply(amount: Long) = new Money(scaled(BigDecimal(amount))) def apply(amount: Int) = new Money(scaled(BigDecimal(amount))) def unapply(m: Money) = Some(m.amount) protected def scaled(d: BigDecimal) = d.setScale(scale, roundingMode) val scale = 4 val jroundingMode = JRoundingMode.HALF_UP val roundingMode = BigDecimal.RoundingMode.ROUND_HALF_UP val context = new JMathContext(scale, jroundingMode) } object Type2Money { implicit def bigDecimal2Money(b: BigDecimal) = Money(b) implicit def jBigDecimal2Money(b: JBigDecimal) = Money(b) implicit def double2Money(d: Double) = Money(d) implicit def long2Money(l: Long) = Money(l) implicit def int2Money(i: Int) = Money(i) }
Note that we use scala.BigDecimal
, which wraps java.math.BigDecimal
, as the storage type for financial figures.
Deductions are calculated using four helper methods in payroll.api.DeductionsCalculator
.
// code-examples/DSLs/payroll/api/deductions-calc.scala package payroll.api import payroll.Type2Money._ object DeductionsCalculator { def federalIncomeTax(empl: Employee, gross: Money) = gross * .25 def stateIncomeTax(empl: Employee, gross: Money) = gross * .05 def insurancePremiums(empl: Employee, gross: Money) = Money(500) def retirementFundContributions(empl: Employee, gross: Money) = gross * .10 }
Each method might use the employee information and the gross salary for the pay period. In this case, we use very simple algorithms based on just the gross salary, except for insurance premiums, which we treat as a fixed value.
Running the script for the payroll API produces the following output.
(665) $ scala -cp ... payroll-api-script.scala Buck Trends: Paycheck($3076.92,$1346.15,$1730.77) Jane Doe: Paycheck($3461.54,$1576.92,$1884.62)
The above code works well enough, but suppose we wanted to show it to the Accounting Department to confirm that we’re calculating paychecks correctly. Most likely, they would get lost in the Scala idioms. Suppose further that we need the ability to customize this algorithm frequently, for example because it needs to be customized for different employee types (salaried, hourly, etc.) or to modify the deduction calculations. Ideally, we would like to enable the accountants to do these customizations themselves, without our help.
We might achieve these goals if we can express the logic in a DSL that is sufficiently intuitive to an accountant. Can we morph our API example into such a DSL?
Returning to the script for the payroll API, what if we hide most of the explicit references to context information, like the employee, gross salary, and deduction values? Consider the following text.
Rules to calculate an employee's paycheck: employee's gross salary for 2 weeks minus deductions for federalIncomeTax, which is 25% of gross stateIncomeTax, which is 5% of gross insurancePremiums, which are 500. in gross's currency retirementFundContributions are 10% of gross
This reads like normal English, not code. We have included some “bubble” words [Ford2009] that aid readability but don’t necessarily correspond to anything essential, like to
, an
, is
, for
, of
, and which
. We’ll eliminate some of these unnecessary words and keep others in our Scala DSL.
Compared to the version in the payroll API script, there’s a lot less clutter obscuring the essentials of the algorithm. This is because we have minimized the explicit references to the contextual information. We only mention employee
twice. We mention gross
5 times, but hopefully in “intuitive” ways.
There are many possible internal Scala DSLs we could construct that resemble this ad hoc DSL. Here is one of them, again in a script, which produces the same output as before.
// code-examples/DSLs/payroll/dsl/payroll-dsl-script.scala import payroll._ import payroll.dsl._ import payroll.dsl.rules_ val payrollCalculator = rules { employee => employee salary_for 2.weeks minus_deductions_for { gross => federalIncomeTax is (25. percent_of gross) stateIncomeTax is (5. percent_of gross) insurancePremiums are (500. in gross.currency) retirementFundContributions are (10. percent_of gross) } } val buck = Employee(Name("Buck", "Trends"), Money(80000)) val jane = Employee(Name("Jane", "Doe"), Money(90000)) List(buck, jane).foreach { employee => val check = payrollCalculator(employee) format("%s %s: %s\n", employee.name.first, employee.name.last, check) }
We’ll go through the implementation step by step, but first, let’s summarize the features of Scala that allow us to implement this DSL.
Consider this line in the definition of payrollCalculator
.
employee salary_for 2.weeks minus_deductions_for { gross =>
This infix notation is equivalent to the following less-readable form.
employee.salary_for(2.weeks).minus_deductions_for { gross =>
You can see why we wrote 2.weeks
before, because the result of this expression is passed to salary_for
. Without the period, the infix expression would be parsed as employee.salary_for(2).weeks…
There is no weeks
method on Int
, of course. We’ll revisit this expression in a moment.
Method chaining like this is often implemented where each method returns this
so you can continue calling methods on the same instance. Note that returning this
allows those method calls to occur in any order. If you need to impose a specific ordering, then return an instance of a different type. For example, if minus_deductions_for
must be called after salary_for
, then salary_for
should return a new instance.
Because chaining is so easy, we could have created separate methods for salary
, for
, minus
, and deductions
, allowing us to write the following expression.
employee salary for 2.weeks minus deductions for { gross =>
Note that calls to for
are preceded by different calls with very different meanings. So, if the same instance is used throughout, it would have to track the “flow” internally. Chaining different instances would eliminate this problem. However, since no computations are actually needed between these words, we chose the simpler design where words are joined together, separated by _
.
Returning to 2.weeks
, since Int
doesn’t have a weeks
method, we use an implicit conversion to a Duration
instance that wraps an Int
specifying an amount.
// code-examples/DSLs/payroll/dsl/duration.scala package payroll.dsl case class Duration(val amount: Int) { /** @return the number of work days in "amount" weeks. */ def weeks = amount * 5 /** @return the number of work days in "amount" years. */ def years = amount * 260 }
The weeks
method multiples that amount by 5 to return the corresponding amount of work days. Hence, we designed the payroll calculator to work with days as the unit of time. This decision is completely hidden behind the DSL. Should we later add support for work hours, it would be easy to refactor the design to use hours instead.
Duration
is one of the ad hoc types that we designed to encapsulate the implicit context, to implement helper methods for the DSL, etc. We’ll discuss the implicit conversion method we need in a moment.
A number of the implementation objects use apply
to invoke behavior. The rules
object encapsulates the process of building the rules for payroll calculation. Its apply
method takes a function literal, Employee => Paycheck
.
Now let’s explore the implementation, starting with the rules
object and working our way down.
// code-examples/DSLs/payroll/dsl/payroll.scala package payroll.dsl import payroll._ object rules { def apply(rules: Employee => Paycheck) = new PayrollBuilderRules(rules) implicit def int2Duration(i: Int) = Duration(i) implicit def employee2GrossPayBuilder(e: Employee) = new GrossPayBuilder(e) implicit def grossPayBuilder2DeductionsBuilder(b: GrossPayBuilder) = new DeductionsBuilder(b) implicit def double2DeductionsBuilderDeductionHelper(d: Double) = new DeductionsBuilderDeductionHelper(d) } import rules._ ...
The function literal argument for rules.apply
is used to construct a PayrollBuilderRules
that will process the specified rules. It is used at the very beginning of the DSL.
val payrollCalculator = rules { employee => ...
The rules
object also defines implicit conversions. The first one is used by the 2.weeks
expression. It converts 2
into a Duration
instance, which we discussed previously. The other conversions are used later in the DSL to enable transparent conversion of Doubles
, Employees
, etc. into wrapper instances that we will describe shortly.
Note that the rules
object is imported so these conversions are visible in the rest of the current file. It will also need to be imported in files that use the DSL.
The PayrollBuilderRules
is our first wrapper instance. It evaluates the function literal for the whole rule set, wrapped in a try/catch
block.
// code-examples/DSLs/payroll/dsl/payroll.scala ... class PayrollException(message: String, cause: Throwable) extends RuntimeException(message, cause) protected[dsl] class PayrollBuilderRules(rules: Employee => Paycheck) { def apply(employee: Employee) = { try { rules(employee) } catch { case th: Throwable => new PayrollException( "Failed to process payroll for employee: " + employee, th) } } } ...
Note that we protect access to PayrollBuilderRules
, because we don’t want clients using it directly. However, we left the exception public for use in catch
clauses. (You can decide whether or not you like wrapping a thrown exception in a “domain-specific” exception, as shown.)
Note that we have to pass the employee as a “context” instance in the function literal. We said that it is desirable to make the context as implicit as possible. A common theme in our implementation classes, like PayrollBuilderRules
, is to hold context information in wrapper instances and to minimize their visibility in the DSL. An alternative approach would be to store context in singleton objects so other instances can get to them. This approach raises thread safety issues, unfortunately.
To see what we mean concerning the context, consider the part of our script that uses the payroll DSL, where the deductions are specified.
... { gross => federalIncomeTax is (25. percent_of gross) stateIncomeTax is (5. percent_of gross) insurancePremiums are (500. in gross.currency) retirementFundContributions are (10. percent_of gross) }
This block is calculating the deductions and deducting them from the gross pay to determine the net pay. The gross
instance handles this process. There is no “communication” between the four lines of the function literal. Furthermore, federalIncomeTax
, insurancePremiums
, etc. are objects with no connection to DeductionsBuilder
(as we’ll see shortly). It would be great if they could be members of DeductionsBuilder
or perhaps some other wrapper instance enclosing this scope. Then each line would be a method call on one or the other wrapper. Unfortunately, this doesn’t work. Hence, each line must specify the gross
instance to maintain continuity. We jump through various hoops to support the syntax, yet allow gross
to be available, as needed.
So, we contrived the convention that “raw” numbers, like the insurance deduction, have to be qualified by the particular currency used for the gross pay. We’ll see how the expression 500. in gross.currency
works in a moment. It is something of a hack, but it reads well and it solves our design problem.
Here is a possible alternative design that would have avoided the problem.
... { builder => builder federalIncomeTax (25. percent_of gross) builder stateIncomeTax (5. percent_of gross) builder insurancePremiums 500. builder retirementFundContributions (10. percent_of gross) }
Now the fact that a builder
is being used is more explicit and federalIncomeTax
, insurancePremiums
, etc. are methods on the builder. We opted for a more readable style, with the penalty of a harder implementation. You’ll sometimes hear the phrase fluent interface used to refer to DSLs that emphasize readability.
Here is GrossPayBuilder
.
// code-examples/DSLs/payroll/dsl/payroll.scala ... import payroll.Type2Money._ protected[dsl] class GrossPayBuilder(val employee: Employee) { var gross: Money = 0 def salary_for(days: Int) = { gross += dailyGrossSalary(employee.annualGrossSalary) * days this } // Assume 260 working days: 52 weeks (including vacation) * 5 days/week. def weeklyGrossSalary(annual: Money) = annual / 52.0 def dailyGrossSalary(annual: Money) = annual / 260.0 } ...
Recall that rules
defines an implicit conversion from Employee
to this type. It is invoked by the expression employee salary_for
, so the GrossPayBuilder.salary_for
method can be called. GrossPayBuilder
initializes the gross
and appends new values to it whenever salary_for
is called, which assumes we’re adding gross pay in increments of days. Finally, salary_for
returns this
to support chaining.
Deduction calculation is the most complex part. When minus_deductions_for
is used in the DSL, it triggers the implicit conversion defined in rules
from the GrossPayBuilder
to a DeductionsBuilder
.
// code-examples/DSLs/payroll/dsl/payroll.scala ... protected[dsl] class DeductionsBuilder(gpb: GrossPayBuilder) { val employee = gpb.employee var paycheck: Paycheck = new Paycheck(gpb.gross, gpb.gross, 0) def currency = this def minus_deductions_for(deductionRules: DeductionsBuilder => Unit) = { deductionRules(this) paycheck } def addDeductions(amount: Money) = paycheck = paycheck plusDeductions amount def addDeductionsPercentageOfGross(percentage: Double) = { val amount = paycheck.gross * (percentage/100.) addDeductions(amount) } } ...
DeductionsBuilder
saves the employee
from the passed in GrossPayBuilder
, which it doesn’t save as a field. It also initializes the paycheck
using the calculated gross pay.
Note that the currency
method simply returns this
. We don’t need to do anything with the actual currency when this method is invoked. Instead, it is used to support a design idiom we’ll discuss below.
The minus_deductions_for
does the important work. It invokes the function literal with the individual rules and then returns the completed Paycheck
instance, which is ultimately what rules.apply
returns.
Our remaining two methods are used to calculate individual deductions. They are called from DeductionsBuilderDeductionHelper
which we show now.
// code-examples/DSLs/payroll/dsl/payroll.scala ... class DeductionCalculator { def is(builder: DeductionsBuilder) = apply(builder) def are(builder: DeductionsBuilder) = apply(builder) def apply(builder: DeductionsBuilder) = {} } object federalIncomeTax extends DeductionCalculator object stateIncomeTax extends DeductionCalculator object insurancePremiums extends DeductionCalculator object retirementFundContributions extends DeductionCalculator protected[dsl] class DeductionsBuilderDeductionHelper(val factor: Double) { def in (builder: DeductionsBuilder) = { builder addDeductions Money(factor) builder } def percent_of (builder: DeductionsBuilder) = { builder addDeductionsPercentageOfGross factor builder } }
Now we see that federalIncomeTax
, etc. are singleton objects. Note the “synonym” methods is
and are
. We used are
for the objects with plural names, like insurancePremiums
and is
for the singular objects, like federalIncomeTax
. In fact, since both methods delegate to apply
, they are effectively “bubble” words that the user could omit. That is, the following two DSL lines are equivalent.
federalIncomeTax is (25. percent_of gross) federalIncomeTax (25. percent_of gross)
The apply
method takes DeductionsBuilder
and does nothing with it! In fact, by the time apply
is called, the deduction has already been calculated and factored into the paycheck. By implication, the presence of expressions like federalIncomeTax is
are effectively syntactic sugar (at least as this DSL is currently implemented). They are a fancy form of comments, but at least they have the virtue of type checking the “kinds” of deductions that are allowed. Of course, as the implementation evolves, these instances might do real work.
To see why DeductionCalculator.apply
is empty, let’s discuss DeductionsBuilderDeductionHelper
. Recall that the rules
object has an implicit conversion method to convert a Double
to a DeductionsBuilderDeductionHelper
. Once we have a helper instance we can call either the in
method or the percent_of
method. Every line in the deductions function literal exploits this instance.
For example, (25. percent_of gross)
is roughly equivalent to the following steps.
rules.double2DeductionsBuilderDeductionHelper(25.)
to create a new DeductionsBuilderDeductionHelper(25.)
.
percent_of(gross)
method, where gross
is a DeductionsBuilder
.
gross.addDeductionsPercentageOfGross(factor)
In other words, we used DeductionsBuilderDeductionHelper
to convert an expression of the form Double method DeductionsBuilder
into an expression of the form DeductionsBuilder method2 Double
. DeductionsBuilder
accumulates each deduction into the paycheck we’re building.
The expression 500. in gross.currency
works almost identically. DeductionsBuilder.currency
is effectively another bubble word; it simply returns this
, but gives a readable idiom for the DSL. The in
method simply converts the Double
to a Money
and passes it to DeductionsBuilder.addDeductions
.
So DeductionCalculator.apply
does nothing, because all the work is already done by the time apply
is called.
So which is better, the original API implementation or the DSL implementation? The DSL implementation is complex. Like any language, testing its robustness can be a challenge. Users will try many combinations of expressions. They will probably not understand the compiler error messages that refer to the internals we’ve hidden behind the DSL.
Designing a quality DSL is difficult. With an API, you can follow the Scala library conventions for types, method names, etc. However, with a DSL, you’re trying to imitate the language of a new domain. It’s hard to get it right.
It’s worth the effort, though. A well-designed DSL minimizes the translation effort between requirements and code, thereby improving communications with stakeholders about requirements. DSLs also facilitate rapid feature change and they hide distracting implementation details. As always, there is a cost/benefit analysis you should make when deciding whether or not to use a DSL.
Assuming you’ve made the “go” decision, a common problem in DSL design is the finishing problem [Ford2009]. How do you know when you’ve finished building up the state of an instance and it’s ready to use?
We solved this problem in two ways. First, we nested the calculation steps in a function literal. As soon as rules(employee)
was invoked, the paycheck was built to completion. Also, all the steps were evaluated “eagerly”. We didn’t need to put in all the rules, then run them at the end. Our only ordering requirement was the need to calculate the gross pay first, since the deductions are based on it. We enforced the correct order of invocation using instances of different types.
There are cases in which you can’t evaluate the build steps eagerly. For example, a DSL that builds up a SQL query string can’t run a query after each step of the build process. In this case, evaluation has to wait until the query string is completely built.
By contrast, if your DSL steps are stateless, chained method invocation works just fine. In this case, it doesn’t matter when you stop calling chained methods. If you chain methods that build up state, you’ll have to add some some sort of done
method and trust the users to always use it at the end.
add some some <- sort of
Thanks.
When you write a parser for an external DSL, you can use a parser generator tool like Antlr [Antlr]. However, the Scala library includes a powerful parser combinator library that can be used for parsing most external DSLs that have a context-free grammar. An attractive feature of this library is the way it defines an internal DSL that makes parser definitions look very similar to familiar grammar notations, like EBNF (Extended Backus-Naur Form - [BNF]).
Parser combinators are building blocks for parsers. Parsers that handle specific kinds of input, e.g., floating-point numbers, integers, etc., can be combined together to form other parser combinators for larger expressions. A combinator framework makes it easy to combine parsers to handle sequential and alternative cases, repetition, optional terms, etc.
We’ll learn more about parsing techniques and terminology as we proceed. A complete exposition of parsing techniques is beyond our scope, but our example should get you started. You can find additional examples of parsers written using Scala’s parser combinator library in [Spiewak2009b], [Ghosh2008a], and [Odersky2008].
For our parser combinator example, we’ll reuse the example we just discussed for internal DSLs. We’ll modify the grammar slightly, since our external DSL does not have to be valid Scala syntax. Other changes will make parser construction easier. Here’s an example written in the external DSL.
paycheck for employee "Buck Trends" is salary for 2 weeks minus deductions for { federal income tax is 25. percent of gross, state income tax is 5. percent of gross, insurance premiums are 500. in gross currency, retirement fund contributions are 10. percent of gross }
Compare this example to the internal DSL we defined in the section called “A Payroll Internal DSL” above.
... = rules { employee => employee salary_for 2.weeks minus_deductions_for { gross => federalIncomeTax is (25. percent_of gross) stateIncomeTax is (5. percent_of gross) insurancePremiums are (500. in gross.currency) retirementFundContributions are (10. percent_of gross) } } ...
In our new DSL, we insert a specific employee in the script. We wouldn’t expect a user to copy and paste this script for every employee. A natural extension that we won’t pursue would allow the user to loop over all salaried employees in a database, for example.
Some of the differences are “gratuitous”; we could have used the same syntax we used previously. These changes include removing underscores between words in some expressions and expanding camel-case words into space-separated words. That is, we turned some single words into multi-word expressions. We made these changes because they will be easy to implement using parser combinators, but using the same multi-word expressions would have added a lot of complexity to the internal DSL’s implementation.
We no longer need “local variables” like employee
and gross
. Those words still appear in the DSL, but our parser will keep track of the corresponding instances internally.
The remaining changes are punctuation. It is still convenient to surround the list of deductions with curly braces. We now use a comma to separate the individual deductions, as that will make the parser’s job easier. We can also drop the parentheses we used before.
To see how closely the internal DSL for Scala’s parser combinator library resembles the context-free grammar, let’s start with the grammar itself, written in a variation of EBNF. We’ll omit commas to separate sequences, for clarity.
paycheck = empl gross deduct; empl = "paycheck" "for" "employee" employeeName; gross = "is" "salary" "for" duration; deduct = "minus" "deductions" "for" "{" deductItems "}"; employeeName = "\"" name " " name "\""; name = ... duration = decimalNumber weeksDays; weeksDays = "week" | "weeks" | "day" | "days"; deductItems = ε | deductItem { "," deductItem }; deductItem = deductKind deductAmount; deductKind = tax | insurance | retirement; tax = fedState "income" "tax"; fedState = "federal" | "state"; insurance = "insurance" "premiums"; retirement = "retirement" "fund" "contributions"; deductAmount = percentage | amount; percentage = toBe doubleNumber "percent" "of" "gross"; amount = toBe doubleNumber "in" "gross" "currency"; toBe = "is" | "are"; decimalNumber = ... doubleNumber = ...
We can see that most of the terminals (the literal strings paycheck
, for
, employee
, the characters ‘{’ and ‘}’, etc.) will be “bubble” words, as defined in the previous section. We’ll ignore these after parsing. The Greek letter ε is used to indicate an empty production for deductItems
, although there will rarely be no deductions!
We didn’t spell out the details for decimal numbers, double numbers, and allowed letters in the employee names. We simply elided those definitions. We’ll handle the details later.
Each line in the grammar defines a production rule. The end of the definition is marked with a semicolon. A nonterminal appears on the left-hand side of the equals sign. The right-hand side consists of terminals (e.g., the literal strings and characters we just mentioned) that require no further parsing, other nonterminals (including possibly a recursive reference to the left-hand side nonterminal), and operators that express relationships between the items. Notice that the grammar forms have a hierarchical decomposition; although not a directed acyclic graph, as generally speaking, these grammars can have cycles.
We have a context-free grammar because every production rule has a single nonterminal on the left-hand side of the equals sign, i.e., without any additional context information required to specify the production’s applicability and meaning.
Production rules like toBe = "is" | "are"
mean the is
production (a terminal in this case) or the are
production will match. This is an example of an alternative composition.
When productions are separated by white space on the right-hand side of another production, e.g., prod1 prod2
, both productions are required to appear sequentially for a match. (Most EBNF formats actually require a comma to separate these items.) Hence, these expressions are more like “and” expressions, but sequential composition is so common that no &
operator is used, the analog of |
for alternative composition.
The production rule with "{" deductItem { "," deductItem } "}"
demonstrates how to specify optional (zero or more) repetitions. This expression matches a literal { character, followed by a deductItem
(another production), followed by zero or more expressions consisting of a literal comma ‘,’ and another deductItem
, and finally ending with a literal ‘}’ character. Sometimes an asterisk is used to indicate repetition zero or more times, e.g., prod *
. For repetition at least once, prod +
is sometimes used.
Finally, if we had optional items in our grammar, we would enclose them in square brackets, [ … ]
. There are other kinds of composition operators possible (and supported in the Scala library), a few of which we’ll discuss below. See the [ScalaAPI2008] entry for Parsers
for more details.
Here is the parser written using Scala’s parser combinators. At this point, we won’t do anything to actually calculate an employee’s paycheck, so we’ll append V1
to the class name.
// code-examples/DSLs/payroll/pcdsl/payroll-parser-comb-v1.scala package payroll.pcdsl import scala.util.parsing.combinator._ import org.specs._ import payroll._ import payroll.Type2Money._ class PayrollParserCombinatorsV1 extends JavaTokenParsers { def paycheck = empl ~ gross ~ deduct def empl = "paycheck" ~> "for" ~> "employee" ~> employeeName def gross = "is" ~> "salary" ~> "for" ~> duration def deduct = "minus" ~> "deductions" ~> "for" ~> "{" ~> deductItems <~ "}" // stringLiteral provided by JavaTokenParsers def employeeName = stringLiteral // decimalNumber provided by JavaTokenParsers def duration = decimalNumber ~ weeksDays def weeksDays = "weeks" | "week" | "days" | "day" def deductItems = repsep(deductItem, "," ) def deductItem = deductKind ~> deductAmount def deductKind = tax | insurance | retirement def tax = fedState <~ "income" <~ "tax" def fedState = "federal" | "state" def insurance = "insurance" ~> "premiums" def retirement = "retirement" ~> "fund" ~> "contributions" def deductAmount = percentage | amount def percentage = toBe ~> doubleNumber <~ "percent" <~ "of" <~ "gross" def amount = toBe ~> doubleNumber <~ "in" <~ "gross" <~ "currency" def toBe = "is" | "are" // floatingPointNumber provided by JavaTokenParsers def doubleNumber = floatingPointNumber }
The body of PayrollParserCombinatorsV1
looks very similar to the grammar we defined for the DSL. Each production rule becomes a method. The terminating semicolon is dropped, but since the production is a method, it would be valid Scala to leave it in.
Where we had whitespace between each production on the right hand side, we now use a combinator operator, either ∼
, ∼>
, or <∼
. The combinator for sequential composition is ∼
, used when we want to retain for further processing the results produced by both productions on the left and right sides of the ∼
. For example, when processing the paycheck
production, we want to keep all three results from empl
, gross
, and deduct
. Hence we use two ∼
operators.
def paycheck = empl ~ gross ~ deduct
We use another sequential composition combinator ∼>
when we no longer need the result of the production to the left. For example, when processing the empl
production, we only want to keep the parse result for the last production, employeeName
.
def empl = "paycheck" ~> "for" ~> "employee" ~> employeeName
Similarly, we use <∼
when we no longer need the result for the production to the right. For example, when processing the tax
production, we only want to keep the result of the first production, fedState
.
def tax = fedState <~ "income" <~ "tax"
Our heavy use of the <∼
sequential combinator in the various productions related to deductions indicates that we aren’t keeping track of the source of each deduction, just the amount of the deduction. A real paycheck application would print this information, of course. Our aim is for simplicity. As an exercise, consider how PayrollParserCombinatorsV1
and the subsequent refinements below would change if we tracked this information. Would you necessarily keep the parsed strings or track the information some other way?
The “or” case is expressed with the |
method, just as in the grammar.
def weeksDays = "weeks" | "week" | "days" | "day"
The rep
method can be used for zero or more repetitions. We actually use a similar method, repsep
, which lets us specify a separator, in our case a comma ‘,’.
def deduct = ... ~> "{" ~> repsep(deductItem, "," ) <~ "}"
Note that deduct
combines several features we have just described.
Like repetition, there is an opt
method for optional terms, which we aren’t using.
PayrollParserCombinatorsV1
inherits from JavaTokenParsers
, which inherits from RegexParsers
, which inherits from the root parser trait Parsers
. It’s well known that parsing non-trivial grammars with just regular expressions tends to break down pretty quickly. However, using regular expressions to parse individual terms inside a parsing framework can be very effective. In our example, we exploit the productions in JavaTokenParsers
to parse quoted strings (for the employee’s name), decimal literals, and floating-point literals.
Let’s try it out! Here is a specification that exercises the parser for two cases, without and with deductions.
// code-examples/DSLs/payroll/pcdsl/payroll-parser-comb-v1-spec.scala package payroll.pcdsl import scala.util.parsing.combinator._ import org.specs._ import payroll._ import payroll.Type2Money._ object PayrollParserCombinatorsV1Spec extends Specification("PayrollParserCombinatorsV1") { "PayrollParserCombinatorsV1" should { "parse rules when there are no deductions" in { val input = """paycheck for employee "Buck Trends" is salary for 2 weeks minus deductions for {}""" val p = new PayrollParserCombinatorsV1 p.parseAll(p.paycheck, input) match { case p.Success(r,_) => r.toString mustEqual """(("Buck Trends"~(2~weeks))~List())""" case x => fail(x.toString) } } "calculate the gross, net, and deductions for the pay period" in { val input = """paycheck for employee "Buck Trends" is salary for 2 weeks minus deductions for { federal income tax is 25. percent of gross, state income tax is 5. percent of gross, insurance premiums are 500. in gross currency, retirement fund contributions are 10. percent of gross }""" val p = new PayrollParserCombinatorsV1 p.parseAll(p.paycheck, input) match { case p.Success(r,_) => r.toString mustEqual """(("Buck Trends"~(2~weeks))~List(25., 5., 500., 10.))""" case x => fail(x.toString) } } } }
This part of the specification shows us how to instantiate and use the parser.
val p = new PayrollParserCombinatorsV1 p.parseAll(p.paycheck, input) match { case p.Success(r,_) => r.toString mustEqual "..." case x => fail(x.toString) }
The parseAll
method is defined in a parent class. We invoke the top-level production method, paycheck
, and pass its return value as the first argument to parseAll
and pass the string to parse as the second argument.
If the parsing process is successful, the result of the parse is returned as an instance of type p.Success[+T]
, a case class declared in the Parsers
trait. Why is there a p.
prefix? It indicates that p.Success
is a path-dependent type, which we will discuss in the section called “Path-Dependent Types” in Chapter 12, The Scala Type System. For now, just know that even though Success
is defined in the Parsers
trait, the actual type of the instance is dependent on the PayrollParserCombinatorsV1
instance we created. In other words, if we had another parser, say p2
of type MyOtherParser
, then p2.Success[String]
would be different from p.Success[String]
and one could not be substituted for the other.
The Success
instance contains two fields, the result of the parse, an instance of type T
(assigned to r
in the case
clause) and the remaining input string to parse, which will be empty after a successful parse (we will have parsed the whole string at this point). This string is assigned to _
.
If the parse fails, the returned instance is either a p.Failure
or p.Error
, which our example handles with a generic case
clause. Both are derived from p.NoSuccess
, which contains fields for an error message and the unconsumed input at the point of failure. A p.Failure
in a parser will trigger backtracking so that a retry with a different parser can be invoked by the parser framework, if possible. An Error
result does not trigger backtracking and is used to signal more serious problems.
For completeness, Both p.Success
and p.NoSuccess
derive from p.ParseResult
.
We have two big unanswered questions: what do the production methods actually return, and what is the type of the result instance returned in the p.Success
?
The production methods themselves return parsers. Most of them in our example return p.Parser[String]
(again, a path-dependent type). However, because the deduct
method handles repetition (it invokes the repsep
method), it actually returns a p.Parser[List[String]]
. When this parser is used, it will return a List[String]
, with one string corresponding to each match in the repetition.
So, our call to p.parseAll(p.paycheck, input)
above parses the input
string using the parser returned by p.paycheck
. That brings us to the second question, what is the result of a successful parse?
To see what’s returned, compile the PayrollParserCombinatorsV1
file listed at the beginning of this section and invoke the scala
interpreter with the -cp
option to include the directory where the class files were written (it will be build
if you used the build process for the code example distribution).
Once in the interpreter, enter the following expressions after the scala>
prompt. (You can also find this input the payroll-parser-comb-script.scala
file in the code example distribution.)
scala> import scala.util.parsing.combinator._ import scala.util.parsing.combinator._ scala> import payroll.pcdsl._ import payroll.pcdsl._ scala> val p = new PayrollParserCombinatorsV1 p: payroll.pcdsl.PayrollParserCombinatorsV1 = payroll.pcdsl.PayrollParserCombinatorsV1@79e84310 scala> p.empl res0: p.Parser[String] = Parser (~>) scala> p.weeksDays res2: p.Parser[String] = Parser (|) scala> p.doubleNumber res3: p.Parser[String] = Parser () scala> p.deduct res1: p.Parser[List[String]] = Parser (<~) scala> p.paycheck res4: p.Parser[p.~[p.~[String,p.~[String,String]],List[String]]] = Parser (~) scala> p.parseAll(p.weeksDays, "weeks") res5: p.ParseResult[String] = [1.6] parsed: weeks scala> val input = """paycheck for employee "Buck Trends" | is salary for 2 weeks minus deductions for {}""" input: java.lang.String = paycheck for employee "Buck Trends" is salary for 2 weeks minus deductions for {} scala> p.parseAll(p.paycheck, input) res6: p.ParseResult[p.~[p.~[String,p.~[String,String]],List[String]]] = [2.53] parsed: (("Buck Trends"~(2~weeks))~List()) scala>
We import the necessary types and create a PayrollParserCombinatorsV1
instance. Then we call several of the production methods to see what kind of Parser
each returns. The first three, empl
, weeksDays
, and doubleNumber
return p.Parser[String]
.
Note what’s written on the right-hand side in the output for the first three parsers: empl
, weeksDays
, and doubleNumber
. We see Parser (∼>)
, Parser (|)
, and Parser ()
, respectively. The Parsers returned reflect the definitions of the production rules, where empl
ends with a combinator of the form prod1 ∼> prod2
, weeksDays
returns a combinator of the form prod1 | prod2
, and doubleNumber
returns a parser for a single production.
Because deduct
consists of combinators that handle repetition, the parser returned by deduct
is of type p.Parser[List[String]]
, as we stated previously. The right-hand side of the output is Parser (<∼)
, because the definition of deduct
ends with prod1 <∼ prod2
.
Things get more interesting when we look at the top-level production, paycheck
. What is p.Parser[p.∼[p.∼[String,p.∼[String,String]],List[String]]] = Parser (∼)
supposed to mean?? Well, the right-hand side should be easy to understand now; the definition of paycheck
ends in prod1 ∼ prod2
. What is the type parameter for p.Parser
on the left-hand side of the equals sign?
The Parsers
trait also defines a case class named ∼
that represents a pair of sequential rules.
case class ~[+a, +b](_1: a, _2: b) { override def toString = "("+ _1 +"~"+ _2 +")" }
The actual path-dependent type in our example is p.∼[+a,+b]
. Hence, the type parameter T
in p.Parser[T]
is p.∼[p.∼[String,p.∼[String,String]],List[String]]
, which is a hierarchical tree of types.
Let’s break it down, working our way inside out. Note that there are three p.∼
. We’ll start with the inner-most type, p.∼[String,String]
, and map the type declaration to the output we saw in the scala
session "Buck Trends"∼(2∼weeks∼List())
.
The p.∼[String,String]
corresponds to the parser that handles expressions like 2 weeks
. Hence, the instance created when we parsed our example string was the instance p.∼("2", "weeks")
. Calling the p.∼.toString
method produces the output (2~weeks)
.
Working out one level, we have p.∼[String,p.∼[String,String]]
. This combination parses paycheck for employee "Buck Trends" is salary for 2 weeks
. Recall that we discard paycheck for employee
and is salary for
, keeping only the Buck Trends
and 2 weeks
. So we create an instance p.∼("Buck Trends", p.∼("2", "weeks"))
. Calling toString
again results in the string ("Buck Trends"∼(2∼weeks))
.
Finally, at the outer-most level, we have p.∼[p.∼[String,p.∼[String,String]],List[String]]
. We’ve already discussed everything up to the the last List[String]
, which comes from the deduct
production.
def deduct = "minus" ~> "deductions" ~> "for" ~> "{" ~> repsep(deductItem, "," ) <~ "}"
We discard everything except for the list of zero or more deductItems
. There are none in our example, so we get an empty list for which toString
returns List()
. Therefore, calling p.∼.toString
on our outer-most type, the one that parameterizes p.Parser
, returns the string "Buck Trends"∼(2∼weeks∼List())
. We’re done!
Well, not quite. We’re still not calculating an actual paycheck for ol' Buck. Let’s complete our implementation.
As we parse the DSL, we want to look up the employee by name, fetch his or her gross salary for the specified pay period and then calculate the deductions as we go. When the parser returned by paycheck
finishes, we want to return a Pair
with the Employee
instance and the completed Paycheck
.
We will reuse “domain” classes like Employee
, Money
, Paycheck
, etc. from earlier in the chapter. To do the calculations on demand, we will create a second iteration of PayrollParserCombinatorsV1
that we’ll call PayrollParserCombinators
. We’ll modify the parsers returned by some of the production methods to return new kinds of parsers. We’ll also do administrative work like storing running context data, as needed. Our implementation won’t be thread safe. You’ll want to ensure that only one thread uses a given PayrollParserCombinators
. We could make it more robust, but doing so isn’t the goal of this exercise.
Here is our final PayrollParserCombinators
.
// code-examples/DSLs/payroll/pcdsl/payroll-parser-comb.scala package payroll.pcdsl import scala.util.parsing.combinator._ import org.specs._ import payroll._ import payroll.Type2Money._ class UnknownEmployee(name: Name) extends RuntimeException(name.toString) class PayrollParserCombinators(val employees: Map[Name, Employee]) extends JavaTokenParsers { var currentEmployee: Employee = null var grossAmount: Money = Money(0) /** @return Parser[(Employee, Paycheck)] */ def paycheck = empl ~ gross ~ deduct ^^ { case e ~ g ~ d => (e, Paycheck(g, g-d, d)) } /** @return Parser[Employee] */ def empl = "paycheck" ~> "for" ~> "employee" ~> employeeName ^^ { name => val names = name.substring(1, name.length-1).split(" ") // remove "" val n = Name(names(0), names(1)); if (! employees.contains(n)) throw new UnknownEmployee(n) currentEmployee = employees(n) currentEmployee } /** @return Parser[Money] */ def gross = "is" ~> "salary" ~> "for" ~> duration ^^ { dur => grossAmount = salaryForDays(dur) grossAmount } def deduct = "minus" ~> "deductions" ~> "for" ~> "{" ~> deductItems <~ "}" /** * "stringLiteral" provided by JavaTokenParsers * @return Parser[String] */ def employeeName = stringLiteral /** * "decimalNumber" provided by JavaTokenParsers * @return Parser[Int] */ def duration = decimalNumber ~ weeksDays ^^ { case n ~ factor => n.toInt * factor } def weeksDays = weeks | days def weeks = "weeks?".r ^^ { _ => 5 } def days = "days?".r ^^ { _ => 1 } /** @return Parser[Money] */ def deductItems = repsep(deductItem, ",") ^^ { items => items.foldLeft(Money(0)) {_ + _} } /** @return Parser[Money] */ def deductItem = deductKind ~> deductAmount def deductKind = tax | insurance | retirement def tax = fedState <~ "income" <~ "tax" def fedState = "federal" | "state" def insurance = "insurance" ~> "premiums" def retirement = "retirement" ~> "fund" ~> "contributions" def deductAmount = percentage | amount /** @return Parser[Money] */ def percentage = toBe ~> doubleNumber <~ "percent" <~ "of" <~ "gross" ^^ { percentage => grossAmount * (percentage / 100.) } def amount = toBe ~> doubleNumber <~ "in" <~ "gross" <~ "currency" ^^ { Money(_) } def toBe = "is" | "are" def doubleNumber = floatingPointNumber ^^ { _.toDouble } // Support method. Assume 260 (52 * 5) paid work days/year def salaryForDays(days: Int) = (currentEmployee.annualGrossSalary / 260.0) * days }
For simplicity, we’ll use a map of “known” employees, keyed by Name
instances, that we save as a field in PayrollParserCombinators
. A real implementation would probably use a data store of some kind.
There are two other fields: currentEmployee
which remembers which employee we are processing, and grossAmount
, which remembers the gross amount of pay for the employee for the pay period. Both fields have a slight design smell. They are mutable. They are set only once per parse, but not when they are declared, only when we parse the input that allows us to calculate them. You might have also noticed that if the same PayrollParserCombinators
instance is used more than once, we don’t reset these fields to their default values. No doubt it would be possible to write scripts in the DSL that exploit this bug.
These weaknesses are not inherent to parser combinators. They reflect simplifications we used for our purposes. As an exercise, you might try improving the implementation to eliminate these weaknesses.
We have added Javadoc-style @return
annotations for most of the productions to make it clear what they are now returning. In some cases, the productions are unchanged, as the original parser instances are fine as is. Most of the changes reflect our desire to calculate the paycheck as we go.
Consider the new paycheck
production.
/** @return Parser[(Employee, Paycheck)] */ def paycheck = empl ~ gross ~ deduct ^^ { case e ~ g ~ d => (e, Paycheck(g, g-d, d)) }
Now, we return a Pair
with the Employee
and the computed Paycheck
. The empl ∼ gross ∼ deduct
combination would still return Parser[String]
(we’ll drop the path-dependent prefix for now). We have added a new combinator ^^
, e.g., prod1 ^^ func1
, where func1
is a function. If prod1
succeeds, then the result of applying func1
to the result of prod1
is returned. That is, we return func1(prod1)
.
For paycheck
, we give it a function literal that does a pattern match to extract the three results from empl
, gross
, and deduct
, respectively. We create a 2-tuple (Pair
) with e
, the Employee
, and a Paycheck
calculated from the gross salary for the pay period (in g
) and the sum of all the deductions (in d
).
It’s important to keep clear that the anonymous function passed as an argument to ^^
returns a tuple (Employee, Paycheck)
, but the production paycheck
method itself returns a Parser[(Employee, Paycheck)]
. This pattern has been true from the beginning, actually, where Strings
were always involved in our first version. It will remain true for all the production rules in PayrollParserCombinators
.
The empl
production assumes the employee’s first name and last name are given. (Obviously, this would be inadequate in a real application.)
/** @return Parser[Employee] */ def empl = "paycheck" ~> "for" ~> "employee" ~> employeeName ^^ { name => val names = name.substring(1, name.length-1).split(" ") // remove "" val n = Name(names(0), names(1)); if (! employees.contains(n)) throw new UnknownEmployee(n) currentEmployee = employees(n) currentEmployee }
To construct the name, the embedded double quotes have to be removed, which is why we start by extracting the substring that tosses the first and last characters. The name is used to look up the Employee
instance in the map, saving the value in the currentEmployee
field. In general, there is not a lot of “graceful” error handling in PayrollParserCombinators
. However, the empl
method handles the case where no employee is found with the specified name, throwing an UnknownEmployee
exception when this occurs.
The rest of the productions work similarly. Sometimes, a parser converts an input string to an Int
(e.g., duration
) or a Money
(e.g., gross
). An interesting case is deduct
. It folds the list of deductions into a single deduction amount, using addition. The foldLeft
method takes two argument lists. The first has a single argument that specifies the initial value, in this case zero Money
. The second argument list has a single function literal argument that takes two arguments, the accumulated value of the folding operation and an item from the list. In this case, we return the sum of the arguments. So, foldLeft
iterates over the items
collection, adding them together. See the section called “Traversing, Mapping, Filtering, Folding, and Reducing” in Chapter 8, Functional Programming in Scala for more information on foldLeft
and related operations.
The weeks
and days
productions remind us that we are using parser combinators based on regular-expressions. (We’re also using stringLiteral
, decimalNumber
, and floatingPointNumber
provided by JavaTokenParsers
). Note that weeks
and days
ignore the parsed string. They just return a multiplication factor used to determine total days in the pay period in the duration
production rule.
There are other combinator methods for applying functions to parser results in different ways. See the Parsers
Scaladoc page for details.
The following (somewhat incomplete) specification shows the calculation of paychecks when there are no deductions and when there are several deductions.
// code-examples/DSLs/payroll/pcdsl/payroll-parser-comb-spec.scala package payroll.pcdsl import scala.util.parsing.combinator._ import org.specs._ import payroll._ import payroll.Type2Money._ // Doesn't test "sad path" scenarios... object PayrollParserCombinatorsSpec extends Specification("PayrollParserCombinators") { val salary = Money(100000.1) // for a full year val gross = salary / 26. // for two weeks val buck = Employee(Name("Buck", "Trends"), salary) val employees = Map(buck.name -> buck) implicit def money2double(m: Money) = m.amount.doubleValue "PayrollParserCombinators" should { "calculate the gross == net when there are no deductions" in { val input = """paycheck for employee "Buck Trends" is salary for 2 weeks minus deductions for {}""" val p = new PayrollParserCombinators(employees) p.parseAll(p.paycheck, input) match { case p.Success(Pair(employee, paycheck),_) => employee mustEqual buck paycheck.gross must beCloseTo(gross, Money(.001)) paycheck.net must beCloseTo(gross, Money(.001)) // zero deductions? paycheck.deductions must beCloseTo(Money(0.), Money(.001)) case x => fail(x.toString) } } "calculate the gross, net, and deductions for the pay period" in { val input = """paycheck for employee "Buck Trends" is salary for 2 weeks minus deductions for { federal income tax is 25. percent of gross, state income tax is 5. percent of gross, insurance premiums are 500. in gross currency, retirement fund contributions are 10. percent of gross }""" val p = new PayrollParserCombinators(employees) p.parseAll(p.paycheck, input) match { case p.Success(Pair(employee, paycheck),_) => employee mustEqual buck val deductions = (gross * .4) + Money(500) val net = gross - deductions paycheck.gross must beCloseTo(gross, Money(.001)) paycheck.net must beCloseTo(net, Money(.001)) paycheck.deductions must beCloseTo(deductions, Money(.001)) case x => fail(x.toString) } } } }
If you work out what the results should be from the the input strings, you’ll see that the implementation correctly calculates the paycheck.
Besides the many small details that differ between this implementation of the external DSL and the previous implementation of the internal DSL, there is one big conceptual difference from the two implementations. Here we are computing the paycheck as we parse code written in the external DSL. In the internal DSL case, we generated a paycheck calculator when we parsed the DSL. Afterwards, we used that calculator to compute paychecks for one employee at a time. We could have generated a paycheck calculator like we did before, but we chose a simpler approach to focus on the construction of the parser itself. Also, as we discussed above, we weren’t as careful about thread-safety and other issues in the implementation.
Scala provides rich support for creating your own internal and external DSLs. However, a non-trivial DSL can be a challenge to implement and debug. For the examples in this chapter, the parser combinators implementation was easier to design and write than the implementation for the internal DSL. However, we found that debugging the internal DSL was easier.
You must also consider how robust the parser must be when handling invalid input. Depending on the level of sophistication of the users of the DSL, you may need to provide very good feedback when errors occur, especially when your users are non-programmers. The parser combinator library in Scala version 2.8 will provide improved support for error recovery and reporting, compared to the version 2.7.X library.
The version 2.8 library will also provide support for writing packrat parsers that can implement unambiguous parsing expression grammars (PEGs). The 2.8 implementation of packrat parsers also supports memoization, which helps improve performance, among other benefits. If you need a fast parser, a packrat parser will take you further before you need to consider more specialized tools, like parser generators.
It’s tempting to create DSLs with abandon. DSLs in Scala can be quite fun to work with, but don’t underestimate the effort required to create robust DSLs that meet your clients usability needs, nor long-term maintenance and support issues.
If you choose to write a DSL, you have rich options in Scala. The syntax is flexible yet powerful enough that an internal DSL may be sufficient. A internal DSL is an excellent starting point, especially if other programmers will be the primary writers of code in the DSL.
If you expect your non-programming stakeholders to read or even write code written in the DSL, it might be worth the extra effort to create an external DSL that eliminates as many of the programming-language idioms as possible. Consider whether or not the code written in the DSL will need to be processed for other purposes, like generating documentation, spreadsheets, etc. Since you will have to write a parser for the DSL anyway, it might be straightforward to write others to handle these different purposes.
In the next chapter, we’ll explore the richness of Scala’s type system. We’ve learned many of its features already. Now, we’ll explore the type system in full detail.
No comments yet
Add a comment