Domain Specific Languages #2 | From Blank Page to Syntax: Rapid DSL Design in Action

The Minimal Type Syntax: Core Concepts

Image credit: ChatGPT (OpenAI).

We will design a minimal, clean type syntax heavily influenced by Go.

Book {
    Title text
    Author text
    Pages number
    Lendable yn
}

Let’s split the text into tokens. A token is a group of characters that the grammar treats as a single unit, such as a name or a brace. The lexer first reads the character stream and groups it into tokens. The parser then uses those tokens to build a syntax tree.

TYPE_NAME LEFT_CURLY
    PROPERTY_NAME PROPERTY_TYPE
    PROPERTY_NAME PROPERTY_TYPE
    PROPERTY_NAME PROPERTY_TYPE
    PROPERTY_NAME PROPERTY_TYPE
RIGHT_CURLY
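To make the lexing step concrete, here is a minimal sketch of a hand-written tokenizer in Python. It is not the lexer a parser generator would produce. The token names follow the listing above, while the regular expressions, the `tokenize` function, and the rule "a name before the brace is the type name, a name inside is a property name" are assumptions made for illustration.

```python
import re

# Hypothetical token patterns for this sketch. The token names mirror the
# listing above; the regular expressions are assumptions.
TOKEN_SPEC = [
    ("LEFT_CURLY", r"\{"),
    ("RIGHT_CURLY", r"\}"),
    ("PROPERTY_TYPE", r"\b(?:text|number|yn)\b"),
    ("NAME", r"[A-Z][A-Za-z_]*"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Group characters into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    inside_braces = False
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "WS":
            continue
        if kind == "LEFT_CURLY":
            inside_braces = True
        elif kind == "RIGHT_CURLY":
            inside_braces = False
        elif kind == "NAME":
            # A name before '{' is the type name; a name inside is a property name.
            kind = "PROPERTY_NAME" if inside_braces else "TYPE_NAME"
        tokens.append((kind, text))
    return tokens


BOOK = """Book {
    Title text
    Author text
    Pages number
    Lendable yn
}"""

for kind, text in tokenize(BOOK):
    print(kind, text)
```

Running it on the Book example prints one token per line, starting with TYPE_NAME and LEFT_CURLY and ending with RIGHT_CURLY, which matches the shape of the listing above.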

So our first token is TYPE_NAME, which is defined as [A-Z][A-Za-z_]*.

In plain words, the rule says:

  • A type name must start with an uppercase letter.
  • After that, it may contain more letters, uppercase or lowercase.
  • It may also contain underscores (_). Digits are not allowed.

Notice the trailing *. It means “zero or more occurrences”: everything after the first character is optional and may repeat any number of times.

So:

  • A is a valid name.
  • A_type_name_may_be_very_long is also valid.

If we used + instead of *, it would mean “one or more occurrences.” In that case:

  • A would not be a valid type name,
  • while Ab would be valid.
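The difference between * and + is easy to check with a regular expression engine. The sketch below uses Python's `re` module, whose character-class and repetition syntax behaves the same way for this pattern. The helper `is_valid` and the sample names `book` and `_Book` are made up for illustration.

```python
import re

# The pattern from the text, and the '+' variant for comparison.
STAR = re.compile(r"[A-Z][A-Za-z_]*")
PLUS = re.compile(r"[A-Z][A-Za-z_]+")


def is_valid(pattern, name):
    """True when the whole name matches the pattern, not just a prefix."""
    return pattern.fullmatch(name) is not None


for name in ["A", "Ab", "A_type_name_may_be_very_long", "book", "_Book"]:
    print(f"{name!r}: star={is_valid(STAR, name)} plus={is_valid(PLUS, name)}")
```

With *, the single letter A is accepted. With +, at least one more character is required, so A is rejected and Ab is accepted. Names that start with a lowercase letter or an underscore fail under both variants.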

Applying a similar pattern, our first tiny and non-refactored grammar file looks like this:

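grammar Type;
type: TYPE_NAME '{' prop* '}';

prop: propertyName ('text' | 'number' | 'yn');

// A parser rule, not a second lexer rule: two lexer rules with the same
// pattern collide, and the one listed first always wins.
propertyName: TYPE_NAME;
TYPE_NAME: [A-Z][A-Za-z_]*;

// Whitespace only separates tokens, so the lexer skips it.
WS: [ \t\r\n]+ -> skip;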
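To see what the two parser rules `type` and `prop` ask for, here is a small hand-written parser in Python that follows them step by step. It is only a sketch of the idea and not the code a parser generator would emit. The function names `lex` and `parse_type`, the dictionary it returns, and the decision to ignore whitespace between tokens are all assumptions made for this example.

```python
import re

# Assumption: whitespace only separates tokens and is otherwise ignored.
TOKEN = re.compile(r"\s*(?:(?P<name>[A-Z][A-Za-z_]*)|(?P<word>[a-z]+)|(?P<punct>[{}]))")
PROPERTY_TYPES = {"text", "number", "yn"}


def lex(source):
    """Yield (kind, text) pairs; raise on any input no pattern accepts."""
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        yield match.lastgroup, match.group(match.lastgroup)
        pos = match.end()


def parse_type(source):
    """type: TYPE_NAME '{' prop* '}'"""
    tokens = list(lex(source))
    index = 0

    def expect(kind, text=None):
        nonlocal index
        if index >= len(tokens):
            raise SyntaxError(f"expected {text or kind}, found end of input")
        found_kind, found_text = tokens[index]
        if found_kind != kind or (text is not None and found_text != text):
            raise SyntaxError(f"expected {text or kind}, found {found_text!r}")
        index += 1
        return found_text

    type_name = expect("name")
    expect("punct", "{")
    properties = {}
    # prop*: zero or more "name type" pairs until the closing brace.
    while index < len(tokens) and tokens[index] != ("punct", "}"):
        prop_name = expect("name")
        prop_type = expect("word")
        if prop_type not in PROPERTY_TYPES:
            raise SyntaxError(f"unknown property type {prop_type!r}")
        properties[prop_name] = prop_type
    expect("punct", "}")
    if index != len(tokens):
        raise SyntaxError("unexpected input after '}'")
    return {"name": type_name, "properties": properties}


print(parse_type("Book { Title text Author text Pages number Lendable yn }"))
```

The `while` loop plays the role of `prop*`: it accepts zero properties, so `A {}` parses, and it keeps going until it meets the closing brace. A lowercase type name such as `book` is rejected at the very first `expect` call.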

Our initial core design is now complete.

See you in the next post.