The Minimal Type Syntax: Core Concepts

We will design a minimal, clean type syntax heavily influenced by Go:
Book {
    Title text
    Author text
    Pages number
    Lendable yn
}
Let’s split the text into tokens. A token is a group of characters that belong together, such as a name or a curly brace. The lexer reads the character stream and groups it into tokens; the parser then uses those tokens to build a syntax tree. For our example, the token stream looks like this:
TYPE_NAME LEFT_CURLY
    PROPERTY_NAME PROPERTY_TYPE
    PROPERTY_NAME PROPERTY_TYPE
    PROPERTY_NAME PROPERTY_TYPE
    PROPERTY_NAME PROPERTY_TYPE
RIGHT_CURLY
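To make the lexing step concrete, here is a minimal sketch of a hand-written tokenizer in Python. It is not how ANTLR works internally; the TOKEN_SPEC table, the tokenize function, and the generic NAME pattern are assumptions made for illustration. Because type names and property names share one pattern, the sketch simply labels the first name it sees as TYPE_NAME and every later one as PROPERTY_NAME.

```python
import re

# Hypothetical token table for illustration; the kinds mirror the listing above.
TOKEN_SPEC = [
    ("LEFT_CURLY", r"\{"),
    ("RIGHT_CURLY", r"\}"),
    ("PROPERTY_TYPE", r"(?:text|number|yn)\b"),
    ("NAME", r"[A-Z][A-Za-z_]*"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

BOOK = """Book {
    Title text
    Author text
    Pages number
    Lendable yn
}"""


def tokenize(source):
    """Return the token kinds found in source, in order."""
    kinds = []
    position = 0
    expect_type_name = True  # assumption: the first name in the input is the type name
    while position < len(source):
        match = TOKEN_RE.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r} at offset {position}")
        position = match.end()
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # whitespace only separates tokens
        if kind == "NAME":
            kind = "TYPE_NAME" if expect_type_name else "PROPERTY_NAME"
            expect_type_name = False
        kinds.append(kind)
    return kinds


print(" ".join(tokenize(BOOK)))
```

Run on the Book example, this yields the same sequence as the listing: one TYPE_NAME, a LEFT_CURLY, four PROPERTY_NAME and PROPERTY_TYPE pairs, and a closing RIGHT_CURLY.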
So our first token is TYPE_NAME, which is defined as [A-Z][A-Za-z_]*.
This pattern encodes our first rules:
- A type name must start with an uppercase letter.
- After that, it may contain uppercase and lowercase letters.
- It may also include underscores (_).
Notice the trailing *. It means “zero or more occurrences” of the character class before it; in other words, everything after the first uppercase letter is optional.
So:
- A is a valid name.
- A_type_name_may_be_very_long is also valid.
If we used + instead of *, it would mean “one or more occurrences.” In that case:
- A would not be a valid type name,
- while Ab would be valid.
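The difference between * and + is easy to check with an ordinary regular expression engine. The sketch below uses Python's re module, whose character classes behave the same way for this pattern; the names TYPE_NAME_STAR, TYPE_NAME_PLUS, and is_valid are made up for this example.

```python
import re

TYPE_NAME_STAR = re.compile(r"[A-Z][A-Za-z_]*")  # the pattern from the text
TYPE_NAME_PLUS = re.compile(r"[A-Z][A-Za-z_]+")  # hypothetical variant using +


def is_valid(pattern, name):
    """True when the whole name matches the pattern."""
    return pattern.fullmatch(name) is not None


for name in ["A", "Ab", "A_type_name_may_be_very_long", "book"]:
    print(f"{name}: star={is_valid(TYPE_NAME_STAR, name)} plus={is_valid(TYPE_NAME_PLUS, name)}")
```

Only the single-letter name A is judged differently by the two variants; book fails both because it does not start with an uppercase letter.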
Applying a similar pattern, our first tiny and non-refactored grammar file looks like this:
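grammar Type;

// A type is a name followed by zero or more properties inside curly braces.
type: TYPE_NAME '{' prop* '}';

// A property is a name followed by one of the three built-in types.
prop: propertyName ('text' | 'number' | 'yn');

// Property names reuse the TYPE_NAME pattern. This is a parser rule because
// two lexer rules that match the same text always resolve to the first one.
propertyName: TYPE_NAME;

TYPE_NAME: [A-Z][A-Za-z_]*;
WS: [ \t\r\n]+ -> skip; // ignore spaces and line breaks between tokens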
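To see what the two parser rules accept, here is a rough hand-written equivalent in Python. It is only a sketch of the same shape (a type name, an opening brace, zero or more properties, a closing brace), not the parser ANTLR would generate; parse_type and the HEAD, PROP, and TAIL patterns are hypothetical names introduced for this example.

```python
import re

NAME = r"[A-Z][A-Za-z_]*"  # same pattern as TYPE_NAME
HEAD = re.compile(rf"\s*({NAME})\s*\{{")                 # TYPE_NAME '{'
PROP = re.compile(rf"\s*({NAME})\s+(text|number|yn)\b")  # one property
TAIL = re.compile(r"\s*\}\s*$")                          # '}' and end of input


def parse_type(source):
    """Parse one type definition and return (type_name, [(prop_name, prop_type), ...])."""
    head = HEAD.match(source)
    if head is None:
        raise SyntaxError("expected a type name followed by '{'")
    position = head.end()
    props = []
    while True:  # prop*: keep reading properties until none matches
        prop = PROP.match(source, position)
        if prop is None:
            break
        props.append((prop.group(1), prop.group(2)))
        position = prop.end()
    if TAIL.match(source, position) is None:
        raise SyntaxError(f"expected a property or '}}' at offset {position}")
    return head.group(1), props


print(parse_type("Book {\n    Title text\n    Pages number\n}"))
```

Because the property rule is repeated with *, a type with no properties at all, such as Empty {}, is also accepted.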
Our initial core design is now complete.
See you in the next post.