## Quick Start

Installation & first usage
**1. Install the package**

Requires Python 3.8 or later. No external dependencies.

```bash
pip install compilerdesign
```
**2. Import as `cd`**

Use the conventional alias `cd` for brevity.

```python
import compilerdesign as cd
```
**3. Try your first analysis**

```python
import compilerdesign as cd

# Tokenize C-style code
result = cd.lexical_analyzer("int x = 10 + y;")
print(result['summary'])

# Convert expression notation
print(cd.infix_to_postfix("a + b * c"))  # a b c * +
```
**4. See the output as a clean table**

Every function has a matching `show_*` pretty-printer. It prints an ASCII table / trace and returns the same object, so you can chain calls.

```python
# Raw dict
result = cd.lexical_analyzer("int x = 10;")

# Pretty table (same dict, printed nicely)
cd.show_lexical(result)
```
`cd.xxx(...)` gives you the raw dict to inspect programmatically; `cd.show_xxx(result)` prints it as a teacher-style table. Jump straight to §12 Pretty-Printers for the full list.
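For example, because every `show_*` helper returns its input unchanged, you can pretty-print and keep working with the same dict in one expression. A minimal sketch, using only the `summary` keys documented below:

```python
import compilerdesign as cd

# show_lexical prints its table and returns the same dict,
# so the result can be used inline.
summary = cd.show_lexical(cd.lexical_analyzer("int x = 10;"))['summary']
print(summary['total_tokens'])
```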
## Lexical Analyzer

Tokenize source code · count token types

Tokenizes C-style source code using regex-based pattern matching. Classifies each token as keyword, identifier, constant, operator, punctuation, string literal, or comment. Returns detailed counts and unique token lists.
code = """
int main() {
int x = 10;
float y = 3.14;
return x + y;
}
"""
result = cd.lexical_analyzer(code)
print(result['summary'])
# {
# 'total_tokens': 24,
# 'total_characters': 52,
# 'total_lines': 7,
# 'keywords': 5,
# 'identifiers': 3,
# 'integer_constants': 1,
# 'float_constants': 1,
# 'operators': 3,
# 'punctuation': 8,
# }
print(result['tokens']) # [('KEYWORD','int'), ('KEYWORD','main'), ...]
print(result['unique_identifiers']) # ['x', 'y']
print(result['keywords_used']) # ['int', 'float', 'return', 'main']
↳ Return value keys
| Key | Type | Description |
|---|---|---|
| tokens | list of tuples | Each element is (token_type, value) |
| summary | dict | Count of each token category + lines + characters |
| unique_identifiers | list | Deduplicated list of all identifier tokens |
| keywords_used | list | Deduplicated list of all keyword tokens found |
Possible token types:

`KEYWORD`, `IDENTIFIER`, `INTEGER_CONSTANT`, `FLOAT_CONSTANT`, `OPERATOR`, `PUNCTUATION`, `STRING_LITERAL`, `COMMENT`, `UNKNOWN`
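Since `result['tokens']` is a plain list of `(token_type, value)` tuples, it is easy to post-process. A minimal sketch, assuming only the return shape documented above:

```python
from collections import Counter

import compilerdesign as cd

result = cd.lexical_analyzer("int x = 10 + y;")

# Tally tokens by type using the (token_type, value) tuples.
counts = Counter(token_type for token_type, _ in result['tokens'])
print(counts.get('IDENTIFIER', 0))

# Pull out just the operator lexemes.
operators = [value for token_type, value in result['tokens']
             if token_type == 'OPERATOR']
print(operators)  # expect something like ['=', '+']
```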
## Symbol Table

Scope-aware declaration tracking

Scans C-like source code and records every declared function, parameter, and variable with its type, kind, scope, and line number. Intended for teaching, not a full type checker.
code = """
int globalVar;
int add(int a, int b) {
int sum = a + b;
return sum;
}
"""
table = cd.build_symbol_table(code)
cd.show_symbol_table(table) # pretty table
print(table['total']) # 5
print(table['entries'][0])
# {'name': 'globalVar', 'type': 'int', 'kind': 'variable', 'scope': 'global', 'line': 2}
↳ Return value keys
| Key | Type | Description |
|---|---|---|
| entries | list of dict | Each entry has name, type, kind, scope, line |
| total | int | Number of unique symbols recorded |
`kind` is one of `function`, `parameter`, or `variable`. `scope` is either `global`, a function name, or `block@<line>`.
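As a quick illustration of how the `entries` list can be consumed, here is a sketch that groups symbols by their recorded scope. It relies only on the keys listed above; the exact scope labels in the output are an expectation, not a guarantee:

```python
from collections import defaultdict

import compilerdesign as cd

code = """
int globalVar;
int add(int a, int b) {
    int sum = a + b;
    return sum;
}
"""

table = cd.build_symbol_table(code)

# Group declared symbols by their recorded scope.
by_scope = defaultdict(list)
for entry in table['entries']:
    by_scope[entry['scope']].append(entry['name'])

for scope, names in by_scope.items():
    print(scope, '->', names)  # e.g. global -> ['globalVar', 'add']
```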
## Grammar Transforms

Left recursion · Left factoring · Ambiguity check

Grammars are plain dicts mapping each non-terminal (string) to a list of production strings. Symbols are space-separated. Use `ε` or `eps` for epsilon.

`eliminate_left_recursion` removes immediate left recursion from each non-terminal. It handles `A → Aα | β` patterns by introducing `A'` non-terminals.
```python
grammar = {
    'E': ['E + T', 'T'],
    'T': ['T * F', 'F'],
    'F': ['( E )', 'id']
}

new_g = cd.eliminate_left_recursion(grammar)
# {
#   'E': ["T E'"],
#   "E'": ["+ T E'", 'ε'],
#   'T': ["F T'"],
#   "T'": ["* F T'", 'ε'],
#   'F': ['( E )', 'id']
# }
```
`left_factoring` eliminates common prefixes in productions, introducing new primed (`'`) non-terminals to factor out the shared prefixes.
```python
grammar = {'A': ['a b', 'a c', 'd']}
factored = cd.left_factoring(grammar)
# {'A': ["a A'", 'd'], "A'": ['b', 'c']}
```
`check_ambiguity` performs heuristic ambiguity detection: it flags duplicate productions and productions that share the same leading symbol. It is not a complete check (ambiguity is undecidable in general).
```python
dup_grammar = {'A': ['a', 'a']}
result = cd.check_ambiguity(dup_grammar)
# {'ambiguous': True, 'issues': ["A has duplicate productions."]}

clean_grammar = {'A': ['a b', 'c']}
result = cd.check_ambiguity(clean_grammar)
# {'ambiguous': False, 'issues': []}
```
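The other heuristic, productions sharing a leading symbol, can be exercised the same way. A sketch; the exact wording of the issue string may differ:

```python
import compilerdesign as cd

# 'A' has two productions starting with the same symbol 'a',
# which the heuristic treats as a potential ambiguity signal.
prefix_grammar = {'A': ['a b', 'a c']}
result = cd.check_ambiguity(prefix_grammar)
print(result['ambiguous'])  # True
print(result['issues'])     # message about the shared leading symbol 'a'
```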
FIRST & FOLLOW
Set computation for predictive parsingComputes both FIRST and FOLLOW sets in one call. The start symbol defaults to the first key in the grammar dict. Returns sorted lists for each non-terminal.
```python
grammar = {
    'E': ['T R'], 'R': ['+ T R', 'ε'],
    'T': ['F Y'], 'Y': ['* F Y', 'ε'],
    'F': ['( E )', 'i']
}

result = cd.compute_first_follow(grammar, start='E')
print(result['FIRST'])
# {'E': ['(', 'i'], 'R': ['+', 'ε'], 'T': ['(', 'i'],
#  'Y': ['*', 'ε'], 'F': ['(', 'i']}
print(result['FOLLOW'])
# {'E': ['$', ')'], 'R': ['$', ')'], 'T': ['$', ')', '+'],
#  'Y': ['$', ')', '+'], 'F': ['$', ')', '*', '+']}
```
↳ Individual functions
```python
grammar = {
    'E': ['T R'], 'R': ['+ T R', 'ε'],
    'T': ['F Y'], 'Y': ['* F Y', 'ε'],
    'F': ['( E )', 'i']
}

first = cd.compute_first(grammar)
follow = cd.compute_follow(grammar, start='E')
```
## LL(1) Predictive Parsing

Table construction · top-down parsing

`build_ll1_table` builds the LL(1) predictive parsing table from the grammar. It detects conflicts and reports whether the grammar is LL(1).
```python
grammar = {
    'E': ['T R'], 'R': ['+ T R', 'ε'],
    'T': ['F Y'], 'Y': ['* F Y', 'ε'],
    'F': ['( E )', 'i']
}

r = cd.build_ll1_table(grammar, start='E')
print(r['is_ll1'])     # True
print(r['conflicts'])  # [] (empty if LL(1))
print(r['table'])      # {('E','('): 'T R', ('E','i'): 'T R', ...}
```
`ll1_parse` parses a list of input tokens using the LL(1) table. It returns a step-by-step trace of the parse: the stack state, the remaining input, and the action taken at each step.
```python
grammar = {
    'E': ['T R'], 'R': ['+ T R', 'ε'],
    'T': ['F Y'], 'Y': ['* F Y', 'ε'],
    'F': ['( E )', 'i']
}

result = cd.ll1_parse(
    grammar,
    tokens=['i', '+', 'i'],
    start='E'
)

print(result['accepted'])  # True
for step in result['steps']:
    print(step['stack'], '|', step['input'], '|', step['action'])
```
↳ Each step dict contains

| Key | Type | Description |
|---|---|---|
| stack | list | Current parse stack (top first) |
| input | list | Remaining input tokens |
| action | str | Action taken: `Match X`, `A → prod`, `ACCEPT`, or `ERROR` |
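The trace pairs naturally with the `show_parse_trace` pretty-printer listed in §12. A sketch that also checks a rejected input, using only the keys documented above:

```python
import compilerdesign as cd

grammar = {
    'E': ['T R'], 'R': ['+ T R', 'ε'],
    'T': ['F Y'], 'Y': ['* F Y', 'ε'],
    'F': ['( E )', 'i']
}

# Malformed input: the trailing '+' has no right operand.
result = cd.show_parse_trace(
    cd.ll1_parse(grammar, tokens=['i', '+'], start='E')
)
if not result['accepted']:
    # The last step records the action where parsing stopped.
    print("rejected at:", result['steps'][-1]['action'])
```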
## Shift-Reduce Parsing

Bottom-up parsing simulation

`shift_reduce_parse` simulates shift-reduce parsing using handle finding. Productions are passed as a list of `(lhs, rhs)` tuples, where `rhs` is a list of symbols. The first production's LHS is treated as the start symbol.
```python
productions = [
    ('E', ['E', '+', 'T']),
    ('E', ['T']),
    ('T', ['T', '*', 'F']),
    ('T', ['F']),
    ('F', ['id'])
]

result = cd.shift_reduce_parse(productions, tokens=['id', '+', 'id'])
print(result['accepted'])  # True
for step in result['steps']:
    print(f"Stack: {step['stack']}  Input: {step['input']}  Action: {step['action']}")
```
Leading & Trailing
Operator precedence parsing setsComputes LEADING and TRAILING sets used in operator-precedence (Floyd) parsing. LEADING(A) = terminals that can appear leftmost in any derivation of A. TRAILING(A) = terminals that can appear rightmost.
```python
grammar = {
    'E': ['T R'], 'R': ['+ T R', 'ε'],
    'T': ['F Y'], 'Y': ['* F Y', 'ε'],
    'F': ['( E )', 'i']
}

result = cd.compute_leading_trailing(grammar)
print(result['LEADING'])
# {'E': ['(', '+', '*', 'i'], 'R': ['+'], ...}
print(result['TRAILING'])
# {'E': [')', '+', '*', 'i'], ...}

# Individual functions:
leading = cd.compute_leading(grammar)
trailing = cd.compute_trailing(grammar)
```
## LR(0) Items

Canonical collection · closure · goto

`compute_lr0_items` constructs the canonical collection of LR(0) item sets. It automatically adds the augmented production `S' → S` and returns all states with their items, plus the GOTO transition table between states.
```python
productions = [
    ('E', ['E', '+', 'T']),
    ('E', ['T']),
    ('T', ['id'])
]

result = cd.compute_lr0_items(productions, start='E')
print(f"Total states: {result['num_states']}")  # Total states: 6

for state in result['states']:
    print(f"\nState {state['id']}:")
    for item in state['items']:
        print(f"  {item}")  # e.g. "E' -> . E"

for (frm, sym, to) in result['transitions']:
    print(f"  State {frm} --{sym}--> State {to}")
```
↳ Return value keys
| Key | Type | Description |
|---|---|---|
| states | list of dicts | Each dict has id (int) and items (list of strings) |
| transitions | list of tuples | (from_state_id, symbol, to_state_id) |
| num_states | int | Total number of LR(0) states |
| augmented_start | str | The added augmented start symbol (e.g. E') |
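Since `transitions` is a flat list of `(from_state, symbol, to_state)` tuples, turning it into a GOTO lookup table takes one dict comprehension. A sketch using only the documented return keys, and assuming state 0 is the initial state:

```python
import compilerdesign as cd

productions = [('E', ['E', '+', 'T']), ('E', ['T']), ('T', ['id'])]
result = cd.compute_lr0_items(productions, start='E')

# GOTO as a dict keyed by (state, symbol).
goto = {(frm, sym): to for frm, sym, to in result['transitions']}

# Follow the chain of states for the viable prefix 'E +'.
state = 0
for symbol in ['E', '+']:
    state = goto[(state, symbol)]
print(state)  # id of the state reached after E, then +
```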
## Expression Conversion

Infix ↔ Postfix ↔ Prefix

Convert arithmetic expressions between infix (standard), postfix (RPN), and prefix (Polish notation). Supports `+ - * / % ^` with correct operator precedence and right-associativity for `^`.
```python
# Infix → Postfix (Reverse Polish Notation)
cd.infix_to_postfix("a + b * c")    # "a b c * +"
cd.infix_to_postfix("(a + b) * c")  # "a b + c *"
cd.infix_to_postfix("a ^ b ^ c")    # "a b c ^ ^" (right-assoc)

# Infix → Prefix (Polish Notation)
cd.infix_to_prefix("a + b * c")     # "+ a * b c"

# Postfix → Infix
cd.postfix_to_infix("a b c * +")    # "(a + (b * c))"

# Prefix → Infix
cd.prefix_to_infix("+ a * b c")     # "(a + (b * c))"

# Postfix → Prefix
cd.postfix_to_prefix("a b c * +")   # "+ a * b c"

# Prefix → Postfix
cd.prefix_to_postfix("+ a * b c")   # "a b c * +"

# Universal converter
cd.convert_expression(
    "a + b * c",
    from_notation="infix",
    to_notation="postfix"
)  # "a b c * +"
```
↳ Operator precedence table
| Operator | Precedence | Associativity |
|---|---|---|
| + - | 1 (lowest) | Left |
| * / % | 2 | Left |
| ^ | 3 (highest) | Right |
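To see the table in action, compare a left-associative chain with a right-associative one. A sketch; the commented outputs follow from the precedence rules above:

```python
import compilerdesign as cd

# Left-associative '-': a - b - c parses as (a - b) - c.
print(cd.infix_to_postfix("a - b - c"))  # expected: "a b - c -"

# Right-associative '^': a ^ b ^ c parses as a ^ (b ^ c).
print(cd.infix_to_postfix("a ^ b ^ c"))  # expected: "a b c ^ ^"

# Mixed precedence: '%' binds tighter than '+'.
print(cd.infix_to_postfix("a + b % c"))  # expected: "a b c % +"
```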
## Three-Address Code

Quadruples from an infix expression

`generate_three_address_code` converts an infix arithmetic expression into a linear sequence of quadruples. Each quadruple has an operator and at most three addresses: `result = arg1 op arg2`. Temporaries are named `t1, t2, …`.
```python
tac = cd.generate_three_address_code("a + b * c - d")
cd.show_three_address_code(tac)  # pretty table

for line in tac['pretty']:
    print(line)
# t1 = b * c
# t2 = a + t1
# t3 = t2 - d

print(tac['final'])  # 't3' (where the result lives)
```
↳ Return value keys
| Key | Type | Description |
|---|---|---|
| quadruples | list of dict | Each has op, arg1, arg2, result |
| pretty | list of str | Ready-to-print t_k = x op y lines |
| final | str | Name of the final temporary holding the result |
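Each quadruple is a plain dict, so the intermediate form can be transformed directly. A minimal sketch, assuming only the `op`/`arg1`/`arg2`/`result` keys listed above:

```python
import compilerdesign as cd

tac = cd.generate_three_address_code("a + b * c - d")

# Re-render each quadruple from its fields, mirroring tac['pretty'].
for quad in tac['quadruples']:
    print(f"{quad['result']} = {quad['arg1']} {quad['op']} {quad['arg2']}")

# Count how many temporaries the expression needed.
temporaries = {q['result'] for q in tac['quadruples']}
print(len(temporaries))  # 3 for this expression
```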
## DAG Construction

Directed Acyclic Graph of an expression

`build_dag` builds a DAG for an arithmetic expression. Common sub-expressions share a single node automatically, making redundant computations easy to spot.
```python
dag = cd.build_dag("a + a * b")
cd.show_dag(dag)

# 'a' appears only once in the node list (shared leaf)
print([n['label'] for n in dag['nodes']])
# ['a', 'b', '*', '+']
```
↳ Return value keys
| Key | Type | Description |
|---|---|---|
| nodes | list of dict | Each node: id, label, children (indices) |
| root | int | Index of the root node (whole expression) |
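Because `children` holds node indices, walking the DAG from `root` is a short recursion. A sketch that reconstructs the expression, assuming interior nodes are binary operators with children in left-to-right order (shared nodes are reached through the same index):

```python
import compilerdesign as cd

dag = cd.build_dag("a + a * b")

def to_expr(dag, idx):
    """Rebuild a parenthesized expression from a DAG node index."""
    node = dag['nodes'][idx]
    if not node['children']:        # leaf: identifier or constant
        return node['label']
    left, right = node['children']  # interior nodes are binary operators
    return f"({to_expr(dag, left)} {node['label']} {to_expr(dag, right)})"

print(to_expr(dag, dag['root']))  # (a + (a * b))
```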
## Pretty-Printers

Clean, tabular output for every result

Every core function has a `show_*` helper that prints an ASCII table or parse trace and returns the same object it received, so you can chain calls naturally: `cd.show_lexical(cd.lexical_analyzer(code))`.
| Topic | Core function | Pretty-printer |
|---|---|---|
| Lexical | lexical_analyzer | show_lexical |
| Symbol table | build_symbol_table | show_symbol_table |
| Grammar transforms | eliminate_left_recursion / left_factoring | show_grammar |
| Ambiguity | check_ambiguity | show_ambiguity |
| FIRST / FOLLOW | compute_first_follow | show_first_follow |
| LEADING / TRAILING | compute_leading_trailing | show_leading_trailing |
| LL(1) table | build_ll1_table | show_ll1_table |
| Parse trace | ll1_parse / shift_reduce_parse | show_parse_trace |
| LR(0) items | compute_lr0_items | show_lr0 |
| Three-address code | generate_three_address_code | show_three_address_code |
| DAG | build_dag | show_dag |
```python
# End-to-end example: every step shows a clean table
import compilerdesign as cd

src = "int add(int a, int b) { int c = a + b * 2; return c; }"

cd.show_lexical(cd.lexical_analyzer(src))
cd.show_symbol_table(cd.build_symbol_table(src))
cd.show_three_address_code(cd.generate_three_address_code("a + b * 2"))
cd.show_dag(cd.build_dag("a + b * 2"))

g = {'E': ['E + T', 'T'], 'T': ['T * F', 'F'], 'F': ['( E )', 'id']}
g = cd.eliminate_left_recursion(g)
cd.show_grammar(g, "NO LEFT RECURSION")
cd.show_first_follow(cd.compute_first_follow(g, start='E'))
cd.show_ll1_table(cd.build_ll1_table(g, start='E'))
cd.show_parse_trace(cd.ll1_parse(g, ['id', '+', 'id'], start='E'))
```