TP04: A Scanner for Oberon-0
For the remainder of this course, we will build a complete compiler for the Oberon-0 language.
Create a new Python project named oberon0_compiler, following the good practices covered in the previous labs.
The project structure should look like this:
├── .github
│   ├── actions
│   │   …
│   └── workflows
│       …
├── .pre-commit-config.yaml
├── docs
│   …
├── LICENSES
│   └── MIT.txt
├── pyproject.toml
├── README.md
├── REUSE.toml
├── src
│   └── oberon0_compiler
│       ├── __init__.py
│       ├── ast.py
│       ├── code_gen.py
│       ├── parser.py
│       ├── scanner.py
│       ├── sym_table.py
│       ├── systemcalls.py
│       ├── token.py
│       ├── type_checker.py
│       └── types.py
└── tests
    …
Oberon-0 grammar
As a reminder, here is the Oberon-0 grammar:
ident = letter {letter | digit}.
integer = digit {digit}.
number = integer.
factor = ident [ActualParameters] |
    number | "(" expression ")" | "~" factor.
term = factor {("*" | "DIV" | "MOD" | "&") factor}.
SimpleExpression = ["+" | "-"] term {("+" | "-" | "OR") term}.
expression = SimpleExpression
    [("=" | "#" | "<" | "<=" | ">" | ">=") SimpleExpression].
assignment = ident ":=" expression.
ActualParameters = "(" [expression {"," expression}] ")".
ProcedureCall = ident [ActualParameters].
IfStatement = "IF" expression "THEN" StatementSequence
    {"ELSIF" expression "THEN" StatementSequence}
    ["ELSE" StatementSequence] "END".
WhileStatement = "WHILE" expression "DO" StatementSequence "END".
RepeatStatement = "REPEAT" StatementSequence "UNTIL" expression.
statement = [assignment | ProcedureCall | IfStatement |
    WhileStatement | RepeatStatement].
StatementSequence = statement {";" statement}.
IdentList = ident {"," ident}.
type = ident.
FPSection = ["VAR"] IdentList ":" type.
FormalParameters = "(" [FPSection {";" FPSection}] ")".
ProcedureHeading = "PROCEDURE" ident ["*"] [FormalParameters].
ProcedureBody = declarations ["BEGIN" StatementSequence] "END" ident.
ProcedureDeclaration = ProcedureHeading ";" ProcedureBody.
declarations =
    ["CONST" {ident "=" expression ";"}]
    ["VAR" {IdentList ":" type ";"}]
    {ProcedureDeclaration ";"}.
module = "MODULE" ident ";" declarations "END" ident ".".
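The first rules of the grammar (ident, integer, number) are lexical: the scanner recognizes them, not the parser. As a rough illustration only, and assuming that `letter` means an ASCII letter and `digit` a decimal digit (the grammar does not spell these out), the two rules could be expressed as regular expressions. The helper names `is_ident` and `is_integer` are hypothetical.

```python
import re

# Assumption: letter = A-Z | a-z and digit = 0-9 (not defined in the grammar).
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")  # ident = letter {letter | digit}.
INTEGER = re.compile(r"[0-9]+")  # integer = digit {digit}.


def is_ident(text: str) -> bool:
    """Return True if the whole string matches the ident rule."""
    return IDENT.fullmatch(text) is not None


def is_integer(text: str) -> bool:
    """Return True if the whole string matches the integer rule."""
    return INTEGER.fullmatch(text) is not None
```

A hand-written scanner usually reads one character at a time instead of using regular expressions, but the accepted strings should be the same.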
pyproject.toml
Here is an example pyproject.toml that you can use as a starting point for your project:
[project]
name = "oberon0-compiler"
version = "0.1.2"
description = "Oberon-0 compiler implemented in Python"
authors = [{ name = "Jacques Supcik", email = "jacques.supcik@hefr.ch" }]
keywords = ["oberon", "compiler", "interpreter"]
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
"loguru>=0.7.3",
"rich>=14.3.2",
"typer>=0.21.2",
"wasm-gen @ git+https://github.com/heiafr-isc/wasm-gen-py.git",
]
[dependency-groups]
dev = [
"black>=26.1.0",
"bump-my-version>=1.2.7",
"furo>=2025.12.19",
"mypy>=1.19.1",
"pre-commit>=4.5.1",
"pyright>=1.1.408",
"pytest>=9.0.2",
"ruff>=0.15.0",
"sphinx>=9.1.0",
"sphinx-design>=0.7.0",
"sphinxcontrib-napoleon>=0.7",
"sphinxcontrib-typer>=0.8.0",
]
[build-system]
requires = ["uv_build>=0.10.1,<0.11.0"]
build-backend = "uv_build"
[tool.ruff.lint]
select = ["E", "F", "B", "I"]
[tool.pyright]
venvPath = "."
venv = ".venv"
[project.scripts]
oberon0-compiler = 'oberon0_compiler:app'
[tool.bumpversion]
current_version = "0.1.2"
parse = "(?P<major>\\d+)\\.(?P<minor>\\d+)\\.(?P<patch>\\d+)"
serialize = ["{major}.{minor}.{patch}"]
search = "{current_version}"
replace = "{new_version}"
regex = false
ignore_missing_version = false
tag = true
sign_tags = false
tag_name = "v{new_version}"
tag_message = "Bump version: {current_version} → {new_version}"
allow_dirty = false
commit = true
message = "Bump version: {current_version} → {new_version}"
commit_args = ""
pre_commit_hooks = ["uv sync", "git add uv.lock"]
[[tool.bumpversion.files]]
filename = "docs/conf.py"
search = "release = \"{current_version}\""
replace = "release = \"{new_version}\""
[[tool.bumpversion.files]]
filename = "src/oberon0_compiler/__init__.py"
search = "__version__ = \"{current_version}\""
replace = "__version__ = \"{new_version}\""
pre-commit
Here is an example pre-commit configuration that you can use for your project:
# SPDX-FileCopyrightText: 2026 Jacques Supcik <jacques.supcik@hefr.ch>
#
# SPDX-License-Identifier: MIT

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/psf/black
    rev: 26.1.0
    hooks:
      - id: black
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.0
    hooks:
      - id: ruff-check
  - repo: https://github.com/RobertCraigie/pyright-python
    rev: v1.1.408
    hooks:
      - id: pyright
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.19.1
    hooks:
      - id: mypy
  - repo: https://github.com/fsfe/reuse-tool
    rev: v6.2.0
    hooks:
      - id: reuse
The scanner
Start by writing the scanner: src/oberon0_compiler/scanner.py.
We suggest using the following tokens:
# SPDX-FileCopyrightText: 2026 Jacques Supcik <jacques.supcik@hefr.ch>
#
# SPDX-License-Identifier: MIT

"""
Oberon-0 tokens
"""

from enum import Enum


class Token(str, Enum):
    NULL = "null"
    TIMES = "*"
    DIV = "DIV"
    MOD = "MOD"
    AND = "&"
    PLUS = "+"
    MINUS = "-"
    OR = "OR"
    EQL = "="
    NEQ = "#"
    LSS = "<"
    LEQ = "<="
    GTR = ">"
    GEQ = ">="
    PERIOD = "."
    NOT = "~"
    LPAREN = "("
    IDENT = "identifier"
    NUMBER = "number"
    IF = "IF"
    WHILE = "WHILE"
    REPEAT = "REPEAT"
    COMMA = ","
    COLON = ":"
    BECOMES = ":="
    RPAREN = ")"
    THEN = "THEN"
    OF = "OF"
    DO = "DO"
    SEMICOLON = ";"
    END = "END"
    ELSE = "ELSE"
    ELSIF = "ELSIF"
    UNTIL = "UNTIL"
    CONST = "CONST"
    VAR = "VAR"
    PROCEDURE = "PROCEDURE"
    BEGIN = "BEGIN"
    MODULE = "MODULE"
    EOF = "eof"
    OTHER = "unknown"

    def __str__(self) -> str:
        return self.value
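Because the value of every keyword token is its upper-case spelling, a word read from the source can be classified with a single dictionary lookup. The following is a minimal sketch of that idea, not the course's implementation: `MiniToken` is a hypothetical three-member subset of the enum above, and `classify` is a hypothetical helper.

```python
from enum import Enum


class MiniToken(str, Enum):
    """Hypothetical subset of the Token enum, for illustration only."""

    IDENT = "identifier"
    WHILE = "WHILE"
    DO = "DO"

    def __str__(self) -> str:
        return self.value


# Keyword table: every member whose value is written in upper case.
KEYWORDS = {str(t): t for t in MiniToken if str(t).isupper()}


def classify(word: str) -> MiniToken:
    """Map a scanned word to its keyword token, or to IDENT otherwise."""
    return KEYWORDS.get(word, MiniToken.IDENT)
```

With this table, `classify("WHILE")` gives `MiniToken.WHILE`, while `classify("while")` gives `MiniToken.IDENT`, since the lookup is case-sensitive.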
Implement the get_next_symbol method of the Scanner class:
# SPDX-FileCopyrightText: 2026 Jacques Supcik <jacques.supcik@hefr.ch>
#
# SPDX-License-Identifier: MIT

"""
Oberon-0 scanner
"""

import io
from dataclasses import dataclass
from enum import Enum
from pathlib import Path

from .token import Token


@dataclass
class Position:
    file_name: str
    line_no: int
    col_no: int


class ScannerError(Exception):
    def __init__(self, message: str, position: Position) -> None:
        super().__init__(message)
        self.position = position

    def __str__(self) -> str:
        p = self.position
        return (
            f"{self.args[0]} (File {p.file_name}, Line {p.line_no}, Column {p.col_no})"
        )


# @typing.no_type_check
@dataclass
class Scanner:
    eof: bool = False
    sym: Enum | None = None  # Next Symbol
    value: str = ""
    file_name: Path | None = None
    line_no: int = 0
    col_no: int = 0
    _ch: str = ""
    _text: io.TextIOBase | None = None
    _text_line: str = ""

    _keyword = {str(i): i for i in Token if str(i).isupper()}
    _symbol = {
        str(i): i for i in Token if not str(i).isupper() and not str(i).islower()
    }

    def open(self, text: io.TextIOBase) -> None:
        self._text = text
        if isinstance(text, io.TextIOWrapper):
            self.file_name = Path(text.name)
        else:
            self.file_name = None
        self.get_next_char()

    def position(self) -> Position:
        return Position(
            file_name=str(self.file_name) if self.file_name else "",
            line_no=self.line_no,
            col_no=self.col_no,
        )

    def skip_space(self) -> None:
        while self._ch.isspace():
            self.get_next_char()
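
    def skip_comment(self) -> None:
        # Precondition: "(" has been consumed and self._ch is the "*" that
        # opens the comment. Comments may be nested.
        self.get_next_char()  # consume the opening "*"
        while True:
            if self.eof:
                raise ScannerError("Unterminated comment", self.position())
            if self._ch == "(":
                self.get_next_char()
                if self._ch == "*":
                    self.skip_comment()  # nested comment
            elif self._ch == "*":
                # Do not read past the next character here, so that a run
                # such as "**)" still closes the comment.
                self.get_next_char()
                if self._ch == ")":
                    self.get_next_char()
                    return
            else:
                self.get_next_char()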

    def get_next_char(self) -> None:
        assert self._text is not None
        while not self.eof and self._text_line == "":
            self._text_line = self._text.readline()
            self.line_no += 1
            self.col_no = 0
            if self._text_line == "":
                self.eof = True
                break
            # Drop trailing whitespace but keep one newline, so that the last
            # token of a line is not glued to the first token of the next line.
            self._text_line = self._text_line.rstrip() + "\n"
        if self.eof:
            self._ch = ""
        else:
            assert self._text_line != ""
            self._ch = self._text_line[0]
            self._text_line = self._text_line[1:]
            self.col_no += 1

    def get_next_symbol(self) -> None:  # noqa: C901
        """This method should set the following attributes:
        - `self.sym` to the next symbol (of type `Token`)
        - `self.value` to the value of the symbol (if applicable, e.g. for identifiers and numbers)
        - `self.eof` to `True` if the end of file is reached, `False` otherwise
        """
        # TODO(Student): implement the `get_next_symbol` method
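One detail to watch when implementing this method is that some symbols share a first character (`<` and `<=`, `>` and `>=`, `:` and `:=`). The usual approach is longest match: try the two-character symbol first and fall back to the one-character symbol. The sketch below shows the idea on a plain string. It is not the Scanner above: `scan_operator` is a hypothetical helper, whereas the real method would read characters through `get_next_char` and inspect `self._ch`.

```python
TWO_CHAR_SYMBOLS = {"<=", ">=", ":="}


def scan_operator(text: str, pos: int) -> tuple[str, int]:
    """Return (lexeme, next_pos) for the symbol starting at text[pos].

    Longest match: the two-character lexeme is tried before the
    one-character one.
    """
    two = text[pos : pos + 2]
    if two in TWO_CHAR_SYMBOLS:
        return two, pos + 2
    return text[pos], pos + 1
```

For example, `scan_operator("i <= -5", 2)` yields the lexeme `"<="`, while `scan_operator("i < 0", 2)` yields `"<"`.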
To test your scanner, you can use the following main program:
# SPDX-FileCopyrightText: 2026 Jacques Supcik <jacques.supcik@hefr.ch>
#
# SPDX-License-Identifier: MIT

"""
Oberon-0 compiler
"""

import sys
from pathlib import Path
from typing import Annotated, TypeAlias

import typer
from loguru import logger
from rich.console import Console

from .scanner import Scanner

console = Console()
app = typer.Typer()

FilterDict: TypeAlias = dict[str | None, str | int | bool]

__version__ = "0.1.2"


def version_callback(value: bool) -> None:
    if value:
        print(f"Oberon0 compiler version: {__version__}")
        raise typer.Exit()


@app.command(context_settings={"ignore_unknown_options": False})
def main(  # noqa: PLR0913
    source: Annotated[Path, typer.Argument(help="Oberon-0 source file (.mod)")],
    version: Annotated[
        bool | None,
        typer.Option("--version", callback=version_callback, is_eager=True),
    ] = None,
    debug: bool = False,
    debug_scanner: bool = False,
) -> None:
    "Oberon-0 compiler"
    logger.remove()
    level_per_module: FilterDict = {"": "INFO"}
    if debug:
        level_per_module[""] = "DEBUG"
    if debug_scanner:
        level_per_module["oberon0_compiler.scanner"] = "DEBUG"
    logger.add(sys.stdout, filter=level_per_module, level=0)

    scanner = Scanner()
    try:
        source_file = source.open("r")
        scanner.open(source_file)
    except OSError as e:
        logger.error(f"Cannot open source file {source}: {e}")
        raise typer.Exit(code=1) from e


if __name__ == "__main__":
    app()
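The main program above opens the source file but does not yet read any symbol. One possible way to exercise the scanner is to loop over its symbols and log each one. The sketch below assumes that your `get_next_symbol` sets `sym` to an end-of-file token (for example `Token.EOF`) once the input is exhausted, which is a design choice and not something the skeleton guarantees. `iter_symbols` and `FakeScanner` are hypothetical names, and `FakeScanner` only stands in for the real Scanner so that the example runs on its own.

```python
from collections.abc import Iterator
from typing import Any


def iter_symbols(scanner: Any, eof_token: Any) -> Iterator[tuple[Any, str]]:
    """Yield (sym, value) pairs until the scanner produces eof_token."""
    while True:
        scanner.get_next_symbol()
        if scanner.sym == eof_token:
            return
        yield scanner.sym, scanner.value


class FakeScanner:
    """Hypothetical stand-in that replays a fixed list of symbols."""

    def __init__(self, symbols: list[tuple[str, str]]) -> None:
        self._symbols = iter(symbols)
        self.sym: str = ""
        self.value: str = ""

    def get_next_symbol(self) -> None:
        # Once the list is exhausted, keep reporting "eof".
        self.sym, self.value = next(self._symbols, ("eof", ""))


fake = FakeScanner([("VAR", "VAR"), ("identifier", "i")])
dumped = list(iter_symbols(fake, "eof"))
```

In the main program, the same loop could be written as `for sym, value in iter_symbols(scanner, Token.EOF): logger.debug(f"{sym} {value}")`, placed after the call to `scanner.open(source_file)`.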
Unit tests
It is also high time to start writing tests for your compiler. You can use the following code as inspiration:
# SPDX-FileCopyrightText: 2026 Jacques Supcik <jacques.supcik@hefr.ch>
#
# SPDX-License-Identifier: MIT

import io
import typing

from oberon0_compiler.scanner import Scanner
from oberon0_compiler.token import Token


def test_assignment() -> None:
    src = "VAR i := 12;"
    scanner = Scanner()
    scanner.open(io.StringIO(src))
    scanner.get_next_symbol()
    assert scanner.sym == Token.VAR
    scanner.get_next_symbol()
    next_sym = typing.cast(Token, scanner.sym)
    assert next_sym == Token.IDENT
    assert scanner.value == "i"
    scanner.get_next_symbol()
    next_sym = typing.cast(Token, scanner.sym)
    assert next_sym == Token.BECOMES
    scanner.get_next_symbol()
    next_sym = typing.cast(Token, scanner.sym)
    assert next_sym == Token.NUMBER
    assert scanner.value == "12"
    scanner.get_next_symbol()
    next_sym = typing.cast(Token, scanner.sym)
    assert next_sym == Token.SEMICOLON


def test_compare_leq() -> None:
    src = "i <= -5"
    scanner = Scanner()
    scanner.open(io.StringIO(src))
    scanner.get_next_symbol()
    assert scanner.sym == Token.IDENT
    assert scanner.value == "i"
    scanner.get_next_symbol()
    next_sym = typing.cast(Token, scanner.sym)
    assert next_sym == Token.LEQ
    assert scanner.value == "<="
    scanner.get_next_symbol()
    next_sym = typing.cast(Token, scanner.sym)
    assert next_sym == Token.MINUS
    scanner.get_next_symbol()
    next_sym = typing.cast(Token, scanner.sym)
    assert next_sym == Token.NUMBER
    assert scanner.value == "5"


def test_compare_less() -> None:
    src = "i < 0"
    scanner = Scanner()
    scanner.open(io.StringIO(src))
    scanner.get_next_symbol()
    assert scanner.sym == Token.IDENT
    assert scanner.value == "i"
    scanner.get_next_symbol()
    next_sym = typing.cast(Token, scanner.sym)
    assert next_sym == Token.LSS
    assert scanner.value == "<"
    scanner.get_next_symbol()
    next_sym = typing.cast(Token, scanner.sym)
    assert next_sym == Token.NUMBER
    assert scanner.value == "0"
Use the pytest test framework to run your tests.
Read the pytest documentation to learn more about writing tests in Python.