This natural language compiler translates words, phrases, and sentences from a subset of English into their lambda calculus and predicate logic representations (and determines their semantic types). It demonstrates the correctness of this semantic model and applies the theory to implement a linguistic modelling tool with practical pedagogical applications.
Technical aspects:
- lexing and parsing using ocamllex and menhir
- a novel compiler that automates the computation of lambda calculus types, denotations, and derivations for fundamental syntax structures
- a lambda calculus interpreter that automates lambda reductions
- a command-line user interface implemented using the Core library
Predicate logic is a formal logical language whose vocabulary includes individual constants, individual variables, predicates, logical operators, and quantifiers. Here are examples of how this system can model natural language sentences:
- Individual constants refer to specific entities
-
$a$ could represent an entity named Alex -
$b$ could represent an entity named Bob - etc.
-
- Predicates refer to the attributes and actions of individuals
-
$swims(a)$ represents the sentence "Alex swims" -
$knows(a,b)$ represents "Alex knows Bob" -
$old(a)$ respresents "Alex is old" - etc.
-
- Logical operators include ¬ (NOT), ∧ (AND), ∨ (OR), → (IF...THEN)
-
$swims(a) ∧ old(a)$ represents "Alex swims and Alex is old" -
$swims(a) ∨ swims(b)$ represents "Alex swims or Bob swims", i.e. "Either Alex or Bob swims" -
$old(a) → old(b)$ represents "If Alex is old, then Bob is old" -
$¬knows(a,b)$ represents "Alex does not know Bob"
-
- Quantifiers include
$∀$ (for all) and$∃$ (there exists). They can be used with individual variables to model more general statements as follows-
$∀x(swims(x))$ represents "For all$x$ ,$x$ swims", i.e. "Everyone swims" -
$∃y(knows(b,y))$ represents "There exists$y$ such that Bob knows$y$ ", i.e. "Bob knows someone" -
$∀p(∃q(knows(p,q)))$ represents "For all$p$ , there exists$q$ such that$p$ knows$q$ ", i.e. "Everyone knows someone" -
$∀p(politician(p)→evil(p))$ represents "For all$p$ , if$p$ is a politician then$p$ is evil", i.e. "Every politician is evil"
-
These logical symbols can all be combined to create complex sentences. For example, consider the PredL sentence
This represents the sentence "For all
Predicate logic allows us to model complete sentences, but we can use the lambda calculus to model even more types of phrases. The syntax of the lambda calculus includes all predicate logic sentences as well as new constructions called lambda abstractions and applications:
- Abstractions are of the form
$[λx.e]$ , representing a function that takes$x$ as an input and outputs$e$ - Applications are of the form
$P (Q)$ , representing the application of the function$P$ to the input$Q$
For example, the abstraction
This project implements a unified semantic model for computing the types and lambda calculus denotations for natural language phrases. As a simple illustration of how lambda calculus can be applied to natural language semantics, consider the sentence "Alex swims". We can represent each word of this sentence as lambda abstractions as follows:
- "Alex" ->
$[λP.P(a)]$ , where$a$ is the individual constant representing the entity Alex - "swims" ->
$[λx.swim(x)]$
To obtain the denotation of "Alex swims", we simply apply that of "Alex" to that of "swims", to obtain the lambda application
- "Alex swims" is represented by a predicate logic complete sentence, so it has type
$t$ , which stands for truth value - "swims" is represented by the function
$[λx.swim(x)]$ that takes in an entity$x$ and produces a predicate logic sentence$swims(x)$ of type$t$ (as stated above), so it has type$< e, t >$ denoting a function with an$e$ (entity) input and$t$ (sentence) output - "Alex" is represented by the function
$[λP.P(a)]$ that takes in a function$P$ of type$< e, t >$ (as shown above) and produces a predicate logic sentence$P(a)$ , so it has type$<< e, t >, t >$ denoting a function with an input of type$< e, t >$ and output of type$t$
In this way, lambda calculus allows us to determine logical representations of intermediate words and phrases, as well combine them in order to represent compound phrases and eventually sentences. For a full specification of this model, see this paper (or experiment with the compiler to see how different phrases are represented with lambda calculus).
This project automates the translation of natural language phrases into the formal language of predicate logic and lambda calculus, but translating the entire English language is outside its scope. Instead, we will work with a small subset of English that includes proper nouns, common nouns, determiners, relative clauses, intransitive verbs, transitive verbs, and adjectives. This toy language is formally specified by the following context-free grammar:
S -> NP VP
NP -> PN | DET CN | DET RCN
VP -> IV | TV NP | is ADJ | is a CN | is a RCN
RCN -> CN that VP | CN that NP TV
CN -> mathematician | filmmaker | lawyer | engineer | ADJ CN
IV -> studies | sleeps | eats | swims
TV -> loves | hates | teaches | knows
ADJ -> clever | sleepy | funny | grumpy
DET -> a | every | some | no
PN -> Alex | Caleb | Christie | Lauren
The following table lists the various phrases included in the language and examples of how they can be constructed from the grammar above.
Phrase Type | Abbreviation | Example |
---|---|---|
proper noun | PN | Alex |
determiner | DET | every |
adjective | ADJ | clever |
instransitive verb | IV | swims |
transitive verb | TV | loves |
common noun | CN | filmmaker, ADJ CN: funny lawyer |
relative clause common noun | RCN | CN that VP: mathematician that sleeps, lawyer that loves Lauren CN that NP VP: lawyer that Lauren loves |
noun phrase | NP | PN: Christie DET CN: every filmmaker DET RCN: a lawyer that swims |
verb phrase | VP | IV: sleeps TV NP: teaches Caleb is ADJ: is grumpy is a CN: is a filmmaker is a RCN: is a lawyer that eats |
sentence | S | NP VP: Alex swims |
Observe that this is an infinitely large language, and that we can construct arbitrarily long and complex phrases, such as Every funny clever mathematician that Alex loves hates a filmmaker that is a lawyer that swims
. The compiler is also highly extensible, and can be modified easily to add words to the toy vocabulary (eg. adding different names or common nouns) or expand the language to include more syntactic structures (eg. conjunctions, negation, pronouns, modal verbs, punctuation, prepositions, etc.).
- To install
opam
, find platform-specific installation instructions here OR copy the following into your terminal:
bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)"
-
To initialize opam, run
opam init
and entery
when prompted (you should be asked a question similar toDo you want opam to modify ~/.zshrc? [N/y/f]
). -
To set up the OCaml compiler, run both of the following:
opam switch create 4.14.1
eval $(opam env)
- To install
core
, a required library, run:
opam install core
eval $(opam env)
- Finally, to run the program, either
- clone this repo and run with
dune
ormake run
, OR - download a bytecode executable version of this project here and run with
ocamlrun main.bc
.
Running the program launches a command-line interface and prints a set of simple instructions to the terminal:
As stated in the instructions, the user can enter grammar
in order to display the toy language grammar and an example sentence that can be translated:
When a phrase is entered, the lambda calculus denotation as well as the semantic type of the phrase are printed:
Entering a phrase that violates the syntactic rules of the toy language or uses words that are either misspelled or not in the toy language will produce the corresponding error messages (with the latter case taking precedence, so a synactically nonsensical phrase with a misspelled word will produce an invalid word error):
In addition to full sentences, individual words or phrases can also be translated: