▲Unification (2018)eli.thegreenplace.net

69 points by asplake 3 days ago | 14 comments

primitivesuave 3 days ago [-]

Brought back memories to over a decade ago with Mathematica. It's quite easy to pattern match simple expressions like:

``` expr = foo[bar[k], baz[V]]; expr /. foo[x_, baz[y_]] :> {x, y} ```

But in real-world use cases, this paradigm quickly exposes some unhandled edge cases, or cases where the evaluation becomes prohibitively expensive. Nowadays I use TypeScript, and this has ignited some curiosity into finding out if TS does anything to optimize for this in type inference.

asplake 3 days ago [-]

That’s kinda what brought unification to my attention. For my own education I’m writing a compiler for a simple ML-style language. Enjoying Pierce’s Types and Programming Languages meanwhile.

piinbinary 3 days ago [-]

Unification is the core of Algorithm W, aka Hindley–Milner type inference. It's at the core of the type inference algorithms for languages like Haskell, OCaml, and standard ML.

cubefox 3 days ago [-]

Where do you need full type unification rather only type pattern matching?

housecarpenter 3 days ago [-]

Consider the identity function f, which just takes an argument and returns it unchanged, and has the polymorphic type a -> a, where a is a type variable. What's the type of f(f)?

Obviously, since f(f) = f it should be a -> a as well. But to infer it without actually evaluating it, using the standard Hindley-Milner algorithm, we reason as follows: the two f's can have different specific types, so we introduce a new type variable b. The type of the first f will be a substitution instance of a -> a, the type of the second f will be a substitution instance of b -> b. We introduce a new type variable c for the return type, and solve the equation (a -> a) = ((b -> b) -> c), using unification. This gives us the substitution {a = (b -> b), c = (b -> b)}, and so we see that the return type c is b -> b.

But if we use pattern matching rather than unification, the variables in one of the two sides of the equation (a -> a) = ((b -> b) -> c) are effectively treated as constants referring to atomic types, not variables. Now if we treat the variables on the left side as constants, i.e. we treat a as a constant, we have to match b -> b to the constant a, which is impossible; the type a is atomic, the type b -> b isn't. If we treat the variables on the right side as constants, i.e. we treat b and c as constants, then we have to match a to both b -> b and c, and this means our substitution will have to make b -> b and c equal, which is impossible given that c is an atomic type and b -> b isn't.

guerrilla 3 days ago [-]

I think in dependently typed programming languages and theorem proves. As I understand it, even System F (like Haskell) benefits, no?

dunham 1 hours ago [-]

As I understand it, dependent type theory requires higher order unification, which is undecidable. So they typically use bidirectional type checking instead of Hindley Milner.

With bidirection type checking, I think it only needs the unification when solving inserted implicits. So a plain dependent type theory wouldn't need it, but once you add implicit parameters, you do need it. They usually use pattern unification, which solves a subset of higher order unification problems, for those unification problems.

cubefox 3 days ago [-]

I guess in System F, if a function f accepts a complex type A, and another function g returns a complex type B, both types could involve type variables. Then for the compiler to check whether the expression f(g) is valid, it (the compiler) needs to determine whether a unification of A and B is possible. Not sure though.

moi2388 3 days ago [-]

“ lowercase letter represents a constant (could be any kind of constant, like an integer or a string) An uppercase letter represents a variable”

I already disagree with this syntax.

brodo 3 days ago [-]

That's the default way it has been done in Prolog since forever. If you have static typeing, like in Soufflé, you can get around it.

cubefox 3 days ago [-]

It's also used in the theory of formal grammars, where terminal symbols (constants) are lower case, and non-terminal symbols (variables) are upper case: https://en.wikipedia.org/wiki/Formal_grammar#Introductory_ex...

moi2388 3 days ago [-]

I did not know this, thanks

maweki 3 days ago [-]

Datalog does not need/do unification for rule evaluation, as it is just matching variables to values in a single direction. Body literals are matched against the database and the substitutions are applied to the rest of the rule and the head.

Prolog does unification of the proof goal with the rule head. It's necessary there but not with datalog.

kevindamm 3 days ago [-]

While bottom-up evaluation is the norm in datalog, it is not a requirement and there are datalog engines that evaluate top-down or all-at-once.

But I still agree with you about the capitalization. Some formats, like KIF, use a '?' prefix instead, and I've seen some HRF notations that mix the '?' prefix with non-KIF formatting (':-' operator and '.' terminator).

Loading comments...

primitivesuave 3 days ago [-]

Brought back memories to over a decade ago with Mathematica. It's quite easy to pattern match simple expressions like:

``` expr = foo[bar[k], baz[V]]; expr /. foo[x_, baz[y_]] :> {x, y} ```

asplake 3 days ago [-]

That’s kinda what brought unification to my attention. For my own education I’m writing a compiler for a simple ML-style language. Enjoying Pierce’s Types and Programming Languages meanwhile.

piinbinary 3 days ago [-]

Unification is the core of Algorithm W, aka Hindley–Milner type inference. It's at the core of the type inference algorithms for languages like Haskell, OCaml, and standard ML.

cubefox 3 days ago [-]

Where do you need full type unification rather only type pattern matching?

housecarpenter 3 days ago [-]

Consider the identity function f, which just takes an argument and returns it unchanged, and has the polymorphic type a -> a, where a is a type variable. What's the type of f(f)?

guerrilla 3 days ago [-]

I think in dependently typed programming languages and theorem proves. As I understand it, even System F (like Haskell) benefits, no?

dunham 1 hours ago [-]

As I understand it, dependent type theory requires higher order unification, which is undecidable. So they typically use bidirectional type checking instead of Hindley Milner.

cubefox 3 days ago [-]

moi2388 3 days ago [-]

“ lowercase letter represents a constant (could be any kind of constant, like an integer or a string) An uppercase letter represents a variable”

I already disagree with this syntax.

brodo 3 days ago [-]

That's the default way it has been done in Prolog since forever. If you have static typeing, like in Soufflé, you can get around it.

cubefox 3 days ago [-]

moi2388 3 days ago [-]

I did not know this, thanks

maweki 3 days ago [-]

Prolog does unification of the proof goal with the rule head. It's necessary there but not with datalog.

kevindamm 3 days ago [-]

While bottom-up evaluation is the norm in datalog, it is not a requirement and there are datalog engines that evaluate top-down or all-at-once.