jneen's web site

52-card Tichu

52-card Tichu is an adaptation of the commercial game Tichu, which itself combines elements of Bridge with a Chinese game commonly known in the US as “deuces” or “president”. It uses a standard 52-card deck, and requires exactly 4 players in 2 teams of 2.

I made this adaptation for two reasons - one is that I often found myself wanting to play this game but only had a regular deck of cards. The other is that the original is, quite frankly, full of racist Chinese stereotypes, and I kind of don’t feel like playing with those cards or that terminology.

I also found that I usually wanted a slightly shorter game, and one that was suitable for competitive play, so the scoring system has been reworked a bit. Since only the difference in scores matters, overzealously calling Tichu can lose you the game outright, rather than simply drawing the game out.

Rules for 52-card Tichu

General Setup

Players sit around 4 sides of a table, with partners across the table, as in bridge.

Objective

Each team has a combined score. The game ends when one team is ahead by 400 points, or when 10 rounds have been played and the score isn’t tied. Since only the difference in scores matters, we keep track only of which team is ahead, and by how many points. Scores are updated only at the end of a round.

Scoring

Teams are awarded points as follows:

(+100) for completing a tichu
(+100) for breaking an opposing tichu
(+200) for a skunk (in this case the piles are not counted)
(+200) for completing a grand tichu
(+200) for breaking an opponent's grand tichu

(+5) for each 5 in either partner's pile
(+10) for each 10 or Q in either partner's pile
(+25) for the `A♠` in either partner's pile
(-25) for the `A♢` in either partner's pile

Note that the pile points always net to 100: 4×5 + 8×10 + 25 − 25 = 100.
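
For the curious, here is a minimal Python sketch that checks this arithmetic over a whole deck (the card representation is made up purely for illustration):

# a sketch of the pile arithmetic; cards are (rank, suit) pairs, with
# number ranks 2-10, face ranks 'J', 'Q', 'K', 'A', and suits 's', 'h', 'd', 'c'
def pile_points(pile):
    total = 0
    for rank, suit in pile:
        if rank == 5:
            total += 5
        elif rank in (10, 'Q'):
            total += 10
        elif (rank, suit) == ('A', 's'):
            total += 25
        elif (rank, suit) == ('A', 'd'):
            total -= 25
    return total

# the full deck nets exactly 100 points
deck = [(r, s) for r in list(range(2, 11)) + ['J', 'Q', 'K', 'A'] for s in 'shdc']
assert pile_points(deck) == 100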

Preamble

The entire deck is dealt, giving each player 13 cards. Players secretly exchange one card with each other player. Play continues until either:

  • Only one player has cards remaining in their hand. In this case their hand and their pile are added to the pile of the player who emptied their hand first, and the piles are scored.
  • Both members of a team have emptied their hands, and both opponents have cards remaining in their hands. This is called a skunk. In the event of a skunk, the piles are not counted.

Calling Tichu

At any point when a player has 13 cards in their hand, they may call tichu. When a player has called tichu, they are making a bet of 100 points that they (not their teammate) will empty their hand first. If successful, their team gains 100 points. If unsuccessful, the opponent’s team receives a bonus 100 points.

Note that the only condition for calling tichu is having all 13 cards. A player can pass for as long as they like and call tichu late in the game, though a player will often call tichu as they are playing their first cards. A player may even call tichu pre-emptively, before cards are passed.

A grand tichu is the same as a regular tichu, except that it is worth 200 points, and can only be called when the player has seen only 8 of their cards. It is customary to deal each player’s first 8 cards separately, in case anyone wants to call grand tichu before picking up the remaining 5 cards.

Playing a Trick

At the start of each round, whoever has A♣ in their hand has the lead. The player with the lead plays a set of cards in one of the formats below. After the first play of the trick, the format is fixed for the rest of the trick. Play continues to the left, and each player has the opportunity to strictly beat the last play, or pass to the next player. If three players pass in a row, the last player to have played takes the played cards and puts them face-down in front of them in their pile. This player has the lead for the next trick, and may choose a new format.

Formats

The formats are as follows:

  • Single cards: 4, 8, J
  • Pairs: 4 4, 5 5, K K
  • Triples: 3 3 3, 10 10 10
  • Consecutive Pairs: 4 4 5 5, 5 5 6 6; or longer runs, like 2 2 3 3 4 4 or 9 9 10 10 J J
  • Straights of 5 or more cards: 2 3 4 5 6 7
  • Full House: 3 3 3 J J, Q Q Q 2 2

Notes:

  • For Straights and Consecutive Pairs, there is no limit to the length - but the length is considered part of the format. So on top of a 7-card straight, only a higher straight with exactly 7 cards can be played.

  • Full Houses are compared only by comparing the triple. For example, J J J 2 2 can be played on top of 3 3 3 K K, since J is higher than 3.

Aces

Aces are neither high nor low - they may only be played according to their individual rules:

  • A♠ (“the spade”): This card has value infinity. It can only be played as a single card - never in combination with others. If this card wins a trick (i.e. is on top of the played cards when three players pass), the player of this card must choose an opponent to give the pile to. When the points from the pile are counted, this is worth 25 points.
  • A♢ (“the diamond”): If played in combination with other cards, this is a wild card, and can take any value from 2 to K. When played as a single card, it adds 0.5 to the value below it. For example, 6 A♢ 7 is a valid sequence of single card plays. It can be played on top of a K, to form “King and a half”. A♠ beats King and a half.
  • A♡ (“the heart”): This card can only be played on an empty table, when the player has the lead. It is immediately put into the player’s teammate’s pile, and the teammate leads the next trick.
  • A♣ (“the club”): This card has value 1, and can be used either as a single card or in a straight (e.g. A♣ 2 3 4 5). The player with this card in their hand takes the first lead. When this card is played, the player may optionally make a “wish” for a value from 2 to K. The next player who can legally play that value must do so. The wish remains in effect until a card of that value is played - between tricks, rounds, and even games.

Bombs

There are two kinds of bombs: Four of a Kind (e.g. 3 3 3 3), or a Straight Flush of 5 or more cards (e.g. 2♣ 3♣ 4♣ 5♣ 6♣ 7♣). They cannot contain Aces, including A♢. Bombs can be played out of format, and out of turn. A bomb is considered higher than all non-bomb plays, and can only be beaten by a higher bomb. Bombs are compared with the following rules:

  • Straight Flushes are higher than Fours of a Kind
  • Longer Straight Flushes are higher than shorter ones, no matter the value
  • Straight Flushes of the same length are compared by their high card
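
These three rules give bombs a total order. As a quick illustration, here is a Python sketch with a made-up tuple encoding:

# a bomb is either ('four', rank) or ('flush', length, high_card),
# with ranks as numbers from 2 (low) to 13 (king)
def bomb_key(bomb):
    if bomb[0] == 'four':
        return (0, 0, bomb[1])    # fours rank below all straight flushes
    kind, length, high = bomb
    return (1, length, high)      # longer flush wins, then high card

# a 5-card straight flush to the 6 beats four kings
assert bomb_key(('flush', 5, 6)) > bomb_key(('four', 13))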

Notes, Edge Cases, Conventions

  • For the card exchange at the beginning, no cards are exchanged until every player has made their selections. Usually each player lays their cards face down in front of them, facing the player they wish to trade with, and partners trade with each other first before trading with opponents.

  • The wish also applies to bombs - if a player has a bomb containing the wished value, they must play it unless they can fulfill the wish another way.

  • Since bombs can be played out of turn, a player may pre-empt their own turn with a bomb to get around having to fulfill the A♣’s wish. For example, if there is an active wish for a 2, and a player’s hand contains 2 4 4 4 4, they may play the bomb before their turn, allowing them to pass without fulfilling the wish.

  • An A♢ played on an empty table has value 0.5. A player can legally play a 2 or an A♣ on top of it.

  • If a player’s only remaining card is A♡ and they do not have the lead, it is impossible for them to ever empty their hand.

  • An efficient way to count pile points is to count only one team’s cards, double the number, and subtract from 100. That team takes a loss of that many points (or, if the result is negative, gains that many). For example, if a team’s pile counts 30 points, it loses 100 − 2×30 = 40 points relative to the opponents.

  • To prevent inadvertently passing the same card to an opponent and giving them a pair or a bomb, it is a loose convention to pass an even card left and an odd card right.


Tulip, an Untyped Functional Language Which is Going to Be Pretty Cool (updated)

 )`(
(   ) .tulip
  |/

This language is gonna rock, I’m really excited about it. This post is an update to the previous Tulip post. Update: The latest version of this intro to Tulip can be found in docs/intro.md in the source repo.

I owe Adam Baker a huge thanks for talking through the first version of this with me in January, and Morgan Astra for polishing off the design with me in May.

So here’s tulip!

Design goals

Tulip is designed to sit in the intersection of repl tools (bash, zsh, ex, etc) and scripting languages. It’s intended to scale both down to small things typed at a repl, and up to large-ish programs, to the extent you’d want to build with ruby, python, clojure, etc.

It’s an interpreted, impure, dynamic functional language, but for the most part it enables the user to behave as if they were in a pure language with a Hindley-Milner type-checker. Open variants (called “tagwords”), paired with patterns and open methods, are the central strategy to manage polymorphism.

Importantly, tulip aims to be obvious and easy to explain. You should not need to study PLT to understand the core concepts of tulip. To that end, we try to avoid jargon and unclear language.

Finally, tulip is a language full of flowers and rainbows and unicorns, with a deliberately diverse team and a strictly enforced code of conduct. If that’s a problem you can maybe find another language.

Basic syntax

Variables are kebab-case identifiers, comments are from # to the end of the line, and values are assigned to variables with my-variable = my-value. Indentation is not semantic (as that’s not a repl-friendly feature), and newlines and semicolons (;) are syntactically equivalent. All user and library functions are identifiers, and all symbols are reserved for the language. There are no identifier keywords - all special directives will be prefixed with @ (see @method and @impl below, for example).

Chaining and Auto-Curry

For ergonomic purposes on a repl, it is critical to support typing small but powerful programs linearly - from left to right. Prefix function calls usually require seeking left every time you want to use the result of an expression. This gets very tedious very quickly. Tulip has two features that combine to make this easy.

The first is auto-currying. Users of Haskell or ML will find this familiar. If a function is called with fewer than its expected number of arguments, it returns a function expecting the rest of the arguments. For example,

(add 5) 3

calls the function (add 5) on the value 3, and returns 8. Luckily the precedence works out such that you can just type add 5 3 in this case and get on with your day.

The second is one of two infix operators in the language: >, pronounced “into”, “reverse-apply”, or simply “then”. This operator expects a value on the left and a function on the right, and simply applies the function to the value. So the example above could be rewritten as:

3 > add 5

Proper standard library design here is critical. Functions to be used on the repl must be written to take their chained argument last, or else typing becomes much more difficult. In the case that you are using a function in a way the author didn’t expect, there’s a sibling symbol to >, which is the special variable -. If a segment in a >-chain contains -, the chained argument is placed there instead. For example,

foo > bar - baz

is equivalent to bar foo baz. Alternately, you can bind a variable to a name for later within a chain:

foo > x => bar > baz > add x

Tagwords, Tuples

Tagwords are the basic building block of all tulip’s user-space data structures. Syntactically, a tagword is simply an identifier sigiled with a ., as in .foo. When a tagword is called as a function, it creates a tagged value. When a tagged value is called as a function, it appends the data to its internal list. Let’s look at an example:

.foo # => .foo
.foo 1 # => .foo 1
.foo 1 2 # => .foo 1 2

In simple terms, they construct values. Some examples of tagwords that will be commonly used in the standard library are:

# booleans
.t
.f

# lists
.cons 1 (.cons 2 (.cons 3 .nil))

# result
.ok value
.err message

# option
.some value
.none

For a more detailed explanation of this pattern, see my talk at Clojure/conj 2014.

A tuple is multiple expressions joined with ,, as in foo bar > baz, zot. The comma has lower precedence than >, so often tuples are surrounded by parentheses: foo (bar, baz > zot). Tuples are represented with the .tuple tag, and are right-nesting, so that:

(.x, .y, .z) # => .tuple .x (.tuple .y .z)
((.x, .y), .z) # => .tuple (.tuple .x .y) .z

This allows for open methods (see below) to be defined for arbitrary lengths of tuples.

Lambdas and patterns

It’s expected that lambdas will be a common thing to type on the repl, so the syntax has been kept very lightweight. A basic lambda looks like this:

[ foo => some-expr-in foo ]

where the variable foo is bound inside the block. Nested blocks have the usual properties of closures.

Things get a little more interesting with destructuring. Rather than being a simple variable binder (foo above), the binder can also match and destructure tagged values over multiple clauses:

map f = [ .nil => .nil; .cons x xs => .cons (f x) (map f xs) ]

The : symbol matches and binds multiple patterns over the same value. Patterns can also include named pattern or type checks (sigiled with %).

add = [ (x : %uint) (y : %uint) => add-uint x y ]
map (f:%callable) = ...
foo (bar : .bar _ _) = ...

All the native types will have %-sigiled type names, and there will be a way to implement your own named checks, which is still in design.

Patterns also have predicates on any part of the match that can use bound names from sub-matches:

[ .foo (x ? gt 4 -) => ...; .foo x ? is-even x => ... ]

Blocks can be used with > to destructure arguments. For example, here is tulip’s equivalent of an if expression:

condition > [ .t => true-value; .f => false-value ]

The special pattern _ acts as usual - it matches any value and binds no results. A simple guard pattern can be implemented as

[ _? cond-1 => val-1; _? cond-2 => val-2 ]

Naming all the variables and typing => every time can be a pain on the repl, so for simple blocks tulip provides a feature called autoblocks and the autovar. Autoblocks are also delimited by square brackets, but contain no pattern and only a single clause. The autovar, denoted $, is the argument to the closest-nested autoblock:

list > map [ some-fn $ arg ]

Future plans may also include support for $1, $2, etc.

Literals and Macros

I strongly dislike macros that can hide in code. I get really frustrated when I open a source file and see (foo ...) and can’t tell whether it’s a function or a macro until I read documentation. For these reasons, extensible literals and macros in tulip are all sigiled with /. Here is a basic macro: the list constructor:

/list[1; 2; 3]

The implementation syntax for these is still in design phase, but they will take after rust in that they will pattern-match against syntax, and result in new syntax. I expect list to be so common that for that special case it is permitted to leave off the macro name: /[1; 2; 3] is equivalent.

Strings are delimited with '{...}, which balances curly braces and respects \{ and \}. But since many string literals are much simpler, you can also use ' as a one-sided delimiter that scans to whitespace or one of the delimiters ], ), or >. Here are some examples:

'{foo} # => the string {foo}
'foo # => the string {foo}

For interpolation, use "{...} with $(...):

"{a string with $(some-expression) interpolated}

Let, recursion, laziness

Tulip is not a lazy language (mostly). If you want a lazy expression, simply prefix it with ~. Evaluation of lazy expressions is deferred until a pattern-match needs the value. A non-pattern-matching function called with a lazy value results in a further lazy value.

To bind variables in an ad-hoc way, open a block scope with {}:

foo = {
  bar = 1

  # defines a local function
  baz zot = 2

  quux
}

Variables defined in this way are called let-bound, as opposed to argument-bound. Let-bound function arguments can pattern match in the same way as lambda arguments. Multiple definitions of the same function in a sequence of bindings behaves the same as multiple clauses in a lambda.

Tulip supports let-recursion: a variable bound in a let-expression (name = value) can use itself in its definition. A sequence of let-expressions can also refer to each other recursively. This generally works fine when the values defined are all functions, but it can break down when the values use recursive references strictly:

# the lazy y-combinator (.o_o).
fix f = { x = f x; x }

In this case, tulip will detect the recursion, and the f x will be automatically wrapped with a lazy operator (~). Any reference loop in a sequence of lets will be closed by converting the last link to a lazy reference. This makes it simple to create graph-like structures.

Tail call optimization is supported and tail recursion encouraged.

Definitions that begin with > are desugared as follows:

foo = > bar > baz

# equivalent to
foo x = x > bar > baz

Flags, Optional Arguments, Maps

Flags are identifiers that begin with -. A flag-pair is a flag followed immediately (no space) by a : and then an expression. When a flag-pair is called with another flag-pair argument, it merges that pair into its internal record. For example,

-foo: 1 # => (-foo:1)
-foo: 1 -bar: 2 # => (-foo:1 -bar:2)

Flags are used for keyword arguments. Given a function defined as:

foo -bar: x -baz: y = ...

it can be called as:

foo -baz: 3 -bar: 2

Splats for keyword arguments are still in design phase, as are optional arguments and boolean flags. The latter will automatically be converted to .some/.none or .t/.f.

I should note here that flags - in particular optional arguments - are complex, especially when paired with auto-currying. But they are a critical tool for implementing repl interfaces. So they are supported, but are generally discouraged unless you are exporting a function to be used on a repl.

Methods, Protocols, Implementations

A common problem in general-purpose languages is to define an interface that can be implemented by users later. This is the purpose of, for example, defprotocol and multimethods in clojure, most good uses of interfaces in java, and typeclasses in Haskell. A method declaration in tulip looks like this:

@method map _ %

This is read as: map is a method that takes two arguments, and dispatches on the second argument. Some implementations of this method might look like:

@impl map f (.cons x xs) = .cons (f x) (map f xs)
@impl map f .nil = .nil
@impl map f (.some x) = .some (f x)
@impl map _ .none = .none
@impl map f (.ok x) = .ok (f x)
@impl map f (.err e) = .err e

In implementation definitions, only the dispatched argument can be pattern-matched, and only simple tag destructures, native type checks, and default (_) are allowed. Open recursion, however, is both allowed and encouraged. Multiple dispatch is also planned, but not yet designed.

Side Effects, Zero Arguments, Multiple Expressions

Tulip is an impure language - functions are allowed to have side effects, i.e. to change the state of the world in some way beyond computing a return value from their arguments.

If you want to execute a function just for its side effects and return a different value, you may use a block:

{ do-thing arg1; actual-value }

You can chain together as many calls here as you like. If you want to define a function that takes no arguments, you can use the special parameter !:

# a zero-argument definition
do-all-things! = {
  do-thing-1 arg-1
  do-thing-2 arg-2
  .ok
}

# a zero-argument call
do-all-things!

# a zero-argument lambda clause
[ ! => ... ]

Often, for lambdas whose primary purpose is side effects, you will want to simply evaluate multiple expressions without an extra layer of nesting. For this, tulip supports block-lambda sugar:

# a block-lambda
{ x => foo x; bar x }

# equivalent to...
[ x => { foo x; bar x }]

# a zero-argument block-lambda
{ ! => do-a-thing! ; do-another-thing! }

Concurrency

Tulip’s data structures are immutable. Under the hood, there are only 4 side-effecting operations: spawn, send, receive, and ffi calls.

The spawn function takes a zero-argument function and spawns an erlang-style actor with a message queue and a backup queue, and returns a handle to the process. The receive function takes a single-argument function and blocks until the queue contains a matching message. An optional -timeout: (seconds, action) flag can be passed, which will run the zero-argument function action if timeout seconds pass without a matching message. The send function takes a process handle and a message, and sends the message to the process.
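
To make that concrete, here is a rough sketch of what usage might look like, reusing the zero-argument lambda syntax from earlier (print here is a hypothetical output function, and like everything else in this post, the details are still in design):

# spawn an actor that prints the first message it receives
log-proc = spawn [ ! => receive [ msg => print msg ] ]
send log-proc '{hello}

# block for a .done message, giving up after 5 seconds
receive [ .done => .ok ] -timeout: (5, [ ! => .err 'timed-out ])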

Modules, Objects, Importing

A module is a collection of definitions, usually living in a file. A file can either be a script or export multiple modules.

A module is declared as follows:

@module my-module-name [
  foo = 1
  bar = 2
]

my-module-name/foo # => 1

Any name ending with - is considered private and will not be re-exported. Tagwords whose names start with two dots (as in ..foo) are namespaced to the module - pattern matches in other modules against those tagwords will need to use .my-module-name/foo to pattern-match.

For the root module of a file, the square brackets can be omitted. A module is imported into another module with the @import directive:

@module my-module

@import another-module
@import yet-another-module

A module with parameters is called an object:

@object Point x y [
  magnitude = /[x; y] > map square > sum > sqrt
  move dx dy = Point (add x dx) (add y dy)
]

center = Point 0 0 # => *<Point 0 0>
center/x # => 0
center/move 2 3 # => *<Point 2 3>

The arguments act as if every member was a function with those arguments curried in. Here, Point is an object constructor and center is an object instance.

Coming Soon

Features coming soon include:

  • dynamic variables sigiled with $
  • custom binding behavior with infix |
  • infixing user functions with `foo
  • an ffi system to RPython and C
  • macro definition syntax
  • custom @annotations, special defs

Putting it all together

Tulip is not a small language. My goal was to include enough features to be useful, add extensibility in the right places, but also guarantee a consistent look with predictable behaviors. Tulip is still in active development, and I could use a whole lot of help, both filling in the design gaps here and actually churning out the implementation. Please ping me on twitter (@jneen_) with any feedback or if you want to get involved!

Still interested? Follow @tuliplang on Twitter for more updates, and come join us in #tulip on the snek slack.

If you’re queer/trans, a person of color, and/or a woman, we’d love for you to join the very small team working on this. We currently need help with:

  • VM implementation (in RPython)
  • spec writing
  • example writing
  • the frontend implementation (parsing, etc)
  • test infrastructure
  • standard library design.

<3 <3 <3

–Jeanine


Tulip, an Untyped Functional Language Which is Going to Be Pretty Cool

 )`(
(   ) .tulip
  |/

UPDATE: I’ve made some changes to the design! There’s an updated version of this post here.

This language is gonna rock, I’m really excited about it. I’ve talked to a number of people about tulip, but this is the first I’ve sat down and written out what it’s going to look like. In particular, I owe Adam Baker a huge thanks for talking through the first version of this with me in January.

So here’s tulip!

Design goals

Tulip is designed to sit in the intersection of repl tools (bash, zsh, ex, etc) and scripting languages. It’s intended to scale both down to small things typed at a repl, and up to large-ish programs, to the extent you’d want to build with ruby, python, clojure, etc.

It’s an interpreted, pure, dynamic functional language, but for the most part it enables the user to behave as if they were in a language with a Hindley-Milner type-checker. Open variants (called “tagwords”), paired with patterns and open methods, are the central strategy to manage polymorphism.

Importantly, tulip aims to be obvious and easy to explain. You should not need to study PLT to understand the core concepts of tulip. To that end, we try to avoid jargon and unclear language.

Basic syntax

Variables are kebab-case identifiers, comments are from # to the end of the line, and values are assigned to variables with + my-variable = my-value. Indentation is not semantic (as that’s not a repl-friendly feature), and newlines and semicolons (;) are syntactically equivalent. All user functions are identifiers, and all symbols are reserved for the language. There are no identifier keywords - all special directives will be prefixed with @ (see @method and @impl below, for example).

Chaining and Auto-Curry

For ergonomic purposes on a repl, it is critical to support typing small but powerful programs linearly - from left to right. Prefix function calls usually require seeking left every time you want to use the result of an expression. This gets very tedious very quickly. Tulip has two features that combine to make this easy.

The first is auto-currying. Users of Haskell or ML will find this familiar. If a function is called with fewer than its expected number of arguments, it returns a function expecting the rest of the arguments. For example,

(add 5) 3

calls the function (add 5) on the value 3, and returns 8. Luckily the precedence works out such that you can just type add 5 3 in this case and get on with your day.

The second is one of two infix operators in the language: >, pronounced “into”, “reverse-apply”, or simply “then”. This operator expects a value on the left and a function on the right, and simply applies the function to the value. So the example above could be rewritten as:

3 > add 5

Proper standard library design here is critical. Functions to be used on the repl must be written to take their chained argument last, or else typing becomes much more difficult. In the case that you are using a function in a way the author didn’t expect, there’s a sibling symbol to >, which is the special variable -. If a segment in a >-chain contains -, the chained argument is placed there instead. For example,

foo > bar - baz

is equivalent to bar foo baz. Alternately, you can bind a variable to a name for later within a chain:

foo > x => bar > baz > add x

Tagwords

Tagwords are the basic building block of all tulip’s user-space data structures. Syntactically, a tagword is simply an identifier sigiled with a ., as in .foo. When a tagword is called as a function, it creates a tagged value. When a tagged value is called as a function, it appends the data to its internal list. Let’s look at an example:

.foo # => .foo
.foo 1 # => .foo 1
.foo 1 2 # => .foo 1 2

In simple terms, they construct values. Some examples of tagwords that will be commonly used in the standard library are:

# booleans
.t
.f

# lists
.cons 1 (.cons 2 (.cons 3 .nil))

# result
.ok value
.err message

# option
.some value
.none

Blocks and patterns

It’s expected that lambdas (“blocks”) will be a common thing to type on the repl, so the syntax has been kept very lightweight. A basic block looks like this:

[ foo => some-expr-in foo ]

where the variable foo is bound inside the block. Nested blocks have the usual properties of closures.

Things get a little more interesting with destructuring. Rather than being a simple variable binder (foo above), the binder can also match and destructure tagged values over multiple clauses:

+ map f = [ .nil => .nil; .cons x xs => .cons (f x) (map f xs) ]

Patterns can also include named pattern or type checks (sigiled with %):

+ add = [ (x : %uint) (y : %uint) => add-uint x y ]
+ map (f:%callable) = ...
+ foo (bar : .bar _ _) = ...

All the native types will have %-sigiled type names, and there will be a way to implement your own named checks, which is still in design.

Patterns also have predicates on any part of the match that can use bound names from sub-matches:

[ .foo (x ? gt 4 -) => ...; .foo x ? is-even x => ... ]

Blocks can be used with > to destructure arguments. For example, here is tulip’s equivalent of an if expression:

condition > [ .t => true-value; .f => false-value ]

The special pattern _ acts as usual - it matches any value and binds no results. A simple guard pattern can be implemented as

[ _? cond-1 => val-1; _? cond-2 => val-2 ]

Naming all the variables and typing => every time can be a pain on the repl, so for simple blocks tulip provides a feature called autoblocks and the autovar. Autoblocks are also delimited by square brackets, but contain no pattern and only a single clause. The autovar, denoted $, is the argument to the closest-nested autoblock:

list > map [ some-fn $ arg ]

Future plans may also include support for $1, $2, etc.

Literals, macros, parsing macros

I strongly dislike macros that can hide in code. I get really frustrated when I open a source file and see (foo ...) and can’t tell whether it’s a function or a macro until I read documentation. For these reasons, extensible literals and macros in tulip are all sigiled with /. Here is a basic macro: the list constructor:

/list[1 2 3]

The implementation syntax for these is still in design phase, but they will take after rust in that they will pattern-match against syntax, and result in new syntax. I expect list to be so common that for that special case it is permitted to leave off the macro name: /[1 2 3] is equivalent.

Parsing macros use {} in place of normal macros’ []. Definition syntax for these is also in design phase. Rather than matching against syntax, parsing macros are responsible for parsing a raw string into a data structure representation. When encountering one of these, the tulip parser simply scans forward for the matching close brace (respecting \{ and \}), and passes that string directly through to the implementation. So for example, /regex{\s+} is a regular expression literal, /string{foo} is a string literal, etc.

Strings are common enough that they are the default. Simply {...} is enough to quote a string literal. But since many string literals are much simpler, parsing macros also support ' as a one-sided delimiter that scans to whitespace or one of the delimiters ] or ). Here are some examples:

/string{foo} # => the string {foo}
'foo # => the string {foo}
/r{\w+} # a regex that matches identifiers
/r'\w+ # equivalent

An alternate syntax using " will also be provided, which will support backslash escapes as well as interpolation syntax which is still in design.

Let, recursion, laziness

Tulip is not a lazy language (mostly). If you want a lazy expression, simply prefix it with ~. Evaluation of lazy expressions is deferred until a pattern-match needs the value. A non-pattern-matching function called with a lazy value results in a further lazy value.

A let-expression looks like + name = value; expr. You can bind multiple names this way in an expression:

+ foo =
  + bar = 1
  + baz = 2
  zot bar baz

+ foo bar = baz

Let-bound function arguments can pattern match in the same way as lambda arguments. Multiple definitions of the same function in a sequence of let bindings behaves the same as multiple clauses in a lambda.

Tulip supports let-recursion: a variable bound in a let-expression (+ name = value) can use itself in its definition. A sequence of let-expressions can also refer to each other recursively. This generally works fine when the values defined are all functions, but it can break down when the values use recursive references strictly:

# the lazy y-combinator (.o_o).
+ fix f = + x = f x; x

In this case, tulip will detect the recursion, and the second x will be treated as a lazy reference. Any reference loop in a sequence of lets will be closed by converting the last link to a lazy reference. This makes it simple to create graph-like structures.

Tail call optimization is supported and encouraged.

Definitions that begin with > are desugared as follows:

+ foo = > bar > baz

# equivalent to
+ foo x = x > bar > baz

Flags, Optional Arguments, Maps

Flags are identifiers that begin with -. A flag-pair is a sequence of a flag and an expression. When a flag-pair is called with another flag-pair argument, it merges that pair into its internal record. For example,

-foo 1 # => (-foo 1)
-foo 1 -bar 2 # => (-foo 1 -bar 2)

Flags are used for keyword arguments. Given a function defined as:

+ foo -bar x -baz y = ...

it can be called as:

foo -baz 3 -bar 2

Since function calls are not resolved until all arguments are given, the expression foo -baz 3 will return a lambda that expects a -bar ... keyword argument.

Splats for keyword arguments are still in design phase, as are optional arguments and boolean flags. The latter will automatically be converted to .some/.none or .t/.f.

I should note here that flags - in particular optional arguments - are complex, especially when paired with auto-currying. But they are a critical tool for implementing repl interfaces. So they are supported, but are generally discouraged unless you are exporting a function to be used on a repl.

Methods, Protocols, Implementations

A common problem in general-purpose languages is to define an interface that can be implemented by users later. This is the purpose of, for example, defprotocol and multimethods in clojure, most good uses of interfaces in java, and typeclasses in Haskell. A method declaration in tulip looks like this:

@method map _ %

This is read as: map is a method that takes two arguments, and dispatches on the second argument. Some implementations of this method might look like:

@impl map f (.cons x xs) = .cons (f x) (map f xs)
@impl map f .nil = .nil
@impl map f (.some x) = .some (f x)
@impl map _ .none = .none
@impl map f (.ok x) = .ok (f x)
@impl map f (.err e) = .err e

In implementation definitions, only the dispatched argument can be pattern-matched, and only simple tag destructures, native type checks, and default (_) are allowed. Open recursion, however, is both allowed and encouraged. Multiple dispatch is also planned, but not yet designed.

Methods contain a global registry of implementations, so implementations can only be specified at the top-level of a module. This restriction might be lifted in the future, but that could cause other complications.

Pipes, Lifting, I/O

Tulip has two infix operators - one is reverse-apply, which was discussed earlier, and the other is the overloadable pipe operator. The following desugarings apply:

foo | bar
foo | x => bar x

# equivalent to
pipe bar foo
pipe-fn [ x => bar x ] foo

You can define pipe and pipe-fn on a particular data structure to support piping - pipe-fn gets called if the first argument is a function, using method dispatch as described above. This is useful for abstracting over computation models - you might pipe together parsers, streams, stateful actions, etc. This can yield something similar to Haskell’s do-syntax:

foo | foo-val =>
bar | bar-val =>
baz foo-val bar-val

But this is kind of a pain to write all the time. That’s why I stole bang notation from idris. Bang notation is a syntactic sugar that automatically lifts expressions into piped expressions. So the previous example could be written:

baz !foo !bar

Bang expressions are lifted to the nearest pipe, lambda, or let-definition.

Tulip is a pure language - that is, every function called with the same arguments always returns the same results. This can make side-effecting actions complicated. Tulip’s I/O model uses pipes to chain together actions. For example, to move some files from one directory to another, you would write

ls '/tmp/files | each (mv './some-dir)

The function each takes a list of functions that return actions and performs them one-by-one, and sends a list of the results into its pipe.

Concurrency

This part is pending a lot of design work, but tulip is intended to ship with erlang-style actors. Most of the decisions already made lend themselves rather easily to this approach.

Modules, Scripts, Importing, Private Definitions

A module is a collection of definitions, usually living in a file. A file can either be a script or export multiple modules.

A module is declared as follows:

@module my-module-name arg-1 arg-2 [
  + foo = 1
  + bar = 2
  @method ...
  @impl ...
]

For the root module of a file, the square brackets can be omitted. A module is imported into another module with the @import directive:

@module my-module arg-1
@import another-module arg-1
@import yet-another-module [ name other-name ]
@import last-one @all # imports everything that's exported

Module arguments must be statically resolvable, and can only depend on top-level definitions and other module arguments (but they can be functions, which allows for all sorts of interesting designs). Conflicting names in later imports shadow ones from earlier ones. Names can either be imported into the namespace or accessed by module-name/member-name.

Definitions in a module can be made private by using - instead of + in the definition syntax. Tagwords whose names start with - (as in .-foo) are also private to the module - pattern matches in other modules against those tagwords will not match.

A script is a different kind of tulip file. A script has a @script declaration at the top, and consists of a list of expressions which are all evaluated. If those expressions evaluate to actions, those actions are performed one at a time. Eventually the @script declaration will also be able to specify command-line arguments and environment imports, but that is also in design.

Putting it all together

Tulip is not a small language. My goal was to include enough features to be useful, add extensibility in the right places, but also guarantee a consistent look with predictable behaviors. Tulip is still in active development, and I could use a whole lot of help, both filling in the design gaps here and actually churning out the implementation. Please ping me on twitter (@jneen_) with any feedback or if you want to get involved!

EDIT: I’ve renamed the language from Unf to Tulip, because some folks pointed out that the old name created an unnecessarily sexualized environment, and I think tulips are pretty.

EDIT the second: I am no longer interested in feedback about the name. It’s tulip. I like it that way.


clojure/conj variants errata

I gave a talk at clojure/conj about variants in clojure! (slides)

I want to follow up on a few pieces I think got left out - whether by my own lacking or by lack of time.

variants and multimethods

I got two questions I wish I could have expanded on way more. At 33:25 I was asked about how to handle extensibility with open variants, and at 35:10 I was asked how multimethods fit into the picture. The good news is this: multimethods are a fantastic way to allow others to extend open variants! After a few very fruitful conversations with @bobpoekert, @ambrosebs, and @swannodette, there seems to be a pretty straightforward strategy that will work with core.typed:

; the dispatch function is `first`, to grab the tag out of the variant
(defmulti perform-command first)

; each method has to destructure the variant, and will almost certainly
; ignore the tag, since it's already been matched as the dispatch value
(defmethod perform-command :print [[_ val]] (println val))
(defmethod perform-command :read [[_ fname]] (slurp fname))

; prints "hello world"
(perform-command [:print "hello world"])

; returns the contents of project.clj
(perform-command [:read "./project.clj"])

It’s got a few warts, but nothing that can’t be papered over with a few really simple macros:

(defmacro defvariant [name & args]
  `(defmulti ~name first ~@args))

(defmacro defcase [name [tag & binders] & body]
  `(defmethod ~name ~tag [[_# ~@binders]] ~@body))

(defvariant perform-command)
(defcase perform-command [:print val] (println val))
(defcase perform-command [:read fname] (slurp fname))

This makes it a little more obvious we’re working with variant values. Further conveniences might include making sure to change the default value to something other than :default, since that’s likely to be a variant tag itself.

If someone wants to wrap this all up in a package people can start using that’d be super awesome!

match / recur

At about 20:10 I mentioned that match and recur indicate a tree traversal. What I meant to say was that match and regular non-tail recursion usually indicates a tree traversal, since traversals are almost never tail recursive.

mix and match structs and variants, part 1

A question I expected to get asked but didn’t was about what to do when your variant has too many data points in it, or when the variants all share one or more pieces of data. If your variant has a lot of data points, you run the risk of making something like this:

[:tag val1 val2 val3 val4 val5 val6 val7] ; l(-_-.) -just no

which is really hard to scan - and the positional anchoring of each value is going to be pretty hard to keep straight. Instead, I’d recommend that if you have more than, say, 3 data points, that you put a map in your variant, like so:

[:tag {:val1 1, :val2 2, :val3 3, :val4 4, ...}]

The match macro can even destructure these, and pick selective keys:

(match thing
  [:tag {:val2 val2, :val4 val4}] (do-something-with val2 val4)
  ...)

mix and match structs and variants, part 2

There was a moment in Ashton Kemerling’s talk where I was sitting in the audience waving my arms around and definitely not blurting out “that should totally be a variant!!”. He’s modeling user actions in a generative test like this:

{ :story 2198
  :type ::drag-drop
  :via ::selenium
  :args 2192 }

As he explains, :type is sort of which type the action is and :args is an arbitrary piece of data that gets attached. So let’s use a variant!

[::drag-drop 2198 ::selenium 2192]
[::comment 2198 ::phantomjs 123 ".(u_u,) -variants tho"]

(match action
  [::drag-drop story-id platform target] ...
  [::comment story-id platform user-id text] ...)
                      ; -(o_e,) wait but this looks worse

…not really the clarity win we were expecting. That’s because there are two pieces of shared data that in Ashton’s case are going to be on every action variant value. In this case, what I’d generally do is factor out those two into a wrapping struct:

{ :story 2198
  :via ::selenium
  :action [::drag-drop 2192] }

{ :story 1234
  :via ::phantomjs
  :action [::comment 56 "(~._.)~ `(._.-) (s._.)s ~(._.~)"] }

This way we have the struct parts and the variant parts doing what they’re each good at - aggregating data and varying data.

By the way, this action variant is a really good example of an open variant, and a perfect use case for destructuring with an extensible set of multimethods.
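
For example, here is a sketch of destructuring these actions with a multimethod (drag-and-drop and post-comment are hypothetical functions):

; dispatch on the tag of the nested :action variant
(defmulti run-action (fn [{[tag] :action}] tag))

(defmethod run-action ::drag-drop
  [{story :story, via :via, [_ target] :action}]
  (drag-and-drop via story target))

(defmethod run-action ::comment
  [{story :story, via :via, [_ user-id text] :action}]
  (post-comment via story user-id text))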

naming

After the conj, an attendee confessed to being “confused about the difference between variants and types”. I think what they meant was that we have a bit of a name clash between the values and the form they take. In fact, I actually see three distinct concepts which seem to be overloaded with the word “variant”:

  • Variant values. These are simply the vectors with the keywords.
  • Variant types or specs. These include the available tags, and some notion of what data goes with them.
  • Multimethods to destructure open variants, of which there may be many per spec. These were defined with defvariant above.

It’s possible we need better terminology around this (which would influence the naming of defvariant and defcase).

in languages that aren’t clojure

A common question after the talk was “I’m in $LANGUAGE, how do I do variants there?” Unfortunately, sometimes the sad truth is there’s really no good way (rubyyyyyyyyy). There are, however, two other techniques for dealing with this kind of situation in OO and dynamic OO languages. Martin Fowler described a pattern in OO languages to eliminate type tags. In this case the original antipattern is similar to the non-solution with a map from my talk - remember, objects are glorified structs with inheritance. While this approach has some known issues - serializability loss, no ability to do ad-hoc destructuring, and literally the expression problem - it gets the job done for a lot of people.

However in more dynamic languages that have lightweight lambdas, we can do a bit better. I mentioned to a few people that I maintain a compiler written in coffeescript at work (.(o_o.)), and this is the approach I use there:

class MyVariant extends Variant
  # this call generates subclasses and static methods on MyVariant
  @variants
    tag1: ['val1', 'val2']
    tag2: ['val3']

# multiple constructors
MyVariant.tag1(val1, val2)
MyVariant.tag2(val3)

# single destructor
variant.cases
  tag1: (val1, val2) -> ...
  tag2: (val3) -> ...

The cases method simply calls into the correct function depending on the tag. And even better, I can define methods for my variants on the base class. It’s worked well enough for me and is in production - if people are interested I’d be happy to extract this out into an npm or bower package.
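
In the meantime, here is a minimal sketch of how such a base class might work - it skips the generated per-tag subclasses, but shows the core trick:

class Variant
  constructor: (@tag, @values) ->

  # call the handler whose key matches this value's tag
  cases: (handlers) ->
    handler = handlers[@tag]
    throw new Error("no case for tag: #{@tag}") unless handler?
    handler @values...

  # generate one static constructor per tag
  # (the field lists in the spec are only documentation in this sketch)
  @variants: (spec) ->
    klass = this
    for own tag of spec
      do (tag) ->
        klass[tag] = (values...) -> new klass(tag, values)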

Thanks for reading and watching and pushing this further! I think there’s a lot of work to be done, but I hope variants help you rethink some painful code and avoid bugs in the future!

<3 -(n_n`)

jneen

P.S. Comments seem to be broken - I’ll look into that soon, but until then feel free to ping me on twitter.


Creating Uninitialized Objects in Javascript

When I first started writing larger applications in Javascript, I was stumped about how to set up a prototype chain correctly. The idiomatic way runs code that isn’t strictly necessary:

function BaseClass(thing) { this.thing = thing; }
function SubClass() { BaseClass.apply(this, arguments); }
SubClass.prototype = new BaseClass;

The problem here is that the third line runs the constructor function with no arguments. In this case it’s fine, because it sets SubClass.prototype.thing to undefined. But what if the constructor has to do some kind of computation, or has required arguments, or (gasp) does I/O? Not the best coding practices in the world, sure, but I’ve worked with this kind of code.

In modern JS environments, the right thing to do is use Object.create:

SubClass.prototype = Object.create(BaseClass.prototype);

But if you want to support older browsers, a different strategy is called for.

It turns out, the Javascript runtime has no way of telling which constructor was used to create an object. Its instanceof operator only looks at the prototype chain. In environments supporting Object.getPrototypeOf we could implement instanceof ourselves:

function isInstanceOf(obj, constructor) {
  if (typeof constructor !== 'function') throw new TypeError;
  if (typeof obj !== 'object') throw new TypeError;

  // look up the prototype chain for a match
  while (true) {
    if (obj === constructor.prototype) return true;
    if (obj === null) return false;

    obj = Object.getPrototypeOf(obj);
  }
}

So in particular, we can use a completely different constructor to make objects that are instances of the same function!

function f() {}
function g() {}
f.prototype = g.prototype = {a: 1};
(new f) instanceof g; // => true
(new g) instanceof f; // => true

So in order to make a new “instance” for subclassing, we can just initialize an empty constructor that has the right prototype, and the runtime can’t tell the difference.

function create(_super) {
  function noop() {}
  noop.prototype = _super;
  return new noop;
}

SubClass.prototype = create(BaseClass.prototype);
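
And a quick check that this behaves like a normal subclass, without ever running BaseClass with no arguments:

function BaseClass(thing) { this.thing = thing; }
function SubClass() { BaseClass.apply(this, arguments); }
SubClass.prototype = create(BaseClass.prototype);

var sub = new SubClass('stuff');
sub instanceof BaseClass; // => true
sub.thing; // => 'stuff'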

Coffeescript uses this approach when it needs to construct objects with varargs. We’ve used a variation of this approach in the latest version of pjs. Pjs classes carry around ready-made noop constructors with the correct prototypes already set. For a pjs class, if you want an empty instance of MyClass, just type new MyClass.Bare.

Happy coding!

EDIT: Simplified the isInstanceOf implementation

– Jeanine

(discussion on HN)

Rouge, the pygments-compatible pure-ruby syntax highlighter

EDIT 11 Dec 2012: Rouge now supports many more languages. See them all here.

After a full month of rage coding, I’m happy to announce that Rouge is ready for folks to use.

Rouge is a syntax highlighter written purely in Ruby, with the goals of being the best quality lexing out there, easy to deploy (i.e. on Heroku Cedar, which still doesn’t support pygments.rb), and 100% compatible with pygments stylesheets. It’s also easy to extend, and contributions are welcome! Rouge can always use more themes, lexers, and formatters.
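
Basic usage looks something like this (a quick sketch - see the README for the full API):

require 'rouge'

# lex some ruby source and format it as pygments-compatible HTML
source = File.read('example.rb')
lexer = Rouge::Lexers::Ruby.new
formatter = Rouge::Formatters::HTML.new
puts formatter.format(lexer.lex(source))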

It currently supports the following languages, with many more to come:

Ruby

class (ruby("is"))::Hella
  def crazy!
    no?
  end
end

C++

#include<iostream>

using namespace std;

int main()
{
    cout << "Hello World" << endl;
}

C

#include "ruby/ruby.h"

static int
clone_method_i(st_data_t key, st_data_t value, st_data_t data)
{
    clone_method((VALUE)data, (ID)key, (const rb_method_entry_t *)value);
    return ST_CONTINUE;
}

XML

<?xml version="1.0" encoding="utf-8"?>
<xsl:template match="/"></xsl:template>

Haskell

-- sum
infixr 7 \/
(\/) :: Series a -> Series a -> Series a
s1 \/ s2 = \d -> s1 d ++ s2 d

ERB

(try using the :parent option!)

<title><%= @title %></title>

TeX

To write \LaTeX\ you would type \verb:\LaTeX:.

Scheme

(define (nested-set! root names val)
  (let loop ((cur root)
             (elts names))
    (if (null? (cdr elts))
        (module-set! cur (car elts) val)
        (loop (module-ref cur (car elts)) (cdr elts)))))

Common Lisp

(def-compound-type BIT-VECTOR (&optional (size '*)) (x)
  (ensure-dim BIT-VECTOR size)
  (and (bit-vector-p x)
       (or (eq size '*) (eql size (array-dimension x 0)))
  )
  (c-typep-vector 'BIT-VECTOR-P size x)
)

Java

public class java {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}

Make

.PHONY: all
all: $(OBJ)

$(OBJ): $(SOURCE)
	@echo "compiling..."
	$(GCC) $(CFLAGS) $< > $@

Perl

#!/usr/bin/env perl
use warnings;
print "a: ";
my $a = "foo";
print $a;

PHP

<?php
  print("Hello {$world}");
?>

Python

def lex(code, lexer):
    """
    Lex ``code`` with ``lexer`` and return an iterable of tokens.
    """
    try:
        return lexer.get_tokens(code)
    except TypeError, err:
        if isinstance(err.args[0], str) and \
           'unbound method get_tokens' in err.args[0]:
            raise TypeError('lex() argument must be a lexer instance, '
                            'not a class')
        raise

Shell

ry::exec() {
  local names="$1"; shift
  if [[ "$names" == "all" ]]; then
    names="$(ry ls)"
  else
    names="$(tr , "\n" <<<"$names")"
  fi

  for name in $names; do
    PATH="$(ry fullpath "$name")" "$@"
  done
}

TCL

set lines [split [read $fh] "\n"]

HTML

<html>
  <head><title>Title!</title></head>
  <body>
    <p id="foo">Hello, World!</p>
    <script type="text/javascript">var a = 1;</script>
    <style type="text/css">#foo { font-weight: bold; }</style>
  </body>
</html>

CSS

body {
    font-size: 12pt;
    background: #fff url(temp.png) top left no-repeat;
}

Javascript

$(document).ready(function() { alert('ready!'); });

JSON

{ "foo": ["bar", { "baz": 1.0 } ] }

Shell usability: What TCL is missing

I ran into TCL a few months ago, and spent the better part of a weekend playing and experimenting with it. I really enjoyed its flexibility, simplicity, and emphasis on command-line usability. I even went through a phase where I tried using it as my default shell in place of bash!

I very quickly ran into some problems, though, and it took me a while to figure out what was happening.

Here’s an example: say I’ve got an open file handle $fh, and I want to get an array of its lines. First I’ve got to read the file:

read $fh

this will return a string, which I’d like to split on newlines:

split [read $fh] "\n"

and this is my only chance to catch the return value of this command, so let’s store it in a variable:

set lines [split [read $fh] "\n"]

The problem here is that I imagined this code in the reverse of how I’ve written it. We start with the raw data on the right, and each subsequent action appears to the left. But I read and write code left-to-right. Especially in a readline-style interface, my cursor would have had to be jumping around constantly from left to right, surrounding expressions with square brackets and adding new commands to the left.

Now imagine if I could do something like this (read $_ as “it”, as in Perl):

read $fh | split $_ "\n" | set lines $_

Type that a couple of times. See how much more natural that is? You type it as you think it: “read the file, split it on newlines, store it in a variable.”

In the language I’m designing (more on that soon), I’ve optimized the flow to make it a bit easier on the keyboard. So far I’m thinking something like this:

# line breaks are optional, for readability
read .fh
  | split . \n
  | set lines .

What do you think? What other considerations are there in optimizing a language for interactive use?

– Jeanine

Discuss on HN


Bash adventures: read a single character, even if it's a newline

tl;dr

getc() {
  IFS= read -r -n1 -d '' "$@"
}

getc ch
echo "$ch" # don't forget to quote it!

the explanation

So it turns out that it’s pretty difficult to read exactly one character from a stream. From the manual for read (usually in man builtins):

-n nchars
       read returns after reading nchars characters rather  than
       waiting  for a complete line of input, but honor a delim‐
       iter if fewer than nchars characters are read before  the
       delimiter.

The naive solution, then, would be:

getc() {
  read -n1 "$@"
}

while getc ch; do
  echo "[$ch]";
done <<EOF
one
two
three
four
EOF

The output from this is:

[o]
[n]
[e]
[]
[t]
[w]
[o]
[]
[t]
[h]
[r]
[e]
[e]
[]
[f]
[o]
[u]
[r]
[]

Wait, what happened to the newlines – why are they returning empty characters? Well, it turns out there are two things going on here. Again from the manual:

The characters in the value of the IFS variable are used to split the
line into words.

…and the newline is in the IFS. So let’s zero out IFS:

getc() {
  IFS= read -n1 "$@"
}

Unfortunately, this outputs the same thing, but for a different reason. Remember: read -n nchars will still “honor a delimiter if fewer than nchars characters are read before the delimiter”. And the default delimiter (specified with -d delim) is the newline. So the next solution is to use an empty delimiter.

Now our getc looks like:

getc() {
  IFS= read -n1 -d '' "$@"
}

(NOTE: the space between the -d and the '' is not optional. This confused me the first time I was solving the problem, and I had suggested using the EOF character. Not only that, I suggested obtaining an EOF with "$(echo -e '\004')". The empty delimiter is a better way to solve this particular problem, and $'\004' is a better way to get an EOF character.)

Now we get a very different output:

[o]
[n]
[e]
[
]
[t]
[w]
[o]
[
]
[t]
[h]
[r]
[e]
[e]
[
]
[f]
[o]
[u]
[r]
[
]

Our newlines are back! Yay!

There is, however, one more wrinkle, as helpfully pointed out by an anonymous commenter. read also likes to let you use backslashes to escape things. For example, with the current implementation of getc:

while getc ch; do
  echo "[$ch]"
done <<EOF
one\
two
EOF

Yields the following output:

[o]
[n]
[e]
[t]
[w]
[o]
[
]

Note how the backslash after one was interpreted as an escape for the newline that follows. Luckily, read comes with an -r option:

-r     Backslash does not act as an escape character.  The back‐
       slash is considered to be part of the line.  In  particu‐
       lar,  a  backslash-newline pair may not be used as a line
       continuation.

So the final solution is:

getc() {
  IFS= read -r -n1 -d '' "$@"
}
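
As a final check, the loop from earlier now round-trips both backslashes and newlines - the backslash prints as [\], and each newline prints as a bracket pair split across two lines:

while getc ch; do
  echo "[$ch]"
done <<EOF
one\
two
EOF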

EDIT 7/19/2011: Fixes and nuances to the original solution, as provided by :).
