Compiler ยท Jun 22, 2020

Make generated JavaScript Inline Caching friendly using types in BuckleScript version 8

Highlights of our newest changes to the internal representation and how they will benefit our users.
Hongbo Zhang
Compiler Team

An overview of new data representation in BuckleScript

In the next version of BuckleScript, we will make several major changes to tweak the data representation for various data types, making them more idiomatic and debugger friendly.

Note: since V8 or other JavaScript engines are tweaked to make idiomatic JS code run fast, these changes also results in faster running code.

Another property, for a compiled language like BuckleScript, is that we can reason about IC friendliness by just looking at the type definitions locally; this is very helpful for advanced users to write performance predictable JS code.

This is a nice introduction to inline caching if you are unfamiliar with this topic, or you can skip the section about IC and come back later when you get more familiar.

Note: this article is quite dense, so we will skip the old encoding.

Record (stable)

BuckleScript has compiled records to idiomatic JS objects since version 7. This is great for performance and debugging. We also support label renaming to shorten field names to save space.

Take the code below for example:

RE
type int64 = { loBits : int [@bs.as "lo"], hiBits : int [@bs.as "hi] } let value = {hiBits : 33 , loBits : 32 }; let rand = ({loBits; hiBits}) => loBits + hiBits;

It will generate JS output as below:

JS
var value = {lo : 32, hi : 33} function rand (param){ return param.lo + param.hi }

If users want to make it even shorter, in next version, they can choose to compile record as an array, as follows:

RE
type int64 = { [@bs.as "0"] loBits : int, [@bs.as "1"] hiBits : int } let value = {hiBits : 33 , loBits : 32 }; let rand = ({loBits, hiBits}) => loBits + hiBits;

This produces the following JS output:

JS
var value = [32,33] function rand(param){ return param[0] + param[1] }

The label renaming techniques can be applied systematically using a syntactic macro; in the future we may provide an advanced mode to apply it automatically. Another nice property is that only the type definition needs to be adapted; other parts of code remain untouched.

IC friendliness

Records are always in a perfect position for Inline Caching(IC); the compiler can ensure all generated records are of the same shape.

Variant (internal)

This encoding for variants may be subject to change in the future, but it is so simple that it makes sense for users to have a basic understanding.

Take the following type definition, for example:

RE
type t = | Black(t, int, t) | Red(t, int, t) | Empty; let empty = Empty ; let v0 = Black (empty, 3, empty); let v1 = Red (empty, 3, empty);

The generated JS code would be:

JS
var empty =/*Empty*/ 0; var v0 = {TAG : 0/*Black*/, _0 : /*Empty*/ 0 , _1 : 3 , _2 : /*Empty */ 0}; var v1 = {TAG : 1/*Red*/, _0 : /*Empty*/ 0 , _1 : 3 , _2 : /*Empty */ 0};

As you can see, variants are divided into two categories.
Variants which do not have a payload are compiled into a number starting from 0, while variants which have a payload are compiled into an object which has the first slot named TAG and the following slots named as _0, _1 ..

variant with inline records

Users can give names to the payload, and the compiler respects those names. However, we don't support user-level renaming, i.e, using bs.as, at this time.

RE
type t = | Black ({l:t, value: int, r: t}) | Red({l:t, value: int, r: t}) | Empty; let empty = Empty ; let v0 = Black ({l:empty, value: 3, r:empty}); let v1 = Red ({l:empty, value:3, r: empty});

The generated JS code would be:

JS
var empty =/*Empty*/ 0; var v0 = {TAG : 0/*Black*/, l: /*Empty*/ 0 , value : 3 , r: : /*Empty */ 0}; var v1 = {TAG : 1/*Red*/, l : /*Empty*/ 0 , value : 3 , r : /*Empty */ 0};

Special case when the number of variants which has payload is only 1.

Take the types below for example:

RE
type list = | Nil | Cons (int , list);

Since only one variant has a payload, the compiler does not need add TAG when we destructure the data for pattern matching. Thus, the following code:

RE
let u = Cons(1,Nil)

will generate the following JS output:

JS
var u = {_0: 1, _1 : /*Nil*/ 0 }; // No TAG data.

Specialized for immutable list

The list type is a built-in type; its type definition is similar to this :

RE
type t ('a) = | [] | (::) ('a , t ('a))

Without any customization, this would generate JS objects with indexes like _0, _1, etc. Since lists are so pervasive, we provide some special treatment so that

RE
let u = [0,1,2,3]

Will generate js code as below:

JS
var u = {hd : 0, {tl : {hd : 1, {tl : {hd: 2, {tl : {hd :3 , tl : /*[]*/0 }}}}}}}

This is a minor change; we changed the name of _0 to hd and _1 to tl.

IC friendliness

Types that have only one variant with a payload are in a perfect position for IC.

The number of variants which does not carry payload will not affect IC, since the pattern match will do a split first.

Types with variants that have the same number of payloads, such as the red-black-tree example above, are also in a perfect position for IC.

For other cases, it will hit a polymorphic IC in the V8 jit compiler, this is not the fastest running case.

Note that you can always tweak the variant layout to make it IC friendly. For example, you can always introduce one level of indirection to make all variants share the same number of payloads:

RE
type t = | A0 (a0) // 1 payload | A1 (a1) // 1 payload | A2 (a2) // 1 payload | C0 | C1 | C2 // This will not affect IC

Variant in debug mode

Note we only generate constructor names in comments for debugging.
When constructor names are attached to the data, it will be more useful for debugging. When debug mode is activated using -bs-g,

The generated code will be changed from below

JS
var v0 = {TAG : 0/*Black*/, _0 : /*Empty*/ 0 , _1 : 3 , _2 : /*Empty */ 0};

to

JS
var v0 = {TAG : 0, _0 : /*Empty*/ 0 , _1 : 3 , _2 : /*Empty */ 0, [Symbol.for("name")]: "Black"};

Polymorphic-variant (internal)

Polymorphic variants allow users to use the types without declaring them first:

RE
let u = 3 -> `hello

It will generate

JS
var u = {HASH : MAGIC_NUMBER, VAL: 3 }

The field of HASH is the hash of name "hello", while the VAL is the payload

IC friendliness

Polymorphic variants are always in a perfect position for IC, the compiler can ensure all generated objects are of the same shape. This is because the payload is not unpacked; it is always just one payload.

Polymorphic variant in debug mode

In debug mode, similar to variant, we carry the name in generated code for debugging,

So instead of

JS
var u = {HASH : MAGIC_NUMBER, VAL: 3 }

It will generate

JS
var u = {HASH : MAGIC_NUMBER, VAL: 3 , [Symbol.for("name")] : "hello"}

Conclusion

BuckleScript users will profit from better runtime performance and a better debugging experience for various data types, such as variants, exceptions, lazy values and more since version 8.0.

Happy Hacking!

Want to read more?
Back to Overview