What's not clear from reading TFA is whether the compiler monomorphizes TFA's `Compact_array` for its two special cases (array of bytes vs. array of anything else), but I'm assuming so. Perhaps if I were familiar with OCaml the answer would be blindingly obvious. What's happening here is that with GADTs you can have a _singular_ abstraction with multiple distinct implementations, some for specific types and others for generic types, and you don't have to think about it too much - except that you have to remember to use these type annotations in the interface definitions to get the compiler to do what you want.
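For anyone who hasn't read TFA, the shape I'm talking about is roughly this (a from-memory sketch of a Compact_array-style interface, not the article's exact code):
  (* One abstraction, two representations: chars are packed one byte each,
     everything else goes in an ordinary boxed array. The GADT ties the
     element type to the representation. *)
  type 'a t =
    | Array : 'a array -> 'a t
    | Bytes : Bytes.t -> char t

  let length (type a) (t : a t) : int =
    match t with
    | Array arr -> Array.length arr
    | Bytes b -> Bytes.length b

  let get (type a) (t : a t) (i : int) : a =
    match t with
    | Array arr -> arr.(i)
    | Bytes b -> Bytes.get b i
Whether the compiler then specializes callers of `get` for the `Bytes` case is exactly the monomorphization question above.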
> Yaron Minsky joined Jane Street back in 2002, and claims the dubious honor of having convinced the firm to start using OCaml.
That's pretty cool. And I guess Stephen Dolan ended up there due to his work on OCaml, which is pretty cool too. (I'd like to meet Stephen some day.)
Interesting, thanks for posting.
I share the author's frustration with the lack of non-compiler-related examples of GADT uses. It seems like such a powerful idea, but I haven't been able to get a feel for when to reach for GADTs in Haskell.
I often find them handy for locking down admissible states at compile time. Maybe ~10 years ago in a processor design class, I wrote some CPUs in Haskell/Clash for FPGA usage. A nice thing I could do was write a single top-level instruction set, but then lock down the instructions based on what stages of the processor they could exist at.
For example, something like (not an actual example from my code, just conceptually - may be misremembering details):
  data Instruction stages where
    MovLit :: Word64 -> Register -> Instruction '[Fetch, Decode, Execute, Writeback]
    -- MovReg instruction gets rewritten to MovLit in Execute stage
    MovReg :: Register -> Register -> Instruction '[Fetch, Decode, Execute]
    ...
And then my CPU's writeback handler block could be something like:
  writeback :: (Writeback `member` stages) => Instruction stages -> WritebackState -> WritebackState
  writeback (MovLit v reg) = ...
  -- Compiler knows (MovReg _ _) is not required here
So you can use the type parameters to impose constraints on the allowed values, and the compiler is smart enough to use this data during exhaustiveness checks (cf. "GADTs Meet Their Match").
Wow, someone else who used to use Clash! I still use it for everything I can in my (hobby) FPGA projects. I'm not sure I've used GADTs, but I've certainly made use of other more "advanced" parts of the type system, like type families.
What you're doing here is pretty cool; I think I'll start doing that too. I have a number of places where I use "undefined" instead. (The "undefined" from the Clash Prelude, which translates into a "don't care" signal state.)
Does anyone have some hard numbers on the expected performance uplift when using GADTs? Couldn't see any mentioned in the article.
The example here is basically an 8-fold memory saving, going from `long[]` to `byte[]` - while still retaining polymorphism (whereas in Java the two are unrelated types).
Hard to say exactly how much performance one would get, as that depends on access patterns.
The reason that a byte array is in reality laid out as a (mostly empty) long array in Java is actually performance. Computers tend to align memory at 8-byte boundaries, and accessing an aligned address is faster than accessing one that isn't 8-byte aligned.
Of course it depends on your use case; in some cases a compact byte array performs better anyway, for instance because you're now able to fit it in your CPU cache.
> a byte array is in reality laid out as a (mostly empty) long array in Java
Are you saying each byte takes up a word? That is the case for a `char array` in OCaml, but not for Java's `byte[]`. AFAIK the size of a byte array is just rounded up to a whole number of words: byte arrays of length 1-8 all have the same size on a 64-bit machine, then lengths 9-16 take up one more word.
https://shipilev.net/jvm/objects-inside-out/
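To spell out the arithmetic behind "rounded up to words" (a sketch only; it assumes a 16-byte array header and 8-byte object alignment, which is what HotSpot uses by default with compressed oops - the link above has the authoritative layouts):
  (* Hypothetical helper: approximate heap size of a Java byte[] of length n,
     assuming a 16-byte array header and 8-byte object alignment. *)
  let approx_java_byte_array_size n =
    let header = 16 and alignment = 8 in
    (header + n + alignment - 1) / alignment * alignment

  (* approx_java_byte_array_size 1 = 24
     approx_java_byte_array_size 8 = 24
     approx_java_byte_array_size 9 = 32 *)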
But you can load any byte by loading 8 bytes and shifting (very cheap).
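Spelled out, the "load 8 bytes and shift" trick is just something like this (a sketch; byte i is numbered little-endian within the loaded word):
  (* Extract byte i (0..7) from a 64-bit word that was loaded in one access. *)
  let byte_of_word (w : int64) (i : int) : int =
    Int64.to_int (Int64.logand (Int64.shift_right_logical w (8 * i)) 0xFFL)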
Related: https://github.com/ocaml/RFCs/blob/881b220adc1f358ab15f7743d...
I know Jane Street love OCaml, but you have to wonder how much it's cost them in velocity and maintenance. This is a quant firm blogging about a programming language they're the most famous user of.
It is thanks to companies like Jane Street, which believe there is something beyond C, that we get to have nice toys.
Remember, if OCaml weren't a mature programming language, Rust might not have happened in the first place.
It's easier to reason in FP, plus there's the Python paradox [1] [2].
[1]: https://www.paulgraham.com/pypar.html
[2]: https://blog.janestreet.com/why-ocaml/
Why do you assume it's a drag for them and not a competitive advantage? I don't know if it's such a terrible thing to use a slightly out of mainstream language, when the standard in the business is to accumulate tens of millions of lines of C++.
There are many things to say about this, but one of them is that I think you're assuming that a C programmer (for example) who does not want to (or even cannot) get into OCaml would somehow be better for this highly specialized, high-performance, correctness-critical use case than someone who does. And I'd question that assumption.
Jane Street has been one of the most successful financial firms of the last 10 years or so, going from a niche hedge fund to a big player. Sounds like OCaml has been working out for them. Certainly I know it's helped them hire a lot of excellent programmers.
Concretely, how do you think it's holding them back? Just by being niche?