July 24th, 2016

Garbage Collector in Go: will the one 'knob' to rule them all Leak in terms of Abstractions?

From some yesterday posts I got from {1 and 2} to:

  1. LLVM compiler infrastructure to:

  2. Rust language to a comment saying that Go had a GC: "Rust does not use an automated garbage collection system like those used by Go, Java or .NET." and finally to this article:

  3. Go GC: Prioritizing low latency and simplicity;

Which is from 31-Aug-2015 (and will do a follow up of more recent info later on) and says interesting things like:

To create a garbage collector for the next decade, we turned to an algorithm from decades ago. Go's new garbage collector is a concurrent, tri-color, mark-sweep collector, an idea first proposed by Dijkstra in 1978. This is a deliberate divergence from most "enterprise" grade garbage collectors of today ...

And yes, they talk about JVMs GC 'under the lines' several times, which is funny to read (not in a disrespectful way, just trivially funny to see them avoiding to mention the JVM).

They mention this GC which will do as much as possible concurrently (which is a strong characteristic of Golang) and Richard Hudson gives us a nice summarized explanation of the tri-color collector:

In a tri-color collector, every object is either white, grey, or black and we view the heap as a graph of connected objects. At the start of a GC cycle all objects are white. The GC visits all roots, which are objects directly accessible by the application such as globals and things on the stack, and colors these grey. The GC then chooses a grey object, blackens it, and then scans it for pointers to other objects. When this scan finds a pointer to a white object, it turns that object grey. This process repeats until there are no more grey objects. At this point, white objects are known to be unreachable and can be reused.

The "Devil is in the details" section that follows is very honest and elucidating too, but they are explicit running away from giving the users all kinds of configurations to play with the GC strategy of even to change it, which the Oracle's JVMs offers in a very complex|powerful way, instead they want their GC to able to handle things intelligently for the user with only one single configuration or 'knob':

At a higher level, one approach to solving performance problems is to add GC knobs, one for each performance issue. The programmer can then turn the knobs in search of appropriate settings for their application. The downside is that after a decade with one or two new knobs each year you end up with the GC Knobs Turner Employment Act. Go is not going down that path. Instead we provide a single knob, called GOGC. This value controls the total size of the heap relative to the size of reachable objects. The default value of 100 means that total heap size is now 100% bigger than (i.e., twice) the size of the reachable objects after the last collection. 200 means total heap size is 200% bigger than (i.e., three times) the size of the reachable objects. If you want to lower the total time spent in GC, increase GOGC. If you want to trade more GC time for less memory, lower GOGC.

Which is to say the only thing the user can choose is the ratio between the heap size and the number of objects in use (not elligible to be collected), which is also a very common type of GC 'knob' presented (for a long time...) by the Java HotSpot VM Options related to its GC:

  • InitialSurvivorRatio: Sets the initial survivor space ratio used by the throughput garbage collector ...

  • MaxHeapFreeRatio: Sets the maximum allowed percentage of free heap space (0 to 100) after a GC event.

  • MinHeapFreeRatio: Sets the minimum allowed percentage of free heap space (0 to 100) after a GC event.

  • NewRatio: Sets the ratio between young and old generation sizes.

  • SurvivorRatio: Sets the ratio between eden space size and survivor space size.

Not to mention the long avaiable Parallel GC option '+UseParallelGC' that as one could guess "Enables the use of the parallel scavenge garbage collector" and also +UseParallelOldGC, +UseParNewGC, etc.

Credit where it is due Golang designers: even going back to Dijkstra in 1978 there is still some Java inspiration in there, but you just come to cite (not giving names) the things of the Hotspot VM you consider bad and seeks to avoid... not fair. That said, the Go lang goal to prevent the "GC Knobs Turner Employment Act" is noble but we will always have to prepare ourselves to The Law of Leaky Abstractions as the wise Joel remembered us back in 2002.

Doing some compact code in Java 8

My good friend Rafael Naufal has posted a very interesting challenge which consists in comparing the expressiveness of different languages to a very single problem involving object containers, and as much as I can see that Java is the more verbose between his examples I could provide a more expressive version of the Java 8 solution:

   1 Map<String, List<String>> mapOfLists = new HashMap<>();
   2 mapOfLists.put("a", Arrays.asList("1", "2", "3"));
   3 mapOfLists.put("b", Arrays.asList("4", "5", "6"));
   4 mapOfLists.put("c", Arrays.asList("7"));
   6 class Pair {
   7     String key, val;
   9     public Pair(String key, String val) {
  10         this.key = key;
  11         this.val = val;
  12     }
  14     public String toString() {
  15         return String.format("[%s, %s]", key, val);
  16     }
  17 }
  19 List<Pair> pairs = new LinkedList<>();
  20 mapOfLists.forEach((k, list) -> list.forEach(e -> pairs.add(new Pair(k, e))));
  21 System.out.println(pairs)

The output is:

[[a, 1], [a, 2], [a, 3], [b, 4], [b, 5], [b, 6], [c, 7]]

The key issues are:

  1. Other languages can do the creation of the Map in pretty much a single line, Java could would be this 4 lines yes;

  2. In the absence of Tuples we must create a 'Pair' class (getting 10 more lines), although I consider the toString significantly improved;

Other than that the actual transformation (line 20 above) is just one line like in the other languages, which says a lot about Java, being statically typed and all (the comparison used Ruby and Groovy).

I also did not use the Java 8 enhanced APIs (particulary the Stream package), but it was also not necessary, but once I study it I can provide another code, being less compact or not.

I'll try to provide an example in both Kotlin and Scala in the future (I think both provide one-liners for class declarations and/or tuples).