Archive for the ‘golang’ Category

rjson – readable JSON

September 24, 2012

[Edited: after some feedback, I have renamed this project to rjson (it's really not at all Go-specific) and changed the specification so that unquoted strings are accepted only as object keys]

JSON is a fine encoding. It has a very simple data model; it’s easy to understand and to write parsers for.  But personally, I find it a bit awkward to read and to edit. All those quotes add noise, and I’m always forgetting to remove the final comma at the end of an array or an object. I’m not the only one either. I’ve chatted to people about why they are using YAML, and the reason is usually not because of the zillion features that YAML offers, but because the format is more human-friendly to edit.

A while ago I had an idea of how I might help this, and yesterday I had a free day to do it. I forked the Go json package to make rjson. It uses an idea taken from the Go syntax rules to make commas optional, and quotes become optional around object keys that look like identifiers. The rjson syntax rules are summarised in the rjson package documentation.

To make it easy to experiment with and use this format, I created an rjson command that can read and write both formats.

Here is a transcript showing how the command can be used:

% cat config.json
{
  "ui" : {
    "global" : {
      "ui-type" : "default",
      "show-indicator" : true
    }
  }
}
% rjson < config.json
{
	ui: {
		global: {
			show-indicator: true
			ui-type: "default"
		}
	}
}
% rjson -indent '' < config.json
{ui:{global:{show-indicator:true,ui-type:"default"}}}
% rjson -indent '' -j < config.json
{"ui":{"global":{"show-indicator":true,"ui-type":"default"}}}
%

You might notice that the compact version of the rjson format is smaller than the equivalent JSON. On a random selection of JSON files (all that I could find in my home directory), I measured that they were about 7% smaller on average when encoded with rjson -indent ''. This was a nice bonus that I had not considered.

To use rjson, you’ll need a working Go installation; then you can fetch and install the command into your Go tree thus:

go get launchpad.net/rjson/cmd/rjson

Enjoy!

Logging in Reverse

March 31, 2011

A recent post on the Go mailing list asked how to use the Go logging package to log messages in reverse order, so that they could be displayed most-recent-first in a console window.

The package gives no obvious way to do this, but the flexibility of Go interfaces makes it almost trivial to do. It took about 10 minutes to write the following code. The log package can use an arbitrary io.Writer for output so I defined a type, ReverseBuffer, with a Write method that stores all the data in a reverse-ordered linked list and a Reader method that returns an io.Reader that can be used to read the data back.

It would not take much longer to implement a size or message count limit on the list if desired.
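A sketch of one such variant (my own addition, not from the post): a reverse buffer that keeps at most max messages, discarding the oldest when the limit is exceeded. Since the list is newest-first, the oldest message is at the tail, and a simple O(n) walk cuts it off.

```go
package main

import "fmt"

type msg struct {
	data []byte
	next *msg
}

// LimitedReverseBuffer is like ReverseBuffer but keeps at most max
// messages (assumes max >= 1), dropping the oldest when full.
type LimitedReverseBuffer struct {
	msgs *msg
	n    int // current number of messages
	max  int // maximum number of messages to keep
}

func (w *LimitedReverseBuffer) Write(data []byte) (int, error) {
	if len(data) == 0 {
		return 0, nil
	}
	w.msgs = &msg{append([]byte(nil), data...), w.msgs}
	if w.n < w.max {
		w.n++
		return len(data), nil
	}
	// Over the limit: walk to the second-to-last node and drop the tail,
	// which holds the oldest message.
	p := w.msgs
	for p.next.next != nil {
		p = p.next
	}
	p.next = nil
	return len(data), nil
}

func main() {
	w := &LimitedReverseBuffer{max: 2}
	for _, s := range []string{"one", "two", "three"} {
		w.Write([]byte(s))
	}
	for p := w.msgs; p != nil; p = p.next {
		fmt.Println(string(p.data)) // "three" then "two"; "one" was dropped
	}
}
```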

I like the way that the language features worked together to make the design almost inevitable once I had thought of it. Once you have an implementation of an interface, everything else just works.

You can run it in the Go Playground to see it working.

package main

import (
        "io"
        "log"
        "os"
)

type msg struct {
        data []byte
        next *msg
}

// ReverseBuffer is an implementation of io.Writer that stores data such
// that it can later be read with the data from all writes reversed.
type ReverseBuffer struct {
        msgs *msg
}

type reverseReader struct {
        data []byte
        msgs *msg
}

func (w *ReverseBuffer) Write(data []byte) (int, error) {
        if len(data) == 0 {
                return 0, nil
        }
        w.msgs = &msg{append([]byte(nil), data...), w.msgs}
        return len(data), nil
}

// Reader returns a new io.Reader that can be used to read all the data
// written to w.  The data from more recent writes is returned first.
func (w *ReverseBuffer) Reader() io.Reader {
        return &reverseReader{nil, w.msgs}
}

// Read implements io.Reader.Read.
func (r *reverseReader) Read(data []byte) (int, error) {
        if len(r.data) == 0 {
                if r.msgs == nil {
                        return 0, io.EOF
                }
                r.data = r.msgs.data
                r.msgs = r.msgs.next
        }
        n := copy(data, r.data)
        r.data = r.data[n:]
        return n, nil
}

func main() {
        w := new(ReverseBuffer)
        out := log.New(w, "", log.Ldate|log.Ltime)
        out.Printf("one")
        out.Printf("two")
        out.Printf("three")
        io.Copy(os.Stdout, w.Reader())
}

Unlimited Buffering with Low Overhead

February 10, 2010

In Go, channels have a fixed length buffer. Sometimes it is useful to add a buffer of unlimited length to a channel (here is an example). The first question is what the interface should look like. I can think of three immediate possibilities (assume T is an arbitrary type – if Go had generics, this would be a generic function):

Given a channel, make sure that no writes to that channel will
block, and return a channel from which the buffered values can be read:

func Buffer(in <-chan T) <-chan T

Given a channel, return a channel that will buffer writes
to that channel:

func Buffer(out chan<- T) chan<- T

Given two channels, connect them via a buffering process:

func Buffer(in <-chan T, out chan<- T)

Of these possibilities, on balance I think I prefer the second, as no operations will be performed on the original channel except when a value is written on the returned channel.

I’d be interested in hearing arguments for or against the other possibilities.
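Go has since gained type parameters, so purely as an illustration (my own sketch, not code from the post), the second interface can now be written generically, with a plain slice standing in for the unlimited buffer:

```go
package main

import "fmt"

// Buffer returns a channel that buffers writes to out without limit.
// The send arm of the select is enabled only when the buffer is
// non-empty, using the nil-channel trick described below.
func Buffer[T any](out chan<- T) chan<- T {
	in := make(chan T)
	go func(in chan T) {
		var buf []T // unlimited FIFO buffer
		for in != nil || len(buf) > 0 {
			var outc chan<- T
			var v T
			if len(buf) > 0 {
				v = buf[0]
				outc = out // enable the send arm only when we hold a value
			}
			select {
			case e, ok := <-in:
				if !ok {
					in = nil // input closed: drain what remains
					continue
				}
				buf = append(buf, e)
			case outc <- v:
				buf = buf[1:]
			}
		}
		close(out)
	}(in)
	return in
}

func main() {
	out := make(chan int)
	in := Buffer(out)
	go func() {
		for i := 0; i < 5; i++ {
			in <- i // never blocks for long, whatever the reader does
		}
		close(in)
	}()
	sum := 0
	for v := range out {
		sum += v
	}
	fmt.Println(sum) // 10
}
```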

Here is one simple and relatively slow implementation. It uses the doubly-linked list implementation from the Go library. I timed it at 2076ns per item transferred on my machine. Note the code that runs before the select statement each time through the loop, which works out whether we want to be sending a value, and when it is time to finish. This relies on the fact that in a Go select statement, operations on nil channels are ignored.

import "container/list"
func BufferList(out chan<- T) chan<- T {
	in := make(chan T, 100)
	go func() {
		var buf = list.New()
		for {
			outc := out
			var v T
			n := buf.Len()
			if n == 0 {
				// buffer empty: don't try to send on output
				if in == nil {
					close(out)
					return
				}
				outc = nil
			} else {
				v = buf.Front().Value.(T)
			}
			select {
			case e := <-in:
				if closed(in) {
					in = nil
				} else {
					buf.PushBack(e)
				}
			case outc <- v:
				buf.Remove(buf.Front())
			}
		}
	}()
	return in
}

The above implementation allocates a new linked list item for every value transferred. Here’s an alternative implementation that uses an array as a circular buffer, amortising allocations over time by doubling the size of the buffer when it overflows, and shrinking it when there is too much space. Although the basic structure is similar, the code is more complex, and the time saving is modest – I timed it at 1729ns per item transferred, an improvement of 17%. Removing the code to shrink the buffer does not make it significantly faster.

func BufferRingOrig(out chan<- T) chan<- T {
	in := make(chan T, 100)
	go func() {
		var zero T
		var buf = make([]T, 10)
		var i = 0 // location of first value in buffer.
		var n = 0 // number of items in buffer.
		for {
			outc := out
			switch {
			case n == 0:
				// buffer empty: don't try to send on output
				if in == nil {
					close(out)
					return
				}
				outc = nil

			case n == len(buf):
				// buffer full: expand it
				b := make([]T, n*2)
				copy(b, buf[i:])
				copy(b[n-i:], buf[0:i])
				i = 0
				buf = b

			case len(buf) > 128 && n*3 < len(buf):
				// buffer too big, shrink it
				b := make([]T, len(buf)/2)
				j := i + n
				if j > len(buf) {
					// wrap around
					k := j - len(buf)
					j = len(buf)
					copy(b, buf[i:j])
					copy(b[j-i:], buf[0:k])
				} else {
					// contiguous
					copy(b, buf[i:j])
				}
				i = 0
				buf = b
			}
			select {
			case e := <-in:
				if closed(in) {
					in = nil
				} else {
					j := i + n
					if j >= len(buf) {
						j -= len(buf)
					}
					buf[j] = e
					n++
				}
			case outc <- buf[i]:
				buf[i] = zero
				if i++; i == len(buf) {
					i = 0
				}
				n--
			}
		}
	}()
	return in
}

I wondered if the unnecessary tests before the select statement were making any significant difference to the time taken. Although it makes it easy to preserve the invariants, there is no need to test whether the buffer is empty when a value has just been placed in it, for example.

Here is a version that only does the tests when necessary. Interestingly, this change actually made the code run marginally slower (1704ns per item).

func BufferRing(out chan<- T) chan<- T {
	in := make(chan T, 100)
	go func() {
		var zero T
		var buf = make([]T, 10)
		var i = 0 // location of first value in buffer.
		var n = 0 // number of items in buffer.
		var outc chan<- T
		for {
			select {
			case e := <-in:
				if closed(in) {
					in = nil
					if n == 0 {
						close(out)
						return
					}
				} else {
					j := i + n
					if j >= len(buf) {
						j -= len(buf)
					}
					buf[j] = e
					n++
					if n == len(buf) {
						// buffer full: expand it
						b := make([]T, n*2)
						copy(b, buf[i:])
						copy(b[n-i:], buf[0:i])
						i = 0
						buf = b
					}
					outc = out
				}
			case outc <- buf[i]:
				buf[i] = zero
				if i++; i == len(buf) {
					i = 0
				}
				n--
				if n == 0 {
					// buffer empty: don't try to send on output
					if in == nil {
						close(out)
						return
					}
					outc = nil
				}
				if len(buf) > 128 && n*3 < len(buf) {
					// buffer too big, shrink it
					b := make([]T, len(buf)/2)
					j := i + n
					if j > len(buf) {
						// wrap around
						k := j - len(buf)
						j = len(buf)
						copy(b, buf[i:j])
						copy(b[j-i:], buf[0:k])
					} else {
						// contiguous
						copy(b, buf[i:j])
					}
					i = 0
					buf = b
				}
			}
		}
	}()
	return in
}

Although the speed improvement from the above piece of code was disappointing, the change paves the way for a change that really does make a difference. A select statement in Go is significantly more costly than a regular channel operation. In the code below, we loop receiving or sending values as long as we can do so without blocking. Here’s a version of the list-based code that does this. I measured it at 752ns per item, an improvement of 63% over the original, or 2.7x faster.

func BufferListCont(out chan<- T) chan<- T {
	in := make(chan T, 100)
	go func() {
		var buf = list.New()
		var outc chan<- T
		var v T
		for {
			select {
			case e := <-in:
				if buf.Len() == 0 && !closed(in) {
					outc = out
					v = e
				}
				for {
					if closed(in) {
						in = nil
						if buf.Len() == 0 {
							close(out)
							return
						}
						break
					}
					buf.PushBack(e)
					var ok bool
					if e, ok = <-in; !ok {
						break
					}
				}
	
			case outc <- v:
				for {
					buf.Remove(buf.Front())
					if buf.Len() == 0 {
						// buffer empty: don't try to send on output
						if in == nil {
							close(out)
							return
						}
						outc = nil
						break
					}
					v = buf.Front().Value.(T)
					if ok := outc <- v; !ok {
						break
					}
				}
			}
		}
	}()
	return in
}

One objection to the above code is that in theory if there was a fast enough producer on another processor, the buffer process could spend forever feeding values into the buffer, without ever trying to write them out. Although I believe that in practice the risk is negligible, it’s easy to guard against anyway, by only adding a fixed maximum number of values before returning to the select statement.

Here’s my final implementation, using the looping technique and with the guard added in.

I timed it at 427ns per item transferred, an improvement of 79% over the original version, or almost 5x faster. Using a buffered channel directly is only 2.4x faster than this.

func BufferRingContCheck(out chan<- T) chan<- T {
	in := make(chan T, 100)
	go func() {
		var zero T
		var buf = make([]T, 10)
		var i = 0 // location of first value in buffer.
		var n = 0 // number of items in buffer.
		var outc chan<- T
		for {
			select {
			case e := <-in:
				for added := 0; added < 1000; added++ {
					if closed(in) {
						in = nil
						if n == 0 {
							close(out)
							return
						}
						break
					}
					j := i + n
					if j >= len(buf) {
						j -= len(buf)
					}
					buf[j] = e
					n++
					outc = out		// enable output
					if n == len(buf) {
						// buffer full: expand it
						b := make([]T, n*2)
						copy(b, buf[i:])
						copy(b[n-i:], buf[0:i])
						i = 0
						buf = b
					}
					var ok bool
					if e, ok = <-in; !ok {
						break
					}
				}
			case outc <- buf[i]:
				for {
					buf[i] = zero
					if i++; i == len(buf) {
						i = 0
					}
					n--
					if n == 0 {
						// buffer empty: don't try to send on output
						if in == nil {
							close(out)
							return
						}
						outc = nil
						break
					}
					if len(buf) > 128 && n*3 < len(buf) {
						// buffer too big, shrink it
						b := make([]T, len(buf)/2)
						j := i + n
						if j > len(buf) {
							// wrap around
							k := j - len(buf)
							j = len(buf)
							copy(b, buf[i:j])
							copy(b[j-i:], buf[0:k])
						} else {
							// contiguous
							copy(b, buf[i:j])
						}
						i = 0
						buf = b
					}
					if ok := outc <- buf[i]; !ok {
						break
					}
				}
			}
		}
	}()
	return in
}

Obviously the final code is significantly bigger and more complex than the original. Which implementation should we choose? Lacking generics, this code cannot usefully be put into a library, as most channels are not of type chan interface{}.

Given this, in most instances, perhaps the first version is to be preferred, as it’s smaller to cut and paste, and easier to understand. In cases where performance is crucial, the final version can easily be substituted.

Perhaps there’s another faster technique that I haven’t found yet. I’d be interested to hear any ideas on the matter.

The code with all the tests and benchmarks can be found here.

Select functions for Go

January 4, 2010

In this thread, Rowan Davies wrote:

There’s no STM implementation for Go, as far as I know. STM doesn’t fit so well with message passing channels, which Go adopts as another quite good alternative to locks. When used well they can avoid the compositionality issues with locks, and have less of a performance cost compared to STM – but STM arguably scales better to complicated situations in terms of the ease of avoiding deadlocks and the like.

STM isn’t as expressive as channels – you can’t build channels within STM (although you can layer them on top of it, which then loses some of the compositionality benefits)

Channels have their own compositionality issues – if you want to be able to alt on something, you need to have access to the raw channel, but the interface might require other things to happen after the channel operation – these must be exposed by the interface, which is undesirable.

For example, in some example code I wrote in a previous post, there’s some code to read a value:

func (r *Receiver) Read() interface{} {
	b := <-r.C
	v := b.v
	r.C <- b
	r.C = b.c
	return v
}

It would be nice if we could have this Read as part of a select statement. Currently, the only way to do this is to make the channel publicly readable, and have a function that the user must remember to call with the value read from the channel:

func (r *Receiver) DoneRead(b broadcast) interface{} {
	v := b.v
	r.C <- b
	r.C = b.c
	return v
}

select {
case b := <-r.C:
    v := r.DoneRead(b)
    ...
....
}

This is error-prone – and more importantly, it breaks encapsulation by exposing the internal-only “broadcast” type.

For a nice way around these problems, I’ve been thinking that something like the select functions provided by the XC language might work well in Go.

A select function would be similar to a normal function except that its top level contains arms of a select statement:

selectfunc read(r *Receiver) interface{} {
	case b := <-r.C:
		v := b.v
		r.C <- b
		r.C = b.c
		return v
}

Then you could do:

select {
case v := read(&r):
    .... do something with v
}

i.e. a select function can be used in place of a channel operation in any select arm. Using the select function in an expression would just block as normal.
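A workaround available in Go as it stands is to run the multi-step operation in its own goroutine and deliver its result on a fresh channel, which can then appear in a select arm. readAsync is my own sketch, not part of the selectfunc proposal; note the key weakness, which is exactly what select functions would fix: the goroutine commits to the operation even if another select arm wins.

```go
package main

import "fmt"

// readAsync performs a multi-step channel operation (receive a value,
// then put it back) in a goroutine and returns a channel carrying the
// result, so the whole operation can be selected on.
func readAsync(c chan int) <-chan int {
	done := make(chan int, 1)
	go func() {
		v := <-c // the multi-step operation: receive a value...
		c <- v   // ...and put it back, as the Read method does
		done <- v
	}()
	return done
}

func main() {
	c := make(chan int, 1)
	c <- 42
	select {
	case v := <-readAsync(c):
		fmt.Println(v) // 42, and the value is back in c
	}
}
```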

Select functions seem enough like normal functions that one might ask whether they could be methods too. Given the current syntax, which doesn’t use the func keyword inside interface declarations, I’d say no. And they’re not strictly speaking necessary either. Something not too far from the original interface can be obtained by returning a selectfunc as a closure:

func (r *Receiver) Reader() (selectfunc () interface{}) {
	return selectfunc() interface{} {
	case b := <-r.C:
		v := b.v
		r.C <- b
		r.C = b.c
		return v
	}
}

Then you’d do:

select {
case v := r.Reader()():
	...
}

Not entirely ideal, but quite feasible.

I think there are quite a few benefits to be gained from implementing something like this. And the implementation should be reasonably straightforward – the main implication, as far as I can see, is that the number of arms in an alt statement would not always be statically determinable.

What do people think?

Concurrent Idioms #1: Broadcasting values in Go with linked channels.

December 1, 2009

Channels in Go are powerful things, but it’s not always obvious how to get them to accomplish certain tasks. One of those tasks is one-to-many communication. Channels work very well if lots of writers are funneling values to a single reader, but it’s not immediately clear how multiple readers can all wait for values from a single writer.

Here’s what a Go API for doing this might look like:

type Broadcaster ...

func NewBroadcaster() Broadcaster
func (b Broadcaster) Write(v interface{})
func (b Broadcaster) Listen() chan interface{}

The broadcast channel is created with NewBroadcaster, and values can be written to it with Write. To listen on the channel, we call Listen, which gives us a new channel from which we can receive the values written.

The solution that immediately comes to mind (which I’ve used in the past) is to have an intermediate process that keeps a registry of all the reading processes. Each call to Listen adds a new channel to the registry, and the central process’s main loop might look something like this:

for {
        select {
        case v := <-inc:
                for _, c := range listeners {
                        c <- v
                }
        case c := <-registryc:
                listeners = append(listeners, c)
        }
}

This is the conventional way of doing things. It’s also probably the most sensible method. The process writing values will block until all the readers have read the previous value. An alternative might be to maintain an output buffer for each reader, which could either grow as necessary, or we could discard values when the buffer has filled up.
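Made concrete, the conventional registry approach might look something like this (my sketch; the type and names are my own, and each listener here gets a one-slot buffer rather than an unbounded one):

```go
package main

import "fmt"

// registryBroadcaster fans each written value out to every registered
// listener via a central goroutine holding the registry.
type registryBroadcaster struct {
	inc       chan interface{}
	registryc chan chan interface{}
}

func newRegistryBroadcaster() *registryBroadcaster {
	b := &registryBroadcaster{
		inc:       make(chan interface{}),
		registryc: make(chan chan interface{}),
	}
	go func() {
		var listeners []chan interface{}
		for {
			select {
			case v := <-b.inc:
				for _, c := range listeners {
					c <- v // blocks until this listener has room
				}
			case c := <-b.registryc:
				listeners = append(listeners, c)
			}
		}
	}()
	return b
}

// Listen registers and returns a new listening channel.
func (b *registryBroadcaster) Listen() chan interface{} {
	c := make(chan interface{}, 1)
	b.registryc <- c
	return c
}

// Write broadcasts v to all current listeners.
func (b *registryBroadcaster) Write(v interface{}) { b.inc <- v }

func main() {
	b := newRegistryBroadcaster()
	r1, r2 := b.Listen(), b.Listen()
	b.Write("hello")
	fmt.Println(<-r1, <-r2) // hello hello
}
```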

But this post isn’t about the sensible way of doing things. This post is about an implementation where the writer never blocks; a slow reader with a fast writer can fill all of memory if it goes on for long enough. And it’s not particularly efficient.

But I don’t care much, because I think this is cool. And I might find a genuine use for it some day.

Here’s the heart of it:

type broadcast struct {
	c	chan broadcast;
	v	interface{};
}

This is what I call a “linked channel” (analogously to a linked list). But even more than a linked list, it’s an Ouroboros data structure. That is, an instance of the structure can be sent down the channel that is inside itself.

Or the other way around. If I have a value of type chan broadcast around, then I can read a broadcast value b from it, giving me the arbitrary value b.v, and another value of the original type, b.c, allowing me to repeat the process.
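As a tiny illustration of that self-reference (my demo code, not from the post), one broadcast value carries the channel on which the next one travels:

```go
package main

import "fmt"

// A broadcast value carries an arbitrary value and the channel on which
// the next broadcast value will arrive.
type broadcast struct {
	c chan broadcast
	v interface{}
}

func main() {
	next := make(chan broadcast, 1)
	c := make(chan broadcast, 1)
	c <- broadcast{c: next, v: "first"}
	b := <-c
	fmt.Println(b.v)              // first
	b.c <- broadcast{v: "second"} // follow the chain to the next value
	fmt.Println((<-b.c).v)        // second
}
```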

The other part of the puzzle comes from the way that a buffered channel can be used as a one-use one-to-many broadcast object. If I’ve got a buffered channel of some type T:

var c = make(chan T, 1)

then any process reading from it will block until a value is written.
When we want to broadcast a value, we simply write it to the channel. This value will only go to a single reader; however, we observe the convention that if you read a value from the channel, you always put it back immediately.

func wait(c chan T) T {
	v := <-c
	c <- v
	return v
}
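Concretely, with T instantiated as int (my demo code, not the post's), several readers all observe the single buffered value:

```go
package main

import (
	"fmt"
	"sync"
)

// wait reads the buffered value and immediately puts it back, so the
// next reader can observe it too.
func wait(c chan int) int {
	v := <-c
	c <- v
	return v
}

func main() {
	c := make(chan int, 1)
	results := make(chan int, 3)
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			results <- wait(c) // blocks until a value is written
		}()
	}
	c <- 7 // one write, observed by every waiting reader
	wg.Wait()
	close(results)
	for v := range results {
		fmt.Println(v) // prints 7 three times
	}
}
```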

Putting the two pieces together, we can see that if the channel inside the broadcast struct is buffered in the above way, then we can get an endless stream of one-use one-to-many broadcast channels, each with an associated value.

Here’s the code:

package broadcast

type broadcast struct {
	c	chan broadcast;
	v	interface{};
}

type Broadcaster struct {
	// private fields:
	Listenc	chan chan (chan broadcast);
	Sendc	chan<- interface{};
}

type Receiver struct {
	// private fields:
	C chan broadcast;
}

// create a new broadcaster object.
func NewBroadcaster() Broadcaster {
	listenc := make(chan (chan (chan broadcast)));
	sendc := make(chan interface{});
	go func() {
		currc := make(chan broadcast, 1);
		for {
			select {
			case v := <-sendc:
				if v == nil {
					currc <- broadcast{};
					return;
				}
				c := make(chan broadcast, 1);
				b := broadcast{c: c, v: v};
				currc <- b;
				currc = c;
			case r := <-listenc:
				r <- currc
			}
		}
	}();
	return Broadcaster{
		Listenc: listenc,
		Sendc: sendc,
	};
}

// start listening to the broadcasts.
func (b Broadcaster) Listen() Receiver {
	c := make(chan chan broadcast, 0);
	b.Listenc <- c;
	return Receiver{<-c};
}

// broadcast a value to all listeners.
func (b Broadcaster) Write(v interface{})	{ b.Sendc <- v }

// read a value that has been broadcast,
// waiting until one is available if necessary.
func (r *Receiver) Read() interface{} {
	b := <-r.C;
	v := b.v;
	r.C <- b;
	r.C = b.c;
	return v;
}

This implementation has the nice property that there’s no longer any need for a central registry of listeners. A Receiver value encapsulates a place in the stream of values and can be copied at will – each copy will receive an identical copy of the stream. There’s no need for an Unregister function either. Of course, if the readers don’t keep up with the writers, memory can be used indefinitely, but… isn’t this quite neat?

Here’s some example code using it:

package main

import (
	"fmt";
	"broadcast";
	"time";
)

var b = broadcast.NewBroadcaster();

func listen(r broadcast.Receiver) {
	for v := r.Read(); v != nil; v = r.Read() {
		go listen(r);
		fmt.Println(v);
	}
}

func main() {
	r := b.Listen();
	go listen(r);
	for i := 0; i < 10; i++ {
		b.Write(i);
	}
	b.Write(nil);

	time.Sleep(3 * 1e9);
}

Exercise for the reader: without running the code, how many lines will this code print?

