Discussion:
golang scheduler and audio/media
Scott Cotton
2018-09-09 10:07:31 UTC
Hi all,

I wanted to bring your attention to this discussion
<https://lists.columbia.edu/pipermail/portaudio/2018-September/001508.html> on
the PortAudio mailing list regarding the role of OS real-time priority
threads and Go scheduling.

It is somewhat related also to this golang wiki page
<https://github.com/golang/go/wiki/LockOSThread> about game libraries.

And this issue <https://github.com/golang/go/issues/21827> about scheduling
performance using LockOSThread.

Given these things, I am curious: are there any thoughts about the
possibility of treating threads with real-time priority differently? Am I
perhaps wrong to say that using cgo to call back to Go on a "foreign"
thread has some problems or overhead associated with it when that thread
has some special OS scheduling properties, like real-time priority?

Any thoughts appreciated,

Best,
Scott
--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Ian Lance Taylor
2018-09-09 12:38:45 UTC
Post by Scott Cotton
Given these things, I am curious: are there any thoughts about the
possibility of treating threads with real time priority differently? Am I
perhaps wrong to say that using cgo to call back to go on a "foreign" thread
has some problems or overhead associated with it when that thread has some
special OS scheduling properties, like real-time priority?
I'm not aware of any proposals in this area.

Using cgo to call back to Go does have some overhead currently. Some
of that could be eliminated without too much work, but some would
remain. As far as I know these problems are independent of whether
the thread has special OS scheduling properties.

Ian
robert engels
2018-09-09 13:33:05 UTC
I would like to chime in here and state that any HPC or HFT application needs “thread” pinning for optimum performance. The simplicity of goroutines is great, but in practice they are lacking several critical features. (Yes, I’ve read the FAQ, and I disagree.)

They need names - and the runtime can append .N to avoid name collision. This is solely for debugger and logging - it is VERY difficult to debug/postmortem complex, highly concurrent Go applications.

You should be able to group goroutines to share the same thread, and you need to be able to pin threads to cores.

I realize this flies in the face of “cloud computing” because you usually don’t (and can’t) worry about these details, but to be a great systems language, Go needs these features.

While we’re at it, Go needs volatile variables with ‘happens before’ relationships like Java. It makes code sharing much easier - and I know… don’t share data - but that is just not feasible for really high performance systems.
Ian Lance Taylor
2018-09-09 13:57:47 UTC
Post by robert engels
They need names - and the runtime can append .N to avoid name collision. This is solely for debugger and logging - it is VERY difficult to debug/postmortem complex, highly concurrent Go applications.
This is quite unlikely to change. For debugging and logging purposes,
I think that current best practice is to pass a context.Context value.
Post by robert engels
You should be able to group goroutines to share the same thread, and you need to be able to pin threads to cores.
I agree that this seems likely to be required, though the details are unclear.
Post by robert engels
while we’re at it, it needs volatile variables with ‘happens before’ relationships like Java. It makes code sharing much easier - and I know… don’t share data - just not feasible for really high performance systems.
As far as I can tell Java-style volatile variables don't give you
anything you can't already get using the sync/atomic package. What am
I missing?

Ian
robert engels
2018-09-09 14:13:25 UTC
Yes, sync/atomic works, but you need to use it on both the reader and the writer, and the code becomes pretty obscure and possibly error prone. You can’t use it with bools, so you need to create your own atomic Bool type, which is essentially what you end up doing for all of them, so that at least you can use read, increment, and write methods on the struct for consistency.

But yeah, it works, just not simply. In retrospect, the stdlib could just offer these wrapper structs - although the problem is that they won’t be inlined, as I don’t think Go inlines more than a single level, and the struct methods would be making another call (to atomic.AddXXX), which stops inlining… but I could be wrong here.
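
For readers following along, the wrapper structs described above might look roughly like the sketch below. The names AtomicBool and AtomicCounter are made up for illustration and are not from this thread; the sketch only uses the sync/atomic functions that existed at the time. (Later Go releases, from 1.19 on, added similar typed wrappers to sync/atomic itself.)

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// AtomicBool is a hypothetical wrapper (an assumption for this sketch) that
// hides the int32 encoding sync/atomic requires for a boolean flag.
type AtomicBool struct {
	v int32 // 0 = false, 1 = true; only touched through sync/atomic
}

// Store atomically sets the flag.
func (b *AtomicBool) Store(val bool) {
	var i int32
	if val {
		i = 1
	}
	atomic.StoreInt32(&b.v, i)
}

// Load atomically reads the flag.
func (b *AtomicBool) Load() bool {
	return atomic.LoadInt32(&b.v) != 0
}

// AtomicCounter is a hypothetical wrapper around an int64 counter.
type AtomicCounter struct {
	v int64
}

// Inc atomically adds one and returns the new value.
func (c *AtomicCounter) Inc() int64 { return atomic.AddInt64(&c.v, 1) }

// Load atomically reads the current value.
func (c *AtomicCounter) Load() int64 { return atomic.LoadInt64(&c.v) }

func main() {
	var done AtomicBool
	var hits AtomicCounter
	var wg sync.WaitGroup

	// Four writers increment the shared counter concurrently.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				hits.Inc()
			}
		}()
	}
	wg.Wait()
	done.Store(true)
	fmt.Println(done.Load(), hits.Load())
}
```

Both the reader and the writer go through the same methods, which addresses the consistency concern; whether the methods get inlined depends on the compiler version and is not something this sketch settles.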
robert engels
2018-09-09 14:15:14 UTC
Also, the Context works, but anyone writing a complex application is going to need to do this. When something has that degree of reach, it should be a prime candidate for handling in the language/platform.
Scott Cotton
2018-09-09 13:58:28 UTC
Hi Ian,

Good to know cgo->go overhead can be reduced in any case :)

For audio/media, it seems the issue is variance in scheduling latency, which
is traditionally handled by projects like Apple AudioUnits, JACK, and
Google's Android Oboe by invoking higher-priority or "real-time" OS threads.

It seems very much more a case of reducing timing jitter than of reducing the
number of CPU cycles needed to get something, like cgo->go, done. I would
think the CPU cycles would often be relatively deterministic.

I think LockOSThread could be used in these contexts, but it would be
impossible to do without risk on the first scheduling of a foreign
specially scheduled or real-time thread. It would also apparently have the
problems described in the issue cited earlier (golang/go#21827).

For audio, an additional complication is that all the suppliers of systems
like the above which use real-time threads say you can't do much in them,
including, for example, scheduling something in Go. I am
a bit skeptical of these claims because from them I can deduce, for
example, that you can't safely record anything in any language, because it
may eventually either allocate memory or write to disk in the real-time
thread or under the timing constraints implied by the real-time thread;
but recorders based on these systems exist. So some of it may be hype :)

Scott
--
Scott Cotton
President, IRI France SAS
http://www.iri-labs.com
Ian Lance Taylor
2018-09-09 14:05:03 UTC
Post by Scott Cotton
I think LockOSThread could be used in these contexts, but it would be
impossible to do without risk on the first scheduling of a foreign specially
scheduled or real-time thread. It would also apparently have the problems
in the issue cited below.
I don't know much about all this; I'll just say that when calling Go from
a thread that was not started by Go, the Go code will start in a
goroutine that is locked to the thread. You don't need to use
LockOSThread yourself for that case, so there shouldn't be any
scheduling issue. Of course any new goroutines that you start will
run on different, newly created, threads.

Ian
Scott Cotton
2018-09-13 00:39:51 UTC
Thanks Ian,

For audio, there is a tendency to have userland but OS-privileged code
that uses special thread scheduling. For example, AAudio (SCHED_FIFO) and
Apple CoreAudio (I am not sure how it relates to Darwin's scheduler, but it
is "real time" according to Apple) do this. There is a strong consensus
that this is necessary for reliable scheduling of real-time audio (although
I haven't personally had any apparent scheduling-related problems outside
of a real-time thread context).

At any rate, there are different levels of interaction with Go implied by
this.

At the level of unprivileged access, Go would need to operate on threads
supplied by the above systems. Presumably, this would be via cgo->go calls.
Ian: I was wondering whether the improvements you suggested were related to
setting up the goroutine on the foreign thread the first time, or to
checking the pointers and everything for the Go GC?

At the level of privileged access, Go could potentially eventually offer a
replacement for things like AAudio and CoreAudio. It could use the native
interface (either cgo or syscalls, depending) to create such specially
scheduled threads, and then use cgo->go to start goroutines on them.
In this case, I would imagine it would be nice to be able to have M:N
goroutines to threads. To my understanding, this is not currently possible
with goroutines locked to threads, and it would probably violate some safety
assumptions made about foreign threads in other types of applications.
But in this case, Go would control the "foreign" thread creation.

The M:N idea would in my estimation also be useful if applied in the case
of unprivileged access. It would, I guess, mostly take the form of the old
GOMAXPROCS=1 type of behaviour.

My question to golang-dev as a whole is whether it seems feasible to try to
make interoperability with special OS scheduling characteristics of threads
better, perhaps along the lines above, and whether anyone knows of other
applications that fall in the category of special OS thread scheduling (not
CPU affinity) that would benefit?

Best
Scott
Ian Lance Taylor
2018-09-13 01:10:28 UTC
Post by Scott Cotton
Was wondering if the improvements you suggested were related to setting up
the Goroutine on the foreign thread the first time, or w.r.t. checking the
pointers and everything for Go gc?
I was referring to setting up the goroutine on the non-Go thread. I
believe that could be faster. For example, see the TODO in the
comment for dropm in runtime/proc.go.

Ian
robert engels
2018-09-13 01:24:02 UTC
As a comparison, Java threads have a platform-agnostic “priority”. On some platforms, you can map this priority to an OS priority via config. Without that, JNI code is used to call pthread_setschedparam().

In these cases Java has one native thread per Java thread (original Java used green threads like Go).

I think for Go to support this, as I stated before, you need to be able to assign goroutines to “groups”, and then set the CPU mask and thread priority for the group.

This could be done with stdlib functions, with nothing changed in the language syntax:

runtime.AssignCurrentGo(group string)
runtime.AssignCpuMask(group string, mask []int)
runtime.AssignCpuPriority(group string, priority int)

Most likely priority should be logical, with a mapping performed by the runtime. For Go’s simplicity you might be able to get away with LOW, NORMAL, HIGH, REALTIME.
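
To make the "logical priority with a mapping" idea a bit more concrete, here is a small sketch. Everything in it is hypothetical: the Registry type, its methods, and the Linux mapping values are assumptions chosen for illustration, and nothing here exists in the Go runtime. The code only records settings per group and resolves them to a description; it does not change any thread's scheduling.

```go
package main

import "fmt"

// Priority is a logical, platform-agnostic priority level (hypothetical).
type Priority int

const (
	Low Priority = iota
	Normal
	High
	Realtime
)

// osSched describes OS-level settings a runtime might choose (hypothetical).
type osSched struct {
	Policy string // for example "SCHED_OTHER" or "SCHED_FIFO" on Linux
	Value  int    // nice value for SCHED_OTHER, static priority for SCHED_FIFO
}

// linuxMapping holds illustrative values only; a real mapping would need
// review per GOOS.
var linuxMapping = map[Priority]osSched{
	Low:      {"SCHED_OTHER", 10},
	Normal:   {"SCHED_OTHER", 0},
	High:     {"SCHED_OTHER", -10},
	Realtime: {"SCHED_FIFO", 50},
}

// group holds the settings recorded for one named group of goroutines.
type group struct {
	cpus     []int
	priority Priority
}

// Registry records per-group settings. It is not safe for concurrent use.
type Registry struct {
	groups map[string]*group
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{groups: make(map[string]*group)}
}

// lookup returns the named group, creating it with Normal priority if needed.
func (r *Registry) lookup(name string) *group {
	g, ok := r.groups[name]
	if !ok {
		g = &group{priority: Normal}
		r.groups[name] = g
	}
	return g
}

// AssignCPUMask records the CPUs the group's threads should be pinned to.
func (r *Registry) AssignCPUMask(name string, cpus []int) {
	r.lookup(name).cpus = append([]int(nil), cpus...)
}

// AssignPriority records the logical priority for the group.
func (r *Registry) AssignPriority(name string, p Priority) {
	r.lookup(name).priority = p
}

// Resolve maps the group's logical settings to the illustrative Linux ones.
func (r *Registry) Resolve(name string) (osSched, []int) {
	g := r.lookup(name)
	return linuxMapping[g.priority], g.cpus
}

func main() {
	r := NewRegistry()
	r.AssignPriority("audio", Realtime)
	r.AssignCPUMask("audio", []int{2, 3})
	sched, cpus := r.Resolve("audio")
	fmt.Println(sched.Policy, sched.Value, cpus)
}
```

The open design question is the mapping table itself: each platform would need its own, and some may have no meaningful equivalent for a given level.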
Scott Cotton
2018-09-13 08:28:11 UTC
Post by robert engels
As a comparison, Java threads have a platform agnostic “priority”. On some
platforms, you can map this priority to an OS priority via config. Without
that, JNI code is used to call pthread_setschedparam()
In these cases Java has one native thread per Java thread (original Java
used green threads like Go).
I think for go to support this, as I stated before, you need to be able to
assign goroutines to “groups”, and then set the cpu mask, and thread
priority for the group.
OK, apologies; I didn't realize that the idea of groups was what I was
looking for before.
Post by robert engels
This could be done with stdlib functions, and nothing changed in the language syntax,
runtime.AssignCurrentGo(group string)
runtime.AssignCpuMask(group string, mask []int)
runtime.AssignCpuPriority(group string, priority int)
most likely priority should be logical, with a mapping performed by the
runtime. For Go’s simplicity you might be able to get away with
LOW, NORMAL, HIGH, REALTIME
This sounds fine to me now for the audio stuff (this stuff takes some time
for me to digest).

I think a review of scheduling with all GOOS values could help answer the
question of the best way to represent priority in this API.

Looking at go tool dist list, there are the following platforms for which I
have no idea how that would work or where to look:
plan9, nacl, js, windows.

I understand there is some talk of Raspberry Pi being added as well. Is
that so? If so, is it effectively Linux w.r.t. OS thread scheduling?

Other platforms, like iOS?

Scott
Robert Engels
2018-09-13 12:36:25 UTC
No apologies needed - that was just my idea - it doesn’t exist! I don’t think it was well received, but you never know. It’s hard to write really high performance applications without it though - that’s why the OS facilities are there in the first place.
'Bryan C. Mills' via golang-dev
2018-09-13 12:53:41 UTC
I would expect that you could set up the following structure fairly easily:

- From a privileged C thread, invoke a cgo-exported Go function. The Go
function can loop (without returning) to perform whatever real-time work is
needed, using buffered channels to communicate with the rest of the program
(and thereby avoid blocking the privileged thread).
- In other goroutines, perform any background work that does not need
real-time scheduling (such as pre-rendering or decoding chunks of audio).
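
The structure described in the two points above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the names
chunk, nextChunk and realTimeLoop are hypothetical, the cgo export and the
privileged C thread are left out so that the sketch runs as plain Go, and the
producer fills the channel up front rather than from a separate goroutine so
that the result is deterministic.

```go
package main

import "fmt"

// chunk is a hypothetical fixed-size block of audio samples.
type chunk [4]float32

// nextChunk returns the next pre-rendered chunk if one is ready, or silence
// otherwise. The select with a default case never blocks, so a slow
// producer cannot stall the caller.
func nextChunk(ready <-chan chunk) (c chunk, ok bool) {
	select {
	case c = <-ready:
		return c, true
	default:
		return chunk{}, false // underrun: fall back to silence
	}
}

// realTimeLoop stands in for the cgo-exported function that the privileged
// C thread would call. It runs for a fixed number of periods so that the
// sketch terminates; a real loop would pace itself on the audio device.
func realTimeLoop(ready <-chan chunk, periods int) (played, underruns int) {
	for i := 0; i < periods; i++ {
		if _, ok := nextChunk(ready); ok {
			played++
		} else {
			underruns++
		}
	}
	return played, underruns
}

func main() {
	// Buffered channel: background work can run ahead of the real-time loop.
	ready := make(chan chunk, 8)

	// In a real program this pre-rendering would happen in other goroutines.
	for i := 0; i < 3; i++ {
		ready <- chunk{float32(i)}
	}

	played, underruns := realTimeLoop(ready, 5)
	fmt.Println("played:", played, "underruns:", underruns)
}
```

The non-blocking receive is the design choice that matters here: the
real-time side never waits on the rest of the program, and simply plays
silence when nothing is ready.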

FWIW, I have done a couple of experiments with real-time audio in Go in the
past. In 2013 it was possible to get acceptable latency characteristics for
interactive performance on a Linux desktop machine (using the ALSA C API)
without any special scheduling, provided that the main loop did not
allocate. Given the GC latency improvements since then, I would be
surprised if the “do not allocate” proviso is even still needed.
Robert Engels
2018-09-13 13:03:58 UTC
Permalink
I would be surprised if it were needed for audio as well, given the GC pause times and the fact that the OS drivers are buffered.

I was referring more to HPC systems and cache locality. That is hard to achieve with thousands of goroutines if you can’t group and isolate them.
Scott Cotton
2018-09-13 13:16:33 UTC
Permalink
I think there is some over-hype in audio real time requirements. Most
places, such as this post
<http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing>,
and github.com/google/oboe, cite restrictions which are more or less
equivalent to common restrictions for hardware implementability of software.

However, in reality, variance in CPU cycles for memory lookups due to the
memory cache, variance in compiler optimisations, branch prediction, etc.
can yield huge variance in the real time it takes to get something done,
even with restrictions such as those in the post above.

That said, last I checked (which is quite a while ago) a jiffy in Linux was
1/100 second, which by itself is plenty to cause a glitch in a low latency
audio app.

I think Go's low latency GC is great for audio, but it does run in the
context of OS scheduling, and so must be subject to OS thread scheduling
latency limitations in any case.


Scott
Ralph Corderoy
2018-09-13 15:16:58 UTC
Permalink
Hi Scott,
Post by Scott Cotton
That said, last I checked (which is quite a while ago) a JIFFY in
linux was 1/100 second, which by itself is plenty to cause a glitch in
a low latency audio APP.
time(7) on Linux here says

The value of HZ varies across kernel versions and hardware platforms.
On i386 the situation is as follows: on kernels up to and including
2.4.x, HZ was 100, giving a jiffy value of 0.01 seconds; starting with
2.6.0, HZ was raised to 1000, giving a jiffy of 0.001 seconds. Since
kernel 2.6.13, the HZ value is a kernel configuration parameter and can
be 100, 250 (the default) or 1000, yielding a jiffies value of,
respectively, 0.01, 0.004, or 0.001 seconds. Since kernel 2.6.20, a
further frequency is available: 300, a number that divides evenly for
the common video frame rates (PAL, 25 HZ; NTSC, 30 HZ).

...

Before Linux 2.6.21, the accuracy of timer and sleep system calls
(see below) was also limited by the size of the jiffy.

Since Linux 2.6.21, Linux supports high-resolution timers (HRTs),
optionally configurable via CONFIG_HIGH_RES_TIMERS. On a system
that supports HRTs, the accuracy of sleep and timer system calls is
no longer constrained by the jiffy, but instead can be as accurate
as the hardware allows (microsecond accuracy is typical of modern
hardware). You can determine whether high-resolution timers are
supported by checking the resolution returned by a call to
clock_getres(2) or looking at the "resolution" entries in
/proc/timer_list.

Seems it's 300 here by default on Linux 4.18.6-arch1-1-ARCH x86_64.

$ sudo -i sh -c 'grep ^jiff /proc/timer_list; sleep 1; grep ^jiff /proc/timer_list'
jiffies: 4328100819
jiffies: 4328100819
jiffies: 4328100819
jiffies: 4328100819
jiffies: 4328101124
jiffies: 4328101124
jiffies: 4328101124
jiffies: 4328101124
$ dc -e '4328101124 4328100819-p'
305
$

/proc/timer_list also shows hrtimer_interrupt is the event_handler, with
the notional resolution of 1 ns backing that up.
--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
Scott Cotton
2018-09-13 17:23:32 UTC
Permalink
Hi Ralph,

Thanks for the update.

Looks like Linux is taking care to adapt its scheduling to media.

Also, HZ=250 is the default, which gives a 4 ms jiffy. That is no longer
enough by itself to cause a glitch in an app with 5 ms latency, but it is
close.
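
To make the arithmetic concrete, here is a small sketch that converts the HZ
values listed in time(7) into jiffy periods and compares each against a 5 ms
latency budget. The function name jiffy and the budget constant are only
illustrative, not anything from the kernel or the Go runtime.

```go
package main

import (
	"fmt"
	"time"
)

// jiffy returns the tick period for a kernel configured with the given HZ.
func jiffy(hz int) time.Duration {
	return time.Second / time.Duration(hz)
}

func main() {
	// Illustrative latency budget for a low latency audio app.
	const budget = 5 * time.Millisecond

	// HZ values mentioned in the time(7) excerpt.
	for _, hz := range []int{100, 250, 300, 1000} {
		j := jiffy(hz)
		fmt.Printf("HZ=%d jiffy=%v exceeds %v budget: %v\n", hz, j, budget, j > budget)
	}
}
```

Only HZ=100 gives a jiffy longer than the budget; HZ=250 leaves about 1 ms
of headroom.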

Scott
Scott Cotton
2018-09-13 23:36:35 UTC
Permalink
Post by Robert Engels
I would be surprised it would be needed for audio as well, given the GC
pause times, and also that the OS drivers are buffered.
I was referring more to HPC systems and cache locality. Hard to achieve
with thousands of Goroutines if you can’t group and isolate them.
The OS driver buffers for low latency audio represent a real time duration
below the default OS thread scheduling latency. For Go, I believe the up
front latency cost would be OS thread scheduling latency + GC pause times.
The GC latency improvements are great, and enable a lot, but the GC
operates within the context of OS thread scheduling. By default on Linux, a
Go app with 1 ms GC latency would have a 0.004 sec latency + the GC latency
should the OS need to prioritise something else first.
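
As a rough worked example of that estimate: the buffer sizes and the 48 kHz
sample rate below are assumptions chosen for illustration, and the 4 ms and
1 ms figures are simply the ones used above, not measurements.

```go
package main

import (
	"fmt"
	"time"
)

// bufferDuration returns how much real time a buffer of the given number of
// frames covers at the given sample rate (in frames per second).
func bufferDuration(frames, sampleRate int) time.Duration {
	return time.Duration(frames) * time.Second / time.Duration(sampleRate)
}

func main() {
	const (
		sched = 4 * time.Millisecond // default jiffy on Linux, from the thread
		gc    = 1 * time.Millisecond // assumed GC pause, from the thread
	)
	delay := sched + gc

	// Hypothetical buffer sizes at an assumed 48 kHz sample rate.
	for _, frames := range []int{128, 256, 512} {
		buf := bufferDuration(frames, 48000)
		fmt.Printf("%d frames = %v, delay %v fits: %v\n", frames, buf, delay, delay < buf)
	}
}
```

Under these assumptions a 128-frame buffer is shorter than the combined
delay, a 256-frame buffer barely covers it, and a 512-frame buffer covers it
with room to spare.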

In other languages (Java, C), a 0.004 sec latency resulting from scheduling
is less likely because of their support for thread scheduling. That is,
they are more reliable than full Go with goroutines. It is unfortunate to
have to use cgo with no goroutines, and to incur sigaltstack overhead and
whatnot on a callback given to a host sound system to run on a high
priority thread.

I like your idea of adding to the runtime grouping, scheduling class, and
affinity very much for audio.

Best,
Scott
robert engels
2018-09-14 03:46:19 UTC
Permalink
The Linux kernel can perform context switches in under 5 usecs on “standard hardware”. In the case of equal priorities, I believe the standard time slice is 100 ms (although 64 ms on many systems). So without scheduling priority control, a program that needs to respond in anything under 100 ms is not even close to being guaranteed that on a general purpose Linux install.
Scott Cotton
2018-09-14 20:35:00 UTC
Permalink
Hi Robert and All,

Ralph gave us info on the jiffy in Linux scheduling. Although your
conclusions are in line with his data, the numbers are a little bit off.
The default jiffy (roughly: the time that the scheduler gives one thread to
occupy a cpu core without interruption) is 0.004 seconds. It was 0.01
seconds until the 2.6.0 kernel, when it went down to 0.001 seconds for a
bit, but that was too fast, so they backed off to 0.004 shortly thereafter.

This is why reliable low latency audio uses special thread priorities.
Android AAudio and Apple CoreAudio both use them for this purpose, which I
would classify as widespread use in real applications. Although you can
get low latency in benchmarks on unloaded machines where you don't have to
wait for a jiffy, this is not considered reliable in audio systems unless
you have dedicated hardware. For similar reasons, it is also not considered
reliable to make system calls like sigaltstack in cgo callbacks on audio
processing/rendering threads. This portaudio mailing list message
<https://lists.columbia.edu/pipermail/portaudio/2018-September/001530.html> is
a good description.

In answer to the question of how far we can go in audio without scheduling
priorities in the Go runtime, it seems to me there are the following known
limitations:
1. Callbacks on host-supplied real-time threads cannot use goroutines and
incur sigaltstack/cgo->go overhead (which involves a system call).
2. Go's runtime can't distinguish between OS threads with different
scheduling properties in any way.

A simple conclusion is that low latency audio apps in Go are unreliable on
most platforms as the runtime+cgo mechanism stands today. It doesn't
matter how well something is programmed in pure Go or how smartly the work
is divided between cgo and Go. It doesn't even matter if someone
benchmarks their system and claims to have had reliable low latency (like
Bryan and me), because these issues are caused by the relationship between
Go and the host OS thread scheduling and the widespread need for special
thread priorities in audio systems.

I have a goal of making reliable low latency audio apps in Go. I think it
is a reasonable goal since Go is a good general purpose language, with both
low and high level features. But I think these issues together are a
stopper for reliable latency under about twice a jiffy, which doesn't
really quite fall into the low latency category.

Ian pointed to the TODO in dropm in runtime/proc.go. Addressing it would
help with issue 1. I have started looking at it, and so far it seems to me
(the code is pretty deep to learn overnight, so take this with a grain of
salt) that the cgo->go and go->cgo directions would involve sigaltstack in
any event, even with the improvement suggested in the TODO. Any
runtime/proc.go gurus willing to comment?

Robert suggested adding runtime functions to define thread priorities and
affinities for groups of goroutines. This would solve issue 2 and to some
extent obviate the need for solving issue 1. Ian agreed that something like
that was necessary, but the details were unclear.

I have started looking at how to make progress on that more concrete. I
have asked for help w.r.t. plan9, windows and the various js host targets
(where I guess this functionality shouldn't be supported), with no response
as of yet.

Best,
Scott
Post by robert engels
The linux kernel can perform context switches in under 5 usecs on
“standard hardware”. In the case of equal priorities, I believe the
standard time slice is 100 ms (although 64 ms in many systems). So without
scheduling priority control, if a program needs anything under 100 ms it is
not even close to being guaranteed in a general purpose linux install.
Post by Robert Engels
I would be surprised if it would be needed for audio as well, given the GC
pause times, and also that the OS drivers are buffered.
I was referring more to HPC systems and cache locality. Hard to achieve
with thousands of Goroutines if you can’t group and isolate them.
The OS driver buffers for low latency audio represent a real time duration
below the default OS thread scheduling latency. For Go, I believe the
up-front latency cost would be OS thread scheduling latency + GC pause
times. The GC latency improvements are great, and enable a lot, but the GC
operates within the context of OS thread scheduling. A Go app with 1ms GC
latency would have by default on linux a 0.004 sec latency + the GC latency
should the OS need to prioritise something else first.
In other languages (Java, C), 0.004 sec latency resulting from scheduling
happens less often because of support for thread scheduling. That is, they
are more reliable than full Go with goroutines. It is unfortunate to have
to do cgo with no goroutines, and have sigaltstack overhead and whatnot
associated with it on a callback given to a host sound system to run on a
high priority thread.
I like your idea of adding grouping, scheduling class, and affinity to the
runtime very much for audio.
Best,
Scott
Post by Robert Engels
On Sep 13, 2018, at 7:53 AM, 'Bryan C. Mills' via golang-dev <
- From a privileged C thread, invoke a cgo-exported Go function. The
Go function can loop (without returning) to perform whatever real-time work
is needed, using buffered channels to communicate with the rest of the
program (and thereby avoid blocking the privileged thread).
- In other goroutines, perform any background work that does not need
real-time scheduling (such as pre-rendering or decoding chunks of audio).
FWIW, I have done a couple of experiments with real-time audio in Go in
the past. In 2013 it was possible to get acceptable latency characteristics
for interactive performance on a Linux desktop machine (using the ALSA C
API) without any special scheduling, provided that the main loop did not
allocate. Given the GC latency improvements since then, I would be
surprised if the “do not allocate” proviso is even still needed.
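The division of labour described above could be sketched roughly as follows. This is an illustration in pure Go, not Bryan's code: a goroutine locked to its thread stands in for the cgo-exported function that would really be called from the privileged C thread, and fill is a hypothetical helper that takes a pre-rendered chunk from a buffered channel without ever blocking.

```go
package main

import (
	"fmt"
	"runtime"
)

// fill copies the next pre-rendered chunk into dst if one is ready.
// It never blocks: when no chunk is available it writes silence and
// reports an underrun by returning false. It assumes every chunk has
// at least len(dst) samples, and it does not allocate.
func fill(dst []float32, in <-chan []float32) bool {
	select {
	case chunk := <-in:
		copy(dst, chunk)
		return true
	default:
		for i := range dst {
			dst[i] = 0
		}
		return false
	}
}

func main() {
	// Background work: pre-render chunks into a buffered channel.
	// Done inline here so the demonstration is deterministic.
	chunks := make(chan []float32, 8)
	chunks <- []float32{0.1, 0.2, 0.3, 0.4}
	chunks <- []float32{0.5, 0.6, 0.7, 0.8}

	done := make(chan int)
	go func() {
		// Stand-in for the privileged thread. In the real pattern this
		// body would live in a cgo-exported function called from C.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()

		buf := make([]float32, 4) // allocated once, outside the loop
		underruns := 0
		for i := 0; i < 3; i++ {
			if !fill(buf, chunks) {
				underruns++
			}
		}
		done <- underruns
	}()
	fmt.Println("underruns:", <-done)
}
```

The select with a default case is what keeps the time-critical loop from blocking on the rest of the program; an empty channel costs one buffer of silence rather than a stalled thread.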
Post by Scott Cotton
Thanks Ian,
For audio, there is a tendency to have userland but OS-privileged layer
code that uses special thread scheduling. For example, AAudio (SCHED_FIFO)
and Apple CoreAudio (not sure about the details of how it relates to
Darwin's scheduler, but it is "real time" according to Apple) do this.
There is a strong consensus that this is necessary for reliable scheduling
of real-time audio (although I haven't personally had any apparently
scheduling-related problems myself outside of a real-time thread context).
At any rate, there are different levels of interaction with Go implied by
this.
At the level of unprivileged access, Go would need to operate on threads
supplied by the above systems. Presumably, this would be via cgo->go calls.
Ian: I was wondering if the improvements you suggested were related to
setting up the goroutine on the foreign thread the first time, or to
checking the pointers and everything for the Go GC?
At the level of privileged access, Go could potentially eventually offer a
replacement for things like AAudio and CoreAudio. It could use the native
interface (either cgo or system calls, depending) to generate such specially
scheduled threads, and then use cgo->go to start goroutines on them.
In this case, I would imagine it would be nice to be able to have M:N
goroutines to threads. To my understanding, this is not currently possible
with goroutines locked to threads, and probably would violate some safety
assumptions put in place for foreign threads in other types of applications.
But in this case, Go would control the "foreign" thread creation.
The M:N idea would in my estimation also be useful if applied in the case
of unprivileged access. It would I guess mostly take the form of old
GOMAXPROCS=1 type behaviour.
My question to golang-dev as a whole is whether it seems feasible to try to
make interoperability with OS special scheduling characteristics of threads
better, perhaps along the lines above, and whether anyone knows of other
applications that fall in the category of special OS thread scheduling (not
cpu affinity) that would benefit?
Best
Scott
Robert Engels
2018-09-15 02:57:29 UTC
Permalink
I don’t think your numbers are correct. The minimum is 1 ms but the default is closer to 100 ms, but there are a lot of settings to control it.

This is a great read https://notes.shichao.io/lkd/ch4/#timeslice

Sent from my iPhone
Scott Cotton
2018-09-15 11:59:03 UTC
Permalink
Post by Robert Engels
I don’t think your numbers are correct. The minimum is 1 ms but the
default is closer to 100 ms, but there are a lot of settings to control it.
The numbers came from
https://groups.google.com/d/msg/golang-dev/EVwSXv8JTsk/a8AzEl8tCAAJ and
appear correct to me. A 0.1 second time slice does not make sense to me.
Post by Robert Engels
This is a great read https://notes.shichao.io/lkd/ch4/#timeslice
Thanks for the link, it is a nice read. Nothing in there that I see says
anything about 0.1 second time slicing.
robert engels
2018-09-15 20:33:25 UTC
Permalink
Also, the report states:

"Linux's CFS scheduler does not directly assign timeslices to processes, but assigns processes a proportion of the processor. The amount of processor time that a process receives is a function of the load of the system. This assigned proportion is further affected by each process's nice value. The nice value acts as a weight, changing the proportion of the processor time each process receives. Processes with higher nice values (a lower priority) receive a deflationary weight, yielding them a smaller proportion of the processor, and vice versa.

With the CFS scheduler, whether the process runs immediately (preempting the currently running process) is a function of how much of a proportion of the processor the newly runnable processor has consumed. If it has consumed a smaller proportion of the processor than the currently executing process, it runs immediately"

If you run the numbers with everything running at default priority and default nice value, and two CPU bound SCHED_NORMAL processes, you will see that the timeslice is very close to 100 ms as I stated - meaning a process will run for 100 ms before it is pre-empted to run the other - leading to 100 ms scheduling delays.

That being said, most processes are not CPU bound, but IO bound, thus the scheduler attempts to run the processes in a “fair” fashion, while trying to minimize context switches, because they are inefficient, and lead to poor use of the CPU cache.

From the linux source code:

/*
 * default timeslice is 100 msecs (used only for SCHED_RR tasks).
 * Timeslices get refilled after they expire.
 */
#define RR_TIMESLICE (100 * HZ / 1000)

So for two SCHED_RR programs at equal priority, it is 100 ms.
I don’t think your numbers are correct. The minimum is 1 ms but the default is closer to 100 ms, but there are a lot of settings to control it.
The numbers came from https://groups.google.com/d/msg/golang-dev/EVwSXv8JTsk/a8AzEl8tCAAJ and appear correct to me. 0.1 second time slice does not make sense to me.
This is a great read https://notes.shichao.io/lkd/ch4/#timeslice
Thanks for the link, it is a nice read. Nothing in there that I see says anything about 0.1 second time slicing.
Sent from my iPhone
Post by Scott Cotton
Hi Robert and All,
Ralph gave us info on the jiffy in linux scheduling. Although your conclusions are in line with his data, the numbers are a little bit off. The default jiffy (roughly: the time that the scheduler gives one thread to occupy a cpu core without interruption) is 0.004 seconds. It was 0.01 seconds until some 2.6.0 kernel, when it went down to 0.001 seconds for a bit, but that was too fast, so they backed off to 0.004 shortly thereafter.
This is why reliable low latency audio uses special thread priorities. It is used in Android AAudio, and Apple CoreAudio for this purpose, which I would classify as widespread use in real applications. Although you can get low latency in benchmarks on unloaded machines where you don't have to wait for a jiffy, this is not considered reliable in audio systems unless you have dedicated hardware. For similar reasons, it is also not considered reliable to have sys calls like sigaltstack in cgo callbacks on audio processing/rendering threads. This portaudio mailing list message <https://lists.columbia.edu/pipermail/portaudio/2018-September/001530.html> is a good description.
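In answer to the question of how far we can go in audio without scheduling priorities in the go runtime, it seems to me there are the following issues:
1. no goroutines and sigaltstack/cgo->go overhead (which involves a sys call) in callbacks on host supplied realtime threads.
2. Go's runtime can't distinguish between OS threads with different scheduling properties in any way.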
A simple conclusion is that low latency audio apps in Go are unreliable on most platforms as the runtime+cgo mechanism stands today. It doesn't matter how well something is programmed in pure Go or how smartly the work is divided between cgo and go. It doesn't even matter if someone benchmarks their system and claims to have had reliable low latency (like me and Bryan), because these issues are caused by the relationship between go and the host OS thread scheduling and the widespread need for special thread priorities in audio systems.
I have a goal of making reliable low latency audio apps in Go. I think it is a reasonable goal since Go is a good general purpose language, with both low and high level features. But I think these issues together are a stopper for reliable latency under about twice a jiffy, which doesn't really quite fall into the low latency category.
Ian suggested the TODO in dropm in runtime/proc.go. This would help issue 1. I have started looking at it, and so far it seems (the code is pretty deep to learn overnight, so take this with a grain of salt) that in any event the cgo->go and go->cgo directions would involve sigaltstack even with the improvement suggested in the TODO. Any runtime/proc.go gurus willing to comment?
Robert suggested adding runtime functions to define thread priorities and affinities for groups of goroutines. This would solve issue 2 and to some extent obviate a need for solving 1. Ian agreed that something like that was necessary but details were unclear.
I have started looking at how to make progress on that more concrete. I have asked for help w.r.t. plan9 and windows and the various js host targets (where I guess this functionality shouldn't be supported) with no response as of yet.
Best,
Scott
The linux kernel can perform context switches in under 5 usecs on “standard hardware”. In the case of equal priorities, I believe the standard time slice is 100 ms (although 64 ms in many systems). So without scheduling priority control, if a program needs anything under 100 ms it is not even close to being guaranteed in a general purpose linux install.
Post by Robert Engels
I would be surprised if it were needed for audio as well, given the GC pause times, and also that the OS drivers are buffered.
I was referring more to HPC systems and cache locality. Hard to achieve with thousands of Goroutines if you can’t group and isolate them.
The OS driver buffers for low latency audio represent a real time duration below the default OS thread scheduling latency. For Go, I believe the up front latency cost would be OS thread scheduling latency + GC pause times. The GC latency improvements are great, and enable a lot, but the GC operates within the context of OS thread scheduling. A Go app with 1ms GC latency would have by default on linux a 0.004 sec latency + the GC latency should the OS need to prioritise something else first.
In other languages (Java, C), a 0.004 sec latency resulting from scheduling happens less often because of support for thread scheduling. That is, they are more reliable than full Go with goroutines. It is unfortunate to have to do CGO with no goroutines, and have sigaltstack overhead and what not associated with it on a callback given to a host sound system to run on a high priority thread.
I like your idea of adding to the runtime grouping, scheduling class, and affinity very much for audio.
Best,
Scott
• From a privileged C thread, invoke a cgo-exported Go function. The Go function can loop (without returning) to perform whatever real-time work is needed, using buffered channels to communicate with the rest of the program (and thereby avoid blocking the privileged thread).
• In other goroutines, perform any background work that does not need real-time scheduling (such as pre-rendering or decoding chunks of audio).
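
A minimal sketch of the structure in the two bullets above, in pure Go. `renderLoop` is a hypothetical stand-in for the cgo-exported function (no cgo is involved here), and it takes a period count so the demo terminates, whereas the proposal has it loop without returning. The point it illustrates is the non-blocking use of buffered channels via `select` with a `default` case.

```go
package main

import (
	"fmt"
	"sync"
)

// renderLoop stands in for the cgo-exported Go function. Each period it
// takes a pre-rendered chunk from in if one is ready and posts a status
// update, without ever blocking. It returns how many chunks it played and
// how many periods found nothing ready.
func renderLoop(periods int, in <-chan []float32, status chan<- int) (played, underruns int) {
	for p := 0; p < periods; p++ {
		select {
		case chunk := <-in:
			_ = chunk // a real callback would copy this into the device buffer
			played++
		default:
			underruns++ // nothing ready: output silence rather than wait
		}
		select {
		case status <- p: // best-effort report to the rest of the program
		default: // receiver is slow: drop the report instead of blocking
		}
	}
	return played, underruns
}

func main() {
	in := make(chan []float32, 8)
	status := make(chan int, 2)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // background work: pre-render four chunks
		defer wg.Done()
		for i := 0; i < 4; i++ {
			in <- make([]float32, 64)
		}
	}()
	wg.Wait() // only here to make the demo deterministic
	played, underruns := renderLoop(6, in, status)
	fmt.Println("played:", played, "underruns:", underruns, "reports kept:", len(status))
}
```

Dropping status reports and counting underruns are design choices of this sketch; the trade-off is that the loop never waits on the unprivileged side of the program.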
FWIW, I have done a couple of experiments with real-time audio in Go in the past. In 2013 it was possible to get acceptable latency characteristics for interactive performance on a Linux desktop machine (using the ALSA C API) without any special scheduling, provided that the main loop did not allocate. Given the GC latency improvements since then, I would be surprised if the “do not allocate” proviso is even still needed.
Thanks Ian,
For audio, there is a tendency to have user land but OS privileged layer code that uses special thread scheduling. For example, AAudio (SCHED_FIFO) and Apple CoreAudio (not sure about the details of how it relates to Darwin's scheduler, but it is "real time" according to Apple) do this. There is a strong consensus that this is necessary for reliable scheduling of real-time audio (although I haven't personally had any apparent scheduling-related problems myself outside of real-time thread context).
At any rate, there are different levels of interaction with Go implied by this.
At the level of unprivileged access, Go would need to operate on threads supplied by the above systems. Presumably, this would be via cgo->go calls. Ian: I was wondering if the improvements you suggested were related to setting up the goroutine on the foreign thread the first time, or to checking the pointers and everything for the Go GC?
At the level of privileged access, Go could potentially eventually offer a replacement for things like AAudio and CoreAudio. It could use the native interface (either cgo or sys calls, depending) to generate such specially scheduled threads, and then use cgo->go to start goroutines on them. In this case, I would imagine it would be nice to be able to have M:N goroutines to threads. To my understanding, this is not currently possible with goroutines locked to threads, and probably would violate some safety assumptions put on for foreign threads in other types of applications. But in this case, Go would control the "foreign" thread creation.
The M:N idea would in my estimation also be useful if applied in the case of unprivileged access. It would I guess mostly take the form of old GOMAXPROCS=1 type behaviour.
My question to golang-dev as a whole is whether it seems feasible to try to make interoperability with OS special scheduling characteristics of threads better, perhaps along the lines above, and whether anyone knows of other applications that fall in the category of special OS thread scheduling (not cpu affinity) that would benefit?
Best
Scott
Post by Scott Cotton
I think LockOSThread could be used in these contexts, but it would be impossible to do without risk on the first scheduling of a foreign specially scheduled or real-time thread. It would also apparently have the problems in the issue cited below.
I don't know much about all this, I'll just say that when calling Go from
a thread that was not started by Go the Go code will start in a
goroutine that is locked to the thread. You don't need to use
LockOSThread yourself for that case, so there shouldn't be any
scheduling issue. Of course any new goroutines that you start will
run on different, newly created, threads.
Ian
Scott Cotton
2018-09-15 22:38:45 UTC
Permalink
So there's something to the 0.1 time slicing after all. My linux scheduler
knowledge is quite dated, last time I looked there was a JIFFY
and it was 0.01 seconds for CPU bound switching.

I don't think that an interactive music or voip app will appear on a linux
kernel with 2 processes very often. And as load factors into the equation I
would guess that means that as the number of processes/threads increases,
switching rate increases, making the time allocated to a process/thread
smaller. Maybe JIFFY now means the lower bound of that time allocation?

Although these numbers are relevant to the analysis, I think that whatever
the numbers are in practice, the bottom line is that
OS privileged threads like SCHED_FIFO in AAudio are higher priority, widely
used, and the result is increased reliability. How to make that work with
Go is not clear, as any delegation of work to unprivileged Go process will
no longer be scheduled by the OS at high priority.

Scott
Post by robert engels
"Linux's CFS scheduler does not directly assign timeslices to processes,
but assigns processes a proportion of the processor. The amount of
processor time that a process receives is a function of the load of the
system. This assigned proportion is further affected by each process's nice
value. The nice value acts as a weight, changing the proportion of the
processor time each process receives. Processes with higher nice values (a
lower priority) receive a deflationary weight, yielding them a smaller
proportion of the processor, and vice versa.
With the CFS scheduler, whether the process runs immediately (preempting
the currently running process) is a function of how much of a proportion of
the processor the newly runnable processor has consumed. If it has consumed
a smaller proportion of the processor than the currently executing process,
it runs immediately"
If you run the numbers with everything running at default priority and
default nice value, and two CPU bound SCHED_NORMAL processes, you will see
that the timeslice is very close to 100 ms as I stated - meaning a process
will run for 100 ms before it is pre-empted to run the other - leading to
100 ms scheduling delays.
That being said, most processes are not CPU bound but IO bound, so the
scheduler attempts to run the processes in a “fair” fashion while trying to
minimize context switches, because they are inefficient and lead to poor
use of the CPU cache.
/*
* default timeslice is 100 msecs (used only for SCHED_RR tasks).
* Timeslices get refilled after they expire.
*/
#define RR_TIMESLICE (100 * HZ / 1000)
So for two SCHED_RR programs at equal priority, it is 100 ms.
robert engels
2018-09-15 03:28:29 UTC
Permalink
One other note: if you use a Go thread/routine, it is still going to be subject to GC pauses, which can vary greatly, even with OS scheduling support.

This is a completely different problem, but if it can’t be solved, the OS priority changes won’t matter.

I think for very low-latency audio, you need native threads, with no dynamic memory allocation, that communicate with the Go threads via shared memory/queue.
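
The queue part of this suggestion can be sketched as a preallocated single-producer, single-consumer ring buffer. This is an illustration only: both ends here are goroutines, whereas the suggestion has a native thread on the real-time side with the buffer in memory shared with C, and the `ring` type and its method names are invented for the example. It needs Go 1.19 or later for `atomic.Uint64`.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// ring is a fixed-capacity single-producer, single-consumer queue.
// All storage is allocated once in newRing; Push and Pop never allocate.
type ring struct {
	buf  []float32
	mask uint64
	head atomic.Uint64 // next slot to read, advanced only by the consumer
	tail atomic.Uint64 // next slot to write, advanced only by the producer
}

// newRing rounds size up to a power of two so indices can be masked.
func newRing(size int) *ring {
	n := 1
	for n < size {
		n <<= 1
	}
	return &ring{buf: make([]float32, n), mask: uint64(n - 1)}
}

// Push stores v and reports false, without blocking, if the ring is full.
func (r *ring) Push(v float32) bool {
	t := r.tail.Load()
	if t-r.head.Load() == uint64(len(r.buf)) {
		return false
	}
	r.buf[t&r.mask] = v
	r.tail.Store(t + 1) // publish the slot to the consumer
	return true
}

// Pop removes the oldest value and reports false if the ring is empty.
func (r *ring) Pop() (float32, bool) {
	h := r.head.Load()
	if h == r.tail.Load() {
		return 0, false
	}
	v := r.buf[h&r.mask]
	r.head.Store(h + 1) // release the slot to the producer
	return v, true
}

func main() {
	r := newRing(4)
	done := make(chan struct{})
	go func() { // producer, e.g. a decoder goroutine
		for i := 0; i < 1000; {
			if r.Push(float32(i)) {
				i++
			} else {
				runtime.Gosched() // demo only: let the consumer run
			}
		}
		close(done)
	}()
	sum := float32(0)
	for n := 0; n < 1000; { // consumer, standing in for the audio thread
		if v, ok := r.Pop(); ok {
			sum += v
			n++
		} else {
			runtime.Gosched() // demo only: a real-time thread would not yield
		}
	}
	<-done
	fmt.Println("sum of consumed samples:", sum)
}
```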

Or Go needs a lower pause GC, like Azul Zing, but that is proprietary.

The Go folks might want to investigate the Shenandoah collector in the OpenJDK, because in early tests it's pretty amazing and open source :)
Post by Scott Cotton
Hi Robert and All,
Ralph gave us info on the jiffy in linux scheduling. Although your conclusions are in line with his data, the numbers are a little bit off. The default jiffy (roughly: the time that the scheduler gives one thread to occupy a cpu core without interruption) is 0.004 seconds. It was 0.01 seconds until some 2.6.0 kernel, when it went down to 0.001 seconds for a bit, but that was too fast, so they backed off to 0.004 shortly thereafter.
This is why reliable low latency audio uses special thread priorities. It is used in Android AAudio, and Apple CoreAudio for this purpose, which I would classify as widespread use in real applications. Although you can get low latency in benchmarks on unloaded machines where you don't have to wait for a jiffy, this is not considered reliable in audio systems unless you have dedicated hardware. For similar reasons, it is also not considered reliable to have sys calls like sigaltstack in cgo callbacks on audio processing/rendering threads. this portaudio mailing list message <https://lists.columbia.edu/pipermail/portaudio/2018-September/001530.html> is a good description.
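In answer to the question of how far we can go in audio without scheduling priorities in the go runtime, it seems to me there are the following issues:
1. no goroutines and sigaltstack/cgo->go overhead (which involves a sys call) in callbacks on host supplied realtime threads.
2. Go's runtime can't distinguish between OS threads with different scheduling properties in any way.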
A simple conclusion is that low latency audio apps in Go are unreliable on most platforms as the runtime+cgo mechanism stands today. It doesn't matter how well something is programmed in pure Go or how smartly the work is divided between cgo and go. It doesn't even matter if someone benchmarks their system and claims to have had reliable low latency (like me and Bryan), because these issues are caused by the relationship between go and the host OS thread scheduling and the widespread need for special thread priorities in audio systems.
I have a goal of making reliable low latency audio apps in Go. I think it is a reasonable goal since Go is a good general purpose language, with both low and high level features. But I think these issues together are a stopper for reliable latency under about twice a jiffy, which doesn't really quite fall into the low latency category.
Ian suggested the TODO in dropm in runtime/proc.go. This would help issue 1. I have started looking at it, and so far it seems (the code is pretty deep to learn overnight, so take this with a grain of salt) that in any event the cgo->go and go->cgo directions would involve sigaltstack even with the improvement suggested in the TODO. Any runtime/proc.go gurus willing to comment?
Robert suggested adding runtime functions to define thread priorities and affinities for groups of goroutines. This would solve issue 2 and to some extent obviate a need for solving 1. Ian agreed that something like that was necessary but details were unclear.
I have started looking at how to make progress on that more concrete. I have asked for help w.r.t. plan9 and windows and the various js host targets (where I guess this functionality shouldn't be supported) with no response as of yet.
Best,
Scott
The linux kernel can perform context switches in under 5 usecs on “standard hardware”. In the case of equal priorities, I believe the standard time slice is 100 ms (although 64 ms in many systems). So without scheduling priority control, if a program needs anything under 100 ms it is not even close to being guaranteed in a general purpose linux install.
Post by Robert Engels
I would be surprised if it were needed for audio as well, given the GC pause times, and also that the OS drivers are buffered.
I was referring more to HPC systems and cache locality. Hard to achieve with thousands of Goroutines if you can’t group and isolate them.
The OS driver buffers for low latency audio represent a real time duration below the default OS thread scheduling latency. For Go, I believe the up front latency cost would be OS thread scheduling latency + GC pause times. The GC latency improvements are great, and enable a lot, but the GC operates within the context of OS thread scheduling. A Go app with 1ms GC latency would have by default on linux a 0.004 sec latency + the GC latency should the OS need to prioritise something else first.
In other languages (Java, C), a 0.004 sec latency resulting from scheduling happens less often because of support for thread scheduling. That is, they are more reliable than full Go with goroutines. It is unfortunate to have to do CGO with no goroutines, and have sigaltstack overhead and what not associated with it on a callback given to a host sound system to run on a high priority thread.
I like your idea of adding to the runtime grouping, scheduling class, and affinity very much for audio.
Best,
Scott
From a privileged C thread, invoke a cgo-exported Go function. The Go function can loop (without returning) to perform whatever real-time work is needed, using buffered channels to communicate with the rest of the program (and thereby avoid blocking the privileged thread).
In other goroutines, perform any background work that does not need real-time scheduling (such as pre-rendering or decoding chunks of audio).
FWIW, I have done a couple of experiments with real-time audio in Go in the past. In 2013 it was possible to get acceptable latency characteristics for interactive performance on a Linux desktop machine (using the ALSA C API) without any special scheduling, provided that the main loop did not allocate. Given the GC latency improvements since then, I would be surprised if the “do not allocate” proviso is even still needed.
Thanks Ian,
For audio, there is a tendency to have user land but OS privileged layer code
that uses special thread scheduling. For example, AAudio (SCHED_FIFO) and Apple CoreAudio
(not sure about the details of how it relates to Darwin's scheduler, but it is "real time" according to Apple) do this. There is a strong consensus that this is necessary for reliable
scheduling of real-time audio (although I haven't personally had any apparent
scheduling-related problems myself outside of real-time thread context).
At any rate, there are different levels of interaction with Go implied by this.
At the level of unprivileged access, Go would need to operate on threads supplied
by the above systems. Presumably, this would be via cgo->go calls. Ian: Was wondering
if the improvements you suggested were related to setting up the Goroutine on the
foreign thread the first time, or w.r.t. checking the pointers and everything for Go gc?
At the level of privileged access, Go could potentially eventually offer a replacement for
things like AAudio and CoreAudio. It could use the native interface (either cgo or sys calls, depending)
to generate such specially scheduled threads, and then use cgo->go to start goroutines on them.
In this case, I would imagine it would be nice to be able to have M:N goroutines to threads.
To my understanding, this is not currently possible with Goroutines locked to threads, and
probably would violate some safety assumptions put on for foreign threads in other types of applications.
But in this case, Go would control the "foreign" thread creation.
The M:N idea would in my estimation also be useful if applied in the case of unprivileged
access. It would I guess mostly take the form of old GOMAXPROCS=1 type behaviour.
My question to golang-dev as a whole is if it seems feasible to try to make interoperability with
OS special scheduling characteristics of threads better, perhaps along the lines above, and if anyone knows of other applications that fall in the category of special OS thread scheduling (not cpu affinity) that would benefit?
Best
Scott
Post by Scott Cotton
I think LockOSThread could be used in these contexts, but it would be
impossible to do without risk on the first scheduling of a foreign specially
scheduled or real-time thread. It would also apparently have the problems
in the issue cited below.
I don't know much about all this, I'll just say that when calling Go from
a thread that was not started by Go the Go code will start in a
goroutine that is locked to the thread. You don't need to use
LockOSThread yourself for that case, so there shouldn't be any
scheduling issue. Of course any new goroutines that you start will
run on different, newly created, threads.
Ian
Scott Cotton
2018-09-15 08:43:27 UTC
Permalink
Hi Robert,
Post by robert engels
One other note, if you use a Go thread/routine - it is still going to be
subject to GC pauses - which can vary greatly, even with OS scheduling
support.
This is a completely different problem, but if it can’t be solved, the OS
priority changes won’t matter.
GC Pauses are solvable by program design.
Post by robert engels
I think for very low-latency audio, you need native threads, with no
dynamic memory allocation, that communicate with the Go threads via shared
memory/queue.
One can't communicate between native threads and go threads without
invoking OS scheduling latency,
so this has the limitations stated before between thread priorities.
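As a rough illustration of the "shared memory/queue" idea quoted above, here is a minimal sketch of a single-producer/single-consumer ring buffer in Go. The names ring, newRing, Push and Pop are invented for this example and do not come from any message in the thread; it assumes Go 1.19 or later for atomic.Uint64. It only shows that the queue operations themselves need not block or allocate. It does not address the OS scheduling latency between the two threads, which is the concern raised in the reply.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ring is a fixed-capacity single-producer/single-consumer queue of audio
// samples. All storage is allocated once, so Push and Pop never allocate.
type ring struct {
	buf  []float32
	mask uint64
	head atomic.Uint64 // next slot to read; written only by the consumer
	tail atomic.Uint64 // next slot to write; written only by the producer
}

// newRing returns a ring with the given capacity, which must be a power of two.
func newRing(size int) *ring {
	if size <= 0 || size&(size-1) != 0 {
		panic("ring size must be a power of two")
	}
	return &ring{buf: make([]float32, size), mask: uint64(size - 1)}
}

// Push stores v. It reports false, without blocking, if the ring is full.
func (r *ring) Push(v float32) bool {
	t := r.tail.Load()
	if t-r.head.Load() == uint64(len(r.buf)) {
		return false
	}
	r.buf[t&r.mask] = v
	r.tail.Store(t + 1) // publish the slot only after it has been written
	return true
}

// Pop removes the oldest value. It reports false, without blocking, if the
// ring is empty.
func (r *ring) Pop() (float32, bool) {
	h := r.head.Load()
	if h == r.tail.Load() {
		return 0, false
	}
	v := r.buf[h&r.mask]
	r.head.Store(h + 1) // release the slot only after it has been read
	return v, true
}

func main() {
	r := newRing(4)
	for i := 0; i < 5; i++ {
		fmt.Println("push", i, "accepted:", r.Push(float32(i)))
	}
	for {
		v, ok := r.Pop()
		if !ok {
			break
		}
		fmt.Println("pop", v)
	}
}
```

The design choice here is that a full or empty ring is reported to the caller rather than waited on, so the real-time side can decide to drop data or play silence instead of stalling.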
Post by robert engels
Or Go needs a lower pause GC, like Azul Zing, but that is proprietary.
To my knowledge, Go has the best GC in terms of latency there is, but I've
not studied the GCs to which you are referring.
Post by robert engels
The Go folks might want to investigate the Shenandoah collector in the
OpenJDK - because in early tests it's pretty amazing and open source :)
Hi Robert and All,
Ralph gave us info on the jiffy in linux scheduling. Although your
conclusions are in line with his data, the numbers are a little bit off.
The default jiffy (roughly: the time that the scheduler gives one thread to
occupy a cpu core without interruption) is 0.004 seconds. It was 0.01
seconds until some 2.6.0 kernel, when it went down to 0.001 seconds for a
bit, but that was too fast, so they backed off to 0.004 shortly thereafter.
This is why reliable low latency audio uses special thread priorities. It
is used in Android AAudio, and Apple CoreAudio for this purpose, which I
would classify as widespread use in real applications. Although you can
get low latency in benchmarks on unloaded machines where you don't have to
wait for a jiffy, this is not considered reliable in audio systems unless
you have dedicated hardware. For similar reasons, it is also not considered
reliable to have sys calls like sigaltstack in cgo callbacks on audio
processing/rendering threads. This portaudio mailing list message
<https://lists.columbia.edu/pipermail/portaudio/2018-September/001530.html> is
a good description.
In answer to the question of how far we can go in audio without scheduling
priorities in the go runtime, it seems to me there are the following known
issues:
1. no goroutines and sigaltstack/cgo->go overhead (which involves a sys
call) in callbacks on host supplied realtime threads.
2. Go's runtime can't distinguish between OS threads with different scheduling properties in any way.
A simple conclusion is that low latency audio apps in Go are unreliable on
most platforms as the runtime+cgo mechanism stands today. It doesn't
matter how well something is programmed in pure Go or how smartly the work
is divided between cgo and go. It doesn't even matter if someone
benchmarks their system and claims to have had reliable low latency (like
me and Bryan), because these issues are caused by the relationship between
go and the host OS thread scheduling and the widespread need for special
thread priorities in audio systems.
I have a goal of making reliable low latency audio apps in Go. I think it
is a reasonable goal since Go is a good general purpose language, with both
low and high level features. But I think these issues together are a
stopper for reliable latency under about twice a jiffy, which doesn't
really quite fall into the low latency category.
Ian suggested the TODO in dropm in runtime/proc.go. This would help with
issue 1. I have started looking at it, and so far it seems (the code is
pretty deep to learn overnight, so take this with a grain of salt) that in
any event the cgo->go and go->cgo directions would involve sigaltstack even
with the improvement suggested in the TODO. Any runtime/proc.go gurus
willing to comment?
Robert suggested adding runtime functions to define thread priorities and
affinities for groups of goroutines.
This would solve issue 2 and to some extent obviate a need for solving 1.
Ian agreed that something like that was necessary but details were unclear.
I have started looking at how to make progress on that more concrete. I
have asked for help w.r.t.
plan9 and windows and the various js host targets (where I guess this
functionality shouldn't be supported)
with no response as of yet.
Best,
Scott
Post by robert engels
The linux kernel can perform context switches in under 5 usecs on
“standard hardware”. In the case of equal priorities, I believe the
standard time slice is 100 ms (although 64 ms in many systems). So without
scheduling priority control, if a program needs anything under 100 ms it is
not even close to being guaranteed in a general purpose linux install.
Post by Robert Engels
I would be surprised if it were needed for audio as well, given the GC
pause times, and also that the OS drivers are buffered.
I was referring more to HPC systems and cache locality. Hard to achieve
with thousands of Goroutines if you can’t group and isolate them.
The OS driver buffers for low latency audio represent a real time
duration below the default OS thread scheduling latency. For Go,
I believe the up front latency cost would be OS thread scheduling latency
+ GC pause times. The GC latency improvements are great, and enable a lot,
but the GC operates within the context of OS thread scheduling. A Go app
with 1ms GC latency would have
by default on linux a 0.004 sec latency + the GC latency should the OS
need to prioritise something else first.
In other languages (Java, C) the likelihood of 0.004 sec latency
resulting from scheduling happens less often because of support for thread
scheduling. That is, they are more reliable than full Go with goroutines.
It is unfortunate to have to do CGO with no goroutines,
and have sigaltstack overhead and what not associated with it on a
callback given to a host sound system to run on a high priority thread.
I like your idea of adding to the runtime grouping, scheduling class, and
affinity very much for audio.
Best,
Scott
Post by Robert Engels
On Sep 13, 2018, at 7:53 AM, 'Bryan C. Mills' via golang-dev <
- From a privileged C thread, invoke a cgo-exported Go function. The
Go function can loop (without returning) to perform whatever real-time work
is needed, using buffered channels to communicate with the rest of the
program (and thereby avoid blocking the privileged thread).
- In other goroutines, perform any background work that does not
need real-time scheduling (such as pre-rendering or decoding chunks of
audio).
FWIW, I have done a couple of experiments with real-time audio in Go in
the past. In 2013 it was possible to get acceptable latency characteristics
for interactive performance on a Linux desktop machine (using the ALSA C
API) without any special scheduling, provided that the main loop did not
allocate. Given the GC latency improvements since then, I would be
surprised if the “do not allocate” proviso is even still needed.
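The division of work described in the two points above can be sketched in pure Go. This is only an approximation under stated assumptions: the cgo part (a privileged C thread calling an exported Go function) is left out, and nextChunk, recycle and chunkFrames are hypothetical names introduced for the example. What it shows is the use of buffered channels with non-blocking sends and receives so that the real-time side never waits on the background side.

```go
package main

import "fmt"

const chunkFrames = 4 // tiny chunk size, for illustration only

// nextChunk is what the real-time loop would call once per audio period.
// It never blocks: if no pre-rendered chunk is ready it reports an underrun
// and the caller can play silence instead.
func nextChunk(ready <-chan []float32) ([]float32, bool) {
	select {
	case c := <-ready:
		return c, true
	default:
		return nil, false
	}
}

// recycle hands a used chunk back to the renderer without blocking. If the
// free list is full, the chunk is simply dropped.
func recycle(free chan<- []float32, c []float32) {
	select {
	case free <- c:
	default:
	}
}

func main() {
	ready := make(chan []float32, 2)
	free := make(chan []float32, 2)
	for i := 0; i < cap(free); i++ {
		free <- make([]float32, chunkFrames) // allocate all buffers up front
	}

	// Nothing has been rendered yet, so the first period is an underrun.
	if _, ok := nextChunk(ready); !ok {
		fmt.Println("underrun: playing silence")
	}

	// Background goroutine: pre-render two chunks into recycled buffers.
	done := make(chan struct{})
	go func() {
		defer close(done)
		for n := 1; n <= 2; n++ {
			c := <-free
			for i := range c {
				c[i] = float32(n)
			}
			ready <- c
		}
	}()
	<-done // wait here only to keep the demonstration deterministic

	// "Real-time" side: consume whatever is ready, never waiting.
	for {
		c, ok := nextChunk(ready)
		if !ok {
			break
		}
		fmt.Println("playing chunk", c)
		recycle(free, c)
	}
}
```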
Post by Scott Cotton
Thanks Ian,
For audio, there is a tendency to have user land but OS privileged layer code
that uses special thread scheduling. For example, AAudio (SCHED_FIFO)
and Apple CoreAudio
(not sure about the details of how it relates to Darwin's scheduler, but
it is "real time" according to Apple) do this. There is a strong consensus
that this is necessary for reliable
scheduling of real-time audio (although I haven't personally had any
apparent scheduling-related problems myself outside of real-time thread
context).
At any rate, there are different levels of interaction with Go implied by this.
At the level of unprivileged access, Go would need to operate on threads supplied
by the above systems. Presumably, this would be via cgo->go calls.
Ian: Was wondering
if the improvements you suggested were related to setting up the Goroutine on the
foreign thread the first time, or w.r.t. checking the pointers and
everything for Go gc?
At the level of privileged access, Go could potentially eventually
offer a replacement for
things like AAudio and CoreAudio. It could use the native interface
(either cgo or sys calls, depending)
to generate such specially scheduled threads, and then use cgo->go to
start goroutines on them.
In this case, I would imagine it would be nice to be able to have M:N
goroutines to threads.
To my understanding, this is not currently possible with Goroutines
locked to threads, and
probably would violate some safety assumptions put on for foreign
threads in other types of applications.
But in this case, Go would control the "foreign" thread creation.
The M:N idea would in my estimation also be useful if applied in the
case of unprivileged
access. It would I guess mostly take the form of old GOMAXPROCS=1 type behaviour.
My question to golang-dev as a whole is if it seems feasible to try to
make interoperability with
OS special scheduling characteristics of threads better, perhaps along
the lines above, and if anyone knows of other applications that fall in the
category of special OS thread scheduling (not cpu affinity) that would
benefit?
Best
Scott
Post by Scott Cotton
I think LockOSThread could be used in these contexts, but it would be
impossible to do without risk on the first scheduling of a foreign
specially scheduled or real-time thread. It would also apparently have the
problems in the issue cited below.
I don't know much about all this, I'll just say that when calling Go from
a thread that was not started by Go the Go code will start in a
goroutine that is locked to the thread. You don't need to use
LockOSThread yourself for that case, so there shouldn't be any
scheduling issue. Of course any new goroutines that you start will
run on different, newly created, threads.
Ian
Robert Engels
2018-09-15 11:41:09 UTC
Permalink
I’m sorry but none of what you stated is true. Unless you do no dynamic memory allocation in ANY thread, all goroutines are going to be subject to the GC pause. You can delicately share memory between Go and a native thread. Just use an unsafe off heap allocation so the memory is not subject to GC. Go pause times are great but nowhere near state of the art.

Scott Cotton
2018-09-15 11:52:28 UTC
Permalink
Hi Robert,
Post by Robert Engels
I’m sorry but none of what you stated is true.
I don't find that statement constructive.
Post by Robert Engels
Unless you do no dynamic memory allocation in ANY thread, all goroutines
are going to be subject to the GC pause. You can delicately share memory
between Go and a native thread.
Thus one option is to avoid dynamic memory allocation. GC pauses are
also a function of the amount of memory in the heap and the size of the
pointer graph, which is something the programmer can work with.
Post by Robert Engels
Just use an unsafe off heap allocation so the memory is not subject to GC.
Go pause times are great but no where near state of the art.
https://blog.golang.org/ismmkeynote

I'll take the recent keynote at ISMM as authoritative on this question for
now.


Scott
Post by Robert Engels
Hi Robert,
Post by robert engels
One other note, if you use a Go thread/routine - it is still going to be
subject to GC pauses - which can vary greatly, even with OS scheduling
support.
This is a completely different problem, but if it can’t be solved, the OS
priority changes won’t matter.
GC Pauses are solvable by program design.
Post by robert engels
I think for very low-latency audio, you need native threads, with no
dynamic memory allocation, that communicate with the Go threads via shared
memory/queue.
One can't communicate between native threads and go threads without
invoking OS scheduling latency,
so this has the limitations stated before between thread priorities.
Post by robert engels
Or Go needs a lower pause GC, like Azul Zing, but that is proprietary.
To my knowledge, Go has the best GC in terms of latency there is, but I've
not studied the GCs to which you are referring.
Post by robert engels
The Go folks might want to investigate the Shenandoah collector in the
OpenJDK - because in early tests it's pretty amazing and open source :)
Hi Robert and All,
Ralph gave us info on the jiffy in linux scheduling. Although your
conclusions are in line with his data, the numbers are a little bit off.
The default jiffy (roughly: the time that the scheduler gives one thread to
occupy a cpu core without interruption) is 0.004 seconds. It was 0.01
seconds until some 2.6.0 kernel, when it went down to 0.001 seconds for a
bit, but that was too fast, so they backed off to 0.004 shortly thereafter.
This is why reliable low latency audio uses special thread priorities.
It is used in Android AAudio, and Apple CoreAudio for this purpose, which I
would classify as widespread use in real applications. Although you can
get low latency in benchmarks on unloaded machines where you don't have to
wait for a jiffy, this is not considered reliable in audio systems unless
you have dedicated hardware. For similar reasons, it is also not considered
reliable to have sys calls like sigaltstack in cgo callbacks on audio
processing/rendering threads. this portaudio mailing list message
<https://lists.columbia.edu/pipermail/portaudio/2018-September/001530.html> is
a good description.
In answer to the question of how far we can go in audio without
scheduling priorities in the go runtime, it seems to me there are the
following known issues:
1. no goroutines and sigaltstack/cgo->go overhead (which involves a sys
call) in callbacks on host supplied realtime threads.
2. Go's runtime can't distinguish between OS threads with different scheduling properties in any way.
A simple conclusion is that low latency audio apps in Go are unreliable
on most platforms as the runtime+cgo mechanism stands today. It doesn't
matter how well something is programmed in pure Go or how smartly the work
is divided between cgo and go. It doesn't even matter if someone
benchmarks their system and claims to have had reliable low latency (like
me and Bryan), because these issues are caused by the relationship between
go and the host OS thread scheduling and the widespread need for special
thread priorities in audio systems.
I have a goal of making reliable low latency audio apps in Go. I think it
is a reasonable goal since Go is a good general purpose language, with both
low and high level features. But I think these issues together are a
stopper for reliable latency under about twice a jiffy, which doesn't
really quite fall into the low latency category.
Ian suggested the TODO in dropm in runtime/proc.go. This would help with
issue 1. I have started looking at it, and so far it seems (the code is
pretty deep to learn overnight, so take this with a grain of salt) that in
any event the cgo->go and go->cgo directions would involve sigaltstack even
with the improvement suggested in the TODO. Any runtime/proc.go gurus
willing to comment?
Robert suggested adding runtime functions to define thread priorities and
affinities for groups of goroutines.
This would solve issue 2 and to some extent obviate a need for solving
1. Ian agreed that something like that was necessary but details were
unclear.
I have started looking at how to make progress on that more concrete. I
have asked for help w.r.t.
plan9 and windows and the various js host targets (where I guess this
functionality shouldn't be supported)
with no response as of yet.
Best,
Scott
Post by robert engels
The linux kernel can perform context switches in under 5 usecs on
“standard hardware”. In the case of equal priorities, I believe the
standard time slice is 100 ms (although 64 ms in many systems). So without
scheduling priority control, if a program needs anything under 100 ms it is
not even close to being guaranteed in a general purpose linux install.
Post by Robert Engels
I would be surprised it would be needed for audio as well, given the GC
pause times, and also that the OS drivers are buffered.
I was referring more to HPC systems and cache locality. Hard to achieve
with thousands of Goroutines if you can’t group and isolate them.
The OS driver buffers for low latency audio represent a real time
duration below the default OS thread scheduling latency. For Go,
I believe the up front latency cost would be OS thread scheduling
latency + GC pause times. The GC latency improvements are great, and
enable a lot, but the GC operates within the context of OS thread
scheduling. A Go app with 1ms GC latency would have
by default on linux a 0.004 sec latency + the GC latency should the OS
need to prioritise something else first.
In other languages (Java, C) the likelihood of 0.004 sec latency
resulting from scheduling happens less often because of support for thread
scheduling. That is, they are more reliable than full Go with goroutines.
It is unfortunate to have to do CGO with no goroutines,
and have sigaltstack overhead and what not associated with it on a
callback given to a host sound system to run on a high priority thread.
I like your idea of adding to the runtime grouping, scheduling class,
and affinity very much for audio.
Best,
Scott
Post by Robert Engels
On Sep 13, 2018, at 7:53 AM, 'Bryan C. Mills' via golang-dev <
- From a privileged C thread, invoke a cgo-exported Go function.
The Go function can loop (without returning) to perform whatever real-time
work is needed, using buffered channels to communicate with the rest of the
program (and thereby avoid blocking the privileged thread).
- In other goroutines, perform any background work that does not
need real-time scheduling (such as pre-rendering or decoding chunks of
audio).
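A rough sketch of the non-blocking channel pattern described in the two points above, in pure Go so that it runs without cgo. The names (block, realtimeLoop, ready, recycled) are made up for illustration, and the loop takes a cycle count only so the example terminates; in the real setup the loop body would live in the cgo-exported function called from the privileged thread.

```go
package main

import "fmt"

// block is one chunk of pre-rendered audio (hypothetical size).
type block [64]float32

// realtimeLoop stands in for the cgo-exported function body. It never blocks:
// if no pre-rendered block is ready it plays silence and counts an underrun,
// and if the recycle channel is full it drops the buffer instead of waiting.
func realtimeLoop(ready <-chan *block, recycled chan<- *block, out func(*block), cycles int) (underruns int) {
	var silence block
	for i := 0; i < cycles; i++ {
		select {
		case b := <-ready:
			out(b)
			select {
			case recycled <- b: // hand the buffer back for reuse
			default: // background side is behind; do not wait for it
			}
		default:
			out(&silence)
			underruns++
		}
	}
	return underruns
}

func main() {
	ready := make(chan *block, 4)
	recycled := make(chan *block, 4)

	// Background work: pre-render two blocks before the loop starts.
	for i := 0; i < 2; i++ {
		b := new(block)
		b[0] = float32(i + 1)
		ready <- b
	}

	played := 0
	underruns := realtimeLoop(ready, recycled, func(*block) { played++ }, 5)
	fmt.Println("played:", played, "underruns:", underruns, "recycled:", len(recycled))
}
```

The select statements with a default case are what keep the privileged thread from ever parking on a channel operation; the cost is that a slow background goroutine shows up as an underrun rather than as a stall.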
FWIW, I have done a couple of experiments with real-time audio in Go in
the past. In 2013 it was possible to get acceptable latency characteristics
for interactive performance on a Linux desktop machine (using the ALSA C
API) without any special scheduling, provided that the main loop did not
allocate. Given the GC latency improvements since then, I would be
surprised if the “do not allocate” proviso is even still needed.
Post by Scott Cotton
Thanks Ian,
For audio, there is a tendency to have user land but OS privileged layer code
that uses special thread scheduling. For example, AAudio (SCHED_FIFO)
and Apple CoreAudio
(not sure about the details of how it relates to Darwin's scheduler,
but it is "real time" according to Apple) do this. There is a strong
consensus that this is necessary for reliable
scheduling of real-time audio (although I haven't personally had any
apparent scheduling-related
problems myself outside of real-time thread context).
At any rate, there are different levels of interaction with Go implied by this.
At the level of unprivileged access, Go would need to operate on threads supplied
by the above systems. Presumably, this would be via cgo->go calls.
Ian: Was wondering
if the improvements you suggested were related to setting up the Goroutine on the
foreign thread the first time, or w.r.t. checking the pointers and
everything for Go gc?
At the level of privileged access, Go could potentially eventually
offer a replacement for
things like AAudio and CoreAudio. It could use the native interface
(either cgo or sys calls, depending)
to generate such specially scheduled threads, and then use cgo->go to
start goroutines on them.
In this case, I would imagine it would be nice to be able to have M:N
goroutines to threads.
To my understanding, this is not currently possible with Goroutines
locked to threads, and
probably would violate some safety assumptions put on for foreign
threads in other types of applications.
But in this case, Go would control the "foreign" thread creation.
The M:N idea would in my estimation also be useful if applied in the
case of unprivileged
access. It would I guess mostly take the form of old GOMAXPROCS=1 type behaviour.
My question to golang-dev as a whole is if it seems feasible to try to
make interoperability with
OS special scheduling characteristics of threads better, perhaps along
the lines above, and if anyone knows of other applications that fall in the
category of special OS thread scheduling (not cpu affinity) that would
benefit?
Best
Scott
Post by Scott Cotton
I think LockOsThread could be used in these contexts, but it would be
impossible to do without risk on the first scheduling of a foreign specially
scheduled or real-time thread. It would also apparently have the problems
in the issue cited below.
I don't know much about all this, I'll just say that when calling Go from
a thread that was not started by Go the Go code will start in a
goroutine that is locked to the thread. You don't need to use
LockOSThread yourself for that case, so there shouldn't be any
scheduling issue. Of course any new goroutines that you start will
run on different, newly created, threads.
Ian
robert engels
2018-09-15 20:12:04 UTC
Permalink
Post by Scott Cotton
Hi Robert,
I’m sorry but none of what you stated is true.
I don't find that statement constructive.
I am not sure why. I was simply stating that the statements you made were not true. Honestly, your statement is more offensive if you think about it.
Post by Scott Cotton
Unless you do no dynamic memory allocation in ANY thread, all goroutines are going to be subject to the GC pause. You can delicately share memory between Go and a native thread.
Thus one option is to avoid dynamic memory allocation, and GC pauses are also a function of the amount of memory in the heap and the size of the pointer graph, which is something the programmer can work with.
GC pauses in Go are not based on the heap size. The “pause” time is based on the number of active threads and stack depth (coupled with the root object set). Still, if the GC is running a lot, it will compete with the application threads for CPU, making them “slower” due to scheduling, but this is not a pause.
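For anyone who wants to check the heap-size question on their own machine, here is a minimal sketch that reads the pause history the runtime already records. The helper name worstRecentPause and the heap sizes are my own choices, and forced collections on an idle machine say nothing about behaviour under load.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// worstRecentPause forces n collections and returns the longest of the n most
// recently recorded stop-the-world pauses, plus the total number of
// collections the runtime has completed so far.
func worstRecentPause(n int) (time.Duration, int64) {
	for i := 0; i < n; i++ {
		runtime.GC()
	}
	var stats debug.GCStats
	debug.ReadGCStats(&stats)
	recent := stats.Pause // most recent pause first
	if len(recent) > n {
		recent = recent[:n]
	}
	var worst time.Duration
	for _, p := range recent {
		if p > worst {
			worst = p
		}
	}
	return worst, stats.NumGC
}

func main() {
	small, _ := worstRecentPause(3)

	// Grow the live heap with many small objects reachable through pointers.
	live := make([]*[16]int, 1<<18)
	for i := range live {
		live[i] = new([16]int)
	}
	large, n := worstRecentPause(3)

	fmt.Println("worst pause, small heap: ", small)
	fmt.Println("worst pause, larger heap:", large, "after", n, "collections")
	runtime.KeepAlive(live)
}
```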

And I said, you can allocate off heap memory to be shared with the native thread that is not subject to GC pauses, and run these threads in real time priority. This is a common technique in many Java libraries and it works in Go as well. I personally don’t use the technique because the pause time is no longer based on heap size (it was previously). It does avoid the overhead of converting “memory layout (e.g. strings)” between the sides.
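One way the off-heap allocation could look on a Unix-like system, as a sketch only: it uses an anonymous mmap through the standard syscall package, and the helper names allocOffHeap and freeOffHeap are invented here. The thread does not say which mechanism was meant (C malloc through cgo would be another), so treat the choice of mmap as an assumption.

```go
package main

import (
	"fmt"
	"syscall"
)

// allocOffHeap maps size bytes of anonymous memory. The Go garbage collector
// neither scans nor frees this region, so it must be released explicitly with
// freeOffHeap, and Go pointers must not be stored in it.
// Assumption: a Unix-like OS where syscall.Mmap and MAP_ANON are available.
func allocOffHeap(size int) ([]byte, error) {
	return syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
}

// freeOffHeap unmaps a region previously returned by allocOffHeap.
func freeOffHeap(b []byte) error {
	return syscall.Munmap(b)
}

func main() {
	buf, err := allocOffHeap(4096)
	if err != nil {
		fmt.Println("mmap failed:", err)
		return
	}
	defer freeOffHeap(buf)

	// A native thread would be handed the address of buf[0] through cgo;
	// here we only show that the region is ordinary writable memory.
	copy(buf, "pcm")
	fmt.Println(len(buf), string(buf[:3]))
}
```

This only keeps the shared buffer out of the collector's view. Any goroutine that touches the buffer is still an ordinary goroutine, so the scheduling questions raised elsewhere in the thread are unaffected.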
Post by Scott Cotton
Just use an unsafe off heap allocation so the memory is not subject to GC. Go pause times are great but nowhere near state of the art.
https://blog.golang.org/ismmkeynote <https://blog.golang.org/ismmkeynote>
I'll take the recent keynote at ISMM as authoritative on this question for now.
I read the presentation. The pause times Go claims in the presentation for 2018 are 2 pauses of less than 500 usec per second. The pause times in Azul Zing are under 100 usec for more than 1 TB of heap, and typically under 10 usec. The Azul GC often requires large heaps/head-room, 20+ GB is not uncommon, to be efficient, which is not the typical Go environment. Even OpenJDK Shenandoah with 100+ GB heaps has pauses of less than 500 usec.

So I will state it again, Go GC is very, very good, but it is not state of the art. It is close.

Most importantly though, this really has nothing to do with the requirements for real-time audio. I was attempting to explain how you could do it.

If you review https://making.pusher.com/golangs-real-time-gc-in-theory-and-practice/ you will see Go has 7 ms allocation pauses, probably too much based on what you’ve stated. I’ve run those tests on my machine using Go 1.11 and I see similar 7 ms pause times (my Java times using standard G1 are in the 28 ms range). This is a direct link to the relevant code: main.go <https://gitlab.com/gasche/gc-latency-experiment/blob/master/main.go>

I was only trying to be helpful, and I don’t appreciate being called out for stating something is untrue, and I don’t think that is productive.
Post by Scott Cotton
Scott
Sent from my iPhone
Post by Scott Cotton
Hi Robert,
One other note, if you use a Go thread/routine - it is still going to be subject to GC pauses - which can vary greatly. even with OS scheduling support.
This is a completely different problem, but if it can’t be solved, the OS priority changes won’t matter.
GC Pauses are solvable by program design.
I think for very low-latency audio, you need native threads, with no dynamic memory allocation, that communicate with the Go threads via shared memory/queue.
One can't communicate between native threads and go threads without invoking OS scheduling latency,
so this has the limitations stated before between thread priorities.
Or Go needs a lower pause GC, like Azul Zing, but that is proprietary.
To my knowledge, Go has the best GC in terms of latency there is, but I've not studied the GCs to which you are referring.
The Go folks might want to investigate the Shenandoah collector in the OpenJDK - because in early tests it's pretty amazing and open source :)
Post by Scott Cotton
Hi Robert and All,
Ralph gave us info on the jiffy in linux scheduling. Although your conclusions are in line with his data, the numbers are a little bit off. The default jiffy (roughly: the time that the scheduler gives one thread to occupy a cpu core without interruption) is 0.004 seconds. It was 0.01 seconds until some 2.6.0 kernel, when it went down to 0.001 seconds for a bit, but that was too fast, so they backed off to 0.004 shortly thereafter.
This is why reliable low latency audio uses special thread priorities. It is used in Android AAudio, and Apple CoreAudio for this purpose, which I would classify as widespread use in real applications. Although you can get low latency in benchmarks on unloaded machines where you don't have to wait for a jiffy, this is not considered reliable in audio systems unless you have dedicated hardware. For similar reasons, it is also not considered reliable to have sys calls like sigaltstack in cgo callbacks on audio processing/rendering threads. this portaudio mailing list message <https://lists.columbia.edu/pipermail/portaudio/2018-September/001530.html> is a good description.
1. no goroutines and sigaltstack/cgo->go overhead (which involves a sys call) in callbacks on host supplied realtime threads.
2. Go's runtime can't distinguish OS thread scheduling differences in any way.
A simple conclusion is that low latency audio apps in Go are unreliable on most platforms as the runtime+cgo mechanism stands today. It doesn't matter how well something is programmed in pure Go or how smartly the work is divided between cgo and go. It doesn't even matter if someone benchmarks their system and claims to have had reliable low latency (like me and Brian), because these issues are caused by the relationship between go and the host OS thread scheduling and the widespread need for special thread priorities in audio systems.
I have a goal of making reliable low latency audio apps in Go. I think it is a reasonable goal since Go is a good general purpose language, with both low and high level features. But I think these issues together are a stopper for reliable latency under about twice a jiffy, which doesn't really quite fall into the low latency category.
Ian suggested the TODO in dropm in runtime/proc.go. This would help issue 1. I have started looking at it, and so far it seems (the code is pretty deep to learn overnight, so take this with a grain of salt) that in any event
the cgo->go and go->cgo directions would involve sigaltstack even with the improvement suggested in the TODO. Any runtime/proc.go gurus willing to comment?
Robert suggested adding runtime functions to define thread priorities and affinities for groups of goroutines.
This would solve issue 2 and to some extent obviate a need for solving 1. Ian agreed that something like that was necessary but details were unclear.
I have started looking at how to make progress on that more concrete. I have asked for help w.r.t.
plan9 and windows and the various js host targets (where I guess this functionality shouldn't be supported)
with no response as of yet.
Best,
Scott
The linux kernel can perform context switches in under 5 usecs on “standard hardware”. In the case of equal priorities, I believe the standard time slice is 100 ms (although 64 ms in many systems). So without scheduling priority control, if a program needs anything under 100 ms it is not even close to being guaranteed in a general purpose linux install.
Post by Robert Engels
I would be surprised it would be needed for audio as well, given the GC pause times, and also that the OS drivers are buffered.
I was referring more to HPC systems and cache locality. Hard to achieve with thousands of Goroutines if you can’t group and isolate them.
The OS driver buffers for low latency audio represent a real time duration below the default OS thread scheduling latency. For Go,
I believe the up front latency cost would be OS thread scheduling latency + GC pause times. The GC latency improvements are great, and enable a lot, but the GC operates within the context of OS thread scheduling. A Go app with 1ms GC latency would have
by default on linux a 0.004 sec latency + the GC latency should the OS need to prioritise something else first.
In other languages (Java, C) the likelihood of 0.004 sec latency resulting from scheduling happens less often because of support for thread scheduling. That is, they are more reliable than full Go with goroutines. It is unfortunate to have to do CGO with no goroutines,
and have sigaltstack overhead and what not associated with it on a callback given to a host sound system to run on a high priority thread.
I like your idea of adding to the runtime grouping, scheduling class, and affinity very much for audio.
Best,
Scott
From a privileged C thread, invoke a cgo-exported Go function. The Go function can loop (without returning) to perform whatever real-time work is needed, using buffered channels to communicate with the rest of the program (and thereby avoid blocking the privileged thread).
In other goroutines, perform any background work that does not need real-time scheduling (such as pre-rendering or decoding chunks of audio).
FWIW, I have done a couple of experiments with real-time audio in Go in the past. In 2013 it was possible to get acceptable latency characteristics for interactive performance on a Linux desktop machine (using the ALSA C API) without any special scheduling, provided that the main loop did not allocate. Given the GC latency improvements since then, I would be surprised if the “do not allocate” proviso is even still needed.
Thanks Ian,
For audio, there is a tendency to have user land but OS privileged layer code
that uses special thread scheduling. For example, AAudio (SCHED_FIFO) and Apple CoreAudio
(not sure about the details of how it relates to Darwin's scheduler, but it is "real time" according to Apple) do this. There is a strong consensus that this is necessary for reliable
scheduling of real-time audio (although I haven't personally had any apparent scheduling-related
problems myself outside of real-time thread context).
At any rate, there are different levels of interaction with Go implied by this.
At the level of unprivileged access, Go would need to operate on threads supplied
by the above systems. Presumably, this would be via cgo->go calls. Ian: Was wondering
if the improvements you suggested were related to setting up the Goroutine on the
foreign thread the first time, or w.r.t. checking the pointers and everything for Go gc?
At the level of privileged access, Go could potentially eventually offer a replacement for
things like AAudio and CoreAudio. It could use the native interface (either cgo or sys calls, depending)
to generate such specially scheduled threads, and then use cgo->go to start goroutines on them.
In this case, I would imagine it would be nice to be able to have M:N goroutines to threads.
To my understanding, this is not currently possible with Goroutines locked to threads, and
probably would violate some safety assumptions put on for foreign threads in other types of applications.
But in this case, Go would control the "foreign" thread creation.
The M:N idea would in my estimation also be useful if applied in the case of unprivileged
access. It would I guess mostly take the form of old GOMAXPROCS=1 type behaviour.
My question to golang-dev as a whole is if it seems feasible to try to make interoperability with
OS special scheduling characteristics of threads better, perhaps along the lines above, and if anyone knows of other applications that fall in the category of special OS thread scheduling (not cpu affinity) that would benefit?
Best
Scott
Post by Scott Cotton
I think LockOsThread could be used in these contexts, but it would be
impossible to do without risk on the first scheduling of a foreign specially
scheduled or real-time thread. It would also apparently have the problems
in the issue cited below.
I don't know much about all this, I'll just say that when calling Go from
a thread that was not started by Go the Go code will start in a
goroutine that is locked to the thread. You don't need to use
LockOSThread yourself for that case, so there shouldn't be any
scheduling issue. Of course any new goroutines that you start will
run on different, newly created, threads.
Ian
Scott Cotton
2018-09-15 21:17:33 UTC
Permalink
Hi Robert,
Post by Scott Cotton
Hi Robert,
Post by Robert Engels
I’m sorry but none of what you stated is true.
I don't find that statement constructive.
I am not sure why. I was simply stating that the statements you made were not
true. Honestly, your statement is more offensive if you think about it.
I don't read this interchange that way. You said that "none of what" I
said is true. In the context of this thread, the scope of "what I said"
may refer to many many things, as there are lots of interchanges and I said
many things. You may thus have been referring to everything I've brought
up. I don't know.

In any event, when I might think what someone else says is not true, I
would word the response as "I don't understand that", or "that doesn't make
sense to me". Because claiming what another person says is not true,
moreover maybe even all of it, asserts authority over "the" truth over the
other person. To me, there is always a possibility of a miscommunication or
ambiguities in communication, especially with a stranger in a public forum,
and such assertions aren't helpful.

For these reasons, I did not find this statement constructive. I am sorry
if stating that offended you. I also don't know what you mean by my
statement being more offensive. I'd like to invite you to explain more off
line, as my intention is only to direct the discussion toward something
fruitful, and any tension between you and me seems a distraction from the
ultimate goal and purpose of this thread and list.

Of course, I appreciate that our views differ and learn what I can from
it. I would appreciate it if that sentiment went many ways.

[...]
Post by Scott Cotton
I was only trying to be helpful, and I don’t appreciate being called out
for stating something is untrue, and I don’t think that is productive.
Again, I'm sorry you didn't find that to be productive. I'll continue the
discussion on my end without reference to this interchange in hopes you
will either accept my response above, or work it out offline.

Best,
Scott
robert engels
2018-09-15 21:46:56 UTC
Permalink
I was only referring to the three claims you made in the email I was responding to. I could have said it differently and maybe this exchange would have been avoided. Sometimes that happens in email. I am sorry, and I’ll try to keep that in mind for the future.
Post by Scott Cotton
Hi Robert,
Post by Scott Cotton
Hi Robert,
I’m sorry but none of what you stated is true.
I don't find that statement constructive.
I am not sure why. I was simply stating that the statements you made were not true. Honestly, your statement is more offensive if you think about it.
I don't read this interchange that way. You said that "none of what" I said is true. In the context of this thread, the scope of "what I said" may refer to many many things, as there are lots of interchanges and I said many things. You may thus have been referring to everything I've brought up. I don't know.
In any event, when I might think what someone else says is not true, I would word the response as "I don't understand that", or "that doesn't make sense to me". Because claiming what another person says is not true, moreover maybe even all of it, asserts authority over "the" truth over the other person. To me, there is always a possibility of a miscommunication or ambiguities in communication, especially with a stranger in a public forum, and such assertions aren't helpful.
For these reasons, I did not find this statement constructive. I am sorry if stating that offended you. I also don't know what you mean by my statement being more offensive. I'd like to invite you to explain more off line, as my intention is only to direct the discussion toward something fruitful, and any tension between you and me seems a distraction from the ultimate goal and purpose of this thread and list.
Of course, I appreciate that our views differ and learn what I can from it. I would appreciate it if that sentiment went many ways.
[...]
I was only trying to be helpful, and I don’t appreciate being called out for stating something is untrue, and I don’t think that is productive.
Again, I'm sorry you didn't find that to be productive. I'll continue the discussion on my end without reference to this interchange in hopes you will either accept my response above, or work it out offline.
Best,
Scott
Scott Cotton
2018-09-15 21:57:21 UTC
Permalink
Post by Scott Cotton
Hi Robert,
[...]
Post by Robert Engels
Unless you do no dynamic memory allocation in ANY thread, all goroutines
are going to be subject to the GC pause. You can delicately share memory
between Go and a native thread.
Thus one option is to avoid dynamic memory allocation, and GC pauses are
also a function of the amount of memory in the heap and the size of the
pointer graph, which is something the programmer can work with.
GC pauses in Go are not based on the heap size. The “pause” time is
based on number of active threads and stack depth (coupled with the root
object set). Still, if the GC is running a lot, it will starve (compete
with) the CPU from the application threads making them “slower” due to
scheduling, but this is not a pause.
I am not a GC expert, but my point is only the programmer has a pretty
reasonable amount of control over
the work presented to the GC, especially in contexts where memory can be
pre-allocated and the program has a dedicated task.
Post by Scott Cotton
And I said, you can allocate off heap memory to be shared with the native
thread that is not subject to GC pauses, and run these threads in real time
priority. This is a common technique in many Java libraries and it works in
Go as well. I personally don’t use the technique because the pause time is
no longer based on heap size (it was previously). It does avoid the
overhead of converting “memory layout (e.g. strings)” between the sides.
This may be worth looking at. My impression is still that the relationship
between Go runtime and OS privileged special thread scheduling is the main
thing that needs to be considered. It is not clear to me that any
communication between an OS privileged special thread and a user
goroutine, by sharing memory as above or otherwise, addresses the scheduling
problem.
Post by Scott Cotton
Post by Robert Engels
Just use an unsafe off heap allocation so the memory is not subject to
GC. Go pause times are great but nowhere near state of the art.
https://blog.golang.org/ismmkeynote
I'll take the recent keynote at ISMM as authoritative on this question for now.
I read the presentation. The pause times Go claims in the
presentation for 2018 are 2 pauses of less than 500 usec per second. The
pause times in Azul Zing are under 100 usec for more than 1 TB of heap,
and typically under 10 usec. The Azul GC often requires large
heaps/head-room, 20+ GB is not uncommon, to be efficient, which is not the
typical Go environment. Even OpenJDK Shenandoah with 100+ GB heaps
has pauses of less than 500 usec.
So I will state it again, Go GC is very, very good, but it is not state of
the art. It is close.
Close enough for me.
Post by Scott Cotton
Most importantly though, this really has nothing to do with the
requirements for real-time audio. I was attempting to explain how you could
do it.
Just curious, have you done it?
Post by Scott Cotton
If you review
https://making.pusher.com/golangs-real-time-gc-in-theory-and-practice/
you will see Go has 7 ms allocation pauses, probably
too much based on what you’ve stated. I’ve run those tests on my machine
using Go 1.11 and I see similar 7 ms pause times (my Java times using
standard G1 are in the 28 ms range). This is a direct link to the relevant
code: main.go
<https://gitlab.com/gasche/gc-latency-experiment/blob/master/main.go>
I have read that some time ago, it is interesting. Thanks for bringing it
up.

Scott
robert engels
2018-09-15 22:35:52 UTC
Permalink
Yes, if you have an isolated program/service that is possible. In the context of a complex ‘midi’ player, with a GUI and lots of services, it is very hard to write GC-free code. People try that in Java too and it usually leads to very subtle, hard-to-find bugs, especially in concurrent systems. The basic technique is object pooling/re-use, which is very difficult to do error-free in a concurrent environment. This is why platforms like LMAX Disruptor exist, but even then, as soon as you use a char[] and incorrectly retain a reference, it will get stomped on.
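A tiny Go sketch of the retained-reference hazard described above. The pool type and its methods are invented for illustration, and the pool holds a single buffer so that the reuse is deterministic; a real pool would be larger, which only makes the bug rarer and harder to find.

```go
package main

import "fmt"

// pool is a fixed free list of byte buffers, so steady-state use allocates
// nothing and gives the GC no new work.
type pool struct{ free chan []byte }

func newPool(n, size int) *pool {
	p := &pool{free: make(chan []byte, n)}
	for i := 0; i < n; i++ {
		p.free <- make([]byte, size)
	}
	return p
}

// get returns a recycled buffer, or nil when the pool is exhausted.
// It never allocates and never blocks.
func (p *pool) get() []byte {
	select {
	case b := <-p.free:
		return b
	default:
		return nil
	}
}

// put hands a buffer back; the caller must not touch it afterwards.
func (p *pool) put(b []byte) {
	select {
	case p.free <- b:
	default: // pool already full; let the buffer go
	}
}

func main() {
	p := newPool(1, 8)

	a := p.get()
	copy(a, "note-on")
	kept := a[:7] // bug: a reference is retained past put
	p.put(a)

	b := p.get() // the same backing array goes to the next user
	copy(b, "note-of")

	fmt.Println(string(kept)) // prints "note-of": the old data got stomped on
}
```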

I believe the best solution, and it would probably work well enough is the ‘group/real-time’ enhancements I presented before, but I wouldn’t count on the timeline - thus I offered the solution you could do now.

Even if Go adopted my simple API, it is not that simple... When the goroutines/threads have varying priorities it can lead to starvation, and threads/routines not being able to reach a safe point (for stack recording). So often implementations try to run all of the internal threads at a higher priority than all user threads, but then the GC work blocks the application mutators instead of running concurrently. So there needs to be a way to temporarily boost their priority when needed. Sounds simple, but there can be lots of race conditions.

I was using real-time threads in Java without JRTS (Java real-time system) very early on, maybe one of the earliest to do so, and needed to work with the Azul staff A LOT tracking down very subtle, but devastating bugs/crashes.
Post by robert engels
Post by Scott Cotton
Hi Robert,
[...]
Post by Scott Cotton
Unless you do no dynamic memory allocation in ANY thread, all goroutines are going to be subject to the GC pause. You can delicately share memory between Go and a native thread.
Thus one option is to avoid dynamic memory allocation, and GC pauses are also a function of the amount of memory in the heap and the size of the pointer graph, which is something the programmer can work with.
GC pauses in Go are not based on the heap size. The “pause” time is based on the number of active threads and stack depth (coupled with the root object set). Still, if the GC is running a lot, it will compete with the application threads for CPU, making them “slower” due to scheduling, but this is not a pause.
I am not a GC expert, but my point is only the programmer has a pretty reasonable amount of control over
the work presented to the GC, especially in contexts where memory can be pre-allocated and the program has a dedicated task.
And I said, you can allocate off heap memory to be shared with the native thread that is not subject to GC pauses, and run these threads in real time priority. This is a common technique in many Java libraries and it works in Go as well. I personally don’t use the technique because the pause time is no longer based on heap size (it was previously). It does avoid the overhead of converting “memory layout (e.g. strings)” between the sides.
This may be worth looking at. My impression is still that the relationship between the Go runtime and OS privileged special thread scheduling is the main thing that needs to be considered. It is not clear to me that any communication between an OS privileged special thread and a user goroutine, by sharing memory as above or otherwise, addresses the scheduling problem.
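A minimal sketch of the off-heap idea discussed above, assuming a Unix-like target where `syscall.Mmap` with `MAP_ANON` is available. The helper names `allocOffHeap` and `freeOffHeap` are hypothetical and not from the thread. The mapped region is not scanned or freed by the Go garbage collector, so it must not be used to store Go pointers.

```go
package main

import (
	"fmt"
	"syscall"
)

// allocOffHeap (hypothetical helper) maps size bytes of anonymous memory
// directly from the OS. The Go GC neither scans nor frees this region.
func allocOffHeap(size int) ([]byte, error) {
	return syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
}

// freeOffHeap returns the mapping to the OS; buf must not be used afterwards.
func freeOffHeap(buf []byte) error {
	return syscall.Munmap(buf)
}

func main() {
	buf, err := allocOffHeap(4096)
	if err != nil {
		fmt.Println("mmap failed:", err)
		return
	}
	defer freeOffHeap(buf)

	// Plain bytes (for example audio samples) are safe to keep here and to
	// share with a native thread; Go pointers are not.
	buf[0] = 42
	fmt.Println(len(buf), buf[0])
}
```

This only takes the buffer itself out of the collector's view; as noted in the reply above, it does not by itself address how the threads touching the buffer are scheduled.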
Post by Scott Cotton
Just use an unsafe off heap allocation so the memory is not subject to GC. Go pause times are great but nowhere near state of the art.
https://blog.golang.org/ismmkeynote
I'll take the recent keynote at ISMM as authoritative on this question for now.
I read the presentation. It claims that Go's pause times for 2018 are two pauses of less than 500 us per second. The pause times in Azul Zing are under 100 us for more than 1 TB of heap, and typically under 10 us. The Azul GC often requires large heaps/head-room (20+ GB is not uncommon) to be efficient, which is not the typical Go environment. Even OpenJDK Shenandoah with 100+ GB heaps has pauses of less than 500 us.
So I will state it again, Go GC is very, very good, but it is not state of the art. It is close.
Close enough for me.
Most importantly though, this really has nothing to do with the requirements for real-time audio. I was attempting to explain how you could do it.
Just curious, have you done it?
If you review https://making.pusher.com/golangs-real-time-gc-in-theory-and-practice/ you will see Go has 7 ms allocation pauses, probably too much based on what you’ve stated. I’ve run those tests on my machine using Go 1.11 and I see similar 7 ms pause times (my Java times using standard G1 are in the 28 ms range). This is a direct link to the relevant code: main.go <https://gitlab.com/gasche/gc-latency-experiment/blob/master/main.go>
I have read that some time ago, it is interesting. Thanks for bringing it up.
Scott
Scott Cotton
2018-09-16 09:47:22 UTC
Post by robert engels
Yes, if you have an isolated program/service that is possible. In the
context of a complex ‘midi’ player, with a GUI and lots of services, it is
very hard to write GC-free code. People try that in Java too, and it
usually leads to very subtle, hard-to-find bugs, especially in concurrent
systems. The basic technique is object pooling/re-use, which is very
difficult to do error-free in a concurrent environment. This is why
platforms like the LMAX Disruptor exist, but even then, as soon as you use
a char[] and incorrectly retain a reference, it will get stomped on.
I believe the best solution, and it would probably work well enough, is the
‘group/real-time’ enhancements I presented before, but I wouldn’t count on
the timeline - thus I offered the solution you could do now.
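The "retained reference gets stomped on" hazard described above can be sketched in Go as follows. This is an illustration only: `bufPool` is a hypothetical, deliberately tiny single-goroutine free list, not the LMAX Disruptor and not `sync.Pool`, and a byte slice stands in for the `char[]` of the Java example.

```go
package main

import "fmt"

// bufPool is a deliberately tiny, single-goroutine free list of byte buffers.
type bufPool struct{ free [][]byte }

// get reuses the most recently returned buffer if it is large enough,
// otherwise it allocates a new one.
func (p *bufPool) get(n int) []byte {
	if k := len(p.free); k > 0 {
		b := p.free[k-1]
		p.free = p.free[:k-1]
		if cap(b) >= n {
			return b[:n]
		}
	}
	return make([]byte, n)
}

// put hands the buffer back; the caller must drop every reference to it.
func (p *bufPool) put(b []byte) { p.free = append(p.free, b) }

// staleRead shows the bug: 'kept' is retained after put, so the next user
// of the pool overwrites the bytes it still points at.
func staleRead() string {
	p := &bufPool{}
	a := p.get(5)
	copy(a, "hello")
	kept := a // incorrectly retained reference
	p.put(a)

	b := p.get(5) // same backing array as 'kept'
	copy(b, "WORLD")
	return string(kept)
}

func main() {
	fmt.Println(staleRead()) // the retained view now reads "WORLD"
}
```

Nothing in the type system flags the retained slice, which is why this class of bug tends to show up only at run time, and more so under concurrency.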
Thanks. Not so worried about the timeline at this point so much as
negative feedback to the priority over time. If there are 10x more (pure)
go cloud users than performance media today, it doesn't mean there would be
tomorrow if it worked well. But citing surveys to justify priorities
doesn't so much allow for such reasoning.
The problem is somewhat related to this article
<https://www.linkedin.com/pulse/ranking-system-fallacies-scott-cotton/> as
a way of ranking needs.
Post by robert engels
Even if Go adopted my simple API, it is not that simple... When the
goroutines/threads have varying priorities it can lead to starvation, and
threads/routines not able to reach a safe point (for stack recording). So
often implementations try to run all of the internal threads at a
higher-priority than all user threads, but then the GC work blocks the
application mutators instead of running concurrently... So there needs to
be a way to temporarily boost their priority when needed... Sounds simple
but there can be lots of race conditions.
Indeed it is not simple under the hood. Another potential example of
difficulty I'm wondering about is whether sched_yield() would be necessary
for using high priority threads.
Post by robert engels
I was using real-time threads in Java without JRTS (Java real-time system)
very early on, maybe one of the earliest to do so, and needed to work with
the Azul staff A LOT tracking down very subtle, but devastating
bugs/crashes.
I didn't know Azul had staff or ventured into real time stuff. Sounds like
you've got a lot of experience most of us (including me) might not. Would
love to know more.

Best
Scott
--
Scott Cotton
http://www.iri-labs.com
r***@golang.org
2018-09-17 18:10:31 UTC
Post by Scott Cotton
Post by robert engels
I read the presentation. It claims that Go's pause times for 2018 are two
pauses of less than 500 us per second. The pause times in Azul Zing are
under 100 us for more than 1 TB of heap, and typically under 10 us. The
Azul GC often requires large heaps/head-room (20+ GB is not uncommon) to be
efficient, which is not the typical Go environment. Even OpenJDK Shenandoah
with 100+ GB heaps has pauses of less than 500 us.

I had not seen these Java latency numbers in the literature. Could you
provide a reference for both the Azul and the Shenandoah numbers?
Post by Scott Cotton
Hi Robert,
Post by robert engels
I’m sorry but none of what you stated is true.
I don't find that statement constructive.
I am not sure why. I was simply stating that the statements you made were
not true. Honestly, your statement is more offensive if you think about it.
Post by robert engels
Unless you do no dynamic memory allocation in ANY thread, all goroutines
are going to be subject to the GC pause. You can delicately share memory
between Go and a native thread.
Thus one option is to avoid dynamic memory allocation, and GC pauses are
also a function of the amount of memory in the heap and the size of the
pointer graph, which is something the programmer can work with.
GC pauses in Go are not based on the heap size. The “pause” time is
based on number of active threads and stack depth (coupled with the root
object set). Still, if the GC is running a lot, it will starve (compete
with) the CPU from the application threads making them “slower” due to
scheduling, but this is not a pause.
And I said, you can allocate off heap memory to be shared with the native
thread that is not subject to GC pauses, and run these threads in real time
priority. This is a common technique in many Java libraries and it works in
Go as well. I personally don’t use the technique because the pause time is
no longer based on heap size (it was previously). It does avoid the
overhead of converting “memory layout (e.g. strings)” between the sides.
Post by robert engels
Just use an unsafe off heap allocation so the memory is not subject to
GC. Go pause times are great but nowhere near state of the art.
https://blog.golang.org/ismmkeynote
I'll take the recent keynote at ISMM as authoritative on this question for now.
I read the presentation. It claims that Go's pause times for 2018 are two
pauses of less than 500 us per second. The pause times in Azul Zing are
under 100 us for more than 1 TB of heap, and typically under 10 us. The
Azul GC often requires large heaps/head-room (20+ GB is not uncommon) to be
efficient, which is not the typical Go environment. Even OpenJDK Shenandoah
with 100+ GB heaps has pauses of less than 500 us.
So I will state it again, Go GC is very, very good, but it is not state of
the art. It is close.
Most importantly though, this really has nothing to do with the
requirements for real-time audio. I was attempting to explain how you could
do it.
If you review
https://making.pusher.com/golangs-real-time-gc-in-theory-and-practice/
you will see Go has 7 ms allocation pauses, probably too much based on what
you’ve stated. I’ve run those tests on my machine using Go 1.11 and I see
similar 7 ms pause times (my Java times using standard G1 are in the 28 ms
range). This is a direct link to the relevant code: main.go
<https://gitlab.com/gasche/gc-latency-experiment/blob/master/main.go>
I was only trying to be helpful, and I don’t appreciate being called out
for stating something is untrue, and I don’t think that is productive.
Scott
Post by robert engels
Sent from my iPhone
Hi Robert,
Post by robert engels
One other note: if you use a Go thread/routine, it is still going to be
subject to GC pauses, which can vary greatly, even with OS scheduling
support.
This is a completely different problem, but if it can’t be solved, the
OS priority changes won’t matter.
GC Pauses are solvable by program design.
Post by robert engels
I think for very low-latency audio, you need native threads, with no
dynamic memory allocation, that communicate with the Go threads via shared
memory/queue.
One can't communicate between native threads and go threads without
invoking OS scheduling latency,
so this has the limitations stated before between thread priorities.
Post by robert engels
Or Go needs a lower pause GC, like Azul Zing, but that is proprietary.
To my knowledge, Go has the best GC in terms of latency there is, but
I've not studied the GCs to which you are referring.
Post by robert engels
The Go folks might want to investigate the Shenandoah collector in the
OpenJDK - because in early tests it's pretty amazing and open source :)
Hi Robert and All,
Ralph gave us info on the jiffy in linux scheduling. Although your
conclusions are in line with his data, the numbers are a little bit off.
The default jiffy (roughly: the time that the scheduler gives one thread to
occupy a cpu core without interruption) is 0.004 seconds. It was 0.01
seconds until some 2.6.0 kernel, when it went down to 0.001 seconds for a
bit, but that was too fast, so they backed off to 0.004 shortly thereafter.
This is why reliable low latency audio uses special thread priorities.
It is used in Android AAudio, and Apple CoreAudio for this purpose, which I
would classify as widespread use in real applications. Although you can
get low latency in benchmarks on unloaded machines where you don't have to
wait for a jiffy, this is not considered reliable in audio systems unless
you have dedicated hardware. For similar reasons, it is also not considered
reliable to have sys calls like sigaltstack in cgo callbacks on audio
processing/rendering threads. This portaudio mailing list message
<https://lists.columbia.edu/pipermail/portaudio/2018-September/001530.html>
is a good description.
In answer to the question of how far we can go in audio without
scheduling priorities in the go runtime, it seems to me there are the
following issues:
1. no goroutines and sigaltstack/cgo->go overhead (which involves a sys
call) in callbacks on host supplied realtime threads.
2. Go's runtime can't distinguish between OS threads with different
scheduling properties in any way.
A simple conclusion is that low latency audio apps in Go are unreliable
on most platforms as the runtime+cgo mechanism stands today. It doesn't
matter how well something is programmed in pure Go or how smartly the work
is divided between cgo and go. It doesn't even matter if someone
benchmarks their system and claims to have had reliable low latency (like
me and Brian), because these issues are caused by the relationship between
go and the host OS thread scheduling and the widespread need for special
thread priorities in audio systems.
I have a goal of making reliable low latency audio apps in Go. I think
it is a reasonable goal since Go is a good general purpose language, with
both low and high level features. But I think these issues together are a
stopper for reliable latency under about twice a jiffy, which doesn't
really quite fall into the low latency category.
Ian suggested the TODO in dropm in runtime/proc.go. This would help
issue 1. I have started looking at it, and so far it seems (the code is
pretty deep to learn overnight, so take this with a grain of salt) that in
any event the cgo->go and go->cgo directions would involve sigaltstack even
with the improvement suggested in the TODO. Any runtime/proc.go gurus
willing to comment?
Robert suggested adding runtime functions to define thread priorities
and affinities for groups of goroutines.
This would solve issue 2 and to some extent obviate a need for solving
1. Ian agreed that something like that was necessary but details were
unclear.
I have started looking at how to make progress on that more concrete. I
have asked for help w.r.t.
plan9 and windows and the various js host targets (where I guess this
functionality shouldn't be supported)
with no response as of yet.
Best,
Scott
Post by robert engels
The linux kernel can perform context switches in under 5 usecs on
“standard hardware”. In the case of equal priorities, I believe the
standard time slice is 100 ms (although 64 ms in many systems). So without
scheduling priority control, if a program needs anything under 100 ms it is
not even close to being guaranteed in a general purpose linux install.
Post by Robert Engels
I would be surprised if it would be needed for audio as well, given the
GC pause times, and also that the OS drivers are buffered.
I was referring more to HPC systems and cache locality. Hard to
achieve with thousands of Goroutines if you can’t group and isolate them.
The OS driver buffers for low latency audio represent a real time
duration below the default OS thread scheduling latency. For Go,
I believe the up front latency cost would be OS thread scheduling
latency + GC pause times. The GC latency improvements are great, and
enable a lot, but the GC operates within the context of OS thread
scheduling. A Go app with 1ms GC latency would have
by default on linux a 0.004 sec latency + the GC latency should the OS
need to prioritise something else first.
In other languages (Java, C), a 0.004 sec latency resulting from
scheduling happens less often because of support for thread
scheduling. That is, they are more reliable than full Go with goroutines.
It is unfortunate to have to do CGO with no goroutines,
and have sigaltstack overhead and what not associated with it on a
callback given to a host sound system to run on a high priority thread.
I like your idea of adding to the runtime grouping, scheduling class,
and affinity very much for audio.
Best,
Scott
Post by Robert Engels
On Sep 13, 2018, at 7:53 AM, 'Bryan C. Mills' via golang-dev <
- From a privileged C thread, invoke a cgo-exported Go function.
The Go function can loop (without returning) to perform whatever real-time
work is needed, using buffered channels to communicate with the rest of the
program (and thereby avoid blocking the privileged thread).
- In other goroutines, perform any background work that does not
need real-time scheduling (such as pre-rendering or decoding chunks of
audio).
FWIW, I have done a couple of experiments with real-time audio in Go
in the past. In 2013 it was possible to get acceptable latency
characteristics for interactive performance on a Linux desktop machine
(using the ALSA C API) without any special scheduling, provided that the
main loop did not allocate. Given the GC latency improvements since then, I
would be surprised if the “do not allocate” proviso is even still needed.
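The two-part structure described above can be sketched in pure Go as follows. This is a simulation under stated assumptions: in a real program `fill` would run inside a cgo-exported function called from the host's privileged thread, which is not shown here, and the names `prerender`, `fill` and `chunkFrames` are hypothetical.

```go
package main

import "fmt"

const chunkFrames = 4 // assumed, tiny for illustration

// prerender is the background goroutine: it does the non-real-time work
// (here, filling chunks with a counter) and queues the results.
func prerender(n int, ready chan<- []float32) {
	for i := 0; i < n; i++ {
		chunk := make([]float32, chunkFrames)
		for j := range chunk {
			chunk[j] = float32(i)
		}
		ready <- chunk // may block; that is fine off the real-time path
	}
	close(ready)
}

// fill stands in for one iteration of the real-time loop. It never blocks
// and does not allocate: if no chunk is ready it writes silence and
// reports an underrun.
func fill(out []float32, ready <-chan []float32) (underrun bool) {
	select {
	case chunk, ok := <-ready:
		if ok {
			copy(out, chunk)
			return false
		}
	default:
	}
	for i := range out {
		out[i] = 0
	}
	return true
}

func main() {
	ready := make(chan []float32, 8) // buffered so the producer can run ahead
	go prerender(3, ready)

	out := make([]float32, chunkFrames)
	for i := 0; i < 5; i++ {
		// Which calls underrun depends on goroutine scheduling.
		fmt.Println(fill(out, ready), out)
	}
}
```

The non-blocking `select` with a `default` case is one way to keep the real-time side from ever waiting on the rest of the program; whether the privileged thread itself is scheduled in time is a separate question, as the rest of the thread discusses.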
Post by Scott Cotton
Thanks Ian,
For audio, there is a tendency to have user land but OS privileged layer
code that uses special thread scheduling. For example, AAudio (SCHED_FIFO)
and Apple CoreAudio (not sure about the details of how it relates to
Darwin's scheduler, but it is "real time" according to Apple) do this.
There is a strong consensus that this is necessary for reliable scheduling
of real-time audio (although I haven't personally had any apparently
scheduling related problems myself outside of real-time thread context).
At any rate, there are different levels of interaction with Go implied by this.
At the level of unprivileged access, Go would need to operate on threads supplied
by the above systems. Presumably, this would be via cgo->go calls.
Ian: Was wondering
if the improvements you suggested were related to setting up the Goroutine on the
foreign thread the first time, or w.r.t. checking the pointers and
everything for Go gc?
At the level of privileged access, Go could potentially eventually
offer a replacement for
things like AAudio and CoreAudio. It could use the native interface
(either cgo or sys calls, depending)
to generate such specially scheduled threads, and then use cgo->go to
start goroutines on them.
In this case, I would imagine it would be nice to be able to have M:N
goroutines to threads.
To my understanding, this is not currently possible with Goroutines
locked to threads, and
probably would violate some safety assumptions put on for foreign
threads in other types of applications.
But in this case, Go would control the "foreign" thread creation.
The M:N idea would in my estimation also be useful if applied in the
case of unprivileged
access. It would I guess mostly take the form of old GOMAXPROCS=1
type behaviour.
My question to golang-dev as a whole is if it seems feasible to try
to make interoperability with
OS special scheduling characteristics of threads better, perhaps
along the lines above, and if anyone knows of other applications that fall
in the category of special OS thread scheduling (not cpu affinity) that
would benefit?
Best
Scott
Post by Scott Cotton
I think LockOSThread could be used in these contexts, but it would be
impossible to do without risk on the first scheduling of a foreign
specially scheduled or real-time thread. It would also apparently have the
problems in the issue cited below.
I don't know much about all this. I'll just say that when calling Go from
a thread that was not started by Go the Go code will start in a
goroutine that is locked to the thread. You don't need to use
LockOSThread yourself for that case, so there shouldn't be any
scheduling issue. Of course any new goroutines that you start will
run on different, newly created, threads.
Ian
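For contrast with the foreign-thread case described above, a goroutine started by Go has to ask for thread wiring explicitly. A minimal sketch using `runtime.LockOSThread`; the helper name `runPinned` is hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// runPinned runs fn in a new goroutine that is wired to a single OS thread
// for the duration of the call, and returns fn's result.
func runPinned(fn func() int) int {
	done := make(chan int)
	go func() {
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		done <- fn()
	}()
	return <-done
}

func main() {
	fmt.Println(runPinned(func() int { return 6 * 7 }))
}
```

Locking only fixes which OS thread the goroutine runs on. It does not change that thread's scheduling class or priority, which is the gap the thread is about.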
--
Scott Cotton
http://www.iri-labs.com
robert engels
2018-09-17 18:39:57 UTC
For Zing, see https://www.azul.com/products/zing/java-performance/ (these numbers matched our internal tests, although at times, given certain workloads, the pauses were closer to 100 us).

For Shenandoah you need to look at the pause times in the logs here, go to minute 8:18... the slides are presented elsewhere but I could not find them offhand.

Like I’ve said before though, the ‘pause time’ is only one part of the story - the pause time might only be 1 usec, but if it happens 500,000 times a second, what is the effective “pause time”? It depends on the application, because what you are doing is slowing the progress of the application threads to a degree that they are effectively “paused”.

This is why in the other GC tests I referenced, the Go “pause time” is 7-8 ms in the given case: even though the actual pauses are far smaller (probably on the order of 1 ms), it takes 7-8 ms to complete the user operation.
Post by Scott Cotton
Post by robert engels
I read the presentation. Go currently claims pauses times in the presentation for 2018, are 2 less than 500 us sec pauses per second. The pauses times in Azul Zing are under the 100 us for more than 1 TB of heap, and typically under 10 usec. The Azul GC often requires large heaps/head-room, 20+ GB is not uncommon, to be efficient, which is not the typically Go environment. Even the OpenJDK Shenandoah with 100+ Gb heaps have pauses less than 500 usec.
I had not seen these Java latency numbers in the literature, could you provide a reference to both the Azul as well as the Shenandoah numbers?
Post by Scott Cotton
Hi Robert,
I’m sorry but none of what you stated is true.
I don't find that statement constructive.
I am not sure why. I was simply stating the statement you made were not true. Honestly, your statement is more offensive if you think about it.
Post by Scott Cotton
Unless you do no dynamic memory allocation in ANY thread, all goroutines are going to be subject to the GC pause. You can delicately share memory between Go and a native thread.
Thus one option is to avoid dynamic memory allocation, and GC pauses are also a function of the amount of memory in the heap and the size of the pointer graph, which something the programmer can work with.
GC pauses in Go are not based on the the heap size. The “pause” time is based on number of active threads and stack depth (coupled with the root object set). Still, if the GC is running a lot, it will starve (compete with) the CPU from the application threads making them “slower” due to scheduling, but this is not a pause.
And I said, you can allocate off heap memory to be shared with the native thread that is not subject to GC pauses, and run these threads in real time priority. This is a common technique in many Java libraries and it works in Go as well. I personally don’t use the technique because the pause time is no longer based on heap size (it was previously). It does avoid the overhead of converting “memory layout (e.g. strings)” between the sides.
Post by Scott Cotton
Just use an unsafe off heap allocation so the memory is not subject to GC. Go pause times are great but no where near state of the art.
https://blog.golang.org/ismmkeynote <https://blog.golang.org/ismmkeynote>
I'll take the recent keynote at ISSM as authoritative on this question for now.
I read the presentation. Go currently claims pauses times in the presentation for 2018, are 2 less than 500 us sec pauses per second. The pauses times in Azul Zing are under the 100 us for more than 1 TB of heap, and typically under 10 usec. The Azul GC often requires large heaps/head-room, 20+ GB is not uncommon, to be efficient, which is not the typically Go environment. Even the OpenJDK Shenandoah with 100+ Gb heaps have pauses less than 500 usec.
So I will state it again, Go GC is very, very good, but it is not state of the art. It is close.
Most importantly though, this really has nothing to do with the requirements for real-time audio. I was attempting to explain how you could do it.
If you review https://making.pusher.com/golangs-real-time-gc-in-theory-and-practice/ <https://making.pusher.com/golangs-real-time-gc-in-theory-and-practice/> you will see Go has 7 ms allocation pauses. probably too much based on what you’ve stated. I’ve run those tests on my machine using Go 1.11 and I see similar 7 ms pauses times (my Java times using standard G1 are in the 28 ms range). This is a direct link to the relevant code main.go <https://gitlab.com/gasche/gc-latency-experiment/blob/master/main.go>
i was only trying to be helpful, and I don’t appreciate being called out for stating something is untrue, and I don’t think that is productive.
Post by Scott Cotton
Scott
Sent from my iPhone
Post by robert engels
Hi Robert,
One other note, if you use a Go thread/routine - it is still going to be subject to GC pauses - which can vary greatly. even with OS scheduling support.
This is a completely different problem, but if it can’t be solved, the OS priority changes won’t matter.
GC Pauses are solvable by program design.
I think for very low-latency audio, you need native threads, with no dynamic memory allocation, that communicate with the Go threads via shared memory/queue.
One can't communicate between native threads and go threads without invoking OS scheduling latency,
so this has the limitations stated before between thread priorities.
Or Go needs a lower pause GC, like Azul Zing, but that is proprietary.
To my knowledge, Go has the best GC in terms of latency there is, but I've not studied the GC's to which you are referring.
The Go folks might want to investigate the Shenandoah collector in the OpenJDK - because in early tests its pretty amazing and open source :)
Post by Scott Cotton
Hi Robert and All,
Ralph gave us info on the jiffy in linux scheduling. Although your conclusions are in line with his data, the numbers are a little bit off. The default jiffy (roughly: the time that the scheduler gives one thread to occupy a cpu core without interruption) is 0.004 seconds. It was 0.01 seconds until some 2.6.0 kernel, when it went down to 0.001 seconds for a bit, but that was too fast, so they backed off to 0.004 shortly thereafter.
This is why reliable low latency audio uses special thread priorities. It is used in Android AAudio, and Apple CoreAudio for this purpose, which I would classify as widespread use in real applications. Although you can get low latency in benchmarks on unloaded machines where you don't have to wait for a jiffy, this is not considered reliable in audio systems unless you have dedicated hardware. For similar reasons, it is also not considered reliable to have sys calls like sigaltstack in cgo callbacks on audio processing/rendering threads. this portaudio mailing list message <https://lists.columbia.edu/pipermail/portaudio/2018-September/001530.html> is a good description.
1. no goroutines and sigaltstack/cgo->go overhead (which involves a sys call) in callbacks on host supplied realtime threads.
2. Go's runtime can't distinguish OS thread scheduling differences them in any way.
A simple conclusion is that low latency audio apps in Go are unreliable on most platforms as the runtime+cgo mechanism stands today. It doesn't matter how well something is programmed in pure Go or how smartly the work is divided between cgo and go. It doesn't even matter if someone benchmarks their system and claims to have had reliable low latency (like me and Brian), because these issues are caused by the relationship between go and the host OS thread scheduling and the widespread need for special thread priorities in audio systems.
I have a goal of making reliable low latency audio apps in Go. I think it is a reasonable goal since Go is a good general purpose language, with both low and high level features. But I think these issues together are a stopper for reliable latency under about twice a jiffy, which doesn't really quite fall into the low latency category.
Ian suggested the TODO in dropm in runtime/proc.go. This would help issue 1. I have started looking at it and it seems to so far (the code is pretty deep to learn overnight, so take with a grain of salt): in any event
the cgo->go and go->cgo directions would involve sigaltstack even with the improvement suggested in the TODO. Any runtime/proc.go gurus willing to comment?
Robert suggested adding runtime functions to define thread priorities and affinities for groups of goroutines. This would solve issue 2 and to some extent obviate the need for solving issue 1. Ian agreed that something like that was necessary, but the details were unclear. I have started looking at how to make that more concrete. I have asked for help w.r.t. plan9, windows, and the various js host targets (where I guess this functionality shouldn't be supported), with no response as of yet.
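For reference, what can be done today without runtime support looks roughly like the sketch below: lock a goroutine to its OS thread and ask the kernel for SCHED_FIFO on that thread. This is Linux-only and only an illustration; `runRealtime`, `schedParam` and `schedFIFO` are hypothetical names, not an existing or proposed runtime API, and the request normally fails without CAP_SYS_NICE or a suitable rtprio limit. It does nothing about GC or about goroutines started from the callback, which still run on ordinary threads.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
	"unsafe"
)

// schedFIFO is the value of SCHED_FIFO in <sched.h> on Linux.
const schedFIFO = 1

// schedParam mirrors struct sched_param on Linux (a single int).
type schedParam struct{ priority int32 }

// runRealtime runs fn on a dedicated OS thread after asking the kernel to
// schedule that thread with SCHED_FIFO. If the request is refused, fn is
// not run and the error is returned.
func runRealtime(priority int, fn func()) error {
	errc := make(chan error, 1)
	go func() {
		// Deliberately never unlocked: the runtime discards the thread when
		// this goroutine exits, so the policy cannot leak to other goroutines.
		runtime.LockOSThread()
		p := schedParam{priority: int32(priority)}
		// pid 0 means "the calling thread" for sched_setscheduler on Linux.
		_, _, errno := syscall.Syscall(syscall.SYS_SCHED_SETSCHEDULER,
			0, schedFIFO, uintptr(unsafe.Pointer(&p)))
		if errno != 0 {
			errc <- errno
			return
		}
		fn()
		errc <- nil
	}()
	return <-errc
}

func main() {
	err := runRealtime(10, func() {
		fmt.Println("running under SCHED_FIFO on a locked thread")
	})
	if err != nil {
		fmt.Println("could not get realtime priority:", err)
	}
}
```

The point of the sketch is the gap it leaves: the priority applies to one locked thread only, which is why grouping goroutines under a scheduling class would need support inside the runtime.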
Best,
Scott
The linux kernel can perform context switches in under 5 usecs on “standard hardware”. In the case of equal priorities, I believe the standard time slice is 100 ms (although 64 ms in many systems). So without scheduling priority control, if a program needs anything under 100 ms it is not even close to being guaranteed in a general purpose linux install.
Post by Robert Engels
I would be surprised if it would be needed for audio as well, given the GC pause times, and also that the OS drivers are buffered.
I was referring more to HPC systems and cache locality. Hard to achieve with thousands of Goroutines if you can’t group and isolate them.
The OS driver buffers for low latency audio represent a real time duration below the default OS thread scheduling latency. For Go, I believe the up front latency cost would be OS thread scheduling latency + GC pause times. The GC latency improvements are great, and enable a lot, but the GC operates within the context of OS thread scheduling. A Go app with 1ms GC latency would have by default on linux a 0.004 sec latency + the GC latency should the OS need to prioritise something else first.
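To make that arithmetic concrete, here is a minimal sketch that compares the worst-case delay (one jiffy plus one GC pause) with the real time covered by one audio buffer. The 0.004 s jiffy and the 1 ms GC figure come from the discussion; the 48 kHz sample rate and the buffer sizes are assumptions chosen only for illustration.

```go
package main

import "fmt"

// bufferSeconds returns how much real time one audio buffer covers.
func bufferSeconds(frames int, sampleRate float64) float64 {
	return float64(frames) / sampleRate
}

// worstCaseDelay adds one scheduler jiffy to one GC pause (both in seconds).
func worstCaseDelay(jiffy, gcPause float64) float64 {
	return jiffy + gcPause
}

// meetsDeadline reports whether the worst-case delay is still shorter than
// the time one buffer lasts, i.e. whether the buffer can absorb the delay.
func meetsDeadline(frames int, sampleRate, jiffy, gcPause float64) bool {
	return worstCaseDelay(jiffy, gcPause) < bufferSeconds(frames, sampleRate)
}

func main() {
	const (
		jiffy      = 0.004   // default Linux jiffy cited in the thread
		gcPause    = 0.001   // the 1 ms GC latency used as an example
		sampleRate = 48000.0 // assumed sample rate
	)
	for _, frames := range []int{64, 128, 256, 512} {
		fmt.Printf("%4d frames = %6.3f ms, absorbs a %.1f ms delay: %v\n",
			frames,
			bufferSeconds(frames, sampleRate)*1000,
			worstCaseDelay(jiffy, gcPause)*1000,
			meetsDeadline(frames, sampleRate, jiffy, gcPause))
	}
}
```

Under these assumptions, only buffers longer than about 5 ms can absorb the delay, which is the sense in which default scheduling rules out the smaller buffer sizes used for low latency audio.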
In other languages (Java, C), 0.004 sec latency resulting from scheduling happens less often because of support for thread scheduling. That is, they are more reliable than full Go with goroutines. It is unfortunate to have to do cgo with no goroutines, and to have sigaltstack overhead and whatnot associated with it, on a callback given to a host sound system to run on a high priority thread.
I like your idea of adding to the runtime grouping, scheduling class, and affinity very much for audio.
Best,
Scott
From a privileged C thread, invoke a cgo-exported Go function. The Go function can loop (without returning) to perform whatever real-time work is needed, using buffered channels to communicate with the rest of the program (and thereby avoid blocking the privileged thread).
In other goroutines, perform any background work that does not need real-time scheduling (such as pre-rendering or decoding chunks of audio).
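The two steps above can be sketched in pure Go as follows. This is only an illustration of the non-blocking channel pattern: in a real program `renderLoop` would be the body of a cgo-exported function invoked from the privileged C thread, and the names `block` and `renderLoop`, the block size, and the channel capacities are all assumptions made for the example.

```go
package main

import "fmt"

// block is one fixed-size chunk of audio samples.
type block [64]float32

// renderLoop stands in for the real-time loop. It takes pre-rendered blocks
// from in and hands them to out without ever blocking: if no input is ready
// it emits silence and counts an underrun; if the consumer is behind, the
// block is dropped instead of stalling the privileged thread.
func renderLoop(in <-chan block, out chan<- block, n int) (underruns int) {
	for i := 0; i < n; i++ {
		var b block // the zero value is silence
		select {
		case b = <-in:
		default:
			underruns++
		}
		select {
		case out <- b:
		default:
		}
	}
	return underruns
}

func main() {
	in := make(chan block, 8)
	out := make(chan block, 8)

	// Background goroutines (decoding, pre-rendering) would normally feed
	// in; here three blocks are preloaded to keep the example deterministic.
	for i := 0; i < 3; i++ {
		var b block
		b[0] = float32(i + 1)
		in <- b
	}

	u := renderLoop(in, out, 5)
	fmt.Println("underruns:", u, "blocks queued:", len(out))
}
```

Using `select` with a `default` case on both channels is what keeps the loop from parking the goroutine, and therefore the thread, when the other side is slow.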
FWIW, I have done a couple of experiments with real-time audio in Go in the past. In 2013 it was possible to get acceptable latency characteristics for interactive performance on a Linux desktop machine (using the ALSA C API) without any special scheduling, provided that the main loop did not allocate. Given the GC latency improvements since then, I would be surprised if the “do not allocate” proviso is even still needed.
Thanks Ian,
For audio, there is a tendency to have user land but OS privileged layer code that uses special thread scheduling. For example, AAudio (SCHED_FIFO) and Apple CoreAudio (not sure about the details of how it relates to Darwin's scheduler, but it is "real time" according to Apple) do this. There is a strong consensus that this is necessary for reliable scheduling of real-time audio (although I haven't personally had any apparent scheduling-related problems myself outside of a real-time thread context).
At any rate, there are different levels of interaction with Go implied by this.
At the level of unprivileged access, Go would need to operate on threads supplied
by the above systems. Presumably, this would be via cgo->go calls. Ian: Was wondering
if the improvements you suggested were related to setting up the Goroutine on the
foreign thread the first time, or w.r.t. checking the pointers and everything for Go gc?
At the level of privileged access, Go could potentially eventually offer a replacement for
things like AAudio and CoreAudio. It could use the native interface (either cgo or sys calls, depending)
to generate such specially scheduled threads, and then use cgo->go to start goroutines on them.
In this case, I would imagine it would be nice to be able to have M:N goroutines to threads.
To my understanding, this is not currently possible with Goroutines locked to threads, and
probably would violate some safety assumptions put on for foreign threads in other types of applications.
But in this case, Go would control the "foreign" thread creation.
The M:N idea would in my estimation also be useful if applied in the case of unprivileged
access. It would I guess mostly take the form of old GOMAXPROCS=1 type behaviour.
My question to golang-dev as a whole is if it seems feasible to try to make interoperability with
OS special scheduling characteristics of threads better, perhaps along the lines above, and if anyone knows of other applications that fall in the category of special OS thread scheduling (not cpu affinity) that would benefit?
Best
Scott
Post by Scott Cotton
I think LockOsThread could be used in these contexts, but it would be
impossible to do without risk on the first scheduling of a foreign specially
scheduled or real-time thread. It would also apparently have the problems
in the issue cited below.
I don't know much about all this, I'll just say that when calling Go from
a thread that was not started by Go the Go code will start in a
goroutine that is locked to the thread. You don't need to use
LockOSThread yourself for that case, so there shouldn't be any
scheduling issue. Of course any new goroutines that you start will
run on different, newly created, threads.
Ian
--
Scott Cotton
http://www.iri-labs.com <http://www.iri-labs.com/>
robert engels
2018-09-17 21:35:29 UTC
Permalink
As a follow-up, I downloaded and built OpenJDK for Java11 with Shenandoah.

I ran the tests at https://github.com/WillSewell/gc-latency-experiment on a quiet machine (osx, core i7, 3.4 ghz, 4 cores, 8 threads), slightly modified to perform 10 runs, and report the run time of each run.
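For readers who have not seen that benchmark, the sketch below shows the general shape of a "worst push time" measurement: messages are pushed into a fixed-size window held in a map, and the longest single push is recorded per run. It is written in the spirit of the linked test, not copied from it; the function names, the window size, the message count and the message size are assumptions, scaled down so the example runs quickly.

```go
package main

import (
	"fmt"
	"time"
)

type message []byte

// mkMessage allocates a fresh message so that every push creates garbage
// for the collector to deal with.
func mkMessage(n, size int) message {
	m := make(message, size)
	for i := range m {
		m[i] = byte(n)
	}
	return m
}

// worstPush pushes count messages into a sliding window of the given size
// and returns the longest time any single push took.
func worstPush(window, count, size int) time.Duration {
	buf := make(map[int]message, window)
	var worst time.Duration
	for i := 0; i < count; i++ {
		start := time.Now()
		buf[i%window] = mkMessage(i, size) // overwrites the oldest entry
		if d := time.Since(start); d > worst {
			worst = d
		}
	}
	return worst
}

func main() {
	for run := 0; run < 10; run++ {
		start := time.Now()
		worst := worstPush(20000, 200000, 1024)
		fmt.Println("Worst push time:", worst, "run time", time.Since(start))
	}
}
```

Note that the number reported is the latency of a user operation that allocates, which includes but is not the same as the collector's stop-the-world pause.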

Here are the results:

Go 1.11:

Worst push time: 5.777907ms run time 981.361142ms
Worst push time: 6.306577ms run time 752.262192ms
Worst push time: 7.438668ms run time 723.050672ms
Worst push time: 9.169415ms run time 749.984075ms
Worst push time: 7.070763ms run time 727.326469ms
Worst push time: 7.218757ms run time 728.34274ms
Worst push time: 6.865207ms run time 723.579475ms
Worst push time: 7.135745ms run time 724.002589ms
Worst push time: 9.262009ms run time 727.544747ms
Worst push time: 7.54652ms run time 729.091587ms

JDK 11 with G1GC

Worst push time: 15.90627, run time 881
Worst push time: 15.679716, run time 896
Worst push time: 12.650266, run time 738
Worst push time: 12.516753, run time 718
Worst push time: 13.078774, run time 746
Worst push time: 12.578543, run time 724
Worst push time: 11.879806, run time 744
Worst push time: 12.496375, run time 724
Worst push time: 12.188031, run time 729
Worst push time: 12.646902, run time 735

JDK 11 with Shenandoah

Worst push time: 4.316621, run time 582
Worst push time: 3.613893, run time 577
Worst push time: 4.353042, run time 517
Worst push time: 4.33344, run time 502
Worst push time: 4.069009, run time 506
Worst push time: 3.959577, run time 501
Worst push time: 3.949561, run time 516
Worst push time: 3.726912, run time 503
Worst push time: 1.304127, run time 472
Worst push time: 1.436347, run time 489

I am no longer able to test with Zing as I am no longer with the company that had the license.

As I said, Go GC is very, very good, but not state of the art.
For Zing, see https://www.azul.com/products/zing/java-performance/ and these matched our internal tests, although at times, given certain workloads, the pauses were closer to 100 us.
For Shenandoah, you need to look at the pause times in the logs shown at http://youtu.be/qBQtbkmURiQ (go to minute 8:18). The slides are presented elsewhere, but I could not find them offhand.
Like I’ve said before though, the “pause time” is only one part of the story: the pause time might only be 1 usec, but if it happens 500,000 times a second, what is the effective “pause time”? It depends on the application, because what you are doing is lowering the progress of the application threads to a degree that they are effectively “paused”.
This is why in the other GC tests I referenced, the Go “pause time” is 7-8 ms in the given case - even though the actual pauses are far smaller (probably on the order of 1 ms) - in order to complete the user operation it takes 7 - 8 ms.
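The "effective pause time" question is simple arithmetic, shown here as a small sketch. The first pair of numbers is the example from the text (1 usec, 500,000 times a second); the second pair uses the figure of two 500 usec pauses per second quoted elsewhere in this thread. The function name is made up for the example.

```go
package main

import "fmt"

// pausedFraction returns the share of each second spent paused, given the
// length of one pause in seconds and the number of pauses per second.
func pausedFraction(pauseSeconds, pausesPerSecond float64) float64 {
	return pauseSeconds * pausesPerSecond
}

func main() {
	// 1 usec pauses occurring 500,000 times per second.
	fmt.Printf("many tiny pauses: %.2f%% of each second\n",
		pausedFraction(1e-6, 500000)*100)

	// Two pauses of 500 usec each per second.
	fmt.Printf("two 500 usec pauses: %.2f%% of each second\n",
		pausedFraction(500e-6, 2)*100)
}
```

The comparison makes the point that a very short individual pause says little on its own; frequency matters just as much for how far application threads fall behind.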
Post by Scott Cotton
Post by robert engels
I read the presentation. For 2018, it claims Go's pauses are two pauses of less than 500 usec each per second. The pause times in Azul Zing are under 100 usec for more than 1 TB of heap, and typically under 10 usec. The Azul GC often requires large heaps/head-room to be efficient (20+ GB is not uncommon), which is not the typical Go environment. Even OpenJDK Shenandoah with 100+ GB heaps has pauses of less than 500 usec.
I had not seen these Java latency numbers in the literature, could you provide a reference to both the Azul as well as the Shenandoah numbers?
Post by Scott Cotton
Hi Robert,
I’m sorry but none of what you stated is true.
I don't find that statement constructive.
I am not sure why. I was simply stating that the statements you made were not true. Honestly, your statement is more offensive if you think about it.
Post by Scott Cotton
Unless you do no dynamic memory allocation in ANY thread, all goroutines are going to be subject to the GC pause. You can delicately share memory between Go and a native thread.
Thus one option is to avoid dynamic memory allocation, and GC pauses are also a function of the amount of memory in the heap and the size of the pointer graph, which is something the programmer can work with.
GC pauses in Go are not based on the heap size. The “pause” time is based on the number of active threads and stack depth (coupled with the root object set). Still, if the GC is running a lot, it will starve the application threads of CPU (compete with them), making them “slower” due to scheduling, but this is not a pause.
And I said, you can allocate off heap memory to be shared with the native thread that is not subject to GC pauses, and run these threads in real time priority. This is a common technique in many Java libraries and it works in Go as well. I personally don’t use the technique because the pause time is no longer based on heap size (it was previously). It does avoid the overhead of converting “memory layout (e.g. strings)” between the sides.
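The thread does not say which mechanism is meant by "off heap" here. One possibility that needs no cgo on Linux and other Unix systems is an anonymous mmap, sketched below; `allocOffHeap` and `freeOffHeap` are hypothetical helper names. Memory obtained this way is not scanned or freed by the collector, so it must be released explicitly and must not be used to store Go pointers.

```go
package main

import (
	"fmt"
	"syscall"
)

// allocOffHeap maps size bytes of anonymous memory outside the Go heap.
// The returned slice is backed by the mapping, not by GC-managed memory.
func allocOffHeap(size int) ([]byte, error) {
	return syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
}

// freeOffHeap releases a mapping created by allocOffHeap.
func freeOffHeap(b []byte) error {
	return syscall.Munmap(b)
}

func main() {
	buf, err := allocOffHeap(1 << 16)
	if err != nil {
		fmt.Println("mmap failed:", err)
		return
	}
	defer freeOffHeap(buf)

	// A native thread could be handed &buf[0] through cgo and read or write
	// the same bytes; here the buffer is only touched from Go.
	buf[0] = 42
	fmt.Println("mapped", len(buf), "bytes outside the Go heap; first byte:", buf[0])
}
```

With cgo available, C.malloc and C.free would serve the same purpose; either way, synchronisation between the Go side and the native thread is still the program's responsibility.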
Post by Scott Cotton
Just use an unsafe off heap allocation so the memory is not subject to GC. Go pause times are great but nowhere near state of the art.
https://blog.golang.org/ismmkeynote
I'll take the recent keynote at ISMM as authoritative on this question for now.
I read the presentation. For 2018, it claims Go's pauses are two pauses of less than 500 usec each per second. The pause times in Azul Zing are under 100 usec for more than 1 TB of heap, and typically under 10 usec. The Azul GC often requires large heaps/head-room to be efficient (20+ GB is not uncommon), which is not the typical Go environment. Even OpenJDK Shenandoah with 100+ GB heaps has pauses of less than 500 usec.
So I will state it again, Go GC is very, very good, but it is not state of the art. It is close.
Most importantly though, this really has nothing to do with the requirements for real-time audio. I was attempting to explain how you could do it.
If you review https://making.pusher.com/golangs-real-time-gc-in-theory-and-practice/ you will see Go has 7 ms allocation pauses, probably too much based on what you’ve stated. I’ve run those tests on my machine using Go 1.11 and I see similar 7 ms pause times (my Java times using standard G1 are in the 28 ms range). This is a direct link to the relevant code: main.go <https://gitlab.com/gasche/gc-latency-experiment/blob/master/main.go>
I was only trying to be helpful, and I don’t appreciate being called out for stating something is untrue, and I don’t think that is productive.
Post by Scott Cotton
Scott
Sent from my iPhone
Post by robert engels
Hi Robert,
One other note, if you use a Go thread/routine, it is still going to be subject to GC pauses, which can vary greatly, even with OS scheduling support.
This is a completely different problem, but if it can’t be solved, the OS priority changes won’t matter.
GC Pauses are solvable by program design.
I think for very low-latency audio, you need native threads, with no dynamic memory allocation, that communicate with the Go threads via shared memory/queue.
One can't communicate between native threads and go threads without invoking OS scheduling latency,
so this has the limitations stated before between thread priorities.
Or Go needs a lower pause GC, like Azul Zing, but that is proprietary.
To my knowledge, Go has the best GC in terms of latency there is, but I've not studied the GC's to which you are referring.
The Go folks might want to investigate the Shenandoah collector in the OpenJDK, because in early tests it's pretty amazing, and open source :)
Post by Scott Cotton
Hi Robert and All,
Ralph gave us info on the jiffy in linux scheduling. Although your conclusions are in line with his data, the numbers are a little bit off. The default jiffy (roughly: the time that the scheduler gives one thread to occupy a cpu core without interruption) is 0.004 seconds. It was 0.01 seconds until some 2.6.0 kernel, when it went down to 0.001 seconds for a bit, but that was too fast, so they backed off to 0.004 shortly thereafter.
Scott Cotton
2018-09-18 17:43:18 UTC
Permalink
Another thanks for bringing it up.

Looking at the test itself, it seems not to be much of a burden or worry to
avoid such behaviour if you have control over the whole program. Clearly,
allocating like in the test in a latency sensitive context w.r.t. processing
media/audio in real time would be undesirable even if the GC pulled off the
magic to reduce latencies well below Go's current scores. But if it's in a
library where someone else controls main, then who knows.

Best,
Scott
Post by robert engels
As a follow-up, I downloaded and built OpenJDK for Java11 with Shenandoah.
I ran the tests at https://github.com/WillSewell/gc-latency-experiment on
a quiet machine (osx, core i7, 3.4 ghz, 4 cores, 8 threads), slightly
modified to perform 10 runs, and report the run time of each run.
Worst push time: 5.777907ms run time 981.361142ms
Worst push time: 6.306577ms run time 752.262192ms
Worst push time: 7.438668ms run time 723.050672ms
Worst push time: 9.169415ms run time 749.984075ms
Worst push time: 7.070763ms run time 727.326469ms
Worst push time: 7.218757ms run time 728.34274ms
Worst push time: 6.865207ms run time 723.579475ms
Worst push time: 7.135745ms run time 724.002589ms
Worst push time: 9.262009ms run time 727.544747ms
Worst push time: 7.54652ms run time 729.091587ms
JDK 11 with G1GC
Worst push time: 15.90627, run time 881
Worst push time: 15.679716, run time 896
Worst push time: 12.650266, run time 738
Worst push time: 12.516753, run time 718
Worst push time: 13.078774, run time 746
Worst push time: 12.578543, run time 724
Worst push time: 11.879806, run time 744
Worst push time: 12.496375, run time 724
Worst push time: 12.188031, run time 729
Worst push time: 12.646902, run time 735
JDK 11 with Shenandoah
Worst push time: 4.316621, run time 582
Worst push time: 3.613893, run time 577
Worst push time: 4.353042, run time 517
Worst push time: 4.33344, run time 502
Worst push time: 4.069009, run time 506
Worst push time: 3.959577, run time 501
Worst push time: 3.949561, run time 516
Worst push time: 3.726912, run time 503
Worst push time: 1.304127, run time 472
Worst push time: 1.436347, run time 489
I am no longer able to test with Zing as I am no longer with the company
that had the license.
As I said, Go GC is very, very good, but not state of the art.
For Zing, see https://www.azul.com/products/zing/java-performance/ and these
matched our internal tests, although at times, given certain workloads, the
pauses were closer to 100 us.
For Shenandoah, you need to look at the pause times in the logs shown at
http://youtu.be/qBQtbkmURiQ (go to minute 8:18). The slides
are presented elsewhere, but I could not find them offhand.
Like I’ve said before though, the ‘pause time’ is only one part of the
story - the pause time might only be 1 usec, but if it happens 500,000
times a second, what is the effective “pause time”? Depends on the
application, because what you are doing is lowering the progress of the
application threads to a degree that they are effectively “paused”.
This is why in the other GC tests I referenced, the Go “pause time” is 7-8
ms in the given case - even though the actual pauses are far smaller
(probably on the order of 1 ms) - in order to complete the user operation
it takes 7 - 8 ms.
Post by Scott Cotton
Post by robert engels
I read the presentation. For 2018, it claims Go's pauses are two pauses
of less than 500 usec each per second. The pause times in Azul Zing are
under 100 usec for more than 1 TB of heap, and typically under 10 usec.
The Azul GC often requires large heaps/head-room to be efficient (20+ GB
is not uncommon), which is not the typical Go environment. Even OpenJDK
Shenandoah with 100+ GB heaps has pauses of less than 500 usec.
I had not seen these Java latency numbers in the literature, could you
provide a reference to both the Azul as well as the Shenandoah numbers?
Post by Scott Cotton
Hi Robert,
Post by robert engels
I’m sorry but none of what you stated is true.
I don't find that statement constructive.
I am not sure why. I was simply stating that the statements you made were
not true. Honestly, your statement is more offensive if you think about it.
Unless you do no dynamic memory allocation in ANY thread, all goroutines
Post by robert engels
are going to be subject to the GC pause. You can delicately share memory
between Go and a native thread.
Thus one option is to avoid dynamic memory allocation, and GC pauses are
also a function of the amount of memory in the heap and the size of the
pointer graph, which is something the programmer can work with.
GC pauses in Go are not based on the heap size. The “pause” time is
based on the number of active threads and stack depth (coupled with the root
object set). Still, if the GC is running a lot, it will starve the
application threads of CPU (compete with them), making them “slower” due to
scheduling, but this is not a pause.
And I said, you can allocate off heap memory to be shared with the native
thread that is not subject to GC pauses, and run these threads in real time
priority. This is a common technique in many Java libraries and it works in
Go as well. I personally don’t use the technique because the pause time is
no longer based on heap size (it was previously). It does avoid the
overhead of converting “memory layout (e.g. strings)” between the sides.
Post by robert engels
Just use an unsafe off heap allocation so the memory is not subject to
GC. Go pause times are great but nowhere near state of the art.
https://blog.golang.org/ismmkeynote
I'll take the recent keynote at ISMM as authoritative on this question for now.
I read the presentation. For 2018, it claims Go's pauses are two pauses
of less than 500 usec each per second. The pause times in Azul Zing are
under 100 usec for more than 1 TB of heap, and typically under 10 usec.
The Azul GC often requires large heaps/head-room to be efficient (20+ GB
is not uncommon), which is not the typical Go environment. Even OpenJDK
Shenandoah with 100+ GB heaps has pauses of less than 500 usec.
So I will state it again, Go GC is very, very good, but it is not state
of the art. It is close.
Most importantly though, this really has nothing to do with the
requirements for real-time audio. I was attempting to explain how you could
do it.
If you review
https://making.pusher.com/golangs-real-time-gc-in-theory-and-practice/
you will see Go has 7 ms allocation pauses, probably too much based on what
you’ve stated. I’ve run those tests on my machine using Go 1.11 and I see
similar 7 ms pause times (my Java times using standard G1 are in the 28 ms
range). This is a direct link to the relevant code: main.go
<https://gitlab.com/gasche/gc-latency-experiment/blob/master/main.go>
I was only trying to be helpful, and I don’t appreciate being called out
for stating something is untrue, and I don’t think that is productive.
Scott
Post by robert engels
Sent from my iPhone
Hi Robert,
Post by robert engels
One other note, if you use a Go thread/routine, it is still going to
be subject to GC pauses, which can vary greatly, even with OS scheduling
support.
This is a completely different problem, but if it can’t be solved, the
OS priority changes won’t matter.
GC Pauses are solvable by program design.
Post by robert engels
I think for very low-latency audio, you need native threads, with no
dynamic memory allocation, that communicate with the Go threads via shared
memory/queue.
One can't communicate between native threads and go threads without
invoking OS scheduling latency,
so this has the limitations stated before between thread priorities.
Post by robert engels
Or Go needs a lower pause GC, like Azul Zing, but that is proprietary.
To my knowledge, Go has the best GC in terms of latency there is, but
I've not studied the GCs to which you are referring.
Post by robert engels
The Go folks might want to investigate the Shenandoah collector in the
OpenJDK, because in early tests it's pretty amazing and open source :)
Hi Robert and All,
Ralph gave us info on the jiffy in linux scheduling. Although your
conclusions are in line with his data, the numbers are a little bit off.
The default jiffy (roughly: the time that the scheduler gives one thread to
occupy a cpu core without interruption) is 0.004 seconds. It was 0.01
seconds until some 2.6.0 kernel, when it went down to 0.001 seconds for a
bit, but that was too fast, so they backed off to 0.004 shortly thereafter.
This is why reliable low latency audio uses special thread priorities.
It is used in Android AAudio, and Apple CoreAudio for this purpose, which I
would classify as widespread use in real applications. Although you can
get low latency in benchmarks on unloaded machines where you don't have to
wait for a jiffy, this is not considered reliable in audio systems unless
you have dedicated hardware. For similar reasons, it is also not considered
reliable to have sys calls like sigaltstack in cgo callbacks on audio
processing/rendering threads. This portaudio mailing list message
<https://lists.columbia.edu/pipermail/portaudio/2018-September/001530.html> is
a good description.
In answer to the question of how far we can go in audio without
scheduling priorities in the go runtime, it seems to me there are the
following known issues:
1. no goroutines and sigaltstack/cgo->go overhead (which involves a sys
call) in callbacks on host supplied realtime threads.
2. Go's runtime can't distinguish OS thread scheduling differences in any way.
A simple conclusion is that low latency audio apps in Go are unreliable
on most platforms as the runtime+cgo mechanism stands today. It doesn't
matter how well something is programmed in pure Go or how smartly the work
is divided between cgo and go. It doesn't even matter if someone
benchmarks their system and claims to have had reliable low latency (like
me and Bryan), because these issues are caused by the relationship between
go and the host OS thread scheduling and the widespread need for special
thread priorities in audio systems.
I have a goal of making reliable low latency audio apps in Go. I think
it is a reasonable goal since Go is a good general purpose language, with
both low and high level features. But I think these issues together are a
stopper for reliable latency under about twice a jiffy, which doesn't
really quite fall into the low latency category.
Ian suggested the TODO in dropm in runtime/proc.go. This would help
issue 1. I have started looking at it and it seems to me so far (the code
is pretty deep to learn overnight, so take with a grain of salt) that in any
event the cgo->go and go->cgo directions would involve sigaltstack even with
the improvement suggested in the TODO. Any runtime/proc.go gurus willing
to comment?
Robert suggested adding runtime functions to define thread priorities
and affinities for groups of goroutines.
This would solve issue 2 and to some extent obviate a need for solving
1. Ian agreed that something like that was necessary but details were
unclear.
I have started looking at how to make progress on that more concrete.
I have asked for help w.r.t. plan9 and windows and the various js host
targets (where I guess this functionality shouldn't be supported), with no
response as of yet.
Best,
Scott
Post by robert engels
The linux kernel can perform context switches in under 5 usecs on
“standard hardware”. In the case of equal priorities, I believe the
standard time slice is 100 ms (although 64 ms in many systems). So without
scheduling priority control, if a program needs anything under 100 ms it is
not even close to being guaranteed in a general purpose linux install.
Post by Robert Engels
I would be surprised if it would be needed for audio as well, given the
GC pause times, and also that the OS drivers are buffered.
I was referring more to HPC systems and cache locality. Hard to
achieve with thousands of Goroutines if you can’t group and isolate them.
The OS driver buffers for low latency audio represent a real time
duration below the default OS thread scheduling latency. For Go,
I believe the up front latency cost would be OS thread scheduling
latency + GC pause times. The GC latency improvements are great, and
enable a lot, but the GC operates within the context of OS thread
scheduling. A Go app with 1ms GC latency would have
by default on linux a 0.004 sec latency + the GC latency should the OS
need to prioritise something else first.
In other languages (Java, C), a 0.004 sec latency resulting from
scheduling is less likely because of support for thread scheduling. That
is, they are more reliable than full Go with goroutines.
It is unfortunate to have to do CGO with no goroutines, and to have
sigaltstack overhead and whatnot associated with it, on a callback given to
a host sound system to run on a high priority thread.
I like your idea of adding to the runtime grouping, scheduling class,
and affinity very much for audio.
Best,
Scott
Post by Robert Engels
On Sep 13, 2018, at 7:53 AM, 'Bryan C. Mills' via golang-dev <
- From a privileged C thread, invoke a cgo-exported Go function.
The Go function can loop (without returning) to perform whatever real-time
work is needed, using buffered channels to communicate with the rest of the
program (and thereby avoid blocking the privileged thread).
- In other goroutines, perform any background work that does not
need real-time scheduling (such as pre-rendering or decoding chunks of
audio).
FWIW, I have done a couple of experiments with real-time audio in Go
in the past. In 2013 it was possible to get acceptable latency
characteristics for interactive performance on a Linux desktop machine
(using the ALSA C API) without any special scheduling, provided that the
main loop did not allocate. Given the GC latency improvements since then, I
would be surprised if the “do not allocate” proviso is even still needed.
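The design described above can be sketched in pure Go as follows. This is only an illustration under stated assumptions: the cgo export and the privileged C thread are left out, the names renderLoop and framesPerBuffer are hypothetical, and the use of select with a default case (so the loop never waits on the channel) is one possible reading of "avoid blocking the privileged thread", not something the thread specifies.

```go
package main

import "fmt"

// framesPerBuffer is deliberately tiny; a real buffer would be much larger.
const framesPerBuffer = 4

// renderLoop plays the role of the real-time loop. It never blocks on the
// channel: if no pre-rendered chunk is ready, it outputs silence and counts
// an underrun. No allocation happens inside the loop.
func renderLoop(chunks <-chan []float32, callbacks int, out func([]float32)) (underruns int) {
	silence := make([]float32, framesPerBuffer) // allocated once, up front
	for i := 0; i < callbacks; i++ {
		select {
		case c := <-chunks:
			out(c)
		default:
			underruns++
			out(silence)
		}
	}
	return underruns
}

func main() {
	chunks := make(chan []float32, 8) // buffered, so the producer rarely waits
	done := make(chan struct{})

	// Background goroutine: work that does not need real-time scheduling,
	// standing in for pre-rendering or decoding audio.
	go func() {
		defer close(done)
		for i := 0; i < 3; i++ {
			chunk := make([]float32, framesPerBuffer)
			for j := range chunk {
				chunk[j] = float32(i + 1)
			}
			chunks <- chunk
		}
	}()
	<-done // wait for the producer only to keep this demo deterministic

	played := 0
	underruns := renderLoop(chunks, 5, func(b []float32) { played += len(b) })
	fmt.Println("frames played:", played, "underruns:", underruns)
}
```

In a real program the loop body would run on the host-supplied thread and write into the device buffer; whether the channel operations themselves are cheap enough on such a thread is exactly what the rest of this thread debates.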
Post by Scott Cotton
Thanks Ian,
For audio, there is a tendency to have user-land but OS-privileged layer code
that uses special thread scheduling. For example, AAudio (SCHED_FIFO) and
Apple CoreAudio (not sure about the details of how it relates to Darwin's
scheduler, but it is "real time" according to Apple) do this. There is a
strong consensus that this is necessary for reliable scheduling of real-time
audio (although I haven't personally had any apparently scheduling-related
problems myself outside of real-time thread context).
At any rate, there are different levels of interaction with Go implied by this.
At the level of unprivileged access, Go would need to operate on threads
supplied by the above systems. Presumably, this would be via cgo->go calls.
Ian: I was wondering if the improvements you suggested were related to
setting up the goroutine on the foreign thread the first time, or w.r.t.
checking the pointers and everything for Go GC?
At the level of privileged access, Go could potentially eventually offer a
replacement for things like AAudio and CoreAudio. It could use the native
interface (either cgo or sys calls, depending) to generate such specially
scheduled threads, and then use cgo->go to start goroutines on them.
In this case, I would imagine it would be nice to be able to have M:N
goroutines to threads. To my understanding, this is not currently possible
with goroutines locked to threads, and probably would violate some safety
assumptions put on for foreign threads in other types of applications.
But in this case, Go would control the "foreign" thread creation.
The M:N idea would in my estimation also be useful if applied in the case of
unprivileged access. It would I guess mostly take the form of old
GOMAXPROCS=1 type behaviour.
My question to golang-dev as a whole is whether it seems feasible to try to
make interoperability with OS special scheduling characteristics of threads
better, perhaps along the lines above, and whether anyone knows of other
applications that fall in the category of special OS thread scheduling (not
cpu affinity) that would benefit?
Best
Scott
Post by Scott Cotton
I think LockOSThread could be used in these contexts, but it would be
impossible to do without risk on the first scheduling of a foreign specially
scheduled or real-time thread. It would also apparently have the problems
in the issue cited below.
I don't know much about all this, I'll just say that when calling Go from
a thread that was not started by Go the Go code will start in a
goroutine that is locked to the thread. You don't need to use
LockOSThread yourself for that case, so there shouldn't be any
scheduling issue. Of course any new goroutines that you start will
run on different, newly created, threads.
Ian
Scott Cotton
2018-09-17 09:12:21 UTC
Permalink
Hi all,

After looking at the runtime and some thought about the scheduling of special
priority threads, it has occurred to me that there might be a simple solution
to making the full Go runtime work (channels, goroutines, etc.) with special
OS thread scheduling.

The idea would be to use pthread_attr_setinheritsched and set it to
PTHREAD_INHERIT_SCHED (maybe under some kind of option) when creating a new
thread.

Although actually getting this working in runtime looks like a hefty task,
and although this would not allow mixing different thread scheduling in one
Go runtime, it seems to me it would allow a Go program to run with all
threads specially scheduled provided that it was launched by a
thread/process with the desired OS scheduling, thus enabling the possibility
of using Go in contexts like a real-time audio processing chain which use
special OS thread scheduling.

It also seems like it would be much simpler than modifying the runtime to
know about OS scheduling priorities.

Any thoughts appreciated.

Best,
Scott
Robert Engels
2018-09-17 12:47:55 UTC
Permalink
I would think you could do that now: just start the program, on linux at least, using chrt.
See https://www.cyberciti.biz/faq/howto-set-real-time-scheduling-priority-process/

The problem is that there are quite a few internal threads; they should inherit the priority as well, since that is the default.
A further problem is that if the internal runtime is already using priorities for the scheduler, all sorts of bad things might happen.
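
One way to check what a Go program actually ends up with when launched this way is to ask the kernel for the scheduling policy of the current thread. The following is a minimal, Linux-only sketch that calls sched_getscheduler through the standard syscall package; the helper names policyName and currentPolicy are hypothetical, and the numeric policy values are the ones defined by the Linux headers. It reports only the thread it happens to run on, so it does not by itself settle whether every runtime thread inherits the policy.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// schedResetOnFork is the SCHED_RESET_ON_FORK flag, which Linux may OR into
// the value returned by sched_getscheduler.
const schedResetOnFork = 0x40000000

// policyName maps a Linux scheduling policy number to its symbolic name.
func policyName(p int) string {
	switch p &^ schedResetOnFork {
	case 0:
		return "SCHED_OTHER"
	case 1:
		return "SCHED_FIFO"
	case 2:
		return "SCHED_RR"
	case 3:
		return "SCHED_BATCH"
	case 5:
		return "SCHED_IDLE"
	case 6:
		return "SCHED_DEADLINE"
	}
	return fmt.Sprintf("unknown(%d)", p)
}

// currentPolicy returns the scheduling policy of the calling OS thread.
// The goroutine is locked to its thread so the answer refers to one thread.
func currentPolicy() (int, error) {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	// sched_getscheduler(0) queries the calling thread.
	p, _, errno := syscall.RawSyscall(syscall.SYS_SCHED_GETSCHEDULER, 0, 0, 0)
	if errno != 0 {
		return 0, errno
	}
	return int(p), nil
}

func main() {
	p, err := currentPolicy()
	if err != nil {
		fmt.Println("sched_getscheduler failed:", err)
		return
	}
	fmt.Println("scheduling policy of this thread:", policyName(p))
}
```

Running the binary normally and then again under chrt, and comparing the two lines of output, gives a quick indication of whether the policy reached the thread that executes main.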
Scott Cotton
2018-09-17 14:48:15 UTC
Permalink
Post by Robert Engels
I would think you could do that now, just start the program, on linux at least, using
chrt.
See https://www.cyberciti.biz/faq/howto-set-real-time-scheduling-priority-process/
The problem is that there are quite a few internal threads; they should inherit
the priority as well, since that is the default.
Yes. My reading of pthread_attr_setinheritsched from man7.org
<http://man7.org/linux/man-pages/man3/pthread_attr_setinheritsched.3.html> is
that by default the scheduling is
not inherited, except in the case of the bug described at the bottom of that page:

"""
BUGS

As at glibc 2.8, if a thread attributes object is initialized using
pthread_attr_init(3)
<http://man7.org/linux/man-pages/man3/pthread_attr_init.3.html>, then
the scheduling policy of the attributes
object is set to SCHED_OTHER and the scheduling priority is set to 0.
However, if the inherit-scheduler attribute is then set to
PTHREAD_EXPLICIT_SCHED, then a thread created using the attribute
object wrongly inherits its scheduling attributes from the creating
thread. This bug does not occur if either the scheduling policy or
scheduling priority attribute is explicitly set in the thread
attributes object before calling pthread_create(3)
<http://man7.org/linux/man-pages/man3/pthread_create.3.html>.

"""


I think the only way to actually do it correctly and ensure the
behaviour is as desired would be to introduce calls that make
the result independent of bugs like the one above.
Post by Robert Engels
The problem is that if the internal runtime is already using priorities
for the scheduler all sorts of bad things might happen.
It doesn't look to me like pthread scheduling is currently manipulated
in the runtime. It is however opaque because of the optimisations and
trampolining around pthread function calls. My assessment comes only
from perusing and grepping for where it would seem such things would
occur. But with all the assembly related to pthreads in the runtime, I
could have missed something.
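
One way to check this empirically rather than by grepping might look like the sketch below. It is Linux only and makes some assumptions: it issues the raw sched_getscheduler(2) system call through the syscall package, and the helper names policyName and currentPolicy are made up for illustration. It prints the policy of the main thread and of a second thread created by the runtime, so running it under chrt should show whether the policy was inherited.

```go
// Linux only: report the OS scheduling policy seen by Go threads.
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// Policy numbers from the Linux <sched.h> header.
const (
	schedOther       = 0
	schedFIFO        = 1
	schedRR          = 2
	schedBatch       = 3
	schedIdle        = 5
	schedDeadline    = 6
	schedResetOnFork = 0x40000000 // flag that may be ORed into the policy
)

// policyName (hypothetical helper) maps a policy number to its name.
func policyName(p int) string {
	switch p &^ schedResetOnFork {
	case schedOther:
		return "SCHED_OTHER"
	case schedFIFO:
		return "SCHED_FIFO"
	case schedRR:
		return "SCHED_RR"
	case schedBatch:
		return "SCHED_BATCH"
	case schedIdle:
		return "SCHED_IDLE"
	case schedDeadline:
		return "SCHED_DEADLINE"
	}
	return fmt.Sprintf("unknown(%d)", p)
}

// currentPolicy (hypothetical helper) returns the scheduling policy of the
// calling OS thread. A pid argument of 0 means "the calling thread".
func currentPolicy() (int, error) {
	r, _, errno := syscall.Syscall(syscall.SYS_SCHED_GETSCHEDULER, 0, 0, 0)
	if errno != 0 {
		return 0, errno
	}
	return int(r), nil
}

func main() {
	// Pin the main goroutine so the query below runs on the main thread.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	p, err := currentPolicy()
	if err != nil {
		fmt.Println("main thread:", err)
		return
	}
	fmt.Println("main thread: ", policyName(p))

	// The main goroutine owns its thread, so this goroutine has to run on
	// a different thread, one that the Go runtime created.
	done := make(chan string)
	go func() {
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		p, err := currentPolicy()
		if err != nil {
			done <- err.Error()
			return
		}
		done <- policyName(p)
	}()
	fmt.Println("other thread:", <-done)
}
```

Running the binary plainly and then under something like "chrt -f 10" (which needs the right privileges) and comparing the two lines would show what the runtime's threads end up with on a given system.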

If OS scheduling inheritance were under an option or environment
variable or build tag then it wouldn't in principle prevent future or
other efforts to introduce OS scheduling into the runtime.


Scott
Post by Robert Engels
Hi all,
After looking at runtime and some thought about the scheduling of special
priority threads, it has
occurred to me that there might be a simple solution to making the full Go
runtime work (channels, goroutines, etc) with special OS thread scheduling.
The idea would be to use pthread_attr_setinheritsched and set it to
PTHREAD_INHERIT_SCHED (maybe under some kind of option)
when creating a new thread.
Although actually getting this working in runtime looks like a hefty task,
and although this would not allow mixing different thread scheduling in one
Go runtime, it seems to me it would allow a Go program to run with all
threads specially scheduled provided that
it was launched by a thread/process with the desired OS scheduling, thus
enabling the possibility of using Go in contexts like a real-time audio
processing chain which use special OS thread scheduling.
It also seems like it would be much simpler than modifying the runtime to
know about OS scheduling priorities.
Any thoughts appreciated.
Best,
Scott
Post by Scott Cotton
Hi Robert and All,
Ralph gave us info on the jiffy in linux scheduling. Although your
conclusions are in line with his data, the numbers are a little bit off.
The default jiffy (roughly: the time that the scheduler gives one thread to
occupy a cpu core without interruption) is 0.004 seconds. It was 0.01
seconds until some 2.6.0 kernel, when it went down to 0.001 seconds for a
bit, but that was too fast, so they backed off to 0.004 shortly thereafter.
This is why reliable low latency audio uses special thread priorities.
It is used in Android AAudio, and Apple CoreAudio for this purpose, which I
would classify as widespread use in real applications. Although you can
get low latency in benchmarks on unloaded machines where you don't have to
wait for a jiffy, this is not considered reliable in audio systems unless
you have dedicated hardware. For similar reasons, it is also not considered
reliable to have sys calls like sigaltstack in cgo callbacks on audio
processing/rendering threads. this portaudio mailing list message
<https://lists.columbia.edu/pipermail/portaudio/2018-September/001530.html> is
a good description.
In answer to the question of how far we can go in audio without
scheduling priorities in the go runtime, it seems to me there are the
following issues:
1. no goroutines and sigaltstack/cgo->go overhead (which involves a sys
call) in callbacks on host supplied realtime threads.
2. Go's runtime can't distinguish OS thread scheduling differences in any way.
A simple conclusion is that low latency audio apps in Go are unreliable
on most platforms as the runtime+cgo mechanism stands today. It doesn't
matter how well something is programmed in pure Go or how smartly the work
is divided between cgo and go. It doesn't even matter if someone
benchmarks their system and claims to have had reliable low latency (like
me and Bryan), because these issues are caused by the relationship between
go and the host OS thread scheduling and the widespread need for special
thread priorities in audio systems.
I have a goal of making reliable low latency audio apps in Go. I think it
is a reasonable goal since Go is a good general purpose language, with both
low and high level features. But I think these issues together are a
stopper for reliable latency under about twice a jiffy, which doesn't
really quite fall into the low latency category.
Ian suggested the TODO in dropm in runtime/proc.go. This would help
issue 1. I have started looking at it and it seems so far (the code is
pretty deep to learn overnight, so take with a grain of salt): in any event
the cgo->go and go->cgo directions would involve sigaltstack even with
the improvement suggested in the TODO. Any runtime/proc.go gurus willing
to comment?
Robert suggested adding runtime functions to define thread priorities and
affinities for groups of goroutines.
This would solve issue 2 and to some extent obviate a need for solving
1. Ian agreed that something like that was necessary but details were
unclear.
I have started looking at how to make progress on that more concrete. I
have asked for help w.r.t.
plan9 and windows and the various js host targets (where I guess this
functionality shouldn't be supported)
with no response as of yet.
Best,
Scott
Post by robert engels
The linux kernel can perform context switches in under 5 usecs on
“standard hardware”. In the case of equal priorities, I believe the
standard time slice is 100 ms (although 64 ms in many systems). So without
scheduling priority control, if a program needs anything under 100 ms it is
not even close to being guaranteed in a general purpose linux install.
Post by Robert Engels
I would be surprised it would be needed for audio as well, given the GC
pause times, and also that the OS drivers are buffered.
I was referring more to HPC systems and cache locality. Hard to achieve
with thousands of Goroutines if you can’t group and isolate them.
The OS driver buffers for low latency audio represent a real time
duration below the default OS thread scheduling latency. For Go,
I believe the up front latency cost would be OS thread scheduling
latency + GC pause times. The GC latency improvements are great, and
enable a lot, but the GC operates within the context of OS thread
scheduling. A Go app with 1ms GC latency would have
by default on linux a 0.004 sec latency + the GC latency should the OS
need to prioritise something else first.
In other languages (Java, C) the likelihood of 0.004 sec latency
resulting from scheduling happens less often because of support for thread
scheduling. That is, they are more reliable than full Go with goroutines.
It is unfortunate to have to do CGO with no goroutines,
and have sigaltstack overhead and what not associated with it on a
callback given to a host sound system to run on a high priority thread.
I like your idea of adding to the runtime grouping, scheduling class,
and affinity very much for audio.
Best,
Scott
Post by Robert Engels
On Sep 13, 2018, at 7:53 AM, 'Bryan C. Mills' via golang-dev <
- From a privileged C thread, invoke a cgo-exported Go function.
The Go function can loop (without returning) to perform whatever real-time
work is needed, using buffered channels to communicate with the rest of the
program (and thereby avoid blocking the privileged thread).
- In other goroutines, perform any background work that does not
need real-time scheduling (such as pre-rendering or decoding chunks of
audio).
FWIW, I have done a couple of experiments with real-time audio in Go in
the past. In 2013 it was possible to get acceptable latency characteristics
for interactive performance on a Linux desktop machine (using the ALSA C
API) without any special scheduling, provided that the main loop did not
allocate. Given the GC latency improvements since then, I would be
surprised if the “do not allocate” proviso is even still needed.
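
The Go side of the pattern described above might look roughly like this. It is a pure Go sketch with the cgo export and the privileged C thread left out; the function name render, the chunk size, and the channel capacity are all assumptions made for illustration. The real-time loop never blocks and does not allocate: if no pre-rendered chunk is ready it writes silence.

```go
package main

import "fmt"

// render fills out from the next pre-rendered chunk if one is ready.
// It never blocks: on underrun it writes silence and returns false.
func render(out []float32, ready <-chan []float32) bool {
	select {
	case chunk := <-ready:
		n := copy(out, chunk)
		for i := n; i < len(out); i++ {
			out[i] = 0 // pad a short chunk with silence
		}
		return true
	default:
		for i := range out {
			out[i] = 0
		}
		return false
	}
}

func main() {
	const frames = 4
	// Buffered channel between background work and the real-time loop.
	ready := make(chan []float32, 8)

	// Background goroutine: does the allocating, non-real-time work.
	done := make(chan struct{})
	go func() {
		defer close(done)
		for i := 1; i <= 3; i++ {
			chunk := make([]float32, frames)
			for j := range chunk {
				chunk[j] = float32(i)
			}
			ready <- chunk
		}
	}()
	<-done

	// Stand-in for the loop that would run on the privileged thread.
	out := make([]float32, frames)
	for i := 0; i < 4; i++ {
		ok := render(out, ready)
		fmt.Println(ok, out)
	}
}
```

The select with a default case is what keeps the real-time side from waiting on the rest of the program; the background side may block on the channel freely.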
Post by Scott Cotton
Thanks Ian,
For audio, there is a tendency to have user land but OS privileged layer code
that uses special thread scheduling. For example, AAudio (SCHED_FIFO)
and Apple CoreAudio
(not sure about the details of how it relates to darwins scheduler,
but it is "real time" according to Apple) do this. There is a strong
consensus that this is necessary for reliable
scheduling of real-time audio (although I haven't personally had any
apparently scheduling
related problems myself outside of real-time thread context)
At any rate, there are different levels of interaction with Go implied by this.
At the level of unprivileged access, Go would need to operate on threads supplied
by the above systems. Presumably, this would be via cgo->go calls.
Ian: Was wondering
if the improvements you suggested were related to setting up the Goroutine on the
foreign thread the first time, or w.r.t. checking the pointers and
everything for Go gc?
At the level of privileged access, Go could potentially eventually
offer a replacement for
things like AAudio and CoreAudio. It could use the native interface
(either cgo or sys calls, depending)
to generate such specially scheduled threads, and then use cgo->go to
start goroutines on them.
In this case, I would imagine it would be nice to be able to have M:N
goroutines to threads.
To my understanding, this is not currently possible with Goroutines
locked to threads, and
probably would violate some safety assumptions put on for foreign
threads in other types of applications.
But in this case, Go would control the "foreign" thread creation.
The M:N idea would in my estimation also be useful if applied in the
case of unprivileged
access. It would I guess mostly take the form of old GOMAXPROCS=1 type behaviour.
My question to golang-dev as a whole is if it seems feasible to try to
make interoperability with
OS special scheduling characteristics of threads better, perhaps along
the lines above, and if anyone knows of other applications that fall in the
category of special OS thread scheduling (not cpu affinity) that would
benefit?
Best
Scott
Post by Scott Cotton
I think LockOSThread could be used in these contexts, but it would be
impossible to do without risk on the first scheduling of a foreign specially
scheduled or real-time thread. It would also apparently have the problems
in the issue cited below.
I don't know much about all this, I'll just say that when calling Go from
a thread that was not started by Go the Go code will start in a
goroutine that is locked to the thread. You don't need to use
LockOSThread yourself for that case, so there shouldn't be any
scheduling issue. Of course any new goroutines that you start will
run on different, newly created, threads.
Ian
--
Scott Cotton
http://www.iri-labs.com
robert engels
2018-09-17 15:06:41 UTC
Permalink
According to the docs,

"The default setting of the inherit-scheduler attribute in a newly initialized thread attributes object is PTHREAD_INHERIT_SCHED"

so if the runtime doesn’t manipulate it, the scheduling should be inherited.
Post by Robert Engels
I would think you could do that now, just start the program, on linux at least, using
chrt.
See https://www.cyberciti.biz/faq/howto-set-real-time-scheduling-priority-process/
The problem there are quite a few internal threads, they should inherit the priority as well since that is the default.
Yes. My reading of pthread_attr_setinheritsched from man7.org <http://man7.org/linux/man-pages/man3/pthread_attr_setinheritsched.3.html> is that by default the scheduling is
not inherited, except for the case of the bug at the bottom
"""
BUGS
As at glibc 2.8, if a thread attributes object is initialized using
pthread_attr_init(3) <http://man7.org/linux/man-pages/man3/pthread_attr_init.3.html>, then the scheduling policy of the attributes
object is set to SCHED_OTHER and the scheduling priority is set to 0.
However, if the inherit-scheduler attribute is then set to
PTHREAD_EXPLICIT_SCHED, then a thread created using the attribute
object wrongly inherits its scheduling attributes from the creating
thread. This bug does not occur if either the scheduling policy or
scheduling priority attribute is explicitly set in the thread
attributes object before calling pthread_create(3) <http://man7.org/linux/man-pages/man3/pthread_create.3.html>.
"""
I think the only way to actually do it correctly and ensure the behaviour was as desired would be to introduce calls that would make the result independent of bugs like above.
The problem is that if the internal runtime is already using priorities for the scheduler all sorts of bad things might happen.
It doesn't look to me like pthread scheduling is currently manipulated in runtime. It is however opaque because of the optimisations and trampolining around pthread function calls. my assessment is only from perusing and grepping for where it would seem such things would occur. But with all the assembly related to pthreads in runtime, I could have missed something.
If OS scheduling inheritance were under an option or environmental variable or build tag then it wouldn't in principle prevent future or other efforts to introduce OS scheduling into the runtime.
Scott
Post by Scott Cotton
Hi all,
After looking at runtime and some thought about the scheduling of special priority threads, it has
occurred to me that there might be a simple solution to making the full Go runtime work (channels, goroutines, etc) with special OS thread scheduling.
The idea would be to use pthread_attr_setinheritsched and set it to PTHREAD_INHERIT_SCHED (maybe under some kind of option)
when creating a new thread.
Although actually getting this working in runtime looks like a hefty task, and although this would not allow mixing different thread scheduling in one Go runtime, it seems to me it would allow a Go program to run with all threads specially scheduled provided that
it was launched by a thread/process with the desired OS scheduling, thus enabling the possibility of using Go in contexts like a real-time audio processing chain which use special OS thread scheduling.
It also seems like it would be much simpler than modifying the runtime to know about OS scheduling priorities.
Any thoughts appreciated.
Best,
Scott
Hi Robert and All,
Ralph gave us info on the jiffy in linux scheduling. Although your conclusions are in line with his data, the numbers are a little bit off. The default jiffy (roughly: the time that the scheduler gives one thread to occupy a cpu core without interruption) is 0.004 seconds. It was 0.01 seconds until some 2.6.0 kernel, when it went down to 0.001 seconds for a bit, but that was too fast, so they backed off to 0.004 shortly thereafter.
This is why reliable low latency audio uses special thread priorities. It is used in Android AAudio, and Apple CoreAudio for this purpose, which I would classify as widespread use in real applications. Although you can get low latency in benchmarks on unloaded machines where you don't have to wait for a jiffy, this is not considered reliable in audio systems unless you have dedicated hardware. For similar reasons, it is also not considered reliable to have sys calls like sigaltstack in cgo callbacks on audio processing/rendering threads. this portaudio mailing list message <https://lists.columbia.edu/pipermail/portaudio/2018-September/001530.html> is a good description.
1. no goroutines and sigaltstack/cgo->go overhead (which involves a sys call) in callbacks on host supplied realtime threads.
2. Go's runtime can't distinguish OS thread scheduling differences in any way.
A simple conclusion is that low latency audio apps in Go are unreliable on most platforms as the runtime+cgo mechanism stands today. It doesn't matter how well something is programmed in pure Go or how smartly the work is divided between cgo and go. It doesn't even matter if someone benchmarks their system and claims to have had reliable low latency (like me and Brian), because these issues are caused by the relationship between go and the host OS thread scheduling and the widespread need for special thread priorities in audio systems.
I have a goal of making reliable low latency audio apps in Go. I think it is a reasonable goal since Go is a good general purpose language, with both low and high level features. But I think these issues together are a stopper for reliable latency under about twice a jiffy, which doesn't really quite fall into the low latency category.
Ian suggested the TODO in dropm in runtime/proc.go. This would help issue 1. I have started looking at it and it seems so far (the code is pretty deep to learn overnight, so take with a grain of salt): in any event
the cgo->go and go->cgo directions would involve sigaltstack even with the improvement suggested in the TODO. Any runtime/proc.go gurus willing to comment?
Robert suggested adding runtime functions to define thread priorities and affinities for groups of goroutines.
This would solve issue 2 and to some extent obviate a need for solving 1. Ian agreed that something like that was necessary but details were unclear.
I have started looking at how to make progress on that more concrete. I have asked for help w.r.t.
plan9 and windows and the various js host targets (where I guess this functionality shouldn't be supported)
with no response as of yet.
Best,
Scott
The linux kernel can perform context switches in under 5 usecs on “standard hardware”. In the case of equal priorities, I believe the standard time slice is 100 ms (although 64 ms in many systems). So without scheduling priority control, if a program needs anything under 100 ms it is not even close to being guaranteed in a general purpose linux install.
Post by Robert Engels
I would be surprised it would be needed for audio as well, given the GC pause times, and also that the OS drivers are buffered.
I was referring more to HPC systems and cache locality. Hard to achieve with thousands of Goroutines if you can’t group and isolate them.
The OS driver buffers for low latency audio represent a real time duration below the default OS thread scheduling latency. For Go,
I believe the up front latency cost would be OS thread scheduling latency + GC pause times. The GC latency improvements are great, and enable a lot, but the GC operates within the context of OS thread scheduling. A Go app with 1ms GC latency would have
by default on linux a 0.004 sec latency + the GC latency should the OS need to prioritise something else first.
In other languages (Java, C) the likelihood of 0.004 sec latency resulting from scheduling happens less often because of support for thread scheduling. That is, they are more reliable than full Go with goroutines. It is unfortunate to have to do CGO with no goroutines,
and have sigaltstack overhead and what not associated with it on a callback given to a host sound system to run on a high priority thread.
I like your idea of adding to the runtime grouping, scheduling class, and affinity very much for audio.
Best,
Scott
From a privileged C thread, invoke a cgo-exported Go function. The Go function can loop (without returning) to perform whatever real-time work is needed, using buffered channels to communicate with the rest of the program (and thereby avoid blocking the privileged thread).
In other goroutines, perform any background work that does not need real-time scheduling (such as pre-rendering or decoding chunks of audio).
FWIW, I have done a couple of experiments with real-time audio in Go in the past. In 2013 it was possible to get acceptable latency characteristics for interactive performance on a Linux desktop machine (using the ALSA C API) without any special scheduling, provided that the main loop did not allocate. Given the GC latency improvements since then, I would be surprised if the “do not allocate” proviso is even still needed.
Thanks Ian,
For audio, there is a tendency to have user land but OS privileged layer code
that uses special thread scheduling. For example, AAudio (SCHED_FIFO) and Apple CoreAudio
(not sure about the details of how it relates to darwins scheduler, but it is "real time" according to Apple) do this. There is a strong consensus that this is necessary for reliable
scheduling of real-time audio (although I haven't personally had any apparently scheduling
related problems myself outside of real-time thread context)
At any rate, there are different levels of interaction with Go implied by this.
At the level of unprivileged access, Go would need to operate on threads supplied
by the above systems. Presumably, this would be via cgo->go calls. Ian: Was wondering
if the improvements you suggested were related to setting up the Goroutine on the
foreign thread the first time, or w.r.t. checking the pointers and everything for Go gc?
At the level of privileged access, Go could potentially eventually offer a replacement for
things like AAudio and CoreAudio. It could use the native interface (either cgo or sys calls, depending)
to generate such specially scheduled threads, and then use cgo->go to start goroutines on them.
In this case, I would imagine it would be nice to be able to have M:N goroutines to threads.
To my understanding, this is not currently possible with Goroutines locked to threads, and
probably would violate some safety assumptions put on for foreign threads in other types of applications.
But in this case, Go would control the "foreign" thread creation.
The M:N idea would in my estimation also be useful if applied in the case of unprivileged
access. It would I guess mostly take the form of old GOMAXPROCS=1 type behaviour.
My question to golang-dev as a whole is if it seems feasible to try to make interoperability with
OS special scheduling characteristics of threads better, perhaps along the lines above, and if anyone knows of other applications that fall in the category of special OS thread scheduling (not cpu affinity) that would benefit?
Best
Scott
Post by Scott Cotton
I think LockOsThread could be used in these contexts, but it would be
impossible to do without risk on the first scheduling of a foreign specially
scheduled or real-time thread. It would also apparently have the problems
in the issue cited below.
I don't know much about all this, I'll just say that when calling Go from
a thread that was not started by Go the Go code will start in a
goroutine that is locked to the thread. You don't need to use
LockOSThread yourself for that case, so there shouldn't be any
scheduling issue. Of course any new goroutines that you start will
run on different, newly created, threads.
Ian
--
Scott Cotton
http://www.iri-labs.com
Scott Cotton
2018-09-17 15:20:10 UTC
Permalink
Yep, thanks, my bad.

Scott
Post by robert engels
According to the docs,
The default setting of the inherit-scheduler attribute in a newly
initialized thread attributes object is PTHREAD_INHERIT_SCHED
so if the runtime doesn’t manipulate it, the scheduling should be inherited.
Post by Robert Engels
I would think you could do that now, just start the program, on linux at least, using
chrt.
See https://www.cyberciti.biz/faq/howto-set-real-time-scheduling-priority-process/
The problem there are quite a few internal threads, they should inherit
the priority as well since that is the default.
Yes. My reading of pthread_attr_setinheritsched from man7.org
<http://man7.org/linux/man-pages/man3/pthread_attr_setinheritsched.3.html> is
that by default the scheduling is
not inherited, except for the case of the bug at the bottom
"""
BUGS
As at glibc 2.8, if a thread attributes object is initialized using
pthread_attr_init(3) <http://man7.org/linux/man-pages/man3/pthread_attr_init.3.html>, then the scheduling policy of the attributes
object is set to SCHED_OTHER and the scheduling priority is set to 0.
However, if the inherit-scheduler attribute is then set to
PTHREAD_EXPLICIT_SCHED, then a thread created using the attribute
object wrongly inherits its scheduling attributes from the creating
thread. This bug does not occur if either the scheduling policy or
scheduling priority attribute is explicitly set in the thread
attributes object before calling pthread_create(3) <http://man7.org/linux/man-pages/man3/pthread_create.3.html>.
"""
I think the only way to actually do it correctly and ensure the behaviour was as desired would be to introduce calls that would make the result independent of bugs like above.
Post by Robert Engels
The problem is that if the internal runtime is already using priorities
for the scheduler all sorts of bad things might happen.
It doesn't look to me like pthread scheduling is currently manipulated in runtime. It is however opaque because of the optimisations and trampolining around pthread function calls. my assessment is only from perusing and grepping for where it would seem such things would occur. But with all the assembly related to pthreads in runtime, I could have missed something.
If OS scheduling inheritance were under an option or environmental variable or build tag then it wouldn't in principle prevent future or other efforts to introduce OS scheduling into the runtime.
Scott
Post by Robert Engels
Hi all,
After looking at runtime and some thought about the scheduling of special
priority threads, it has
occurred to me that there might be a simple solution to making the full
Go runtime work (channels, goroutines, etc) with special OS thread
scheduling.
The idea would be to use pthread_attr_setinheritsched and set it to
PTHREAD_INHERIT_SCHED (maybe under some kind of option)
when creating a new thread.
Although actually getting this working in runtime looks like a hefty
task, and although this would not allow mixing different thread scheduling
in one Go runtime, it seems to me it would allow a Go program to run with
all threads specially scheduled provided that
it was launched by a thread/process with the desired OS scheduling, thus
enabling the possibility of using Go in contexts like a real-time audio
processing chain which use special OS thread scheduling.
It also seems like it would be much simpler than modifying the runtime to
know about OS scheduling priorities.
Any thoughts appreciated.
Best,
Scott
Post by Scott Cotton
Hi Robert and All,
Ralph gave us info on the jiffy in linux scheduling. Although your
conclusions are in line with his data, the numbers are a little bit off.
The default jiffy (roughly: the time that the scheduler gives one thread to
occupy a cpu core without interruption) is 0.004 seconds. It was 0.01
seconds until some 2.6.0 kernel, when it went down to 0.001 seconds for a
bit, but that was too fast, so they backed off to 0.004 shortly thereafter.
This is why reliable low latency audio uses special thread priorities.
It is used in Android AAudio, and Apple CoreAudio for this purpose, which I
would classify as widespread use in real applications. Although you can
get low latency in benchmarks on unloaded machines where you don't have to
wait for a jiffy, this is not considered reliable in audio systems unless
you have dedicated hardware. For similar reasons, it is also not considered
reliable to have sys calls like sigaltstack in cgo callbacks on audio
processing/rendering threads. this portaudio mailing list message
<https://lists.columbia.edu/pipermail/portaudio/2018-September/001530.html> is
a good description.
In answer to the question of how far we can go in audio without
scheduling priorities in the go runtime, it seems to me there are the
following issues:
1. no goroutines and sigaltstack/cgo->go overhead (which involves a sys
call) in callbacks on host supplied realtime threads.
2. Go's runtime can't distinguish OS thread scheduling differences in any way.
A simple conclusion is that low latency audio apps in Go are unreliable
on most platforms as the runtime+cgo mechanism stands today. It doesn't
matter how well something is programmed in pure Go or how smartly the work
is divided between cgo and go. It doesn't even matter if someone
benchmarks their system and claims to have had reliable low latency (like
me and Bryan), because these issues are caused by the relationship between
go and the host OS thread scheduling and the widespread need for special
thread priorities in audio systems.
I have a goal of making reliable low latency audio apps in Go. I think
it is a reasonable goal since Go is a good general purpose language, with
both low and high level features. But I think these issues together are a
stopper for reliable latency under about twice a jiffy, which doesn't
really quite fall into the low latency category.
Ian suggested the TODO in dropm in runtime/proc.go. This would help
issue 1. I have started looking at it and it seems so far (the code is
pretty deep to learn overnight, so take with a grain of salt): in any event
the cgo->go and go->cgo directions would involve sigaltstack even with
the improvement suggested in the TODO. Any runtime/proc.go gurus willing
to comment?
Robert suggested adding runtime functions to define thread priorities
and affinities for groups of goroutines.
This would solve issue 2 and to some extent obviate a need for solving
1. Ian agreed that something like that was necessary but details were
unclear.
I have started looking at how to make progress on that more concrete. I
have asked for help w.r.t.
plan9 and windows and the various js host targets (where I guess this
functionality shouldn't be supported)
with no response as of yet.
Best,
Scott
Post by robert engels
The linux kernel can perform context switches in under 5 usecs on
“standard hardware”. In the case of equal priorities, I believe the
standard time slice is 100 ms (although 64 ms in many systems). So without
scheduling priority control, if a program needs anything under 100 ms it is
not even close to being guaranteed in a general purpose linux install.
Post by Robert Engels
I would be surprised if it were needed for audio as well, given the
GC pause times, and also that the OS drivers are buffered.
I was referring more to HPC systems and cache locality. Hard to
achieve with thousands of Goroutines if you can’t group and isolate them.
The OS driver buffers for low latency audio represent a real time
duration below the default OS thread scheduling latency. For Go,
I believe the up front latency cost would be OS thread scheduling
latency + GC pause times. The GC latency improvements are great, and
enable a lot, but the GC operates within the context of OS thread
scheduling. A Go app with 1ms GC latency would have
by default on linux a 0.004 sec latency + the GC latency should the OS
need to prioritise something else first.
In other languages (Java, C) a 0.004 sec latency resulting from
scheduling is less likely because of support for thread
scheduling. That is, they are more reliable than full Go with goroutines.
It is unfortunate to have to do CGO with no goroutines,
and have sigaltstack overhead and what not associated with it on a
callback given to a host sound system to run on a high priority thread.
I like your idea of adding to the runtime grouping, scheduling class,
and affinity very much for audio.
Best,
Scott
Post by Robert Engels
On Sep 13, 2018, at 7:53 AM, 'Bryan C. Mills' via golang-dev <
- From a privileged C thread, invoke a cgo-exported Go function.
The Go function can loop (without returning) to perform whatever real-time
work is needed, using buffered channels to communicate with the rest of the
program (and thereby avoid blocking the privileged thread).
- In other goroutines, perform any background work that does not
need real-time scheduling (such as pre-rendering or decoding chunks of
audio).
FWIW, I have done a couple of experiments with real-time audio in Go
in the past. In 2013 it was possible to get acceptable latency
characteristics for interactive performance on a Linux desktop machine
(using the ALSA C API) without any special scheduling, provided that the
main loop did not allocate. Given the GC latency improvements since then, I
would be surprised if the “do not allocate” proviso is even still needed.
Post by Scott Cotton
Thanks Ian,
For audio, there is a tendency to have user-land but OS-privileged layer code
that uses special thread scheduling. For example, AAudio
(SCHED_FIFO) and Apple CoreAudio
(not sure about the details of how it relates to Darwin's scheduler,
but it is "real time" according to Apple) do this. There is a strong
consensus that this is necessary for reliable
scheduling of real-time audio (although I haven't personally had any
apparently scheduling-related problems myself outside of a real-time
thread context).
At any rate, there are different levels of interaction with Go implied by this.
At the level of unprivileged access, Go would need to operate on threads supplied
by the above systems. Presumably, this would be via cgo->go calls.
Ian: Was wondering
if the improvements you suggested were related to setting up the Goroutine on the
foreign thread the first time, or w.r.t. checking the pointers and
everything for Go gc?
At the level of privileged access, Go could potentially eventually
offer a replacement for
things like AAudio and CoreAudio. It could use the native interface
(either cgo or sys calls, depending)
to generate such specially scheduled threads, and then use cgo->go to
start goroutines on them.
In this case, I would imagine it would be nice to be able to have M:N
goroutines to threads.
To my understanding, this is not currently possible with Goroutines
locked to threads, and
probably would violate some safety assumptions put on for foreign
threads in other types of applications.
But in this case, Go would control the "foreign" thread creation.
The M:N idea would in my estimation also be useful if applied in the
case of unprivileged
access. It would I guess mostly take the form of old GOMAXPROCS=1
type behaviour.
My question to golang-dev as a whole is if it seems feasible to try
to make interoperability with
OS special scheduling characteristics of threads better, perhaps
along the lines above, and if anyone knows of other applications that fall
in the category of special OS thread scheduling (not cpu affinity) that
would benefit?
Best
Scott
Post by Scott Cotton
I think LockOSThread could be used in these contexts, but it would be
impossible to do without risk on the first scheduling of a foreign specially
scheduled or real-time thread. It would also apparently have the problems
in the issue cited below.
I don't know much about all this, I'll just say that when calling Go from
a thread that was not started by Go the Go code will start in a
goroutine that is locked to the thread. You don't need to use
LockOSThread yourself for that case, so there shouldn't be any
scheduling issue. Of course any new goroutines that you start will
run on different, newly created, threads.
Ian
Scott Cotton
2018-09-17 22:08:26 UTC
Permalink
Just wanted to include the related go-nuts message from Ian for potential
Wanted to ask about the Go runtime use of threads. Specifically, suppose
I've got an app in mind that would run OS-privileged and use specially
scheduled threads, like SCHED_RR in linux for example.
One could do this with chrt or calling from a process/thread at the desired
scheduling priority/type (as pointed out on a related thread in golang-dev).
The question is: does this as of go1.11 interfere with Go runtime internal
prioritising of threads?
The other question is: may it one day interfere with Go runtime internal
prioritising of threads?
Using specially scheduled threads should not be a problem if the Go
code that runs on those threads reliably calls runtime.LockOSThread.
If not, then it's hard to say.
Yep, thanks my bad.
Scott
Post by robert engels
According to the docs,
The default setting of the inherit-scheduler attribute in a newly
initialized thread attributes object is *PTHREAD_INHERIT_SCHED*
so if the runtime doesn’t manipulate it should be inherited.
Post by Robert Engels
I would think you could do that now: just start the program, on linux at least, using chrt.
See https://www.cyberciti.biz/faq/howto-set-real-time-scheduling-priority-process/
The problem is that there are quite a few internal threads; they should inherit
the priority as well since that is the default.
Yes. My reading of pthread_attr_setinheritsched from man7.org
<http://man7.org/linux/man-pages/man3/pthread_attr_setinheritsched.3.html> is
that by default the scheduling is
not inherited, except for the case of the bug at the bottom
"""
BUGS
As at glibc 2.8, if a thread attributes object is initialized using
pthread_attr_init(3) <http://man7.org/linux/man-pages/man3/pthread_attr_init.3.html>, then the scheduling policy of the attributes
object is set to *SCHED_OTHER* and the scheduling priority is set to 0.
However, if the inherit-scheduler attribute is then set to
*PTHREAD_EXPLICIT_SCHED*, then a thread created using the attribute
object wrongly inherits its scheduling attributes from the creating
thread. This bug does not occur if either the scheduling policy or
scheduling priority attribute is explicitly set in the thread
attributes object before calling pthread_create(3) <http://man7.org/linux/man-pages/man3/pthread_create.3.html>.
"""
I think the only way to actually do it correctly and ensure the behaviour was as desired would be to introduce calls that would make the result independent of bugs like above.
Post by Robert Engels
The problem is that if the internal runtime is already using priorities
for the scheduler all sorts of bad things might happen.
It doesn't look to me like pthread scheduling is currently manipulated in runtime. It is however opaque because of the optimisations and trampolining around pthread function calls. My assessment is only from perusing and grepping for where it would seem such things would occur. But with all the assembly related to pthreads in runtime, I could have missed something.
If OS scheduling inheritance were under an option or environment variable or build tag then it wouldn't in principle prevent future or other efforts to introduce OS scheduling into the runtime.
Scott
Post by Robert Engels
Hi all,
After looking at runtime and some thought about the scheduling of
special priority threads, it has
occurred to me that there might be a simple solution to making the full
Go runtime work (channels, goroutines, etc) with special OS thread
scheduling.
The idea would be to use pthread_attr_setinheritsched and set it to
PTHREAD_INHERIT_SCHED (maybe under some kind of option)
when creating a new thread.
Although actually getting this working in runtime looks like a hefty
task, and although this would not allow mixing different thread scheduling
in one Go runtime, it seems to me it would allow a Go program to run with
all threads specially scheduled provided that
it was launched by a thread/process with the desired OS scheduling, thus
enabling the possibility of using Go in contexts like a real-time audio
processing chain which use special OS thread scheduling.
It also seems like it would be much simpler than modifying the runtime
to know about OS scheduling priorities.
Any thoughts appreciated.
Best,
Scott
Post by Scott Cotton
Hi Robert and All,
Ralph gave us info on the jiffy in linux scheduling. Although your
conclusions are in line with his data, the numbers are a little bit off.
The default jiffy (roughly: the time that the scheduler gives one thread to
occupy a cpu core without interruption) is 0.004 seconds. It was 0.01
seconds until some 2.6.0 kernel, when it went down to 0.001 seconds for a
bit, but that was too fast, so they backed off to 0.004 shortly thereafter.
This is why reliable low latency audio uses special thread priorities.
It is used in Android AAudio, and Apple CoreAudio for this purpose, which I
would classify as widespread use in real applications. Although you can
get low latency in benchmarks on unloaded machines where you don't have to
wait for a jiffy, this is not considered reliable in audio systems unless
you have dedicated hardware. For similar reasons, it is also not considered
reliable to have sys calls like sigaltstack in cgo callbacks on audio
processing/rendering threads. This portaudio mailing list message
<https://lists.columbia.edu/pipermail/portaudio/2018-September/001530.html> is
a good description.
In answer to the question of how far we can go in audio without
scheduling priorities in the go runtime, it seems to me there are the
following issues:
1. no goroutines and sigaltstack/cgo->go overhead (which involves a sys
call) in callbacks on host supplied realtime threads.
2. Go's runtime can't distinguish OS thread scheduling differences in any way.
A simple conclusion is that low latency audio apps in Go are unreliable
on most platforms as the runtime+cgo mechanism stands today. It doesn't
matter how well something is programmed in pure Go or how smartly the work
is divided between cgo and go. It doesn't even matter if someone
benchmarks their system and claims to have had reliable low latency (like
me and Bryan), because these issues are caused by the relationship between
go and the host OS thread scheduling and the widespread need for special
thread priorities in audio systems.
I have a goal of making reliable low latency audio apps in Go. I think
it is a reasonable goal since Go is a good general purpose language, with
both low and high level features. But I think these issues together are a
stopper for reliable latency under about twice a jiffy, which doesn't
really quite fall into the low latency category.
Ian suggested the TODO in dropm in runtime/proc.go. This would help
issue 1. I have started looking at it and it seems so far (the code is
pretty deep to learn overnight, so take with a grain of salt): in any event
the cgo->go and go->cgo directions would involve sigaltstack even with
the improvement suggested in the TODO. Any runtime/proc.go gurus willing
to comment?
Robert suggested adding runtime functions to define thread priorities
and affinities for groups of goroutines.
This would solve issue 2 and to some extent obviate a need for solving
1. Ian agreed that something like that was necessary but details were
unclear.
I have started looking at how to make progress on that more concrete.
I have asked for help w.r.t.
plan9 and windows and the various js host targets (where I guess this
functionality shouldn't be supported)
with no response as of yet.
Best,
Scott
Post by robert engels
The linux kernel can perform context switches in under 5 usecs on
“standard hardware”. In the case of equal priorities, I believe the
standard time slice is 100 ms (although 64 ms in many systems). So without
scheduling priority control, if a program needs anything under 100 ms it is
not even close to being guaranteed in a general purpose linux install.
Post by Robert Engels
I would be surprised if it were needed for audio as well, given the
GC pause times, and also that the OS drivers are buffered.
I was referring more to HPC systems and cache locality. Hard to
achieve with thousands of Goroutines if you can’t group and isolate them.
The OS driver buffers for low latency audio represent a real time
duration below the default OS thread scheduling latency. For Go,
I believe the up front latency cost would be OS thread scheduling
latency + GC pause times. The GC latency improvements are great, and
enable a lot, but the GC operates within the context of OS thread
scheduling. A Go app with 1ms GC latency would have
by default on linux a 0.004 sec latency + the GC latency should the OS
need to prioritise something else first.
In other languages (Java, C) a 0.004 sec latency resulting from
scheduling is less likely because of support for thread
scheduling. That is, they are more reliable than full Go with goroutines.
It is unfortunate to have to do CGO with no goroutines,
and have sigaltstack overhead and what not associated with it on a
callback given to a host sound system to run on a high priority thread.
I like your idea of adding to the runtime grouping, scheduling class,
and affinity very much for audio.
Best,
Scott
Post by Robert Engels
On Sep 13, 2018, at 7:53 AM, 'Bryan C. Mills' via golang-dev <
- From a privileged C thread, invoke a cgo-exported Go function.
The Go function can loop (without returning) to perform whatever real-time
work is needed, using buffered channels to communicate with the rest of the
program (and thereby avoid blocking the privileged thread).
- In other goroutines, perform any background work that does not
need real-time scheduling (such as pre-rendering or decoding chunks of
audio).
FWIW, I have done a couple of experiments with real-time audio in Go
in the past. In 2013 it was possible to get acceptable latency
characteristics for interactive performance on a Linux desktop machine
(using the ALSA C API) without any special scheduling, provided that the
main loop did not allocate. Given the GC latency improvements since then, I
would be surprised if the “do not allocate” proviso is even still needed.
Post by Scott Cotton
Thanks Ian,
For audio, there is a tendency to have user-land but OS-privileged layer code
that uses special thread scheduling. For example, AAudio
(SCHED_FIFO) and Apple CoreAudio
(not sure about the details of how it relates to Darwin's scheduler,
but it is "real time" according to Apple) do this. There is a strong
consensus that this is necessary for reliable
scheduling of real-time audio (although I haven't personally had any
apparently scheduling-related problems myself outside of a real-time
thread context).
At any rate, there are different levels of interaction with Go implied by this.
At the level of unprivileged access, Go would need to operate on
threads supplied
by the above systems. Presumably, this would be via cgo->go calls.
Ian: Was wondering
if the improvements you suggested were related to setting up the
Goroutine on the
foreign thread the first time, or w.r.t. checking the pointers and
everything for Go gc?
At the level of privileged access, Go could potentially eventually
offer a replacement for
things like AAudio and CoreAudio. It could use the native interface
(either cgo or sys calls, depending)
to generate such specially scheduled threads, and then use cgo->go
to start goroutines on them.
In this case, I would imagine it would be nice to be able to have
M:N goroutines to threads.
To my understanding, this is not currently possible with Goroutines
locked to threads, and
probably would violate some safety assumptions put on for foreign
threads in other types of applications.
But in this case, Go would control the "foreign" thread creation.
The M:N idea would in my estimation also be useful if applied in the
case of unprivileged
access. It would I guess mostly take the form of old GOMAXPROCS=1
type behaviour.
My question to golang-dev as a whole is if it seems feasible to try
to make interoperability with
OS special scheduling characteristics of threads better, perhaps
along the lines above, and if anyone knows of other applications that fall
in the category of special OS thread scheduling (not cpu affinity) that
would benefit?
Best
Scott
Post by Scott Cotton
I think LockOSThread could be used in these contexts, but it would be
impossible to do without risk on the first scheduling of a foreign specially
scheduled or real-time thread. It would also apparently have the problems
in the issue cited below.
I don't know much about all this, I'll just say that when calling Go from
a thread that was not started by Go the Go code will start in a
goroutine that is locked to the thread. You don't need to use
LockOSThread yourself for that case, so there shouldn't be any
scheduling issue. Of course any new goroutines that you start will
run on different, newly created, threads.
Ian
Scott Cotton
2018-09-17 22:30:10 UTC
Permalink
And here is a follow up
Post by Scott Cotton
Just wanted to include the related go-nuts message from Ian for potential
Wanted to ask about the Go runtime use of threads. Specifically, suppose
I've got an app in mind that would run OS-privileged and use specially
scheduled threads, like SCHED_RR in linux for example.
One could do this with chrt or calling from a process/thread at the desired
scheduling priority/type (as pointed out on a related thread in golang-dev).
The question is: does this as of go1.11 interfere with Go runtime internal
prioritising of threads?
The other question is: may it one day interfere with Go runtime internal
prioritising of threads?
Using specially scheduled threads should not be a problem if the Go
code that runs on those threads reliably calls runtime.LockOSThread.
If not, then it's hard to say.
If in Go, and moreover in user Go rather than runtime Go, one calls
runtime.LockOSThread, then sets a priority, and then creates another
goroutine, I would have thought that that thread may create another thread.
Given the inheritance of scheduling priority, if this were the case, there
would be a leak of the protection w.r.t. scheduling priority. Any
thoughts? Is there some guard against an m creating a new m if it is
associated with a g via LockOSThread? I couldn't find one, but it's not easy
to verify things like this in runtime without spending some time playing
with the code.

Second, a command like chrt, together with the lack of any guard against
scheduling priority inheritance in the runtime today, would normally have
implied, at least to me, that it is ok to call chrt, so I was a bit surprised.

But at any rate, for what I'm looking at doing, something like chrt without
LockOSThread would be much more interesting.
Perhaps running the go test suite in an OS-privileged context, replacing the
executables with chrt wrappers, would be a good place to start examining this.


Scott
Post by Scott Cotton
Yep, thanks my bad.
Scott
Post by robert engels
According to the docs,
The default setting of the inherit-scheduler attribute in a newly
initialized thread attributes object is *PTHREAD_INHERIT_SCHED*
so if the runtime doesn’t manipulate it should be inherited.
Post by Robert Engels
I would think you could do that now: just start the program, on linux at least, using chrt.
See https://www.cyberciti.biz/faq/howto-set-real-time-scheduling-priority-process/
The problem is that there are quite a few internal threads; they should inherit
the priority as well since that is the default.
Yes. My reading of pthread_attr_setinheritsched from man7.org
<http://man7.org/linux/man-pages/man3/pthread_attr_setinheritsched.3.html> is
that by default the scheduling is
not inherited, except for the case of the bug at the bottom
"""
BUGS
As at glibc 2.8, if a thread attributes object is initialized using
pthread_attr_init(3) <http://man7.org/linux/man-pages/man3/pthread_attr_init.3.html>, then the scheduling policy of the attributes
object is set to *SCHED_OTHER* and the scheduling priority is set to 0.
However, if the inherit-scheduler attribute is then set to
*PTHREAD_EXPLICIT_SCHED*, then a thread created using the attribute
object wrongly inherits its scheduling attributes from the creating
thread. This bug does not occur if either the scheduling policy or
scheduling priority attribute is explicitly set in the thread
attributes object before calling pthread_create(3) <http://man7.org/linux/man-pages/man3/pthread_create.3.html>.
"""
I think the only way to actually do it correctly and ensure the behaviour was as desired would be to introduce calls that would make the result independent of bugs like above.
Post by Robert Engels
The problem is that if the internal runtime is already using priorities
for the scheduler all sorts of bad things might happen.
It doesn't look to me like pthread scheduling is currently manipulated in runtime. It is however opaque because of the optimisations and trampolining around pthread function calls. My assessment is only from perusing and grepping for where it would seem such things would occur. But with all the assembly related to pthreads in runtime, I could have missed something.
If OS scheduling inheritance were under an option or environment variable or build tag then it wouldn't in principle prevent future or other efforts to introduce OS scheduling into the runtime.
Scott
Post by Robert Engels
Hi all,
After looking at runtime and some thought about the scheduling of
special priority threads, it has
occurred to me that there might be a simple solution to making the full
Go runtime work (channels, goroutines, etc) with special OS thread
scheduling.
The idea would be to use pthread_attr_setinheritsched and set it to
PTHREAD_INHERIT_SCHED (maybe under some kind of option)
when creating a new thread.
Although actually getting this working in runtime looks like a hefty
task, and although this would not allow mixing different thread scheduling
in one Go runtime, it seems to me it would allow a Go program to run with
all threads specially scheduled provided that
it was launched by a thread/process with the desired OS scheduling,
thus enabling the possibility of using Go in contexts like a real-time
audio processing chain which use special OS thread scheduling.
It also seems like it would be much simpler than modifying the runtime
to know about OS scheduling priorities.
Any thoughts appreciated.
Best,
Scott
Post by Scott Cotton
Hi Robert and All,
Ralph gave us info on the jiffy in linux scheduling. Although your
conclusions are in line with his data, the numbers are a little bit off.
The default jiffy (roughly: the time that the scheduler gives one thread to
occupy a cpu core without interruption) is 0.004 seconds. It was 0.01
seconds until some 2.6.0 kernel, when it went down to 0.001 seconds for a
bit, but that was too fast, so they backed off to 0.004 shortly thereafter.
This is why reliable low latency audio uses special thread
priorities. It is used in Android AAudio, and Apple CoreAudio for this
purpose, which I would classify as widespread use in real applications.
Although you can get low latency in benchmarks on unloaded machines where
you don't have to wait for a jiffy, this is not considered reliable in
audio systems unless you have dedicated hardware. For similar reasons, it
is also not considered reliable to have sys calls like sigaltstack in cgo
callbacks on audio processing/rendering threads. This portaudio
mailing list message
<https://lists.columbia.edu/pipermail/portaudio/2018-September/001530.html> is
a good description.
In answer to the question of how far we can go in audio without
scheduling priorities in the go runtime, it seems to me there are the
following issues:
1. no goroutines and sigaltstack/cgo->go overhead (which involves a
sys call) in callbacks on host supplied realtime threads.
2. Go's runtime can't distinguish OS thread scheduling differences in any way.
A simple conclusion is that low latency audio apps in Go are
unreliable on most platforms as the runtime+cgo mechanism stands today. It
doesn't matter how well something is programmed in pure Go or how smartly
the work is divided between cgo and go. It doesn't even matter if someone
benchmarks their system and claims to have had reliable low latency (like
me and Bryan), because these issues are caused by the relationship between
go and the host OS thread scheduling and the widespread need for special
thread priorities in audio systems.
I have a goal of making reliable low latency audio apps in Go. I think
it is a reasonable goal since Go is a good general purpose language, with
both low and high level features. But I think these issues together are a
stopper for reliable latency under about twice a jiffy, which doesn't
really quite fall into the low latency category.
Ian suggested the TODO in dropm in runtime/proc.go. This would help
issue 1. I have started looking at it and it seems so far (the code is
pretty deep to learn overnight, so take with a grain of salt): in any event
the cgo->go and go->cgo directions would involve sigaltstack even with
the improvement suggested in the TODO. Any runtime/proc.go gurus willing
to comment?
Robert suggested adding runtime functions to define thread priorities
and affinities for groups of goroutines.
This would solve issue 2 and to some extent obviate a need for solving
1. Ian agreed that something like that was necessary but details were
unclear.
I have started looking at how to make that more concrete.
I have asked for help w.r.t. plan9 and windows and the various js host
targets (where I guess this functionality shouldn't be supported),
with no response as of yet.
Best,
Scott
Post by robert engels
The linux kernel can perform context switches in under 5 usecs on
“standard hardware”. In the case of equal priorities, I believe the
standard time slice is 100 ms (although 64 ms in many systems). So without
scheduling priority control, if a program needs anything under 100 ms it is
not even close to being guaranteed in a general purpose linux install.
Post by Robert Engels
I would be surprised if it were needed for audio as well, given the
GC pause times, and also that the OS drivers are buffered.
I was referring more to HPC systems and cache locality. Hard to
achieve with thousands of Goroutines if you can’t group and isolate them.
The OS driver buffers for low latency audio represent a real time
duration below the default OS thread scheduling latency. For Go,
I believe the up front latency cost would be OS thread scheduling
latency + GC pause times. The GC latency improvements are great, and
enable a lot, but the GC operates within the context of OS thread
scheduling. A Go app with 1ms GC latency would have
by default on linux a 0.004 sec latency + the GC latency should the
OS need to prioritise something else first.
In other languages (Java, C), a 0.004 sec latency resulting from
scheduling happens less often because of support for thread
scheduling. That is, they are more reliable than full Go with goroutines.
It is unfortunate to have to do CGO with no goroutines,
and have sigaltstack overhead and what not associated with it on a
callback given to a host sound system to run on a high priority thread.
I like your idea of adding to the runtime grouping, scheduling class,
and affinity very much for audio.
Best,
Scott
Post by Robert Engels
On Sep 13, 2018, at 7:53 AM, 'Bryan C. Mills' via golang-dev <
- From a privileged C thread, invoke a cgo-exported Go function.
The Go function can loop (without returning) to perform whatever real-time
work is needed, using buffered channels to communicate with the rest of the
program (and thereby avoid blocking the privileged thread).
- In other goroutines, perform any background work that does not
need real-time scheduling (such as pre-rendering or decoding chunks of
audio).
FWIW, I have done a couple of experiments with real-time audio in Go
in the past. In 2013 it was possible to get acceptable latency
characteristics for interactive performance on a Linux desktop machine
(using the ALSA C API) without any special scheduling, provided that the
main loop did not allocate. Given the GC latency improvements since then, I
would be surprised if the “do not allocate” proviso is even still needed.
Post by Scott Cotton
Thanks Ian,
For audio, there is a tendency to have user land but OS privileged layer code
that uses special thread scheduling. For example, AAudio
(SCHED_FIFO) and Apple CoreAudio
(not sure about the details of how it relates to Darwin's scheduler,
but it is "real time" according to Apple) do this. There is a strong
consensus that this is necessary for reliable
scheduling of real-time audio (although I haven't personally had
any apparent scheduling-related problems myself outside of a
real-time thread context).
At any rate, there are different levels of interaction with Go
implied by this.
At the level of unprivileged access, Go would need to operate on
threads supplied
by the above systems. Presumably, this would be via cgo->go
calls. Ian: Was wondering
if the improvements you suggested were related to setting up the
Goroutine on the
foreign thread the first time, or w.r.t. checking the pointers and
everything for Go gc?
At the level of privileged access, Go could potentially eventually
offer a replacement for
things like AAudio and CoreAudio. It could use the native
interface (either cgo or sys calls, depending)
to generate such specially scheduled threads, and then use cgo->go
to start goroutines on them.
In this case, I would imagine it would be nice to be able to have
M:N goroutines to threads.
To my understanding, this is not currently possible with Goroutines
locked to threads, and
probably would violate some safety assumptions put on for foreign
threads in other types of applications.
But in this case, Go would control the "foreign" thread creation.
The M:N idea would in my estimation also be useful if applied in
the case of unprivileged
access. It would I guess mostly take the form of old GOMAXPROCS=1
type behaviour.
My question to golang-dev as a whole is if it seems feasible to try
to make interoperability with
OS special scheduling characteristics of threads better, perhaps
along the lines above, and if anyone knows of other applications that fall
in the category of special OS thread scheduling (not cpu affinity) that
would benefit?
Best
Scott
Post by Scott Cotton
I think LockOsThread could be used in these contexts, but it would be
impossible to do without risk on the first scheduling of a foreign specially
scheduled or real-time thread. It would also apparently have the problems
in the issue cited below.
I don't know much about all this, I'll just say that when calling Go from
a thread that was not started by Go the Go code will start in a
goroutine that is locked to the thread. You don't need to use
LockOSThread yourself for that case, so there shouldn't be any
scheduling issue. Of course any new goroutines that you start will
run on different, newly created, threads.
Ian
'Keith Randall' via golang-dev
2018-09-17 22:41:29 UTC
Permalink
We changed the runtime to never spawn a new OS thread from a thread that is
currently locked with LockOSThread.
We have a dedicated clean thread that can spawn new OS threads when needed.
We did this precisely because people do strange things to OS threads after
doing a LockOSThread, and we don't want to copy those strange things to
unrelated OS threads.

See https://github.com/golang/go/issues/20676
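One way to look at this from user code is to ask the kernel which scheduling policy a thread actually has. The following is a minimal Linux-only sketch, not something taken from the runtime: policyName, currentPolicy and report are hypothetical helper names, and it assumes the raw sched_getscheduler system call is reachable through the standard syscall package.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// policyName maps a Linux scheduling policy number to its name.
func policyName(p int) string {
	switch p &^ 0x40000000 { // mask off the SCHED_RESET_ON_FORK flag
	case 0:
		return "SCHED_OTHER"
	case 1:
		return "SCHED_FIFO"
	case 2:
		return "SCHED_RR"
	case 3:
		return "SCHED_BATCH"
	case 5:
		return "SCHED_IDLE"
	case 6:
		return "SCHED_DEADLINE"
	}
	return fmt.Sprintf("unknown(%d)", p)
}

// currentPolicy asks the kernel for the policy of the calling thread
// (a pid argument of 0 means "the caller").
func currentPolicy() (int, error) {
	r, _, errno := syscall.Syscall(syscall.SYS_SCHED_GETSCHEDULER, 0, 0, 0)
	if errno != 0 {
		return 0, errno
	}
	return int(r), nil
}

// report locks the goroutine to its OS thread while it queries, so the
// answer describes one known thread.
func report(label string) {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	p, err := currentPolicy()
	if err != nil {
		fmt.Println(label, "error:", err)
		return
	}
	fmt.Println(label, policyName(p))
}

func main() {
	report("main goroutine:")
	done := make(chan struct{})
	go func() {
		defer close(done)
		report("new goroutine:")
	}()
	<-done
}
```

Running the binary normally and then under something like chrt -f 10 (which needs privileges) gives a quick view of what the threads inherited. The new goroutine is not guaranteed to land on a different thread, so this is a probe rather than a proof.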
Post by Scott Cotton
And here is a follow up
Post by Scott Cotton
Just wanted to include the related go-nuts message from Ian for potential
Wanted to ask about the Go runtime use of threads. Specifically,
suppose
I've got an app in mind that would run OS-privileged and use specially
scheduled threads, like SCHED_RR in linux for example.
One could do this with chrt or calling from a process/thread at the
desired
scheduling priority/type (as pointed out on a related thread in
golang-dev)
The question is: does this as of go1.11 interfere with Go runtime
internal
prioritising of threads?
The other question is: may it one day interfere with Go runtime
internal
prioritising of threads?
Using specially scheduled threads should not be a problem if the Go
code that runs on those threads reliably calls runtime.LockOSThread.
If not, then it's hard to say.
If in Go (and moreover in user Go rather than runtime Go) one calls
runtime.LockOSThread, then sets a priority, and then creates another
goroutine, I would have thought that that thread
may create another thread. Given the inheritance of scheduling priority,
if this were the case, then there
would be a leak of the protection w.r.t. scheduling priority. Any
thoughts? Is there some guard against an
m creating a new m if it is associated with a g via LockOSThread? I
couldn't find one, but it's not easy to verify
things like this in runtime without spending some time playing with the
code.
Second, a command like chrt and the lack of guarding against scheduling
priority inheritance in the runtime now
would normally have implied that it is ok to call chrt, at least to me, so
I was a bit surprised.
But at any rate, for what I'm looking at doing, something like chrt
without LockOSThread would be much more interesting.
Perhaps running the go test suite in an OS-privileged context, replacing
the executables with chrt wrappers, would be a good
place to start examining this.
Scott
Post by Scott Cotton
Yep, thanks my bad.
Scott
Post by robert engels
According to the docs,
The default setting of the inherit-scheduler attribute in a newly
initialized thread attributes object is *PTHREAD_INHERIT_SCHED*
so if the runtime doesn’t manipulate it should be inherited.
Post by Robert Engels
I would think you could do that now, just start the program, on linux
at least, using
chrt.
See
https://www.cyberciti.biz/faq/howto-set-real-time-scheduling-priority-process/
The problem is that there are quite a few internal threads; they should
inherit the priority as well since that is the default.
Yes. My reading of pthread_attr_setinheritsched from man7.org
<http://man7.org/linux/man-pages/man3/pthread_attr_setinheritsched.3.html> is
that by default the scheduling is
not inherited, except for the case of the bug at the bottom
"""
BUGS
As at glibc 2.8, if a thread attributes object is initialized using
pthread_attr_init(3) <http://man7.org/linux/man-pages/man3/pthread_attr_init.3.html>, then the scheduling policy of the attributes
object is set to SCHED_OTHER and the scheduling priority is set to 0.
However, if the inherit-scheduler attribute is then set to
PTHREAD_EXPLICIT_SCHED, then a thread created using the attribute
object wrongly inherits its scheduling attributes from the creating
thread. This bug does not occur if either the scheduling policy or
scheduling priority attribute is explicitly set in the thread
attributes object before calling pthread_create(3) <http://man7.org/linux/man-pages/man3/pthread_create.3.html>.
"""
I think the only way to actually do it correctly and ensure the behaviour was as desired would be to introduce calls that would make the result independent of bugs like above.
Post by Robert Engels
The problem is that if the internal runtime is already using
priorities for the scheduler all sorts of bad things might happen.
It doesn't look to me like pthread scheduling is currently manipulated in runtime. It is however opaque because of the optimisations and trampolining around pthread function calls. My assessment is only from perusing and grepping for where it would seem such things would occur. But with all the assembly related to pthreads in runtime, I could have missed something.
If OS scheduling inheritance were under an option or environmental variable or build tag then it wouldn't in principle prevent future or other efforts to introduce OS scheduling into the runtime.
Scott
Post by Robert Engels
Hi all,
After looking at runtime and some thought about the scheduling of
special priority threads, it has
occurred to me that there might be a simple solution to making the
full Go runtime work (channels, goroutines, etc) with special OS thread
scheduling.
The idea would be to use pthread_attr_setinheritsched and set it to
PTHREAD_INHERIT_SCHED (maybe under some kind of option)
when creating a new thread.
Although actually getting this working in runtime looks like a hefty
task, and although this would not allow mixing different thread scheduling
in one Go runtime, it seems to me it would allow a Go program to run with
all threads specially scheduled provided that
it was launched by a thread/process with the desired OS scheduling,
thus enabling the possibility of using Go in contexts like a real-time
audio processing chain which use special OS thread scheduling.
It also seems like it would be much simpler than modifying the runtime
to know about OS scheduling priorities.
Any thoughts appreciated.
Best,
Scott
Post by Scott Cotton
Hi Robert and All,
Ralph gave us info on the jiffy in linux scheduling. Although your
conclusions are in line with his data, the numbers are a little bit off.
The default jiffy (roughly: the time that the scheduler gives one thread to
occupy a cpu core without interruption) is 0.004 seconds. It was 0.01
seconds until some 2.6.0 kernel, when it went down to 0.001 seconds for a
bit, but that was too fast, so they backed off to 0.004 shortly thereafter.
This is why reliable low latency audio uses special thread
priorities. It is used in Android AAudio, and Apple CoreAudio for this
purpose, which I would classify as widespread use in real applications.
Although you can get low latency in benchmarks on unloaded machines where
you don't have to wait for a jiffy, this is not considered reliable in
audio systems unless you have dedicated hardware. For similar reasons, it
is also not considered reliable to have sys calls like sigaltstack in cgo
callbacks on audio processing/rendering threads. This portaudio
mailing list message
<https://lists.columbia.edu/pipermail/portaudio/2018-September/001530.html> is
a good description.
Scott Cotton
2018-09-18 12:09:28 UTC
Permalink
Hi Keith,

thanks, makes sense to me and it is indeed helpful to see the merge commit
and issue/related issues. It gives me a lot of context that I was lacking
before.

Hi all,

Just to try to help prevent this from getting buried in the various
exchanges, I'm pulling back to the top the following about
go runtime inheriting OS scheduler parameters, like via chrt on linux.
Post by Scott Cotton
Using specially scheduled threads should not be a problem if the Go
code that runs on those threads reliably calls runtime.LockOSThread.
If not, then it's hard to say.
from me:
a command like chrt and the lack of guarding against scheduling priority
inheritance in the runtime now
would normally have implied that it is ok to call chrt, at least to me, so
I was a bit surprised.

But at any rate, for what I'm looking at doing, something like chrt without
LockOSThread would be much more interesting.
Perhaps running the go test suite in an OS-privileged context, replacing the
executables with chrt wrappers, would be a good
place to start examining this. [+update: I'm looking at how to do this. If
a cross platform way of doing it were devised, would TryBot be available
for such a test?]

Best,
Scott
--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
'David Chase' via golang-dev
2018-09-18 15:39:36 UTC
Permalink
Thanks for raising the Go GC latency issue. That's a bug, not expected
behavior.
See https://github.com/golang/go/issues/27732 .
(A rapidly-allocating goroutine will be taxed to keep it from getting ahead
of the garbage collector.
In this case, the tax is far higher than expected. Details/cause TBD).
Also, it's great that this is a simple, fast reproducer.
robert engels
2018-09-18 15:58:22 UTC
Permalink
Yea, that’s the point though. The GC collection in Go is not as efficient as the Shenandoah collector, so it effectively needs to pause/stall the app/mutators to catch up. Just as the G1 collector is not as good as Shenandoah, and the pauses are even longer.

Which is why I originally pointed the Go authors to Shenandoah - since it is open-source, I am fairly certain its techniques could be adopted (although I know there is some discussion on write barriers being not desirable).
'David Chase' via golang-dev
2018-09-18 15:59:52 UTC
Permalink
This is not a technique problem. It is a bug, and it can be fixed without
reengineering the Go garbage collector to use Brooks pointers.
robert engels
2018-09-18 16:08:36 UTC
Permalink
That’s good news.

Curious though, how do you know that? I look at it this way: if a mutator thread only allocates and “frees”, a single collector thread would need to collect the garbage as fast as the allocator produces it (which is hard given basically free allocation costs), so I would think that finding, collecting, and compacting the garbage cannot be as cheap as the allocation. Then you throw in that the runtime might use other threads for housekeeping, leading to CPU availability issues.

Even non-GC allocators like malloc can perform very poorly at times due to fragmentation and allocation profile.

I am not questioning that it could be a bug, I was just wondering how you KNOW it is? (technical curiosity)
'David Chase' via golang-dev
2018-09-18 17:40:32 UTC
Permalink
If you eyeball a trace, I'd estimate that overall for that benchmark,
there's about equal parts GC work and "real" work,
and that benchmark is very GC-heavy. For most programs the time spent in
GC is much lower (and can be made lower
by increasing GOGC, at the expense of a larger memory footprint).

For the sake of argument and easy math, assume that each slice allocation
and initialization uses 1 µs of real work.
(The average actually observed is some mix of GC and real work and is
measured at 1 µs even with this bug; this is on a "4" processor box that is
not using its processors properly during assists, so mumble. Call it 1 µs.)

If we time-sliced GC infinitely well and ran it continuously on a single
processor, with equal parts spent on GC and real work, we'd observe a
latency of 2 µs.
This simple work argument gets us nowhere near a 5000 µs latency for this
benchmark, so it's not a problem of the GC "keeping up".

One thing to watch out for in discussions of the Go garbage collector is
that "keep up" is with respect to a goal heap size; based on observed
behavior, the GC predicts when it needs to start in order to finish so that
the peak heap size is LIVE * (100 + GOGC)/100. When predictions are ideal,
even for a heavily allocating program, there should be (modulo bugs) no
need to draft the mutator to help with GC work, and on a 4 processor laptop
running a single mutator thread it should be easy for the GC to "keep up"
because it has 75% of the available CPU to use. But, sometimes, if the
estimate is wrong, the mutator will be drafted to perform GC work whenever
it allocates memory.
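
The heap goal formula above can be sketched as a small function. This only mirrors the arithmetic stated in the post, LIVE * (100 + GOGC)/100; the name heapGoal and the 64 MiB live heap are assumptions for illustration, and the actual pacer in the runtime takes more into account than this.

```go
package main

import "fmt"

// heapGoal returns the target peak heap size for a given live heap and
// GOGC value, following LIVE * (100 + GOGC) / 100 from the post.
// It assumes gogc >= 0 (a negative value disables the collector).
func heapGoal(liveBytes uint64, gogc int) uint64 {
	return liveBytes * uint64(100+gogc) / 100
}

func main() {
	const live = 64 << 20 // assumed live heap of 64 MiB
	for _, gogc := range []int{50, 100, 200} {
		fmt.Printf("GOGC=%d: goal %d MiB\n", gogc, heapGoal(live, gogc)>>20)
	}
}
```

With the default GOGC of 100, the goal is twice the live heap, which is why raising GOGC costs memory.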

So anyhow, I think this is (at least) a time/work-slicing problem of some
sort. It looks like mark assist is handed a large lump of work that does
not get split up like it should, and threads that should be helping with GC
instead sit idle. (If large objects exist, splitting up their associated
work is a latency problem for all garbage collectors; there are known
solutions, but there are bugs).

And yes, there are some lumpy bits of work that don't time slice well, but
the largest of those is supposed to be smaller than 100 µs long.
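
One way to look at this kind of latency from user code is to time each allocation and compare the mean with the worst case. The sketch below is an assumption-laden illustration, not the reproducer attached to the issue: the function name allocLatency, the slice size, and the iteration count are all made up here.

```go
package main

import (
	"fmt"
	"time"
)

// sink is package-level so the compiler cannot optimise the allocation away.
var sink []byte

// allocLatency allocates n slices of size bytes and reports the mean and
// worst per-allocation time. A worst case far above the mean suggests the
// allocating goroutine was stalled, for example by a GC assist.
// It assumes n > 0.
func allocLatency(n, size int) (mean, worst time.Duration) {
	var total time.Duration
	for i := 0; i < n; i++ {
		start := time.Now()
		sink = make([]byte, size)
		d := time.Since(start)
		total += d
		if d > worst {
			worst = d
		}
	}
	return total / time.Duration(n), worst
}

func main() {
	mean, worst := allocLatency(1000000, 1024)
	fmt.Println("mean:", mean, "worst:", worst)
}
```

Timer overhead is included in every sample, so only the ratio between worst and mean is meaningful here, not the absolute numbers.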
robert engels
2018-09-18 17:58:04 UTC
Permalink
Thanks for the color. Makes sense. Other than those slides that were offered earlier, is there a current white paper describing the Go GC in detail? It seems to have changed (improved) a lot, and I’m curious about the internals, but don’t really want to dig through the code :)
Scott Cotton
2018-09-19 01:06:41 UTC
Permalink
Hi all,

I wanted to summarise my take home on this very productive thread:

1) Go's GC is a great enabler for latency sensitive real-time audio/media
processing

2) Nonetheless, there are some issues around runtime/language and low
latency realtime audio APIs, as it is commonly considered unreliable to do
this outside of C because
a) Callbacks on a real time thread of an audio system can't call things
like sigaltstack in cgo->go
b) Go's runtime doesn't let you directly play with OS thread scheduling

3) We have some solutions for b)
- a) use LockOSThread()
- b) although received with some skepticism, there is no apparent reason
one can't set the thread scheduling from calling context or OS calls
(preliminary tests on my part work fine BTW)

4) FWIW, I have also in mind a solution for a) for the case of audio, as
the callbacks all have a similar form and just transfer data, so perhaps
cgo->go can be bypassed completely.

3 a) is suboptimal due to the inherent limitations of LockOSThread and
the way the current Go runtime schedules locked threads.

If 3 b) continues to work and 4) works out, then the only remaining
question is really Go's GC, which looks very promising in this regard to me.

So although current Go real-time audio apps are considered unreliable in
low latency situations compared to more widely used systems, the
prospects of Go working well with solutions 3 b) and 4) look very good to
me now. They didn't before I learned from everything that was shared
here. No changes to the runtime are necessary, other than perhaps the
perception of whether it allows inheriting (and even setting) scheduling
properties from the OS and calling context.
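
For reference, option 3 a) can be sketched as a goroutine that pins itself to an OS thread before doing its work. This is a minimal sketch under stated assumptions: the helper name runOnLockedThread is hypothetical, and the OS-specific call that would actually change the thread's scheduling class is deliberately left out.

```go
package main

import (
	"fmt"
	"runtime"
)

// runOnLockedThread is a hypothetical helper: it starts a goroutine, locks
// it to its OS thread for the duration of work, and closes the returned
// channel when work has finished.
func runOnLockedThread(work func()) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		defer close(done)
		// Any OS-specific scheduling setup for this thread would go here.
		// It is omitted because it is platform dependent and usually
		// requires privileges.
		work()
	}()
	return done
}

func main() {
	done := runOnLockedThread(func() {
		fmt.Println("running on a dedicated OS thread")
	})
	<-done
}
```

While the goroutine is locked, no other goroutine runs on that thread, which is what makes per-thread scheduling settings apply predictably to the locked code.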

I'm quite happy with those results.

Thanks all & regards
Scott
robert engels
2018-09-19 01:31:40 UTC
Permalink
I would also search “android real-time audio issues”. I know that there were problems reported there - people that created “effects software” always complained especially when compared to iOS.

As far as I know, they solved these issues, most likely with a native driver layer in the Linux kernel? Still, Android is Java, and it didn’t even have some of the lower-level bridges to C/native that Go has, so I would expect the same techniques could be applied to Go, but you probably lose cross-platform support when doing real-time audio anyway.

I would review those problems and solutions.
Scott Cotton
2018-09-19 11:40:13 UTC
Permalink
Post by robert engels
I would also search “android real-time audio issues”. I know that there
were problems reported there - people that created “effects software”
always complained especially when compared to iOS.
Yes, I'm quite aware of that. iOS/Darwin is far from simple for performance
audio but very much ahead of other OSs. You can for example synchronise
devices (mic/speaker) to a single hardware audio clock there and treat
effects/processors the same as I/O, which doesn't appear possible to me
anywhere else.
Post by robert engels
As far as I know, they solved these issues - most likely by a native
driver layer in the linux kernel ? - but still, Android is Java, and it
didn’t even have some of the lower level bridges to C/native that Go has,
so I would expect the same techniques could be applied to Go, but you
probably lose cross-platform doing real-time audio anyway.
I would review those problems and solutions.
Of course. Been doing a lot of just that. With Android there are lots of
audio APIs so it takes time and I'm still learning, but the big picture
appears to be that AAudio is the solution, and things like oboe are the
best solution in terms of compatibility between AAudio and older Android
(OpenSL ES).

AAudio looks good to me, but still behind iOS in terms of OS<->hardware
interface and duplex synchronisation. The Android Audio HAL appears to be
moving forward too, with things like DMA which help.

I think most "performance" audio apps use the Android NDK and C/C++ for all
but the highest-level code; I see not much actual Java runtime in the
picture despite it being Android.

Best,
Scott
--
Scott Cotton
http://www.iri-labs.com
Scott Cotton
2018-09-21 19:34:00 UTC
Permalink
Hi all,

In case anyone wants to follow
[here](https://github.com/zikichombo/sio/issues/17) is an issue
to track progress.

Best,
Scott
Scott Cotton
2018-09-13 13:04:09 UTC
Permalink
Post by 'Bryan C. Mills' via golang-dev
- From a privileged C thread, invoke a cgo-exported Go function. The
Go function can loop (without returning) to perform whatever real-time work
is needed, using buffered channels to communicate with the rest of the
program (and thereby avoid blocking the privileged thread).
Yes I think that could work in the context of an OS privileged go program
providing audio. However, the buffering of channels to avoid non-realtime
scheduling timing of the program seems less flexible and less reliable than
the group idea to me.
Post by 'Bryan C. Mills' via golang-dev
- In other goroutines, perform any background work that does not need
real-time scheduling (such as pre-rendering or decoding chunks of audio).
FWIW, I have done a couple of experiments with real-time audio in Go in
the past. In 2013 it was possible to get acceptable latency characteristics
for interactive performance on a Linux desktop machine (using the ALSA C
API) without any special scheduling, provided that the main loop did not
allocate. Given the GC latency improvements since then, I would be
surprised if the “do not allocate” proviso is even still needed.
Cool! Did you do duplex with ALSA? Duplex latency is much more demanding
than, say, latency of user interface interaction. The former, such as in
VoIP, is related to what we hear, which has much more fine-grained timing
requirements than, say, 30-60fps game interaction.

Scott
Post by 'Bryan C. Mills' via golang-dev
Post by Scott Cotton
Thanks Ian,
For audio, there is a tendency to have userland but OS-privileged layer code
that uses special thread scheduling. For example, AAudio (SCHED_FIFO) and
Apple CoreAudio (not sure about the details of how it relates to Darwin's
scheduler, but it is "real time" according to Apple) do this. There is a
strong consensus that this is necessary for reliable scheduling of real-time
audio (although I haven't personally had any apparently scheduling-related
problems myself outside of real-time thread context).
At any rate, there are different levels of interaction with Go implied by
this.
At the level of unprivileged access, Go would need to operate on threads
supplied by the above systems. Presumably, this would be via cgo->go calls.
Ian: I was wondering if the improvements you suggested were related to
setting up the goroutine on the foreign thread the first time, or w.r.t.
checking the pointers and everything for Go GC?
At the level of privileged access, Go could potentially eventually offer a
replacement for things like AAudio and CoreAudio. It could use the native
interface (either cgo or syscalls, depending) to generate such specially
scheduled threads, and then use cgo->go to start goroutines on them.
In this case, I would imagine it would be nice to be able to have M:N
goroutines to threads. To my understanding, this is not currently possible
with goroutines locked to threads, and probably would violate some safety
assumptions put on for foreign threads in other types of applications.
But in this case, Go would control the "foreign" thread creation.
The M:N idea would in my estimation also be useful if applied in the case of
unprivileged access. It would I guess mostly take the form of old
GOMAXPROCS=1 type behaviour.
My question to golang-dev as a whole is whether it seems feasible to try to
make interoperability with OS special scheduling characteristics of threads
better, perhaps along the lines above, and whether anyone knows of other
applications that fall in the category of special OS thread scheduling (not
CPU affinity) that would benefit?
Best
Scott
Post by Scott Cotton
I think LockOSThread could be used in these contexts, but it would be
impossible to do without risk on the first scheduling of a foreign specially
scheduled or real-time thread. It would also apparently have the problems
in the issue cited below.
I don't know much about all this, I'll just say that when calling Go from
a thread that was not started by Go the Go code will start in a
goroutine that is locked to the thread. You don't need to use
LockOSThread yourself for that case, so there shouldn't be any
scheduling issue. Of course any new goroutines that you start will
run on different, newly created, threads.
Ian
--
Scott Cotton
http://www.iri-labs.com
'Bryan C. Mills' via golang-dev
2018-09-13 13:22:41 UTC
Permalink
Post by Scott Cotton
Post by 'Bryan C. Mills' via golang-dev
- From a privileged C thread, invoke a cgo-exported Go function. The
Go function can loop (without returning) to perform whatever real-time work
is needed, using buffered channels to communicate with the rest of the
program (and thereby avoid blocking the privileged thread).
Yes I think that could work in the context of an OS privileged go program
providing audio. However, the buffering of channels to avoid non-realtime
scheduling timing of the program seems less flexible and less reliable than
the group idea to me.
Buffered channels are already in the language, and already useful
independent of realtime scheduling. It would be nice to see how far we can
get with existing features before we propose to add new ones. 🙂
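
To make the buffered-channel hand-off discussed above concrete, here is a minimal pure-Go sketch under stated assumptions. The names chunk, nextChunk and produce are hypothetical, the chunk size is tiny for readability, and no cgo or audio API is involved. The time-critical side polls without blocking and treats an empty buffer as an underrun; the background side pre-renders ahead of time.

```go
package main

import "fmt"

const chunkFrames = 4 // tiny, for illustration only

type chunk [chunkFrames]float32

// nextChunk is what the time-critical loop would call once per audio period.
// It never blocks: if nothing is ready it returns silence and ok == false.
func nextChunk(ready <-chan chunk) (c chunk, ok bool) {
	select {
	case c = <-ready:
		return c, true
	default:
		return chunk{}, false // underrun: caller plays silence
	}
}

// produce pre-renders n chunks in the background. It blocks whenever the
// buffered channel is full, which is fine off the time-critical path.
func produce(ready chan<- chunk, n int) {
	for i := 0; i < n; i++ {
		var c chunk
		for j := range c {
			c[j] = float32(i) // stand-in for decoded or rendered audio
		}
		ready <- c
	}
}

func main() {
	ready := make(chan chunk, 8)
	done := make(chan struct{})
	go func() { produce(ready, 3); close(done) }()
	<-done // wait here only so the demo output is deterministic

	for period := 0; period < 5; period++ {
		c, ok := nextChunk(ready)
		fmt.Println(period, ok, c[0])
	}
}
```

The channel capacity is the latency/robustness trade-off: more buffered chunks tolerate longer scheduling delays on the producer side but add delay before new material is heard. The sketch by itself gives no hard real-time guarantee.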
Post by Scott Cotton
Post by 'Bryan C. Mills' via golang-dev
- In other goroutines, perform any background work that does not need
real-time scheduling (such as pre-rendering or decoding chunks of audio).
FWIW, I have done a couple of experiments with real-time audio in Go in
the past. In 2013 it was possible to get acceptable latency characteristics
for interactive performance on a Linux desktop machine (using the ALSA C
API) without any special scheduling, provided that the main loop did not
allocate. Given the GC latency improvements since then, I would be
surprised if the “do not allocate” proviso is even still needed.
Cool! Did you do duplex with ALSA? Duplex latency is much more demanding
than, say, latency of user interface interaction. The former, such as in
VoIP, is related to what we hear, which has much more fine-grained timing
requirements than, say, 30-60fps game interaction.
My experiment <https://bitbucket.org/bcmills/harmonolog> was only
half-duplex, but it's a live instrument (a just-tempered synthesizer using
the computer keyboard as input), so the end-to-end latency has similar
constraints to VoIP. (If the delay between the keyboard input and audio
output gets too long, the instrument becomes more-or-less unplayable.)

It looks like I was using 5ms latency, which is comparable to what I use
for ASIO instruments.


Scott Cotton
2018-09-13 13:34:30 UTC
Permalink
5ms looks to me like a common latency spot: it's close to a power-of-2
buffer size for FFT at common sample rates, it is doable pretty reliably
on unloaded systems independent of OS scheduling priority, and it is small
enough to not be terribly irritating to play interactively. But I would
think a professional music studio would offer much lower latency, and a
latency-sensitive musician may not like 5ms very much.

Probably just good enough for mass market interactive music apps.

Scott
Scott Cotton
2018-09-13 13:45:53 UTC
Permalink
Post by 'Bryan C. Mills' via golang-dev
Post by 'Bryan C. Mills' via golang-dev
- From a privileged C thread, invoke a cgo-exported Go function. The
Go function can loop (without returning) to perform whatever real-time work
is needed, using buffered channels to communicate with the rest of the
program (and thereby avoid blocking the privileged thread).
Yes I think that could work in the context of an OS privileged go
program providing audio. However, the buffering of channels to avoid
non-realtime scheduling timing of the program seems less flexible and less
reliable than the group idea to me.
Buffered channels are already in the language, and already useful
independent of realtime scheduling. It would be nice to see how far we can
get with existing features before we propose to add new ones. 🙂
Perhaps; that will depend on how some other things advance, and on bandwidth.

I think the relationship between Go's scheduler and OS scheduling has long
been a concern, and improvements have already been proposed, at least
informally. The consensus so far seems to me to be that it would be a good
thing to address, not just for audio.

So I would like to propose not de-prioritising it while waiting for a
project without sufficient resources to plan releases.

Scott
Scott Cotton
2018-09-13 23:17:34 UTC
Permalink
Post by 'Bryan C. Mills' via golang-dev
Post by 'Bryan C. Mills' via golang-dev
- From a privileged C thread, invoke a cgo-exported Go function.
The Go function can loop (without returning) to perform whatever real-time
work is needed, using buffered channels to communicate with the rest of the
program (and thereby avoid blocking the privileged thread).
Yes I think that could work in the context of an OS privileged go
program providing audio. However, the buffering of channels to avoid
non-realtime scheduling timing of the program seems less flexible and less
reliable than the group idea to me.
Buffered channels are already in the language, and already useful
independent of realtime scheduling. It would be nice to see how far we can
get with existing features before we propose to add new ones. 🙂
Also, I didn't propose that; Robert Engels did, for HPC/HFT, because he said
you can't do that without pinning threads to CPUs and the like.

The question of supporting widely used OS capabilities like scheduling
characteristics is also a chicken-and-egg thing. If Go doesn't interface
nicely with widely used OS capabilities, it will weaken its prospects for
the applications which need that. For the case of audio, if Go can't say
it supports goroutines in specially scheduled OS threads, then my
impression is it won't be taken seriously for audio, and
then there would be fewer users for Go with audio. Given the recent use of
surveys and statistics to drive development, if that happened, the low usage
might be used as an argument against taking action to support it in the
future. But
IMO discouraging the support of widely used OS functionality is not really
the disposition a general purpose language should take, independent of my
interests in it.


Scott