runtime: FreeBSD memory corruption involving fork system call #15658
Comments
derekmarcotte
commented
May 12, 2016
Here's another panic experienced in mallocgc by the same sample code: |
bradfitz
added this to the Go1.7Maybe milestone
May 12, 2016
@derekmarcotte, can you also reproduce this at master? (which will become Go 1.7) And do you only see it on FreeBSD, or other operating systems as well? |
bradfitz
assigned
aclements
May 12, 2016
@RLH, weren't you seeing "finalizer already set" failures a little while ago? Did you track that down? Could it have been related? |
On closer inspection of the failures you posted (thanks for collecting several, BTW), this smells like memory corruption. Just guessing, there are a few likely culprits. It may be the finalizer code, but I actually think that's less likely. More likely is that it's the fork/exec code: that code is really subtle, mucks with the address space, and contains system-specific parts (which would explain why it's showing up on FreeBSD, but I haven't been able to reproduce it on Linux yet). @derekmarcotte, can you try commenting out the runtime.SetFinalizer call in newProcess in os/exec.go (your test doesn't need that finalizer) and see if you can still reproduce it? If you can, that will rule out finalizers. |
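For reference, the finalizer in question is the one set in newProcess. Below is a sketch of roughly what that function looked like in Go 1.6-era sources (the exact code may differ slightly by version), with the line to comment out for the experiment marked:

// In the os package (os/exec.go), approximately as of Go 1.6.
// This is an illustrative excerpt, not a standalone program.
func newProcess(pid int, handle uintptr) *Process {
	p := &Process{Pid: pid, handle: handle}
	// Commenting out the next line removes this finalizer from the test:
	runtime.SetFinalizer(p, (*Process).Release)
	return p
}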
Note that FreeBSD runs via gomote, if this is that easily reproducible. I haven't yet tried. |
derekmarcotte
commented
May 13, 2016
Just got a golang/go dev environment set up on my machine (was from FreeBSD packages). Will report back soon. |
derekmarcotte
commented
May 13, 2016
Here are the heads of a bunch of logs with the epoch at the start of the process, so you can see the interval. I suspected a race rather than memory corruption because, by and large, it is the "finalizer already set" error that crashes the process. I thought maybe the GC was setting these as free (or otherwise touching them) before SetFinalizer had a chance to set their value. I didn't include too many of them in my initial report, because I thought they were largely redundant. @bradfitz: these logs are against master:
@aclements: I will try your patch next. One variable at a time. |
The "fatal error: runtime.SetFinalizer: finalizer already set" bug I was None of the TOC code is in 1.7 or for that matter on TIP. I can force On Fri, May 13, 2016 at 6:47 PM, Derek Marcotte notifications@github.com
|
derekmarcotte
commented
May 15, 2016
@aclements: I've run it with your patch, although I haven't been able to babysit it too much. The first time, all threads were idle after a number of hours (i.e. 0% cpu across the board). Connecting gdb to that process gave me trouble, and I couldn't get any logging out of it. This morning, I was able to connect to a different process that looks a lot like 1462933614-wedged.txt. I've attached a log from gdb there: Will keep trying to come up with more info. |
derekmarcotte
commented
May 16, 2016
@aclements: Here are some more logs from a binary built with the patch: 1463315708-finalizer-already-set.txt Please let me know if I can be of further assistance.
Thanks for the logs! I hadn't realized there were two finalizers involved here. Could you also comment out the SetFinalizer in NewFile in os/file_unix.go and see if it's still reproducible? (Your test also doesn't need that finalizer.) |
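For reference, the second finalizer is the one set on the underlying *file in NewFile. A sketch of roughly what that looked like in Go 1.6-era os/file_unix.go (again, an illustrative excerpt rather than the exact source, which varies by version), with the line to comment out marked:

// In the os package (os/file_unix.go), approximately as of Go 1.6.
func NewFile(fd uintptr, name string) *File {
	fdi := int(fd)
	if fdi < 0 {
		return nil
	}
	f := &File{&file{fd: fdi, name: name}}
	// Commenting out the next line removes this finalizer from the test:
	runtime.SetFinalizer(f.file, (*file).close)
	return f
}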
I didn't mean to say that it isn't necessarily a race. It's actually quite likely a race, but it's resulting in corruption of internal runtime structures, which suggests that the race is happening on freed memory. The "workbuf is empty" failure mode especially points at memory corruption, which is why my initial guess is that the finalizers (and the specials queue in general) may be victims rather than perpetrators. It's also easy to get finalizers out of the picture, while it's harder to get fork/exec out of the picture without completely changing the program. :) |
derekmarcotte
referenced this issue
in derekmarcotte/hfm
May 16, 2016
Open
FreeBSD instability: go runtime panics #1
derekmarcotte
commented
May 17, 2016
Thanks @aclements ! One crash, ~4 hours into running, since removing the SetFinalizer in NewFile. I have a second process running for almost 11 hours, with 4 of the threads wedged, but it is still doing work. |
derekmarcotte
commented
May 17, 2016
After ~11 hours + 5 minutes, the process panic'd: |
Thanks for the new logs. I suspect that's actually not the same failure, which suggests that the original problem is in fact related to finalizers. |
derekmarcotte
commented
May 17, 2016
My pleasure, thanks for the feedback. Are there any next steps for me? (I'm likely to poke around this bottom issue, although I'm neither a go runtime guy, nor a FreeBSD systems-level guy - yet. Would like to be as helpful as possible.) Thanks again! |
derekmarcotte
commented
May 19, 2016
I'm going to post a few more here. This'll be my last batch, unless requested.
|
Thanks! I assume these last few failures are also with master and with the two SetFinalizer calls commented out? |
derekmarcotte
commented
May 19, 2016
That's correct! Thanks. Anyone else able to reproduce? |
@derekmarcotte, what version of FreeBSD and how many CPUs are you running with? I haven't had any luck yet reproducing on FreeBSD 10.1 with 2 CPUs. |
derekmarcotte
commented
May 26, 2016
@aclements: both machines were 4 core, 8 thread CPUs:
The Xeon is running 10.3-RELEASE, and the AMD was running 10.1-RELEASE at the time of the logs (has since been upgraded to 10.3-RELEASE). I suspect I would be able to chew through many more invocations in the same time as a 2 core machine on these hosts, and additionally increase probability of contention/collisions in any given instant. The Xeon has since moved to production, so I don't have that hardware at my disposal for the time being, although I might be able to arrange something if it's required. I can get dmesgs/kldstats for the xeon, and amd if helpful (would rather post out of band). |
@derekmarcotte, thanks for the extra details (I see now you already gave the CPU configuration; sorry I missed that). Two more experiments to try:
|
derekmarcotte
commented
Jun 1, 2016
@aclements, thanks for your suggestions. I'm currently exploring a different option. I was using gb for building my project (sorry! forgot to mention), and additionally for this test case. I certainly didn't expect wildly differing binaries in a project with no external dependencies, as gb uses the go compiler internally. I've got more research to do here to account for this, so I apologize for that. I've built using go directly and am in the process of testing. So far it has been running for 12 hours without problems (with the SetFinalizers disabled). I have had previous test runs last this long, so I'm not ready to call it a success just yet. I'll be out of town for the next few days, so I can leave it running for a while and see where it ends up. I think this is a very promising lead, based on the objdump of the two artefacts. It might be interesting to capture the build tool in the issue report template, given the build tool ecosystem that is currently out there (or the fact that there is an ecosystem at all).
derekmarcotte
commented
Jun 1, 2016
@aclements, rebuilding gb from source using the go based on master, and then rebuilding the test with the new gb, creates nearly identical binaries (minus file path TEXT entries); this is to be expected. Perhaps there's something to this. Will keep you posted.
cc @davecheney |
derekmarcotte
commented
Jun 2, 2016
@aclements, the go-only build of the binary did eventually crash... somewhere around 24 hours (~153M goroutines later), so I don't think it's gb-related. I'm going to re-build with the following, per your suggestion:

func run(done chan struct{}) {
	cmd := exec.Command("doesnotexist")
	cmd.Wait()
	done <- struct{}{}
	return
}
derekmarcotte
commented
Jun 6, 2016
@aclements, after the above change, it ran for 4 days without a panic.
@derekmarcotte, argh, I'm sorry. That wasn't a useful test. :( That eliminates the fork, the exec, and the finalizer, so there isn't really anything left. It needs to be something like:

func run(done chan struct{}) {
	cmd := exec.Command("/doesnotexist")
	cmd.Start()
	cmd.Wait()
	done <- struct{}{}
	return
}

It's important that the path be absolute. Otherwise, Go will try to resolve the path itself, fail at that point, and not do anything for the subsequent method calls. It's also important that the cmd.Start() call is there, since without it the command is never actually forked and exec'd. I'm sorry I didn't catch that earlier.
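To illustrate the distinction, here is a minimal standalone sketch (not part of the test program): with a relative path, exec.Command resolves the binary via LookPath when the Cmd is constructed, the lookup fails, and Start returns that stored error without ever forking; with an absolute path the lookup is skipped, so Start really does fork and exec, and the failure comes back from the child.

package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// Relative path: LookPath fails inside exec.Command, so Start
	// returns the stored error without forking at all.
	rel := exec.Command("doesnotexist")
	fmt.Println("relative:", rel.Start())

	// Absolute path: no lookup is attempted, so Start forks and
	// execs; the "no such file" error is reported from the child.
	abs := exec.Command("/doesnotexist")
	fmt.Println("absolute:", abs.Start())
}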
derekmarcotte
commented
Jun 9, 2016
@aclements, at least we know that Go can spin for 4 days without crashing on my dev machine - so not a total waste. I've recompiled with the above, and crashed after ~32 hours. Log attached: |
@derekmarcotte, thanks! Was that with go or gb? With or without the SetFinalizer calls? |
Also, 1.6.2 or master? |
derekmarcotte
commented
Jun 9, 2016
@aclements great questions. Was thinking we ought to recap all the variables at this point anyways. Trying to narrow things down as much as possible. So, this is with
with source:

package main

/* stdlib includes */
import (
	"fmt"
	"os/exec"
)

func run(done chan struct{}) {
	cmd := exec.Command("/doesnotexist")
	cmd.Start()
	cmd.Wait()
	done <- struct{}{}
	return
}

func main() {
	fmt.Println("Starting a bunch of goroutines...")

	// 8 & 16 are arbitrary
	done := make(chan struct{}, 16)
	for i := 0; i < 8; i++ {
		go run(done)
	}

	for {
		select {
		case <-done:
			go run(done)
		}
	}
}
It sounds like this is FreeBSD-specific, so marking as such. Please correct me if I am wrong. |
ianlancetaylor
changed the title from
runtime: garbage collector race condition (SetFinalizer, addfinalizer, addspecial, removespecial)
to
runtime: FreeBSD memory corruption involving finalizers
Jun 14, 2016
ianlancetaylor
added
the
OS-FreeBSD
label
Jun 14, 2016
derekmarcotte
commented
Jun 15, 2016
@ianlancetaylor: thanks for this. I'm just building a Linux box dedicated to testing tonight. I haven't yet been able to test on Linux. I'll update the thread as I find more information out.
adg
added this to the Go1.8 milestone
Jun 20, 2016
derekmarcotte
commented
May 25, 2017
The last notable status I have is from March. As far as I know, this is still an issue. I'm at the limits of my understanding/ability and have directed my efforts to places where I can be more helpful. If someone wanted to point me at an instance, I would be happy to collaborate and get it set up and repeatable.
derekmarcotte
commented
May 25, 2017
Confirming this is still an issue on 10.3-RELEASE, against master (a9d8d4d):
|
Wouldn't using vfork instead of fork be worth trying here?
The kernel code underneath fork is the same as vfork, so that seems unlikely.
kostikbel
commented
May 28, 2017
Do you know what the issue is? Do you know that the issue is in the FreeBSD kernel and not in the go runtime? Also, even if the issue is apparently in the kernel, the fork vs. vfork difference is exactly where the VM is involved. So, although not definitive, the experiment might give some hints about where to look next.
No, we still don't know if it's kernel or runtime. I've tried lots of kernel options on the raw kernel forks, but none made any difference. I've tried replicating with C code and have been unable to do so. Even with the best reproduction case I have, it always takes quite a number of attempts. Given this, it seems it requires complex threading interaction during the actual fork kernel call.
rhysh
referenced this issue
Jun 2, 2017
Closed
runtime: memory corruption crashes with os/exec on Linux kernel 4.4 #20427
bradfitz
deleted a comment from
fcntl
Jun 16, 2017
derekmarcotte
commented
Jun 16, 2017
Based on the fact that the same symptoms are reproducible under linux, does it make sense to no longer call this a FreeBSD issue? |
Let's not merge them until we know it's the same underlying cause. Same symptoms can be misleading. |
rsc
closed this
Jun 16, 2017
rsc
reopened this
Jun 16, 2017
(Sorry, wrong button.) |
derekmarcotte
commented
Jun 16, 2017
@rsc: Thanks. Agreed that there may be other things going on here. Having said that, the issue I'm referencing is having the same crashes (…). I haven't seen the …
jclinton
commented
Jun 25, 2017
Are the folks experiencing this running on Skylake or Kaby Lake hardware? If so, this might be relevant: https://lists.debian.org/debian-devel/2017/06/msg00308.html . Please try updating your CPU microcode to the latest release. |
Nope Nehalem EP here. |
aclements
referenced this issue
Aug 2, 2017
Closed
runtime: crash in runSafepointFn apparently because sched.safePointFn is nil #21276
I guess we still don't know what is happening. Punting to 1.10. |
ianlancetaylor
modified the milestones:
Go1.10,
Go1.9
Aug 2, 2017
ianlancetaylor
added
the
release-blocker
label
Aug 2, 2017
reezer
commented
Aug 25, 2017
I agree, @derekmarcotte. I think it doesn't make sense to state that something is FreeBSD-related when something very similar happens on at least one other OS too. If this gets changed, it should be mentioned in the release notes as well. However, there it might be enough to simply mention Linux and reference the corresponding issue.
dankm
commented
Sep 19, 2017
Might this change to FreeBSD be related? It doesn't seem likely to me, but others here know more about how Go works than I do.
It looks like Go only uses the LDT on 386 and not amd64, so I suspect this wouldn't have any impact, @dankm.
stevenh
referenced this issue
in shirou/gopsutil
Oct 4, 2017
Open
Use of external commands causes binaries using gopsutil to crash #430
added a commit
to stevenh/gopsutil
that referenced
this issue
Oct 5, 2017
added a commit
to stevenh/gopsutil
that referenced
this issue
Oct 5, 2017
stevenh
referenced this issue
in shirou/gopsutil
Oct 5, 2017
Merged
Eliminate use of sysctl command on FreeBSD #432
Just wanted to confirm this is still an issue with 1.9.1 which includes the signal handling fixes:
|
added a commit
to ContinuumLLC/godep-shirou-gopsutil
that referenced
this issue
Oct 26, 2017
marcan
referenced this issue
in prometheus/node_exporter
Nov 8, 2017
Open
Various crashes/segfaults on one host #730
blinkingled
commented
Dec 8, 2017
I was running the original test case posted above on DragonFlyBSD, mostly out of curiosity. Although I don't see any crashes/corruption in the short time I let it run, on 5.0.2 after running the test case I consistently get this in syslog -
This is coming from sys/kern/kern_synch.c:716 - which is tsleep() -
I am not sure this is even relevant - I haven't looked deeper, but thought I would throw this out just in case. |
ianlancetaylor
removed
the
release-blocker
label
Jan 3, 2018
ianlancetaylor
modified the milestones:
Go1.10,
Go1.11
Jan 3, 2018
vangyzen
commented
Feb 15, 2018
I haven't read the details of this issue, but even so, it could be fixed by this recent change in the FreeBSD VM system: https://svnweb.freebsd.org/base?view=revision&revision=329254 If you want to test that change, you'll also need this one: https://svnweb.freebsd.org/base?view=revision&revision=329252
It's not clear if these two kernel fixes will help fix the issues identified here, but they are definitely in the right area, particularly the second one. I'm trying to confirm either way whether the above have an impact; tests are in progress. I'll report back when I have more, but if anyone else has a good reproduction environment in which they can apply the above patches and test, that would be great.
Hehe, I guess I should have refreshed my page before replying, thanks @vangyzen :) |
Having applied the two patches, I've unfortunately managed to confirm it hasn't solved the issue, as I had a crash:
|
derekmarcotte commented May 12, 2016
Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?
go version go1.6.2 freebsd/amd64

What operating system and processor architecture are you using (go env)?

What did you expect to see?
I expect this strange program to spawn instances of /bin/true in parallel, until I stop it.

What did you see instead?
Various types of panics caused by what looks to be corruption within the finalizer lists, caused by what I am assuming are race conditions. These panics can happen as quickly as 2 minutes, or much longer. 10 minutes seems a good round number.

Occasionally addspecial gets stuck in an infinite loop holding the lock, and the process wedges. This is illustrated in log 1462933614, with x.next pointing to x. This appears to be corruption of that data structure. I have seen processes in this state run for 22 hours.
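As an illustration of the wedge described above (this is not the runtime's code, just a minimal sketch of why a node whose next pointer refers back to itself makes an ordered-insert loop spin forever while the caller holds a lock):

package main

import "fmt"

// special mimics the runtime's per-object special records: a singly
// linked list kept sorted by offset. Purely illustrative, not runtime code.
type special struct {
	offset int
	next   *special
}

// insertSorted walks the list to find the insertion point, the way the
// runtime's addspecial does while holding its lock. This demo bails out
// after a bounded number of steps so it terminates; the real loop has no
// such bound, so a corrupted self-link wedges the process.
func insertSorted(head **special, s *special) bool {
	iter := head
	for x, steps := *iter, 0; x != nil; x, steps = x.next, steps+1 {
		if steps > 1000000 {
			return false // cycle detected: the real loop would never return
		}
		if x.offset >= s.offset {
			break
		}
		iter = &x.next
	}
	s.next = *iter
	*iter = s
	return true
}

func main() {
	bad := &special{offset: 1}
	bad.next = bad // simulate the corruption seen in log 1462933614: x.next == x
	head := bad
	if !insertSorted(&head, &special{offset: 2}) {
		fmt.Println("list is cyclic; the real addspecial would spin forever holding the lock")
	}
}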
I understand there is some trepidation expressed in issue #11485 around the locking of the data structures involved.
Here are some sample messages:
fatal error: runtime.SetFinalizer: finalizer already set
runtime: nonempty check fails b.log[0]= 0 b.log[1]= 0 b.log[2]= 0 b.log[3]= 0 fatal error: workbuf is empty
1462926841-SetFinalizer-ex1.txt
1462926969-SetFinalizer-ex2.txt
1462933295-nonempty-check-fails.txt
1462933614-wedged.txt
This was run on an 8-core processor, and a 4-core 8-thread processor with ECC RAM, similar results.
Additionally, while this example is an extreme, it also represents the core functionality of a project I've been working on part-time for many months. I'm happy to provide any further assistance diagnosing this issue - I'm very invested!
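The test program itself is not reproduced in this excerpt. A sketch in its spirit, assuming the same structure as the /doesnotexist variant quoted later in the thread with the command swapped back to /bin/true (the exact original source may have differed, e.g. it may have used cmd.Run()):

package main

import (
	"fmt"
	"os/exec"
)

func run(done chan struct{}) {
	// Spawn /bin/true and wait for it to exit.
	cmd := exec.Command("/bin/true")
	cmd.Start()
	cmd.Wait()
	done <- struct{}{}
}

func main() {
	fmt.Println("Starting a bunch of goroutines...")

	// 8 & 16 are arbitrary
	done := make(chan struct{}, 16)
	for i := 0; i < 8; i++ {
		go run(done)
	}

	// Every time a goroutine finishes, start another, indefinitely.
	for {
		<-done
		go run(done)
	}
}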