After setting up a Coda server and client connections on both the server and the client machine (two machines on my LAN), I'm seeing the following behaviour:
- Transferring data from ZFS to `/coda/[realm]` with `rsync`, the transfer rate is good (40 MB/s) for the first few GB of data. Then the transfer stalls: the next file is only transferred after ~10 minutes, sometimes it stalls for hours, and sometimes it makes no progress overnight. `venus` on the server machine has
12:43:36 fatal error -- Recov_LoadRDS: heap mismatch (0x50000000, d0488000) vs (0x50000000, 2d0488000)
Assertion failed: 0, file "venusrecov.cc", line 519
***BackTrace***
/usr/sbin/venus(coda_assert+0x76)[0x562f976a5a66]
/usr/sbin/venus(_Z5chokePKciS0_z+0xc8)[0x562f97664428]
/usr/sbin/venus(_Z9RecovInitv+0x335)[0x562f976617f5]
/usr/sbin/venus(main+0x332)[0x562f976305d2]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fc1739433f1]
/usr/sbin/venus(_start+0x2a)[0x562f976328fa]
Sleeping forever. You may use gdb to attach to process 16261.
in the logs (the process shows no backtrace in `gdb`).
`venus` randomly crashes on both machines, with the following in the kernel log:
[42079.451522] coda: venus_pioctl: Venus returns: -22 for (00000001.ffffffff.fffffffc.00000000)
[42083.097729] coda: Unexpected interruption.
[42083.097735] coda: venus_pioctl: Venus returns: -4 for (00000004.01000001.00000001.00000001)
(`dmesg` then displays `coda: Venus dead, not sending upcall`)
`venus` on the server machine additionally crashes with
13:21:29 fatal error -- fsobj::dir_Create: (.o6MxU0L3M1pnfonh4qr63fXobG5b2KtU,K04npr-Ldt841.Pfm1is, 2.7f000000.fffffffe.80f4f) Create failed 27!
13:21:30 RecovTerminate: dirty shutdown (1 uncommitted transactions)
Assertion failed: 0, file "fso_dir.cc", line 98
***BackTrace***
venus(coda_assert+0x76)[0x560019991a66]
venus(_Z5chokePKciS0_z+0xc8)[0x560019950428]
venus(_ZN5fsobj10dir_CreateEPKcP8VenusFid+0x12b)[0x56001993ae2b]
venus(_ZN5fsobj11LocalCreateEjPS_Pcjt+0x23)[0x5600199365e3]
venus(_ZN5fsobj18DisconnectedCreateEjjPPS_Pctii+0x291)[0x560019936951]
venus(_ZN5fsobj6CreateEPcPPS_jti+0x50)[0x560019936a00]
venus(_ZN5vproc6createEP11venus_cnodePcP10coda_vattriiS1_+0x2af)[0x56001997404f]
venus(_ZN6worker4mainEv+0x91d)[0x56001991dedd]
venus(_Z13VprocPreamblePv+0xbe)[0x56001996e0ae]
/usr/lib/coda/liblwp.so.2(+0x5d7c)[0x7f38193b2d7c]
/lib/x86_64-linux-gnu/libc.so.6(+0x357f0)[0x7f381876f7f0]
/lib/x86_64-linux-gnu/libc.so.6(sigsuspend+0x16)[0x7f381876fb26]
/usr/lib/coda/liblwp.so.2(lwp_makecontext+0x124)[0x7f38193b2f04]
which is not the same incident, given the time gap, but is probably related. I/O operations with `rsync` on the server side then take forever (> 30 minutes without progress and without any noticeable I/O in `iotop`).
- On the client machine I'm getting I/O errors and
[ W(13) : 0000 : 15:37:27 ] fsobj::TryToCover: vdb::Get(#@.Trash) failed (110)
[ W(13) : 0000 : 15:37:27 ] Allowing access to stale status! (key = <1.ff000001.fffffffc.7>)
[ W(13) : 0000 : 15:37:27 ] Allowing access to stale status! (key = <1.ff000001.1.1>)
[ W(13) : 0000 : 15:37:27 ] Allowing access to stale status! (key = <1.ff000001.fffffffc.8>)
[ W(13) : 0000 : 15:37:27 ] fsobj::TryToCover: vdb::Get(#@.Trash-1000) failed (110)
[ W(13) : 0000 : 15:37:27 ] Allowing access to stale status! (key = <1.ff000001.fffffffc.8>)
[ W(13) : 0000 : 15:37:46 ] Allowing access to stale status! (key = <1.ff000001.1.1>)
[ W(13) : 0000 : 15:37:46 ] Allowing access to stale status! (key = <1.ff000001.fffffffc.7>)
[ W(13) : 0000 : 15:37:46 ] fsobj::TryToCover: vdb::Get(#@.Trash) failed (110)
[ W(13) : 0000 : 15:37:46 ] Allowing access to stale status! (key = <1.ff000001.fffffffc.7>)
[ W(13) : 0000 : 15:37:46 ] Allowing access to stale status! (key = <1.ff000001.1.1>)
[ W(13) : 0000 : 15:37:46 ] Allowing access to stale status! (key = <1.ff000001.fffffffc.8>)
[ W(13) : 0000 : 15:37:46 ] fsobj::TryToCover: vdb::Get(#@.Trash-1000) failed (110)
[ W(13) : 0000 : 15:37:46 ] Allowing access to stale status! (key = <1.ff000001.fffffffc.8>)
[ W(13) : 0000 : 15:37:47 ] Allowing access to stale status! (key = <1.ff000001.1.1>)
[ W(13) : 0000 : 15:37:47 ] Allowing access to stale status! (key = <1.ff000001.fffffffc.7>)
[ W(13) : 0000 : 15:37:47 ] fsobj::TryToCover: vdb::Get(#@.Trash) failed (110)
[ W(13) : 0000 : 15:37:47 ] Allowing access to stale status! (key = <1.ff000001.fffffffc.7>)
[ W(13) : 0000 : 15:37:47 ] Allowing access to stale status! (key = <1.ff000001.1.1>)
[ W(13) : 0000 : 15:37:47 ] Allowing access to stale status! (key = <1.ff000001.fffffffc.8>)
[ W(13) : 0000 : 15:37:47 ] fsobj::TryToCover: vdb::Get(#@.Trash-1000) failed (110)
[ W(13) : 0000 : 15:37:47 ] Allowing access to stale status! (key = <1.ff000001.fffffffc.8>)
[ W(13) : 0000 : 15:37:47 ] Cachefile::SetLength 4096
in `venus.log`, for some files only.
- As soon as `venus.log` contains `***LWP (0x55dc733053c0): Select returns error: 4`, the installation seems to be impossible to recover. I got into this state the last two times I tried to get Coda running, and I'm in it again now.
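In case it helps with triage: if the numeric codes in the messages above are plain Linux errno values (an assumption, not checked against the Coda sources), they can be decoded with a short Python sketch. On Linux, 4 is EINTR, 22 is EINVAL, 27 is EFBIG and 110 is ETIMEDOUT. The `LOGGED_CODES` table and the `decode` helper are hypothetical names used only for illustration.

```python
import errno
import os

# Codes copied from the log excerpts in this report.
# Assumption: they are ordinary errno values (negated in the kernel messages).
LOGGED_CODES = {
    "venus_pioctl: Venus returns: -22": -22,
    "venus_pioctl: Venus returns: -4": -4,
    "fsobj::dir_Create: Create failed 27": 27,
    "vdb::Get(...) failed (110)": 110,
    "Select returns error: 4": 4,
}


def decode(code):
    """Return (symbolic name, description) for an errno value, ignoring its sign."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


for source, code in LOGGED_CODES.items():
    name, text = decode(code)
    print(f"{source}: {name} ({text})")
```

The symbolic names depend on the platform the script runs on, so it should be run on the same Linux machines that produced the logs.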
These issues might be separate or connected. I'll split them into separate reports if you can give me a criterion for separating them; it's just very hard to understand what's going on when crashes come from terse, hard-to-understand assertion failures.
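One observation about the `Recov_LoadRDS: heap mismatch` message that may or may not be relevant: the second value in each pair (presumably a heap length, which is an assumption) differs only in the bits above the low 32. The sketch below checks this with the two values copied from the log. It does not show which of the two is the stored value and which is the expected one.

```python
# Second value of each pair in the "heap mismatch" message, copied from the log.
first = 0xD0488000
second = 0x2D0488000

MASK_32 = 0xFFFFFFFF


def low32(value):
    """Keep only the low 32 bits of an integer."""
    return value & MASK_32


# The low 32 bits of the larger value equal the smaller value.
print(hex(low32(second)))
print(low32(second) == first)

# The difference is an exact multiple of 2**32 (here 2 * 2**32, i.e. 8 GiB).
print((second - first) // 2**32, (second - first) % 2**32)
```

If this is not a coincidence, it could point to a length being truncated to 32 bits somewhere between writing and reloading the recoverable heap, but that is only a guess from the numbers.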
I noticed that `venus` is started with a delay of a few seconds. This is unrelated to the `coda-client` `systemd` unit, because that unit is stopped. The delayed start might interfere with a `venus -init`, which sometimes restores responsiveness of the client after a reboot but causes data loss.
Experienced with 6.11.2-1+ubuntu16.10 on Ubuntu 16.10 amd64.