00:52:45 steves_logging has quit 00:55:01 steves_logging has joined #emc-devel 01:34:43 jmkasunich has joined #emc-devel 01:55:50 skunkworks has joined #emc-devel 01:55:55 hi all 01:55:56 hey 01:56:09 hi 01:57:46 ok, how do I ssh into skunkworks broken box? 01:58:06 can I post my ip here 01:58:15 better to priv it 01:58:33 good 01:59:11 I don't think I can start a pm - I don't think I can for some reason 01:59:57 try again 02:00:13 (I signed in with my password and turned "unfiltered" on 02:00:27 (freenode has been doing stuff with privmsg to deal with spam 02:02:38 when you have the IP and username, just ssh username@1.2.3.4 02:02:46 then you'll get a password prompt 02:03:58 ok, I'm logged into skunkworks's box 02:04:17 it's the installed deb that fails, or a cvs checkout, or both? 02:05:11 tap, tap, is this thing on? 02:05:42 I see a CVS checkout by alex 02:06:57 I did the cradek isntall first - that didnt work - then alex while goofing around installed the source and tried to compile it - I don't think that worked either 02:07:16 I tried to re-run ./configure, and it seems to be hung 02:07:36 crap 02:07:50 if I give a ctrl-C, will that kill the ssh session, or go to the remote box? 02:07:57 it'll go to the remote box 02:08:01 I cannot reboot it from here. :( 02:08:04 it will work just like a local shell 02:09:01 skunkworks - dunno if the box is hung, maybe just my session 02:09:04 trying another one 02:09:40 box is OK, I have another ssh shell 02:09:48 where did configure stop? 02:09:54 good 02:09:58 checking for ranlib 02:10:06 strange 02:10:14 WTF!? the second shekk stopped in the middle of "ps -A" 02:10:27 got the header line only 02:10:38 skunkworks: can I have the login information too? 
02:10:40 this is probably an ssh problem 02:10:45 let me try it 02:10:50 I use ssh 100 times a day 02:10:59 im me 02:11:06 pm what ever 02:11:15 skunkworks, I can pass it to him 02:11:30 got it 02:11:35 ok 02:12:01 cradek: first thing, find out what my shells are doing 02:12:05 it's nice and responsive here 02:12:30 nothing - they're just idle at bash 02:12:31 poor machine ;) 02:12:32 I have a NAT router, think that might be messing it up? 02:12:37 12597 ? S 0:00 sshd: sam@pts/2 02:12:37 12598 pts/2 Ss+ 0:00 -bash 02:12:37 10932 ? S 0:00 sshd: sam@pts/0 02:12:37 10933 pts/0 Ss+ 0:00 -bash 02:12:46 violated everyway from sunday 02:12:58 no, nat is unlikely to mess up ssh 02:13:12 the configure and ps -A probably ran fine, but my link got fscked somehow 02:13:19 I agree 02:13:31 how do I terminate my login? 02:13:45 RETURN ~ . 02:14:10 configure runs fine for me... 02:14:17 wonder what's wrong between you two 02:14:21 that would be wonderfull if I could actually type anything into the shell 02:14:30 you can just type it blind 02:14:37 that's a command to your local ssh 02:14:42 tried, no joy 02:14:48 all caps on the RETURN? 02:14:56 no, sorry, the return key 02:14:59 uh, enter 02:15:12 , then '~', then '.'? 02:15:17 yes 02:15:28 that should terminate the ssh 02:15:31 that worked 02:15:49 ok, both conns closee 02:15:56 closed even 02:16:05 I would not expect problems with this... very strange 02:16:12 ssh always works 02:16:32 not for me 02:16:37 23664 ? D 0:00 /sbin/insmod /usr/realtime-2.6.12-magma/modules/rtai_up.ko 02:16:40 23665 ? D 0:00 /sbin/insmod /usr/realtime-2.6.12-magma/modules/rtai_up.ko 02:16:43 there are hung insmods 02:16:51 just tried again, logged in, ps -A printed header and nothing else 02:17:11 dmesg? 02:17:25 ... 
02:17:27 [ 5035.309916] HAL: thread created 02:17:27 [ 5035.309940] MOTION: setting Traj cycle time to 10000000 nsecs 02:17:27 [ 5035.309947] MOTION: setting Servo cycle time to 1000000 nsecs 02:17:27 [ 5035.309951] MOTION: init_threads() complete 02:17:29 [ 5035.309954] MOTION: init_module() complete 02:17:37 looks ok 02:17:45 so it loaded at least once 02:17:58 there's a motmod etc loaded, I'm going to try to unload them 02:18:07 rtai_up.ko is the rtai uniprocessor scheduler 02:18:15 crap, sudo wants a password 02:18:24 should be the same 02:18:36 it is the only user on the machine as far as I know 02:18:36 aha 02:18:43 yeah, user pw, not root 02:18:55 we've had problems with that before 02:19:26 ok all modules unloaded nicely 02:19:31 rtai_ksched (or rtai_sched, don't recall) is a symlink to rtai_up (or _smp, or maybe even another one) 02:19:52 you can load using the symlink, but you must unload with the real name, or something like that 02:19:58 right 02:20:02 realtime start works fine 02:20:26 Module Size Used by 02:20:26 hal_lib 24460 0 02:20:26 rtapi 25664 1 hal_lib 02:20:26 rtai_math 25860 0 02:20:26 rtai_sem 14976 1 rtapi 02:20:28 the prob seemed to be with user space access to the HAL shmem 02:20:28 rtai_shm 8192 1 rtapi 02:20:31 rtai_fifos 23500 1 rtapi 02:20:33 rtai_up 69400 4 rtapi,rtai_sem,rtai_shm,rtai_fifos 02:20:36 rtai_hal 20888 5 rtapi,rtai_sem,rtai_shm,rtai_fifos,rtai_up 02:20:39 adeos 14336 2 rtai_up,rtai_hal 02:20:58 looks very normal 02:21:05 I'm going to be useless on this - I wish we could figure out your ssh problem 02:21:11 try halcmd show 02:21:22 Loaded HAL Components: 02:21:22 ID Type Name 02:21:22 01 User halcmd14546 02:21:30 everything else is empty 02:21:37 as expected 02:21:42 do you need your router to port forward the right port to your box? 02:22:03 on your end? 02:22:06 could be.... 
but it establishes the initial connection, then loses it later 02:22:09 skunkworks: no, it's a one-way outgoing connection 02:22:39 skunkworks: no fancy stuff needed like for dcc,ftp,etc 02:22:52 how does the info get returned? 02:23:02 the one tcp connection stays open 02:23:17 forever, until you disconnect 02:23:21 ah 02:23:27 or, in jmk's case, until he runs a command that gives a lot of output 02:23:43 what's the command to view open connections? 02:23:52 netstat maybe? 02:24:14 jmkasunich: you have ethernet to your nat box, then broadband of some kind? 02:24:26 yes 02:24:28 DSL 02:25:01 I can access the NAT config with a browser (its one of those little plastic boxes, not a computer 02:26:22 netstat says I have a connection (even after the lockup) 02:26:40 for alex to get in I had to port forward 22 or 24 to my internal ip (what ever port it was) 02:26:52 yeah 22 02:28:53 jmkasunich: try ping 12.125.74.54 02:29:03 I think that's right outside skunkworks's machine 02:29:13 works 02:29:27 jmkasunich: try ping -s1300 12.125.74.54 02:30:02 works 02:30:14 tried pinging his box, no joy 02:30:17 huh 02:30:28 yeah pings are blocked somewhere past this address 02:30:49 I think charter cable disabled it. 02:30:53 I'm baffled 02:31:09 wish I could help 02:31:25 Also I think my router is set to not respond to pings. 02:31:32 we can do three way troubleshooting ;-/ 02:31:40 I'll try 02:31:43 I could turn that on temp if that would help 02:31:52 no, I think it wouldn't help 02:31:55 try bin/halcmd loadrt blocks wcomp=1 02:32:02 the bin/halcmd show again 02:32:53 sam@ubuntu:~$ halcmd loadrt blocks wcomp=1 02:32:53 RTAPI: ERROR: version mismatch 0 vs 529 02:32:53 HAL: ERROR: rtapi init failed 02:32:53 halcmd: hal_init() failed 02:32:53 NOTE: 'rtapi' kernel module must be loaded 02:33:08 so we can recreate it 02:33:19 the error came on the loadrt, or the show? 
02:33:27 the loadrt 02:33:33 I pasted the prompt and command too 02:33:40 duh 02:33:51 try the show 02:34:05 same prob I bet 02:34:11 sam@ubuntu:~/emc2/src$ halcmd show 02:34:11 RTAPI: ERROR: version mismatch 0 vs 529 02:34:11 HAL: ERROR: rtapi init failed 02:34:11 halcmd: hal_init() failed 02:34:11 NOTE: 'rtapi' kernel module must be loaded 02:34:25 but the very first show worked 02:34:42 scripts/realtime stop, we'll try again from the too 02:34:44 top 02:35:01 did the stop work? 02:35:04 sam@ubuntu:~/emc2/src$ /etc/init.d/realtime start 02:35:04 sam@ubuntu:~/emc2/src$ halcmd show 02:35:04 Loaded HAL Components: 02:35:04 ID Type Name 02:35:04 01 User halcmd16517 02:35:09 yes stop and restart gives this 02:35:15 ok, do another show 02:35:25 any number of shows work 02:35:45 but the loadrt fails? and then shows fail too? 02:35:57 cat /proc/rtapi/* 02:36:16 sam@ubuntu:~$ halcmd loadrt blocks wcomp=1 02:36:16 HAL:0: ERROR: Can't find module 'blocks' in /usr/rtlib 02:36:22 eh??? 02:36:34 maybe thats the module path thing 02:36:44 lemme try it here 02:37:19 running installed or in place? 02:37:34 installed 02:37:42 what env do I set? 02:37:48 HAL_RTMOD_DIR 02:38:07 but what is strange is that you got farther this time 02:38:25 sam@ubuntu:~$ export HAL_RTMOD_DIR=/usr/realtime-2.6.12-magma/modules/emc2 02:38:25 sam@ubuntu:~$ halcmd loadrt blocks wcomp=1 02:38:25 HAL:0: ERROR: module 'blocks' not loaded 02:38:37 BUT 02:38:38 sam@ubuntu:~$ lsmod|head 02:38:38 Module Size Used by 02:38:38 blocks 18508 0 02:38:46 it DOES load 02:39:33 how does module-helper return status? 
02:39:54 it just execvs insmod/rmmod, so their exit codes are returned 02:40:02 ok 02:40:10 damn I miss kate 02:40:18 apt-get install kate 02:40:34 and 47 gigs of dependencies 02:40:37 not now 02:40:49 making and installing TESTING 02:40:50 22MB 02:41:04 still, not now - I'll manage with gedit 02:41:48 I can build RIP TESTING if you want 02:42:00 hold on 02:42:34 ok, I understand the "not loaded" 02:42:44 ok cool 02:42:59 after forking and invoking module-helper (and waiting for it to return) halcmd does a show comp (internally) 02:43:07 its not seeing the installed module, so it complains 02:43:26 there is something fscked with shared memory 02:43:37 ok 02:43:41 halcmd show still only shows halcmd, right> 02:43:43 so this is another symptom of the same problem 02:43:48 s/>/?/ 02:43:57 yes 02:44:02 01 User halcmd16669 02:44:07 cat /proc/rtapi/* 02:44:29 you want it all? it's a lot 02:44:37 just a sec 02:44:43 ******* RTAPI MODULES ******* 02:44:43 ID Type Name 02:44:43 01 RT HAL_LIB 02:44:43 02 RT HAL_blocks 02:44:49 **** RTAPI SHARED MEMORY **** 02:44:49 ID Users Key Size 02:44:49 RT/UL 02:44:49 01 2/0 1212238881 65500 02:45:30 where is "realtime" on an install? 
02:45:37 /etc/init.d 02:45:59 oh, not in my path 02:46:04 nope 02:47:15 something tells me that when you do halcmd it opens a different shmem block than the one the RT HAL is using 02:48:02 ok, try this: 02:48:06 halcmd -f 02:48:14 that will open a halcmd, and it will wait for input 02:48:27 sam@ubuntu:~/emc2/src$ halcmd -f 02:48:27 RTAPI: ERROR: version mismatch 0 vs 529 02:48:27 HAL: ERROR: rtapi init failed 02:48:27 halcmd: hal_init() failed 02:48:27 NOTE: 'rtapi' kernel module must be loaded 02:48:28 in another shell, cat /proc/rtapi/shmem 02:49:02 each time it says that, it must be accessing a different shmem or something 02:49:20 try again, see if you can get it to run (give a halcmd: prompt) 02:49:55 no, I tried many times 02:49:59 ok 02:50:11 and now halcmd show doesn't work 02:50:28 realtime stop, clean things up 02:50:38 I have a short RTAI shmem test program, let me find it 02:51:20 https://mail.rtai.org/pipermail/rtai/2005-July/012321.html 02:51:20 ok clean 02:51:42 there's a 20-30 line program embedded in that email message 02:51:58 can you cut/paste it onto sam's box and compile it? 02:52:18 ready 02:52:28 that was fast 02:52:39 I compile like the wind 02:52:45 run it in a shell, it should print 1-30 at 1 second intervals, then exit 02:52:55 root@ubuntu:~# ./a.out 02:52:55 SHM_USR: Allocation failed 02:53:13 oh, rtai isn't loaded 02:53:16 realtime start 02:53:34 (don't need rtapi or hal for this, but do need the rtai modules) 02:53:37 root@ubuntu:~# ./a.out 02:53:37 SHM_USR: incrementing count: was 18546688, now 18546689 02:53:37 SHM_USR: incrementing count: was 18546689, now 18546690 02:53:37 SHM_USR: incrementing count: was 18546690, now 18546691 02:53:41 ... 02:53:47 well thats fscked 02:54:05 maybe not, hang on 02:54:09 I don't see you clearing *p in that program 02:54:15 no 02:54:26 I thought RTAI cleared shmem regions, maybe not 02:54:29 should I add that? 
02:54:30 irrelevant anyway 02:54:33 ok 02:54:33 no 02:54:43 because the prog only gets interesting when you run two of them 02:54:50 open another shell 02:54:50 did an old version of rtai clear shmem?? that could be our bug 02:55:06 run in one shell, then start the prog in the other before the first one exits 02:55:20 they both should be accessing and incrementing the same counter 02:55:39 thy are 02:55:42 they are 02:55:47 and it started at 0 this time 02:56:06 the last one exited at 60 02:56:10 correct 02:56:39 when I run it again, it's starts at 60. 02:56:43 it 02:57:01 the very first time, it opened some region of memory with a big number in it 02:57:10 ok 02:57:19 emc doesn't depend on the memory starting cleared does it? 02:57:26 second time, a different region (or cleared the region), and then when you ran a second instance, they both accessed the same region 02:57:27 that could explain why we get random failures 02:57:51 nothing in HAL depends on memory clearing (99.9% certain, I'll check in a few mins) 02:58:36 when you ran it a third time, it must have opened the same region again. That's not required (once both instances ended in the previous run, the shmem region is gone) 02:58:45 ok 02:59:01 should I try this thing you describe with three invocations? 02:59:07 sure 02:59:55 seems to work right (they all share the numbers) 02:59:59 ok 03:00:13 I'd be very suprised to see the exact same bug 03:00:31 but that prog is a simple way to test shmem in general 03:00:45 emc's usage is a little more complex 03:00:57 hal_lib.ko opens a shmem region when loaded 03:01:07 each RT hal module opens the same region 03:01:14 as does each non-RT hal module 03:01:16 and halcmd 03:01:32 in the case of halcmd, every invocation opens it, then closes it on program exit 03:01:36 what triggers the version mismatch error? 
03:01:50 but since hal_lib.ko is loaded, all invocations refer to that region
03:01:56 ok
03:02:04 there is a magic number and a version number in the shmem block
03:02:15 ok
03:02:22 so we're getting a different block maybe
03:02:34 the very first time its opened, some global init needs done. that is done if the magic number is missing, then sets the magic
03:02:39 it also sets the version
03:03:04 subsequent opens of the region check the magic, see it set, know they don't have to do the global init, then they check the version
03:03:27 to make sure you don't run hal components with mismatched hal data structs
03:03:53 for instance if you changed the structure defs and didn't recompile everything, or if you got a binary hal module from somewhere
03:04:01 ok I see now
03:04:13 I think I'm not getting any wrong behavior with your test program
03:04:23 I tend to agree
03:05:11 alex made some changes to rtapi_common.h to print some stuff
03:05:28 I wonder if those are in the checkout in ~/sam/emc2?
03:05:48 he prints if magic is found, etc
03:06:09 sam@ubuntu:~/emc2.aj$ bin/halcmd show
03:06:09 init_rtapi_data: initial rev_code=529
03:06:09 init_rtapi_data: rtapi_mutex_try() returned -1
03:06:09 init_rtapi_data: assigned rev_code=529
03:06:09 ird: #1 529
03:06:11 ird: #2 529
03:06:13 ird: #4 529
03:06:16 ird: #4 529
03:06:19 init_rtapi_data: rev_code=529
03:06:21 Loaded HAL Components:
03:06:24 ID Type Name
03:06:26 01 User halcmd20114
03:07:07 where is that ird: coming from?
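[Annotation, not part of the log] The magic-number and version check described at 03:02:04 through 03:03:53 could be sketched in C roughly as follows. This is only an illustration under stated assumptions: the names shared_header, attach and the payload array are hypothetical, and the real rtapi code is not shown in this log and may well differ. The values 529 and 308286473, and the fact that the version sits immediately after the magic, are taken from output and remarks elsewhere in the log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values as printed in the log; every name below is hypothetical. */
#define SKETCH_MAGIC 308286473u
#define SKETCH_REV   529u

enum attach_result {
    ATTACH_DID_INIT,          /* magic was missing, global init was run */
    ATTACH_OK,                /* magic and version both matched */
    ATTACH_VERSION_MISMATCH   /* magic matched but version did not */
};

/* Assumed layout: the version immediately follows the magic. */
struct shared_header {
    unsigned int magic;
    unsigned int rev_code;
    unsigned int payload[16]; /* stand-in for the bookkeeping arrays */
};

/* Attach to a shared block the way the log describes it:
 * the first opener initialises, later openers only verify. */
enum attach_result attach(struct shared_header *h)
{
    assert(h != NULL);

    if (h->magic != SKETCH_MAGIC) {
        /* First open: set magic and version, then clear the rest. */
        h->magic = SKETCH_MAGIC;
        h->rev_code = SKETCH_REV;
        memset(h->payload, 0, sizeof h->payload);
        return ATTACH_DID_INIT;
    }
    if (h->rev_code != SKETCH_REV) {
        /* Magic is fine but the version is not: refuse to attach. */
        fprintf(stderr, "version mismatch %u vs %u\n",
                h->rev_code, SKETCH_REV);
        return ATTACH_VERSION_MISMATCH;
    }
    return ATTACH_OK;
}
```

In this sketch, zeroing rev_code while leaving magic intact yields the mismatch result, and zeroing magic makes the next caller run the global init again, which are the two failure paths the participants reason about.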
03:07:34 on subsequent runs, I get different output: 03:07:40 sam@ubuntu:~/emc2.aj$ bin/halcmd show 03:07:40 init_rtapi_data: MAGIC is ok, rev_code=529 03:07:40 Loaded HAL Components: 03:07:40 ID Type Name 03:07:40 01 User halcmd20129 03:08:15 that print is in init_rtapi_data 03:08:52 ok, it wasn't in the part he pasted into the email 03:09:02 I think it only happens the first time 03:10:08 sam@ubuntu:~/emc2.aj$ bin/halcmd -f 03:10:08 init_rtapi_data: MAGIC is ok, rev_code=529 03:10:08 halcmd: 03:10:10 he prints after each of those for loops? 03:10:14 yes 03:10:23 now I have a halcmd prompt 03:10:32 he was trying to see if it got stomped on by the loops 03:11:08 what did you want me to do with a halcmd: prompt open? 03:11:41 cat /proc/rtapi/shmem 03:13:54 also, cat /proc/rtai/names 03:15:17 still there? 03:15:39 on this box. rtai/names shows three lines with SHMEM in them 03:16:08 on is 12288 bytes, I think that is used by RTAI, one is 65536, that is HAL, and one is 2Meg, dunno what is using that 03:16:37 usage counts on the two small ones are both 2 03:16:52 (this is with blocks loaded) 03:17:48 argh, now I'm having connectivity problems 03:17:48 bear with me if I disappear for a few minutes 03:17:48 fsort of 03:17:48 argh 03:18:06 ok I think I'm back 03:19:23 sam@ubuntu:~$ cat /proc/rtapi/shmem 03:19:23 **** RTAPI SHARED MEMORY **** 03:19:23 ID Users Key Size 03:19:23 RT/UL 03:19:23 01 1/0 1212238881 65500 03:19:33 Slot Name ID Type RT_Handle Pointer Tsk_PID MEM_Sz USG Cnt 03:19:36 ------------------------------------------------------------------------------- 03:19:39 55 CF$Z86 0x48414c21 SHMEM 0xf8dfe000 0x00000000 0 65536 2 03:19:42 62 RTGLBF 0x9ac6d9e5 SHMEM 0xf8f1d000 0x00000000 0 2097152 1 03:19:45 74 PUFUQK 0x90280a48 SHMEM 0xf8c1b000 0x00000000 0 12288 2 03:19:49 that's with the halcmd prompt showing? 
03:20:07 yes I think it's still there, but that terminal is hung 03:20:22 20176 pts/1 SL+ 0:00 bin/halcmd -f 03:20:26 yes it's still running 03:20:42 if the halcmd was still going, then RT/UL under rtapi shared memory should be 1/1 03:20:52 hal_lib on the RT side, and halcmd on the user side 03:21:09 I notice size is also different 03:21:13 65500 vs 65536 03:21:32 I request a little under 64K in case their using a slab allocator 03:21:40 ah 03:21:58 if I asked for exaclty 64K and they add a few bytes of overhead, all of a sudden you get twice as much 03:22:10 right 03:22:31 interesting that they have an accurage usage count 03:23:04 I think the magic number and/or the version is getting stomped somehow 03:23:31 when I kill the halcmd, /proc/rtapi/shmem doesn't change 03:23:48 and also when I start another one 03:23:53 yeah, somehow it didn't even know about the halcmd 03:24:08 sam@ubuntu:~/emc2.aj$ bin/halcmd -f 03:24:08 init_rtapi_data: MAGIC is ok, rev_code=529 03:24:14 but the halcmd thinks it's ok 03:24:58 I notice this machine has 1.5GB of ram 03:25:05 wow, thats a lot 03:25:41 I've been looking at code 03:25:47 it is a machine in limbo right now - it was a rip for a large immage setter 03:26:13 the only way to get the version mismatch message is for the magic number in the RTAPI region to be OK but the version to be munged 03:26:24 interesting 03:26:39 especially since the version is immediately after the magic in the struct 03:26:50 seems pretty unlikely to get that magic # by chance 03:27:17 yeah 03:27:41 although after you're loaded rtapi once, there is at least one memory location that contains the magic 03:27:54 (maybe - I think I might actually clear it on removal) 03:28:43 nope (probably should tho, as long as I can be _SURE_ I'm the last one holding it) 03:30:07 if the magic gets messed up, that isn't pretty either 03:30:33 you void* in rtapi_shmem_getptr can point anywhere in 1.5GB right? I think you can get 4GB with a four byte pointer? 
03:30:34 because the next time you load halcmd, it will redo the global init, stomping on rtapi internal data 03:30:48 youR 03:31:36 should be able to point anywhere 03:31:56 ok 03:32:03 just thinking all that ram makes this machine unusual 03:32:08 yeah 03:32:26 rtapi does a lot of housekeeping 03:32:40 most of which I haven't looked at in a couple years 03:32:57 obviously because it has always worked until now... 03:33:12 except for the previous shared memory strangeness 03:33:32 that must have been before my time 03:33:35 which was repeatable on multiple boxes, depended only on the kernel/rtai 03:33:48 (that email with the test program in it) 03:33:49 I could pull some of the ram out of it tomorrow 03:33:58 yeah this one's pretty special because we know everything is the same as our boxes 03:34:12 skunkworks: it's only a shot in the dark... 03:34:26 skunkworks: in case you haven't noticed, I don't know what I'm talking about here :-) 03:34:39 I am hanging on for dear lifr 03:34:41 life 03:34:42 you know, once something stomps on the block that rtapi uses for its data, things can get messy 03:35:01 that is the 12288 block, not the 64K one 03:35:28 (that is also where the magic and version codes are - this isn't even a HAL thing, it is either RTAPI or RTAI itself) 03:36:03 (or bad memory, but what are the odds of us getting the same bad memory every time we ask for a block of shmem 03:36:23 also seems unlikely 03:36:28 if the box is stable otherwise 03:36:41 cradek: look in dmesg or /var/log/messages, and see if any of alex's messages are in there 03:36:55 that code he patched is common to both RT and user space 03:38:47 Feb 15 18:21:18 ubuntu kernel: [ 4871.207439] init_rtapi_data: initial rev_code=529 03:38:51 Feb 15 18:21:18 ubuntu kernel: [ 4871.207443] init_rtapi_data: rtapi_mutex_try() returned 0 03:38:54 Feb 15 18:21:18 ubuntu kernel: [ 4871.207447] init_rtapi_data: assigned rev_code=529 03:38:57 Feb 15 18:21:18 ubuntu kernel: [ 4871.207451] ird: #1 529 
03:38:59 Feb 15 18:21:18 ubuntu kernel: [ 4871.207454] ird: #2 529 03:39:02 Feb 15 18:21:18 ubuntu kernel: [ 4871.207459] ird: #4 529 03:39:04 Feb 15 18:21:18 ubuntu kernel: [ 4871.207461] ird: #4 529 03:39:07 Feb 15 18:21:18 ubuntu kernel: [ 4871.207464] init_rtapi_data: rev_code=529 03:39:09 Feb 15 18:21:18 ubuntu kernel: [ 4871.207487] RTAPI: Init complete 03:39:18 yesterday? 03:39:22 just a couple that look like these 03:39:26 yes that's yesterday 03:39:54 0200 alex time? Was he really working on it then? 03:40:14 dunno 03:40:17 he ended around 1;00 his time 03:40:27 right, he was in germany 03:40:36 said he had to go to bed - had to catch a plane - yes germany 03:40:42 :0 03:41:15 interesting that the initial rev code is correct 03:41:23 jmkasunich: so if I get it to fail again, will I get more info with alex's debug output? 03:41:55 that line is only executed if the magic does NOT match, which means (I thought) that we had an uninitialized shmem block) 03:42:16 actually, most of alex's output is normal 03:42:26 logger_devel: bookmark 03:42:26 See http://solaris.cs.utt.ro/irc/irc.freenode.net:6667/emcdevel/2006-02-17#T03-42-26 03:42:27 oh hey 03:42:32 sam@ubuntu:~/emc2.aj$ /etc/init.d/realtime start 03:42:32 sam@ubuntu:~/emc2.aj$ bin/halcmd show 03:42:32 init_rtapi_data: initial rev_code=529 03:42:32 init_rtapi_data: rtapi_mutex_try() returned 0 03:42:32 init_rtapi_data: assigned rev_code=529 03:42:34 ird: #1 529 03:42:37 ird: #2 0 03:42:39 ird: #4 0 03:42:42 ird: #4 0 03:42:44 init_rtapi_data: rev_code=0 03:42:46 !wow! 03:42:47 RTAPI: ERROR: version mismatch 0 vs 529 03:42:57 that's repeatable, it does it over and over 03:43:31 ok, I don't have alex's code 03:43:43 where is the irc: #1 and #2? 03:43:56 I'm looking at the unmodified rtapi_common.c 03:44:01 for (n = 0; n <= RTAPI_MAX_SHMEMS; n++) { 03:44:04 around this loop 03:44:08 rtapi_common.h 03:44:30 #1 is after the tasks loop, before the shmem loop? and #2 is after the shmem loop? 
03:44:40 yes 03:45:47 that loop is clearing stuff that is pretty far away from the rev_code 03:46:20 and holding a mutex while it does it... 03:46:22 obviously not in this case 03:46:35 one of those pointers is wrong? 03:46:48 which pointers? 03:47:20 oh it's all inside the struct 03:47:22 hmm 03:47:47 can rtapi_print do %p? 03:47:54 I think so 03:48:05 this is halcmd causing this, right? 03:48:07 let me add some 03:48:09 yes 03:48:10 halcmd show 03:48:18 rtapi.ko is loaded the whole time 03:48:25 so magic should be set 03:48:30 rtapi 25664 1 hal_lib 03:48:34 and this code should _NOT_ be running 03:48:45 oh! 03:49:25 are you positive it doesn't run for you? 03:49:30 can you put this same printf in yours? 03:49:42 yes (similar anyway) 03:52:30 I'm gonna print "data" too 03:57:03 Ok - I am going to have to call it a night. Is this going well? 03:57:18 skunkworks: well, we see things that look wrong, which is good 03:57:24 heh 03:57:25 seems like you are getting closer 03:57:36 thanks for letting us play with your machine 03:57:46 yeah, thanks 03:57:54 use the machine as long as you like or untill it locks up ;) 03:58:12 it seems solid enough 03:58:32 what is your email? 
(in case we want to ask you to remove some ram or something tomorrow)
03:58:35 I will talk to you guys tomorrow to see if you want me to change anything (memory)
03:59:01 you can email me at samcoinc@gmail.com
03:59:18 ok
03:59:47 good luck - good night
04:00:01 Feb 16 22:59:28 localhost kernel: [ 9048.888960] rtapi_init (RT): calling global init, data e0ae2000
Feb 16 22:59:28 localhost kernel: [ 9048.888964] init_rtapi_data(): start
04:00:01 Feb 16 22:59:28 localhost kernel: [ 9048.888967] ird: data: e0ae2000 magic 0
04:00:01 Feb 16 22:59:28 localhost kernel: [ 9048.888970] ird: magic not right, initial rev 0
04:00:02 Feb 16 22:59:28 localhost kernel: [ 9048.888975] ird: #1 data: e0ae2000 magic 308286473 rev 529
04:00:02 Feb 16 22:59:28 localhost kernel: [ 9048.888981] ird: #2 data: e0ae2000 magic 308286473 rev 529
04:00:03 Feb 16 22:59:28 localhost kernel: [ 9048.888984] rtapi_init (RT): rev code OK
04:00:07 Feb 16 22:59:28 localhost kernel: [ 9048.888998] RTAPI: Init complete
04:00:13 thats on the realtime side
04:00:28 john@ke-main-ubuntu:~/emcdev/emc2testing/src$ halcmd show
04:00:28 rtapi_init (RT): calling global init, data 0xb7f0c000
04:00:28 init_rtapi_data(): start
04:00:28 ird: data: 0xb7f0c000 magic 308286473
04:00:28 init_rtapi_data: MAGIC is ok, rev_code=529
04:00:30 rtapi_init (RT): rev code OK
04:00:32 Loaded HAL Components:
04:00:36 on the user side
04:00:58 skunkworks has quit
04:01:49 data->rev_code at 0xb7f16004 val 529
04:01:49 data->rev_code at 0xb7f16004 val 529
04:01:49 data->shmem_array[n].bitmap[m] at 0xb7f17004
04:01:49 data->rev_code at 0xb7f16004 val 0
04:01:49 data->shmem_array[n].bitmap[m] at 0xb7f17008
04:01:51 data->rev_code at 0xb7f16004 val 0
04:02:05 that bitmap line is the one that nukes it
04:02:09 data->shmem_array[n].bitmap[m] = 0;
04:02:23 it writes a zero to b7f17004
04:02:30 but the value at b7f16004 is nuked
04:02:36 one bit different
04:02:40 bad address line?
04:02:53 time to run memtextx86 all night
04:03:03 yeah maybe so
04:03:09 but none of this should be running at all?
04:03:26 ?
04:03:49 you said something about this whole block of code shouldn't be running?
04:04:03 well, it runs if magic is busted
04:04:22 magic is at b7f16000
04:04:30 and probably vulnerable to the same thing
04:04:38 right before rev_code?
04:04:41 yes
04:05:09 let me see if b7f17000 is written to
04:05:33 I would be surprised/thrilled if this was a "simple" hardware problem
04:05:40 first time, magic is wrong (expected), so the init code sets magic, then runs the rest of the init loop which busts magic and rev (or maybe only one of them, intermittently)
04:05:57 ok I see
04:05:58 next time, if magic is ok but rev is wrong, we get the rev mismatch
04:06:15 if magic is wrong, we re-run the init code and fsck up rtapi's accounting
04:06:36 data->rev_code at 0xb7f34004 val 529
04:06:41 data->shmem_array[n].bitmap[m] at 0xb7f35004
04:06:41 data->rev_code at 0xb7f34004 val 0
04:06:55 different location this time, but still one bit different
04:07:12 SAME BIT
04:07:15 yes
04:07:37 data->rev_code at 0xb7f7e004 val 529
04:07:39 data->shmem_array[n].bitmap[m] at 0xb7f7f004
04:07:39 data->rev_code at 0xb7f7e004 val 0
04:07:47 same bit
04:07:57 smoking gun??
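[Annotation, not part of the log] The "one bit different" and "SAME BIT" observations can be checked mechanically by XOR-ing each address that was written with the address whose value got clobbered. The sketch below is a hedged illustration in C: the three address pairs are copied from the debug output in the log, while the helper names single_differing_bit, common_differing_bit and log_pairs are made up for this example. Note that these are user-space virtual addresses, so the result only shows that the two locations sit 4 KiB apart; it does not by itself identify a physical address line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One observation from the log: writing to 'written' changed the
 * value stored at 'clobbered'. */
struct addr_pair {
    uint32_t clobbered;
    uint32_t written;
};

/* Address pairs exactly as printed in the log. */
static const struct addr_pair log_pairs[] = {
    { 0xb7f16004u, 0xb7f17004u },
    { 0xb7f34004u, 0xb7f35004u },
    { 0xb7f7e004u, 0xb7f7f004u },
};

/* Index of the single bit in which a and b differ,
 * or -1 if they differ in zero bits or in more than one bit. */
int single_differing_bit(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;
    if (diff == 0 || (diff & (diff - 1)) != 0)
        return -1;              /* not exactly one bit set */
    int bit = 0;
    while ((diff >>= 1) != 0)
        bit++;
    return bit;
}

/* Bit index shared by every pair, or -1 if the pairs disagree
 * or any pair is not a single-bit difference. */
int common_differing_bit(const struct addr_pair *pairs, size_t n)
{
    assert(pairs != NULL && n > 0);
    int common = single_differing_bit(pairs[0].clobbered, pairs[0].written);
    for (size_t i = 1; i < n; i++) {
        int bit = single_differing_bit(pairs[i].clobbered, pairs[i].written);
        if (bit != common)
            return -1;
    }
    return common;
}
```

Calling common_differing_bit(log_pairs, 3) returns 12, meaning every pair differs by exactly 0x1000 (4096 bytes). That agrees with the "every 4k" remark made a few minutes later in the conversation.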
04:08:10 looks smokey to me 04:08:17 "somewhat smokey" 04:08:20 haha 04:08:46 its amazing that nothing else crashes 04:08:51 no kidding 04:09:01 I can happily build emc over and over 04:09:13 unless RTAI allocates shmem from one end of memory, and linux allocates from the other 04:09:43 0xb8000000 is pretty high 04:09:58 do top, see if its using any swap 04:10:07 I bet with all that ram, it never gets full 04:10:14 no swap in use 04:10:15 so Linux never uses those addys 04:10:49 back in the day or RTLinux, you needed to reserve space at end of phys memory for RT shmem 04:10:55 these addresses are at 3GB 04:11:08 you don't any more, but I bet RTAI still allocates shmem from the end 04:11:34 but the bit that seems wrong is really low. Problems would show up everywhere. 04:11:48 every 4k 04:12:10 he could have a bad DIMM or intermitten socket pin 04:12:28 yeah I guess dram works in strange ways 04:12:28 so first 1G is fine, last 512M has a prob every 4K 04:12:45 (assuming 3x512M dimms) 04:13:03 you know what 04:13:09 I could reboot it into memtest86 04:13:29 and let him get the results tomorrow? 04:13:30 it could run all night, you send him an email describing what to look for, and he could report in the morning 04:13:49 waitaminnit - how can you do that 04:13:59 reboot, yes, but reboot ? 04:14:12 I can use my powers only for good, not for evil 04:14:24 well this is assuming there's no CD or floppy in the machine 04:14:46 I would just set the "default" grub boot entry to memtest 04:14:47 maybe we should just have him run memtext when he gets there? 04:14:57 oh, I see 04:15:06 except... 04:15:12 never mind 04:15:40 (was wondering "how will he get it out of memtest", then realized he has 30 seconds or whatever at the grub menu) 04:15:40 there's no floppy in it 04:15:58 yeah he just has to use the menu 04:16:26 what the heck, I'm going to do it 04:16:31 ok 04:16:36 he'll know in the morning 04:16:40 hours of memtest are a good thing 04:16:50 if there's red(?) 
on the screen it's bad 04:16:58 I think the errors are red iirc 04:18:22 he did say the machine was used, maybe it just has "old" - memory needs reseated in sockets or something 04:18:46 I bet memtest will show this easily 04:18:51 yeah 04:18:53 but memtest takes a long time on this much ram 04:18:59 I have a 6G machine at work 04:19:05 I just wish there was a way we could see the results 04:19:06 it takes many hours per cycle 04:19:17 hmm 04:19:19 yeah, but that's impossible until morning 04:19:21 don't reboot it yet 04:19:25 ok 04:19:35 remember my shmem test program? 04:19:39 yes 04:19:51 so we make it get a big block (64K or so) 04:19:56 and do our own little memtest 04:20:11 ahhhh 04:20:26 when I make -j on emc, all the gccs crash with a seg fault 04:20:30 it's ram 04:20:48 -j runs em in parallel? 04:20:59 yes, all of the files at once 04:21:12 there are a thousand kernel oopses in the dmesg now 04:21:16 smoking gun 04:21:35 [105492.130112] EIP: 0060:[] Not tainted VLI 04:21:35 [105492.130115] EFLAGS: 00010202 (2.6.12-magma) 04:21:35 [105492.130133] EIP is at page_add_anon_rmap+0x18/0x5c 04:21:45 yep, smoke pouring out everywhere 04:21:48 damned impressive troubleshooting sir! 04:21:53 haha 04:21:57 it's why I get the big bucks 04:22:00 oh, wait 04:22:06 * jmkasunich takes off his hat and makes a sweeping bow 04:22:14 same to you 04:22:18 I bow to the master 04:22:21 bah 04:22:31 (now I'll just send you the hard ones ;-) 04:22:39 I'm going to reboot it before it crashes 04:22:44 it's probably quite fucked now 04:23:03 into memtest? or better to not mess with menu.lst while its unstable? 04:23:10 yeah, into memtest 04:23:18 it seems ok as long as I don't run gcc :-) 04:23:20 vi is small 04:23:33 ok here it goes 04:23:37 any last words? 04:23:42 nope 04:24:10 use the force luke.... 04:24:16 ok it's done 04:24:41 ha I forgot we were spamming a public channel all this time 04:25:11 I sure expected to find a software problem... 
04:25:15 I'm happy it's not 04:25:15 I like to see the masters at work 04:25:20 ditto 04:25:39 this is twice that we've had strange rtapi stuff, and it turned out to be something else 04:25:58 I'm glad it's not my rtai build... that's a pain in the neck 04:27:51 so do you want to talk about setupconfig and configs/common? or do you want to go to sleep? 04:28:57 I think you pretty much answered my question 04:29:08 I think we should undo that mess while we can, I don't like it 04:29:26 which part of the mess, the whole common/ thing? 04:29:30 yes 04:29:40 the fact that you can't copy a config to a different directory and have it work 04:29:59 the sample configs are there FOR copying 04:30:18 "the fact that you can't copy it unless you use a special tool" 04:30:26 right 04:31:23 I really hate the idea of client.nml, server.nml, emc.nml in every fscking sample dir tho 04:31:38 maintainence nightmare 04:31:53 well let's look at this a different way 04:32:04 say we change the inis at install time to use an absolute path to common 04:32:14 suddenly, you can copy a sample config 04:32:38 if we want to update something in common, we can - do we want that to affect the previously copied configs, or just the samples? 04:32:47 depends 04:32:53 (NEFS) 04:33:07 I don't speak your crazy moon-language 04:33:15 I mean, what's NEFS? 
04:33:23 if the "something" is an NML file (which is rarely ever changed by the user) then we probalby want to fix everybody 04:33:50 ok, another approach 04:33:54 if its a file that they've modified, we don't want to stomp on their mods 04:34:07 the deb updater will NOT nuke a changed config file without asking 04:34:15 I tagged everything in /etc/emc2 as config files 04:34:18 thats why setupconfig copies everything out of common into their dir when you do a new 04:34:30 so if some dummy edits a sample config, it won't get overwritten without asking them 04:34:52 nice 04:34:58 ok, yet another approach 04:35:06 that covers the debs, which seem to be (rightly) your focus 04:35:13 not so good for rip or cvs checkout 04:35:20 I don't know or care what's in an nml file, I've never needed to change it 04:35:34 right, it is almost never changed 04:35:35 so if we take NMLFILE=.... out of the inis, let's have emc do a reasonable default thing 04:35:49 if you need to do something different, you can specify an NMLFILE= 04:35:57 I wouldn't go that far 04:36:01 :-) 04:36:03 brainstorming 04:36:06 yeha 04:36:08 yeah 04:36:10 that doesn't solve the core-stepper thing though. 
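The "reasonable default" idea for NMLFILE floated above could look something like this sketch: use the ini's NMLFILE if present, otherwise fall back to a shared copy. The section name `EMC`, the helper name, and the default path are all assumptions made for illustration; the log only names the `NMLFILE=` key and the `/etc/emc2` tree.

```python
import configparser

# Hypothetical default; the real location would be fixed at install time.
DEFAULT_NMLFILE = "/etc/emc2/sample-configs/common/emc.nml"


def nml_file_for(ini_text, section="EMC", default=DEFAULT_NMLFILE):
    """Return the NMLFILE named in the ini, or the default if none is given."""
    # strict=False tolerates repeated keys; interpolation=None leaves '%' alone.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    # fallback covers both a missing section and a missing option.
    return parser.get(section, "NMLFILE", fallback=default)


if __name__ == "__main__":
    print(nml_file_for("[EMC]\nMACHINE = demo\n"))
    print(nml_file_for("[EMC]\nNMLFILE = /home/me/custom.nml\n"))
```

With this shape a copied config keeps working without carrying its own NML file, while anyone who needs something different can still set NMLFILE explicitly.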
04:36:27 I don't have too much heartburn about the hal files 04:36:54 it kinda sucks if we have mutiples 04:37:01 actually core-servo is worse 04:37:11 there are probably only 2 configs that use core-stepper 04:37:27 yeah, it sucks, but it also sucks to not be able to use the normal system tools in a reasonable way to manipulate configs 04:37:28 maybe even only one, in which case it shouldn't be in common anyway 04:37:37 but core-servo is used by multiple configs 04:37:41 agreed 04:37:42 I wish we could minimize both sucks 04:38:25 I could see absolute paths for nmlfiles, and local copies for hal files (even if it means duplication) 04:39:04 looks like the .var and .tbl files are already duplicated in every directory anyway 04:39:23 sometimes I think this whole thing is silly, john 04:39:29 every computer is hooked to one mill 04:39:34 you only need one config 04:39:37 probably because of the desire to have stepper.tbl, ppmc.tbl, foo.tbl instead of emc.tbl 04:39:38 you only ever use one config. 04:40:05 yep 04:40:14 or course, you get 20 samples 04:40:30 sure, that's fine 04:41:51 ok, given that it is silly, what do we do? 04:42:02 that's a good question 04:42:16 drop common/, put everything in each sample config, and let them copy dirs as needed? 04:43:03 maybe dropping common is the first step 04:43:08 I don't know what the second step is, though 04:43:19 drop pickconfig and setupconfig? ;-) 04:43:40 nah, I think they are usefull for aunt tillie types, if nobody else 04:43:43 pickconfig is good for trying the different GUIs if nothing else 04:43:46 but simplify as much as possible 04:44:05 you know, that is really the problem 04:44:08 honestly, unless we plan to have a full gui editor, I don't think we gain much from setupconfig 04:44:14 what is? 
04:44:41 we have at least three "dimensions" and we're trying to cover them with lots of samples 04:44:55 but with a 3D space, coverage is sparse even with a lot of samples 04:45:26 dimension 1: machine config - simple steppers, medium servo, complex mazak with toolchanger 04:45:27 yeah, I've thought that too 04:45:56 dimension 2: I/O devices - motenc, stc, m5i20, vigilant, parport 04:46:05 dimension 3: UI 04:46:24 worse: 1&2 are hardware, 3 is a user preference 04:46:31 I know ray (and you?) disagree 04:46:38 but I think gui is a user pref. 04:47:01 user as opposed to integrator? 04:47:18 yes 04:47:34 sometimes user and integrator are one and the same ;-) 04:47:38 sure 04:48:01 you know what else I think, and you're going to object: mm/inch is a user preference too 04:48:16 no ;-) 04:48:21 GUI inch/mm, yes 04:48:26 default GUI units, yes 04:48:29 yes 04:48:30 machine configs, no 04:48:47 the values of HAL signals are in either mm or inches, you can't go changing that on a whim 04:48:57 I think it's absurd that we have to rewrite the entire latter to change the former 04:49:00 INPUT_SCALE is either counts/mm or counts/in 04:49:22 (well we fixed that in AXIS) 04:49:26 (I think) 04:49:31 isn't default GUI units set by one line in the ini file? 04:49:40 yeah the user units 04:49:44 ok 04:49:51 not a problem then 04:49:59 there are two possible values: 1 and 0.03937whatever, everything else breaks 04:50:10 that is just fscked 04:50:13 but if you change those, you have to rewrite the rest of the damn ini 04:50:20 yes it is 04:50:23 waitaminnit 04:50:26 those aren't user units 04:50:29 are they?
04:50:32 yes 04:50:48 number of user units per mm 04:51:34 ok, there is units in [TRAJ] and in [AXIS] 04:51:47 yeah I don't know why you have to say it 4 times 04:51:57 those probably aren't even used 04:52:08 cause Fred and Will are academics 04:52:22 and were designing a very flexible program 04:52:31 also you specify LINEAR or ANGULAR for XYZABC, but half the code has hardcoded XYZ=linear ABC=angular 04:52:57 the ini is full of crap that we don't need, and that makes the configuration process obtuse 04:52:57 part of that comes from a failure to distinguish between axes and joints 04:53:21 hmm. 04:53:22 axes - cartesean space coordinates, three linear, three angles 04:53:28 joints - machine coordinates 04:53:29 I think I'm just complaining now 04:53:39 for a trivkins machine they are the same 04:53:48 yeah I know the difference, but I've written code that doesn't (knowing full well what I was doing) 04:54:15 emc does and does not know the difference 04:54:20 yep 04:54:21 (at the same time! 04:54:45 if you have trivial kins, then a huge amount of the ini file is redundant 04:54:53 if you have non-trivial kins, you need it 04:54:53 seems we could maybe have a simple ini format and a complex ini format 04:55:20 the simple ini format could be fully specified with one gui form (one screen) 04:55:56 scale, vel, accel * 3, units (pick from 2), gui (pick from 3) ...? 04:56:38 PERIOD (actually you would specify your machine's MHz) 04:56:41 a few others, but I get your point 04:56:47 max feed override 04:56:54 yeah 04:57:20 but you are right, 80% of the ini file is stuff that joe average user never changes 04:57:29 if we had that, even for just steppers, it would be a big step 04:57:43 what percentage of our users have steppers? 90? 04:58:01 if you have servos you have to be much, much more aware of what's going on because you have to tune them 04:58:27 we can probably concentrate on the ease of use for the stepper people. 
04:58:32 this gets back to where I was going with my "3D" comments 04:58:53 I follow you 04:59:06 instead of trying to cover the space with samples, provide a wizard/script/whatever that asks questions and generates the ini 04:59:25 stepper/servo? branch based on that 04:59:37 and puts the ONE ini in the place where it goes, wherever that is 04:59:39 which GUI? branch based on that 04:59:59 we still need to support multiples, for wierd people 05:00:26 we tend to confuse the tillies and the power users 05:00:36 (like us - we might find ourselves loading somebody elses config to help them, or we want a working config and a sim) 05:00:37 yeah 05:00:41 we want to make it easy for tillie but maximally powerful for the power users 05:00:46 that is not a reasonable goal. 05:01:06 power users may not even want our gui. 05:01:11 and the bias on that scale from tilly to power depends on who you asn (and when) 05:01:12 we don't have to concentrate on ease for them. 05:01:28 s/asn/ask 05:01:41 yes. the person you ask will tend to be on the opposite end of the scale! 05:01:47 (whichever you pick) 05:01:50 heh 05:02:05 probably because it's easier to argue than do 05:02:06 I do it too 05:02:24 unfortunately I think we need to focus on the tillies, because those are the ones that will make us want to commit mayhem 05:02:36 not just because of that 05:02:50 I was quite impressed with Willie Walker the other day 05:02:56 never heard of him before 05:02:56 because tillies are the ones who will go buy a xylotex and just want the thing to work without any screwing around. 
05:03:14 he pops up on list with a good description of his problem and what he;s already tried to fix it 05:03:22 he responds well to our advice 05:03:33 and he succeeds 05:03:39 we may never hear from him again 05:03:51 not sure I remember him 05:04:05 needed to debounce his limit switches 05:04:08 oh right 05:04:14 he was sure cheery when I tried to help him 05:04:21 a nice guy, I bet 05:04:36 meanwhile there was another guy with limit switch problems, he gave us nothing to work with 05:04:53 there's a whole range of people out there... 05:05:07 I prefer to deal with the smart ones 05:05:11 dumb people annoy me 05:05:12 maybe we have to work on accomodating the ones on the "far" end. 05:05:37 I know..... but :-( 05:05:44 maybe I'll write an ini generator 05:05:53 totally standalone 05:06:03 it just has to write a file, maybe copy some others 05:06:38 it would be nice if it used a template of sorts 05:06:58 so you don't have to rewrite the program to extend it 05:07:06 configure --with-x-maximum-velocity=1.2 --with-x-acceleration=20 05:07:20 sorry, kidding 05:07:23 lol 05:07:43 I was thinking about things like having comments in the generated ini 05:07:51 if you had a template ini 05:07:55 with things like: 05:08:06 if the program is simple (not f-ing tcl) extending it would be as easy as editing a template 05:08:48 GUI = {choices:axis, tkemc, mini;descriptions: new and fancy, blue, fscking huge window} 05:08:59 hehehe 05:09:10 params that you don't need to prompt the user about would just be copied 05:09:24 as would the comments 05:09:42 I'd prefer to lay out a screen with all the choices in one place though - wizards are irritating for simple things 05:09:53 with that in mind, it's hard to run from a template 05:09:58 yeah 05:10:05 I hate being asked one question at a time 05:10:11 other things wouldn't work for templates either 05:10:14 you should be able to see all related things at once. 
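The template idea above (a `{choices:...;descriptions:...}` field gets turned into a question, everything else including comments is copied through) might be sketched like this. The field syntax is taken from the one example in the log; the function names, the regular expression, and the sample keys other than `GUI` are hypothetical.

```python
import re

# Matches lines such as:  GUI = {choices:axis, tkemc, mini;descriptions: a, b, c}
CHOICE_RE = re.compile(r"^(\s*\w+\s*=\s*)\{choices:(.*?);descriptions:(.*?)\}\s*$")


def parse_choice_line(line):
    """Return (prefix, choices, descriptions) for a choice line, else None."""
    match = CHOICE_RE.match(line)
    if match is None:
        return None
    prefix, choices, descriptions = match.groups()
    return (prefix,
            [c.strip() for c in choices.split(",")],
            [d.strip() for d in descriptions.split(",")])


def fill_template(template_text, answers):
    """Copy the template, replacing each choice field with the chosen value."""
    out = []
    for line in template_text.splitlines():
        parsed = parse_choice_line(line)
        if parsed is None:
            out.append(line)  # comments and fixed parameters pass through as-is
            continue
        prefix, choices, _descriptions = parsed
        key = prefix.split("=")[0].strip()
        value = answers[key]
        if value not in choices:
            raise ValueError(f"{key}: {value!r} is not one of {choices}")
        out.append(prefix + value)
    return "\n".join(out)


if __name__ == "__main__":
    template = ("# which front end to start\n"
                "GUI = {choices:axis, tkemc, mini;descriptions: new, blue, huge}\n"
                "MAX_FEED_OVERRIDE = 1.2")
    print(fill_template(template, {"GUI": "axis"}))
```

A one-screen form could be built from the same parse step: collect every choice line first, show them all together, then call `fill_template` once with the answers.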
05:10:17 scaling - you need to do math 05:10:40 let them fill in things like steps/rev, microstepping, gear ratio, and thread pitch 05:11:10 degrees per step 05:11:20 heh, both are used 05:11:21 (the thing that's written on the motor) 05:11:23 oh 05:11:31 well they could enter either. 05:12:01 radiobutton (x) steps/rev (_) degrees/step 05:12:41 the idea sounds good for basic machines 05:12:50 a lot harder as things get more complex 05:13:02 like the guy who needed to debounce his limit switches 05:13:28 yeah, that's outside the realm of this hypothetical program. 05:14:24 it would be nice if, without editing files, someone could get some reasonable steps/dirs out the parport. 05:14:40 yeha 05:14:44 yeah 05:14:57 then, that might snag them long enough to figure out how to get their limit switches working 05:15:05 and then they're committed 05:15:05 right now, really the only thing they have to change is scale and maybe vel/acc 05:15:18 xylotex or standard pinout 05:15:23 yes 05:15:27 inch or mm 05:15:33 very basic things 05:16:11 probably one screen, at worst 4 tabs (one for general, one for each axis with scaling, accel, and velocity stuff) 05:16:54 more brainstorming 05:17:36 what if we keep setupconfig, give it the existing new, backup, restore commands 05:17:57 does setupconfig work today? 05:17:58 but new then gives you a choice of copying a template, OR invoking the program you are describing 05:18:33 it did and maybe still does for RIP, but needs fixed to understand paths and permissions for installed 05:18:49 and of course if common goes away a lot of cruft can come out of it 05:19:02 I didn't know it was so close to being done. 05:19:06 anyway, the program you are describing could of course be invoked alone 05:19:13 if we polish it up can we have a release? 
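The scaling arithmetic mentioned earlier in this stretch (steps/rev or degrees/step, microstepping, gear ratio, thread pitch) can be sketched as below. The function and parameter names are made up for illustration, and two conventions are assumed: thread pitch is given as screw turns per unit of travel (for example turns per inch), and gear ratio is motor revolutions per screw revolution.

```python
def steps_per_unit(screw_turns_per_unit, steps_per_rev=None, degrees_per_step=None,
                   microstepping=1, gear_ratio=1.0):
    """Return step pulses per unit of axis travel (the stepper scale value)."""
    # Accept either figure printed on the motor, but not both or neither.
    if (steps_per_rev is None) == (degrees_per_step is None):
        raise ValueError("give exactly one of steps_per_rev or degrees_per_step")
    if steps_per_rev is None:
        steps_per_rev = 360.0 / degrees_per_step
    # pulses per motor rev * motor revs per screw rev * screw revs per unit
    return steps_per_rev * microstepping * gear_ratio * screw_turns_per_unit


if __name__ == "__main__":
    # 200 steps/rev motor, 8x microstepping, direct drive, 5 turns per inch
    print(steps_per_unit(5, steps_per_rev=200, microstepping=8))
```

Taking either steps/rev or degrees/step mirrors the radiobutton suggestion in the discussion.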
05:19:18 backup works (I think) restore no 05:19:21 new works 05:20:01 hey I just had a ridiculous idea 05:20:12 a web-based ini generator 05:20:14 given the time it took to make pickconfig, I suspect it would take a very busy weekend or a week to get setupconfig to a similar level of doneness 05:20:23 you fill out the web form, it gives you your ini to download 05:20:48 how does that compare to the program you were just describing? 05:20:51 both have forms 05:20:57 both generate ini files 05:21:03 it's the same. different type of programming. 05:21:05 which is harder to write? 05:21:38 for me the web is probably easier, but it can't directly manipulate the files on the user's computer. 05:21:46 just a thought. 05:22:01 web is easier? 05:22:02 the benefit is web forms are so familiar to everybody. 05:22:13 probably. not much gui to design. 05:22:41 who am I kidding? they'd both be harder than it seems like they should be. 05:22:58 yep 05:23:09 we need some more volunteers. 05:23:10 thats what happened with setupconfig 05:23:25 the GUI code started to overwhelm the actual "stuff" that it does 05:23:28 2-3-4 of us aren't enough to make this project 05:23:38 yeah, that's because tcl is awful. 05:23:50 what is better? 05:24:00 I really need to figure that out. 05:24:17 python is better than tcl. gtk+python may be the way to go, I'm not sure. 05:24:23 wxpython might be ok 05:24:34 I'm a C programmer who's been draged kicking and screaming into GUI stuff 05:24:37 gtk+glade+C seems not hard to use 05:24:52 scope is GTK + C 05:24:57 I'd take lisp+gtk if I had it 05:24:57 I didn't use glade 05:25:05 I'd take anything at ALL over tcl 05:25:23 hmm, except maybe perl 05:25:43 is perl the one where indenting counts? or is that python? 
05:25:51 I should write a simple app in each of these and find out which sucks the least 05:26:01 yes python uses indention where C uses { } 05:26:25 seems strange for only about the first five minutes 05:26:44 of course you need a decent editor to make it easy to work with 05:26:58 I just want us to release the damned thing so I can go back to the stuff I really want to write 05:27:14 VCP and an associated HAL<->NML UI thing 05:27:40 it's a bit silly that you're stuck writing things like setupconfig. 05:27:51 like you said, we need more people 05:28:00 yep 05:28:00 it seems like its always 2-3 people 05:28:09 the names change, but the list never grows 05:28:15 and all in the board now, which is odd 05:28:16 Fred/Will 05:28:23 then Matt/Ray 05:28:29 then Paul/Me 05:28:39 then Alex/Paul/Me 05:28:52 now Alex/You/Jepler/me 05:28:59 (so I guess we are getting better) 05:29:06 there are others as well 05:29:09 I do see a bit of increase there 05:29:31 not so odd really 05:29:43 the people who get elected are those who are seen as getting things done 05:29:54 true. 05:30:20 before I ran, I voted for the ones I saw hanging out on irc answering questions. That was my only metric. 05:30:40 name recognition ;-) 05:30:50 not really - if they didn't care to help people use the software, they didn't get my vote. 
05:31:09 my helpfulness varies 05:31:32 you just spent your evening helping 05:31:52 because it seemed there was a bug in code that I consider mine 05:32:17 that's exactly when your help is needed 05:32:55 I must admit that I ignored the complaints about BASE_PERIOD far longer than I should have 05:33:03 that only took one evening to find and fix 05:33:14 but it was at least a week after it was first reported 05:33:42 sometimes it's hard to take seriously the first few reports 05:34:00 bug reporting, etc, is not an exact science 05:34:04 true 05:34:23 the nature of the report makes a huge difference 05:34:31 sometimes a real bug report is mixed up in lots of other things, like gene's recent problem with the makefile 05:34:57 between "jeez, what an idiot, he probably fscked it up" and "hmm, that looks real, and there's enough info here that I might be able to find the problem" 05:35:06 exactly right 05:35:41 it really was a joy working with the bouncy limit switch guy 05:35:52 yeah, I thought so too 05:36:07 we've had several of those 05:36:22 he understood HAL enough to hook up the limits on his own, used scope quite well for someone not accustomed to such things... 05:36:40 they pop in, give a great bug report, get an answer right away (because the report is so good), they're gone 05:37:45 like the guy who reported the accel problem when the axes/traj were different - it made me want to fix it for him, he'd gone to the trouble of figuring out exactly what was going on 05:38:00 yeah 05:38:15 I had known it was somehow wrong for a long time, but never bothered to change the numbers the dozen different ways necessary to figure it out 05:38:28 here is the _other_ guy with limit problems: I have a fresh install of BDI 4.38. I haven't even hooked up the machine 05:38:28 to the computer yet and I am getting the error that says hardware limit 05:38:28 error on axis 0,1, and 2. Nothing has even been hooked up to the 05:38:28 computer yet how can this be. Any thoughts?
05:39:00 ouch. 05:39:07 yeah, I had some thoughts when I read that, but I wasn't gonna share them on a public mailing list 05:39:09 not a good grasp of what might be going on... 05:39:31 I have a programmer at work who's like that 05:39:44 I so want to say "how about you try troubleshooting?" 05:39:55 he might hurt himself 05:40:10 well, he'd give me that look 05:40:13 you know the one 05:40:18 haha 05:40:21 I better get to bed 05:40:25 same here 05:40:29 goodnight 05:40:30 it's tomorrow already 05:40:33 did you email skunks? 05:40:38 oops, no 05:40:48 I'll do that now 05:40:57 I'm anxious to hear what he finds in the morning 05:41:11 and I bet he'll have it fixed and working before the end of the day 05:41:35 and then we get to post to the list (so folks know it wasn't rtapi ;-) 05:41:50 ha 05:42:28 or ubuntu! 05:42:40 right 05:42:50 btw there was a security update, so I built new kernel packages 05:43:05 let me know if the update gives you any troubles please 05:43:27 I saw the updates, they're downloaded and installed, but I imagine I won't use them until I reboot 05:43:35 oh done already?
cool 05:43:41 (tomorrow, I power down overnight except on the weekends) 05:44:25 ok, goodnight now 05:44:29 night 05:48:34 sent 05:48:37 jmkasunich has quit 06:23:10 SWP_Away has joined #emc-devel 06:31:19 alex_joni has quit 06:45:14 SWPadnos has quit 09:57:09 alex_joni has joined #emc-devel 13:07:01 logger_devel, bookmark 13:07:01 See http://solaris.cs.utt.ro/irc/irc.freenode.net:6667/emcdevel/2006-02-17#T13-07-01 13:07:55 skunkworks has joined #emc-devel 13:08:07 lobber_devel: bookmark 13:08:21 oops 13:08:29 logger_devel: bookmark 13:08:29 See http://solaris.cs.utt.ro/irc/irc.freenode.net:6667/emcdevel/2006-02-17#T13-08-29 13:29:57 skunkworks: use Tab expension, just like in *nix 13:30:43 - ok I have no clue what you just said there 13:31:20 when you try to execute a command in linux, do you always type the whole name, or do you type the first few letters then hit the 'Tab' key? 13:31:32 try it now, type 'sk' and then hit tab 13:31:45 your IRC software should expand it to skunkworks 13:32:04 that is cool - thanks 13:32:15 the same thing works in command line mode 13:32:21 for folders, commands, etc 13:32:47 so it looks at what is currently up and sees what matches? 13:33:13 if you have multiple solutions to the extension, try pushing the Tab twice, it will give you a list of possible extensions 13:33:16 skunkworks: yes 13:34:12 double tabbing doesn't work in mirc 13:34:20 multible tabbing 13:34:49 but it is a start - will have to try it in the whatever irc client is in linux 13:36:30 wow that is neet. It will help my bad spelling ;) 13:37:25 ok - I am going to reboot this thing - it was on pass 4 - 0 errors 13:38:06 I will take one of the chips out and run emc - was emc broken the last you remember or should I be able to run it? 13:38:11 alex_joni? 
13:40:12 I have the boot screen they where talking about - do I pick the 2.6.12 magma (top one) 13:55:28 sorry, was away 13:55:40 yes 13:55:48 you pick the 2.6.12-magma 13:56:19 skunkworks: you should be able to run emc (from the GUI) 13:58:33 ok 13:58:37 got side tracked 14:11:32 rayh has joined #emc-devel 14:12:28 morning 14:12:42 Hi Chris 14:12:56 Say, TESTING helps a LOT. 14:13:17 great 14:13:55 I got roltek running with both installed and rip. 14:14:22 He's seeing some problems with his cdrom burner yet. 14:14:23 moving TESTING is almost like a mini-release 14:14:52 I'll make new ubuntu packages at the same time, so everyone gets to test the same thing, no matter how they choose to update 14:15:03 I'll try to get an install going here in the next couple of days. 14:15:22 I still haven't tried to fix my CD burner either 14:15:27 Will there be an easy way for someone to see which testing they have. 14:15:42 yes, it'll be in help/about 14:15:49 Ah. Great. 14:16:11 hi guys 14:16:18 * alex_joni managed to finish stuff for now ;) 14:16:19 help/about also tells if you are NOT running a testing version 14:16:21 Hi alex 14:16:24 so I'm back ;) 14:16:25 are you back home? 14:16:27 yay 14:16:43 cradek: my main server crashed yesterday, while I was away.. that was a PITA 14:16:49 ouch 14:16:53 cradek - I ran emc with the full memory - it didn't start. removed the 512mb stick booted and emc started. Reinstalled the 512mb stick - emc didn't start. I have another 512 I can put in to see. 14:16:54 that always happens when we're away doesn't it 14:17:08 skunkworks: that's great news 14:17:11 the hdd with the home's wasn't working any more 14:17:13 skunkworks: yay 14:17:27 so I guess that makes cradek & jmk a bit smarten than memtest86 14:17:42 skunkworks: it was not just an emc problem - when I loaded the machine it royally crapped out... you could try that again too. 
14:17:44 cradek: glad you managed to take over from where I left that 14:17:49 be back in a bit 14:19:23 cradek: ok 14:29:43 rayh: hi there 14:34:48 Back home from your travels? 14:35:00 yeah, but only to find a mess over here 14:35:13 Your computers? 14:35:20 I arrived last night, and now I just finished my first day to get things back together 14:35:32 rayh: our server (and mainly the /home HDD) 14:35:40 Ouch. 14:35:43 about 80G 14:35:56 of data, but it's recovered, and back operational now 14:36:06 Fantastic. 14:36:11 good thing I had a spare server I started to set-up last week 14:36:21 so I only had to swap them remotely yesterday ;) 14:36:30 I guess. What went wrong? 14:36:37 ok - I just put a differnt 512 back in and it is not starting 14:37:11 skunkworks: that might prove to be interesting 14:37:37 can you live with 1G for now? 14:38:08 yes - do you think that there is some odd limit in the rt/emc2? 14:38:19 cradek: maybe a bug in the kernel accessing stuff over 1G ? 14:38:26 skunkworks: it's emc independent 14:38:36 even gcc started to crash when using lots of memory 14:42:50 skunkworks: can you try and swap the mem chips? the 512 and the 1024, I mean 14:43:03 that is what I am doing 14:43:08 right as we speek 14:43:10 booting 14:44:49 ok 14:44:55 sam_ has joined #emc-devel 14:44:59 hi sam_ 14:45:10 Starting emc... 14:45:10 HAL: ERROR: pin 'axis.0.motor-pos-cmd' not found 14:45:10 HAL:5: link failed 14:45:10 HAL config file /etc/emc2/sample-configs/sim-AXIS//../common/core_sim.hal failed. 14:45:10 Shutting down and cleaning up EMC... 
14:45:19 ok, so same thing 14:45:21 yes 14:45:45 so it doesn't seem to be the physical memory but the amount 14:45:55 it might be the chipset 14:46:05 and the address it tries to write/read to/from 14:46:32 because only one bit is always the problem (reading the discussion of cradek & jmk) on the higher address space 14:46:43 so if you use less memory you don't get in that address space 14:46:59 if you use more, then EMC internal stuff ends up there, and is affected by the bug 14:47:12 but I am VERY surprised it doesn't show up in memtest86 14:48:36 odd 14:49:24 indeed 14:49:39 can you put 2G in? 14:49:47 you said you had 2 512 chips 14:49:49 yes 14:49:56 hold on - I will do that 14:50:00 sam_ has quit 14:50:03 try that, it might get emc to work 14:51:09 crap - no I can't. I thought there were 3 slots in there but only 2 14:51:23 well.. that's it 14:51:29 use the 2x512 ;) 14:51:34 it doesn't shut down correctly either 14:51:34 and you have a spare 1G 14:51:43 that is to be expected 14:51:48 the screen goes wacky 14:51:57 in order to shut down an ATX box, you need ACPI in the kernel 14:52:11 but ACPI & RT don't mix well 14:52:20 so RT kernels have ACPI disabled by default 14:52:29 skunkworks: was emc running when you shut down? 14:52:40 normally it goes through the text shut down - but with the extra memory and emc crash it doesn't 14:52:52 oh, then I'm not worried ;) 14:52:56 Probably was - how do you stop it? 14:53:00 extra memory means linux might crash too 14:53:28 keep the power button pressed for more than 4 seconds 14:53:35 or pull the power cord ;) 14:54:10 right - I meant before I shut down - I thought you guys were stopping the left over emc stuff 14:54:23 you could try this: 14:54:36 /usr/bin/emc_module_helper remove motmod 14:54:47 /etc/init.d/realtime stop 14:54:55 ok 14:55:32 * alex_joni heads home.. 14:55:47 I don't know how old the bios is - could that affect it - if there was a bug?
14:58:02 probably not, I suspect a HW problem 14:58:14 I'll be back later 15:03:55 skunkworks: I've had a P4 machine with bad cache (on the processor itself) 15:04:11 the problem with your machine is not necessarily in the ram modules 15:04:27 it could be in the processor or on the motherboard too. 15:08:03 ok 15:08:15 have you run emc on more than 1gb? 15:08:31 putting 2 512mb chips works also 15:08:40 not personally 15:09:34 my fast machine has only 512 15:09:58 I know it's a long shot, but do you have another P4 processor you could try? 15:10:03 so with this motherboard 1gb is the limit. I am trying to think if I have enough pc133 memory to take my other box to over 1gb 15:10:13 I will have to look 15:10:15 might 15:15:35 also I noticed that it is running at at only 1800MHz when the processor is a 2400 15:15:48 so maybe the motherboard thinks something is wrong 15:22:37 is ubuntu okay with sata drives? 15:23:21 yes, I'm sure it is 15:24:46 Thanks. 15:25:58 hello 15:26:36 yes, ubuntu works just fine with SATA 15:26:49 even SATA CD/DVD burners 15:27:28 that's good to know 15:27:34 so far I only have SCSI and regular IDE 15:27:53 in fact, ubuntu is the only OS I've been able to install on my big machine 15:27:55 do you guys have a hint for a good server for me? 15:28:04 tried Gentoo 64, XP 64 15:28:18 hosting or to buy? 15:28:21 I need some hotswappable, SCSI drives (better with HW Raid), hot-swappable PSU's 15:28:22 SWP_Away is now known as SWPadnos 15:28:25 SWPadnos: to buy 15:28:44 not very much processor speed, 1G mem is fine 15:29:01 not very much =~1-2GHz 15:29:04 alex_joni: I'm happy with my sempron 3300 which was quite cheap but it's fast 15:29:12 cradek: for home? 
15:29:14 I'm not sure you can get anything that slow ;) 15:29:14 yes 15:29:38 cradek: yes, for home it's ok, but I want to replace the normal PC I used as a server 15:29:45 ah 15:29:52 I don't have any idea about server class hardware 15:30:06 my experience is that it's expensive and harder to replace parts on when something goes wrong. 15:30:26 yes, but it might be more reliable (I hope) 15:30:28 http://www.retrobox.com, for used server stuff 15:30:46 SWPadnos: I'd rather go with something I can get around here 15:30:52 so I guess IBM & co. 15:31:08 hotswap PSU will be the kicker, I think 15:31:18 can you get SuperMicro hardware there? 15:31:18 I build my servers - I like to use supermicro motherboards 15:31:31 dunno.. 15:31:34 :) 15:31:49 well, the case I have is their SC743-645T 15:32:21 there is a version with hot-swap power supplies, and I think there's a version with a SCSI RAID cage (instead of SATA) 15:32:53 any of their server motherboards will fit. I have the H8DCE 15:34:33 hmm.. this looks nice: http://www.supermicro.com/products/system/4U/7044/SYS-7044H-X8R.cfm 15:34:38 kinda something I want 15:34:59 that's basically what I have for my new workstation 15:35:16 you can flip it over - the right hand set of drive bays can be rotated 15:35:57 I don't want to flip it over, I want to rack-mount it ;) 15:36:04 see if these guys will ship to you: http://www.monarchcomputer.com 15:36:25 oh, well in that case, you can still look at the towers, they can be rackmounted as well 15:36:34 you flip them on their sides ;) 15:37:15 SWPadnos: am I supposed to take retrobox.com seriously? It comes up with a flash-only front page here 15:37:36 only if you want to. that flash page is a new thing 15:37:51 ugh. 
now it's opened a new window and must be starting java or something 15:37:52 http://www.retrobox.com/rbwww/home/ 15:37:53 I hate the web 15:38:39 where else can you get a SCSI array with 1 36G drive + 13 9G drives for $151 15:39:08 only $65 for the 12 x 18G arrays 15:39:26 hi jeff 15:40:41 SWPadnos: http://www.retrobox.com/rbwww/home/unit_view.asp?id=1483201&bin_id=world 15:40:46 that sounds nice 15:41:08 What would you want with an 12 x 18G array? Throughput? 15:41:16 yep. $256 isn't bad ;) 15:41:40 how about 100G of redundant storage with hot spares? 15:44:54 that's nice too 15:44:58 what's RAID 5? 15:46:04 I like raid 5 15:46:17 need 3 or more drives of the same size 15:46:37 you loose the capasity of one drive but if any one goes bad you keep running 15:46:44 minimum of 3 drives, parity data is striped across all drives 15:47:06 data is completely available if any one drive in the array fails 15:47:22 ok, I like that ;) 15:48:00 you can also add drives, and you still lose only one for parity (so 5 drives gives 4x storage + 1 parity) 15:48:10 right 15:48:42 and with most SCSI controllers, you can have hot spares - drives that are running but unused. if a drive fails, the spare will automatically be added to the array, and the data rebuilt. 15:49:17 hrmm.. a sample config is $4180 15:49:22 heh 15:49:32 (make sure your harware raid alows for dynamic resizing - so you can add drives and rebuild without wiping clean and starting over 15:49:53 I would think anything new would do that though. 
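The RAID 5 points made above (at least 3 drives, one drive's worth of capacity goes to parity, data survives any single drive failure) can be illustrated with a small sketch. It handles a single stripe with one dedicated parity block, whereas real RAID 5 rotates parity across the drives; the function names are invented for the example.

```python
from functools import reduce


def raid5_usable_capacity(n_drives, drive_size):
    """Usable capacity of n equal drives in RAID 5: one drive's worth is parity."""
    if n_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (n_drives - 1) * drive_size


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]      # data blocks on three drives
    parity = xor_blocks(data)               # parity block on a fourth drive
    # Drive 1 fails: XOR of the survivors and the parity recovers its block.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1], raid5_usable_capacity(5, 18))
```

The same XOR that produces the parity block also rebuilds a missing block, which is why any one drive can fail without data loss.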
15:50:22 hope the 'Adaptec 1662200 2930 Ultra 320 SCSI 64-Bit PCI Host Adaptor' might know that ;) 15:50:24 I don't think it's that common, actually 15:51:07 if you plan to run Linux, then hardware RAID isn't necessarily the best bet 15:51:16 RAID levels: 0, 1, 10, 5, 50, JBOD 15:51:28 Online RAID Level Migration 15:51:29 Online capacity expansion 15:51:29 Immediate RAID availability (background initialization) 15:51:30 my latest supermicro motherboard with raid allows that - my older motherboards don't (pentium II vintage) 15:51:34 SWPadnos: how come? 15:51:44 oh - forgot about linux - sorry 15:51:46 the Linux RAID code is pretty efficient 15:52:00 hardware used to be faster, but isn't any more 15:52:06 (use novell and micro$oft 15:52:19 CPU speed has improved way faster than drives or RAID controllers 15:52:30 interesting 15:54:10 you also get the advantage of being able to RAID any drives - combine SATA + IDE + USB + SCSI in one array 15:54:30 that is pretty neet 15:54:49 so you could do something silly like have RAID5 SCSI drives, and mirror the whole array to a single huge capacity IDE drive ;) 15:55:20 some people do that kind of thing for "backup" purposes 15:55:32 heh, that really sounds nice 16:50:39 skunkworks has quit 17:35:26 rayh has quit 17:42:07 rayh has joined #emc-devel 17:43:36 Is there any point to apt-cdrom add with the ubuntu? 17:44:02 I think the CD is already in apt 17:44:08 you mean the install CD? 17:44:32 yeah it's the first line in my sources.list 17:46:29 Okay. Thanks. Installing now. 17:47:40 are you going to install the extra packages over your dialup? 17:48:31 if so I bet you can safely skip the ubuntu OS updates 17:48:47 especially their updated kernel packages, since you won't be using them 17:48:53 that will save you tens of MB 17:50:51 off to lunch 17:53:37 Thanks for the tip.\ 18:18:33 sam_ has joined #emc-devel 18:19:16 cradek: alex_joni: ? 
18:20:51 sam_ has quit
18:31:11 skunkworks has joined #emc-devel
18:31:33 Ok - I have some interesting info.
18:33:12 I just installed ubuntu on a totally different computer (my workstation, a Dell Dimension 3000) that has 1.25gb of memory. Emc crashes upon startup just like the other computer. If I remove the 256mb - emc starts and runs.
18:34:12 The plot thickens ;)
18:35:17 2.8ghz pentium 4 with 1.25gb memory
18:36:31 thinking now it may not be hardware.
18:37:13 it is like people have real jobs or something ;)
18:38:13 btw - nice work - I was goofing around with emc2 on the other computer (1.8ghz) and was able to get .0002 period. that is an unreal improvement.
18:38:58 I meant .00002 - and I am happy with .00003
18:39:11 .00003 is really all I need
18:40:33 that gives me 100ipm on my slow axis.
18:44:25 rayh_ has joined #emc-devel
18:44:47 This is from ubuntu.
18:44:59 Now to get the emc upgrades.
18:50:43 ?
18:51:23 The following packages have unmet dependencies:
18:51:23 emc2-axis: Depends: emc2 but it is not going to be installed
18:51:23 E: Broken packages
18:51:23 r
18:51:48 darn
18:52:23 cradek, You around?
18:53:28 are you trying to install ubuntu and then emc2 from cradek's site?
18:53:46 Ubuntu is installed.
18:54:04 I've got the ubuntu box online
18:54:05 then there is like 50mb of ubuntu updates
18:54:15 grabbed his installer script
18:54:33 Cradek thought I could skip those updates.
18:54:54 I haven't and have not had a problem (installed it at least 5 times so far)
18:55:21 although I have not tried skipping them
18:55:55 He said much of the updates were kernel which would not get used at all.
18:56:05 When we installed his kernel.
18:56:08 I mean that I have always installed the updates before emc2
18:56:27 interesting
18:57:05 maybe he has a newer ubuntu install cd?
18:57:05 I understand. My problem is a dialup. If I can skip, I will.
18:57:14 5.10
18:57:30 yah - that would be a pain
18:58:10 I downloaded it from ubuntu's site a few days ago - still required 50mb of updates ;)
18:58:11 Maybe the problem is with alex's server.
18:58:25 I just installed it about an hour ago
18:58:32 ah okay.
18:58:49 emc2 that is, from cradek's script
19:00:09 could you look at the updates and only install the one you need?
19:00:28 (I have no clue which one - unmet?)
19:01:12 just talking out of my ass - not a linux person
19:01:58 not much of one either.
19:02:56 no one is around.
19:02:58 ticked the kernel headers and image and 1h55m to go.
19:03:11 wow
19:03:12 That'll be a start.
19:03:23 I could email them to you ;)
19:03:48 uh huh
19:03:52 ;)
19:05:15 I should probably hire a horse or cross-country skier.
22:15:45 logger_devel has joined #emc-devel
22:15:45 topic is: "Welcome to the Enhanced Machine Control development place. | Regular Developers' meetings 24/7 !"
22:15:45 Users on #emc-devel: logger_devel skunkworks @ChanServ rayh alex_joni SWPadnos steves_logging LawrenceG jepler jtr_ cradek
22:32:43 steves_logging has quit
22:42:16 Ubuntu is up with EMC. Question about the developer stuff?
22:43:24 did it take the updates for it to work?
22:48:25 No.
22:48:46 Because I used static IP addys on the local net.
22:49:08 It didn't uncomment the normal locations for packages needed by emc
22:49:20 wow
22:50:09 I was looking for the emc source package name
23:02:11 rayh_ has joined #emc-devel
23:14:16 cradek: What do I need to do to get the emc source stuff for development?