Thursday, September 30, 2010

Saving simulation results

Recently (now that's a big lie, but, oh, well...) I've been running long-running simulations on a remote machine, which I used to connect to with an SSH client. The trouble was that, due to intermittent faults in our Internet connectivity, the SSH session often got disconnected. When it did, the simulator process was history. I couldn't see what output it had produced on the console, because there was no console anymore, and the process itself was killed due to the sad demise of its parent (the dying shell passes the hangup signal, SIGHUP, on to its jobs).
So one solution, suggested by my advisor, was to prefix the simulator invocation command with "nohup" and append an "&" to it. This keeps the simulator running even if the SSH session dies: nohup makes the process ignore the hangup signal, and "&" runs it in the background. I also added statements to write the simulator results to a file. Now I can periodically check on the simulator with a "ps aux | grep" command, and when I see that it is no longer running, I can read the results from the results file.
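For anyone who would rather script this, here is a minimal Python sketch of the same workflow. It is not what I actually ran, and `launch_detached` and `is_running` are hypothetical helper names. Instead of nohup it uses `start_new_session=True` (POSIX only), which puts the simulator in its own session with no controlling terminal, so the hangup from a dropped SSH connection never reaches it. `is_running` plays the role of "ps aux | grep" by sending signal 0, which only checks that the PID exists.

```python
import os
import subprocess


def launch_detached(cmd, log_path):
    """Start `cmd` (a list of strings) in a new session, output going to log_path.

    start_new_session=True calls setsid() in the child, so it is not part of the
    SSH login session and does not get SIGHUP when that session drops. The effect
    is similar to `nohup cmd > log_path 2>&1 &`, although nohup works by ignoring
    SIGHUP instead.
    """
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # no terminal to read from
            stdout=log,                 # results end up in the log file
            stderr=subprocess.STDOUT,   # errors too, so nothing is lost
            start_new_session=True,
        )
    # The child holds its own copy of the file descriptor, so closing ours is safe.
    return proc


def is_running(pid):
    """Return True if a process with this PID exists (like `ps aux | grep`).

    Signal 0 performs only the existence/permission check. Caveat: a finished
    child of *this* Python process stays a zombie until reaped with proc.wait()
    or proc.poll(), and os.kill still succeeds on a zombie.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

In practice you would save `proc.pid` somewhere (a small file, say) and call `is_running(pid)` from a later SSH session; once it returns False, the results are in the log file. Because the launching script normally exits right after starting the simulator, the orphaned simulator is reaped by init when it finishes, so the zombie caveat only matters if you poll from the same Python process that launched it.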
A neater solution was suggested to me yesterday by a researcher I collaborate with in the US. He advised me to start a VNC server on the remote machine with the command "vnc4server :x" (where x is an integer display number) and then use a VNC client to access an X Window session on the remote server. You just point your VNC client at "remotemachine:x" (where remotemachine is the remote machine's IP address or DNS name, and x is the same integer used in the vnc4server command). Because the session lives on the remote machine, losing the client connection does not end it; you can reconnect later and find the simulator where you left it. This is much neater.
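As far as I know, the ":x" in these commands is a display number, and by the usual VNC convention display :x listens on TCP port 5900 + x, which is worth knowing if a firewall sits between you and the server. The Python sketch below (hypothetical helper names; it handles IPv4 addresses and DNS names, not IPv6) turns a "remotemachine:x" target into a host and port and checks whether anything is listening there.

```python
import socket

VNC_BASE_PORT = 5900  # convention: VNC display :N listens on TCP port 5900 + N


def vnc_endpoint(target):
    """Split a VNC client target like 'remotemachine:3' into (host, tcp_port)."""
    host, sep, display = target.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected 'host:display', got {target!r}")
    n = int(display)  # raises ValueError for a non-numeric display
    if n < 0:
        raise ValueError(f"display number must be non-negative, got {n}")
    return host, VNC_BASE_PORT + n


def vnc_port_open(target, timeout=3.0):
    """Return True if a TCP connection to the VNC server's port succeeds."""
    host, port = vnc_endpoint(target)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or unknown host
        return False
```

For example, after "vnc4server :1" you would expect `vnc_port_open("remotemachine:1")` to return True; if it returns False while the server is running, a firewall blocking port 5901 is a likely suspect.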
But the story doesn't end there. I ran into an interesting problem: my VNC client kept telling me "no password configured for vnc auth." After some head-scratching, I noticed that "$HOME/.vnc/passwd" was empty. At that point, I asked my collaborator for help, and he told me that we were running out of disk space on the remote machine. I deleted several debug files from my home folder, and that fixed the problem. Now that was hard to figure out: no error messages, and nothing to suggest that it could be a disk space issue.
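In hindsight, a quick check for both symptoms would have saved me some head-scratching. This is a minimal Python sketch, not a standard tool: `diagnose_vnc_passwd` is a hypothetical name, and the 10 MiB free-space threshold is an arbitrary assumption. It uses `shutil.disk_usage` to see how much space is left on the filesystem holding the home folder and flags a missing or empty "$HOME/.vnc/passwd". From the shell, "df -h ~" and "ls -l ~/.vnc/passwd" give the same information.

```python
import os
import shutil


def diagnose_vnc_passwd(home=None, min_free_bytes=10 * 1024 * 1024):
    """Return a list of human-readable problems (empty list if none found).

    min_free_bytes (10 MiB by default) is an arbitrary threshold chosen for
    illustration, not a value required by VNC.
    """
    home = home or os.path.expanduser("~")
    problems = []

    # On Unix, .free is the space available to unprivileged users.
    free = shutil.disk_usage(home).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free on the filesystem holding {home}")

    passwd = os.path.join(home, ".vnc", "passwd")
    if not os.path.exists(passwd):
        problems.append(f"{passwd} does not exist")
    elif os.path.getsize(passwd) == 0:
        # The symptom I saw: the file is there but has no password in it.
        problems.append(f"{passwd} is empty")

    return problems
```

Running `print(diagnose_vnc_passwd())` on the remote machine lists anything suspicious. My guess (and it is only a guess) is that the password file got created but its contents could not be written once the disk was full, which would explain why there was no obvious error.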
