I want to create a collection of scripts that send code from my local machine to a remote SSH server, execute it there, and return the results.
Preliminary Steps
Set up the SSH connection:
- Set up an AWS server with Ubuntu and install the VPN software on it.
- Save the following script in your local home directory:
ssh -i "AWS_MyServer.pem" \
ubuntu@ec2-3-149-229-219.us-east-2.compute.amazonaws.com -X \
-L8787:lynx.dfci.harvard.edu:8787 \
-L2222:lynx.dfci.harvard.edu:22 \
-L8000:localhost:8000
Give it a name, e.g., `ssh_to_server`, and make it executable (`chmod +x ssh_to_server`).
Run `ssh_to_server` and authenticate the VPN in the AWS terminal.
I used `/opt/cisco/anyconnect/bin/vpnui` to run the VPN.
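Once the tunnel script is running, you can sanity-check the forwards from another local terminal. This is a sketch assuming `nc` (netcat) is installed; it only succeeds while `ssh_to_server` is active:

```shell
# Each forward should report open while ssh_to_server is active
nc -z localhost 8787 && echo "RStudio forward is up"
nc -z localhost 2222 && echo "lynx SSH forward is up"
nc -z localhost 8000 && echo "xpra forward is up"
```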
TIP
If you want to keep the VPN running even when you disconnect your local machine from AWS, use `tmux` and `xpra`. In the AWS terminal, run `tmux` to launch a detachable session. Inside that, run `xpra start --start=/opt/cisco/anyconnect/bin/vpnui --bind-tcp=0.0.0.0:8000`. Now you can access the VPN window in your browser by visiting
`localhost:8000`. Then you can detach the tmux session with `Ctrl+b`, then `d`. To reattach, run `tmux a`.
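The tip above as a copy-pasteable sequence, run in the AWS terminal (the `vpnui` path is from this setup; the session name `vpn` is illustrative):

```shell
# On the AWS server: keep the VPN alive across disconnects
tmux new -s vpn                           # detachable session
xpra start \
  --start=/opt/cisco/anyconnect/bin/vpnui \
  --bind-tcp=0.0.0.0:8000                 # serve the VPN GUI over HTML5
# detach with Ctrl+b, then d; reattach later with: tmux a -t vpn
```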
NOTE
In the SSH command, we set up three port forwards:
- `8787` for RStudio Server
- `2222` for SSH to the lynx server
- `8000` for xpra HTML5
Check that you can ssh directly to lynx using
ssh -p 2222 saha@localhost
from another terminal on your local machine while the first connection is active.
- To avoid typing a password each time you `ssh`, generate an SSH key pair:
# on your local machine:
# If the prompt says id_ed25519 already exists, don't overwrite it; skip to the next command.
ssh-keygen -t ed25519 -C "saha@lynx" # press Enter through prompts
# copy the key to the remote (note the -p 2222)
ssh-copy-id -p 2222 saha@localhost
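To confirm that key-based login works before relying on it, you can force a non-interactive attempt; `BatchMode=yes` is a standard OpenSSH option that makes the connection fail instead of prompting for a password:

```shell
# Succeeds only if the key is accepted; never prompts for a password
ssh -p 2222 -o BatchMode=yes saha@localhost 'echo key auth OK'
```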
- Make an SSH config file to keep the SSH connection active and shared between calls coming from potentially different terminals.
Add this to `~/.ssh/config` on your local machine:
Host lynx
HostName localhost
Port 2222
User saha
ControlMaster auto
ControlPath ~/.ssh/cm-%r@%h:%p
ControlPersist 10m
Check if you can directly access lynx server without password:
ssh lynx
When you log out of lynx, note whether the terminal says "Shared connection to localhost closed."
This indicates that ControlMaster is working.
To close the shared connection for good, run `ssh -O exit lynx`.
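You can also query the state of the shared connection directly; `-O check` is a standard OpenSSH control command:

```shell
ssh -O check lynx   # reports "Master running (pid=...)" while the shared connection is alive
ssh -O exit lynx    # asks the master to exit, closing the shared connection
```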
- Now we need to set up a shell script to transfer contents between the local and remote machines.
Make a folder for your project, say `proj`, on your local machine. Inside the `proj` folder, start `R` and run `renv::init()`. Then put the following in `sync-and-run.sh`:
#!/usr/bin/env bash
# Usage:
# ./sync-and-run.sh [--no-sync] [--no-run] 'args for run.R'
#
# Examples:
# ./sync-and-run.sh 'lr=0.05 epochs=200' # sync + run + no pull (default)
# ./sync-and-run.sh --no-sync 'lr=0.05' # only run
# ./sync-and-run.sh --no-run # only sync
# ./sync-and-run.sh --pull-env # pull .RData file from remote
# ./sync-and-run.sh --onsite # Skip AWS and directly connect to cluster
# ./sync-and-run.sh --help # show this help
#
set -euo pipefail
# ===== Defaults to customize =====
# REMOTE=saha@localhost
REMOTE=lynx
REMOTE_RUN_DIR='tmp/proj' # disposable copy for runs
REMOTE_ON_SITE=saha@lynx.dfci.harvard.edu
# ===== Flag parsing =====
DO_SYNC=1
DO_RUN=1
PULL_ENV=0
ON_SITE=0
while [[ $# -gt 0 ]]; do
case "$1" in
--help)
sed -n '2,50p' "$0" | sed '/^set -euo pipefail/,$d'
exit 0
;;
--no-sync)
DO_SYNC=0
shift
;;
--no-run)
DO_RUN=0
shift
;;
--pull-env)
PULL_ENV=1
shift
;;
--onsite)
ON_SITE=1
shift
;;
*)
# First non-flag arg → treat remaining as RUN_ARGS
break
;;
esac
done
# Args left go to R Script
RUN_ARGS="$*"
# Change remote if onsite
if [[ $ON_SITE -eq 1 ]]; then
REMOTE="$REMOTE_ON_SITE"
fi
# Sync working dir (exclude big/irrelevant stuff)
if [[ $DO_SYNC -eq 1 ]]; then
rsync -avzP --delete \
-e "ssh" \
--exclude '.git' \
--exclude 'renv' \
--exclude '.Rproj.user' \
--exclude '.RData' \
./ "$REMOTE:$REMOTE_RUN_DIR/"
fi
# Run remotely
if [[ $DO_RUN -eq 1 ]]; then
ssh -t "$REMOTE" bash -lc "
set -euo pipefail
cd $REMOTE_RUN_DIR
mkdir -p _outputs
Rscript -e \"renv::restore()\"
R --quiet -e \"source('run.R'); save.image('remote_env.RData')\" --args $RUN_ARGS
"
fi
# Pull remote_env.RData from remote
if [[ $PULL_ENV -eq 1 ]]; then
rsync -avzP \
-e "ssh" \
"$REMOTE:$REMOTE_RUN_DIR/remote_env.RData" ./
fi
This script allows you to: A) Send code from local machine to remote machine. B) Execute code in remote machine. C) Pull the R environment data from remote machine to local machine.
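The flag-parsing pattern used in the script can be exercised on its own. Here is a minimal standalone sketch (the function name `parse_flags` is illustrative, not part of the script):

```shell
#!/usr/bin/env bash
# Minimal version of the script's while/case flag parser
parse_flags() {
  DO_SYNC=1; DO_RUN=1
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --no-sync) DO_SYNC=0; shift ;;
      --no-run)  DO_RUN=0;  shift ;;
      *) break ;;   # first non-flag arg: the rest are run args
    esac
  done
  RUN_ARGS="$*"
  echo "sync=$DO_SYNC run=$DO_RUN args=$RUN_ARGS"
}

parse_flags --no-sync 'lr=0.05 epochs=200'
# → sync=0 run=1 args=lr=0.05 epochs=200
```

Because `break` stops the loop at the first non-flag argument, flags must come before the run arguments, exactly as in the usage examples above.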
- For long-running tasks, use the above script with
`--no-run`, because the SSH connection may die mid-run. Instead, do `ssh lynx` and open a `tmux` session:
tmux new -s rsession
Inside the tmux session, run your R script.
Then you can detach the tmux session with Ctrl+b, then d.
Now you can logout and close all connections.
Later, check the status of R script by simply connecting to lynx and attaching the tmux session:
tmux attach -t rsession
TIP
`tmux ls` shows all running tmux sessions.
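A few more `tmux` commands that come in handy with this workflow (all standard tmux; `rsession` is the session name used above):

```shell
tmux new -s rsession            # start a named session
tmux ls                         # list running sessions
tmux attach -t rsession         # reattach to a session
tmux kill-session -t rsession   # stop a session you no longer need
```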
Daily Workflow
- On your local machine, open a terminal, launch tmux, and connect to AWS:
tmux
# Inside tmux
./ssh_to_server
Then detach the tmux session: Ctrl+b, then d.
- In your browser, visit `localhost:8000` and authenticate the VPN.
NOTE
If it looks like the xpra server is not running:
tmux a   # To open the terminal connected to AWS
xpra stop
xpra start --start=/opt/cisco/anyconnect/bin/vpnui --bind-tcp=0.0.0.0:8000
# now detach tmux again
- If this is a new project, `ssh lynx` and set up the project directory, associated `renv`, etc.
- On the local machine, do the main coding work. Test code locally.
- To send code to the cluster, edit `sync-and-run.sh` to choose the project directory, R command, etc. Then run `./sync-and-run.sh`, using the flags `--no-run`, `--no-sync`, `--pull-env` as needed.
- If the connection is ever frozen, `ssh -O exit lynx` resets the persistent connection. Also, note that the VPN stays active for at most 12 hours.