I want to create a collection of scripts to send code from a local machine to a remote SSH server, execute it there, and return the results.


Preliminary Steps

Setup ssh config:

  1. Set up an AWS server with Ubuntu and install the VPN software on it.

  2. Save the following script in your local home directory:

ssh -i "AWS_MyServer.pem" \
  ubuntu@ec2-3-149-229-219.us-east-2.compute.amazonaws.com -X \
  -L8787:lynx.dfci.harvard.edu:8787 \
  -L2222:lynx.dfci.harvard.edu:22 \
  -L8000:localhost:8000

Give it a name, e.g., ssh_to_server, and make it executable. Run ./ssh_to_server and authenticate the VPN in the AWS terminal. I used /opt/cisco/anyconnect/bin/vpnui to launch the VPN.
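The save-and-make-executable step can be done in one go; a sketch, where the script name and home-directory location are just the example used above:

```shell
# Write the tunnel command to a script in the home directory
cat > "$HOME/ssh_to_server" <<'EOF'
#!/usr/bin/env bash
ssh -i "AWS_MyServer.pem" \
  ubuntu@ec2-3-149-229-219.us-east-2.compute.amazonaws.com -X \
  -L8787:lynx.dfci.harvard.edu:8787 \
  -L2222:lynx.dfci.harvard.edu:22 \
  -L8000:localhost:8000
EOF
# Mark it executable so ./ssh_to_server works
chmod +x "$HOME/ssh_to_server"
```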

TIP

If you want to keep the VPN running even when your local machine disconnects from AWS, use tmux and xpra. In the AWS terminal, run tmux to launch a detachable session. Inside that, run

xpra start --start=/opt/cisco/anyconnect/bin/vpnui --bind-tcp=0.0.0.0:8000

Now you can access the VPN window in your browser by visiting localhost:8000. Then you can detach the tmux session with Ctrl+b, then d. To reattach, run tmux a.

NOTE

In the SSH command, we set up three port forwards:

  1. 8787 for RStudio Server
  2. 2222 for SSH to lynx server
  3. 8000 for xpra html5

Check that you can ssh directly to lynx using

ssh -p 2222 saha@localhost

from another terminal on your local machine while the first connection is active.
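If that test fails, it helps to first check whether anything is even listening on the forwarded port locally. A bash-only sketch (it uses bash's /dev/tcp feature, so no extra tools are needed; the port_open helper is mine, not part of the setup):

```shell
# Succeeds if something accepts a TCP connection on host:port
port_open() { (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; }

if port_open localhost 2222; then
  echo "forward on 2222 is up"
else
  echo "nothing listening on 2222 (is ssh_to_server running?)"
fi
```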

  3. To avoid typing your password each time you SSH, generate an SSH key pair:
# on your local machine:
# If the prompt says id_ed25519 already exists, don't overwrite it; skip to the next command.
ssh-keygen -t ed25519 -C "saha@lynx"   # press Enter through the prompts

# copy the key to the remote (note the -p 2222)
ssh-copy-id -p 2222 saha@localhost
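If you'd rather not answer prompts (useful in scripts), ssh-keygen also accepts the output file and passphrase on the command line. A sketch using a throwaway path, so it cannot clobber your real key:

```shell
# Generate a disposable ed25519 key pair without any prompts:
# -f sets the output file, -N "" sets an empty passphrase, -q is quiet
KEYFILE="$(mktemp -d)/demo_ed25519"
ssh-keygen -t ed25519 -C "saha@lynx" -f "$KEYFILE" -N "" -q
# The private key and its .pub companion now exist side by side
ls -l "$KEYFILE" "$KEYFILE.pub"
```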

  4. Make an SSH config entry to keep the connection alive and share it between calls coming from potentially different terminals. Add this to ~/.ssh/config on your local machine:
Host lynx
  HostName localhost
  Port 2222
  User saha
  ControlMaster auto
  ControlPath ~/.ssh/cm-%r@%h:%p
  ControlPersist 10m

Check if you can directly access lynx server without password:

ssh lynx

When you log out of lynx, note whether the terminal prints “Shared connection to localhost closed.” That message indicates ControlMaster is working (you can also query it with ssh -O check lynx). To close the shared connection for good, run ssh -O exit lynx.

  5. Now we need a shell script to transfer contents between the local and remote machines. Make a folder for your project, say proj, on your local machine. Inside the proj folder, start R and run renv::init(). Then put the following in sync-and-run.sh:
#!/usr/bin/env bash
# Usage:
#   ./sync-and-run.sh [--no-sync] [--no-run] 'args for run.R'
#
# Examples:
#   ./sync-and-run.sh 'lr=0.05 epochs=200'   # sync + run + no pull (default)
#   ./sync-and-run.sh --no-sync 'lr=0.05'    # only run
#   ./sync-and-run.sh --no-run               # only sync
#   ./sync-and-run.sh --pull-env             # pull .RData file from remote
#   ./sync-and-run.sh --onsite               # Skip AWS and directly connect to cluster
#   ./sync-and-run.sh --help                 # show this help
#
set -euo pipefail

# ===== Defaults to customize =====
# REMOTE=saha@localhost
REMOTE=lynx
REMOTE_RUN_DIR='tmp/proj'   # disposable copy for runs
REMOTE_ON_SITE=saha@lynx.dfci.harvard.edu

# ===== Flag parsing =====
DO_SYNC=1
DO_RUN=1
PULL_ENV=0
ON_SITE=0

while [[ $# -gt 0 ]]; do
  case "$1" in
    --help)
      sed -n '2,50p' "$0" | sed '/^set -euo pipefail/,$d'
      exit 0
      ;;
    --no-sync)
      DO_SYNC=0
      shift
      ;;
    --no-run)
      DO_RUN=0
      shift
      ;;
    --pull-env)
      PULL_ENV=1
      shift
      ;;
    --onsite)
      ON_SITE=1
      shift
      ;;
    *)
      # First non-flag arg → treat remaining as RUN_ARGS
      break
      ;;
  esac
done

# Args left go to R Script
RUN_ARGS="$*"

# Change remote if onsite
if [[ $ON_SITE -eq 1 ]]; then
  REMOTE="$REMOTE_ON_SITE"
fi

# Sync working dir (exclude big/irrelevant stuff)
if [[ $DO_SYNC -eq 1 ]]; then
  rsync -avzP --delete \
    -e "ssh" \
    --exclude '.git' \
    --exclude 'renv' \
    --exclude '.Rproj.user' \
    --exclude '.RData' \
    ./ "$REMOTE:$REMOTE_RUN_DIR/"
fi

# Run remotely
if [[ $DO_RUN -eq 1 ]]; then
  ssh -t "$REMOTE" bash -lc "
  set -euo pipefail
  cd $REMOTE_RUN_DIR
  mkdir -p _outputs
  Rscript -e \"renv::restore()\"
  R --quiet -e \"source('run.R'); save.image('remote_env.RData')\" --args $RUN_ARGS
  "
fi

# Pull remote_env.RData from remote
if [[ $PULL_ENV -eq 1 ]]; then
  rsync -avzP \
    -e "ssh" \
    "$REMOTE:$REMOTE_RUN_DIR/remote_env.RData" ./
fi

This script allows you to: A) Send code from local machine to remote machine. B) Execute code in remote machine. C) Pull the R environment data from remote machine to local machine.
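The flag handling in the script is the standard while/case/shift pattern. A stripped-down, stand-alone version you can experiment with (the parse function and its echo output are illustrative only, not part of sync-and-run.sh):

```shell
# Consume leading --flags, then treat everything left as run arguments
parse() {
  local do_sync=1 do_run=1
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --no-sync) do_sync=0; shift ;;
      --no-run)  do_run=0;  shift ;;
      *)         break ;;           # first non-flag: stop, keep the rest
    esac
  done
  echo "sync=$do_sync run=$do_run args=$*"
}

parse --no-sync 'lr=0.05 epochs=200'   # → sync=0 run=1 args=lr=0.05 epochs=200
```

Note that because of the `break`, flags must come before the run arguments, exactly as in sync-and-run.sh.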

  6. For long-running tasks, use the above script with --no-run, because the SSH connection may die mid-run. Instead, ssh lynx and open a tmux session:
tmux new -s rsession

Inside the tmux session, run your R script. Detach with Ctrl+b, then d; you can then log out and close all connections. Later, check on the script by connecting to lynx and reattaching the tmux session:

tmux attach -t rsession
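Inside the session it also helps to mirror the job's output into a log file, so progress survives detaches and reattaches. A sketch: the echo stands in for the real Rscript call, and _outputs/ is the directory sync-and-run.sh already creates:

```shell
# Run the job and copy stdout+stderr into a timestamped log via tee
mkdir -p _outputs
LOG="_outputs/run-$(date +%Y%m%d-%H%M%S).log"
# Replace the echo with the real call, e.g.: Rscript run.R lr=0.05 epochs=200
echo "starting training run" 2>&1 | tee "$LOG"
```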

TIP

tmux ls shows all running tmux sessions.

Daily Workflow

  1. On your local machine, open a terminal, launch tmux, and connect to AWS:
tmux

# Inside tmux
./ssh_to_server

Then detach the tmux session: Ctrl+b, then d.

  2. In your browser, visit localhost:8000 and authenticate the VPN.

NOTE

If it looks like the xpra server is not running:

tmux a # To open the terminal connected to AWS
xpra stop
xpra start --start=/opt/cisco/anyconnect/bin/vpnui --bind-tcp=0.0.0.0:8000
# now detach tmux again

  3. If this is a new project, ssh lynx and set up the project directory, the associated renv, etc.

  4. On your local machine, do the main coding work and test the code locally.

  5. To send code to the cluster, edit sync-and-run.sh to set the project directory, R command, etc. Then run ./sync-and-run.sh, using the flags --no-run, --no-sync, and --pull-env as needed.

  6. If the connection ever freezes, ssh -O exit lynx resets the persistent connection. Also note that the VPN stays active for at most 12 hours.