I am wondering whether it is possible to create a virtual cluster with Docker so that I can run scripts designed for HPC clusters that use the SGE cluster manager. These are fairly large and complex workflows, so they are not something I can simply rewrite for, say, TORQUE/PBS. In theory, I should be able to trick Docker into thinking that there are several nodes, like on my internal HPC cluster. If someone can save me the pain by telling me that this is impossible, I would be very grateful.
Caveat: I am not a cluster administrator; I am more of an end user. I am running Mac OS X 10.9.5.
Client version: 1.7.0
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 0baf609
OS/Arch (client): darwin/amd64
Server version: 1.7.0
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 0baf609
OS/Arch (server): linux/amd64
bash-3.2$ boot2docker version
Boot2Docker-cli version: v1.7.0
Git commit: 7d89508
I am using a derivative of the image (Dockerfile here). My steps are fairly simple and follow the instructions on the website:
docker-machine create -d virtualbox local
- Make it active.
eval "$(docker-machine env local)"
- Get a swarm image and generate a cluster token
docker run --rm swarm create
- Create a swarm master
docker-machine create \ -d virtualbox \
- Use a token to create swarm nodes
docker-machine create \ -d virtualbox \
- Add another node
docker-machine create \ -d virtualbox \
Now here is the crazy part. When I try to set up my shell environment with this command: eval "$(docker-machine env --swarm swarm-master)" I get this error: Cannot connect to the Docker daemon. Is 'docker -d' running on this host? Then I tried eval $(docker-machine env swarm-master) and it works, but I'm not 100% sure that this is the correct thing to do:
NAME             ACTIVE   DRIVER       STATE     URL                         SWARM
local                     virtualbox   Running   tcp://192.168.99.105:2376
swarm-agent-00            virtualbox   Running   tcp://192.168.99.107:2376   swarm-master
swarm-agent-01            virtualbox   Running   tcp://192.168.99.108:2376   swarm-master
swarm-master     *        virtualbox   Running   tcp://192.168.99.106:2376   swarm-master (master)
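As a quick sanity check on the listing above, a small sketch like the following can pull out just the machines that report a swarm master. The function name swarm_members is made up for illustration, and it assumes that the URL column starts with tcp:// and that the SWARM column, when filled in, is the field right after it.

```shell
# Hypothetical helper (not part of docker-machine): reads `docker-machine ls`
# output on stdin and prints the names of machines that have a SWARM entry.
# Assumption: the URL field starts with tcp:// and SWARM follows it directly.
swarm_members() {
  awk 'NR > 1 {
    for (i = 1; i <= NF; i++) {
      # A field after the URL means the SWARM column is not empty.
      if ($i ~ /^tcp:\/\// && i < NF) { print $1; break }
    }
  }'
}
```

It would be used as docker-machine ls | swarm_members; a machine such as local, which has nothing after its URL, is skipped.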
- At this point, I create my multi-container application using this YAML file:
bior:
  image: stevenhart/bior_annotate
  command: login -f sgeadmin
  volumes:
    - .:/Data
  links:
    - sge
sge:
  build: .
  ports:
    - "6444"
    - "6445"
    - "6446"
using docker-compose up
- And then finally start a new container from the image
docker run -it --rm dockersge_sge login -f sgeadmin
But here is the problem: when I run qhost, I get the following:
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
6bf6f6fda409            lx-amd64        1    1    1    1  0.01  996.2M   96.2M    1.1G     0.0
Shouldn't qhost report several CPUs, i.e. one for each of my swarm nodes?
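To make it easier to compare what SGE sees against the number of swarm nodes, a minimal sketch like this can summarize the qhost output. The function name sge_cpu_summary is hypothetical, and it assumes the column layout shown above, with NCPU as the third field and "global" as a pseudo-host that should not be counted.

```shell
# Hypothetical helper (not part of SGE): reads `qhost` output on stdin and
# prints how many execution hosts it lists and their combined NCPU value.
# Assumption: NCPU is the third column, as in the listing above.
sge_cpu_summary() {
  awk 'NR > 2 && $1 != "global" && $3 ~ /^[0-9]+$/ { hosts++; cpus += $3 }
       END { printf "hosts=%d cpus=%d\n", hosts, cpus }'
}
```

It would be used as qhost | sge_cpu_summary. If SGE had registered every swarm node as an execution host, I would expect the host count to match the number of nodes rather than 1.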