Using Docker for HPC with the Sun Grid Engine

I am wondering if it is possible to create a virtual cluster with Docker so that I can run scripts designed for HPC clusters that use SGE for cluster management. These are fairly large, complex workflows, so rewriting them for, say, TORQUE/PBS is not an option. In theory, I should be able to trick Docker into behaving as if there are several nodes, like my internal HPC cluster. If someone can save me the pain and tell me that this is impossible, I would be very grateful.

Warning: I am not a cluster administrator; I am more of an end user. I work on Mac OS X 10.9.5, with the following Docker setup:

    Client version: 1.7.0
    Client API version: 1.19
    Go version (client): go1.4.2
    Git commit (client): 0baf609
    OS/Arch (client): darwin/amd64
    Server version: 1.7.0
    Server API version: 1.19
    Go version (server): go1.4.2
    Git commit (server): 0baf609
    OS/Arch (server): linux/amd64

    bash-3.2$ boot2docker version
    Boot2Docker-cli version: v1.7.0
    Git commit: 7d89508

I used a derivative of the image (Dockerfile here). My steps are pretty simple and follow the instructions on the website:

  1. Create a local VM:

     docker-machine create -d virtualbox local

  2. Make it active:

     eval "$(docker-machine env local)"

  3. Generate a swarm discovery token (this is the value that $TOKEN refers to below; a consolidated sketch showing the assignment follows this list):

     docker run --rm swarm create

  4. Create a swarm master:

     docker-machine create \
       -d virtualbox \
       --swarm \
       --swarm-master \
       --swarm-discovery token://$TOKEN \
       swarm-master

  5. Use the token to create a swarm node:

     docker-machine create \
       -d virtualbox \
       --swarm \
       --swarm-discovery token://$TOKEN \
       swarm-agent-00

  6. Add another node:

     docker-machine create \
       -d virtualbox \
       --swarm \
       --swarm-discovery token://$TOKEN \
       swarm-agent-01
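Since the steps above use $TOKEN without ever assigning it, here is a minimal consolidated sketch with the token captured into a shell variable (the variable name is my own, not from the original instructions):

    # Capture the discovery token printed by `swarm create`
    TOKEN=$(docker run --rm swarm create)
    echo "Swarm token: $TOKEN"

    # Create the master and two agents against that token
    docker-machine create -d virtualbox --swarm --swarm-master \
      --swarm-discovery token://$TOKEN swarm-master
    docker-machine create -d virtualbox --swarm \
      --swarm-discovery token://$TOKEN swarm-agent-00
    docker-machine create -d virtualbox --swarm \
      --swarm-discovery token://$TOKEN swarm-agent-01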

Now here is the crazy part. When I try to point my client at the swarm with eval "$(docker-machine env --swarm swarm-master)", I get this error: Cannot connect to the Docker daemon. Is 'docker -d' running on this host? Then I tried eval "$(docker-machine env swarm-master)" and it works, but I'm not 100% sure that this is correct:

    NAME             ACTIVE   DRIVER       STATE     URL                         SWARM
    local                     virtualbox   Running   tcp://192.168.99.105:2376
    swarm-agent-00            virtualbox   Running   tcp://192.168.99.107:2376   swarm-master
    swarm-agent-01            virtualbox   Running   tcp://192.168.99.108:2376   swarm-master
    swarm-master     *        virtualbox   Running   tcp://192.168.99.106:2376   swarm-master (master)
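If the --swarm form keeps failing, a quick check (my suggestion, not from the original post) is whether the swarm master container is actually running inside the swarm-master VM:

    # Show the environment docker-machine generates for the swarm endpoint
    docker-machine env --swarm swarm-master

    # SSH into the master VM and list its containers; a healthy master
    # should be running the swarm agent and swarm master containers
    docker-machine ssh swarm-master docker ps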
  7. At this point, I create my application with several containers using this yaml file:

     bior:
       image: stevenhart/bior_annotate
       command: login -f sgeadmin
       volumes:
         - .:/Data
       links:
         - sge
     sge:
       build: .
       ports:
         - "6444"
         - "6445"
         - "6446"

and bring it up with docker-compose up.

  8. And then finally start a container from the image (a placement sanity check follows below):

docker run -it --rm dockersge_sge login -f sgeadmin
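As that sanity check (my addition, not part of the original steps): with the client pointed at the swarm endpoint, classic Swarm prefixes container names with the node they were scheduled on, which shows whether the compose services actually landed on the agents:

    # With classic Swarm, the NAMES column of docker ps reads
    # <node>/<container>, e.g. swarm-agent-00/dockersge_sge_1
    eval "$(docker-machine env --swarm swarm-master)"
    docker ps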

But here is the problem.

When I run qhost, I get the following:

    HOSTNAME      ARCH      NCPU NSOC NCOR NTHR LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
    ----------------------------------------------------------------------------------------------
    global        -         -    -    -    -    -     -       -       -       -
    6bf6f6fda409  lx-amd64  1    1    1    1    0.01  996.2M  96.2M   1.1G    0.0

Shouldn't it think that there are several processors, i.e. one for each of my swarm nodes?

1 answer

I assume that you are running qhost inside your Docker container.

The thing with Swarm is that it does not combine all the hosts into one big machine (which is what I used to think it did).

Instead, if you have, for example, five single-core machines, Swarm will pick the machine running the fewest containers and start the new container on that machine.

So Swarm is a controller that distributes containers across a cluster, rather than something that merges the hosts into one.
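A minimal sketch of what that scheduling looks like in practice (the machine names are the ones from the question; that the default strategy picks the least-loaded node is an assumption about this setup):

    # Start a few throwaway containers against the swarm endpoint
    eval "$(docker-machine env --swarm swarm-master)"
    for i in 1 2 3; do
      docker run -d --name "probe-$i" busybox sleep 3600
    done

    # The NAMES column should show the probes spread across the agents,
    # e.g. swarm-agent-00/probe-1, swarm-agent-01/probe-2, ...
    docker ps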

Hope this helps! If you have further questions, ask :)

UPDATE

I'm not sure if this is right for you, but if you can't get it working with Swarm, I would recommend Kubernetes. I use it on my Raspberry Pi cluster. It is much more robust and mature than Swarm, with things like automatic healing and so on.
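As a rough illustration of the automatic healing mentioned above (the image and names here are placeholders of mine, not from the question):

    # Run a replicated workload; Kubernetes keeps three pods alive
    kubectl run sge-test --image=nginx --replicas=3

    # Delete one pod and watch a replacement get scheduled automatically
    kubectl get pods
    kubectl delete pod <one-of-the-sge-test-pods>
    kubectl get pods -w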

I don't know the details, but there is surely a way to integrate Docker with Hadoop too ...

