Hadoop HDFS as a distributed file repository

I am considering using HDFS as a horizontally scalable storage system for our client's video hosting. My main concern is that HDFS was not designed for this: it is positioned as an open source system for situations where you need to process huge amounts of data. We don't want to process the data, we just want to store it, creating something like a small internal analogue of Amazon S3 on top of HDFS.

An important point is probably that the stored files will be quite large, from 100 MB to 10 GB.

Has anyone used HDFS for this purpose?

3 answers

If you want the equivalent of S3, shouldn't it already provide a distributed, mountable file system? Perhaps you could check out OpenStack Object Storage at http://openstack.org/projects/storage/ .


The main drawback will be the lack of POSIX semantics. You cannot mount it as a regular drive, and you need special APIs for reading and writing. The Java API is the primary one. There is a project called libhdfs that exposes a C API via JNI, but I have never used it. Thriftfs is another option.
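To illustrate what "special APIs" means in practice, here is a minimal sketch of writing and reading a file through the Hadoop Java `FileSystem` API. It assumes a running cluster; the NameNode address, port, and paths are placeholders, and the `hadoop-client` dependency must be on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.nio.charset.StandardCharsets;

public class HdfsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; replace with your cluster's.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/videos/clip.bin");

            // Write: HDFS files are created via the API, not a mounted drive.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("video bytes here".getBytes(StandardCharsets.UTF_8));
            }

            // Read the file back through the same API.
            try (FSDataInputStream in = fs.open(path)) {
                byte[] buf = new byte[64];
                int n = in.read(buf);
                System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
            }
        }
    }
}
```

Note that there is no `open()`/`read()` at the OS level here: every application that touches the data has to go through this client library (or libhdfs/Thriftfs wrappers around it).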

Perhaps also look at Lustre?


You might want to consider MongoDB. It has GridFS, which lets you use it as file storage. You can then scale your storage horizontally through sharding and get fault tolerance through replication.
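As a sketch of the GridFS approach, here is how file upload and download look with the MongoDB Java driver's `GridFSBucket`. It assumes a reachable MongoDB server and the `mongodb-driver-sync` dependency; the connection string, database name, and file name are placeholders.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.gridfs.GridFSBucket;
import com.mongodb.client.gridfs.GridFSBuckets;
import org.bson.types.ObjectId;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class GridFsSketch {
    public static void main(String[] args) {
        // Hypothetical connection string; replace with your deployment's.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("videohost");
            GridFSBucket bucket = GridFSBuckets.create(db);

            // GridFS splits the stream into chunks (255 KB by default),
            // which is how it handles files larger than the 16 MB document limit.
            byte[] data = "video bytes here".getBytes(StandardCharsets.UTF_8);
            ObjectId fileId = bucket.uploadFromStream(
                    "clip.bin", new ByteArrayInputStream(data));

            // Download by id into any OutputStream.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            bucket.downloadToStream(fileId, out);
            System.out.println(out.toString(StandardCharsets.UTF_8));
        }
    }
}
```

Sharding the chunks collection is what gives the horizontal scaling mentioned above; replication of the shards provides the fault tolerance.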

