Hadoop HDFS as a distributed file repository

I am considering using HDFS as a horizontally scalable storage system for our client's video hosting. My main concern is that HDFS was not designed for this: it is positioned as an open source system for situations where you need to process huge amounts of data. We don't want to process the data, we just want to store it, creating something like a small internal analogue of Amazon S3 on top of HDFS.

An important point is probably that the stored files will be quite large, from 100 MB to 10 GB.

Has anyone used HDFS for this purpose?

3 answers

If you want the equivalent of S3, shouldn't it already provide a distributed, mountable file system? Perhaps you could check out OpenStack Object Storage at http://openstack.org/projects/storage/ .


The main drawback will be the lack of POSIX semantics. You cannot mount it as a regular drive, and you need special APIs for reading and writing. The Java API is the primary one. There is a project called libhdfs that exposes a C API via JNI, but I have never used it. Thriftfs is another option.
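To illustrate what "special APIs" means in practice, here is a minimal sketch of writing and reading a file through the Hadoop Java `FileSystem` API. It assumes a running cluster; the NameNode address, port, and paths are placeholders, and the `hadoop-client` dependency must be on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.nio.charset.StandardCharsets;

public class HdfsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; replace with your cluster's.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/videos/clip.bin");

            // Write: HDFS files are created via the API, not a mounted drive.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("video bytes here".getBytes(StandardCharsets.UTF_8));
            }

            // Read the file back through the same API.
            try (FSDataInputStream in = fs.open(path)) {
                byte[] buf = new byte[64];
                int n = in.read(buf);
                System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
            }
        }
    }
}
```

Note that there is no `open()`/`read()` at the OS level here: every application that touches the data has to go through this client library (or libhdfs/Thriftfs wrappers around it).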

Perhaps also look at Lustre?


You might want to consider MongoDB. It has GridFS, which lets you use it as file storage. You can then scale your storage horizontally through sharding and get fault tolerance through replication.
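As a sketch of the GridFS approach, here is how file upload and download look with the MongoDB Java driver's `GridFSBucket`. It assumes a reachable MongoDB server and the `mongodb-driver-sync` dependency; the connection string, database name, and file name are placeholders.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.gridfs.GridFSBucket;
import com.mongodb.client.gridfs.GridFSBuckets;
import org.bson.types.ObjectId;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class GridFsSketch {
    public static void main(String[] args) {
        // Hypothetical connection string; replace with your deployment's.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("videohost");
            GridFSBucket bucket = GridFSBuckets.create(db);

            // GridFS splits the stream into chunks (255 KB by default),
            // which is how it handles files larger than the 16 MB document limit.
            byte[] data = "video bytes here".getBytes(StandardCharsets.UTF_8);
            ObjectId fileId = bucket.uploadFromStream(
                    "clip.bin", new ByteArrayInputStream(data));

            // Download by id into any OutputStream.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            bucket.downloadToStream(fileId, out);
            System.out.println(out.toString(StandardCharsets.UTF_8));
        }
    }
}
```

Sharding the chunks collection is what gives the horizontal scaling mentioned above; replication of the shards provides the fault tolerance.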

