I'm running some tests with Nutch and Hadoop and I need a large amount of data. I want to start with 20 GB, move to 100 GB, then 500 GB, and eventually reach 1-2 TB.
The problem is that I don't have that much data, so I'm thinking about how to create it.
The data itself can be of any kind. One idea is to take an original dataset and duplicate it. But that alone is not enough, because the files need to differ from each other (identical files are ignored).
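To illustrate the duplicate-and-vary idea, here is a minimal Java sketch: it copies a seed file many times and appends a unique marker to each copy so no two copies are byte-identical. The seed path, copy count, and marker format are illustrative assumptions, not anything from my setup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Sketch: duplicate a seed file, then perturb each copy so it is unique.
public class DuplicateWithVariation {
    public static void main(String[] args) throws IOException {
        Path seed = Paths.get("seed.txt"); // hypothetical seed file
        int copies = 1000;                 // hypothetical copy count
        for (int i = 0; i < copies; i++) {
            Path copy = Paths.get("copy-" + i + ".txt");
            // Copy the seed, then append a per-file marker so the copies differ.
            Files.copy(seed, copy, StandardCopyOption.REPLACE_EXISTING);
            Files.write(copy,
                    ("\nunique-marker-" + i + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.APPEND);
        }
    }
}
```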
Another idea is to write a program that will create files with dummy data.
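For the dummy-data approach, something like the following Java sketch would work: it writes a configurable number of files filled with random bytes, so every file is unique by construction. The file count, per-file size, and file names are assumptions chosen to total roughly 20 GB; adjust them for the larger targets.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Random;

// Sketch: generate many files of random bytes to reach a target data volume.
public class DummyDataGenerator {
    public static void main(String[] args) throws IOException {
        int fileCount = 200;                    // hypothetical: 200 files...
        long bytesPerFile = 100L * 1024 * 1024; // ...of 100 MB each -> ~20 GB total
        Random random = new Random();
        byte[] buffer = new byte[8192];

        for (int i = 0; i < fileCount; i++) {
            try (OutputStream out = new BufferedOutputStream(
                    new FileOutputStream("dummy-" + i + ".dat"))) {
                long written = 0;
                while (written < bytesPerFile) {
                    random.nextBytes(buffer); // random content keeps files distinct
                    out.write(buffer);
                    written += buffer.length;
                }
            }
        }
    }
}
```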
Any other ideas?
java hadoop nutch bigdata