Hadoop vs Teradata: what's the difference?

I have used Teradata. I have never touched Hadoop, but since yesterday I have been doing some research. Judging by their descriptions, the two seem completely interchangeable, yet some documents say they serve different purposes. Everything I have found is vague, and I'm confused.

Does anyone have experience with both of them? What is the difference between the two?

A simple example: I want to build an ETL pipeline that transforms billions of rows of raw data and organizes them into a DWH, and then run some resource-expensive analyses on them. Why use TD? Why Hadoop, or why not?

3 answers

I think the article entitled "MapReduce and Parallel DBMSs: Friends or Foes?" does a good job of describing the situations where each technology works best. In a nutshell, Hadoop is great for storing unstructured data and running parallel transformations to "cleanse" incoming data, while parallel DBMSs excel at complex analytical queries.
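To make the "parallel transformations to cleanse incoming data" idea concrete, here is a minimal sketch of the map/reduce pattern in plain Python (my own illustration, not code from the article or from Hadoop itself): the map step parses raw lines and discards malformed ones, and the reduce step aggregates per key.

```python
from collections import defaultdict

def map_clean(line):
    """Map step: parse a raw CSV line; drop malformed rows (the "cleansing")."""
    parts = line.strip().split(",")
    if len(parts) != 3:
        return []  # discard unstructured / garbled input
    user, _date, amount = parts
    try:
        return [(user, float(amount))]
    except ValueError:
        return []

def reduce_sum(pairs):
    """Reduce step: aggregate values per key, like a GROUP BY."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

raw = [
    "alice,2024-01-01,10.5",
    "garbage line",              # malformed row, dropped by the map step
    "alice,2024-01-02,4.5",
    "bob,2024-01-01,3.0",
]
pairs = [p for line in raw for p in map_clean(line)]
print(reduce_sum(pairs))  # {'alice': 15.0, 'bob': 3.0}
```

In Hadoop the same two functions would run distributed over HDFS blocks; the point of the paper's comparison is that this style tolerates messy, schemaless input, whereas a DBMS would reject it at load time.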


A feature comparison of Hadoop, Hadoop with extensions, and RDBMSs

I am not an expert in this field, but Coursera's "Introduction to Data Science" course has a lecture entitled "Comparing MapReduce and Databases", as well as a lecture on parallel databases in the MapReduce section of the course.

Here is a summary of those lectures comparing MapReduce and RDBMSs (not necessarily parallel RDBMSs). Keep in mind that the comparison changes if you include extensions to Hadoop such as Pig, Hive, etc. In parentheses I note the MapReduce extensions that add some of this functionality.

Some features/properties that RDBMSs have but that are not native to MapReduce:

  • Declarative query languages (Pig, Hive)
  • Schemas (Hive, Pig, DryadLINQ, Hadapt)
  • Logical data independence
  • Indexing (HBase)
  • Algebraic optimization (Pig, Dryad, Hive)
  • Caching / materialized views
  • ACID / transactions

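Several of the RDBMS-side properties in that list (declarative queries, schemas, indexing, ACID transactions) can be seen in a few lines using Python's built-in `sqlite3`; this is my own illustrative sketch, not from the lectures, and sqlite is of course a single-node database, not a parallel one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (user TEXT, amount REAL)")  # schema
conn.execute("CREATE INDEX idx_user ON sales (user)")        # indexing
with conn:  # ACID transaction: commits atomically, rolls back on error
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("alice", 10.5), ("alice", 4.5), ("bob", 3.0)],
    )
# Declarative query: you state *what* you want; the engine plans *how*.
rows = conn.execute(
    "SELECT user, SUM(amount) FROM sales GROUP BY user ORDER BY user"
).fetchall()
print(rows)  # [('alice', 15.0), ('bob', 3.0)]
```

With raw MapReduce you would hand-write the grouping and aggregation logic yourself; Pig and Hive exist precisely to give Hadoop a comparable declarative layer.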
Features MapReduce has over a regular RDBMS (not necessarily a parallel RDBMS):

  • High scalability
  • Fault tolerance
  • One-person deployment

For starters, vanilla Apache Hadoop is 100% open source. But if you need commercial support and consulting, there are companies such as Cloudera, MapR, HortonWorks, etc.

Hadoop is backed by a growing community that fixes bugs and continually improves the project. Hadoop's HDFS storage model is based on Google's GFS, which has been proven to handle large volumes of data. In addition, Hadoop's MapReduce analysis model is based on Google's MapReduce model.

Hadoop is used by tech giants such as Facebook, Yahoo, Twitter, eBay, etc. to store and analyze their large volumes of data, both in real time and in batch.

For your question about ETL systems, read these slides.

OK, now: why Hadoop?

  • Open source
  • Proven storage and analysis model for large volumes of data
  • Minimal hardware requirements to set up and run

OK, now: why TD?

  • Commercial support
