Are web services suitable for ETL?

My company is considering using a web service as an ETL method. However, I do not think a web service is suitable for this purpose, for several reasons:

1. The web service can consume a lot of memory when building large XML documents.
2. XML is a bloated format.
3. There may be a timeout if the server needs a huge amount of time to generate the data.
4. File size limitations? (On Windows it's 2 GB, if my memory serves me.)

I'm not a web services expert, so I need your opinions. :)

Thanks.

6 answers

There are many technologies in the web services toolbox that get around all the problems you are describing. There is streaming XML shredding, there are XML compression formats for delivery, there are protocols that deal with fragmentation and reliability, and there are many storage systems that can hold terabytes of data.
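To make the streaming point concrete, here is a minimal Python sketch (my own illustration, not any particular web-service stack) that writes rows as gzip-compressed XML one element at a time and reads them back with `iterparse`, so neither side ever holds the whole document in memory. `fetch_rows` is a hypothetical stand-in for your data source.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator


def fetch_rows(n):
    """Hypothetical data source: yields one row at a time instead of a whole table."""
    for i in range(n):
        yield {"id": str(i), "name": f"customer-{i}"}


def write_rows_as_gzipped_xml(rows, path):
    # gzip.open in text mode compresses on the fly; XMLGenerator writes each
    # element as it is produced, so only the current row is held in memory.
    with gzip.open(path, "wt", encoding="utf-8") as out:
        xml = XMLGenerator(out, encoding="utf-8")
        xml.startDocument()
        xml.startElement("rows", {})
        for row in rows:
            xml.startElement("row", {})
            for field, value in row.items():
                xml.startElement(field, {})
                xml.characters(value)
                xml.endElement(field)
            xml.endElement("row")
        xml.endElement("rows")
        xml.endDocument()


def read_rows(path):
    # iterparse emits each <row> as soon as its end tag is parsed, shredding
    # the XML back into flat records; clear() frees the row's contents.
    with gzip.open(path, "rb") as src:
        for _event, elem in ET.iterparse(src, events=("end",)):
            if elem.tag == "row":
                yield {child.tag: child.text for child in elem}
                elem.clear()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "rows.xml.gz")
    write_rows_as_gzipped_xml(fetch_rows(100_000), path)
    print(sum(1 for _ in read_rows(path)), "rows read back")
```

The same idea applies on the server side of a real service: generate and send the payload incrementally rather than building one giant string first.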

If you picture a web service where your team rolls its own interface that takes a single blob argument containing a 2 GB serialized table, then all of your arguments are valid. But if you give your requirements to an experienced team that knows the concepts behind WS-ReliableMessaging and WS-Transaction, there is no reason not to build an ETL process around web services. Note that I am not advocating SOAP protocols, but I am advocating knowing and understanding the concepts involved.
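The difference between the "single 2 GB blob" design and a chunked one can be sketched roughly like this. This is not WS-ReliableMessaging itself, just a hypothetical Python illustration of two ideas behind it: numbered batches so duplicates and gaps can be detected, and a receiver that only commits once the whole job has arrived (the all-or-nothing concern WS-Transaction addresses). `Batch`, `split_into_batches`, and `Receiver` are made-up names.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    seq: int        # position of this batch within the job (0, 1, 2, ...)
    rows: list
    is_last: bool   # tells the receiver how many batches to expect


def split_into_batches(rows, batch_size):
    """Yield many small numbered messages instead of one huge serialized table."""
    batch, seq = [], 0
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield Batch(seq, batch, False)
            batch, seq = [], seq + 1
    yield Batch(seq, batch, True)  # final (possibly empty) batch marks the end


class Receiver:
    """Accepts batches in any order, ignores duplicate deliveries (so the sender
    can safely retry after a timeout), and commits only when nothing is missing."""

    def __init__(self):
        self.received = {}      # seq -> rows
        self.last_seq = None
        self.committed = None   # set once, when the whole job is present

    def accept(self, batch):
        if batch.seq in self.received:
            return "duplicate"
        self.received[batch.seq] = batch.rows
        if batch.is_last:
            self.last_seq = batch.seq
        if self.last_seq is not None and len(self.received) == self.last_seq + 1:
            self.committed = [row for s in range(self.last_seq + 1)
                              for row in self.received[s]]
            return "committed"
        return "ack"
```

Because each message is small, no single request comes close to a memory, timeout, or file-size limit, and a failed batch can simply be resent.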

That said, whether an ETL process built around web services makes sense for you depends on a number of other factors. However, your reasons for rejecting web service technology do not hold water.

I would not use a web service for the ETL task. There are specialized tools for this task (for example, Ab Initio, Informatica, etc.) that are better suited.

If you have a lot of data, I would say that the cost of the extra latency the network introduces would be prohibitive.
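A back-of-the-envelope way to check whether the network cost really is prohibitive for your volumes: total time is roughly (number of calls × round-trip time) + (bytes × 8 / bandwidth). The Python sketch below assumes that simple model (it ignores server processing time, TCP slow start, retries, and so on), and the figures in the usage example are hypothetical.

```python
def estimated_transfer_seconds(total_rows, rows_per_call, rtt_seconds,
                               total_bytes, bandwidth_bits_per_second):
    """Crude model: one round trip per call plus raw transmission time.
    Ignores server processing, TCP slow start, retries, etc."""
    calls = -(-total_rows // rows_per_call)          # ceiling division
    latency = calls * rtt_seconds                     # time spent waiting on round trips
    transmission = total_bytes * 8 / bandwidth_bits_per_second
    return latency + transmission


if __name__ == "__main__":
    # Hypothetical figures: 1,000,000 rows, 2 GB total, 50 ms RTT, 100 Mbit/s link.
    for rows_per_call in (1, 10_000):
        t = estimated_transfer_seconds(1_000_000, rows_per_call, 0.050, 2e9, 100e6)
        print(f"{rows_per_call:>6} rows/call: {t:,.0f} s")
```

With these made-up numbers, one call per row spends about 14 hours on round trips alone, while 10,000 rows per call brings the total to under 3 minutes, so the answer depends heavily on how chatty the interface is.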

It really depends on what you are doing and how you are trying to do it. In general, web services require more care and feeding than you would usually put into an ETL process, but they can be surprisingly effective at this task as well. I don't have enough details about your scenario to say whether this would work.

I have worked on web services that send and receive documents of more than 100 MB, some encoded as XML and some in other formats (on a closed local network). These services required a lot of setup and planning, but they worked well for our scenario, and they allowed a wide range of clients to connect and transfer varying amounts of data through a fairly standard interface. This was different from some of our other ETL tasks, where the work was specific to each client and had to be configured and maintained for each client.

It all depends on what you are doing and what your limitations are.

If you intend to go this route, sit down and draw the process from start to finish, including how you want clients to connect, how you will verify that the data has been received, and how you will make sure the task has completed. Consider the range of scenarios, clients, and types of data being transferred, and then work out what is required. Compare this with what is already available in other tools, and with how much time it would take you to build it yourself.
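For the "verify that the data has been received" step, one common approach is for the sender to publish a small manifest (byte count plus a SHA-256 digest) alongside the data, which the receiver recomputes. A minimal Python sketch, assuming the data arrives as a sequence of byte chunks; `make_manifest` and `verify` are illustrative names, not part of any web-service framework.

```python
import hashlib


def make_manifest(chunks):
    """Summarize a byte stream as (size, SHA-256) without holding it all in memory."""
    digest = hashlib.sha256()
    size = 0
    for chunk in chunks:
        digest.update(chunk)
        size += len(chunk)
    return {"bytes": size, "sha256": digest.hexdigest()}


def verify(manifest, received_chunks):
    # Chunk boundaries may differ between sender and receiver;
    # only the concatenated bytes matter.
    return make_manifest(received_chunks) == manifest


if __name__ == "__main__":
    sent = [b"id,name\n", b"1,alice\n", b"2,bob\n"]
    manifest = make_manifest(sent)                             # sent alongside the data
    print(verify(manifest, [b"id,name\n1,alice\n2,bob\n"]))   # same bytes, re-chunked: True
    print(verify(manifest, [b"id,name\n1,alice\n"]))           # truncated transfer: False
```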

I really wonder why your company is not considering a real ETL tool, such as the ones duffymo mentioned in their answer, or Talend or CloverETL if open source is an option.

I am not an expert on ETL products and I have not tested all of them, but I am sure that this is something that needs to be considered.

Take a look at MTOM, for starters, which lets you pass arbitrary non-XML data to a web service.
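The reason MTOM helps: without it, binary content has to be base64-encoded inline in the XML, which turns every 3 bytes into 4 characters (about 33% overhead), whereas MTOM/XOP sends the raw bytes as a separate MIME part. This small Python sketch only quantifies the inline base64 cost; it does not implement MTOM.

```python
import base64


def inline_base64_size(n_bytes):
    # base64 maps every 3 input bytes to 4 output characters, padding the last group
    return 4 * (-(-n_bytes // 3))


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096          # 1 MiB of sample binary data
    encoded = base64.b64encode(payload)          # what would sit inside the XML element
    print(len(payload), len(encoded), f"{len(encoded) / len(payload) - 1:.1%} larger")
```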

Web services are great for ETL tasks. Remember that each request will be processed in its own thread for free, and you are guaranteed proper cleanup between requests. Running web services inside something like Tomcat is not as difficult as you might think.
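As a rough analogy for the "each request gets its own thread" point, here is a Python sketch using the standard library's `ThreadingHTTPServer` in place of Tomcat (a deliberate swap for brevity; Tomcat reuses threads from a pool, while this server starts a new thread per request). Per-request state lives in local variables and is discarded when the handler returns. `EtlTaskHandler` and the one-row-per-line payload format are hypothetical.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EtlTaskHandler(BaseHTTPRequestHandler):
    """Hypothetical ETL endpoint: each POST is handled on its own thread."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)                 # request-local data only
        rows = payload.decode("utf-8").splitlines()       # one row per line (assumed format)
        body = f"processed {len(rows)} rows on {threading.current_thread().name}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def start_server():
    # Port 0 asks the OS for any free port; serve in a background thread.
    server = ThreadingHTTPServer(("127.0.0.1", 0), EtlTaskHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def submit(server, rows):
    host, port = server.server_address
    data = "\n".join(rows).encode("utf-8")
    req = urllib.request.Request(f"http://{host}:{port}/", data=data, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    server = start_server()
    print(submit(server, ["1,alice", "2,bob"]))
    server.shutdown()
    server.server_close()
```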

If you are concerned about XML bloat, consider the JSON format.
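A quick way to see how much of the size is markup: serialize the same rows both ways and compare. A Python sketch with made-up sample rows; compression usually narrows the gap considerably, so measure with your own data before deciding.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(rows):
    root = ET.Element("rows")
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for key, value in row.items():
            ET.SubElement(row_el, key).text = str(value)  # every field repeats its tag name twice
    return ET.tostring(root, encoding="unicode")


def to_json(rows):
    # compact separators: no spaces after ',' and ':'
    return json.dumps(rows, separators=(",", ":"))


if __name__ == "__main__":
    rows = [{"id": i, "name": f"customer-{i}"} for i in range(1000)]  # made-up sample data
    print("XML: ", len(to_xml(rows)), "chars")
    print("JSON:", len(to_json(rows)), "chars")
```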
