Apache NiFi setup issues

Question

I developed a prototype NiFi flow to ingest data into HDFS. Now I would like to improve its overall performance, but I cannot seem to make any progress.

The flow takes CSV files as input (each row has 80 fields), splits them at row level, applies some transformations to the fields (using 4 custom processors executed sequentially), buffers the new rows into CSV files, and outputs them to HDFS. I designed the processors so that the flow file content is accessed only once, when each individual record is read and its fields are moved to flow file attributes. Tests were run on an Amazon EC2 m4.4xlarge instance (16 CPU cores, 64 GB of RAM).

This is what I have tried so far:

Moved the flow file repository and the content repository to different SSDs
Moved the provenance repository to memory (NiFi could not keep up with the event rate)
Configured the system according to the recommended configuration settings
Assigned multiple threads to each processor, trying different total thread counts
Increased nifi.queue.swap.threshold and set back pressure so that the swap limit is never reached
Tried various JVM memory settings from 8 to 32 GB (in combination with G1GC)
Tried increasing the instance specs; nothing changed
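
For reference, the repository and queue changes in the list above correspond to entries in nifi.properties. The fragment below is only a sketch: the property names are the standard NiFi ones, but the paths and the threshold value are hypothetical placeholders, not the settings used in these tests.

```properties
# Hypothetical mount points: flow file and content repositories on separate SSDs
nifi.flowfile.repository.directory=/mnt/ssd1/flowfile_repository
nifi.content.repository.directory.default=/mnt/ssd2/content_repository

# Keep provenance events in memory instead of on disk
nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository

# Illustrative value: flow files allowed in a queue before NiFi starts swapping them to disk
nifi.queue.swap.threshold=50000
```

JVM heap size and the garbage collector are set separately, in bootstrap.conf.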

From the monitoring I performed, the disks do not appear to be the bottleneck (they are idle most of the time, which suggests the computation is actually performed in memory), and the average CPU load is below 60%.

The most I can get is 215 thousand rows per minute, which is about 3.5 thousand rows per second. In terms of volume, that is only 4.7 MB/s. I am aiming for something definitely higher than that. As a comparison, I created a flow that reads a file, splits it into rows, merges them into blocks, and writes them to disk. There I get 12 thousand rows per second, or 17 MB/s. That is not surprisingly fast either, which makes me think I am probably doing something wrong. Does anyone have suggestions for improving performance? How much would I benefit from running NiFi on a cluster instead of scaling up the instance specs? Thanks to everyone.
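
As a sanity check on the figures above, the short Python sketch below redoes the arithmetic. It assumes 1 MB = 10^6 bytes; the function and variable names are only illustrative, while the numbers are taken from the question.

```python
# Figures reported in the question
ROWS_PER_MINUTE = 215_000        # full flow with the custom processors
FLOW_MB_PER_S = 4.7
BASELINE_ROWS_PER_S = 12_000     # read / split / merge / write-to-disk flow
BASELINE_MB_PER_S = 17.0


def rows_per_second(rows_per_minute: float) -> float:
    """Convert a per-minute row rate to a per-second rate."""
    return rows_per_minute / 60


def bytes_per_row(mb_per_s: float, rows_per_s: float) -> float:
    """Average row size implied by a volume rate and a row rate (1 MB = 10^6 bytes)."""
    return mb_per_s * 1_000_000 / rows_per_s


flow_rows_per_s = rows_per_second(ROWS_PER_MINUTE)
flow_row_size = bytes_per_row(FLOW_MB_PER_S, flow_rows_per_s)
baseline_row_size = bytes_per_row(BASELINE_MB_PER_S, BASELINE_ROWS_PER_S)
baseline_speedup = BASELINE_ROWS_PER_S / flow_rows_per_s

print(f"full flow: {flow_rows_per_s:.0f} rows/s, ~{flow_row_size:.0f} bytes/row")
print(f"baseline:  {BASELINE_ROWS_PER_S} rows/s, ~{baseline_row_size:.0f} bytes/row")
print(f"baseline is about {baseline_speedup:.1f}x faster in rows per second")
```

Both flows imply a row size of roughly 1.3 to 1.4 KB, so the two measurements are consistent with each other, and the comparison flow handles a little over three times as many rows per second.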

+5

performance apache-nifi

riccamini Sep 27 '16 at 13:41

1 answer

riccamini · Accepted Answer · 2016-10-03T10:56:50+0000

It turned out that the poor performance was caused by a combination of the custom processors I developed and the built-in merge processor. The same question, cross-posted on the Hortonworks community forum, received interesting feedback.

Regarding the first issue, the suggestion was to add the @SupportsBatching annotation to the custom processors. This allows the framework to batch multiple commits together, and lets the NiFi user favor either lower latency or higher throughput for the processor from its configuration menu. Further information can be found in the NiFi documentation.
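
To see why combining commits helps, here is a toy cost model in Python. It is not NiFi code, and the per-file and per-commit costs are made-up numbers chosen only to illustrate the effect: when every flow file pays a fixed commit overhead, grouping commits amortizes that overhead across the batch.

```python
import math


def commits_needed(flow_files: int, batch_size: int) -> int:
    """Number of commits when up to batch_size flow files share one commit."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(flow_files / batch_size)


def total_time_ms(flow_files: int, per_file_ms: float,
                  per_commit_ms: float, batch_size: int) -> float:
    """Processing time for all files plus the overhead of every commit."""
    processing = flow_files * per_file_ms
    committing = commits_needed(flow_files, batch_size) * per_commit_ms
    return processing + committing


# Hypothetical costs: 0.05 ms of work per flow file, 1 ms per commit
unbatched = total_time_ms(10_000, per_file_ms=0.05, per_commit_ms=1.0, batch_size=1)
batched = total_time_ms(10_000, per_file_ms=0.05, per_commit_ms=1.0, batch_size=100)

print(f"one commit per flow file: {unbatched:.0f} ms")
print(f"one commit per 100 files: {batched:.0f} ms")
```

In this model the commit overhead dominates when each flow file is committed on its own, which is the situation a flow that splits CSV files into individual rows is likely to be in. The price of batching is latency: a flow file is not handed to the next processor until its batch is committed.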

The other finding was that the built-in MergeContent processor does not seem to perform optimally, so if possible, consider changing the flow to avoid the merge phase.
