Multithreaded file reading is slower than a single thread

For a school assignment, I was asked to write a simple program that creates 1000 text files, each with a random number of lines, counts the total number of lines either in a single thread or with multiple threads, and then deletes the files.

Now a strange thing happens during testing: counting all the files sequentially is consistently a little faster than counting them with multiple threads, which sparked a rather academic theorizing session in my class.

When using Scanner to read the files, everything works as expected: the 1000 files are read in about 500 ms single-threaded and about 400 ms multithreaded.

But when I use BufferedReader, the times drop to about 110 ms single-threaded and 130 ms multithreaded.

What part of the code causes this bottleneck, and why?

EDIT: just for clarification, I am not asking why Scanner is slower than BufferedReader.

Full compilable code (though you will need to change the file-creation output path):

import java.io.*;
import java.util.Random;
import java.util.Scanner;

/**
 * Builds text files with a random number of lines and counts them with
 * one thread or multiple threads.
 * @author Hazir
 */
// CLASS MATALA_4A START:
public class Matala_4A {

    /* Finals: */
    private static final String MSG = "Hello World";
    /* Privates: */
    private static int count;
    private static Random rand;

    /* Private Methods: */
    /**
     * Advances the random generator.
     * @return The new random value.
     */
    private static synchronized int getRand() {
        return rand.nextInt(1000);
    }

    /**
     * Increments the lines-read counter by a value.
     * @param val The amount to increment by.
     */
    private static synchronized void incrementCount(int val) {
        count += val;
    }

    /**
     * Sets the lines-read counter to 0 and initializes the random
     * generator with the seed 123.
     */
    private static void Initialize() {
        count = 0;
        rand = new Random(123);
    }

    /* Public Methods: */
    /**
     * Creates n files with a random number of lines.
     * @param n The number of files to create.
     * @return String array with all the file paths.
     */
    public static String[] createFiles(int n) {
        String[] array = new String[n];
        for (int i = 0; i < n; i++) {
            array[i] = String.format("C:\\Files\\File_%d.txt", i + 1);
            try ( // Try with resources:
                FileWriter fw = new FileWriter(array[i]);
                PrintWriter pw = new PrintWriter(fw);
            ) {
                int numLines = getRand();
                for (int j = 0; j < numLines; j++) pw.println(MSG);
            } catch (IOException ex) {
                System.err.println(String.format("Failed writing to file: %s", array[i]));
            }
        }
        return array;
    }

    /**
     * Deletes all the files whose paths are specified
     * in the fileNames array.
     * @param fileNames The files to be deleted.
     */
    public static void deleteFiles(String[] fileNames) {
        for (String fileName : fileNames) {
            File file = new File(fileName);
            if (file.exists()) {
                file.delete();
            }
        }
    }

    /**
     * Creates numFiles files.<br>
     * Counts how many lines are in all the files using multiple threads.<br>
     * Deletes all the files when finished.
     * @param numFiles The number of files to create.
     */
    public static void countLinesThread(int numFiles) {
        Initialize();
        /* Create files */
        String[] fileNames = createFiles(numFiles);
        Thread[] running = new Thread[numFiles];
        int k = 0;
        long start = System.currentTimeMillis();
        /* Start all threads */
        for (String fileName : fileNames) {
            LineCounter thread = new LineCounter(fileName);
            running[k++] = thread;
            thread.start();
        }
        /* Join all threads */
        for (Thread thread : running) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                // Shouldn't happen.
            }
        }
        long end = System.currentTimeMillis();
        System.out.println(String.format("threads time = %d ms, lines = %d",
                end - start, count));
        /* Delete all files */
        deleteFiles(fileNames);
    }

    /**
     * Creates numFiles files.<br>
     * Counts how many lines are in all the files in one thread.<br>
     * Deletes all the files when finished.
     * @param numFiles The number of files to create.
     */
    @SuppressWarnings("CallToThreadRun")
    public static void countLinesOneProcess(int numFiles) {
        Initialize();
        /* Create files */
        String[] fileNames = createFiles(numFiles);
        /* Iterate files */
        long start = System.currentTimeMillis();
        LineCounter thread;
        for (String fileName : fileNames) {
            thread = new LineCounter(fileName);
            thread.run(); // runs in the calling thread
        }
        long end = System.currentTimeMillis();
        System.out.println(String.format("linear time = %d ms, lines = %d",
                end - start, count));
        /* Delete all files */
        deleteFiles(fileNames);
    }

    public static void main(String[] args) {
        int num = 1000;
        countLinesThread(num);
        countLinesOneProcess(num);
    }

    /**
     * Auxiliary class designed to count the number of lines in a text file.
     */
    // NESTED CLASS LINECOUNTER START:
    private static class LineCounter extends Thread {
        /* Privates: */
        private String fileName;

        /* Constructor: */
        private LineCounter(String fileName) {
            this.fileName = fileName;
        }

        /* Methods: */
        /**
         * Reads a file and counts the number of lines it has.
         */
        @Override
        public void run() {
            int count = 0;
            try ( // Try with resources:
                FileReader fr = new FileReader(fileName);
                //Scanner sc = new Scanner(fr);
                BufferedReader br = new BufferedReader(fr);
            ) {
                String str;
                for (str = br.readLine(); str != null; str = br.readLine()) count++;
                //for (; sc.hasNext(); sc.nextLine()) count++;
                incrementCount(count);
            } catch (IOException e) {
                System.err.println(String.format("Failed reading from file: %s", fileName));
            }
        }
    } // NESTED CLASS LINECOUNTER END;
} // CLASS MATALA_4A END;
4 answers

Several factors may be at play:

  • The most important factor is to avoid accessing the drive from multiple threads simultaneously (though an SSD can cope with this). On a regular hard drive, switching from one file to another may cost you around 10 ms of seek time (depending on how the data is cached).

  • 1000 threads is far too many; try using roughly (number of cores × 2). Too much time is lost just switching contexts.

  • Try using a thread pool. Your total times are only 110 to 130 ms, and some of that will be spent just creating the 1000 threads.

  • Do more work overall in the test. A timing of 110 ms is not very reliable; it also depends on what other processes or threads are running at the time.

  • Try switching the order of your tests to see if it matters (caching can be an important factor):

     countLinesThread(num);
     countLinesOneProcess(num);
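A sketch of what the pool-based approach from the bullets above could look like (illustrative only: the `PooledLineCount` class, the temp-file demo, and the cores-times-two pool size are assumptions, not the asker's code):

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.LongAdder;

public class PooledLineCount {
    /** Counts lines across all files using a fixed-size thread pool. */
    public static long countLines(List<Path> files, int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        LongAdder total = new LongAdder(); // low-contention shared counter
        for (Path p : files) {
            pool.submit(() -> {
                try (BufferedReader br = Files.newBufferedReader(p)) {
                    long n = 0;
                    while (br.readLine() != null) n++;
                    total.add(n);
                } catch (IOException e) {
                    System.err.println("Failed reading from file: " + p);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return total.sum();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical demo: three small temp files with known line counts.
        List<Path> files = new ArrayList<>();
        int[] lineCounts = {3, 5, 7};
        for (int n : lineCounts) {
            Path p = Files.createTempFile("demo", ".txt");
            try (PrintWriter pw = new PrintWriter(Files.newBufferedWriter(p))) {
                for (int j = 0; j < n; j++) pw.println("Hello World");
            }
            files.add(p);
        }
        int poolSize = Runtime.getRuntime().availableProcessors() * 2;
        System.out.println("total lines = " + countLines(files, poolSize));
        for (Path p : files) Files.deleteIfExists(p);
    }
}
```

The pool reuses a small number of worker threads across all 1000 files, so thread-creation cost is paid only `poolSize` times instead of once per file.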

In addition, depending on the system, currentTimeMillis() may have a resolution of 10 to 15 ms, so it is not a very accurate timer for short runs.

 long start = System.currentTimeMillis();
 long end = System.currentTimeMillis();
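When finer resolution is needed, System.nanoTime() is the usual alternative for measuring elapsed intervals (a small sketch; the summation loop is just a stand-in workload):

```java
public class NanoTiming {
    public static void main(String[] args) {
        long start = System.nanoTime(); // monotonic, nanosecond-resolution timer
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i; // stand-in workload
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println("sum = " + sum + ", elapsed ~ " + elapsedMicros + " us");
    }
}
```

Note that nanoTime() is only meaningful for differences between two calls; unlike currentTimeMillis(), it has no relation to wall-clock time.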

The bottleneck is the disk.

The drive can effectively be accessed by only one thread at a time, so using multiple threads does not help; instead, the extra time spent switching between threads slows down the overall operation.

Multithreading is only worthwhile if you need to split up work that waits on long I/O operations from different sources (for example, the network and a disk, two different disks, or several network connections), or if you have a CPU-intensive computation that can be spread across several cores.

Remember that for a good multithreaded program you should always consider:

  • the cost of switching between threads
  • whether long I/O operations can actually run in parallel or not
  • whether there is CPU-intensive computation to perform
  • whether the CPU work can be divided into subtasks or not
  • the complexity of sharing data between threads (semaphores or synchronization)
  • the extra difficulty of reading, writing, and maintaining multithreaded code compared to a single-threaded application

The number of threads you use matters a great deal. A single process trying to switch between 1000 threads (you created a new thread per file) is probably the main reason for the slowdown.

Try using, say, 10 threads to read the 1000 files, and you should see a noticeable speedup.


If the time required for the computation is insignificant compared to the time required for the I/O, the potential benefit of parallelism is also insignificant: a single thread can already saturate the I/O and then perform the very fast computation; more threads cannot speed things up much. Instead, the usual threading overheads apply, plus possibly a locking penalty in the I/O implementation, which actually reduces throughput.

I think the potential benefit is greatest when the CPU time required to process a block of data is long compared to the time needed to fetch it from disk. In that case, every thread except the one currently reading (if any) can be computing, and throughput should scale well with the number of cores. Try primality-testing a large set of candidate numbers from a file, or brute-forcing encrypted strings (which is admittedly just as contrived).
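The CPU-bound case described above can be sketched with a parallel stream over a primality test (an illustrative example, not from the answer: the 100,000 limit and the trial-division isPrime are arbitrary choices):

```java
import java.util.stream.LongStream;

public class CpuBoundDemo {
    // Naive trial division: deliberately CPU-heavy per candidate.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    public static void main(String[] args) {
        long limit = 100_000;
        long t0 = System.nanoTime();
        long sequential = LongStream.rangeClosed(2, limit)
                                    .filter(CpuBoundDemo::isPrime).count();
        long t1 = System.nanoTime();
        long parallel = LongStream.rangeClosed(2, limit).parallel()
                                  .filter(CpuBoundDemo::isPrime).count();
        long t2 = System.nanoTime();
        System.out.println("primes (sequential) = " + sequential
                + ", primes (parallel) = " + parallel);
        System.out.println("sequential " + (t1 - t0) / 1_000_000
                + " ms, parallel " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Unlike the file-reading benchmark, here each element costs real CPU time, so on a multi-core machine the parallel version typically comes out ahead.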
