Removing duplicate lines from text using Java

I was wondering if anyone has any logic in Java that removes duplicate lines while preserving their order.

I would prefer not to use regex.

6 answers
    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.util.HashSet;
    import java.util.Set;

    public class UniqueLineReader extends BufferedReader {
        Set<String> lines = new HashSet<String>();

        public UniqueLineReader(Reader arg0) {
            super(arg0);
        }

        @Override
        public String readLine() throws IOException {
            String uniqueLine;
            if (lines.add(uniqueLine = super.readLine()))
                return uniqueLine;
            return ""; // duplicates come back as empty strings (see the changed version below)
        }

        // for testing..
        public static void main(String args[]) {
            try {
                // Open the input file
                FileInputStream fstream = new FileInputStream("test.txt");
                UniqueLineReader br = new UniqueLineReader(new InputStreamReader(fstream));
                String strLine;
                // Read the file line by line
                while ((strLine = br.readLine()) != null) {
                    // Print the content on the console, skipping the empty
                    // strings that stand in for duplicate lines
                    if (!strLine.isEmpty())
                        System.out.println(strLine);
                }
                // Close the input stream
                br.close();
            } catch (Exception e) { // Catch exception if any
                System.err.println("Error: " + e.getMessage());
            }
        }
    }

Changed Version:

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.util.HashSet;
    import java.util.Set;

    public class UniqueLineReader extends BufferedReader {
        Set<String> lines = new HashSet<String>();

        public UniqueLineReader(Reader arg0) {
            super(arg0);
        }

        @Override
        public String readLine() throws IOException {
            String uniqueLine;
            // Keep reading until we hit a line we haven't seen yet
            // (super.readLine() returns null at end of stream, which exits the loop)
            while (!lines.add(uniqueLine = super.readLine()))
                ; // skip duplicates
            return uniqueLine;
        }

        public static void main(String args[]) {
            try {
                // Open the input file
                FileInputStream fstream = new FileInputStream("/home/emil/Desktop/ff.txt");
                UniqueLineReader br = new UniqueLineReader(new InputStreamReader(fstream));
                String strLine;
                // Read the file line by line
                while ((strLine = br.readLine()) != null) {
                    // Print the content on the console
                    System.out.println(strLine);
                }
                // Close the input stream
                br.close();
            } catch (Exception e) { // Catch exception if any
                System.err.println("Error: " + e.getMessage());
            }
        }
    }

If you feed the strings into a LinkedHashSet, it ignores duplicates (it is a set) but preserves insertion order (it is linked). If you just want to know whether you have seen a given string before, feed the strings into a plain Set as you go and skip the ones the Set already contains.
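
For example, a minimal sketch of that idea (the class and variable names here are mine):

    import java.util.Arrays;
    import java.util.LinkedHashSet;
    import java.util.Set;

    public class LinkedHashSetDemo {
        public static void main(String[] args) {
            String[] input = { "b", "a", "b", "c", "a" };
            // LinkedHashSet drops duplicates but keeps first-seen order
            Set<String> unique = new LinkedHashSet<String>(Arrays.asList(input));
            System.out.println(unique); // prints [b, a, c]
        }
    }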


Read the text file using a BufferedReader and store the lines in a LinkedHashSet. Then print them out.

Here is an example:

    import java.util.Arrays;
    import java.util.LinkedHashSet;
    import java.util.Set;

    public class DuplicateRemover {
        public String stripDuplicates(String aHunk) {
            StringBuilder result = new StringBuilder();
            Set<String> uniqueLines = new LinkedHashSet<String>();
            String[] chunks = aHunk.split("\n");
            uniqueLines.addAll(Arrays.asList(chunks));
            for (String chunk : uniqueLines) {
                result.append(chunk).append("\n");
            }
            return result.toString();
        }
    }

Here are some unit tests to check it (ignore my evil copy-and-paste ;)):

    import org.junit.Test;
    import static org.junit.Assert.*;

    public class DuplicateRemoverTest {

        @Test
        public void removesDuplicateLines() {
            String input = "a\nb\nc\nb\nd\n";
            String expected = "a\nb\nc\nd\n";
            DuplicateRemover remover = new DuplicateRemover();
            String actual = remover.stripDuplicates(input);
            assertEquals(expected, actual);
        }

        @Test
        public void removesDuplicateLinesUnalphabetized() {
            String input = "z\nb\nc\nb\nz\n";
            String expected = "z\nb\nc\n";
            DuplicateRemover remover = new DuplicateRemover();
            String actual = remover.stripDuplicates(input);
            assertEquals(expected, actual);
        }
    }

Here is another solution. Just use UNIX!

    uniq MyFile.java > MyFile.uniq && mv MyFile.uniq MyFile.java

Edit: Oh wait, I re-read the question. Note that uniq only removes adjacent duplicate lines, so the input would have to be sorted first, which would not preserve the order. (Also, redirecting output back into the file you are reading from would truncate it, hence the temporary file above.) Is this still a legitimate answer, given that I managed to be language-agnostic?


You can easily remove duplicate lines from text or a file using the Java 8 Stream API. Streams support aggregate operations such as distinct() and sorted(), and they work with the existing Java data structures and their methods. The following example removes duplicates from, and can optionally sort, the content of a file using the Stream API:

    package removeword;

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    import static java.nio.file.StandardOpenOption.*;
    import static java.util.stream.Collectors.joining;

    public class Java8UniqueWords {

        public static void main(String[] args) throws IOException {
            Path sourcePath = Paths.get("C:/Users/source.txt");
            Path changedPath = Paths.get("C:/Users/removedDouplicate_file.txt");
            try (final Stream<String> lines = Files.lines(sourcePath)
                    // .map(line -> line.toLowerCase()) // optional: use existing String methods
                    .distinct()
                    // .sorted() // optional: sort the distinct lines
            ) {
                final String uniqueWords = lines.collect(joining("\n"));
                System.out.println("Final Output: " + uniqueWords);
                // CREATE added so the target file is created if it does not exist yet
                Files.write(changedPath, uniqueWords.getBytes(), WRITE, CREATE, TRUNCATE_EXISTING);
            }
        }
    }
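
A slightly more compact variant of the same idea (a sketch, assuming the same paths as above) collects the distinct lines into a List and passes it to Files.write, which then handles the line separators and the character encoding:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class Java8UniqueLines {
        public static void main(String[] args) throws IOException {
            Path sourcePath = Paths.get("C:/Users/source.txt");
            Path changedPath = Paths.get("C:/Users/removedDouplicate_file.txt");
            List<String> unique;
            try (Stream<String> lines = Files.lines(sourcePath)) {
                // distinct() keeps the first occurrence of each line, preserving order
                unique = lines.distinct().collect(Collectors.toList());
            }
            // Files.write appends a line separator after each element (UTF-8 by default)
            Files.write(changedPath, unique);
        }
    }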

Here I use a HashSet to store the lines already seen (the StringBuilder keeps them in their original order):

    Scanner scan = new Scanner(System.in); // input; the original left this uninitialized
    Set<String> lines = new HashSet<String>();
    StringBuilder strb = new StringBuilder();
    while (scan.hasNextLine()) {
        String line = scan.nextLine();
        // add() returns false for a line that is already in the set;
        // appending only on first sight preserves the original order
        if (lines.add(line))
            strb.append(line).append("\n");
    }
