This should do it (and will cope with large files).
Note that it only removes consecutive duplicate lines, i.e.
a b b c b d
will end up as
a b c b d
If you don't want duplicates anywhere, you'll need to keep a set of the lines you've already seen.
    using System;
    using System.IO;

    class DeDuper
    {
        static void Main(string[] args)
        {
            if (args.Length != 2)
            {
                Console.WriteLine("Usage: DeDuper <input file> <output file>");
                return;
            }
            using (TextReader reader = File.OpenText(args[0]))
            using (TextWriter writer = File.CreateText(args[1]))
            {
                string currentLine;
                string lastLine = null;
                while ((currentLine = reader.ReadLine()) != null)
                {
                    // Only write a line if it differs from the one just before it.
                    if (currentLine != lastLine)
                    {
                        writer.WriteLine(currentLine);
                        lastLine = currentLine;
                    }
                }
            }
        }
    }
Note that this assumes Encoding.UTF8 and that you want to use files. It is easy to generalize as a method:
    static void CopyLinesRemovingConsecutiveDupes(TextReader reader, TextWriter writer)
    {
        string currentLine;
        string lastLine = null;
        while ((currentLine = reader.ReadLine()) != null)
        {
            if (currentLine != lastLine)
            {
                writer.WriteLine(currentLine);
                lastLine = currentLine;
            }
        }
    }
(Note that this does not close anything - the caller must do this.)
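As a rough usage sketch: because the caller opens and disposes both ends, the caller is also the place to pick an encoding other than UTF-8. The `Demo` class, the temp-file names and the ISO-8859-1 choice below are assumptions made purely for illustration, and the method is repeated so the snippet compiles on its own.

```csharp
using System;
using System.IO;
using System.Text;

public static class Demo
{
    // Same method as above, repeated so this sketch compiles on its own.
    public static void CopyLinesRemovingConsecutiveDupes(TextReader reader, TextWriter writer)
    {
        string currentLine;
        string lastLine = null;
        while ((currentLine = reader.ReadLine()) != null)
        {
            if (currentLine != lastLine)
            {
                writer.WriteLine(currentLine);
                lastLine = currentLine;
            }
        }
    }

    public static void Main()
    {
        // Hypothetical file names and encoding; substitute your own.
        string inputPath = Path.Combine(Path.GetTempPath(), "dedupe-input.txt");
        string outputPath = Path.Combine(Path.GetTempPath(), "dedupe-output.txt");
        Encoding encoding = Encoding.GetEncoding("iso-8859-1");

        File.WriteAllLines(inputPath, new[] { "a", "b", "b", "c", "b", "d" }, encoding);

        // The caller opens and disposes both ends; the method only copies lines.
        using (TextReader reader = new StreamReader(inputPath, encoding))
        using (TextWriter writer = new StreamWriter(outputPath, false, encoding))
        {
            CopyLinesRemovingConsecutiveDupes(reader, writer);
        }

        // Prints: a b c b d
        Console.WriteLine(string.Join(" ", File.ReadAllLines(outputPath, encoding)));
    }
}
```

Since the method only sees a `TextReader` and a `TextWriter`, the same call works with a `StringReader`/`StringWriter` pair, which is handy for tests.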
Here is the version that will remove all duplicates, not just sequential ones:
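    // Requires: using System.Collections.Generic;
    static void CopyLinesRemovingAllDupes(TextReader reader, TextWriter writer)
    {
        string currentLine;
        HashSet<string> previousLines = new HashSet<string>();
        while ((currentLine = reader.ReadLine()) != null)
        {
            // Add returns true if the line was actually added,
            // false if it was already in the set.
            if (previousLines.Add(currentLine))
            {
                writer.WriteLine(currentLine);
            }
        }
    }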
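To compare the two behaviours side by side, here is a small self-contained sketch that runs both on the sample input from earlier, using in-memory strings instead of files. The `DedupeComparison` class and its string-based helper names are my own for illustration; the all-duplicates loop relies on `HashSet<string>.Add` returning `false` for an item that is already present.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DedupeComparison
{
    // Drops a line only when it equals the line immediately before it.
    public static string RemoveConsecutiveDupes(string text)
    {
        var reader = new StringReader(text);
        var writer = new StringWriter();
        string currentLine;
        string lastLine = null;
        while ((currentLine = reader.ReadLine()) != null)
        {
            if (currentLine != lastLine)
            {
                writer.WriteLine(currentLine);
                lastLine = currentLine;
            }
        }
        return writer.ToString();
    }

    // Drops a line when it has appeared anywhere earlier in the input.
    public static string RemoveAllDupes(string text)
    {
        var reader = new StringReader(text);
        var writer = new StringWriter();
        var previousLines = new HashSet<string>();
        string currentLine;
        while ((currentLine = reader.ReadLine()) != null)
        {
            // HashSet<T>.Add returns false if the item was already present.
            if (previousLines.Add(currentLine))
            {
                writer.WriteLine(currentLine);
            }
        }
        return writer.ToString();
    }

    public static void Main()
    {
        string input = "a\nb\nb\nc\nb\nd";
        string nl = Environment.NewLine;
        Console.WriteLine(RemoveConsecutiveDupes(input).Replace(nl, " ").Trim()); // a b c b d
        Console.WriteLine(RemoveAllDupes(input).Replace(nl, " ").Trim());         // a b c d
    }
}
```

The trade-off is memory: the set holds every distinct line seen so far, whereas the consecutive-only version only ever remembers one line.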
Jon Skeet