Efficient File I/O From C#

This article describes and benchmarks different ways to do file I/O from C#.  All of the code referenced in this article is available for download and is free to use.

There are many different ways to do file I/O in C#.  The .NET framework provides the following classes:

  • File – Provides static methods for creating, opening, copying, deleting, moving, reading, and writing files.
  • BinaryReader – Reads primitive data types as binary values using a specific encoding.
  • BinaryWriter – Writes primitive types in binary to a stream and supports writing strings using a specific encoding.
  • StreamReader – Implements a TextReader that reads characters from a byte stream using a particular encoding.
  • StreamWriter – Implements a TextWriter for writing characters to a stream using a particular encoding.
  • FileStream – Exposes a Stream around a file, supporting both synchronous and asynchronous read and write operations.

The Windows operating system also provides at least two functions to read and write files:

  • ReadFile  – Reads data from a file.  This function supports both synchronous and asynchronous operations.
  • WriteFile – Writes data to a file.  This function supports both synchronous and asynchronous operations.

One small problem with using the Windows ReadFile and WriteFile functions is that their interfaces require pointers.  In C#, pointers can only be used in an unsafe context, such as a method or class declared unsafe.  Since some organizations are leery of pointers and unsafe code, all of this code has been put into its own class, so the calling methods do not have to be declared unsafe.  This class is named:

WinFileIO – provides the capability to use the Windows ReadFile and WriteFile I/O functions.  Based upon extensive testing, these functions are the most efficient way to perform file I/O from C# and C++.

How The Test Program Works

The user interface provides the following buttons:

  • Run All – runs both the read and write tests.  Equivalent to pressing the Read File and Write File buttons.
  • Read File – tests the read file methods for each class listed above.  Each test consists of reading in 3 text files with sizes of roughly < 1 MB, 10 MB, and 50 MB.
  • Write File – tests the write file methods for each class listed above.  Each test consists of writing out 3 text files with sizes of roughly < 1 MB, 10 MB, and 50 MB.
  • WinFileIO Unit Tests – tests each public interface I/O method in the class to show that it works correctly.

Testing Methodology:

Benchmarking is done the same way in every test.  For each test function, the current time is recorded just before the I/O function begins and again right after the test ends.  The measured time includes opening and closing the file, except for the last set of read tests, which measure only the time it takes to read the files.
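
The timing approach described above can be sketched with a small helper.  This is a minimal sketch, not the article's test code: it swaps in System.Diagnostics.Stopwatch, which uses a high-resolution performance counter when one is available, instead of reading the clock time before and after each test.  The class and method names (BenchmarkSketch, Time, BestOf) are hypothetical.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper; not part of the article's download.
public static class BenchmarkSketch
{
    // Times a single run of an I/O action. Whatever the action does
    // (including opening and closing the file) is part of the measured time.
    public static TimeSpan Time(Action ioTest)
    {
        Stopwatch sw = Stopwatch.StartNew();   // high-resolution timer when available
        ioTest();
        sw.Stop();
        return sw.Elapsed;
    }

    // Runs the action several times and keeps the fastest run. This damps noise from
    // other OS activity, but every run after the first may be served from the OS file cache.
    public static TimeSpan BestOf(int runs, Action ioTest)
    {
        TimeSpan best = TimeSpan.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            TimeSpan t = Time(ioTest);
            if (t < best)
            {
                best = t;
            }
        }
        return best;
    }
}
```

Usage might look like `TimeSpan t = BenchmarkSketch.BestOf(5, () => File.ReadAllBytes(path));`, where path is the test file being measured.  Keeping the fastest of several runs is a design choice of this sketch, not something the article's tests are stated to do.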

Each test consists of reading in or writing out 3 files:

  • < 1 MB – approximately 0.66 megabytes.
  • 10 MB – approximately 10 megabytes.
  • 50 MB – approximately 50 megabytes.

The test machine consists of the following parts:

  • CPU – Intel Core i7-950.
  • Memory – 6 GB.
  • OS – Windows 7 Pro, installed on a solid state drive.
  • Hard drive – Western Digital 1 TB SATA III (6 Gb/s), 7200 RPM, 64 MB cache.  This is where the data files reside.
  • IDE – Visual Studio 2008 Standard Edition with .NET Framework 3.5 SP1.

Read Methods Tested:

File class methods:

  • ReadAllLines – reads all lines of the file into a string array.  See TestReadAllLines.
  • ReadAllText – reads the entire contents of the file into a string.  See TestReadAllText.
  • ReadAllBytes – reads the entire contents of the file into a byte array. See TestReadAllBytes.
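
The three File read methods can be compared side by side in a few lines.  This is an illustrative sketch; FileClassReadSketch and Describe are hypothetical names.  The comments point out the extra decoding and line-splitting work the text methods do, which is likely part of why ReadAllBytes is faster in the results below.

```csharp
using System.IO;

public static class FileClassReadSketch
{
    // Reads the same file with the three File methods and reports the sizes.
    // Each call opens, reads, and closes the file in a single step.
    public static string Describe(string path)
    {
        string[] lines = File.ReadAllLines(path);  // decodes the text and splits it into lines
        string text = File.ReadAllText(path);      // decodes the whole file into one string
        byte[] bytes = File.ReadAllBytes(path);    // raw bytes: no decoding or line splitting

        return string.Format("{0} lines, {1} chars, {2} bytes", lines.Length, text.Length, bytes.Length);
    }
}
```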

BinaryReader methods:

  • Read – reads the entire contents of the file into a character array.  See TestBinaryReader1.

StreamReader methods:

  • Read(1) – reads the entire contents of the file into a character array using the single-argument constructor.  See TestStreamReader1.
  • Read(2) – reads the entire contents of the file into a character array.  Uses a constructor to specify a sector-aligned buffer size.  See TestStreamReader2.
  • ReadBlock – reads the entire contents of the file into a character array.  Uses a constructor to specify a sector-aligned buffer size.  See TestStreamReader3.
  • ReadToEnd – reads the entire contents of the file into a string.  Uses a constructor to specify a sector-aligned buffer size.  See TestStreamReader4.
  • Read(3) – a loop is used to read the entire contents of the file into a character array.  Uses a constructor to specify a sector-aligned buffer size.  The loop is terminated when the Peek function indicates there is no more data.  See TestStreamReader5.
  • ReadLine – reads the entire contents of the file line by line into strings.  Uses a constructor to specify a sector-aligned buffer size.  See TestStreamReader6.
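
The looping StreamReader approach (Read(3) above) can be sketched as follows, assuming UTF-8 or ASCII text.  StreamReaderSketch and ReadAllChars are hypothetical names.  The article does not state its sector-aligned buffer size, so the caller passes one in (for example 65,536, which is a multiple of both 512- and 4,096-byte sectors).

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamReaderSketch
{
    // Reads a whole text file into a char array through a StreamReader whose internal
    // buffer size is set explicitly, looping until Peek reports no more data.
    // charCount receives the number of chars actually read.
    public static char[] ReadAllChars(string path, int bufferSize, out int charCount)
    {
        // For UTF-8 (and ASCII) text the decoded char count never exceeds the byte count,
        // so the file length is a safe upper bound for the array size.
        char[] chars = new char[new FileInfo(path).Length];
        charCount = 0;

        using (StreamReader sr = new StreamReader(path, Encoding.UTF8, true, bufferSize))
        {
            // Peek returns -1 when there is no more data to read.
            while (sr.Peek() >= 0)
            {
                int toRead = Math.Min(bufferSize, chars.Length - charCount);
                if (toRead == 0)
                {
                    break;   // array full; guards against an endless loop
                }
                charCount += sr.ReadBlock(chars, charCount, toRead);
            }
        }
        return chars;
    }
}
```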

FileStream methods:

  • Read(1) – reads the entire contents of the file into a byte array.  See TestFileStreamRead1.
  • Read(2) – reads the entire contents of the file into a byte array and parses it into lines.  See TestFileStreamRead2.
  • Read(3) – reads the entire contents of the file into a byte array and parses it into lines.  See TestFileStreamRead2A.
  • Read(4) – reads the entire contents of the file into a byte array.  Uses the RandomAccess option in the constructor to determine if there is any impact on performance.  See TestFileStreamRead3.
  • BeginRead(1) – reads the entire contents of the file into a byte array asynchronously.  See TestFileStreamRead4.
  • BeginRead(2) – reads the entire contents of the file into a byte array asynchronously and parses into lines.  See TestFileStreamRead5.
  • BeginRead(3) – reads the entire contents of the file into a byte array asynchronously and parses it into lines.  Identical to TestFileStreamRead5 except that it uses a different thread-locking mechanism.  See TestFileStreamRead6.
  • BeginRead(4) – reads the entire contents of the file into a byte array asynchronously.  Uses no locking mechanism.  See TestFileStreamRead7.
  • BeginRead(5) – reads the entire contents of the file into a byte array asynchronously and parses into lines.  See TestFileStreamRead8.
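
A synchronous FileStream read of a whole file, with the file-access hint passed in, might look like the sketch below.  FileOptions.SequentialScan and FileOptions.RandomAccess are the two hints that the TestFileStreamRead1 versus TestFileStreamRead3 comparison is about.  The helper name and the 4,096-byte internal buffer size are assumptions, not the article's code.  Note that Read can return fewer bytes than requested, so the loop keeps going until the buffer is full or Read returns 0.

```csharp
using System.IO;

public static class FileStreamReadSketch
{
    // Reads an entire file (under 2 GB) into a byte array with one FileStream.
    // options is a caching hint for the OS, e.g. FileOptions.SequentialScan or FileOptions.RandomAccess.
    public static byte[] ReadAll(string path, FileOptions options)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                              FileShare.Read, 4096, options))
        {
            byte[] buffer = new byte[fs.Length];
            int offset = 0;

            // Keep reading until the buffer is full or Read returns 0 (end of file).
            while (offset < buffer.Length)
            {
                int n = fs.Read(buffer, offset, buffer.Length - offset);
                if (n == 0)
                {
                    break;
                }
                offset += n;
            }
            return buffer;
        }
    }
}
```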

WinFileIO methods:

  • Read – reads the specified number of bytes into an array.
  • ReadUntilEOF – reads the entire contents of a file into an array.
  • ReadBlocks – reads the specified number of bytes into an array.

Write Methods Tested:

File class methods:

  • WriteAllLines – writes all lines in a string array to a file.  See TestWriteAllLines.
  • WriteAllText – writes the entire contents in a string to a file.  See TestWriteAllText.
  • WriteAllBytes – writes the entire contents of a byte buffer to a file.  See TestWriteAllBytes.
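
A sketch of the three File write methods follows, assuming a writable directory is passed in; the class, method, and file names are hypothetical.  One detail worth knowing when comparing them is that WriteAllLines writes a line break after every element, including the last one, while WriteAllText writes the string exactly as given.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileClassWriteSketch
{
    // Writes the same lines three ways into dir. Each File call creates (or overwrites)
    // its file, writes everything, and closes it.
    public static void WriteThreeWays(string dir, string[] lines)
    {
        // WriteAllLines writes each element followed by Environment.NewLine, including the last.
        File.WriteAllLines(Path.Combine(dir, "lines.txt"), lines);

        // WriteAllText writes the string exactly as given, as UTF-8 without a byte order mark.
        string text = string.Join(Environment.NewLine, lines);
        File.WriteAllText(Path.Combine(dir, "text.txt"), text);

        // WriteAllBytes does no text encoding; the caller supplies the bytes.
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        File.WriteAllBytes(Path.Combine(dir, "bytes.bin"), bytes);
    }
}
```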

BinaryWriter methods:

  • Write – writes the entire contents of a character array to a file.  See TestBinaryWriter1.

StreamWriter methods:

  • Write – writes the entire contents of a character array to a file.  See TestStreamWriter1.
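
Writing a character array with BinaryWriter and with StreamWriter can be sketched as below.  The helper names are hypothetical, UTF-8 is an assumed encoding, and 65,536 is an assumed StreamWriter buffer size.  BinaryWriter.Write(char[]) encodes the characters without a length prefix (unlike BinaryWriter.Write(string)), so both methods produce the same bytes here.

```csharp
using System.IO;
using System.Text;

public static class WriterSketch
{
    // Writes the chars through BinaryWriter, which encodes them (UTF-8 here)
    // without a length prefix or byte order mark.
    public static void WriteWithBinaryWriter(string path, char[] chars)
    {
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        using (BinaryWriter bw = new BinaryWriter(fs, Encoding.UTF8))
        {
            bw.Write(chars);
        }
    }

    // Writes the same chars through StreamWriter with an explicit buffer size;
    // UTF8Encoding(false) suppresses the byte order mark.
    public static void WriteWithStreamWriter(string path, char[] chars)
    {
        using (StreamWriter sw = new StreamWriter(path, false, new UTF8Encoding(false), 65536))
        {
            sw.Write(chars);
        }
    }
}
```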

FileStream methods:

  • Write(1) – writes the entire contents of a byte array to a file.  See TestFileStreamWrite1.
  • Write(2) – writes the entire contents of a byte array to a file.  See TestFileStreamWrite2.
  • Write(3) – writes the entire contents of a byte array to a file in 65,536-byte chunks.  See TestFileStreamWrite3.

WinFileIO methods:

  • Write – writes the entire contents of an array to a file.  See TestWriteFileWinAPI1.
  • WriteBlocks – writes the entire contents of an array to a file.  See TestWriteFileWinAPI2.

WinFileIO Class:

This class was designed to make it easy to use the Windows ReadFile and WriteFile functions.  It handles all of the unsafe pointer operations, so calling methods do not have to be declared unsafe.  It implements the IDisposable interface, which means that the Dispose method should be called when the object is no longer needed.  If there is a problem in any method of this class, it throws an exception containing the Windows error information; if a method returns normally, it succeeded.  The only exception to this is the Close method, which returns false on failure instead of throwing.

Constructors:

  • WinFileIO() – default constructor.  If this constructor is used, then PinBuffer must be called before any I/O is done.
  • WinFileIO(Array Buffer) – the constructor to use in most cases.  Buffer is used to read in or write out the data.  The array can be of any element type that contains no references or pointers, so byte, char, int, long, and double arrays should all work, but string arrays will not, because strings are reference types.  The code has only been tested with byte arrays.

Methods:

  • void PinBuffer(Array Buffer) – this method pins the buffer in memory and retrieves a pointer to it which is used for all I/O operations.  UnpinBuffer is called by this function so it need not be called by the user.  This function only needs to be called if the default constructor is used or a different buffer needs to be used for reading or writing.
  • void OpenForReading(string FileName) – opens a file for reading.  The argument FileName must contain the path and filename of the file to be read.
  • void OpenForWriting(string FileName) – opens a file for writing.  If the file exists, it will be overwritten.
  • int Read(int BytesToRead) – reads in a file up to BytesToRead bytes.  The return value is the number of bytes read.  BytesToRead must not be larger than the size of the buffer specified in the constructor or PinBuffer.
  • int ReadUntilEOF() – reads in the entire contents of the file.  The file must be <= 2 GB.  No check is made beforehand to see whether the buffer is large enough to hold the file; if it is not, an ApplicationException will be thrown.  If that is a concern, use the ReadBlocks method instead.
  • int ReadBlocks(int BytesToRead) – reads a total of BytesToRead bytes.  There is a limit of 2 GB per call.  BytesToRead should not be larger than the size of the buffer specified in the constructor or PinBuffer.
  • int Write(int BytesToWrite) – writes a buffer out to a file.  The return value is the number of bytes written to the file.
  • int WriteBlocks(int NumBytesToWrite) – writes a buffer out to a file in 65,536-byte chunks.  The return value is the number of bytes written to the file.
  • bool Close() – closes the file.  If this method succeeds, then true is returned.  Otherwise, false is returned.
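
The article doesn't show how PinBuffer works internally.  One plausible way to pin an arbitrary array and obtain a pointer to it without unsafe code in the caller is a pinned GCHandle, sketched below.  PinnedBufferSketch is a hypothetical class, and the real WinFileIO implementation may differ.  The resulting IntPtr is the kind of address a P/Invoke declaration of ReadFile or WriteFile can accept.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical sketch of pinning a buffer for native I/O; not the actual WinFileIO source.
public sealed class PinnedBufferSketch : IDisposable
{
    private GCHandle handle;   // unallocated until Pin is called

    // Address of the first element of the pinned array, or IntPtr.Zero when nothing is pinned.
    public IntPtr Pointer { get; private set; }

    public PinnedBufferSketch(Array buffer)
    {
        Pin(buffer);
    }

    // Pins the array so the garbage collector cannot move it while native code uses the pointer.
    public void Pin(Array buffer)
    {
        Unpin();   // release any previously pinned buffer first
        // Arrays of primitive types such as byte or int can be pinned;
        // an array of references such as string[] throws ArgumentException.
        handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        Pointer = handle.AddrOfPinnedObject();
    }

    public void Unpin()
    {
        if (handle.IsAllocated)
        {
            handle.Free();
        }
        Pointer = IntPtr.Zero;
    }

    public void Dispose()
    {
        Unpin();
    }
}
```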

Benchmark Read Results:

Running the read file tests:
Total time reading < 1MB with File.ReadAllLines                 = 00:00:00.0030002
Total time reading 10MB with File.ReadAllLines                  = 00:00:00.0640037
Total time reading 50MB with File.ReadAllLines                  = 00:00:00.3540202
Total time reading < 1MB with File.ReadAllText                  = 00:00:00.0040002
Total time reading 10MB with File.ReadAllText                   = 00:00:00.0360020
Total time reading 50MB with File.ReadAllText                   = 00:00:00.1630093
Total time reading < 1MB with File.ReadAllBytes                 = 00:00:00
Total time reading 10MB with File.ReadAllBytes                  = 00:00:00.0050003
Total time reading 50MB with File.ReadAllBytes                  = 00:00:00.0260015

Total time reading < 1MB with BinaryReader.Read                 = 00:00:00.0020001
Total time reading 10MB with BinaryReader.Read                  = 00:00:00.0270016
Total time reading 50MB with BinaryReader.Read                  = 00:00:00.1260072

Total time reading < 1MB with StreamReader1.Read                = 00:00:00.0010001
Total time reading 10MB with StreamReader1.Read                 = 00:00:00.0200011
Total time reading 50MB with StreamReader1.Read                 = 00:00:00.0960055
Total time reading < 1MB with StreamReader2.Read(large buf)     = 00:00:00.0010001
Total time reading 10MB with StreamReader2.Read(large buf)      = 00:00:00.0160009
Total time reading 50MB with StreamReader2.Read(large buf)      = 00:00:00.0750043
Total time reading < 1MB with StreamReader3.ReadBlock           = 00:00:00.0010001
Total time reading 10MB with StreamReader3.ReadBlock            = 00:00:00.0150008
Total time reading 50MB with StreamReader3.ReadBlock            = 00:00:00.0750043
Total time reading < 1MB with StreamReader4.ReadToEnd           = 00:00:00.0020001
Total time reading 10MB with StreamReader4.ReadToEnd            = 00:00:00.0320018
Total time reading 50MB with StreamReader4.ReadToEnd            = 00:00:00.1720099
Total time reading < 1MB with mult StreamReader5.Read           = 00:00:00.0020001
Total time reading 10MB with mult StreamReader5.Read            = 00:00:00.0430025
Total time reading 50MB with mult StreamReader5.Read            = 00:00:00.0850048
Total time reading < 1MB with StreamReader6.ReadLine            = 00:00:00.0020002
Total time reading 10MB with StreamReader6.ReadLine             = 00:00:00.0310017
Total time reading 50MB with StreamReader6.ReadLine             = 00:00:00.1510087
Total time reading < 1MB with StreamReader7.Read parsing        = 00:00:00.1470084
Total time reading 10MB with StreamReader7.Read parsing         = 00:00:00.1600091
Total time reading 50MB with StreamReader7.Read parsing         = 00:00:00.2260129

Total time reading < 1MB with FileStream1.Read no parsing       = 00:00:00.0080005
Total time reading 10MB with FileStream1.Read no parsing        = 00:00:00.0040002
Total time reading 50MB with FileStream1.Read no parsing        = 00:00:00.0190011
Total time reading < 1MB with FileStream2.Read parsing          = 00:00:00.1220070
Total time reading 10MB with FileStream2.Read parsing           = 00:00:00.1220069
Total time reading 50MB with FileStream2.Read parsing           = 00:00:00.1370079
Total time reading < 1MB with multiFileStream2A.Read parsing    = 00:00:00.1180067
Total time reading 10MB with multiFileStream2A.Read parsing     = 00:00:00.1210070
Total time reading 50MB with multiFileStream2A.Read parsing     = 00:00:00.1320075
Total time reading < 1MB with FileStream3.Read(Rand) no parsing = 00:00:00
Total time reading 10MB with FileStream3.Read(Rand) no parsing  = 00:00:00.0030002
Total time reading 50MB with FileStream3.Read(Rand) no parsing  = 00:00:00.0170009
Total time reading < 1MB with FileStream4.BeginRead no parsing  = 00:00:00.0020001
Total time reading 10MB with FileStream4.BeginRead no parsing   = 00:00:00.0040002
Total time reading 50MB with FileStream4.BeginRead no parsing   = 00:00:00.0180011
Total time reading < 1MB with FileStream5.BeginRead parsing     = 00:00:00.0020001
Total time reading 10MB with FileStream5.BeginRead parsing      = 00:00:00.0280016
Total time reading 50MB with FileStream5.BeginRead parsing      = 00:00:00.1370079
Total time reading < 1MB with FileStream6.BeginRead parsing     = 00:00:00.0030002
Total time reading 10MB with FileStream6.BeginRead parsing      = 00:00:00.0280016
Total time reading 50MB with FileStream6.BeginRead parsing      = 00:00:00.1360077
Total time reading < 1MB with FileStream7.BeginRead             = 00:00:00
Total time reading 10MB with FileStream7.BeginRead              = 00:00:00.0050003
Total time reading 50MB with FileStream7.BeginRead              = 00:00:00.0240014
Total time reading < 1MB with FileStream8.BeginRead parsing     = 00:00:00.0020001
Total time reading 10MB with FileStream8.BeginRead parsing      = 00:00:00.0310018
Total time reading 50MB with FileStream8.BeginRead parsing      = 00:00:00.1480085

Total time reading < 1MB with WFIO1.Read No Parsing             = 00:00:00.0020001
Total time reading 10MB with WFIO1.Read No Parsing              = 00:00:00.0020001
Total time reading 50MB with WFIO1.Read No Parsing              = 00:00:00.0120007
Total time reading < 1MB with WFIO2.ReadUntilEOF No Parsing     = 00:00:00.0010001
Total time reading 10MB with WFIO2.ReadUntilEOF No Parsing      = 00:00:00.0030001
Total time reading 50MB with WFIO2.ReadUntilEOF No Parsing      = 00:00:00.0140008
Total time reading < 1MB with WFIO3.ReadBlocks API No Parsing   = 00:00:00.0010001
Total time reading 10MB with WFIO3.ReadBlocks API No Parsing    = 00:00:00.0030002
Total time reading 50MB with WFIO3.ReadBlocks API No Parsing    = 00:00:00.0130008

Total time reading < 1MB with BinaryReader.Read                 = 00:00:00.0010001
Total time reading 10MB with BinaryReader.Read                  = 00:00:00.0220012
Total time reading 50MB with BinaryReader.Read                  = 00:00:00.1080062
Total time reading < 1MB with StreamReader2.Read(large buf)     = 00:00:00.0010001
Total time reading 10MB with StreamReader2.Read(large buf)      = 00:00:00.0150008
Total time reading 50MB with StreamReader2.Read(large buf)      = 00:00:00.0690040
Total time reading < 1MB with FileStream1.Read no parsing       = 00:00:00.0010000
Total time reading 10MB with FileStream1.Read no parsing        = 00:00:00.0030002
Total time reading 50MB with FileStream1.Read no parsing        = 00:00:00.0130008
Total time reading < 1MB with WFIO.Read No Open/Close           = 00:00:00.0010001
Total time reading 10MB with WFIO.Read No Open/Close            = 00:00:00.0030001
Total time reading 50MB with WFIO.Read No Open/Close            = 00:00:00.0130008
Read file tests have completed.

Analysis Of Read Results:

The File class provides the simplest way to read in a file.  Its ReadAllBytes method is a fairly efficient way to read a file and is only bested by the read methods in the FileStream and WinFileIO classes.  From the results, the best StreamReader and BinaryReader read methods are roughly 3 to 5 times slower than ReadAllBytes (for the 50 MB file, 0.075 and 0.126 seconds versus 0.026 seconds).

The FileStream read methods were the fastest way to read a file into memory using a method from the .NET Framework.  The synchronous approach of reading the entire file into memory in TestFileStreamRead1 and TestFileStreamRead3 proved best in this set of tests, with TestFileStreamRead3 taking top honors by a hair.  The only difference between these two tests is that TestFileStreamRead1 opens the file with the SequentialScan option, while TestFileStreamRead3 opens it with RandomAccess.  Since other OS activity is always going on while a benchmark runs, it is hard to know whether one method is superior to another when the results are this close.  However, these tests have been run multiple times on other systems with different Windows versions, with the same results, so in this case TestFileStreamRead3 appears to be marginally superior.

The biggest disappointment came with the 5 FileStream asynchronous tests, TestFileStreamRead4 through TestFileStreamRead8.  They all show that reading a file asynchronously is inferior to reading it synchronously, even when other work such as parsing is done between reads.  For example, compare the results of TestFileStreamRead2A, which reads a file synchronously and then parses the data, with TestFileStreamRead5, which parses each block while the next block is read asynchronously.  For the 50 MB file, even when the locks have been removed (see TestFileStreamRead8), it is still at least 10% slower than reading the file synchronously and parsing it afterwards (see TestFileStreamRead2A).
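
The overlap pattern described for TestFileStreamRead5, processing one block while the next block is read asynchronously, can be sketched with FileStream.BeginRead and EndRead and two buffers.  This is not the article's test code: AsyncReadSketch, ReadOverlapped, and the processBlock callback are hypothetical, and the sketch uses no locks because only one read is outstanding at a time.

```csharp
using System;
using System.IO;

public static class AsyncReadSketch
{
    // Reads a file block by block with BeginRead/EndRead. While processBlock works on one
    // buffer, the next read is already running into the other buffer (double buffering).
    // Returns the total number of bytes read.
    public static long ReadOverlapped(string path, int blockSize, Action<byte[], int> processBlock)
    {
        byte[][] buffers = { new byte[blockSize], new byte[blockSize] };
        long total = 0;

        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                              FileShare.Read, blockSize, FileOptions.Asynchronous))
        {
            int current = 0;
            IAsyncResult pending = fs.BeginRead(buffers[current], 0, blockSize, null, null);

            while (true)
            {
                int n = fs.EndRead(pending);   // waits until the outstanding read completes
                if (n == 0)
                {
                    break;                     // end of file; no read is left outstanding
                }
                total += n;

                // Start the next read into the other buffer, then process the current one.
                int next = 1 - current;
                pending = fs.BeginRead(buffers[next], 0, blockSize, null, null);
                processBlock(buffers[current], n);
                current = next;
            }
        }
        return total;
    }
}
```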

The WinFileIO class proved to be the fastest way to read a file.  Based upon the measured times above, it is between 33% and 50% faster than the fastest FileStream read method.  However, the last set of tests, TestBinaryReader1NoOpenClose through TestReadFileWinAPINoOpenClose, measures how quickly the files are read after they have been opened.  According to those results, the FileStream read method is just as fast as any of the WinFileIO read methods (0.013 seconds for the 50 MB file in both cases).  So, it looks like the .NET Framework takes longer to open a file than the Windows CreateFile function does.
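
One way to check the explanation that opening the file accounts for the difference is to time the open and the read separately, as in the sketch below.  OpenCostSketch is a hypothetical helper that uses only Stopwatch and FileStream; closing the file is not timed.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class OpenCostSketch
{
    // Times opening the file (FileStream construction) separately from reading it.
    public static void Measure(string path, out TimeSpan openTime, out TimeSpan readTime)
    {
        Stopwatch sw = Stopwatch.StartNew();
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                              FileShare.Read, 4096, FileOptions.SequentialScan))
        {
            openTime = sw.Elapsed;

            sw.Reset();
            sw.Start();
            byte[] buffer = new byte[fs.Length];
            int offset = 0;
            while (offset < buffer.Length)
            {
                int n = fs.Read(buffer, offset, buffer.Length - offset);
                if (n == 0)
                {
                    break;   // end of file
                }
                offset += n;
            }
            readTime = sw.Elapsed;
        }
    }
}
```

Running this over the same three files, and comparing openTime against the total times above, would show how much of the gap is attributable to the open on a given machine.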

Benchmark Write Results:

Running the write file Tests:
Total time writing < 1MB with File.WriteAllLines           = 00:00:00.0050003
Total time writing 10MB with File.WriteAllLines            = 00:00:00.0350020
Total time writing 50MB with File.WriteAllLines            = 00:00:00.1620093
Total time writing < 1MB with File.TestWriteAllText        = 00:00:00.0040002
Total time writing 10MB with File.TestWriteAllText         = 00:00:00.0270016
Total time writing 50MB with File.TestWriteAllText         = 00:00:00.1440082
Total time writing < 1MB with File.WriteAllBytes           = 00:00:00.3560204
Total time writing 10MB with File.WriteAllBytes            = 00:00:00.3390194
Total time writing 50MB with File.WriteAllBytes            = 00:00:00.3530202

Total time writing < 1MB with BinaryWriter.Write           = 00:00:00.0010001
Total time writing 10MB with BinaryWriter.Write            = 00:00:00.0050003
Total time writing 50MB with BinaryWriter.Write            = 00:00:00.3040174

Total time writing < 1MB with StreamWriter1.Write          = 00:00:00.0030002
Total time writing 10MB with StreamWriter1.Write           = 00:00:00.0230013
Total time writing 50MB with StreamWriter1.Write           = 00:00:00.1140065

Total time writing < 1MB with FileStream1.Write no parsing = 00:00:00.0010001
Total time writing 10MB with FileStream1.Write no parsing  = 00:00:00.0050003
Total time writing 50MB with FileStream1.Write no parsing  = 00:00:00.3670210
Total time writing < 1MB with FileStream2.Write no parsing = 00:00:00.0070004
Total time writing 10MB with FileStream2.Write no parsing  = 00:00:00.1060061
Total time writing 50MB with FileStream2.Write no parsing  = 00:00:00.5000286
Total time writing < 1MB with FileStream3.Write no parsing = 00:00:00.0100006
Total time writing 10MB with FileStream3.Write no parsing  = 00:00:00.1150066
Total time writing 50MB with FileStream3.Write no parsing  = 00:00:00.5840334

Total time writing < 1MB with WFIO1.Write No Parsing       = 00:00:00.0020001
Total time writing 10MB with WFIO1.Write No Parsing        = 00:00:00.0050003
Total time writing 50MB with WFIO1.Write No Parsing        = 00:00:00.3530202
Total time writing < 1MB with WFIO2.WriteBlocks No Parsing = 00:00:00.0010001
Total time writing 10MB with WFIO2.WriteBlocks No Parsing  = 00:00:00.0060003
Total time writing 50MB with WFIO2.WriteBlocks No Parsing  = 00:00:00.0260015
Write file tests have completed.

Analysis Of Write Results:

The File class provides the simplest way to write a file out.  Unlike the ReadAllBytes method for reading files, WriteAllBytes is less efficient than WriteAllText according to the results above.

One interesting result is that the times to write out the < 1 MB and 10 MB files with BinaryWriter.Write, FileStream Write(1), and WinFileIO.Write are quite similar and fast.  However, writing the 50 MB file takes roughly 60 to 70 times longer than writing the 10 MB file.  This does not apply to the WinFileIO.WriteBlocks method, which proved to be the fastest way to write a file.  The most likely reason is that WriteBlocks writes the file out in 65,536-byte chunks.  However, the TestFileStreamWrite3 test also writes the file out in 65,536-byte chunks and proved to be the slowest method.  I can't think of a good explanation for this, other than that perhaps the FileStream.Write method has some issues.
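
Writing a buffer in 65,536-byte chunks with FileStream, the pattern discussed above, can be sketched as follows.  This is a hypothetical helper, not the code of TestFileStreamWrite3 or WinFileIO.WriteBlocks, and it assumes the data is under 2 GB.

```csharp
using System;
using System.IO;

public static class ChunkedWriteSketch
{
    // Writes the first count bytes of data to path in pieces of at most chunkSize bytes.
    public static void WriteInChunks(string path, byte[] data, int count, int chunkSize)
    {
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write,
                                              FileShare.None, chunkSize))
        {
            for (int offset = 0; offset < count; offset += chunkSize)
            {
                // The final chunk may be shorter than chunkSize.
                fs.Write(data, offset, Math.Min(chunkSize, count - offset));
            }
        }
    }
}
```

A call such as `ChunkedWriteSketch.WriteInChunks(fileName, buffer, buffer.Length, 65536)` reproduces the chunk size mentioned above.  Whether chunking avoids the 50 MB slowdown on a particular machine is something to measure rather than assume; the TestFileStreamWrite3 result suggests it may not.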

Conclusion:

Any time a benchmark tests file I/O methods, it is very difficult to completely trust the results, because of operating system file caching and other OS activity.  If different files are used, they may be placed on areas of the drive that the drive can access faster, which can also affect the results.  So, take these benchmark results with a grain of salt.  To achieve the best performance for your environment, I would recommend trying out different classes in your production environment to see which performs best.

Having said that and after testing on 3 different machines with similar results, I believe that the best performing file I/O can be obtained from the WinFileIO read methods for reading a file and the WinFileIO.WriteBlocks method for writing files.

I have done similar tests with C++ which are not shown here and believe that the Windows ReadFile and WriteFile methods are the most efficient way to do file I/O from that language as well.

Download and installation:

The download is a zipped file containing 2 Visual Studio projects.  Extract it into the folder of your choice and leave the folder hierarchy intact.  The code was built and tested with Visual Studio 2008, but it should work with Visual Studio 2015 with little if any modification.  To make it work with earlier versions of Visual Studio, you may have to create a new project and add the individual files to each project.

The following code files are contained in the FileTestsForEfficiency folder:

  • TestsForEfficiency.cs – entry point to the application.
  • MainForm.cs – holds the UI designer and button events.
  • FileEffTests.cs – holds the file I/O benchmark tests, which are contained in the FileEfficientTests class.
  • Win32FileIO.cs – holds the class used to implement the Windows ReadFile and WriteFile functionality.
  • WinFileIOUnitTests.cs – holds the unit tests that test out the I/O methods of the WinFileIO class.

The following code files are contained in the FileTestsForEfficiency folder:

  • Win32FileIO.cs – holds the class used to implement file I/O using the Windows ReadFile and WriteFile functions.

About Bob Bryan

Software developer for over 20 years. Interested in efficient software methodology, user requirements, design, implementation, and testing. Experienced with C#, WPF, C++, VB, SQL Server, stored procedures, and Office tools. MCSD.

35 Responses to Efficient File I/O From C#

  1. M says:

    I don't see any download button to download the code. Can I see the code? Would love to compare the efficiency of the program with ours (which uses FileStream and StreamWriter).

  2. Bob Bryan says:

    The download is stored with Box.com. The link is located in the upper right corner of the web page of this blog under the category “meta” and the title at the top is listed as flash_widget. The name of the file is called FileTestsForEfficiency.zip.

  3. abraham says:

    Can you post a link in comment for all of us who are viewing mobile site?

  4. Bob Bryan says:

    I apologize for the problems with the download file. It is a WordPress / Box thing. If this happens again, you can download the file via this link:

    http://www.box.net/shared/om28ouh3l9rymb5335z2

  5. vadim says:

    Great article! Best complete performance and efficiency analysis I've seen. Thank you.

  6. leftler says:

    I love the article, a followup article I would be very interested in would be similar benchmarks but on random access instead of streaming a file in sequentially. Also you left out MemoryMappedFile and its MemoryMappedViewStream and MemoryMappedViewAccessor functions (the random access times of a ViewAccessor vs a Filestream is the real benchmark I am curious about).

  7. leftler says:

    I love the article, a followup article I would be very interested in would be similar benchmarks with .net 4 and adding MemoryMappedFile and its MemoryMappedViewStream and MemoryMappedViewAccessor functions (the random access times of a ViewAccessor vs a Filestream is the real benchmark I am curious about).

  8. Simply Amazing. Can’t wait to make some benchmark myself. Thank you.

  9. John says:

    Thanks – this ready-to-use-code helped a lot within minutes. On the machine I use native writing of bytes is 6 times as fast as any of the methods offered in C#. (.NET 4/ compiled for x64 platform on a Win64 2x Xeon machine, on a Win32-machine the writing speed is ok).

  10. John O says:

    Great article and code. I’ve integrated this into a DBF and SHP data reader library. The calls will read the data file and automatically populate a DataTable-style object model with the contents.
    Using your class, I have managed to shave ~10% off the processing time.
    From taking ~5.7 seconds to read and process a 138Mb SHP file down to ~5.3 seconds (I have not looked at file read only times though).
    I had some trouble implementing the ReadBlocks() method, so instead went with reading the entire file (invalid memory access I believe the error was).

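    One addition I reckon would be handy:

    /// <summary>Reads an entire file into an array.</summary>
    /// <typeparam name="T">The primitive element type of the array (byte, char, int, long, double, ...).</typeparam>
    /// <param name="fileName">The name of the file to read.</param>
    /// <returns>An array containing the entire file.</returns>
    public static T[] ReadFileBuffer<T>(string fileName) where T : struct
    {
        Win32FileIO.WinFileIO reader = null;

        try
        {
            long fileBytes = new System.IO.FileInfo(fileName).Length;

            // Size the array in elements, not bytes, rounding up so a partial last element fits.
            int elementSize = System.Buffer.ByteLength(new T[1]);
            T[] buffer = new T[(fileBytes + elementSize - 1) / elementSize];

            reader = new Win32FileIO.WinFileIO(buffer);
            reader.OpenForReading(fileName);
            reader.Read((int)fileBytes);   // Read takes a byte count, not an element count

            return buffer;
        }
        finally
        {
            if (reader != null)
            {
                reader.Close();
                reader.Dispose();
            }
        }
    }

    Usage would be:
    byte[] buffer = Win32FileIO.WinFileIO.ReadFileBuffer<byte>(fileName);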

  11. Pingback: File.Copy in Parallel.ForEach | PHP Developer Resource

  12. Thanks for sharing many different ways to do file I/O in C# with classes in the .NET Framework.

  13. Matt says:

    I will be reading from a large text file which is (unfortunately) neither character-delimited nor fixed width. I'll have to read an entire line, use some logic to split things out, and then semi-normalize, force data types, and insert it all into SQL Server tables. The files could potentially be gigabytes in size. Can you make any suggestions on handling that type of scenario? Commit sizes and that sort of thing maybe?

  14. Bob Bryan says:

    You can use the WinFileIO class to read/write the data in/out. Try using a fairly large block size to read the data in and out initially – something like 500MB or larger since this tends to be a little more efficient. It really depends on how much resources your production environment has. Using something smaller like 50MB is also quite reasonable. Drop the block size down if you run low on memory, you are processing multiple files simultaneously, or any of the class methods throws an exception during testing.

    So, it sounds like you will have to read a large block of data into memory, parse it, and move the results to another large block of memory, which is your output buffer. You will need to keep track of how much data has been processed in the read buffer. When you get close to the end of the buffer and the number of unprocessed bytes left is less than some minimal amount (based upon the largest expected line you are processing), move those remaining bytes to the start of the read buffer and then read the next block into the buffer starting at the byte right after them. This technique is called wrap-around buffering or using a circular buffer. So, the read buffer should really be something like 500MB + "largest expected line" in size in order to properly handle the wrap-around buffer technique.

    You will also have to keep track of the number of bytes in the output buffer and write the buffer out when it is full. When you move data to the output buffer, make sure that you stay within its bounds. You can do this by testing whether the length of the next line plus the number of bytes already in the output buffer is greater than or equal to the length of the output buffer. If so, move only enough bytes to fill the output buffer completely, write it out, and then move the rest of the line to the start of the output buffer.
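    A minimal sketch of the read side described above, in which the unfinished tail of each block is moved to the front of the buffer before the next block is read in. It is not the WinFileIO code: it assumes '\n'-terminated UTF-8 lines and reads from any .NET Stream (a FileStream in practice), and the LineBlockReader class and its maxLineLength parameter are hypothetical. Splitting only on '\n' bytes is safe for UTF-8, because that byte never appears inside a multi-byte character.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: block reads with the partial last line carried over.
public static class LineBlockReader
{
    // Reads 'input' in blocks of up to blockSize bytes and calls onLine for every
    // line. maxLineLength is the longest line the caller expects; the buffer is
    // sized blockSize + maxLineLength so a carried-over line always fits.
    public static void ReadLines(Stream input, int blockSize, int maxLineLength, Action<string> onLine)
    {
        byte[] buffer = new byte[blockSize + maxLineLength];
        int carried = 0; // bytes of an unfinished line kept from the previous block

        while (true)
        {
            // Append the next block right after the carried-over bytes.
            int bytesRead = input.Read(buffer, carried, blockSize);
            if (bytesRead == 0)
            {
                // End of input: emit a final line that had no trailing '\n'.
                if (carried > 0)
                    onLine(Encoding.UTF8.GetString(buffer, 0, carried));
                return;
            }

            int valid = carried + bytesRead; // only buffer[0..valid) holds real data
            int lineStart = 0;
            // The carried bytes contain no '\n', so scanning starts after them.
            for (int i = carried; i < valid; i++)
            {
                if (buffer[i] != (byte)'\n')
                    continue;
                int end = i;
                if (end > lineStart && buffer[end - 1] == (byte)'\r')
                    end--; // drop the '\r' of a Windows line ending
                onLine(Encoding.UTF8.GetString(buffer, lineStart, end - lineStart));
                lineStart = i + 1;
            }

            // Move the unfinished line to the start of the buffer.
            carried = valid - lineStart;
            if (carried > maxLineLength)
                throw new InvalidDataException("A line is longer than maxLineLength.");
            Array.Copy(buffer, lineStart, buffer, 0, carried);
        }
    }
}
```

    For the scenario in the question, onLine would hold the splitting and type-conversion logic, and blockSize would be the large block size discussed above.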

  15. Anonymous says:

    Thanks for sharing your code! Used it to replace my slow BinaryReader/Writer file copying code, speed over network went from about 4MB/sec to 40MB/sec!

  16. Anonymous says:

    The results are absolutely unreliable. Look at the source code – DateTime.Now is used to calculate millisecond time intervals.
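    For comparison, the usual .NET tool for timing short intervals is System.Diagnostics.Stopwatch, which uses the high-resolution performance counter when one is available (Stopwatch.IsHighResolution reports whether it is). Microsoft's documentation notes that the resolution of DateTime.Now depends on the system timer and tends to be between 0.5 and 15 milliseconds. A minimal sketch; the IoTimer class and its median-of-several-runs helper are illustrative and not part of the download:

```csharp
using System;
using System.Diagnostics;

// Illustrative timing helper; not part of the article's download.
public static class IoTimer
{
    // Times one call of 'action' with Stopwatch instead of DateTime.Now.
    public static TimeSpan Time(Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }

    // Runs 'action' several times and returns the median elapsed time
    // (the upper of the two middle values for an even count), which is
    // less sensitive to one-off outliers than a single measurement.
    public static TimeSpan MedianOf(int runs, Action action)
    {
        if (runs < 1)
            throw new ArgumentOutOfRangeException(nameof(runs));
        long[] ticks = new long[runs];
        for (int i = 0; i < runs; i++)
            ticks[i] = Time(action).Ticks;
        Array.Sort(ticks);
        return TimeSpan.FromTicks(ticks[runs / 2]);
    }
}
```

    For example, IoTimer.MedianOf(5, () => File.ReadAllBytes(path)) would time five reads of a hypothetical path. Keep in mind that repeated reads of the same file are likely served from the OS file cache.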

  17. Jacob says:

    Try this for writing the file, it seems to be better than trying to write the large file as one big chunk:

    using (Stream stream = File.Open(FileNamesW[FileLoop], FileMode.Create, FileAccess.Write, FileShare.None))
    {
        int offset = 0;
        const int chunkSize = 65536;   // write 64 KB per call
        int totalSize = BytesInFiles[FileLoop];

        while (offset < totalSize)
        {
            // The final chunk may be shorter than chunkSize.
            int count = Math.Min(chunkSize, totalSize - offset);
            stream.Write(ByteBuf, offset, count);
            offset += count;
        }
    }

  18. Anonymous says:

    Your tests with readers do not seem to use a BufferedStream. That makes all the difference in terms of performance http://stackoverflow.com/a/9643111/141172

    • Johnny Boy says:

      A BufferedStream isn’t the fastest according to this. For those who are interested in micro-optimization, the absolute fastest way in most cases is the following:

      using (StreamReader sr = File.OpenText(fileName))
      {
          string s = String.Empty;
          while ((s = sr.ReadLine()) != null)
          {
              // we’re just testing read speeds
          }
      }

      Put up against several other techniques, it won out most of the time, including against the BufferedStream.

      Here’s the article which benchmarks multiple techniques to determine the fastest way.

      http://blogs.davelozinski.com/curiousconsultant/csharp-net-fastest-way-to-read-text-files

      Definitely worth a look for those interested in the various speed performances on multiple techniques.


  19. Chakravarthi says:

    Can you please implement a Seek method?

  20. Paul Chernoch says:

    Excellent article, but in running my own performance tests I found something strange. When I loop over the same file and read it using your WFIO object and do nothing else, then this is 1.4 times faster than File.ReadAllBytes. But when I follow the call to read the bytes by a parser that scans through the filled buffer and creates records, then the WFIO tests run 2.5 times slower! I put separate timers around the file read and parsing steps. The WFIO read time is always faster than File.ReadAllBytes, but the exact same parsing code takes a different amount of time! Could this be an artifact of the pinning/unpinning of the buffer? I tried moving where I do my dispose of the WFIO, but it did not help. I tried creating a new WFIO for each file, and I tried reusing the same WFIO. Any thoughts?

    • Paul Chernoch says:

      Figured it out. I failed to pass the number of bytes actually in use in the buffer when calling the parser. ReadAllBytes returns a perfectly sized buffer, so that version ran faster.
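      The general rule behind this fix: when a buffer is reused, the parser has to be given the number of bytes the last read actually filled, not the buffer's full length, or it will also scan stale data left from earlier reads. A minimal sketch using Stream.Read; CountRecords is a hypothetical stand-in for a real parser that just counts '\n'-terminated records:

```csharp
using System;
using System.IO;

public static class RecordCounter
{
    // Hypothetical parser: counts '\n' bytes in buffer[0..length).
    public static int CountRecords(byte[] buffer, int length)
    {
        int count = 0;
        for (int i = 0; i < length; i++)
            if (buffer[i] == (byte)'\n')
                count++;
        return count;
    }

    // Reads 'input' through a reusable buffer. Newline counts add up across
    // reads, so a record split between two reads is still counted once.
    public static int CountRecordsInStream(Stream input, byte[] buffer)
    {
        int total = 0;
        int bytesRead;
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            total += CountRecords(buffer, bytesRead); // not buffer.Length
        return total;
    }
}
```

      Passing buffer.Length instead of bytesRead would make the parser walk over whatever the previous read left behind, which is the kind of mismatch described above.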

  21. Pingback: New project: Prime# (moving to GitHub) | OronDF343

  22. Pingback: Efficient File I/O From C# | Designing Efficient Software - daxmax.com

  23. Vijay says:

    Nice.. Thanks for sharing information.

  24. Anonymous says:

    Links are broken.

    • Bob Bryan says:

      The links seem to be a never ending problem with Box. So, I have ported all my download files to Google Drive. They offer 15GB for free, which is 50% more than Box.

      Firefox seems to have some issues with initially displaying the download screen with Google Drive. So, if you have any issues with that then try Google Chrome or MS Edge.

  25. gggustafson says:

    In Win32FileIO.cs, at line 226, the error message is incorrect.

  26. Pete S says:

    So, I know this is old, but do you have any examples of how you might go about making this work in a multi-threaded environment? I’d like to transfer a bunch of files to multiple USB drives at the same time, and I’m wondering if this could help my performance.

    • Bob Bryan says:

      That is something that you will have to experiment with since much will depend on your environment and hardware. My initial approach would be to use a single thread for each USB drive to copy the file and then measure the performance. After that, try using 5, 10, 25, 50, etc. threads per USB drive to see if performance improves.

      There are copy utilities out there. See:

      http://www.techsupportalert.com/content/faster-way-copy-files.htm
      http://www.online-tech-tips.com/software-reviews/tools-for-copying-many-files/

      But, from my brief look, none of them look like they can be set up to handle copying files from one drive to many USB drives simultaneously. So, it looks like you will have to write your own little utility to do that. You can also add features to your utility like specifying the number of threads for each USB drive, specifying the files to be copied to specific drives, verification options, etc.

      You should also try the fastest .NET file I/O, which, according to the benchmarks above, is TestStreamReader2 and TestStreamWriter1, and compare the results to the corresponding methods from the WinFileIO class, which include the Read and WriteBlocks methods.

      As Microsoft improves .NET with each release, they may actually improve the performance to the point where this class is no longer needed. I don’t know if that is the case today though.
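      A possible starting point for that experiment, with one worker per destination drive. This is a sketch under stated assumptions: it uses File.Copy rather than WinFileIO, the MultiDriveCopier class is hypothetical, Parallel.ForEach draws its workers from the thread pool (MaxDegreeOfParallelism only caps how many run at once), and per-drive thread counts, verification, and error handling are left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper for copying one set of files to several drives.
public static class MultiDriveCopier
{
    // Copies every file in sourceFiles into each directory in destinationDirs.
    // Destinations are processed in parallel (up to one worker per destination);
    // within a destination the files are copied one after another.
    public static void CopyToAll(IReadOnlyList<string> sourceFiles, IReadOnlyList<string> destinationDirs)
    {
        var options = new ParallelOptions
        {
            // MaxDegreeOfParallelism must be -1 or positive, so guard an empty list.
            MaxDegreeOfParallelism = Math.Max(1, destinationDirs.Count)
        };

        Parallel.ForEach(destinationDirs, options, destDir =>
        {
            Directory.CreateDirectory(destDir);
            foreach (string src in sourceFiles)
            {
                string dest = Path.Combine(destDir, Path.GetFileName(src));
                File.Copy(src, dest, overwrite: true);
            }
        });
    }
}
```

      Trying more threads per drive, as suggested above, would mean splitting each destination's file list across several workers and timing each configuration.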

      • Pete S says:

        Thanks, Bob. Good point. It seems (on my system at least) that FileStream1.Read (<1MB) is the second fastest, but there's not a lot in it (0.0004501 vs 0.0004216), and FileStream1.Write (<1MB, no parsing) is the fastest (again, not by much: 0.0042047 vs 0.0044363). So it seems the built-in .NET classes are already close to, or faster than, the WinFileIO operations. I think now it's a matter of playing around with buffer sizes to see if I can optimise the operations. Of course, that may well affect which method is faster…

  27. Frederik says:

    You mention that you’re disappointed with the performance of the async reads. When reading from the file asynchronously, did you specify that the file must be opened for async usage? (It’s a parameter in one of the overloads of the FileStream constructor.)

    P.S.: Why not put the code on GitHub?
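    For reference, opening a file for asynchronous I/O looks roughly like this. The FileStream constructor overload with a useAsync parameter is part of .NET (FileOptions.Asynchronous is the equivalent in the overload that takes FileOptions); the AsyncFileReader class, the chunk size, and the read loop are a hypothetical sketch. Without that flag, ReadAsync may simply run a synchronous read on a thread-pool thread (the long-standing .NET Framework behavior), which could account for part of the disappointing async results. Whether the flag closes the gap would have to be measured.

```csharp
using System.IO;
using System.Threading.Tasks;

// Hypothetical wrapper showing a FileStream opened for asynchronous I/O.
public static class AsyncFileReader
{
    // Reads the whole file asynchronously and returns the number of bytes read.
    public static async Task<long> ReadAllAsync(string path, int chunkSize = 1 << 20)
    {
        // useAsync: true requests overlapped (truly asynchronous) I/O on Windows.
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read,
                                       bufferSize: 4096, useAsync: true))
        {
            byte[] buffer = new byte[chunkSize];
            long total = 0;
            int n;
            while ((n = await fs.ReadAsync(buffer, 0, buffer.Length)) > 0)
                total += n; // process buffer[0..n) here
            return total;
        }
    }
}
```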

  28. Eujine Tatarinov says:

    Hi Bob, this is a great article and the testing app is really impressive.

    Based on your code, I made benchmarks for testing the .NET I/O functions on a cold start (with the I/O cache cleared) and for sequentially reading/writing the same file several times per iteration.
    You did a great job creating your application, and I would like to credit you in my repository on
    GitHub: https://github.com/Wizard2007/Win32FileIOBenchmark

    Please contact me in whatever way is convenient for you.

    P.S.: If you have any suggestions for improving the benchmarks, please contact me.
