Why should I use int instead of byte or short in C#?

I have found a few threads on this question. Most people seem to prefer using int across the board in their C# code, even when a byte or smallint would handle the data, unless it is a mobile application. I don't understand why. Doesn't it make more sense to define your C# data type as the same data type that is used in your data storage solution?

My premise: If I am using a typed dataset, Linq2SQL classes, POCOs, one way or another I will run into compiler data-type conversion issues if I don't keep my data types in sync across my tiers. I don't really like doing System.Convert all the time just because it was easier to use int across the board in the C# code. I have always used whatever the smallest data type is that is needed to handle the data, in the database as well as in code, to keep my database interface clean. So I would bet 75% of my C# code is using byte or short rather than int, because that is what is in the database.
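To make the friction concrete, here is a minimal sketch (hypothetical class and column names) of the kind of code I end up with when a property mirrors a smallint column but the surrounding code works in int:

using System;

// Hypothetical POCO: Quantity maps to a SMALLINT column, so it is a short here.
public class OrderLine
{
    public short Quantity { get; set; }
}

public class OrderService
{
    public void Adjust(OrderLine line, int delta)
    {
        // short + int is promoted to int, so assigning it back needs an explicit cast...
        line.Quantity = (short)(line.Quantity + delta);

        // ...or the System.Convert call mentioned above.
        line.Quantity = Convert.ToInt16(line.Quantity + delta);
    }
}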

Possibilities: Does this mean that most people who just use int for everything in code also use the int data type for their SQL storage data types and could care less about the total size of their database, or do they do System.Convert in code wherever applicable?

Why I care: I have worked on my own forever, and I just want to stay familiar with best practices and standard coding conventions.

+56
c# types sql-server
Jul 08 '09 at 11:28
7 answers

Performance-wise, an int is faster in almost all cases. The CPU is designed to work efficiently with 32-bit values.

Shorter values are complicated to deal with. To read a single byte, say, the CPU has to read the 32-bit block that contains it and then mask out the upper 24 bits.

To write a byte, it has to read the destination 32-bit block, overwrite the lower 8 bits with the desired byte value, and write the entire 32-bit block back.

Space-wise, of course, you save a few bytes by using smaller data types. So if you are building a table with a few million rows, shorter data types may be worth considering. (And the same can be a good reason why you should use smaller data types in your database.)

And correctness-wise, an int doesn't overflow easily. What if you think your value is going to fit within a byte, and then at some point in the future some harmless-looking change means larger values get stored in it?
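For example, a minimal sketch of that failure mode (the values are made up): byte arithmetic silently wraps around in the default unchecked context, and only throws if you opt in to overflow checking.

using System;

class ByteOverflowSketch
{
    static void Main()
    {
        byte b = 250;
        b += 10;                    // unchecked (the default): silently wraps around
        Console.WriteLine(b);       // prints 4, not 260

        byte c = 250;
        checked { c += 10; }        // with overflow checking: throws OverflowException
    }
}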

Those are some of the reasons why int should be your default data type for all integral data. Only use byte if you actually want to store machine bytes. Only use short if you are dealing with a file format or protocol or similar that actually specifies 16-bit integer values. If you are just dealing with integers in general, make them ints.
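As a rough illustration of those defaults (the payload bytes are made up for the example, and BitConverter assumes a little-endian machine here):

using System;

class TypeChoiceSketch
{
    static void Main()
    {
        // byte[]: actual machine bytes (file contents, network payloads).
        byte[] payload = { 0x03, 0x00, 0x0A, 0x0B, 0x0C };

        // short: a format that genuinely specifies a 16-bit field
        // (here, a record length stored in the first two bytes).
        short recordLength = BitConverter.ToInt16(payload, 0);

        // int: ordinary integral data - counters, sums, ids.
        int sum = 0;
        for (int i = 0; i < recordLength; i++)
            sum += payload[2 + i];

        Console.WriteLine($"{recordLength} data bytes, sum {sum}");   // 3 data bytes, sum 33
    }
}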

+78
Jul 18 '09 at 20:38

I'm only 6 years late, but maybe I can help someone else out.

Here are some guidelines I would use:

  • If there is a chance the data will not fit in the future, use the larger int type.
  • If the variable is used as a struct/class field, then by default it will be padded to take up the full 32 bits anyway, so using byte/int16 will not save memory (see the sketch after this list).
  • If the variable is short-lived (like inside a function), the smaller data types will not help much.
  • "byte" or "char" can sometimes describe the data better, and can add compile-time checking to make sure huge values are not assigned to it by accident. For example, if you store the day of the month (1-31) in a byte and try to assign 1000 to it, it will cause an error.
  • If the variable is used in an array of roughly 100 or more elements, I would use the smaller data type as long as it makes sense.
  • byte and int16 arrays are not as thread-safe as int (primitive) arrays.
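Here is a small sketch of the padding point from the list above (exact sizes depend on the runtime's layout rules, and Unsafe.SizeOf is built into modern .NET or available via the System.Runtime.CompilerServices.Unsafe package):

using System;
using System.Runtime.CompilerServices;

struct WithByte { public byte Flag; public int Count; }
struct WithInt  { public int  Flag; public int Count; }

class PaddingSketch
{
    static void Main()
    {
        // The byte field does not shrink the struct: it gets padded up to the
        // int's alignment, so both structs typically end up the same size.
        Console.WriteLine(Unsafe.SizeOf<WithByte>());   // typically 8, not 5
        Console.WriteLine(Unsafe.SizeOf<WithInt>());    // typically 8
    }
}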

One topic that nobody has brought up is the limited CPU cache. Smaller programs run faster than larger ones because the CPU can fit more of the program in the faster L1/L2/L3 caches.

Using the int type can result in fewer CPU instructions; however, it also means a higher percentage of the data memory will not fit in the CPU cache. Instructions are cheap to execute. Modern CPU cores can execute 3-7 instructions per clock cycle; a single cache miss, on the other hand, can cost 1,000-2,000 clock cycles because it has to go all the way out to RAM.

When memory is conserved, it also results in the rest of the application performing better, because it is not squeezed out of the cache.

I ran a quick sum test accessing random data in random order, using both a byte array and an int array.

var r = new Random();   // random source used below (not declared in the original snippet)
const int SIZE = 10000000, LOOPS = 80000;
byte[] array = Enumerable.Repeat(0, SIZE).Select(i => (byte)r.Next(10)).ToArray();
int[] visitOrder = Enumerable.Repeat(0, LOOPS).Select(i => r.Next(SIZE)).ToArray();

System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
int sum = 0;
foreach (int v in visitOrder)
    sum += array[v];
sw.Stop();

Here are the timing results, in ticks (x86, release mode, no debugger attached, .NET 4.5, i7-3930K) (smaller is better):

Array size:    10    100    1K    10K   100K     1M    10M
byte:         549    559   552    552    568    632   3041
int :         549    566   552    562    590   1803   4206
  • Randomly accessing 1M elements using byte on my CPU gave a 285% performance gain!
  • Anything under 10,000 elements was barely noticeable.
  • int was never faster than byte for this basic sum test.
  • These values will vary for different CPUs with different cache sizes.

One final note: sometimes I look at the now open-source .NET Framework to see what Microsoft's experts do. The .NET Framework uses byte/int16 surprisingly rarely; I could hardly find any, actually.

+15
Aug 09 '15 at 0:54

You would have to be dealing with a few BILLION rows before this makes any significant difference in terms of storage. Let's say you have three columns, and instead of using the byte-equivalent database type you use the int equivalent.

That gives us 3 (columns) x 3 (extra bytes) per row, or 9 extra bytes per row.

That means that for "a few million rows" (say, three million), you are consuming a whole extra 27 megabytes of disk space! Fortunately, as we no longer live in the 1970s, you shouldn't have to worry about that :)
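If you want to sanity-check that arithmetic, here is the back-of-the-envelope version (row count assumed, as above):

using System;

class StorageMathSketch
{
    static void Main()
    {
        const long rows = 3_000_000;          // "a few million rows"
        const int extraBytesPerRow = 3 * 3;   // 3 columns x 3 extra bytes each
        Console.WriteLine(rows * extraBytesPerRow / 1_000_000.0 + " MB");   // 27 MB
    }
}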

As said above, stop micro-optimizing. The performance hit of converting to/from different integral numeric types will hit you much, much harder than the bandwidth/disk-space cost, unless you are dealing with very, very, very large datasets.

+8
Jul 18 '09 at 20:57

For the most part, no.

Unless you know up front that you are going to be dealing with hundreds of millions of rows, this is micro-optimization.

Do what fits your domain model best. Later, if you have performance problems, benchmark and profile to pinpoint where they are occurring.

+7
Jul 08 '09 at 11:36

It's not that I didn't believe Jon Grant and the others, but I had to see for myself with our "million-row table". The table has 1,018,000 rows. I converted 11 tinyint columns and 6 smallint columns into int; there were already 5 int columns and 3 smalldatetimes. 4 different indexes used a combination of the various data types, and obviously the new indexes now all use int columns.

Making the changes only cost me 40 MB, calculated from the base table usage with no indexes. When I added the indexes back into the mix, the overall change was only a 30 MB difference in total. So I was a bit surprised, because I thought the index size would grow more than that.

So, is 30 MB worth juggling all the different data types? No way! I am off to INT land, and thank you all for setting this anal-retentive programmer back on the path to a happy, happy life free of int conversions... yippeee!

+5
Jul 21 '09 at 8:11

If int is used everywhere, no casting or converting is required. That is a bigger bang for the buck than the memory you will save by using several integer sizes.
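For example, a rough sketch of what that looks like at the data-access boundary (hypothetical helper, plain ADO.NET accessors):

using System.Data;

class ReadSketch
{
    // int column -> int property: the value flows straight through.
    static int ReadQuantity(IDataReader reader, int ordinal)
    {
        return reader.GetInt32(ordinal);
    }

    // smallint column -> short property: fine while you stay in "short land",
    // but every consumer that works in int forces a cast on the way back down.
    static short ReadSmallQuantity(IDataReader reader, int ordinal)
    {
        return reader.GetInt16(ordinal);
    }
}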

It just makes life easier.

+4
Jul 18 '09 at 20:18

The .NET runtime is optimized for Int32. See the earlier discussion .NET Integer vs Int16?

+4
Jul 18 '09 at 20:35


