This article shows how to initialize arrays in a C program with values from text files.

The data is not stored in the source files. Instead, the text files are read when the program is compiled. Both one-dimensional and multi-dimensional arrays are considered. The examples also show how to place arrays in RAM or non-volatile memory and how to select which data files are used for initialization.

The compiler used for the examples is GCC for ARM, with a 32-bit microcontroller as the target. All the examples use standard C and worked with this compiler.

 

Basics of Initializing an Array

An array can be initialized with values when it is “declared”. A typical declaration is shown here. The values within the curly braces are called “initializers”. 
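As a minimal sketch of such a declaration (the array name `values` and the numbers are made up for illustration and are not from the original example):

```c
#include <stdint.h>

/* Hypothetical example: an array of five 16-bit values.
   The numbers inside the curly braces are the initializers. */
uint16_t values[5] = {10, 20, 30, 40, 50};
```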


If the size of the array is not specified inside the brackets, the size will be the number of initializers. If there are fewer initializers than the size of the array, the extra elements are set to 0. It is an error to have more initializers than the size of the array; the compiler must at least issue a diagnostic (GCC reports “excess elements in array initializer”).
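The two sizing rules can be sketched as follows. The array names are hypothetical, and the `static_assert` checks assume a C11 (or later) compiler:

```c
#include <assert.h>
#include <stdint.h>

/* Size omitted: the array gets one element per initializer (3 here). */
uint16_t sized_by_count[] = {1, 2, 3};

/* Fewer initializers than elements: elements 2 to 5 are set to 0. */
uint16_t padded[6] = {7, 8};

/* Compile-time checks of the element counts. */
static_assert(sizeof sized_by_count / sizeof sized_by_count[0] == 3,
              "expected 3 elements");
static_assert(sizeof padded / sizeof padded[0] == 6,
              "expected 6 elements");
```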

 

White Space

Initializers must be separated with commas. Adding “white space” is OK. In this case, the white space is “blanks” or spaces. The set of white space characters includes blank (or space), tab, newline, carriage return, vertical tab, and form feed. Newline and carriage return are used to indicate the end of a line in C source code. I know form feed, but vertical tab?

In general, C does not care if a statement contains white space or is continued on another line. The statement here is equivalent to the one above. It is common to see many, many lines of initializers for large arrays. Maybe even pages. At some point, we might say, “Is there a better way?”
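A small sketch of the same idea, with hypothetical names and values: both declarations below produce identical arrays, because the extra blanks and line breaks between initializers are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Five hypothetical initializers written on one line ... */
uint16_t one_line[5] = {10, 20, 30, 40, 50};

/* ... and the same initializers spread over several lines. */
uint16_t many_lines[5] = {
    10,
    20,   30,
        40,
    50
};

/* Both arrays have the same size in memory. */
static_assert(sizeof one_line == sizeof many_lines, "layouts must match");
```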


 

Initializing an Array from a File

C source code is run through a preprocessor before compilation. A commonly used feature of C preprocessors is “file inclusion”. Here is a quote from the famous book “The C Programming Language” by Kernighan and Ritchie.

“File inclusion makes it easy to handle collections of #defines and declarations (among other things).”

I added the italics for “among other things”. While we commonly include “.c” and “.h” files, the preprocessor does not care about the name extension of a file. Any text file is OK. So, the following syntax works to initialize an array.
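Here is a hedged sketch of the idea. The file name `array_data.txt`, the array name, and the stand-in values are assumptions for illustration. The `__has_include` guard (available in recent GCC and Clang, and part of C23) is not needed for the technique; it is only there so the sketch still compiles when the data file is missing.

```c
#include <assert.h>
#include <stdint.h>

/* Detect whether the hypothetical data file exists. A real project would
   normally keep only the #include line inside the braces. */
#ifdef __has_include
#  if __has_include("array_data.txt")
#    define HAVE_ARRAY_DATA 1
#  endif
#endif

uint16_t samples[] = {
#ifdef HAVE_ARRAY_DATA
#  include "array_data.txt"   /* only numbers, commas, and white space */
#else
    10, 20, 30, 40, 50        /* stand-in for the file contents */
#endif
};

/* Optional safety net: fail the build if the element count is not the
   expected one (5 is an assumption of this sketch). */
static_assert(sizeof samples / sizeof samples[0] == 5,
              "unexpected number of values");
```

A comma after the last value in the file is harmless, because C allows a trailing comma in an initializer list.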


 

The file must not contain any hidden formatting characters, which some document editors insert. Keep it simple. No rich text format. No column headers. Only numbers, commas, and white space. Here is a file created with Windows Notepad.


 

Here is the array in memory shown with a debugger. In this case, the array is in RAM as indicated by the high addresses in the Location column.


 

Storing an Array in Non-Volatile Memory and Selecting a Data File

In the example above, the array is a global variable and nothing specifies where to put it. The compiler and linker assume the array may be modified by the program, so it is placed in RAM. The initial values are stored in non-volatile memory (“NVM”, typically Flash memory), and code which runs before the main program copies them into the array in RAM. The program itself never accesses this copy in NVM. If the array will not be modified (it is a “constant”), it can be placed only in NVM and accessed there directly by the program. This saves RAM, which is often in short supply. Telling the compiler and linker that an array will not change and should be located in NVM is typically done with the “const” qualifier. Here is an example and a look at the result. The Location column shows it low in the memory map, which for this microcontroller is Flash memory.
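A minimal sketch of the two cases, with hypothetical names and values. Where each array actually ends up depends on the toolchain and linker script, so the placement noted in the comments is the typical behavior for a microcontroller target, not a guarantee.

```c
#include <stdint.h>

/* Without const: a writable array. It is typically placed in RAM and
   filled from a copy in NVM by the startup code. */
uint16_t gains_ram[4] = {1, 2, 4, 8};

/* With const: read-only. The linker can typically leave it in NVM
   (Flash) only, so it uses no RAM. */
const uint16_t gains_nvm[4] = {1, 2, 4, 8};
```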


The #define and #if preprocessing statements can be used to give options for locating the array and selecting which data files are used for initialization. Here is an example which gives a choice of locating an array in RAM or NVM.
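One way this could look is sketched below. The macro name `LOOKUP_IN_NVM` and the array are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical switch: 1 = declare the array const (NVM),
   0 = leave it writable (RAM). */
#define LOOKUP_IN_NVM 1

#if LOOKUP_IN_NVM
const
#endif
uint16_t lookup[4] = {100, 200, 300, 400};
```

With `LOOKUP_IN_NVM` set to 1, the preprocessor output reads `const uint16_t lookup[4] = ...`; with 0, the `const` line disappears.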


The #if construct is an example of “conditional inclusion”. In this case, it controls whether the “const” qualifier is used when declaring the array. It works because the declaration can be on more than one line or, said another way, white space is OK.

 

Here is an example of using conditional inclusion to select the file for initialization.
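A sketch of how the selection might be written. The macro `DATA_SET`, the file names, and the stand-in values are all hypothetical. As an extra assumption, a `__has_include` guard (recent GCC and Clang, C23) is added so the sketch compiles even when the files are absent; it is not part of the technique.

```c
#include <stdint.h>

/* Hypothetical switch: 1 selects data_set_1.txt, anything else data_set_2.txt. */
#define DATA_SET 1

/* Only for this sketch: detect whether both data files are present. */
#ifdef __has_include
#  if __has_include("data_set_1.txt") && __has_include("data_set_2.txt")
#    define HAVE_DATA_FILES 1
#  endif
#endif

const uint16_t table[] = {
#if !defined(HAVE_DATA_FILES)
    1, 2, 3, 4                  /* stand-in when the files are missing */
#elif DATA_SET == 1
#  include "data_set_1.txt"
#else
#  include "data_set_2.txt"
#endif
};
```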


 

Testing with a Large Array

I had a large file of random data depicting a noise waveform and used it to test initialization of a large array in NVM. Here is a plot of the data and the declaration.



 

Here is the beginning of the file.


The original CSV file did not have a comma after each value. The commas were easily added with an editor which supports expressions in Find/Replace operations. In this case, I used the expression for a line delimiter, “\R”. The Find was “\R” and the Replace was “,\R”. One Find/Replace operation added all the commas for 10,000 values.
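If no such editor is at hand, the same transformation can be done with a few lines of code. The function below is my own sketch, not the method used above: it copies a text buffer and inserts a comma before every line ending, for both "\n" and "\r\n" files. The name `add_commas` is hypothetical.

```c
#include <stddef.h>

/* Copy `in` to `out`, inserting a comma before every line ending.
   Returns the number of characters written (excluding the terminator),
   or 0 if `out` (capacity `cap`) is too small. */
size_t add_commas(const char *in, char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0) {
        return 0;
    }
    for (size_t i = 0; in[i] != '\0'; i++) {
        /* A line ends at '\r', or at '\n' not preceded by '\r'. */
        int line_end = (in[i] == '\r') ||
                       (in[i] == '\n' && (i == 0 || in[i - 1] != '\r'));
        if (line_end) {
            if (n + 1 >= cap) {
                return 0;       /* no room for ',' plus terminator */
            }
            out[n++] = ',';
        }
        if (n + 1 >= cap) {
            return 0;           /* no room for the character plus terminator */
        }
        out[n++] = in[i];
    }
    out[n] = '\0';
    return n;
}
```

Like the Find/Replace operation, this also puts a comma on blank lines, so the input should have exactly one value per line.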

 

Everything worked great and compiled very fast! Here is the beginning of the array in memory. The debugger nicely broke up the display into groups of 100 elements each.


 

Multi-Dimensional Arrays

What if the data is organized in two or more dimensions? Let’s look at a two-dimensional array declared as uint16_t test[2][3]. In C, the right subscript (3) is the size of a one-dimensional array whose elements are contiguous in memory. The left subscript (2) means there are two of these three-element arrays. This is the memory arrangement of the six elements:

 

[0,0] [0,1] [0,2] [1,0] [1,1] [1,2]

 

The ordering in memory is important because stepping through consecutive elements by incrementing the right subscript is generally faster than incrementing the left subscript, which requires “hops” through memory. If the array holds two vectors of 1,000 elements each, the organization should be test[2][1000] for the fastest access to each vector.
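The layout can be checked with a short sketch. The helper names `element_offset` and `sum_in_memory_order` and the array contents are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

uint16_t test[2][3] = {{0, 1, 2}, {10, 11, 12}};

/* Position of test[row][col], counted in elements from the start
   of the array. */
size_t element_offset(int row, int col)
{
    const char *base = (const char *)test;
    const char *elem = (const char *)&test[row][col];
    return (size_t)(elem - base) / sizeof test[0][0];
}

/* Sum all elements with the right subscript in the inner loop, so that
   consecutive accesses touch consecutive memory locations. */
uint32_t sum_in_memory_order(void)
{
    uint32_t sum = 0;
    for (int row = 0; row < 2; row++) {
        for (int col = 0; col < 3; col++) {
            sum += test[row][col];
        }
    }
    return sum;
}
```

Here `element_offset(0, 2)` is 2 and `element_offset(1, 0)` is 3, which matches the arrangement listed above.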

 

Here is an example of initializing a two-dimensional array. Notice the additional curly braces, which group the initializers for each one-dimensional array of the right subscript.
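A minimal sketch of a fully braced initialization, with a hypothetical array name and values:

```c
#include <stdint.h>

/* Each inner pair of braces holds the three initializers of one row. */
uint16_t grid[2][3] = {
    {1, 2, 3},   /* grid[0][0], grid[0][1], grid[0][2] */
    {4, 5, 6}    /* grid[1][0], grid[1][1], grid[1][2] */
};
```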


This format creates a problem for a data file which can only have numbers, commas, and white space. What happens if the additional curly braces are omitted?


The compiler fills the array by going left-to-right through the initializers, with the right subscript filling first. The compiler I am using gives a warning: “missing braces around initializer”. There is no problem if the number of initializers is exactly the same as the number of elements in the array. However, if there are fewer initializers, they still fill the array in memory order and the remaining elements at the end are set to 0, so without curly braces to act as guides there is no way to leave part of a row unfilled.
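The difference can be sketched with three hypothetical arrays. Depending on the warning options, the compiler may report missing braces for the first two.

```c
#include <stdint.h>

/* No inner braces, exact count: same result as {{1, 2, 3}, {4, 5, 6}}. */
uint16_t flat[2][3] = {1, 2, 3, 4, 5, 6};

/* No inner braces, too few values: filled in memory order,
   so the rows become {1, 2, 3} and {4, 0, 0}. */
uint16_t flat_short[2][3] = {1, 2, 3, 4};

/* Inner braces act as guides: the rows become {1, 2, 0} and {3, 4, 0}. */
uint16_t braced_short[2][3] = {{1, 2}, {3, 4}};
```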

 

The array can be filled from multiple files with multiple #include statements. Here is an example where the initialization is completely bracketed with pairs of curly braces. I leave out the details shown in previous examples.
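A sketch of what this could look like with one file per row. The file names `row0_data.txt` and `row1_data.txt` and the stand-in values are hypothetical. The `__has_include` guard (recent GCC and Clang, C23) is an addition of this sketch so that it compiles without the files; normally only the #include lines would sit inside the braces.

```c
#include <stdint.h>

/* Only for this sketch: detect whether both row files are present. */
#ifdef __has_include
#  if __has_include("row0_data.txt") && __has_include("row1_data.txt")
#    define HAVE_ROW_FILES 1
#  endif
#endif

uint16_t rows[2][3] = {
    {
#ifdef HAVE_ROW_FILES
#  include "row0_data.txt"
#else
        1, 2, 3                 /* stand-in for the first file */
#endif
    },
    {
#ifdef HAVE_ROW_FILES
#  include "row1_data.txt"
#else
        4, 5, 6                 /* stand-in for the second file */
#endif
    }
};
```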


 

Initializing Arrays in Unions

A union is a variable which can hold objects of different types that share the same memory, while the compiler keeps track of the objects as if they were different things. This arrangement could be useful for an embedded application short of memory. Here is an example with vector[6] with one dimension and matrix[2][3] with two dimensions. They are two arrays which occupy the same locations in memory.


The rule for initializing a union is that the first member of the union (vector[6]) is filled with the initializers. If the order of the arrays were reversed, the compiler would give a warning because the initializers are not completely bracketed with curly braces. Notice the curly braces around the #include are doubled. I think the outer set encloses any initializers for the union and the inner set is for an array type.
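Here is a sketch of such a union. The type and variable names are assumptions, and the values are written inline where the #include line would go.

```c
#include <stdint.h>

union shared_data {
    uint16_t vector[6];      /* first member: receives the initializers */
    uint16_t matrix[2][3];   /* the same six elements viewed as 2 x 3 */
};

/* Outer braces: initializer for the union.
   Inner braces: initializer for its first member, the array vector[6]. */
union shared_data shared = {{1, 2, 3, 4, 5, 6}};
```

Because both members share the same memory, `shared.matrix[1][0]` reads the same element as `shared.vector[3]`.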

 

Here is the file. I have two rows but it doesn’t matter. Just more white space.


Here is the array in memory. Notice the starting Location of vector[ ] and matrix[ ][ ] are the same.


Are there other ways to initialize multi-dimensional arrays from a single file with only numbers, commas, and white space? Please tell us by adding a comment.

 

Bonus Tip: Strings

What about strings? Here is an example of initializing a string.
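A minimal sketch, with a hypothetical name and text:

```c
/* The array size is taken from the string literal:
   five characters plus the terminating '\0'. */
char greeting[] = "Hello";
```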


An #include within the quotation marks does not work. My editor, which is aware of C syntax, gives me lots of question marks and squiggly underlines. Characters for the new lines and the #include itself are initializers! The poor editor is confused. This mess compiles but the string is filled with the characters we see here and not from the file.


The solution is to put the quotation marks in the file.


Then, use a statement like this one.
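A hedged sketch of such a statement follows. The file name `message_text.txt` and the text are hypothetical, and the file is assumed to contain the string with its quotation marks. The `__has_include` guard (recent GCC and Clang, C23) is only there so the sketch compiles when the file is missing.

```c
/* Only for this sketch: detect whether the text file is present. */
#ifdef __has_include
#  if __has_include("message_text.txt")
#    define HAVE_MESSAGE_FILE 1
#  endif
#endif

char message[] =
#ifdef HAVE_MESSAGE_FILE
#  include "message_text.txt"   /* file content includes the quotation marks */
#else
    "Hello from a file"         /* stand-in for the file contents */
#endif
;
```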


Note, the quotation marks around the file name are part of the #include syntax and do not control initializers. Here is the result in RAM.

 

 


 

It's important to note that the examples should all work in theory with any compiler. However, some examples may be uncommon and could tickle issues with some compilers. Please let us know in the comments if you find a problem.

 
