  # Data Analysis - reading text files and processing them with Matlab

In this article, we're going to read text files with Matlab, perform data analysis or processing, and finally we are going to write out our results to another text file. The procedure is easily adaptable to many situations.

Let's assume that we have 3 text files (it could be hundreds). They must all share the same format and a common base file name followed by a number (otherwise it is harder to automate the reading process).

For example, we have files 'data_sheet1.txt', 'data_sheet2.txt' and 'data_sheet3.txt'.

So, the basic file name is 'data_sheet'; then, the numbered tag is 1, 2 or 3, respectively, and they all end with the '.txt' extension.

Let the content of each file be something simple. For example, the hypothetical content of 'data_sheet1.txt' is:

dummy line 1
dummy line 2
dummy line 3

1 1234
2 2345
3 4320
4 4567
5 9876

This file has four text lines (three dummy lines and one blank line) at the beginning, and then the real data in two columns.

In our case, the content for 'data_sheet2.txt'  is:

dummy line 1
dummy line 2
dummy line 3

1 12340
2 23452
3 43203
4 45674
5 98765

and the content for 'data_sheet3.txt' is:

dummy line 1
dummy line 2
dummy line 3

1 123
2 234
3 432
4 456
5 987

Note that all three files have four text lines at the beginning and all of them hold the relevant data in the same format, with the same number of elements (two columns and five rows). The actual number of columns or rows is not relevant for our purpose, but every file has to keep the same format or structure.
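If you want to follow along, the three sample files can be generated from Matlab itself. This is only a sketch and not part of the original recipe; the variable names ('header', 'ids', 'values') are chosen here just for illustration, and the numbers are the ones listed above.

```matlab
% Hypothetical helper script: writes the three sample files used in this article
header = sprintf('dummy line 1\ndummy line 2\ndummy line 3\n\n');
ids = (1:5)';
values = {[1234; 2345; 4320; 4567; 9876], ...
          [12340; 23452; 43203; 45674; 98765], ...
          [123; 234; 432; 456; 987]};

for k = 1:numel(values)
    fid = fopen(sprintf('data_sheet%d.txt', k), 'w');
    if fid == -1
        error('Could not create file number %d.', k);
    end
    % Three dummy lines followed by one blank line
    fprintf(fid, '%s', header);
    % fprintf walks through the matrix column by column, so each
    % column of the transposed matrix becomes one "id value" line
    fprintf(fid, '%d %d\n', [ids values{k}]');
    fclose(fid);
end
```

Run it once in an empty working folder, so that it does not overwrite real files with the same names.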

We are going to use Matlab functions 'fopen', 'textscan' and 'num2str' to read data from all those '.txt' files (it's a good idea if you investigate those three functions a little bit, but I'll give you the recipe).

We are not interested in the four text lines at the beginning of the files. We want to read the first column of the first file (it is the same in all the files, let's say for identification purposes) and the second column of each file, so we want to end up with something like

1        1234       12340         123
2        2345       23452         234
3        4320       43203         432
4        4567       45674         456
5        9876       98765         987

In this way, we now have the information in one matrix, and we can do data analysis thereafter.

This is the function that I propose to read the files. You have two input parameters (the base file name and the number of files to read) and one output (one cell array with your relevant data). Fair, isn't it?

To automatically change the name of the file we use an array in this form:

[BaseFile num2str(i) '.txt']

This expression concatenates the string BaseFile (the input parameter) with the iteration counter (converted into a string by 'num2str'), and then appends the '.txt' extension.

For the first file, the idea could be represented by:

[BaseFile '1' '.txt'], or better [BaseFile '1.txt']
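To see the naming scheme in isolation, here is a minimal sketch that only builds the file names, without opening anything. The cell array 'names' is a name introduced here for illustration; 'BaseFile' and 'n' have the same meaning as in the rest of the article.

```matlab
BaseFile = 'data_sheet';   % base file name used in this article
n = 3;                     % number of files

names = cell(1, n);
for i = 1:n
    % Same concatenation as above: base name, counter as text, extension
    names{i} = [BaseFile num2str(i) '.txt'];
end

% names{1} is 'data_sheet1.txt' and names{3} is 'data_sheet3.txt'
disp(names)
```

An equivalent way to build the same string is sprintf('%s%d.txt', BaseFile, i); which one you use is a matter of taste.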

The full code would be:

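function R = get_data(BaseFile, n)
% GET_DATA  Read the numeric columns of n numbered text files.

% Open the first file and stop early if it cannot be found
fid = fopen([BaseFile '1.txt']);
if fid == -1
    error('Cannot open %s1.txt', BaseFile);
end
% Skip the four header lines and read both columns
R = textscan(fid, '%f %f', 'headerLines', 4);
% Close the file, you don't need it any longer
fclose(fid);

for i = 2 : n
    % Open consecutively each of the remaining files
    fid = fopen([BaseFile num2str(i) '.txt']);
    if fid == -1
        error('Cannot open %s%d.txt', BaseFile, i);
    end
    % Skip the first column of the new file (an '*' to do this)
    % and keep on building the array
    R = [R textscan(fid, '%*f %f', 'headerLines', 4)];
    % Close the file
    fclose(fid);
end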

You end up with your data in cell array R. 'textscan' returns a cell array (not an ordinary array), with one cell per column that was read, so you have to convert it with 'cell2mat' if you need an ordinary numeric matrix.
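The following sketch shows what that cell array looks like. To keep it independent of any file, it passes a short piece of text directly to 'textscan' instead of a file identifier; the variables 'txt', 'C', 'S' and 'M' are names made up for this illustration.

```matlab
% Three data rows given as text instead of a file
txt = sprintf('1 1234\n2 2345\n3 4320\n');

% Read both columns: the result is a 1-by-2 cell array,
% and each cell holds one column as a numeric vector
C = textscan(txt, '%f %f');
disp(class(C))
disp(size(C))

% The '*' makes textscan skip the first column,
% so only one cell comes back
S = textscan(txt, '%*f %f');
disp(S{1})

% cell2mat joins the columns into an ordinary 3-by-2 matrix
M = cell2mat(C);
disp(M)
```

The conversion with 'cell2mat' works here because every cell holds a numeric column of the same length, which is also the situation in 'get_data' as long as all the files keep the same structure.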

How are you going to use the above function to read text files and process data from Matlab?

This is one suggestion. You may process it the way you want...

clear; clc

% Provide base file name and number of files to be read
BaseFile = 'data_sheet';
n = 3;

% Use the developed function to read data
R = get_data(BaseFile, n);

% Transform your cell array into an ordinary matrix

my_data = cell2mat(R)

At this point 'my_data' is a matrix that has the information as you need it (exactly as shown before).

You can study it, or plot it... or perform data analysis of any kind...

% Calculate the average of all of the columns and show
my_average = mean(my_data)
% Calculate the standard deviation for each column
my_std = std(my_data)
% Calculate the maximum
my_max = max(my_data)
% Calculate the minimum
my_min = min(my_data)

% Arrange your information to be saved: one row per column of
% 'my_data', with the columns being average, std, maximum and minimum
my_results = [my_average' my_std' my_max' my_min']

% Save your 'my_results' matrix in file 'data_out.txt'
save data_out.txt -ascii my_results

Done!
Now, you have a text file with your data analysis or processed information.
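If you want to check what was written, you can read the text file back with 'load'. The sketch below is self-contained: it types in the matrix from this article by hand instead of calling 'get_data', and it uses the functional form of 'save'. The names 'check' and 'rel_err' are introduced here only for illustration.

```matlab
% Data matrix from this article (same values as 'my_data')
my_data = [1 1234 12340 123
           2 2345 23452 234
           3 4320 43203 432
           4 4567 45674 456
           5 9876 98765 987];

% One row per column of my_data; columns are mean, std, max and min
my_results = [mean(my_data)' std(my_data)' max(my_data)' min(my_data)'];

% Functional form of the save command
save('data_out.txt', 'my_results', '-ascii');

% Read the text file back into a numeric matrix
check = load('data_out.txt');

% The -ascii format keeps about 8 significant digits, so compare
% with a relative tolerance instead of testing for exact equality
rel_err = max(abs(check(:) - my_results(:)) ./ abs(my_results(:)));
fprintf('Largest relative difference: %g\n', rel_err);
```

If those 8 digits are not enough for your purposes, 'save' also accepts the '-double' option together with '-ascii' to write 16 digits.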
