Data Analysis - reading text files and processing them with Matlab
In this article, we're going to read text files with Matlab, perform data analysis or processing on their contents, and finally write our results to another text file. The procedure is easily adaptable to many situations.
Let's assume that we have three text files (there could be hundreds). They must all have the same format and share a base file name followed by a numeric suffix (otherwise the reading process is harder to automate).
For example, we have the files 'data_sheet1.txt', 'data_sheet2.txt' and 'data_sheet3.txt'. The base file name is 'data_sheet', the numeric suffix is 1, 2 or 3, respectively, and all of them end with the '.txt' extension.
Let the content of each file be something simple. For example, the hypothetical content of 'data_sheet1.txt' is:
dummy line 1
dummy line 2
dummy line 3

1 1234
2 2345
3 4320
4 4567
5 9876
This file has four text lines (three dummy lines and one blank line) at the beginning, and then the real data in two columns.
In our case, the content of 'data_sheet2.txt' is:
dummy line 1
dummy line 2
dummy line 3

1 12340
2 23452
3 43203
4 45674
5 98765
and the content of 'data_sheet3.txt' is:
dummy line 1
dummy line 2
dummy line 3

1 123
2 234
3 432
4 456
5 987
Note that all three files have four text lines at the beginning, and all of them hold the relevant data in the same format, with the same number of elements (two columns and five rows). The particular number of columns or rows does not matter for our purpose, but all the files have to keep the same format or structure.
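If you want to try the recipe without preparing the files by hand, the short script below writes the three sample files exactly as listed above (three dummy lines, one blank line, then two columns of five rows). It is only a convenience sketch, assuming you run it in the folder where you will later read the files; the variable names second_cols and fid are illustrative.

```matlab
% Hypothetical helper script: writes data_sheet1.txt .. data_sheet3.txt
% with the sample content shown above.
BaseFile = 'data_sheet';
% Second column of each file (one row per file)
second_cols = [1234  2345  4320  4567  9876;
               12340 23452 43203 45674 98765;
               123   234   432   456   987];
for k = 1 : 3
    fid = fopen([BaseFile num2str(k) '.txt'], 'w');
    if fid == -1
        error('Cannot create the file for sheet %d', k);
    end
    fprintf(fid, 'dummy line %d\n', 1:3);   % the three dummy lines
    fprintf(fid, '\n');                     % the blank fourth line
    % fprintf consumes the 2x5 matrix column by column: "1 1234", "2 2345", ...
    fprintf(fid, '%d %d\n', [1:5; second_cols(k, :)]);
    fclose(fid);
end
```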
We are going to use the Matlab functions 'fopen', 'textscan' and 'num2str' (plus 'fclose' to close each file) to read the data from all those '.txt' files. It's a good idea to study those functions a little, but I'll give you the recipe.
We are not interested in the four text lines at the beginning of the files. We want to read the first column of the first file (which is the same in all the files; think of it as an identification column) and the second column of each file, so we want to end up with something like this:
1   1234   12340   123
2   2345   23452   234
3   4320   43203   432
4   4567   45674   456
5   9876   98765   987
This way, all the information is in one matrix, and we can analyze it afterwards.
This is the function that I propose for reading the files. It has two input parameters (the base file name and the number of files to read) and one output (a cell array with your relevant data). Fair, isn't it?
To change the file name automatically, we build a character array in this form:
[BaseFile num2str(i) '.txt']
This expression concatenates the base file name BaseFile (an input parameter) with the loop counter, converted into a string by num2str, and then appends the '.txt' extension.
For the first file, this is equivalent to [BaseFile '1' '.txt'], or more simply [BaseFile '1.txt'].
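A quick way to check the naming rule before reading anything is to build all the names and look at them. This sketch uses the same concatenation as the function; the sprintf line is just an equivalent alternative, and the variable names names and alt are illustrative.

```matlab
% Build the file names for n files with the same rule used in get_data
BaseFile = 'data_sheet';
n = 3;
names = cell(1, n);
for i = 1 : n
    names{i} = [BaseFile num2str(i) '.txt'];   % e.g. 'data_sheet2.txt'
end
% sprintf produces the same name and is another common way to do it
alt = sprintf('%s%d.txt', BaseFile, 2);
disp(names)
disp(alt)
```

Note that num2str(10) gives '10', so the tenth file must be named 'data_sheet10.txt'. If your files use zero-padded tags such as 'data_sheet01.txt', a format like sprintf('%s%02d.txt', BaseFile, i) would match them instead.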
The full code would be:
function R = get_data(BaseFile, n)
% Save this function in its own file, named get_data.m
% Open the first file
d(1) = fopen([BaseFile '1.txt']);
if d(1) == -1
    error('Cannot open file %s', [BaseFile '1.txt']);
end
% Read the first two columns, skip the first 4 header lines
R = textscan(d(1), '%f %f', 'headerLines', 4);
% Close the file, you don't need it any longer
fclose(d(1));
for i = 2 : n
    % Open consecutively each of the remaining files
    d(i) = fopen([BaseFile num2str(i) '.txt']);
    if d(i) == -1
        error('Cannot open file %s', [BaseFile num2str(i) '.txt']);
    end
    % Skip the first column of the new file (the '*' in '%*f' does this)
    % and keep on building the cell array
    R = [R textscan(d(i), '%*f %f', 'headerLines', 4)];
    % Close the file
    fclose(d(i));
end
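You end up with your data in the cell array R. The 'textscan' function returns a cell array (not an ordinary numeric array), so R holds n+1 cells: two from the first file (both columns) and one from each remaining file. You have to convert it into an ordinary matrix only if you need one.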
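Because every cell of R is a column vector of the same length, the built-in cell2mat can do the conversion in one call, as an alternative to the concatenation loop in the script that follows. The cell array below is typed in by hand to stand in for what get_data returns for the three sample files (n = 3), so the sketch runs on its own.

```matlab
% Stand-in for the output of get_data with n = 3: four column vectors
% (the first file contributes two cells, each other file one cell).
R = {[1; 2; 3; 4; 5], [1234; 2345; 4320; 4567; 9876], ...
     [12340; 23452; 43203; 45674; 98765], [123; 234; 432; 456; 987]};
% cell2mat places the cells side by side, giving one 5x4 double matrix
my_data = cell2mat(R);
disp(my_data)
```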
How can you use the above function to read the text files and process the data in Matlab? Here is one suggestion; you may process the data any way you want.
% Reset your memory and clear your screen
clear; clc
% Provide the base file name and the number of files to be read
BaseFile = 'data_sheet';
n = 3;
% Use the developed function to read the data
R = get_data(BaseFile, n);
% Transform your cell array into an ordinary matrix
% (R has n+1 cells: two from the first file, one from each other file)
my_data = R{1};
for i = 2 : n+1
    my_data = [my_data R{i}];
end
% Show your data
my_data
At this point 'my_data' is a matrix that holds the information exactly as you need it (as shown before).
You can study it, plot it, or perform any kind of data analysis, for example:
% Calculate the average of each column and show it
my_average = mean(my_data)
% Calculate the standard deviation of each column
my_std = std(my_data)
% Calculate the maximum of each column
my_max = max(my_data)
% Calculate the minimum of each column
my_min = min(my_data)
% Arrange your information to be saved (one row per column of my_data)
my_results = [my_average' my_std' my_max' my_min']
% Save your my_results matrix in the file 'data.txt'
save data.txt my_results -ascii
Done! Now you have a text file with the results of your data analysis: each row of 'data.txt' holds the mean, standard deviation, maximum and minimum of one column of my_data.
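To check what was written, you can read the file back with load, which accepts numeric ASCII files. The sketch below rebuilds my_data for the three sample files by hand so it runs on its own, then saves and reloads the results; the name check is illustrative.

```matlab
% my_data for the sample files (same numbers as shown earlier)
my_data = [1 1234 12340 123;
           2 2345 23452 234;
           3 4320 43203 432;
           4 4567 45674 456;
           5 9876 98765 987];
% One row per column of my_data: [mean std max min]
my_results = [mean(my_data)' std(my_data)' max(my_data)' min(my_data)'];
save('data.txt', 'my_results', '-ascii')
% load reads the numeric text file back into a matrix
check = load('data.txt');
disp(check)
```

Keep in mind that '-ascii' writes 8 significant digits by default, so values such as the standard deviations come back rounded; adding '-double' to the save call writes 16 digits instead.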