| Search perlmeme.org | |
| Home » Tutorials » Parsing csv |
In this tutorial you will learn how to parse a simple CSV (comma separated values) file. This is the sort of file produced by spreadsheets and other programs when a text-only, portable format is required for exporting data.
The file we will use in the examples below is a text file called
prospects.csv, and was produced by the OpenOffice.org 'calc' spreadsheet.
"Name","Address","Floors","Donated last year","Contact" "Charlotte French Cakes","1179 Glenhuntly Rd",1,"Y","John" "Glenhuntly Pharmacy","1181 Glenhuntly Rd",1,"Y","Paul" "Dick Wicks Magnetic Pain Relief","1183-1185 Glenhuntly Rd",1,"Y","George" "Gilmour's Shoes","1187 Glenhuntly Rd",1,"Y","Ringo"
The format is the same as that produced by any program that claims to produce a valid CSV file, namely the field delimiter (the thing that separates the fields), is a comma, and textual data is surrounded by double quote characters. Numeric data is typically not quoted.
The Text::CSV module provides functions for both parsing and producing CSV data. However, we'll focus on the parsing functionality here. The following code
sample opens the prospects.csv file and parses each line in turn, printing
out all the fields it finds.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $file = 'prospects.csv';
my $csv = Text::CSV->new();
open (CSV, "<", $file) or die $!;
while (<CSV>) {
if ($csv->parse($_)) {
my @columns = $csv->fields();
print "@columns\n";
} else {
my $err = $csv->error_input;
print "Failed to parse line: $err";
}
}
close CSV;
Running the code produces the following output:
Name Address Floors Donated last year Contact
Charlotte French Cakes 1179 Glenhuntly Rd 1 Y John
Glenhuntly Pharmacy 1181 Glenhuntly Rd 1 Y Paul
Dick Wicks Magnetic Pain Relief 1183-1185 Glenhuntly Rd 1 Y George
Gilmour's Shoes 1187 Glenhuntly Rd 1 Y Ringo
And by replacing the line:
print "@columns\n";
with:
print "Name: $columns[0]\n\tContact: $columns[4]\n";
we can get more particular about which fields we want to output. And while we're at it, let's skip past the first line of our csv file, since it's only a list of column names.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $file = 'prospects.csv';
my $csv = Text::CSV->new();
open (CSV, "<", $file) or die $!;
while (<CSV>) {
next if ($. == 1);
if ($csv->parse($_)) {
my @columns = $csv->fields();
print "Name: $columns[0]\n\tContact: $columns[4]\n";
} else {
my $err = $csv->error_input;
print "Failed to parse line: $err";
}
}
close CSV;
Running this code will give us the following output:
Name: Charlotte French Cakes
Contact: John
Name: Glenhuntly Pharmacy
Contact: Paul
Name: Dick Wicks Magnetic Pain Relief
Contact: George
Name: Gilmour's Shoes
Contact: Ringo
You can use this module to manipulate a csv file as an array of arrays. For instance to access a field on a specific row, you might write:
#!/usr/bin/perl
use strict;
use warnings;
use Tie::CSV_File;
tie my @csvdata, 'Tie::CSV_File', 'prospects.csv';
print "Line 5, column 1: ", $csvdata[4][0] . "\n";
untie @csvdata;
which produces:
Line 5, column 1: Gilmour's Shoes
If you'd prefer to access the fields in your CSV file by name, use Tie:Handle::CSV. Here is some code that displays the Contact and Address field from each data row in the csv file.
#!/usr/bin/perl
use strict;
use warnings;
use Tie::Handle::CSV;
my $fh = Tie::Handle::CSV->new('prospects.csv', header => 1);
while (my $csv_line = <$fh>) {
print $csv_line->{'Contact'} . ": " . $csv_line->{'Address'} . "\n";
}
close $fh;
This produces:
John: 1179 Glenhuntly Rd
Paul: 1181 Glenhuntly Rd
George: 1183-1185 Glenhuntly Rd
Ringo: 1187 Glenhuntly Rd
And now for a slightly surreal twist, (and a demonstration of some of the genuinely novel functionality you'll find on cpan), let's finish with an examination of the DBD::CSV module, which allows us to access data in a csv file with SQL statements. A significant subset of SQL is supported by this module, it's well worth looking at.
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
# Connect to the database, (the directory containing our csv file(s))
my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;");
# Associate our csv file with the table name 'prospects'
$dbh->{'csv_tables'}->{'prospects'} = { 'file' => 'prospects.csv'};
# Output the name and contact field from each row
my $sth = $dbh->prepare("SELECT * FROM prospects WHERE name LIKE 'G%'");
$sth->execute();
while (my $row = $sth->fetchrow_hashref) {
print("name = ", $row->{'Name'}, " contact = ", $row->{'Contact'}. "\n");
}
$sth->finish();
When run, the code above produces this output:
name = Glenhuntly Pharmacy contact = Paul
name = Gilmour's Shoes contact = Ringo
Now for an example that manually overrides the columns names in our table with names of our choosing.
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
# Connect to the database, (the directory containing our csv file(s))
my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;");
# Associate our csv file with the table name 'prospects' and
# manually declare names for each of the columns
$dbh->{'csv_tables'}->{'prospects.csv'} = {
'col_names' => ["name", "address", "floors", "donated", "contact"]
};
# Output the name and number of floors using our column names
my $sth = $dbh->prepare("SELECT * FROM prospects.csv WHERE name LIKE 'G%'");
$sth->execute();
while (my $row = $sth->fetchrow_hashref) {
print("name = ", $row->{'name'},
", Number of floors = ", $row->{'floors'}, "\n");
}
$sth->finish();
which produces:
name = Glenhuntly Pharmacy, Number of floors = 1
name = Gilmour's Shoes, Number of floors = 1
perldoc Text::CSV (See the CAVEATS section at the end of the file
for a working description of the commaa separated values format.
perldoc Text::CSV_XS
--
Tie::Handle::CSV