Parsing CSV files

In this tutorial you will learn how to parse a simple CSV (comma separated values) file. This is the sort of file produced by spreadsheets and other programs when a text-only, portable format is required for exporting data.

The file we will use in the examples below is a text file called prospects.csv, and was produced by the OpenOffice.org 'calc' spreadsheet.

  "Name","Address","Floors","Donated last year","Contact"
  "Charlotte French Cakes","1179 Glenhuntly Rd",1,"Y","John"
  "Glenhuntly Pharmacy","1181 Glenhuntly Rd",1,"Y","Paul"
  "Dick Wicks Magnetic Pain Relief","1183-1185 Glenhuntly Rd",1,"Y","George"
  "Gilmour's Shoes","1187 Glenhuntly Rd",1,"Y","Ringo"

The format is the same as that produced by any program that claims to produce a valid CSV file, namely the field delimiter (the thing that separates the fields), is a comma, and textual data is surrounded by double quote characters. Numeric data is typically not quoted.

Text::CSV

The Text::CSV module provides functions for both parsing and producing CSV data. However, we'll focus on the parsing functionality here. The following code sample opens the prospects.csv file and parses each line in turn, printing out all the fields it finds.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::CSV;

    my $file = 'prospects.csv';

    my $csv = Text::CSV->new();

    open (CSV, "<", $file) or die $!;

    while (<CSV>) {
        if ($csv->parse($_)) {
            my @columns = $csv->fields();
            print "@columns\n";
        } else {
            my $err = $csv->error_input;
            print "Failed to parse line: $err";
        }
    }
    close CSV;

Running the code produces the following output:

    Name Address Floors Donated last year Contact
    Charlotte French Cakes 1179 Glenhuntly Rd 1 Y John
    Glenhuntly Pharmacy 1181 Glenhuntly Rd 1 Y Paul
    Dick Wicks Magnetic Pain Relief 1183-1185 Glenhuntly Rd 1 Y George
    Gilmour's Shoes 1187 Glenhuntly Rd 1 Y Ringo

And by replacing the line:

            print "@columns\n";

with:

            print "Name: $columns[0]\n\tContact: $columns[4]\n";

we can get more particular about which fields we want to output. And while we're at it, let's skip past the first line of our csv file, since it's only a list of column names.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::CSV;

    my $file = 'prospects.csv';

    my $csv = Text::CSV->new();

    open (CSV, "<", $file) or die $!;

    while (<CSV>) {
        next if ($. == 1);
        if ($csv->parse($_)) {
            my @columns = $csv->fields();
            print "Name: $columns[0]\n\tContact: $columns[4]\n";
        } else {
            my $err = $csv->error_input;
            print "Failed to parse line: $err";
        }
    }
    close CSV;

Running this code will give us the following output:

    Name: Charlotte French Cakes
            Contact: John
    Name: Glenhuntly Pharmacy
            Contact: Paul
    Name: Dick Wicks Magnetic Pain Relief
            Contact: George
    Name: Gilmour's Shoes
            Contact: Ringo

Tie::CSV_File

You can use this module to manipulate a csv file as an array of arrays. For instance to access a field on a specific row, you might write:

    #!/usr/bin/perl
    use strict;
    use warnings;

    use Tie::CSV_File;

    tie my @csvdata, 'Tie::CSV_File', 'prospects.csv';
    print "Line 5, column 1: ", $csvdata[4][0] . "\n";
    untie @csvdata;

which produces:

    Line 5, column 1: Gilmour's Shoes

Tie::Handle::CSV

If you'd prefer to access the fields in your CSV file by name, use Tie:Handle::CSV. Here is some code that displays the Contact and Address field from each data row in the csv file.

    #!/usr/bin/perl
    use strict;
    use warnings;

    use Tie::Handle::CSV;

    my $fh = Tie::Handle::CSV->new('prospects.csv', header => 1);

    while (my $csv_line = <$fh>) {
        print $csv_line->{'Contact'} . ": " . $csv_line->{'Address'} . "\n";
    }

    close $fh;

This produces:

    John: 1179 Glenhuntly Rd
    Paul: 1181 Glenhuntly Rd
    George: 1183-1185 Glenhuntly Rd
    Ringo: 1187 Glenhuntly Rd

DBD::CSV

And now for a slightly surreal twist, (and a demonstration of some of the genuinely novel functionality you'll find on cpan), let's finish with an examination of the DBD::CSV module, which allows us to access data in a csv file with SQL statements. A significant subset of SQL is supported by this module, it's well worth looking at.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Connect to the database, (the directory containing our csv file(s))

    my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;");

    # Associate our csv file with the table name 'prospects'

    $dbh->{'csv_tables'}->{'prospects'} = { 'file' => 'prospects.csv'};

    # Output the name and contact field from each row

    my $sth = $dbh->prepare("SELECT * FROM prospects WHERE name LIKE 'G%'");
    $sth->execute();
    while (my $row = $sth->fetchrow_hashref) {
        print("name = ", $row->{'Name'}, "  contact = ", $row->{'Contact'}. "\n");
    }
    $sth->finish();

When run, the code above produces this output:

    name = Glenhuntly Pharmacy  contact = Paul
    name = Gilmour's Shoes  contact = Ringo

Now for an example that manually overrides the columns names in our table with names of our choosing.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Connect to the database, (the directory containing our csv file(s))

    my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;");

    # Associate our csv file with the table name 'prospects' and
    # manually declare names for each of the columns

    $dbh->{'csv_tables'}->{'prospects.csv'} = {
            'col_names' => ["name", "address", "floors", "donated", "contact"]
    };

    # Output the name and number of floors using our column names

    my $sth = $dbh->prepare("SELECT * FROM prospects.csv WHERE name LIKE 'G%'");
    $sth->execute();
    while (my $row = $sth->fetchrow_hashref) {
        print("name = ", $row->{'name'},
              ",   Number of floors = ", $row->{'floors'}, "\n");
    }
    $sth->finish();

which produces:

    name = Glenhuntly Pharmacy,   Number of floors = 1
    name = Gilmour's Shoes,   Number of floors = 1

See also

    perldoc Text::CSV (See the CAVEATS section at the end of the file
    for a working description of the commaa separated values format.
    perldoc Text::CSV_XS 
    --
    Tie::Handle::CSV
Revision: 1.0 Reviewed:  

[Top]