Lesson 8: CSV Files: Reading and Writing

While working with machine learning projects in Python, you will need to work with CSV and other files: both creating them and reading from them. In this lesson, let's learn how to do it.


Reading And Writing Files

Writing the classic way

In other languages, it is common to follow these steps:

    1. open file descriptor
    2. read from or write to file
    3. close the file

That is precisely what this Python code does.

file = open('document.txt', 'w', newline='')
file.write('My paragraph')
file.close()

But this approach is far from perfect. One concern is that if write() raises an exception, close() is never reached, so the file can stay open and may cause unexpected issues, such as data not being flushed to disk.

We could handle that Exception, but there's a better way!
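
For comparison, handling it by hand might look like the sketch below. It uses try/finally so that close() runs whether or not write() fails. The temporary directory and the path variable are assumptions made only to keep the example self-contained.

```python
import os
import tempfile

# Hypothetical location: a throwaway temporary directory for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'document.txt')

file = open(path, 'w', newline='')
try:
    file.write('My paragraph')
finally:
    # Runs whether or not write() raised, so the file is always closed.
    file.close()

print(file.closed)  # True
```

It works, but it is three extra lines of ceremony for every file you touch.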

Writing with the context manager

The file object returned by open() is a context manager, so it can be used in a with statement. This is the recommended practice for file operations:

with open('document.txt', 'w', newline='') as file:
    file.write('My paragraph')

The primary reason is that the file is closed automatically when the with block ends, even if an exception occurs inside it.
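
A quick way to convince yourself is to raise an error inside the with block and check the file afterwards. This is only a sketch: the RuntimeError is a simulated failure and the temporary path is an assumption made for the demo.

```python
import os
import tempfile

# Hypothetical location: a throwaway temporary directory for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'document.txt')

try:
    with open(path, 'w', newline='') as file:
        file.write('My paragraph')
        # Simulated failure in the middle of the block.
        raise RuntimeError('something went wrong')
except RuntimeError:
    pass

# The with statement closed the file even though an exception was raised.
print(file.closed)  # True
```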

Reading from the file

Reading a file is as easy as writing one:

with open('secret.txt', 'r') as file:
    print(file.read())

# nothing to see there

The built-in open(path, mode) function returns a file object we can interact with. The second argument, mode, is optional and specifies how the file should be opened.

The common mode values of the open() function are:

Character   Meaning
'r'         open for reading (default)
'w'         open for writing, truncating the file first
'x'         open for exclusive creation, failing if the file already exists
'a'         open for writing, appending to the end of the file if it exists
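
The modes above can be tried out in a small sketch. The file name notes.txt and the temporary directory are assumptions chosen only so the example can run anywhere without touching your project files.

```python
import os
import tempfile

# Hypothetical file in a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

# 'x' creates the file; it would fail if the file already existed.
with open(path, 'x') as file:
    file.write('first\n')

# 'a' keeps the existing content and appends to the end.
with open(path, 'a') as file:
    file.write('second\n')

# 'r' reads the file.
with open(path, 'r') as file:
    after_append = file.read()

# 'w' truncates the file before writing.
with open(path, 'w') as file:
    file.write('replaced\n')

# The mode is optional and defaults to 'r'.
with open(path) as file:
    after_truncate = file.read()

# A second 'x' on the same path raises FileExistsError.
try:
    open(path, 'x')
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_truncate))  # 'replaced\n'
print(exclusive_failed)      # True
```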

Generate, Write, And Read The CSV File

Now, let's apply all our knowledge and extend our first module that we created in the previous lesson.

If you skipped that, create a new salary_generator.py file in your project's root directory:

salary_generator.py

import random
import csv


def generate_salary_by_experience(years_experience):
    # Base salary grows linearly with experience, plus a random bonus.
    base_salary = 200 * years_experience + 2000

    return base_salary + random.randint(1, 199)


def entry_generator(num_rows=100):
    # Lazily yield (years_of_experience, salary) tuples, one per row.
    i = 0

    while i < num_rows:
        years_of_experience = random.randint(1, 10)

        yield years_of_experience, generate_salary_by_experience(years_of_experience)
        i += 1


def generate_csv_data(num_rows=100):
    header = 'years_of_experience', 'salary'
    data = list(entry_generator(num_rows))

    return [header] + data


def write_csv(filename, data):
    # newline='' lets the csv module control line endings itself.
    with open(filename, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(data)


def read_csv(filename):
    # newline='' is recommended by the csv module when reading as well.
    with open(filename, 'r', newline='') as file:
        reader = csv.reader(file)
        rows = [row for row in reader]

    return rows

If you plan to reuse your module, separating concerns into smaller functions is essential. A single mega-function would be of little use.

Let's review the functions:

  • generate_salary_by_experience - generates a single salary.
  • entry_generator - a generator function that yields a given number of (years, salary) tuples; it can be used in for loops and comprehensions, or converted with list().
  • generate_csv_data - returns the complete generated data set, including the header row.
  • write_csv - writes a CSV file with the data provided.
  • read_csv - reads a CSV file and returns all rows as a list.
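
The three ways of consuming a generator mentioned for entry_generator can be sketched as follows. To keep the results predictable, the example uses count_up, a hypothetical stand-in that is not part of salary_generator.py and yields (index, square) tuples instead of random salaries.

```python
def count_up(num_rows=3):
    """Hypothetical stand-in for entry_generator: yields (index, square) tuples."""
    i = 0

    while i < num_rows:
        yield i, i * i
        i += 1


# 1. In a for loop, unpacking each tuple as it is produced.
looped = []
for index, square in count_up(3):
    looped.append((index, square))

# 2. In a comprehension, keeping only the second value of each tuple.
squares = [square for _, square in count_up(3)]

# 3. Converted to a list in one go.
as_list = list(count_up(3))

print(looped)   # [(0, 0), (1, 1), (2, 4)]
print(squares)  # [0, 1, 4]
print(as_list)  # [(0, 0), (1, 1), (2, 4)]
```

Each call to count_up(3) creates a fresh generator, which is why it can be consumed three times here.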

Then, we can import the required functions in our main.py file to read, write, and generate data.

main.py

from salary_generator import generate_csv_data, write_csv, read_csv

if __name__ == '__main__':
    filename = 'salaries_100.csv'

    # write data
    data = generate_csv_data(100)

    write_csv(filename, data)

    # read data
    data2 = read_csv(filename)
    print(data2)

Output:

[['years_of_experience', 'salary'], ['7', '3540'], ['7', '3576'], ...
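
Notice in the output that csv.reader returns every value as a string. If you need numbers, one option is to convert them while reading. The sketch below assumes an in-memory stand-in for the file (via io.StringIO) holding the two sample rows from the output, so it runs without any file on disk.

```python
import csv
import io

# In-memory stand-in for the CSV file, using the two sample rows shown above.
csv_text = 'years_of_experience,salary\r\n7,3540\r\n7,3576\r\n'

reader = csv.reader(io.StringIO(csv_text, newline=''))

# The first row is the header; keep it as strings.
header = next(reader)

# Convert the remaining rows from strings to integers.
rows = [(int(years), int(salary)) for years, salary in reader]

print(header)  # ['years_of_experience', 'salary']
print(rows)    # [(7, 3540), (7, 3576)]
```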

Reading CSV with Pandas

If you work with machine learning projects, you will likely use the pandas library. It reads the contents of a CSV file in one line and structures them as a DataFrame, ready for modeling.

import pandas as pd

# read the file written by main.py
df = pd.read_csv('salaries_100.csv')

We will cover other functions of the pandas library in other courses and tutorials, specifically related to machine learning.


So, that's it for this introductory course about basic Python, targeted mostly at PHP developers.

I hope it helps you when writing and reading code in the upcoming ML courses. See you there!


Tobias Platen

In Windows, with open(filename, 'w', newline='') as file: must be used. Without newline='', a blank line will be appended after each line.

def write_csv(filename, data):
    with open(filename, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(data)

Povilas

Thank you for the correction, will update the tutorial!

Muhammad Uzair

Great!