Lesson 8: CSV Files: Reading and Writing
While working on machine learning projects in Python, you will often need to work with CSV and other files, both creating and reading them. In this lesson, let's learn how to do it.
Reading And Writing Files
Writing the classic way
In other languages, it is common to follow these steps:
- open a file descriptor
- read from or write to the file
- close the file
That is precisely what this Python code does.
file = open('document.txt', 'w', newline='')
file.write('My paragraph')
file.close()
But this approach is far from perfect. If write() fails with an exception, file.close() is never reached, so the file is not properly closed and the open file handle can cause unexpected issues.
We could handle that exception ourselves, but there's a better way!
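For comparison, handling the exception by hand means wrapping the work in try/finally so that close() runs whether or not write() fails. This is a minimal sketch that reuses the document.txt file name from the example above:

```python
file = open('document.txt', 'w', newline='')
try:
    file.write('My paragraph')
finally:
    # Runs whether or not write() raised, so the file is always closed
    file.close()
```

It works, but it is easy to forget. That is why the approach below is preferred.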
Writing with the context manager
A context manager, used through the with statement together with the open() function, is the recommended practice for file operations:
with open('document.txt', 'w', newline='') as file:
    file.write('My paragraph')
The main reason is that it handles opening and closing the file for you. The file is closed automatically when the with block ends, even if an exception occurs inside it.
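The short sketch below checks that claim. It raises a made-up RuntimeError inside the with block and then inspects the file object's closed attribute after the exception has been caught:

```python
try:
    with open('document.txt', 'w', newline='') as file:
        file.write('My paragraph')
        # Simulate a failure in the middle of working with the file
        raise RuntimeError('something went wrong')
except RuntimeError as error:
    print('Caught:', error)

# The with statement closed the file even though the exception escaped the block
print(file.closed)  # True
```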
Reading from the file
Reading a file is as easy as writing one:
with open('secret.txt', 'r') as file:
    print(file.read())  # nothing to see there
The built-in open(path, mode) function returns a file object we can interact with. The second argument is optional (it defaults to 'r'), but you can use it to specify the mode in which the file should be opened.
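Besides read(), a file object can be iterated line by line, which avoids loading a large file into memory at once. This minimal sketch first creates a small sample file; the name notes.txt is only an example:

```python
# Create a small sample file to read back
with open('notes.txt', 'w') as file:
    file.write('first line\nsecond line\n')

# The mode defaults to 'r', so it can be omitted
with open('notes.txt') as file:
    # Each line keeps its trailing '\n', so strip it
    lines = [line.rstrip('\n') for line in file]

print(lines)  # ['first line', 'second line']
```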
The common mode values of the open() function are:
Character | Meaning |
---|---|
'r' | open for reading (default) |
'w' | open for writing, truncating the file first |
'x' | open for exclusive creation, failing if the file already exists |
'a' | open for writing, appending to the end of the file if it exists |
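The differences between the write modes are easiest to see side by side. This sketch uses a throwaway file named log.txt (an arbitrary example name) and deletes it first so the run starts clean:

```python
import os

filename = 'log.txt'  # example file name
if os.path.exists(filename):
    os.remove(filename)

# 'x' creates the file...
with open(filename, 'x') as file:
    file.write('first\n')

# ...and fails if it already exists
try:
    with open(filename, 'x') as file:
        file.write('never written\n')
except FileExistsError:
    print(filename, 'already exists')

# 'a' keeps the existing content and writes at the end
with open(filename, 'a') as file:
    file.write('second\n')

# 'r' (the default) reads it back
with open(filename) as file:
    content = file.read()
print(repr(content))  # 'first\nsecond\n'

# 'w' truncates the file before writing
with open(filename, 'w') as file:
    file.write('only line\n')
```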
Generate, Write, And Read The CSV File
Now, let's apply everything we've learned and extend the first module we created in the previous lesson.
If you skipped that lesson, create a new salary_generator.py file in your project's root directory:
salary_generator.py
import random
import csv


def generate_salary_by_experience(years_experience):
    # Base salary grows linearly with experience, plus random noise
    base_salary = 200 * years_experience + 2000
    return base_salary + random.randint(1, 199)


def entry_generator(num_rows=100):
    # Yields num_rows tuples of (years_of_experience, salary)
    i = 0
    while i < num_rows:
        years_of_experience = random.randint(1, 10)
        yield years_of_experience, generate_salary_by_experience(years_of_experience)
        i += 1


def generate_csv_data(num_rows=100):
    # Header row first, followed by the generated rows
    header = 'years_of_experience', 'salary'
    data = list(entry_generator(num_rows))
    return [header] + data


def write_csv(filename, data):
    # newline='' lets the csv module control line endings (no blank rows on Windows)
    with open(filename, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(data)


def read_csv(filename):
    # newline='' is also recommended by the csv docs when reading
    with open(filename, 'r', newline='') as file:
        reader = csv.reader(file)
        rows = [row for row in reader]
    return rows
If you plan to reuse your module, separating concerns into small functions is essential. One mega-function that does everything would be of little use.
Let's review the functions:
- generate_salary_by_experience - generates a single salary.
- entry_generator - a generator function that yields the given number of (years, salary) tuples. It can be used in for loops and comprehensions, or cast to a list with list().
- generate_csv_data - returns the complete generated data set, with the header row first.
- write_csv - writes the provided data to a CSV file. The file is opened with newline='', because without it a blank line appears after each row on Windows.
- read_csv - reads a CSV file and returns all rows as a list.
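Because entry_generator is a generator, it produces rows lazily, one at a time. The snippet below repeats the two functions from salary_generator.py so that it runs on its own. It then shows the three ways of consuming the generator listed above:

```python
import random


def generate_salary_by_experience(years_experience):
    base_salary = 200 * years_experience + 2000
    return base_salary + random.randint(1, 199)


def entry_generator(num_rows=100):
    i = 0
    while i < num_rows:
        years_of_experience = random.randint(1, 10)
        yield years_of_experience, generate_salary_by_experience(years_of_experience)
        i += 1


# 1. In a for loop: rows are produced one at a time
for years, salary in entry_generator(3):
    print(years, salary)

# 2. In a comprehension: keep only the salaries
salaries = [salary for _, salary in entry_generator(5)]

# 3. Cast to a list: all rows are generated at once
rows = list(entry_generator(10))
print(len(rows))  # 10
```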
Then, we can import the required functions in our main.py file to generate, write, and read the data.
main.py
from salary_generator import generate_csv_data, write_csv, read_csv

if __name__ == '__main__':
    filename = 'salaries_100.csv'

    # write data
    data = generate_csv_data(100)
    write_csv(filename, data)

    # read data
    data2 = read_csv(filename)
    print(data2)
Output:
[['years_of_experience', 'salary'], ['7', '3540'], ['7', '3576'], ...
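The output shows that csv.reader returns every value as a string ('7', not 7), even though we wrote integers. Here is a minimal sketch of converting the rows back to numbers before using them. The sample rows are copied from the output above rather than read from disk:

```python
rows = [['years_of_experience', 'salary'], ['7', '3540'], ['7', '3576']]

# Separate the header row from the data rows
header, *records = rows

# Convert each value from str to int
numeric = [(int(years), int(salary)) for years, salary in records]

print(header)   # ['years_of_experience', 'salary']
print(numeric)  # [(7, 3540), (7, 3576)]
```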
Reading CSV with Pandas
If you work on machine learning projects, you will likely use a library called pandas. It reads a CSV file's contents in one line and structures them as a DataFrame, ready for modeling.
import pandas as pd

dataFrame = pd.read_csv('salaries.csv')
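Here is a slightly fuller sketch. It assumes pandas is installed (pip install pandas). To stay self-contained, it reads the CSV text from an in-memory io.StringIO buffer instead of a file; with a real file you would pass the path instead. Unlike csv.reader, pandas uses the first row as column names and infers numeric column types:

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file with the same columns as our generated data
csv_text = 'years_of_experience,salary\n7,3540\n7,3576\n3,2650\n'
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (3, 2): rows, columns
print(list(df.columns))  # ['years_of_experience', 'salary']
print(df.dtypes)         # both columns are parsed as integers, not strings
print(df['salary'].mean())
```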
We will cover other functions of the pandas library in other courses and tutorials, specifically related to machine learning.
So, that's it for this introductory course about basic Python, targeted mostly at PHP developers.
I hope it helps you read and write code in the upcoming ML courses. See you there!