Last Updated on 2021-08-02 by Clay
A CSV (Comma-Separated Values) file is a plain text format for storing tabular data, and it is very convenient to work with. This article shows how to read and write CSV files with Python.
Excel, the well-known spreadsheet program, is most often used with two file extensions: xlsx and csv. The meaning of "Comma-Separated Values" may not be obvious from the name alone, so let's look at an example directly.
The picture above is a csv file I created and opened with Excel. Next, I open the same file with Notepad++.
We can see that a single "," sits between the text and the number, separating them into different columns, while each line of the file is one row. Does that make "Comma-Separated Values" a little clearer?
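Since a csv file is just text, the same idea can be sketched in a few lines of Python. The file contents below are made up for illustration and are not the file from the screenshots. The plain split on "," only works here because none of the values contain a comma themselves.

```python
# Hypothetical file contents, made up for illustration.
raw_text = "apple,3\nbanana,5\ncherry,7\n"

# Each line is one row; each comma separates one column from the next.
table = [line.split(",") for line in raw_text.splitlines()]

for row in table:
    print(row)
```

Every value comes back as a string, including the numbers, which is also how Python's csv module behaves by default.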
Now for today's topic: how to use Python to read and write csv files. This is an indispensable skill in many data analysis fields.
Read and write csv file
We will use the IMDB data set as an example. IMDB is a classic movie review data set that is often used to test classification models, because every review carries one of two labels: "positive" or "negative".
We can download the IMDB data set here: https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews/download
After downloading and decompressing, we should get the IMDB csv file. I renamed it to imdb.csv here.
The reading method is very simple:
# -*- coding: utf-8 -*-
import csv

with open('imdb.csv', newline='', encoding='utf-8') as csvfile:
    rows = csv.reader(csvfile)

    for row in rows:
        print(row)
There are too many output items to list here, so take a look at what gets printed on your own machine. Each row is a list with two elements: the first is a movie review, and the second is its positive or negative label.
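One detail worth knowing: csv.reader treats the header line like any other row, so the first row it yields holds the column names rather than a review. A minimal sketch of separating the header from the data is shown below. It assumes the two column names review and sentiment, and it uses a small made-up in-memory sample in place of imdb.csv so that it runs without the download.

```python
import csv
import io

# Made-up stand-in for imdb.csv: a header row plus two short rows.
sample = io.StringIO(
    "review,sentiment\n"
    '"Great film, loved it.",positive\n'
    '"Slow, dull and far too long.",negative\n'
)

reader = csv.reader(sample)
header = next(reader)      # the first row holds the column names
data_rows = list(reader)   # the remaining rows are the actual data

print(header)
print(len(data_rows))
```

With the real file, the same pattern applies if `sample` is replaced by the file object returned by open().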
Besides this approach, we can also look up each value by its column name, which is taken from the header row on the first line of the file.
Here is a brief demonstration. The two columns are review and sentiment.
# -*- coding: utf-8 -*-
import csv

with open('imdb.csv', newline='', encoding='utf-8') as csvfile:
    rows = csv.DictReader(csvfile)

    for row in rows:
        print(row['sentiment'])
Here we only print the sentiment column, so the output contains just two kinds of values: positive and negative.
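Looking values up by column name also makes small summaries easy. As a rough sketch, the labels can be counted with collections.Counter. The sample rows below are made up so the snippet runs on its own; the counts for the real data set will of course differ.

```python
import csv
import io
from collections import Counter

# Made-up stand-in for imdb.csv with the same two column names.
sample = io.StringIO(
    "review,sentiment\n"
    '"Great film, loved it.",positive\n'
    '"Slow, dull and far too long.",negative\n'
    '"A classic.",positive\n'
)

# DictReader yields one dict per data row, keyed by the header names.
label_counts = Counter(row["sentiment"] for row in csv.DictReader(sample))

print(label_counts)
```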
After reading, let's see how to write.
# -*- coding: utf-8 -*-
import csv

with open('test.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)

    writer.writerow(['Today', 'is', 'a', 'nice', 'day', '.'])
    writer.writerow(['Today', 'is', 'a', 'bad', 'day', '.'])
Output (the contents of test.csv):
Today,is,a,nice,day,.
Today,is,a,bad,day,.
Just as easy!
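Writing also has a counterpart to DictReader: csv.DictWriter, which writes a header row and then fills each column by name. The sketch below is only an illustration. The records and the file name labels.csv are made up, and the file goes into a temporary directory so nothing in the working folder is overwritten.

```python
import csv
import os
import tempfile

# Made-up records that reuse the review/sentiment column names.
records = [
    {"review": "Great film, loved it.", "sentiment": "positive"},
    {"review": "Slow and dull.", "sentiment": "negative"},
]

# Hypothetical output path inside a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "labels.csv")

with open(out_path, "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=["review", "sentiment"])
    writer.writeheader()        # writes the line: review,sentiment
    writer.writerows(records)   # writes one line per dict

# Read the file back to check that the round trip preserves the data.
with open(out_path, newline="", encoding="utf-8") as csvfile:
    written = list(csv.DictReader(csvfile))

print(written)
```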
Supplement
As mentioned earlier, a csv file is really just "Comma-Separated Values". Seen that way, we can also build a csv file directly with Python strings.
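# -*- coding: utf-8 -*-
a = ['Today', 'is', 'a', 'nice', 'day', '.']
b = ['Today', 'is', 'a', 'bad', 'day', '.']

csvFile = ''
ab = [a, b]

for line in ab:
    # Join the words with commas so the line does not end with a stray ','
    csvFile += ','.join(line) + '\n'

# Use a with block so the file is closed once writing is done
with open('test2.csv', 'w', encoding='utf-8') as f:
    f.write(csvFile)
Output (the contents of test2.csv):
Today,is,a,nice,day,.
Today,is,a,bad,day,.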
Building the file this way is quite intuitive. These are my notes on how to read and write csv files in Python.
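One caveat with the string approach: it breaks as soon as a value contains a comma itself, which is common in text such as movie reviews. The sketch below compares a hand-built line with one produced by csv.writer, using a made-up review. With csv.writer the value is wrapped in quotes, so it can be read back as a single column.

```python
import csv
import io

# Made-up row whose first value contains a comma.
fields = ["Great film, loved it.", "positive"]

# String approach: the comma inside the review looks like a column separator.
naive_line = ",".join(fields)

# csv.writer quotes the value, so the comma stays inside one column.
buffer = io.StringIO()
csv.writer(buffer).writerow(fields)
safe_line = buffer.getvalue()

# Parse both lines again to see how many columns come back.
naive_parsed = next(csv.reader(io.StringIO(naive_line)))
safe_parsed = next(csv.reader(io.StringIO(safe_line)))

print(len(naive_parsed), naive_parsed)
print(len(safe_parsed), safe_parsed)
```

So the string approach is fine for simple values like the words above, while the csv module is the safer choice for free text.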
References
- https://docs.python.org/3/library/csv.html
- https://realpython.com/python-csv/
- http://zetcode.com/python/csv/