Last Updated on 2022-08-04 by Clay
In data processing, we sometimes want to replace certain column values with others, for example missing or incorrect values. Python, of course, offers many tools, packages, and functions for this kind of replacement.
Today I want to record how to replace specific column values in a pandas DataFrame.
replace()
pandas provides a built-in replace() method on DataFrame (and on Series). Assume we have some data like this:
# coding: utf-8
import pandas as pd


def main():
    df = pd.DataFrame(
        [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]],
        columns=["A", "B", "C"],
    )

    print(df.head())


if __name__ == "__main__":
    main()
Output:
   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
So, if we want to replace all the values in column C, for example changing them to zero:
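# coding: utf-8
import pandas as pd


def main():
    df = pd.DataFrame(
        [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]],
        columns=["A", "B", "C"],
    )

    # Assign the result back to the column. Calling replace(..., inplace=True)
    # on df["C"] is a chained operation: recent pandas versions warn about it,
    # and with Copy-on-Write it no longer modifies df.
    df["C"] = df["C"].replace({3: 0, 6: 0, 9: 0})

    print(df.head())


if __name__ == "__main__":
    main()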
Output:
   A  B  C
0  1  2  0
1  4  5  0
2  7  8  0
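As a small sketch of another way to target a single column, replace() can also be called on the whole DataFrame with a nested dictionary, where the outer key names the column and the inner dictionary maps old values to new ones. The sample data is the same as above; a value such as 3 appearing in any other column would be left untouched.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]],
    columns=["A", "B", "C"],
)

# Outer key: column name. Inner dict: old value -> new value.
# Only column C is searched; columns A and B are not modified.
df = df.replace({"C": {3: 0, 6: 0, 9: 0}})

print(df)
```

This form returns a new DataFrame, so the result is assigned back to df.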
Use a custom rule to replace
You might then say that the values you want to replace follow a more complex rule.
As you may have noticed, the old and new values above are simply stored in a dict. This means that if we can write a rule that builds a dictionary mapping each old value to its new value, we can pass that dictionary straight to replace().
Suppose I want to multiply every value in column C by 2 but cap the result at 15. How should I do it?
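# coding: utf-8
from typing import Dict

import pandas as pd


def replace_map(df) -> Dict:
    # Map each original value in column C to its doubled value, capped at 15
    replace_values = dict()
    for index, row in df.iterrows():
        replace_values[row["C"]] = min(15, row["C"] * 2)

    return replace_values


def main():
    df = pd.DataFrame(
        [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]],
        columns=["A", "B", "C"],
    )

    # Assign the result back to the column. Calling replace(..., inplace=True)
    # on df["C"] is a chained operation: recent pandas versions warn about it,
    # and with Copy-on-Write it no longer modifies df.
    df["C"] = df["C"].replace(replace_map(df))

    print(df.head())


if __name__ == "__main__":
    main()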
Output:
   A  B   C
0  1  2   6
1  4  5  12
2  7  8  15
I cannot guarantee this is the best way, but it has solved my problem nicely so far. If there is a better way, please let me know. I'd like to learn better approaches!
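One possible alternative, offered only as a sketch: when the rule is plain arithmetic like "double it, but never exceed 15", the mapping dictionary can be skipped entirely by computing on the column and capping the result with Series.clip(). This is a different technique from the replace() approach above, and it assumes the rule can be written as a vectorized expression; rules that depend on lookups or irregular conditions may still be easier to express as a dictionary.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]],
    columns=["A", "B", "C"],
)

# Double every value in column C, then cap anything above 15 at 15.
df["C"] = (df["C"] * 2).clip(upper=15)

print(df)
```

For the sample data this gives the same column C values as the dictionary version (6, 12, 15) without looping over the rows.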
References
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html
- https://www.geeksforgeeks.org/python-pandas-dataframe-replace/