
[Python] How To Replace Values In A Pandas DataFrame

Last Updated on 2022-08-04 by Clay

When processing data, we sometimes need to replace certain column values with others, for example missing or incorrect values. Python offers many tools, packages, and functions for this kind of replacement.

Today I want to record how to replace specific column values in a pandas DataFrame.


replace()

pandas DataFrame and Series objects have a built-in replace() method. Assume we have some data like this:

# coding: utf-8
import pandas as pd


def main():
    df = pd.DataFrame(
        [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]],
        columns=["A", "B", "C"],
    )

    print(df.head())


if __name__ == "__main__":
    main()


Output:

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9


So, if we want to replace every value in column C, for example with zero, we can pass a dict that maps each old value to its new value:

# coding: utf-8
import pandas as pd


def main():
    df = pd.DataFrame(
        [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]],
        columns=["A", "B", "C"],
    )

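    # Assign the result back instead of using inplace=True on df["C"],
    # which newer pandas versions warn about (chained assignment).
    df["C"] = df["C"].replace({3: 0, 6: 0, 9: 0})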

    print(df.head())


if __name__ == "__main__":
    main()


Output:

   A  B  C
0  1  2  0
1  4  5  0
2  7  8  0
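
As a minimal sketch of two related options (not necessarily how the example above was meant to be written): replace() can also be called on the whole DataFrame with a nested dict that limits the replacement to one column, and when every value in a column should become the same constant, plain assignment avoids listing the old values at all. The variable names `replaced` and `assigned` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]],
    columns=["A", "B", "C"],
)

# Nested dict: only in column "C", map 3, 6 and 9 to 0.
# replace() returns a new DataFrame and leaves df unchanged.
replaced = df.replace({"C": {3: 0, 6: 0, 9: 0}})

# If the whole column should hold one constant, assign it directly.
assigned = df.copy()
assigned["C"] = 0

print(replaced)
print(assigned)
```

Both results have zeros in column C, while columns A and B keep their original values.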


Use a custom rule to replace

Now, you might say that the values you want to replace follow a more complex rule.

As you may have noticed, the old and new values above are simply stored in a dict. This means that if we can write a rule that builds a dictionary mapping each old value to its new value, we can pass that dictionary straight to replace().

Suppose I want to multiply all the values in column C by 2 but cap the result at 15. How should I do it?

# coding: utf-8
from typing import Dict
import pandas as pd


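def replace_map(df) -> Dict:
    """Map each value in column C to double that value, capped at 15."""
    replace_values = dict()

    # Only column C is needed, so loop over its distinct values
    # instead of iterating over every row of the DataFrame.
    for value in df["C"].unique():
        replace_values[value] = min(15, value * 2)

    return replace_values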



def main():
    df = pd.DataFrame(
        [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]],
        columns=["A", "B", "C"],
    )

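    # Assign the result back instead of using inplace=True on df["C"],
    # which newer pandas versions warn about (chained assignment).
    df["C"] = df["C"].replace(replace_map(df))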

    print(df.head())


if __name__ == "__main__":
    main()


Output:

   A  B   C
0  1  2   6
1  4  5  12
2  7  8  15
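
For a purely numeric rule like this one, the mapping dictionary may not be needed at all. The sketch below is one possible alternative, assuming the same sample data: it applies the rule to the whole column with arithmetic and Series.clip(), and also shows Series.map() for rules that cannot be expressed as a simple cap. The names `doubled_capped` and `mapped` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]],
    columns=["A", "B", "C"],
)

# Double every value in C, then cap the result at 15.
doubled_capped = (df["C"] * 2).clip(upper=15)

# The same rule applied value by value with map(); useful when the
# rule is arbitrary Python logic rather than a simple upper bound.
mapped = df["C"].map(lambda value: min(15, value * 2))

df["C"] = doubled_capped
print(df)
```

Both approaches give 6, 12 and 15 for column C, matching the output above. Neither builds an intermediate dictionary, which may matter on larger DataFrames.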


I cannot guarantee this is the best way, but it has solved my problem nicely so far. If there is a better way, please let me know. I'd like to learn better ways!

