Skip to content

[WordPress] Get All Published Articles

In the process of using WordPress to build a website, we may have published an uncountable number of articles. So, if one day we need to get information about all published articles, how should we do?

In WordPress, there are many ways to get all article information, such as REST API or using PHP functions to obtain article information. These are all good methods.

But in fact, we can directly export article information from WordPress backend. If you can use some program to delete unnecessary XML tags in the exported information, you can quickly get the information such as title of article.

In below, this article introduce how to use Python Regular Expression to process the XML file exported by WordPress. It takes no more than 10 minutes, which is very convenient and fast. (Note: This is refers to the case where python is already installed.)


Get the number and name of articles

In fact, we can get a lot of information from the exported XML file. In order to facilitate demonstration, only the article number and article title are taken below.


Step 1: Export articles from WordPress backend

First, go to the background of wp-admin backend and select Tools > Export Program from the options on the left.

Then select the exported item. In here, I will only export the published article information, which means that the draft will not be stored in the XML file I exported.

The download speed is usually very fast, after all, it is only a few MB.

Open the downloaded XML file, it should look like the following picture:


Step 2: Use a program to delete unnecessary tags (take Python as an example)

You should find that in this XML file, all article titles have the following format:

<title><![CDATA[TITLE]]></title>


So we can use Regular Expression to extract the title of the aritcles.

# coding: utf-8
import re


def main():
    text = open("YOUR_WORDPRESS_WEBSITE.xml", "r").read()
    matches = re.findall("<title><!\[CDATA\[(.+)\]\]></title>", text)

    for index, item in enumerate(matches, 1):
        print(index, item)


if __name__ == "__main__":
    main()


Output:

In this way, you can get the titles of all published articles. Of course if you want to extract other information, you need to observe the format of the other information in the XML file


References


Read More

Leave a Reply