Your Sitemap Already Has All Your Post URLs — Here's How to Extract Them — WordsByEkta🌿

How to Extract All Your Blog Post URLs from Your Sitemap — Using Only MS Word

No tools, no plugins, no coding. Just Find & Replace. Learn it once, use it forever.


1. What Is a Sitemap?

A sitemap is a file that keeps a list of every page and post on your blog — all in one place, in one format. It is written in XML (a structured text format) and is mainly meant for search engines like Google and Bing, so they can find and crawl your blog easily.

For a Blogger blog, your sitemap is usually available at:

https://yourblog.blogspot.com/sitemap.xml
https://yourcustomdomain.com/sitemap.xml

For example, mine is:

https://wordsbyektaa.blogspot.com/sitemap.xml

When you open that file, it looks something like this:

<urlset>
  <url>
    <loc>https://wordsbyektaa.blogspot.com/2026/03/indexnow-blogger-make-automation.html</loc>
    <lastmod>2026-03-12T06:27:53Z</lastmod>
  </url>
  <url>
    <loc>https://wordsbyektaa.blogspot.com/2026/03/is-bug-bounty-actually-possible-for.html</loc>
    <lastmod>2026-02-28T09:14:22Z</lastmod>
  </url>
  <url>
    <loc>https://wordsbyektaa.blogspot.com/2025/07/letters-for-quiet-moments-collection-of.html</loc>
    <lastmod>2026-02-28T09:14:22Z</lastmod>
  </url>
</urlset>

Every post URL is sitting inside <loc> and </loc> tags, with a last-modified timestamp next to it. Our job is simple: keep only the URLs, remove everything else.
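
Optional, for readers who like to script: the same idea (keep only what sits inside the <loc> tags) can be sketched in Python. This is a minimal illustration and not part of the Word workflow. The function name extract_locs and the sample string are made up for the example, and the xmlns value is an assumption: it is the standard sitemaps.org namespace that real sitemaps usually declare, which the shortened sample above leaves out.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the sitemap shown above. The xmlns attribute is
# an assumption: real sitemaps usually declare this sitemaps.org namespace.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://wordsbyektaa.blogspot.com/2026/03/indexnow-blogger-make-automation.html</loc>
    <lastmod>2026-03-12T06:27:53Z</lastmod>
  </url>
  <url>
    <loc>https://wordsbyektaa.blogspot.com/2025/07/letters-for-quiet-moments-collection-of.html</loc>
    <lastmod>2026-02-28T09:14:22Z</lastmod>
  </url>
</urlset>"""


def extract_locs(xml_text: str) -> list[str]:
    """Return the text of every <loc> element, in document order."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        # With a namespace declared, tags look like "{namespace}loc",
        # so compare only the part after the closing brace.
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text:
            urls.append(element.text.strip())
    return urls


for url in extract_locs(SAMPLE_SITEMAP):
    print(url)
```

The <lastmod> values are simply never read, which is the scripted version of "keep only the URLs, remove everything else".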


2. Why Would You Need a List of All Your URLs?

You could copy post links one by one from your blog (imagine doing that for 150 posts!), but your sitemap already has every URL ready. Here is what you can do with a clean URL list:

  • Submit URLs to Google Search Console: request indexing for new posts, one URL at a time, through URL Inspection
  • Submit to Bing Webmaster Tools — paste up to 500 URLs at once in their URL submission box
  • Create a blog index page — show all your posts in one place for your readers
  • Check for broken links — paste the list into a link-checker tool to find any 404 errors
  • Content audit in a spreadsheet — import into Excel or Google Sheets and analyze your archive
  • Share in a newsletter or social media — send your full archive to subscribers
  • Backup and documentation — keep a record of every post you have ever published

If your blog has 50, 100, or 200+ posts, copying links manually is simply not practical. Your sitemap is the shortcut — and MS Word is the only tool you need to process it.
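
If your list is longer than a submission box accepts, it helps to split it first. Here is a small Python sketch of that step. The function name batches and the placeholder URLs are hypothetical, and the default size of 500 is simply the per-paste figure quoted in the list above, so adjust it if your tool shows a different limit.

```python
def batches(urls, size=500):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Placeholder list standing in for a real archive of 1,200 posts.
all_urls = [f"https://yourblog.blogspot.com/2026/03/post-{n}.html" for n in range(1200)]

for number, group in enumerate(batches(all_urls), start=1):
    print(f"Batch {number}: {len(group)} URLs")
```

Each batch can then be pasted on its own, so no single paste goes over the limit.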


3. How to Open Your Sitemap in MS Word

Step 1: Open your sitemap in a browser. Go to your browser and type your sitemap URL:
https://yourblog.com/sitemap.xml
The browser will display the raw XML content.
Step 2: Select everything on the page. Press Ctrl + A to select all the XML content.
Step 3: Copy it. Press Ctrl + C.
Step 4: Open a new blank document in MS Word. Open Word → New Blank Document → press Ctrl + V to paste.
💡 What it looks like after pasting: You will see a long run of text with <loc>, </loc>, <lastmod>, timestamps, and other tags all mixed together. That is completely normal; this is what we will clean up now.
Step 5: Manually delete everything above the first URL. At the very beginning of the document you will see something like:
<?xml version="1.0"?><urlset xmlns="..."><url><loc>

Use your mouse to select everything before your first https:// and press Backspace to delete it. You can also press Ctrl + Home to jump to the top, then click and drag to select the junk, and delete.

After this, your document should start directly with https://...
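
The same trimming step can be written as a short Python sketch for anyone working outside Word. The function name drop_leading_markup and the sample string are made up for illustration. It assumes your post URLs start with https:// and that the only earlier address in the file is the sitemap namespace, which normally starts with http:// and is therefore skipped.

```python
def drop_leading_markup(pasted: str) -> str:
    """Cut everything that comes before the first https:// in the text."""
    start = pasted.find("https://")
    # If no https:// is found, return the text unchanged rather than emptying it.
    return pasted if start == -1 else pasted[start:]


# Hypothetical pasted text; the xmlns value is the usual sitemaps.org namespace.
pasted = (
    '<?xml version="1.0"?>'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://yourblog.blogspot.com/2026/03/post.html</loc></url></urlset>"
)

print(drop_leading_markup(pasted))
```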

4. Method A — The Backslash Trick 🏆 (Recommended)

This is the fastest and cleanest approach. Just one Find & Replace operation extracts every URL and strips out the tags and timestamps all at the same time. It only requires Wildcards to be turned ON.

Why do we need backslashes here?
In Word's Wildcards mode, the < and > characters are special: they mean "start of word" and "end of word" in wildcard syntax. So if you type <loc> with Wildcards ON, Word treats the brackets as word-boundary markers instead of looking for the literal tag.

The fix: put a backslash \ before each angle bracket. Writing \<loc\> tells Word: "find this literally, character by character." The backslash is an escape character — it turns off the special meaning of the next character.
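
Escaping works the same way in other pattern languages, although the list of special characters differs. As a loose analogy only (this is Python's regular-expression engine, not Word's wildcard syntax), the dot is the special character in the sketch below, while angle brackets need no escaping there. The filename and test strings are made-up examples.

```python
import re

filename = "sitemap.xml"

# Unescaped, the dot means "any one character", so the pattern also
# accepts a string that is not the real filename.
loose_match = re.fullmatch("sitemap.xml", "sitemapAxml") is not None

# re.escape puts a backslash before the dot, so it must match a literal dot.
strict_pattern = re.escape(filename)
strict_wrong = re.fullmatch(strict_pattern, "sitemapAxml") is not None
strict_right = re.fullmatch(strict_pattern, filename) is not None

print(loose_match, strict_wrong, strict_right)  # prints: True False True
```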

Open Find & Replace: Ctrl + H
Click "More >>" at the bottom left → tick "Use wildcards".

The Formula

Field            What to Type
Find What        \<loc\>(*)\</loc\>*\<url\>\<loc\>
Replace With     \1^p
Use Wildcards    ON ✓

Click Replace All.

🔍 Breaking down what this formula does:

\<loc\> — finds the literal opening <loc> tag (backslashes escape the angle brackets)
(*) — the * means "anything" — this captures your URL; the parentheses save it as Group 1
\</loc\> — finds the literal closing </loc> tag
* — matches everything in between (the timestamp, </url>, <url>) — all discarded
\<url\>\<loc\> — matches the opening of the next URL entry

In the Replace: \1 puts back the captured URL (Group 1), and ^p adds a new line after it.
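⚠️ Check the result before you use it: The last entry has no following <url><loc> to match, so it keeps its trailing markup (</loc>, the <lastmod> timestamp, and </url></urlset>); select and delete that by hand. Each match also consumes the <loc> that opens the next entry, so Replace All may skip alternate entries and leave their tags and timestamps in place. If you see leftovers like that, use Method B below instead.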
⚠️ If this method gives no results: When you paste the sitemap, Word may have broken the XML across multiple lines (paragraphs). Method A relies on everything being on one continuous line, so in that case it will not match. Switch to Method B below, which removes one piece at a time.
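
For comparison, here is how the "capture what sits between <loc> and </loc>" idea looks as a Python regular expression. This is a sketch of the same concept in a different tool, not a translation of the Word formula. The names LOC_PATTERN and urls_from_pasted_text are hypothetical, and the sample text is shortened from the sitemap shown in section 1.

```python
import re

# Capture the text between each <loc> and </loc> pair.
# The lazy (.*?) stops at the nearest </loc>; re.DOTALL lets a match
# span line breaks in case the pasted text was split across lines.
LOC_PATTERN = re.compile(r"<loc>\s*(.*?)\s*</loc>", re.DOTALL)


def urls_from_pasted_text(text: str) -> list[str]:
    """Return every URL found inside <loc> tags, in order."""
    return LOC_PATTERN.findall(text)


pasted = (
    "<url><loc>https://wordsbyektaa.blogspot.com/2026/03/indexnow-blogger-make-automation.html</loc>"
    "<lastmod>2026-03-12T06:27:53Z</lastmod></url>"
    "<url><loc>https://wordsbyektaa.blogspot.com/2025/07/letters-for-quiet-moments-collection-of.html</loc>"
    "<lastmod>2026-02-28T09:14:22Z</lastmod></url>"
)

print("\n".join(urls_from_pasted_text(pasted)))
```

Because each match ends at its own </loc>, every entry is picked up independently and the timestamps are never touched.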

5. Method B — Step-by-Step Cleanup

If Method A does not work, or if you simply prefer to see exactly what is happening at each stage, this approach removes one type of content at a time. Every step is straightforward.

Open Find & Replace with Ctrl + H before each step.

Step 1 — Remove the opening <loc> tag

Field            Value
Find What        <loc>
Replace With     (leave blank)
Use Wildcards    OFF

Step 2 — Remove the closing </loc> tag

Field            Value
Find What        </loc>
Replace With     (leave blank)
Use Wildcards    OFF

Step 3 — Remove the opening <lastmod> tag

Field            Value
Find What        <lastmod>
Replace With     (leave blank)
Use Wildcards    OFF

Step 4 — Remove the closing </lastmod> tag

Field            Value
Find What        </lastmod>
Replace With     (leave blank)
Use Wildcards    OFF

Step 5 — Remove the year from the timestamp (Wildcards ON)

Field            Value
Find What        [0-9][0-9][0-9][0-9]-
Replace With     (leave blank)
Use Wildcards    ON ✓

Step 6 — Remove the month and day

Field            Value
Find What        [0-9][0-9]-[0-9][0-9]T
Replace With     (leave blank)
Use Wildcards    ON ✓

Step 7 — Remove the time (HH:MM:SSZ)

Field            Value
Find What        [0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z
Replace With     (leave blank)
Use Wildcards    ON ✓

Step 8 — Replace the XML separators between entries with a new line

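Field            Value
Find What        </url><url>
Replace With     ^p
Use Wildcards    OFF

Steps 1 and 2 already removed every <loc> tag, so the only separator left between entries is </url><url>. The last entry will still end with </url></urlset>; delete that by hand.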
💡 What is ^p? In Word's Find & Replace, ^p means "paragraph break", the same as pressing Enter. When you type ^p in the Replace With box, Word inserts a new line at every match. This is what puts each URL on its own line.
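
To see the eight steps as one sequence, here is a Python sketch that mirrors them in order. It is an illustration of the logic, not a replacement for the Word steps. The function name cleanup_step_by_step is hypothetical, and the sketch assumes the text already starts at the first URL (as it does after the manual trimming in section 3) and sits on one continuous line.

```python
import re


def cleanup_step_by_step(text: str) -> list[str]:
    """Mirror Steps 1-8 on text that already starts at the first URL."""
    # Steps 1-4: delete the literal tags (Wildcards OFF in Word).
    for tag in ("<loc>", "</loc>", "<lastmod>", "</lastmod>"):
        text = text.replace(tag, "")
    # Step 5: the year and its trailing hyphen, e.g. "2026-".
    text = re.sub(r"[0-9]{4}-", "", text)
    # Step 6: month, day and the "T" separator, e.g. "03-12T".
    text = re.sub(r"[0-9]{2}-[0-9]{2}T", "", text)
    # Step 7: the time and the closing "Z", e.g. "06:27:53Z".
    text = re.sub(r"[0-9]{2}:[0-9]{2}:[0-9]{2}Z", "", text)
    # Step 8: with the <loc> tags gone, entries are separated by "</url><url>".
    text = text.replace("</url><url>", "\n")
    # Manual tidy-up: drop the closing tags left after the last URL.
    text = text.replace("</url></urlset>", "")
    return [line.strip() for line in text.splitlines() if line.strip()]


pasted = (
    "https://wordsbyektaa.blogspot.com/2026/03/indexnow-blogger-make-automation.html</loc>"
    "<lastmod>2026-03-12T06:27:53Z</lastmod></url><url>"
    "<loc>https://wordsbyektaa.blogspot.com/2025/07/letters-for-quiet-moments-collection-of.html</loc>"
    "<lastmod>2026-02-28T09:14:22Z</lastmod></url></urlset>"
)

for url in cleanup_step_by_step(pasted):
    print(url)
```

One thing the sketch makes visible: the Step 5 pattern matches any four digits followed by a hyphen, so a post slug containing something like 2025-edition would lose its 2025- as well. If any of your slugs look like that, check those lines after Step 5.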

6. What the Final Result Looks Like

After either method, your document will look like this — one clean URL per line, nothing else:

https://wordsbyektaa.blogspot.com/2025/07/103-free-places-to-submit-personal.html
https://wordsbyektaa.blogspot.com/2025/07/how-to-set-up-your-blogger-about-me-or.html
https://wordsbyektaa.blogspot.com/2025/07/in-world-of-shiny-promises-be-soulful.html
https://wordsbyektaa.blogspot.com/2025/07/chapter-1-exploring-significance-of.html
https://wordsbyektaa.blogspot.com/2025/07/week-wise-skincare-plan-for-beginners.html
...

No tags, no timestamps, no XML clutter. This list is now ready to use directly:

  • Paste URLs one at a time into Google Search Console → URL Inspection to request indexing
  • Paste into Bing Webmaster Tools → Submit URLs for bulk submission
  • Import into Excel or Google Sheets for a content audit
  • Run through a broken link checker to find any 404 errors
  • Use as the source for an All Posts index page on your blog
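
For the spreadsheet use case, a clean list can be turned into a CSV file with a few extra columns. The Python sketch below is one possible way to do that. The function names audit_rows and to_csv are hypothetical, and it assumes the /year/month/slug.html URL layout seen in the examples above; URLs with a different shape get empty year and month columns.

```python
import csv
import io
from urllib.parse import urlparse

FIELDS = ["year", "month", "slug", "url"]


def audit_rows(urls):
    """Split each post URL into year, month and slug columns."""
    rows = []
    for url in urls:
        parts = urlparse(url).path.strip("/").split("/")
        if len(parts) == 3:
            year, month, slug = parts
        else:
            # Not a /year/month/slug URL: keep the last path part as the slug.
            year, month, slug = "", "", parts[-1]
        if slug.endswith(".html"):
            slug = slug[: -len(".html")]
        rows.append({"year": year, "month": month, "slug": slug, "url": url})
    return rows


def to_csv(rows):
    """Return the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


urls = [
    "https://wordsbyektaa.blogspot.com/2025/07/103-free-places-to-submit-personal.html",
    "https://wordsbyektaa.blogspot.com/2025/07/week-wise-skincare-plan-for-beginners.html",
]

print(to_csv(audit_rows(urls)))
```

Saving that text as a .csv file gives you something Excel or Google Sheets can open directly, with posts sortable by year and month.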

🗂️ Quick Reference — Both Methods

Method A (Wildcards ON): extracts URLs and removes timestamps in one step
Find: \<loc\>(*)\</loc\>*\<url\>\<loc\>  →  Replace: \1^p
Method B (8 steps): removes each piece separately
Remove <loc> tags → remove <lastmod> tags → remove date → remove time → replace separators with ^p
🎉 Remember: Your sitemap is not just for search engines; it is a powerful tool for you too. Whenever you need a full list of your post URLs, open your sitemap, paste it into Word, and you have every link ready in under a minute. No copy-pasting one by one, ever again.

Words by Ekta · MS Word Tips · Blogger Series
If any step gives you trouble, drop a comment — happy to help! 😊
