Your Sitemap Already Has All Your Post URLs — Here's How to Extract Them — WordsByEkta🌿
MS Word · Blogger Tips · SEO
How to Extract All Your Blog Post URLs from Your Sitemap — Using Only MS Word
No tools, no plugins, no coding. Just Find & Replace. Learn it once, use it forever.
1. What Is a Sitemap?
A sitemap is a file that keeps a list of every page and post on your blog — all in one place, in one format. It is written in XML (a structured text format) and is mainly meant for search engines like Google and Bing, so they can find and crawl your blog easily.
For a Blogger blog, your sitemap is usually available at:
https://yourblog.blogspot.com/sitemap.xml
https://yourcustomdomain.com/sitemap.xml (if you use a custom domain)
For example, mine is:
https://wordsbyektaa.blogspot.com/sitemap.xml
When you open that file, it looks something like this:
<urlset>
  <url>
    <loc>https://wordsbyektaa.blogspot.com/2026/03/indexnow-blogger-make-automation.html</loc>
    <lastmod>2026-03-12T06:27:53Z</lastmod>
  </url>
  <url>
    <loc>https://wordsbyektaa.blogspot.com/2026/03/is-bug-bounty-actually-possible-for.html</loc>
    <lastmod>2026-02-28T09:14:22Z</lastmod>
  </url>
  <url>
    <loc>https://wordsbyektaa.blogspot.com/2025/07/letters-for-quiet-moments-collection-of.html</loc>
    <lastmod>2026-02-28T09:14:22Z</lastmod>
  </url>
</urlset>
Every post URL is sitting inside <loc> and </loc> tags, with a last-modified timestamp next to it. Our job is simple: keep only the URLs, remove everything else.
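If you are curious how a program would read the same file, here is an optional sketch using Python's built-in xml.etree.ElementTree (Python 3.9 or newer). The Word method in this post needs no code at all. The sample is a trimmed copy of the sitemap above with the sitemaps.org namespace that real sitemaps declare on <urlset>, and extract_locs is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two entries from the sitemap shown above, plus the
# namespace a real sitemap declares on <urlset>.
SITEMAP = """<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://wordsbyektaa.blogspot.com/2026/03/indexnow-blogger-make-automation.html</loc>
    <lastmod>2026-03-12T06:27:53Z</lastmod>
  </url>
  <url>
    <loc>https://wordsbyektaa.blogspot.com/2025/07/letters-for-quiet-moments-collection-of.html</loc>
    <lastmod>2026-02-28T09:14:22Z</lastmod>
  </url>
</urlset>"""


def extract_locs(xml_text: str) -> list[str]:
    """Return the text of every <loc> element, in document order."""
    root = ET.fromstring(xml_text)
    # "{*}loc" matches <loc> in any namespace, or in none (Python 3.8+).
    return [el.text.strip() for el in root.iterfind(".//{*}loc") if el.text]


if __name__ == "__main__":
    for url in extract_locs(SITEMAP):
        print(url)
```

The {*} in the search path lets the same function work whether or not the file declares the namespace; the <lastmod> values are simply never selected.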
2. Why Would You Need a List of All Your URLs?
Instead of copying post links one by one from your blog (imagine doing that for 150 posts!), your sitemap already has every URL ready. Here is what you can do with a clean URL list:
- Submit URLs to Google Search Console: request indexing for new posts one URL at a time with the URL Inspection tool
- Submit to Bing Webmaster Tools — paste up to 500 URLs at once in their URL submission box
- Create a blog index page — show all your posts in one place for your readers
- Check for broken links — paste the list into a link-checker tool to find any 404 errors
- Content audit in a spreadsheet — import into Excel or Google Sheets and analyze your archive
- Share in a newsletter or social media — send your full archive to subscribers
- Backup and documentation — keep a record of every post you have ever published
If your blog has 50, 100, or 200+ posts, copying links manually is simply not practical. Your sitemap is the shortcut — and MS Word is the only tool you need to process it.
3. How to Open Your Sitemap in MS Word
Open your sitemap in a browser, for example https://yourblog.com/sitemap.xml. The browser will display the raw XML content.
Select everything with Ctrl + A, copy it with Ctrl + C, and paste it into a blank Word document with Ctrl + V.
You will see <loc>, </loc>, <lastmod>, timestamps, and other tags all mixed together. That is completely normal; this is what we will clean up now.
At the very top there is a header such as <?xml version="1.0"?><urlset xmlns="..."><url><loc>. Use your mouse to select everything before your first https:// and press Backspace to delete it. You can also press Ctrl + Home to jump to the top, then click and drag to select the junk and delete it.
After this, your document should start directly with https://...
4. Method A — The Backslash Trick 🏆 (Recommended)
This is the fastest and cleanest approach. Just one Find & Replace operation extracts every URL and strips out the tags and timestamps all at the same time. It only requires Wildcards to be turned ON.
In Word's Wildcards mode, the < and > characters are special: they mean "start of word" and "end of word". So if you type <loc> with Wildcards ON, Word does not look for the literal tag. It reads the pattern as the whole word "loc" and matches just those three letters, without the brackets.
The fix: put a backslash \ before each angle bracket. Writing \<loc\> tells Word: "find this literally, character by character." The backslash is an escape character; it turns off the special meaning of the next character.
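For readers who know regular expressions, here is the same "keep the URLs, turn everything between them into a line break" idea as an optional Python sketch (Python 3 standard library only; the Word method needs no code). In Python's re module, < and > are ordinary characters, so no backslashes are needed, and a word boundary is written \b instead. The non-greedy .*? stands in for a wildcard that stops at the nearest match. The sample text and the names urls_one_per_line, GAP_BETWEEN_URLS and TAIL_AFTER_LAST_URL are hypothetical.

```python
import re

# Hypothetical sample: sitemap text after Section 3's cleanup, so it starts
# directly with the first "https://" and still ends with the closing tags.
FLATTENED = (
    "https://yourblog.blogspot.com/2026/03/first-post.html</loc>"
    "<lastmod>2026-03-12T06:27:53Z</lastmod></url>"
    "<url><loc>https://yourblog.blogspot.com/2026/02/second-post.html</loc>"
    "<lastmod>2026-02-28T09:14:22Z</lastmod></url>"
    "<url><loc>https://yourblog.blogspot.com/2025/07/third-post.html</loc>"
    "<lastmod>2026-02-28T09:14:22Z</lastmod></url></urlset>"
)

# "<" and ">" are literal in Python regexes, so the tags need no escaping.
# ".*?" is non-greedy: it stops at the FIRST <loc> that follows.
GAP_BETWEEN_URLS = re.compile(r"</loc>.*?<loc>")
# Whatever follows the last URL: its </loc>, timestamp, </url>, </urlset>.
TAIL_AFTER_LAST_URL = re.compile(r"</loc>.*$")


def urls_one_per_line(text: str) -> str:
    """Turn every gap between two URLs into a newline, then drop the tail."""
    text = GAP_BETWEEN_URLS.sub("\n", text)
    return TAIL_AFTER_LAST_URL.sub("", text)


if __name__ == "__main__":
    print(urls_one_per_line(FLATTENED))
```

A greedy .* in the first pattern would run from the first </loc> all the way to the last <loc> and swallow every URL in between, which is why the non-greedy form matters here.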
Open Find & Replace: Ctrl + H
Click "More >>" at the bottom left → tick "Use wildcards".
The Formula
| Field | What to Type |
|---|---|
| Find What | \</loc\>*\<loc\> |
| Replace With | ^p |
| Use Wildcards | ON ✓ |
Click Replace All.
Because your document already starts directly with the first https:// (Section 3), the pattern only needs to catch what sits between one URL and the next:
- \</loc\> finds the literal closing </loc> tag right after a URL (the backslashes escape the angle brackets)
- * means "anything"; here it matches everything up to the next opening tag (the timestamp, </url>, <url>), all of which is discarded
- \<loc\> matches the literal opening <loc> tag of the next URL entry
In the Replace box, ^p puts a new line where all of that used to be, so each URL ends up on its own line.
The last URL has no <loc> after it, so it keeps its tail: </loc>, its <lastmod> timestamp, </url> and </urlset>. Just select that leftover at the very end and delete it. It takes two seconds.
5. Method B — Step-by-Step Cleanup
If Method A does not work, or if you simply prefer to see exactly what is happening at each stage, this approach removes one type of content at a time. Every step is straightforward.
Open Find & Replace with Ctrl + H before each step.
Step 1 — Remove the opening <loc> tag
| Field | Value |
|---|---|
| Find What | <loc> |
| Replace With | (leave blank) |
| Use Wildcards | OFF |
Step 2 — Remove the closing </loc> tag
| Field | Value |
|---|---|
| Find What | </loc> |
| Replace With | (leave blank) |
| Use Wildcards | OFF |
Step 3 — Remove the opening <lastmod> tag
| Field | Value |
|---|---|
| Find What | <lastmod> |
| Replace With | (leave blank) |
| Use Wildcards | OFF |
Step 4 — Remove the closing </lastmod> tag
| Field | Value |
|---|---|
| Find What | </lastmod> |
| Replace With | (leave blank) |
| Use Wildcards | OFF |
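Steps 1 to 4 are plain literal replacements (Wildcards OFF), so they map directly onto Python's str.replace. An optional sketch, run on a hypothetical two-post string that starts at the first https:// as Section 3 leaves it; strip_literal_tags is a hypothetical name.

```python
# Hypothetical sample: two sitemap entries after Section 3's cleanup.
SAMPLE = (
    "https://yourblog.blogspot.com/2026/03/first-post.html</loc>"
    "<lastmod>2026-03-12T06:27:53Z</lastmod></url>"
    "<url><loc>https://yourblog.blogspot.com/2026/02/second-post.html</loc>"
    "<lastmod>2026-02-28T09:14:22Z</lastmod></url></urlset>"
)

# Steps 1-4: the literal tags deleted one at a time, in the same order.
LITERAL_TAGS = ["<loc>", "</loc>", "<lastmod>", "</lastmod>"]


def strip_literal_tags(text: str) -> str:
    """Delete every occurrence of each tag, one tag per pass."""
    for tag in LITERAL_TAGS:
        text = text.replace(tag, "")
    return text


if __name__ == "__main__":
    print(strip_literal_tags(SAMPLE))
```

After these four passes each URL runs straight into its timestamp (for example first-post.html2026-03-12T06:27:53Z), which is exactly what Steps 5 to 7 clean up.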
Step 5 — Remove the year from the timestamp (Wildcards ON)
| Field | Value |
|---|---|
| Find What | [0-9][0-9][0-9][0-9]- |
| Replace With | (leave blank) |
| Use Wildcards | ON ✓ |
Step 6 — Remove the month and day
| Field | Value |
|---|---|
| Find What | [0-9][0-9]-[0-9][0-9]T |
| Replace With | (leave blank) |
| Use Wildcards | ON ✓ |
Step 7 — Remove the time (HH:MM:SSZ)
| Field | Value |
|---|---|
| Find What | [0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z |
| Replace With | (leave blank) |
| Use Wildcards | ON ✓ |
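Word's [0-9] wildcard means the same thing in a regular expression, so the patterns from Steps 5 to 7 carry over to Python unchanged. An optional sketch, run on a hypothetical string in the state Steps 1 to 4 leave behind; strip_timestamps is a hypothetical name. It also makes one thing visible: like Step 5 itself, the year pattern is not tied to the timestamp, so a slug containing four digits followed by a hyphen (the hypothetical best-of-2024-list.html in the usage line) loses those characters too. If any of your slugs look like that, check them after Step 5.

```python
import re

# Steps 5-7: the Word wildcard patterns work unchanged as Python regexes.
TIMESTAMP_STEPS = [
    r"[0-9][0-9][0-9][0-9]-",              # Step 5: year, e.g. "2026-"
    r"[0-9][0-9]-[0-9][0-9]T",             # Step 6: month-day, e.g. "03-12T"
    r"[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z",  # Step 7: time, e.g. "06:27:53Z"
]


def strip_timestamps(text: str) -> str:
    """Apply Steps 5-7 in order, deleting every match of each pattern."""
    for pattern in TIMESTAMP_STEPS:
        text = re.sub(pattern, "", text)
    return text


# Hypothetical text in the state Steps 1-4 leave it.
AFTER_STEP_4 = (
    "https://yourblog.blogspot.com/2026/03/first-post.html2026-03-12T06:27:53Z</url>"
    "<url>https://yourblog.blogspot.com/2026/02/second-post.html2026-02-28T09:14:22Z</url></urlset>"
)

if __name__ == "__main__":
    print(strip_timestamps(AFTER_STEP_4))
    # The caveat: "2024-" inside a slug is removed as well.
    print(strip_timestamps("https://x.example/2024/01/best-of-2024-list.html"))
```

The /2026/03/ folders in Blogger URLs are safe because a slash, not a hyphen, follows the year.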
Step 8 — Replace the XML separators between entries with a new line
| Field | Value |
|---|---|
| Find What | </url><url> |
| Replace With | ^p |
| Use Wildcards | OFF |
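Step 8 is one more literal replacement. By this point Step 1 has already removed every <loc>, so the text separating two URLs is just </url><url>. An optional Python sketch (Python 3.9 or newer) on a hypothetical string in the state Steps 1 to 7 leave behind, where "\n" plays the role of Word's ^p; split_entries is a hypothetical name.

```python
# Hypothetical text after Steps 1-7: URLs separated by "</url><url>",
# with the closing tags still attached to the last one.
AFTER_STEP_7 = (
    "https://yourblog.blogspot.com/2026/03/first-post.html</url>"
    "<url>https://yourblog.blogspot.com/2026/02/second-post.html</url></urlset>"
)


def split_entries(text: str) -> list[str]:
    """Step 8, plus deleting the closing tags left after the last URL."""
    text = text.replace("</url><url>", "\n")    # like Replace With ^p
    text = text.removesuffix("</url></urlset>")  # the manual cleanup at the end
    return text.splitlines()


if __name__ == "__main__":
    for url in split_entries(AFTER_STEP_7):
        print(url)
```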
What is ^p? In Word's Find & Replace, ^p means "paragraph break", the same as pressing Enter. When you type ^p in the Replace With box, Word inserts a new line at every match. This is what puts each URL on its own line. After Step 8, delete the </url></urlset> left after the last URL.
6. What the Final Result Looks Like
After either method, your document will look like this — one clean URL per line, nothing else:
https://wordsbyektaa.blogspot.com/2025/07/103-free-places-to-submit-personal.html
https://wordsbyektaa.blogspot.com/2025/07/how-to-set-up-your-blogger-about-me-or.html
https://wordsbyektaa.blogspot.com/2025/07/in-world-of-shiny-promises-be-soulful.html
https://wordsbyektaa.blogspot.com/2025/07/chapter-1-exploring-significance-of.html
https://wordsbyektaa.blogspot.com/2025/07/week-wise-skincare-plan-for-beginners.html
...
No tags, no timestamps, no XML clutter. This list is now ready to use directly:
- Paste into Google Search Console → URL Inspection to request indexing
- Paste into Bing Webmaster Tools → Submit URLs for bulk submission
- Import into Excel or Google Sheets for a content audit
- Run through a broken link checker to find any 404 errors
- Use as the source for an All Posts index page on your blog
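Section 2 notes that Bing's submission box takes up to 500 URLs at a time. If your list is longer, an optional sketch like this one splits it into paste-sized batches and skips blank lines and duplicates along the way. The 500 figure is taken from Section 2, so treat BATCH_SIZE as an assumption and set it to whatever limit your Bing Webmaster Tools account shows; batches is a hypothetical name.

```python
from typing import Iterator, List

BATCH_SIZE = 500  # limit quoted in Section 2; adjust if your account differs


def batches(urls: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` URLs, without blanks or duplicates."""
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            cleaned.append(url)
    for start in range(0, len(cleaned), size):
        yield cleaned[start:start + size]


if __name__ == "__main__":
    # Hypothetical list of 1,203 URLs -> batches of 500, 500 and 203.
    sample = [f"https://yourblog.blogspot.com/post-{n}.html" for n in range(1203)]
    for i, batch in enumerate(batches(sample), start=1):
        print(f"Batch {i}: {len(batch)} URLs")
```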
Words by Ekta · MS Word Tips · Blogger Series
If any step gives you trouble, drop a comment — happy to help! 😊