Extract Blogger Posts to Text: An Alternative to Google Takeout — WordsByEkta🌿
The Google Takeout Failure: How I Rescued 95 Posts and Built a Searchable Library from Scratch
Some things break before they get better. This is the story of a failed export, a clever workaround, and a library I now fully own.
The Problem
No One Warns You About This
I have been writing on WordsByEkta for years. Essays on identity, motherhood, feminism, healing — things that took time, thought, and a kind of emotional labour that doesn't show up in word counts. Over time, the blog grew. Posts accumulated. And one day I looked at it and thought: this needs to be organised. Properly.
Not just a list of links somewhere. A real library. Something a reader could walk into and find exactly what they were looking for — whether that was a piece on mom guilt, the Bhagavad Gita, or the peculiar exhaustion of a creative who has too many ideas and not enough hours.
Simple enough plan. Until I tried to actually do it.
Step One
The Official Way Failed
The sensible first move was Google Takeout — the tool Google provides to export your own data from your own blog. I ran it. I waited. The file arrived.
It was an .atom file. It contained 13 articles — out of 95.
Thirteen. I sat with that number for a moment. The official export of my own content, from my own blog, returned less than 14% of what existed. There was no error message. No warning. Just a quietly incomplete file and no explanation for where the other 82 posts had gone.
If you have ever felt locked out of something you built yourself, you know exactly what that moment feels like.
The Official Route (If it works for you)
Before trying my JSON trick, here is the official process Google provides. It is a journey through multiple tabs and settings:
- Blogger Settings: Go to Settings > Manage Blog and click Back up content. A dialog box will appear; click Download.
- The Takeout Hand-off: You will be redirected to a new Google Takeout tab. Because you clicked from Blogger, the "Blogger" product is already ticked. Note: the UI lists "multiple formats" (Atom for feeds, JSON for metadata, CSV for reactions); these look like dropdowns but are actually static labels.
- The Configuration: Click Next Step.
- Destination: Send download link via email (Auto-selected).
- Frequency: Export once.
- File type & size: .zip and 2 GB.
- The Extraction: Once you receive the email, you'll download a zip (e.g., takeout-20260403...zip). Inside, the path is Takeout > Blogger > Blogs > [Your Blog Name]. This folder contains your settings.csv, theme-classic.html, theme-layouts.xml, favicon.ico, followers.csv, and the feed.atom file where your posts live.
The Warning: This is where I stopped. My feed.atom file only contained 13 of my 95 posts. No explanation. No error. If this happens to you, proceed to Step Two.
Step Two
Finding the Master Key
I did not give up. I started looking for another way in.
Blogger, like most Google products, has a JSON feed — a raw data endpoint that the platform uses internally but rarely advertises to users. The URL pattern is straightforward (replace yourblog with your subdomain):
yourblog.blogspot.com/feeds/posts/default?alt=json&max-results=150
I opened it. The browser filled with a wall of raw JSON — messy, deeply nested, not designed to be read by a human. But it was all there. Every post. Every title. Every URL. 95 entries, not 13.
I copied the entire thing into Notepad and saved it as a text file. That text file became my starting point — the Master File.
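If you want to sanity-check your Master File before involving any AI, a few lines of JavaScript can count the entries and pull out each title and URL. The field names below (feed.entry, title.$t, the link with rel "alternate") follow Blogger's GData-style JSON feed; the two-post sample object is a made-up miniature for illustration, not my real feed.

```javascript
// Pull { title, url } pairs out of a Blogger JSON feed object.
// Blogger wraps text values in { $t: "..." } and lists several
// links per entry; the one with rel "alternate" is the public URL.
function listPosts(feedJson) {
  const entries = (feedJson.feed && feedJson.feed.entry) || [];
  return entries.map((entry) => ({
    title: entry.title.$t,
    url: entry.link.find((l) => l.rel === "alternate").href,
  }));
}

// Hypothetical miniature of the real feed, for illustration only.
const sampleFeed = {
  feed: {
    entry: [
      {
        title: { $t: "On Mom Guilt" },
        link: [
          { rel: "self", href: "https://example.blogspot.com/feeds/posts/default/1" },
          { rel: "alternate", href: "https://example.blogspot.com/2024/01/on-mom-guilt.html" },
        ],
      },
      {
        title: { $t: "Reading the Bhagavad Gita" },
        link: [
          { rel: "alternate", href: "https://example.blogspot.com/2024/02/bhagavad-gita.html" },
        ],
      },
    ],
  },
};

const posts = listPosts(sampleFeed);
console.log(posts.length);                 // how many posts the feed really holds
console.log(posts[0].title, posts[0].url); // first rescued entry
```

If the count printed here matches your actual post count, the feed contains everything Takeout dropped.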
Step Three
Where AI Helped — and Where It Didn't
I handed the text file to Gemini as its Source of Truth and asked it to do what seemed logical: read the content and return for each post a title, a set of keywords, and a short description.
It gave me three or four correct ones. Then it started hallucinating.
Names that didn't exist. Topics I had never written about. Descriptions that had the right shape but were pointing at the wrong post entirely. This is a known limitation — when you give a large language model a massive blob of unstructured data and ask it to extract specific meaning from each piece, it starts confabulating. The task is too open, the data too noisy, the structure too loose.
So I stopped fighting the tool's weakness and changed the approach entirely.
Pro Tip: Gathering your URL List
I have been building my "Master Index" manually since Day One, categorizing every article under specific headings as I write. This discipline gave me my list of 95 URLs ready to go.
However, if you are starting this rescue mission after publishing dozens of posts, don't copy them one by one. You can extract every single URL from your Blogger sitemap in seconds using my Microsoft Word Wildcard trick:
The Sitemap Extraction Shortcut →
Step Four
The Pivot That Changed Everything
Using the clean list of URLs I had already curated (see the Pro Tip above), I fed them to Gemini ten at a time. By pointing the AI to specific URLs within my raw JSON text file, I removed the "noise" and focused its attention on a single task: Recovering the metadata.
This "Batch Strategy" was the final breakthrough. Instead of asking the AI to "Read all 95 posts," which leads to hallucinations and data-fatigue, I treated it like an investigative interview. One batch. Ten URLs. Total accuracy.
Instead of asking Gemini to keep guessing from the raw JSON blob, I flipped the approach. I had already shared the master text file with it as its source of truth. So now I gave it the URLs from my existing HTML, ten at a time, with one precise instruction: find these ten URLs in the file I already gave you, extract the actual content for each one, and return them back to me with the title, five keywords, and a soulful description — formatted so I can paste directly into my HTML.
Ten URLs. Ten accurate returns. Copy. Paste. Next ten.
This sounds tedious on paper. In practice it was the opposite — because Gemini was no longer guessing from chaos. It was navigating a file it already held, to a specific address I gave it. The results were accurate every single time.
That pivot — from "extract from noise" to "find this exact thing" — is the move that made everything else possible.
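Mechanically, the batch strategy is just a chunking loop. The sketch below splits a URL list into tens and assembles the kind of instruction I sent with each batch; the prompt wording is a paraphrase of my approach, not a transcript, and the placeholder URLs stand in for the real Master Index.

```javascript
// Split a flat list into batches of a fixed size (here, ten URLs).
function batchesOf(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Build one targeted instruction per batch. The AI already holds the
// Master File, so the prompt only has to point at exact addresses.
function buildPrompt(urls) {
  return [
    "Find each of these URLs in the Master File I already shared.",
    "For each one, extract the actual content and return the title,",
    "five keywords, and a one-sentence soulful description,",
    "formatted so I can paste it directly into my HTML.",
    "",
    ...urls.map((u, i) => `${i + 1}. ${u}`),
  ].join("\n");
}

// 95 placeholder URLs stand in for the real library.
const allUrls = Array.from(
  { length: 95 },
  (_, i) => `https://example.blogspot.com/post-${i + 1}.html`
);
const batches = batchesOf(allUrls, 10);
console.log(batches.length);    // 10 batches: nine of ten, one of five
console.log(batches[9].length); // the final, short batch
```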
Step Five
What Those Three Attributes Actually Do
For anyone technically curious: each link in the library now carries three invisible attributes alongside its visible text.
The title attribute carries the full branded post name — including the WordsByEkta🌿 mark on every single entry. The data-keywords holds the specific thematic tags. The data-description is the soulful summary — the thing Gemini wrote after actually reading the post.
None of this is visible to a reader browsing normally. It lives in the HTML, quiet and invisible, waiting.
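Concretely, one enriched link looks something like this. The post, keywords, and description here are illustrative stand-ins, not an entry copied from the real library:

```html
<!-- Visible text stays short; the three attributes carry the soul. -->
<a href="https://example.blogspot.com/2024/01/on-mom-guilt.html"
   title="On Mom Guilt | WordsByEkta🌿"
   data-keywords="motherhood, guilt, rest, identity, healing"
   data-description="A gentle unpacking of why mothers carry blame that was never theirs.">
  On Mom Guilt
</a>
```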
Step Six
The Search Box That Listens to the Soul
Once all 95 posts were enriched, I added a search box. Not a server-side search, not a plugin, not a third-party widget. A small JavaScript function — about twenty lines — that does one specific thing very well.
When you type into the search box on the Library page, it does not just scan the link text you can see. It reads all four layers simultaneously: the visible title, the branded full title, the keywords, and the description. So if you type burnout, you find posts that explicitly mention it. If you type rest or exhaustion or mental load, you find the same post — because those words live in its invisible soul.
The complexity didn't move from "slow server" to "smart code." It moved from the code entirely into the HTML itself. The enrichment is the intelligence. The script just listens to what was already there.
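For the curious, the whole trick fits in a function shaped roughly like this. It is a sketch of the idea, not the exact script on the Library page: the matching logic is pulled out into matches() so it can be read on its own, and the commented-out wiring shows how it would attach to the search box in the page.

```javascript
// Decide whether one link matches a query, checking all four layers:
// visible text, full branded title, keywords, and description.
function matches(query, layers) {
  const q = query.trim().toLowerCase();
  if (q === "") return true; // an empty search shows everything
  return [layers.text, layers.title, layers.keywords, layers.description]
    .some((layer) => (layer || "").toLowerCase().includes(q));
}

// In the page, wiring it up would look something like:
//   searchBox.addEventListener("input", () => {
//     document.querySelectorAll("#library a").forEach((a) => {
//       const hit = matches(searchBox.value, {
//         text: a.textContent,
//         title: a.getAttribute("title"),
//         keywords: a.dataset.keywords,
//         description: a.dataset.description,
//       });
//       a.style.display = hit ? "" : "none";
//     });
//   });

// Illustrative link, not a real library entry:
const link = {
  text: "On Mom Guilt",
  title: "On Mom Guilt | WordsByEkta🌿",
  keywords: "motherhood, guilt, rest, identity, healing",
  description: "Why mothers carry blame that was never theirs.",
};
console.log(matches("rest", link));    // true: found via keywords, not the visible title
console.log(matches("quantum", link)); // false: in none of the four layers
```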
Honest Account
What This Actually Took
I want to be honest about the sequence of failures before the solution, because that is the real story.
| Attempt | Method | Result |
|---|---|---|
| 1 | Google Takeout | Failed — 13 of 95 |
| 2 | Raw JSON → Gemini (No Specific Instructions) | Hallucinated after 4 articles |
| 3 | Raw JSON → Gemini (Targeted URL Lookups in Batches of 10) | Worked every time |
Three attempts. Two failures. One pivot. The pivot only became obvious after the failures — which is, I think, how most real solutions actually arrive.
The manual enrichment — reading 95 posts, writing 95 descriptions, assigning 95 sets of keywords — would have taken days done by hand. Done intelligently, in controlled batches with the right tool at the right task, it took sessions. Not days.
For the Technically Curious
Try This on Your Own Blogger Blog
- Get your full post data: Open this URL (replace yourblog with your subdomain): yourblog.blogspot.com/feeds/posts/default?alt=json&max-results=150. Over 150 posts? Add &start-index=151 to paginate.
- Don't ask AI to parse the raw JSON directly: Share the saved file as a Source of Truth, then feed your post URLs in batches of ten with one precise instruction, asking it to find each URL in the file and return the title, keywords, and a one-sentence description.
- Add three attributes to every link: title, data-keywords, data-description, on every single anchor tag in your library HTML.
- Write one search function: Check all three attributes plus the visible text. That is all the "smart search" needs. The search box is twenty lines. The enrichment is the work. Do the work first.
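If your archive runs past 150 posts, the pagination in step one can be scripted rather than typed. This small helper just builds the sequence of feed URLs to fetch; the start-index arithmetic (1-based, stepping by the page size) follows the query parameters described above.

```javascript
// Build the list of paginated feed URLs for a blog with `total` posts,
// fetching `pageSize` posts per request (Blogger's start-index is 1-based).
function feedUrls(subdomain, total, pageSize = 150) {
  const urls = [];
  for (let start = 1; start <= total; start += pageSize) {
    urls.push(
      `https://${subdomain}.blogspot.com/feeds/posts/default` +
      `?alt=json&max-results=${pageSize}&start-index=${start}`
    );
  }
  return urls;
}

console.log(feedUrls("yourblog", 95));  // one request covers all 95 posts
console.log(feedUrls("yourblog", 320)); // three requests: start-index 1, 151, 301
```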
The Library at WordsByEkta now has 95+ posts, nine categories, and a search box that knows what each post is actually about — not just what it is called.
But the thing I keep coming back to is not the technical solution. It is the moment when the official tool returned 13 posts out of 95 and I had a choice: accept the incomplete picture, or find another way in.
I found another way in.
That, more than anything, is what this blog is about.
Meticulously crafting words — and sometimes, the scaffolding that holds them. 🌿
Explore the Master Library →