Extract Blogger Posts to Text: An Alternative to Google Takeout — WordsByEkta🌿
The Google Takeout Failure: How I Rescued 95 Posts and Built a Searchable Library from Scratch
Some things break before they get better. This is the story of a failed export, a clever workaround, and a library I now fully own.
The Problem
No One Warns You About This
I have been writing on WordsByEkta for years. Essays on identity, motherhood, feminism, healing — things that took time, thought, and a kind of emotional labour that doesn't show up in word counts. Over time, the blog grew. Posts accumulated. And one day I looked at it and thought: this needs to be organised. Properly.
Not just a list of links somewhere. A real library. Something a reader could walk into and find exactly what they were looking for — whether that was a piece on mom guilt, the Bhagavad Gita, or the peculiar exhaustion of a creative who has too many ideas and not enough hours.
Simple enough plan. Until I tried to actually do it.
Step One
The Official Way Failed
The sensible first move was Google Takeout — the tool Google provides to export your own data from your own blog. I ran it. I waited. The file arrived.
It was an .atom file. It contained 13 articles — out of 95.
Thirteen. I sat with that number for a moment. The official export of my own content, from my own blog, returned less than 14% of what existed. There was no error message. No warning. Just a quietly incomplete file and no explanation for where the other 82 posts had gone.
If you have ever felt locked out of something you built yourself, you know exactly what that moment feels like.
The Official Route (If it works for you)
Before trying my JSON trick, here is the official process Google provides. It is a journey through multiple tabs and settings:
- Blogger Settings: Go to Settings > Manage Blog and click Back up content. A dialog box will appear; click Download.
- The Takeout Hand-off: You will be redirected to a new Google Takeout tab. Because you clicked from Blogger, the "Blogger" product is already ticked. Note: the UI lists "multiple formats" (Atom for feeds, JSON for metadata, CSV for reactions); these look like dropdowns but are actually static labels.
- The Configuration: Click Next Step.
- Destination: Send download link via email (Auto-selected).
- Frequency: Export once.
- File type & size: .zip and 2 GB.
- The Extraction: Once you receive the email, you'll download a zip (e.g., takeout-20260403...zip). Inside, the path is Takeout > Blogger > Blogs > [Your Blog Name]. This folder contains your settings.csv, theme-classic.html, theme-layouts.xml, favicon.ico, followers.csv, and the feed.atom file where your posts live.
The Warning: This is where I stopped. My feed.atom file only contained 13 of my 95 posts. No explanation. No error. If this happens to you, proceed to Step Two.
Step Two
Finding the Master Key
I did not give up. I started looking for another way in.
Blogger, like most Google products, has a JSON feed — a raw data endpoint that the platform uses internally but rarely advertises to users. The URL pattern is straightforward (replace yourblog with your subdomain):
yourblog.blogspot.com/feeds/posts/default?alt=json&max-results=150
I opened it. The browser filled with a wall of raw JSON — messy, deeply nested, not designed to be read by a human. But it was all there. Every post. Every title. Every URL. 95 entries, not 13.
I copied the entire thing into Notepad and saved it as a text file. That text file became my starting point — the Master File.
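If you want to sanity-check your Master File before involving any AI, a few lines of JavaScript can count the entries and pull out each title and URL. The field names below (feed.entry, title.$t, the link with rel "alternate") follow Blogger's GData-style JSON feed; the two-post sample object is a made-up miniature for illustration, not my real feed.

```javascript
// Pull { title, url } pairs out of a Blogger JSON feed object.
// Blogger wraps text values in { $t: "..." } and lists several
// links per entry; the one with rel "alternate" is the public URL.
function listPosts(feedJson) {
  const entries = (feedJson.feed && feedJson.feed.entry) || [];
  return entries.map((entry) => ({
    title: entry.title.$t,
    url: entry.link.find((l) => l.rel === "alternate").href,
  }));
}

// Hypothetical miniature of the real feed, for illustration only.
const sampleFeed = {
  feed: {
    entry: [
      {
        title: { $t: "On Mom Guilt" },
        link: [
          { rel: "self", href: "https://example.blogspot.com/feeds/posts/default/1" },
          { rel: "alternate", href: "https://example.blogspot.com/2024/01/on-mom-guilt.html" },
        ],
      },
      {
        title: { $t: "Reading the Bhagavad Gita" },
        link: [
          { rel: "alternate", href: "https://example.blogspot.com/2024/02/bhagavad-gita.html" },
        ],
      },
    ],
  },
};

const posts = listPosts(sampleFeed);
console.log(posts.length);                 // how many posts the feed really holds
console.log(posts[0].title, posts[0].url); // first rescued entry
```

If the count printed here matches your actual post count, the feed contains everything Takeout dropped.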
Step Three
Where AI Helped — and Where It Didn't
I handed the text file to Gemini as its Source of Truth and asked it to do what seemed logical: read the content and return for each post a title, a set of keywords, and a short description.
It gave me three or four correct ones. Then it started hallucinating.
Names that didn't exist. Topics I had never written about. Descriptions that had the right shape but were pointing at the wrong post entirely. This is a known limitation — when you give a large language model a massive blob of unstructured data and ask it to extract specific meaning from each piece, it starts confabulating. The task is too open, the data too noisy, the structure too loose.
So I stopped fighting the tool's weakness and changed the approach entirely.
Pro Tip: Gathering your URL List
I have been building my "Master Index" manually since Day One, categorizing every article under specific headings as I write. This discipline gave me my list of 95 URLs ready to go.
However, if you are starting this rescue mission after publishing dozens of posts, don't copy them one by one. You can extract every single URL from your Blogger sitemap in seconds using my Microsoft Word Wildcard trick:
The Sitemap Extraction Shortcut →
Step Four
The Pivot That Changed Everything
Using the clean list of URLs I had already curated (see the Pro Tip above), I fed them to Gemini ten at a time. By pointing the AI to specific URLs within my raw JSON text file, I removed the "noise" and focused its attention on a single task: Recovering the metadata.
This "Batch Strategy" was the final breakthrough. Instead of asking the AI to "Read all 95 posts," which leads to hallucinations and data-fatigue, I treated it like an investigative interview. One batch. Ten URLs. Total accuracy.
Instead of asking Gemini to keep guessing from the raw JSON blob, I flipped the approach. I had already shared the master text file with it as its source of truth. So now I gave it the URLs from my existing HTML, ten at a time, with one precise instruction: find these ten URLs in the file I already gave you, extract the actual content for each one, and return them back to me with the title, five keywords, and a soulful description — formatted so I can paste directly into my HTML.
Ten URLs. Ten accurate returns. Copy. Paste. Next ten.
This sounds tedious on paper. In practice it was the opposite — because Gemini was no longer guessing from chaos. It was navigating a file it already held, to a specific address I gave it. The results were accurate every single time.
That pivot — from "extract from noise" to "find this exact thing" — is the move that made everything else possible.
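Mechanically, the batch strategy is just a chunking loop. The sketch below splits a URL list into tens and assembles the kind of instruction I sent with each batch; the prompt wording is a paraphrase of my approach, not a transcript, and the placeholder URLs stand in for the real Master Index.

```javascript
// Split a flat list into batches of a fixed size (here, ten URLs).
function batchesOf(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Build one targeted instruction per batch. The AI already holds the
// Master File, so the prompt only has to point at exact addresses.
function buildPrompt(urls) {
  return [
    "Find each of these URLs in the Master File I already shared.",
    "For each one, extract the actual content and return the title,",
    "five keywords, and a one-sentence soulful description,",
    "formatted so I can paste it directly into my HTML.",
    "",
    ...urls.map((u, i) => `${i + 1}. ${u}`),
  ].join("\n");
}

// 95 placeholder URLs stand in for the real library.
const allUrls = Array.from(
  { length: 95 },
  (_, i) => `https://example.blogspot.com/post-${i + 1}.html`
);
const batches = batchesOf(allUrls, 10);
console.log(batches.length);    // 10 batches: nine of ten, one of five
console.log(batches[9].length); // the final, short batch
```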
Step Five
What Those Three Attributes Actually Do
For anyone technically curious: each link in the library now carries three invisible attributes alongside its visible text.
The title attribute carries the full branded post name — including the WordsByEkta🌿 mark on every single entry. The data-keywords holds the specific thematic tags. The data-description is the soulful summary — the thing Gemini wrote after actually reading the post.
None of this is visible to a reader browsing normally. It lives in the HTML, quiet and invisible, waiting.
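Concretely, one enriched link looks something like this. The post, keywords, and description here are illustrative stand-ins, not an entry copied from the real library:

```html
<!-- Visible text stays short; the three attributes carry the soul. -->
<a href="https://example.blogspot.com/2024/01/on-mom-guilt.html"
   title="On Mom Guilt | WordsByEkta🌿"
   data-keywords="motherhood, guilt, rest, identity, healing"
   data-description="A gentle unpacking of why mothers carry blame that was never theirs.">
  On Mom Guilt
</a>
```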
Step Six
The Search Box That Listens to the Soul
Once all 95 posts were enriched, I added a search box. Not a server-side search, not a plugin, not a third-party widget. A small JavaScript function — about twenty lines — that does one specific thing very well.
When you type into the search box on the Library page, it does not just scan the link text you can see. It reads all four layers simultaneously: the visible title, the branded full title, the keywords, and the description. So if you type burnout, you find posts that explicitly mention it. If you type rest or exhaustion or mental load, you find the same post — because those words live in its invisible soul.
The complexity didn't move from "slow server" to "smart code." It moved from the code entirely into the HTML itself. The enrichment is the intelligence. The script just listens to what was already there.
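For the curious, the whole trick fits in a function shaped roughly like this. It is a sketch of the idea, not the exact script on the Library page: the matching logic is pulled out into matches() so it can be read on its own, and the commented-out wiring shows how it would attach to the search box in the page.

```javascript
// Decide whether one link matches a query, checking all four layers:
// visible text, full branded title, keywords, and description.
function matches(query, layers) {
  const q = query.trim().toLowerCase();
  if (q === "") return true; // an empty search shows everything
  return [layers.text, layers.title, layers.keywords, layers.description]
    .some((layer) => (layer || "").toLowerCase().includes(q));
}

// In the page, wiring it up would look something like:
//   searchBox.addEventListener("input", () => {
//     document.querySelectorAll("#library a").forEach((a) => {
//       const hit = matches(searchBox.value, {
//         text: a.textContent,
//         title: a.getAttribute("title"),
//         keywords: a.dataset.keywords,
//         description: a.dataset.description,
//       });
//       a.style.display = hit ? "" : "none";
//     });
//   });

// Illustrative link, not a real library entry:
const link = {
  text: "On Mom Guilt",
  title: "On Mom Guilt | WordsByEkta🌿",
  keywords: "motherhood, guilt, rest, identity, healing",
  description: "Why mothers carry blame that was never theirs.",
};
console.log(matches("rest", link));    // true: found via keywords, not the visible title
console.log(matches("quantum", link)); // false: in none of the four layers
```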
Honest Account
What This Actually Took
I want to be honest about the sequence of failures before the solution, because that is the real story.
| Attempt | Method | Result |
|---|---|---|
| 1 | Google Takeout | Failed — 13 of 95 |
| 2 | Raw JSON → Gemini (No Specific Instructions) | Hallucinated after 4 articles |
| 3 | Raw JSON → Gemini (Targeted URL Lookups in Batches of 10) | Worked every time |
Three attempts. Two failures. One pivot. The pivot only became obvious after the failures — which is, I think, how most real solutions actually arrive.
The manual enrichment — reading 95 posts, writing 95 descriptions, assigning 95 sets of keywords — would have taken days done by hand. Done intelligently, in controlled batches with the right tool at the right task, it took sessions. Not days.
For the Technically Curious
Try This on Your Own Blogger Blog
- Get your full post data: Open this URL (replace yourblog with your subdomain): yourblog.blogspot.com/feeds/posts/default?alt=json&max-results=150. Over 150 posts? Add &start-index=151 to paginate.
- Don't ask AI to parse the raw JSON directly: Share the saved file as a Source of Truth, then feed your post URLs in batches of ten with one precise instruction, asking it to find each URL in the file and return the title, keywords, and a one-sentence description.
- Add three attributes to every link: title, data-keywords, data-description, on every single anchor tag in your library HTML.
- Write one search function: Check all three attributes plus the visible text. That is all the "smart search" needs. The search box is twenty lines. The enrichment is the work. Do the work first.
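If your archive runs past 150 posts, the pagination in step one can be scripted rather than typed. This small helper just builds the sequence of feed URLs to fetch; the start-index arithmetic (1-based, stepping by the page size) follows the query parameters described above.

```javascript
// Build the list of paginated feed URLs for a blog with `total` posts,
// fetching `pageSize` posts per request (Blogger's start-index is 1-based).
function feedUrls(subdomain, total, pageSize = 150) {
  const urls = [];
  for (let start = 1; start <= total; start += pageSize) {
    urls.push(
      `https://${subdomain}.blogspot.com/feeds/posts/default` +
      `?alt=json&max-results=${pageSize}&start-index=${start}`
    );
  }
  return urls;
}

console.log(feedUrls("yourblog", 95));  // one request covers all 95 posts
console.log(feedUrls("yourblog", 320)); // three requests: start-index 1, 151, 301
```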
The Library at WordsByEkta now has 95+ posts, nine categories, and a search box that knows what each post is actually about — not just what it is called.
But the thing I keep coming back to is not the technical solution. It is the moment when the official tool returned 13 posts out of 95 and I had a choice: accept the incomplete picture, or find another way in.
I found another way in.
That, more than anything, is what this blog is about.
Meticulously crafting words — and sometimes, the scaffolding that holds them. 🌿
Explore the Master Library →