Advanced JSON-LD Schema: How the @graph and @id Method Earned AI Overview Citations
By Cap Puckhaber, Reno, Nevada
I wrote about schema aggregation a while back, and the response surprised me. Readers kept asking the same follow-up question. They wanted to know what comes after aggregation, once the basic entities are connected and the validator shows zero errors. This post is that answer, and it gets technical fast.
Most schema advice stops at “add Organization markup and call it done.” That advice gets a site to entry level. It does nothing for the sites competing against hundreds of other businesses for the same AI Overview citation, and it does nothing for sites trying to get pulled into Perplexity or Copilot answers. The method I’m about to walk through is the one I now run on every client site with more than fifty pages.
Why Basic Schema Stops Working Once You Scale
A single Organization block on a homepage works fine for a five-page brochure site. Add fifty blog posts, a dozen authors, a product catalog, and a handful of locations, and that single block starts collapsing under its own weight. Each page ends up declaring its own version of the organization, the author, and the publisher, and none of those versions match perfectly.
I saw this firsthand on a financial services client with 340 published articles. Eleven different author bylines existed in the schema, but the actual writing staff numbered six people. Because each page generated its own JSON-LD independently, three writers who left the company still showed up as active authors in the structured data eight months after their last article.
That kind of drift doesn’t throw an error in the Rich Results Test. The page still validates. But the entity picture Google and AI systems build from that site is fractured, and a fractured entity picture is exactly what tanks citation eligibility in AI Overviews. Search Engine Land’s reporting on schema and AI search backs this up, noting that the real opportunity isn’t schema in isolation but the combination of structured data with proper entity relationships built through @graph and @id connections. The fix isn’t more schema. It’s connected schema, built once and referenced everywhere.
What Black Diamond Schema Actually Means
I use the term “black diamond” the way ski resorts use it. It’s not about adding more markup. It’s about handling the parts of structured data that punish small mistakes and reward precision, the way a steep, narrow run punishes a sloppy turn.
At this level, you’re not asking “what schema type should I use.” You’re asking how forty different entities on a domain reference each other without duplication, without contradiction, and without breaking when one piece of content changes. That’s a different skill than picking Article versus BlogPosting. It requires thinking about your entire site as one connected dataset instead of a stack of independent pages.
The Difference Between a Schema Tag and a Schema Graph
A schema tag describes one thing on one page. A Product tag says “this is a product.” An Article tag says “this is an article.” Each tag works in isolation, and that isolation is exactly the limitation. Nothing in the markup itself tells a crawler how that product or article relates to anything else on the domain.
A schema graph links those tags into a single connected structure using the @id property. The Article references the author by @id instead of repeating the author’s full name and credentials. The author references the organization by @id instead of re-declaring the company name on every byline. This matches the direction search engines have been moving as they shift from indexing documents to indexing entities, and it’s why Google’s structured data guidelines emphasize describing the content of a page as it relates to the broader entities behind it.
The @graph Method I Use on Every Client Site
Here’s the actual structure. Instead of writing a separate <script type="application/ld+json"> block for every entity, I wrap everything in a single @graph array. Inside that array, the WebSite entity, the Organization entity, every Person entity, and the page-level content type all live together, referencing each other by @id. Every page on the site can then pull from that same shared structure instead of generating its own isolated version.
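Here’s a minimal sketch of what that structure can look like, written in Python so the JSON-LD can be generated rather than typed by hand. The domain, the company name, the author, and the article URL are all placeholders invented for illustration, and the choice of isPartOf and publisher to link the nodes is one reasonable option, not the only one.

```python
import json

BASE = "https://domain.com"  # placeholder domain, matching the examples in this post

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": f"{BASE}/#organization",
            "name": "Example Co",  # hypothetical company name
            "url": f"{BASE}/",
        },
        {
            "@type": "WebSite",
            "@id": f"{BASE}/#website",
            "url": f"{BASE}/",
            "publisher": {"@id": f"{BASE}/#organization"},
        },
        {
            "@type": "Person",
            "@id": f"{BASE}/#author-jane-doe",  # hypothetical author
            "name": "Jane Doe",
            "worksFor": {"@id": f"{BASE}/#organization"},
        },
        {
            "@type": "Article",
            "@id": f"{BASE}/blog/sample-post/#article",
            "headline": "Sample post",
            "isPartOf": {"@id": f"{BASE}/#website"},
            # References only: the author and publisher are defined once above.
            "author": {"@id": f"{BASE}/#author-jane-doe"},
            "publisher": {"@id": f"{BASE}/#organization"},
        },
    ],
}

# One script block per page, holding the whole connected graph.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(graph, indent=2)
    + "\n</script>"
)
print(script_tag)
```

Notice that the Article never repeats the author’s name or the company name. It only points at the @id of nodes that are declared once.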
This single change solved the duplicate author problem on the financial services site within one sprint. We defined six Person entities once, gave each a stable @id, and pointed every article’s author property at the matching @id instead of re-declaring the author inline. The eleven phantom bylines disappeared because there was no longer a place for an incorrect one to hide.
The practical benefit goes beyond cleanup. Because every page now references the same Organization node, updating the company’s logo URL or social profiles means editing one site-wide template instead of touching three hundred individual pages. Drift becomes structurally difficult, not just procedurally discouraged.
How I Structure the @id Convention
Consistency in the @id format matters more than people expect. I use the site’s root URL plus a hash fragment for every site-wide entity, so the organization always reads as https://domain.com/#organization and the website always reads as https://domain.com/#website. Authors get https://domain.com/#author-firstname-lastname.
This convention has to stay identical across every template on the site. If one developer writes #org on the blog template and another writes #organization on the product template, the references silently fail to connect, and you lose the entire benefit of the graph without a single validation error to warn you. I keep the convention documented in a one-page reference sheet that goes into every client’s technical SEO folder.
Because each @id is a full URL that already includes your domain, the fragment only has to be unique within your own site, so you don’t need long, globally unique strings. What you need is discipline. Pick the pattern once, write it down, and never deviate from it across templates, plugins, or future developers touching the code.
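One way to enforce the convention described above is to stop typing @id strings by hand and route every template through a tiny shared helper. This is a sketch under my own assumptions: the function names and the slug rule are hypothetical, and only the three URL patterns come from the convention itself.

```python
import re

BASE_URL = "https://domain.com"  # placeholder domain


def slugify(text: str) -> str:
    """Lowercase the text and collapse anything non-alphanumeric into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def site_id(fragment: str) -> str:
    """Build a site-wide @id: root URL plus a hash fragment."""
    return f"{BASE_URL}/#{fragment}"


# Defined once, imported by every template, so "#org" can never sneak in.
ORGANIZATION_ID = site_id("organization")
WEBSITE_ID = site_id("website")


def author_id(full_name: str) -> str:
    """Build the @id for an author from their display name."""
    return site_id(f"author-{slugify(full_name)}")


print(ORGANIZATION_ID)
print(WEBSITE_ID)
print(author_id("Jane Doe"))
```

If every template imports ORGANIZATION_ID instead of writing the string, the #org versus #organization mismatch has nowhere to come from.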
The Mistake That Broke an Entire Client Graph
I have to be honest about a mistake here, because it cost a client six weeks of clean data. I built a @graph structure for an ecommerce site and used the product page URL as the @id for the Organization entity instead of a dedicated organization fragment.
Because three different product pages all declared an Organization with the same name but different @id values, the structured data technically validated on each individual page, yet nothing tied those organization mentions back to a single authoritative entity. Google’s crawler had no way to confirm that “Organization on Product A” was the same business as “Organization on Product B.” The knowledge panel for the brand stayed unclaimed for the entire six weeks.
I caught it during a routine connectivity check using a schema visualizer rather than the Rich Results Test, since the Rich Results Test only confirms eligibility and never checks whether your nodes actually connect to each other. Once I rebuilt the Organization entity with one consistent @id and referenced it from every product page, the entity consolidation happened within two crawl cycles. I now run a connectivity check on every project before launch specifically because of that client.
Connecting Author Entities to the Organization Node
Author schema is where most sites leave real authority on the table. A Person entity sitting alone in an article’s byline tells search engines almost nothing verifiable. A Person entity connected to the Organization through worksFor, and connected to external profiles through sameAs, tells a completely different story.
I add three things to every author entity I build. The worksFor property points back to the Organization’s @id, confirming employment in a machine-readable way instead of a text bio buried at the bottom of an article. The sameAs array links to the author’s LinkedIn profile and any other verified public identity, since cross-referencing those external profiles is what allows search engines to confirm a real person exists behind the byline. The knowsAbout property lists the specific topics that person has demonstrated expertise in across their published work.
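The three properties above can be sketched as a small builder function. The author, the LinkedIn URL, and the topics are made-up placeholders, and build_author is a hypothetical helper name, but worksFor, sameAs, and knowsAbout are standard schema.org properties on Person.

```python
import json

ORG_ID = "https://domain.com/#organization"  # placeholder, defined once site-wide


def build_author(author_id, name, same_as, knows_about):
    """Return a Person node wired to the Organization and to external profiles."""
    return {
        "@type": "Person",
        "@id": author_id,
        "name": name,
        # Reference the Organization node instead of repeating its details.
        "worksFor": {"@id": ORG_ID},
        # External profiles that let a crawler corroborate the identity.
        "sameAs": list(same_as),
        # Only topics the author has actually published on.
        "knowsAbout": list(knows_about),
    }


author = build_author(
    "https://domain.com/#author-jane-doe",
    "Jane Doe",
    ["https://www.linkedin.com/in/jane-doe-example"],
    ["Retirement planning", "Index funds"],
)
print(json.dumps(author, indent=2))
```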
A B2B SaaS client added these three properties to their twelve most active authors. Within ten weeks, four of those authors began appearing with their name attached in AI-generated answers for branded queries, something that had never happened before the update. We didn’t change a single word of article content. We connected entities that were already true but previously invisible to machines.
What Happened When I Added knowsAbout Declarations
The knowsAbout property doesn’t get enough attention in most schema guides, and that’s a mistake because it’s one of the clearest topical authority signals available. It lets you explicitly declare the subjects your organization and its authors have genuine expertise in, rather than hoping a crawler infers that expertise from body text alone.
I tested this directly on a healthcare client by adding knowsAbout declarations covering six specific clinical specialties to their Organization and Person schema. Before the change, the site appeared in AI Overview citations for exactly two of those six specialty terms. Four months after adding the declarations, citations appeared for five of the six, with the sixth specialty showing improved but inconsistent appearances.
Because knowsAbout works as a direct declaration rather than an inferred signal, it tends to move faster than content-based topical authority building. That doesn’t mean the underlying content can skip quality. It means the declaration removes ambiguity for a system that’s actively trying to match query topics to trustworthy, verified sources.
Using @graph to Fix Schema Drift at Scale
Schema drift happens when your visible content changes but your structured data doesn’t keep up. A price updates in the CMS, but the hardcoded JSON-LD still shows last quarter’s number. An author leaves, but the byline schema still references them months later, the same problem that started this whole conversation.
The @graph method doesn’t eliminate drift by itself, but it makes drift dramatically easier to catch and fix. Since every entity exists once and gets referenced everywhere, fixing a single source of truth fixes every page that points to it. Compare that to the old method, where three hundred individual pages each needed individual correction.
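To make the price example concrete, here’s a rough sketch of a drift check that compares what the CMS says against what the JSON-LD says. The data shapes are assumptions on my part: a dict of SKU to price exported from the CMS, and a list of Product nodes that each carry a sku and an offers.price value.

```python
from decimal import Decimal


def find_price_drift(cms_prices, product_nodes):
    """Return (sku, cms_price, schema_price) for every Product whose price differs."""
    drift = []
    for node in product_nodes:
        sku = node.get("sku")
        schema_price = node.get("offers", {}).get("price")
        if sku not in cms_prices or schema_price is None:
            continue  # nothing to compare against
        # Decimal avoids float rounding and treats "49" and "49.00" as equal.
        if Decimal(str(schema_price)) != Decimal(str(cms_prices[sku])):
            drift.append((sku, str(cms_prices[sku]), str(schema_price)))
    return drift


cms_prices = {"A1": "49.00", "B2": "19.99"}  # hypothetical CMS export
product_nodes = [
    {"@type": "Product", "sku": "A1", "offers": {"@type": "Offer", "price": "59.00"}},
    {"@type": "Product", "sku": "B2", "offers": {"@type": "Offer", "price": "19.99"}},
]
print(find_price_drift(cms_prices, product_nodes))
```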
For sites running on WordPress, I pair this structure with Yoast’s site-wide entity settings so the Organization and author data generates from one centralized configuration rather than per-page input fields. That combination, centralized data plus graph-based referencing, is what actually prevents drift instead of just making cleanup faster after the fact.
Extending the Graph with Speakable Schema
Speakable schema marks specific sections of a page as suitable for text-to-speech and voice assistant playback. It’s a smaller piece of the puzzle than Organization or Person schema, but it fits naturally into a connected graph because it references the same WebPage entity everything else already points to.
I added Speakable markup to the summary paragraphs on a news-style client’s top fifty articles, pointing the cssSelector at the lead paragraph and the key takeaway block on each page. Within six weeks, Search Console began showing impressions for voice-related query patterns that hadn’t appeared before. The number was modest, around 400 additional weekly impressions across those fifty pages, but it cost almost nothing to implement once the graph already existed.
Because the WebPage entity in our @graph already had a stable @id, nesting the speakable property inside it took ten minutes per template rather than a separate implementation project. That’s the real argument for building the graph properly in the first place. Once the foundation exists, adding new schema types becomes incremental work instead of a fresh build every time.
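As a sketch of how small that addition is, the function below attaches a speakable block to an existing WebPage node. The page URL and the two CSS class names are placeholders I’ve assumed for illustration. The property name speakable and the SpeakableSpecification type with its cssSelector field are from schema.org.

```python
import copy


def with_speakable(webpage, css_selectors):
    """Return a copy of a WebPage node with a SpeakableSpecification attached."""
    node = copy.deepcopy(webpage)  # leave the shared template node untouched
    node["speakable"] = {
        "@type": "SpeakableSpecification",
        "cssSelector": list(css_selectors),
    }
    return node


webpage = {
    "@type": "WebPage",
    "@id": "https://domain.com/news/sample-story/#webpage",  # placeholder page
    "url": "https://domain.com/news/sample-story/",
    "isPartOf": {"@id": "https://domain.com/#website"},
}

# Hypothetical selectors for the lead paragraph and the key takeaway block.
speakable_page = with_speakable(webpage, [".article-lead", ".key-takeaway"])
print(speakable_page["speakable"])
```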
Dataset Schema for Data-Heavy Pages
Dataset schema doesn’t come up in most SEO conversations, but it’s become one of the more interesting tools for sites publishing original research, survey results, or industry benchmarks. It tells search engines and AI systems that a specific page contains a structured dataset, complete with variables, measurement types, and a license.
I implemented this for a market research client publishing quarterly industry surveys. Their data pages were getting crawled like ordinary articles, with no special recognition for the actual numbers buried in their tables and charts. After adding Dataset schema referencing the same Organization @id as the rest of their site, two of their survey pages began appearing in Google’s dataset search results within five weeks.
That visibility channel barely existed for them before. Since their competitors were not using Dataset markup at all, the client picked up citations from academic and journalism sites linking back to their raw numbers, something their content team hadn’t expected from a schema change. Original data plus Dataset schema is a combination most competitors in most industries simply haven’t implemented yet, which makes it one of the lower-difficulty wins available right now.
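For reference, a Dataset node that plugs into an existing graph can be as small as the sketch below. Every value here is a placeholder: the survey name, the variables, the license URL, and the page URL are invented for illustration and are not the client’s data. The one part that matters for the graph is that creator points at the shared Organization @id.

```python
import json

ORG_ID = "https://domain.com/#organization"  # placeholder, same node the rest of the site uses

dataset = {
    "@type": "Dataset",
    "@id": "https://domain.com/research/quarterly-survey/#dataset",
    "name": "Quarterly Industry Survey (sample)",
    "description": "Placeholder description of what the survey measured and who responded.",
    "url": "https://domain.com/research/quarterly-survey/",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    # Reference the existing Organization node rather than re-declaring it.
    "creator": {"@id": ORG_ID},
    "variableMeasured": ["Marketing budget change", "Headcount change"],
    "measurementTechnique": "Online survey",
}

print(json.dumps(dataset, indent=2))
```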
Programmatic Schema Generation for Large Content Libraries
Manual JSON-LD works fine for a site with a few dozen pages. It breaks down entirely once you’re managing a few thousand. At that scale, the only sustainable approach is generating the @graph programmatically from the same data source that populates the visible page.
On a directory-style client site with over six thousand listing pages, we built a template that pulled location, category, and review data directly from the database and assembled the JSON-LD at render time. Every listing automatically referenced the same Organization @id, the same WebSite @id, and a dynamically generated @id for its own LocalBusiness entity. No human ever touched an individual page’s schema again.
This eliminated an entire category of error. Because the markup generates from the same fields a customer sees on the page, a price change or address update can’t create a mismatch between visible content and structured data. The investment was roughly three developer days. The payoff was permanent consistency across a library that would have taken months to maintain by hand.
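Here’s a simplified sketch of that render-time assembly. The row fields (slug, name, street, city, region, avg_rating, review_count), the URL pattern, and the WebPage node that ties the listing to the site are my own assumptions for illustration. The real template would read whatever columns your database actually has.

```python
import json

BASE = "https://domain.com"  # placeholder domain
ORG_ID = f"{BASE}/#organization"
WEBSITE_ID = f"{BASE}/#website"

# Site-wide nodes, defined once and reused on every page.
SITE_NODES = [
    {"@type": "Organization", "@id": ORG_ID, "name": "Example Directory", "url": f"{BASE}/"},
    {"@type": "WebSite", "@id": WEBSITE_ID, "url": f"{BASE}/", "publisher": {"@id": ORG_ID}},
]


def listing_graph(row):
    """Assemble the JSON-LD graph for one listing from its database row."""
    url = f"{BASE}/listings/{row['slug']}/"
    business_id = f"{url}#localbusiness"
    business = {
        "@type": "LocalBusiness",
        "@id": business_id,
        "name": row["name"],
        "address": {
            "@type": "PostalAddress",
            "streetAddress": row["street"],
            "addressLocality": row["city"],
            "addressRegion": row["region"],
        },
    }
    if row.get("review_count"):  # only emit ratings that exist on the page
        business["aggregateRating"] = {
            "@type": "AggregateRating",
            "ratingValue": row["avg_rating"],
            "reviewCount": row["review_count"],
        }
    page = {
        "@type": "WebPage",
        "@id": f"{url}#webpage",
        "url": url,
        "isPartOf": {"@id": WEBSITE_ID},
        "publisher": {"@id": ORG_ID},
        "mainEntity": {"@id": business_id},
    }
    return {"@context": "https://schema.org", "@graph": SITE_NODES + [page, business]}


row = {"slug": "sample-cafe", "name": "Sample Cafe", "street": "1 Main St",
       "city": "Reno", "region": "NV", "avg_rating": 4.6, "review_count": 12}
print(json.dumps(listing_graph(row), indent=2))
```

Because the address and rating come straight from the row, there’s no second copy of the data for the markup to disagree with.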
Avoiding Duplicate Entity Generation at Scale
The biggest risk in programmatic generation is accidentally creating a new Organization entity on every page instead of referencing the one canonical entity. I’ve audited sites where a templating bug generated a slightly different Organization block on each page, each with its own unique @id, which created thousands of disconnected organization nodes instead of one.
The fix is almost always the same. Pull the Organization and WebSite blocks from a single global include file or site-wide configuration object, and only generate the page-specific entity, whether that’s a Product, an Article, or a LocalBusiness, dynamically. Everything that should stay constant across the site needs to come from one place, not be re-derived on every template render.
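To catch the duplicate-Organization bug before it ships, a quick audit across rendered pages helps. The sketch below assumes you already have each page’s JSON-LD parsed into a dict keyed by page URL. The function names are hypothetical, and it only inspects top-level nodes in @graph.

```python
from collections import defaultdict


def types_of(node):
    """Normalize @type to a list, since JSON-LD allows a string or a list."""
    value = node.get("@type", [])
    return [value] if isinstance(value, str) else list(value)


def organization_id_conflicts(pages):
    """Map each Organization name to its @id values when more than one is in use.

    pages: dict of page URL -> parsed JSON-LD document containing "@graph".
    """
    ids_by_name = defaultdict(set)
    for doc in pages.values():
        for node in doc.get("@graph", []):
            # Skip bare references like {"@id": "..."}; only count full declarations.
            if "Organization" in types_of(node) and "name" in node:
                ids_by_name[node["name"]].add(node.get("@id", "(no @id)"))
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}


pages = {  # hypothetical crawl output reproducing the bug
    "https://domain.com/product-a/": {"@graph": [
        {"@type": "Organization", "@id": "https://domain.com/product-a/", "name": "Example Co"}]},
    "https://domain.com/product-b/": {"@graph": [
        {"@type": "Organization", "@id": "https://domain.com/product-b/", "name": "Example Co"}]},
}
print(organization_id_conflicts(pages))
```

An empty result means every page that declares the organization does so under one @id.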
Multi-Location Entity Graphs for Local Businesses
Multi-location businesses face a specific version of this problem. Each location needs its own LocalBusiness entity with its own address, phone number, and hours, but all of those locations still belong to the same parent Organization. Getting that relationship right in the graph determines whether Google treats the locations as a connected franchise or as unrelated businesses that happen to share a name.
I structure this with a parentOrganization property on each LocalBusiness entity, pointing back to the central Organization @id. For a regional healthcare client with fourteen locations, this single change helped consolidate scattered location-specific reviews and citations under one recognized brand entity rather than fourteen disconnected ones. Local pack rankings across their service area improved at nine of the fourteen locations within twelve weeks.
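A location node following that pattern might look like the sketch below. The address, phone number, hours, and URL pattern are placeholders, and build_location is a hypothetical helper. For a healthcare site you might pick a more specific type than LocalBusiness, which I’ve left generic here.

```python
import json

ORG_ID = "https://domain.com/#organization"  # placeholder parent Organization


def build_location(slug, name, street, city, region, postal_code, phone, hours):
    """Return a LocalBusiness node that points back to the parent Organization."""
    url = f"https://domain.com/locations/{slug}/"
    return {
        "@type": "LocalBusiness",
        "@id": f"{url}#localbusiness",
        "name": name,
        "url": url,
        "telephone": phone,
        "openingHours": hours,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
        },
        # The link that groups every location under one brand entity.
        "parentOrganization": {"@id": ORG_ID},
    }


locations = [
    build_location("midtown", "Example Clinic Midtown", "100 Sample Ave", "Reno",
                   "NV", "89501", "+1-775-555-0100", "Mo-Fr 08:00-17:00"),
    build_location("south", "Example Clinic South", "200 Sample Blvd", "Reno",
                   "NV", "89511", "+1-775-555-0101", "Mo-Sa 09:00-18:00"),
]
print(json.dumps(locations, indent=2))
```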
The five locations that didn’t see the same lift had a separate issue, inconsistent NAP data across third-party directories, which is a reminder that schema can only declare what’s true. If your external citations contradict your structured data, the graph alone won’t resolve that conflict. Clean up the external inconsistencies first, then let the schema graph reinforce what’s already accurate.
The Validation Workflow I Run Before Every Push
I run two separate checks on every @graph implementation, and they catch different categories of problems. The first is straightforward JSON syntax validation, since a single misplaced comma or unescaped quote can break the entire graph rather than just one entity within it.
The second check is connectivity testing, where I confirm that every @id reference actually resolves to a defined entity within that same graph. A reference pointing to an @id that doesn’t exist anywhere in the block is a silent failure. The page renders fine, the script tag parses as valid JSON, but the relationship the markup was supposed to establish simply doesn’t exist.
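Both checks can be approximated in a few lines. This is a simplified sketch, not the visualizer tool itself: it treats any object whose only key is @id as a reference, and it only counts top-level nodes in @graph as definitions, so nodes nested inline would need extra handling.

```python
import json


def parse_jsonld(raw):
    """Check 1: syntax. json.loads raises json.JSONDecodeError on malformed input."""
    return json.loads(raw)


def referenced_ids(value):
    """Yield every @id used as a bare reference, e.g. {"@id": "..."}."""
    if isinstance(value, dict):
        if set(value) == {"@id"}:
            yield value["@id"]
        else:
            for child in value.values():
                yield from referenced_ids(child)
    elif isinstance(value, list):
        for item in value:
            yield from referenced_ids(item)


def dangling_references(doc):
    """Check 2: connectivity. Return referenced @ids that no node in the graph defines."""
    nodes = doc.get("@graph", [])
    defined = {node["@id"] for node in nodes if "@id" in node}
    return sorted(set(referenced_ids(nodes)) - defined)


raw = """
{
  "@context": "https://schema.org",
  "@graph": [
    {"@type": "Organization", "@id": "https://domain.com/#organization", "name": "Example Co"},
    {"@type": "Article", "@id": "https://domain.com/post/#article",
     "publisher": {"@id": "https://domain.com/#org"}}
  ]
}
"""
doc = parse_jsonld(raw)
print(dangling_references(doc))
```

In the sample, the Article points at #org while the Organization is defined as #organization, so the check reports the broken reference even though the JSON itself parses cleanly.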
I document both checks in a spreadsheet for every client site, noting the date, the pages tested, and any issues found. That documentation became essential after the ecommerce mistake I described earlier, because it gave me a clear record of when the graph was last confirmed healthy and what changed since then.
When @graph Implementation Needs a Developer
Plugins handle foundational schema well, but the @graph method with custom @id conventions across hundreds of templated pages usually requires custom development work. A developer can build the JSON-LD generation directly into the templating layer, pulling from the same database fields that populate the visible page content.
That approach matters because it makes drift structurally difficult rather than just procedurally discouraged. If the schema generates from the same source as the visible price, author name, or product title, those two things cannot fall out of sync without the entire page breaking. For smaller sites without that custom infrastructure, a disciplined plugin configuration plus quarterly manual audits gets you most of the same benefit at a fraction of the cost.
What Black Diamond Schema Won’t Do
This method won’t manufacture topical authority that doesn’t exist. If your author genuinely hasn’t published anything credible on a topic, declaring knowsAbout for that topic is a misrepresentation, and misrepresented schema is exactly the kind of mismatch that damages trust signals instead of building them.
It also won’t fix a site with no real entity behind it. I’ve had prospective clients ask me to build elaborate author and organization graphs for content written entirely by freelancers with no public profile and no verifiable history. Because there’s nothing real for the sameAs references to connect to, the graph ends up empty no matter how carefully it’s structured. Connected schema amplifies a real, verifiable entity. It cannot invent one.
Frequently Asked Questions
What is the @graph method in JSON-LD and why does it matter for SEO?
The @graph method lets you declare multiple schema entities once in a single JSON-LD block, then reference each one by a stable @id from anywhere else on the site. Instead of repeating your organization name, author details, or publisher information on every page, every page points back to the same defined entity. This eliminates duplication, prevents the kind of entity drift that confuses search engines, and gives AI systems a single, verifiable source of truth about who you are.
How is @id different from the url property in schema markup?
The url property tells a search engine the public web address of a page or entity. The @id property is an identifier for the node itself, which other entities in your JSON-LD use to point at it. The two often look similar, since an @id is usually built from a URL, but @id exists specifically so a Person entity can say “I work for this exact Organization” by pointing at an identifier such as https://domain.com/#organization rather than repeating the organization’s full details inline.
Can schema drift happen even with a well-built @graph structure?
Yes, though it becomes far easier to catch and fix. Since every entity is defined once and referenced everywhere, a single correction to the source entity automatically updates every page that points to it. The biggest remaining risk is human error during template changes, which is why a documented @id convention and a regular connectivity check matter even after the graph is built correctly.
Do I need a developer to implement advanced JSON-LD with @graph and @id?
Smaller sites with a handful of pages can often build this manually or through a well-configured plugin like Yoast. Larger sites with hundreds of templated pages benefit significantly from custom development, since a developer can generate the JSON-LD directly from the same database fields that populate the visible content, which prevents drift at the structural level instead of relying on manual upkeep.
How long does it take to see AI Overview citation changes after implementing entity graphs?
Based on the client work I’ve documented, changes typically show up within ten weeks to four months, though the timeline varies by industry, content quality, and how fragmented the schema was before the fix. The healthcare client I mentioned saw citations appear for five of its six specialty terms within four months. A B2B SaaS client saw author-level citation changes within ten weeks. Neither result is guaranteed, since AI citation behavior depends on many factors beyond schema alone.
What’s the most common mistake people make when building a schema graph?
Inconsistent @id values across templates. If one page references an entity as #organization and another references it as #org, the connection silently fails even though both pages still validate without errors. Document your @id convention before you start building, and audit it whenever a new developer or template touches the site.


Cap Puckhaber
Backpacker, Marketer, Investor, Blogger, Husband, Dog-Dad, Golfer, Snowboarder
Cap Puckhaber is a marketing strategist, finance writer, and outdoor enthusiast from Reno, Nevada.
He writes across CapPuckhaber.com, TheHikingAdventures.com, SimpleFinanceBlog.com, and BlackDiamondMarketingSolutions.com.
Follow him for honest, real-world advice backed by 20+ years of experience.

