How to Test Figma Plugins Before Publishing

Testing a Figma plugin before publishing means confirming it performs its intended function correctly across the range of files, node structures, and user actions it will actually encounter in the wild, not just the handful of test files a developer happened to build it against. That distinction matters more than it sounds. A plugin can run flawlessly in the file where it was built and still fail the moment a real user points it at a document with nested components, multiple pages, or ten thousand layers instead of forty.

The rest of this post walks through a single case: a client project where a plugin passed every check the developer ran and still generated a wave of one-star reviews within its first week live. Working backward through what that testing process missed says more about how to structure pre-publish testing than a generic checklist would.

The Plugin: A Batch Renaming Tool

The plugin in question renamed layers in bulk according to a pattern the user defined — replacing “Rectangle 47” and its siblings with something like “card/header/icon” across an entire selection. Straightforward in concept, and the developer had tested it thoroughly against his own design system file: a clean, well-organized document with maybe 200 layers, consistent naming already in place, and no unusual structures.

It worked perfectly there. Every test case he’d written passed. He published it.

Within the first week, reviews started coming in describing crashes on large files, renamed layers that hadn’t actually matched the intended pattern, and in a couple of cases, layers that lost their content entirely. None of this had shown up in his own testing, because his own testing had never left the conditions of a single, tidy file.

Where the Testing Gap Actually Lived

Working through the bug reports one by one revealed a pattern: every failure traced back to a structural assumption the plugin had made that held true in the developer’s test file but broke down elsewhere.

File size and layer count. The plugin looped through selected layers synchronously, which was fine at 200 layers and produced a visible freeze — sometimes interpreted by users as a crash — once selections climbed into the thousands. His test file simply never had enough layers to expose this. Testing against a large, unwieldy file with tens of thousands of nodes needs to be a standard part of pre-publish verification, not an afterthought reserved for “performance testing” that gets skipped when time is short.
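One common way around a blocking loop is to process the selection in chunks and hand control back to the host between chunks. What follows is a minimal sketch of that idea, not the plugin's actual fix: `NamedLayer`, `yieldToHost`, `renameInChunks`, and the default chunk size of 200 are all assumptions made for illustration.

```typescript
// Minimal shape of a renamable layer. In a real plugin this would be a Figma node.
interface NamedLayer {
  name: string;
}

// Hypothetical helper: pause so the host can repaint and handle input.
const yieldToHost = (): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, 0));

// Renames layers chunk by chunk, reporting progress after each chunk.
// Returns the number of layers processed.
async function renameInChunks<T extends NamedLayer>(
  layers: readonly T[],
  makeName: (layer: T, index: number) => string,
  onProgress: (done: number, total: number) => void = () => {},
  chunkSize = 200, // assumed default, tune against a large test file
): Promise<number> {
  if (chunkSize < 1) throw new RangeError("chunkSize must be at least 1");
  for (let start = 0; start < layers.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, layers.length);
    layers.slice(start, end).forEach((layer, offset) => {
      layer.name = makeName(layer, start + offset);
    });
    onProgress(end, layers.length);
    await yieldToHost();
  }
  return layers.length;
}
```

In a plugin, the `onProgress` callback is where a message to the UI or a notification would go, so a user with thousands of layers selected sees movement instead of a frozen window.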

Node type variety. The renaming pattern worked correctly on frames and rectangles but produced malformed names on text nodes with existing multi-line content. It did the same on component instances, where the plugin’s logic assumed it could freely rewrite a name that Figma treats as inherited from the main component unless explicitly overridden. His test file happened to be light on both text nodes and instances relative to what a typical production file contains.
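The bug reports don't spell out exactly how the text-node names were malformed. Assuming the cause was line breaks from the existing name leaking into the new one, a guard might look like the sketch below. The `{name}` placeholder syntax and both function names are hypothetical.

```typescript
// Collapses line breaks and repeated whitespace into single spaces,
// so a multi-line name becomes one clean segment.
function toSingleLine(raw: string): string {
  return raw.replace(/\s+/g, " ").trim();
}

// Hypothetical pattern syntax: every "{name}" is replaced with the cleaned current name.
// split/join is used instead of String.replace so that "$" sequences in the
// current name are inserted literally.
function applyPattern(pattern: string, currentName: string): string {
  return pattern.split("{name}").join(toSingleLine(currentName));
}
```

A text layer whose existing name spans two lines is exactly the kind of input a tidy design system file tends not to contain, which is why it belongs in the test set.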

Selection edge cases. Users reported failures when they selected a mix of top-level frames and deeply nested children simultaneously — a selection shape the developer had never tried, because his own workflow always selected uniformly at one level of the hierarchy.
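One way a mixed-level selection can go wrong, assuming the plugin walks into the children of each selected node, is that a nested layer gets processed twice: once through its selected ancestor and once on its own. The sketch below filters a selection down to its topmost nodes. `TreeNode` is a stand-in with only the two fields the logic needs, and `topmostSelected` is a hypothetical helper.

```typescript
// Stand-in for a scene node: an id plus a link to its parent (null at the top).
interface TreeNode {
  id: string;
  parent: TreeNode | null;
}

// Keeps only nodes that have no selected ancestor, so a recursive walk
// from each kept node visits every selected layer exactly once.
function topmostSelected<T extends TreeNode>(selection: readonly T[]): T[] {
  const selectedIds = new Set(selection.map((node) => node.id));
  return selection.filter((node) => {
    for (let p = node.parent; p !== null; p = p.parent) {
      if (selectedIds.has(p.id)) return false;
    }
    return true;
  });
}
```

Whether a plugin should collapse the selection like this or treat each selected node independently is a design decision. Either way, the mixed-level case needs an explicit answer and a test that exercises it.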

None of these were exotic bugs. Each one was a direct consequence of testing against a single convenient file rather than deliberately constructing test conditions that stressed the assumptions built into the code.

Rebuilding the Test Process Around Structural Variation

Fixing the immediate bugs was the easy part. The harder, more valuable work was redesigning how testing happened going forward, so the next plugin release wouldn’t repeat the pattern.

Test file diversity became the first fix. Instead of one clean file, testing moved to a small set of deliberately varied files: one large and messy with inconsistent existing naming, one built almost entirely from component instances and variants, one with deep nesting six or seven levels down, and one that mixed all of the above with no particular organization — closer to what an average user’s actual working file looks like than any developer’s polished internal example.
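Hand-built Figma files are the real test, but the same idea of deliberate variation can also be applied to plain data when the renaming logic is unit-tested outside Figma. Here is a rough sketch of a fixture generator, assuming the logic can run against simple objects. `MockNode`, `FixtureSpec`, and `buildFixture` are invented for this example.

```typescript
type MockType = "FRAME" | "INSTANCE" | "TEXT" | "RECTANGLE";

interface MockNode {
  type: MockType;
  name: string;
  children: MockNode[];
}

interface FixtureSpec {
  depth: number;         // nesting levels below the root frame
  breadth: number;       // children per frame
  leafTypes: MockType[]; // cycled through when creating leaves
}

// Builds a uniform tree of frames with typed leaves at the bottom level.
function buildFixture(spec: FixtureSpec): MockNode {
  if (spec.leafTypes.length === 0) throw new RangeError("leafTypes must not be empty");
  let leafCount = 0;
  const build = (level: number): MockNode => {
    if (level >= spec.depth) {
      const type = spec.leafTypes[leafCount % spec.leafTypes.length];
      leafCount += 1;
      return { type, name: `${type} ${leafCount}`, children: [] };
    }
    const children = Array.from({ length: spec.breadth }, () => build(level + 1));
    return { type: "FRAME", name: `Frame L${level}`, children };
  };
  return build(0);
}

// Counts every node in the tree, root included.
function countNodes(node: MockNode): number {
  return 1 + node.children.reduce((sum, child) => sum + countNodes(child), 0);
}
```

A spec of depth 7 and breadth 1 gives a deeply nested fixture, while a wide, shallow spec with `"INSTANCE"` as the only leaf type gives an instance-heavy one.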

Node type coverage came second. Rather than testing “does it work on layers,” testing shifted to explicitly working through frames, groups, text nodes, vectors, component instances, and boolean operations separately, since each of these behaves differently under Figma’s API and a plugin that handles one correctly won’t necessarily handle another the same way.
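Working through node types one at a time is easier to keep honest when the list is written down and looped over. The sketch below runs a rename function against one mock layer per type and returns the types that failed. The type strings match the `type` values Figma reports for these nodes. The mock layers, the sample names, and `runTypeMatrix` are assumptions for illustration.

```typescript
const COVERED_TYPES = [
  "FRAME", "GROUP", "TEXT", "VECTOR", "INSTANCE", "BOOLEAN_OPERATION",
] as const;
type CoveredType = (typeof COVERED_TYPES)[number];

interface MockLayer {
  type: CoveredType;
  name: string;
}

// One awkward starting name per type. Text gets a multi-line name on purpose.
function sampleLayer(type: CoveredType): MockLayer {
  const name = type === "TEXT" ? "Title\nSubtitle" : `${type} 1`;
  return { type, name };
}

// Runs the rename against every covered type and returns the types whose
// resulting name fails the validity check.
function runTypeMatrix(
  rename: (layer: MockLayer) => void,
  isValidName: (name: string) => boolean,
): CoveredType[] {
  const failures: CoveredType[] = [];
  for (const type of COVERED_TYPES) {
    const layer = sampleLayer(type);
    rename(layer);
    if (!isValidName(layer.name)) failures.push(type);
  }
  return failures;
}
```

A mock only covers the naming logic. How instances and boolean operations actually behave still has to be confirmed in Figma itself, but a matrix like this makes it obvious when a type was never exercised at all.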

Selection pattern testing came third. Single-node selections, multi-node selections at the same hierarchy level, mixed-level selections, and runs on more than one page of the same file (Figma keeps a separate selection for each page) all needed distinct test passes, because the plugin’s behavior under each selection shape was not something that could be inferred from how it behaved under the others.
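To confirm that each selection shape actually got its own pass, a test log can record which shape a given run used. The sketch below classifies a selection by nesting depth. `DepthNode`, `depthOf`, and `classifySelection` are hypothetical names, and the four shape labels are this example's own.

```typescript
// Stand-in for a node: only the parent link is needed to measure depth.
interface DepthNode {
  parent: DepthNode | null;
}

type SelectionShape = "empty" | "single" | "same-level" | "mixed-level";

// Counts how many ancestors a node has.
function depthOf(node: DepthNode): number {
  let depth = 0;
  for (let p = node.parent; p !== null; p = p.parent) depth += 1;
  return depth;
}

// Labels a selection by how many nodes it holds and whether they share a depth.
function classifySelection(selection: readonly DepthNode[]): SelectionShape {
  if (selection.length === 0) return "empty";
  if (selection.length === 1) return "single";
  const depths = new Set(selection.map(depthOf));
  return depths.size === 1 ? "same-level" : "mixed-level";
}
```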

Performance testing at scale came fourth, using the large messy file specifically to check whether operations completed in reasonable time and whether the plugin gave users any indication that work was in progress rather than appearing frozen.
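Both halves of that check, elapsed time and visible progress, can be captured with a small harness. This is a sketch under assumptions: `timeOperation` and `checkBudget` are hypothetical helpers, and the time budget is whatever the team decides counts as reasonable.

```typescript
interface TimedResult {
  elapsedMs: number;
  progressEvents: number;
}

// Runs an operation, counting how often it reports progress and how long it takes.
async function timeOperation(
  operation: (reportProgress: () => void) => Promise<void> | void,
): Promise<TimedResult> {
  let progressEvents = 0;
  const startedAt = Date.now();
  await operation(() => {
    progressEvents += 1;
  });
  return { elapsedMs: Date.now() - startedAt, progressEvents };
}

// Returns a list of problems. An empty list means the run stayed within budget
// and gave the user at least one progress signal.
function checkBudget(result: TimedResult, maxMs: number): string[] {
  const problems: string[] = [];
  if (result.elapsedMs > maxMs) {
    problems.push(`took ${result.elapsedMs} ms, budget is ${maxMs} ms`);
  }
  if (result.progressEvents === 0) {
    problems.push("no progress feedback was reported");
  }
  return problems;
}
```

Treating "no progress feedback" as a failure in its own right matters here, since a slow operation with a progress indicator reads as slow, while a slow operation without one reads as a crash.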

What This Restructured Process Caught the Second Time Around

Applying this expanded testing approach to the fixed version of the plugin surfaced two additional issues before the update went live — issues that would very likely have generated their own round of negative reviews had they shipped unnoticed.

The first was a silent failure on locked layers: the plugin skipped them without renaming, which was arguably correct behavior, but gave the user no feedback that anything had been skipped, leaving them to assume every layer had been renamed when some quietly hadn’t been. The second was a naming pattern that produced duplicate names across sibling layers under certain input patterns, which didn’t break anything functionally but undermined the entire point of the tool for anyone relying on unique names downstream.
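Both issues come down to reporting what the rename actually did. Here is a minimal sketch of a rename pass that records skipped locked layers, plus a check for duplicate sibling names. `LockableLayer` mirrors the `name` and `locked` properties found on Figma scene nodes. Everything else, including the wording of the summary, is an assumption.

```typescript
interface LockableLayer {
  name: string;
  locked: boolean;
}

interface RenameReport {
  renamed: number;
  skippedLocked: string[]; // original names of layers left untouched
}

// Renames unlocked layers and records every locked layer that was skipped.
function renameUnlocked(
  layers: LockableLayer[],
  makeName: (layer: LockableLayer, index: number) => string,
): RenameReport {
  const report: RenameReport = { renamed: 0, skippedLocked: [] };
  layers.forEach((layer, index) => {
    if (layer.locked) {
      report.skippedLocked.push(layer.name);
      return;
    }
    layer.name = makeName(layer, index);
    report.renamed += 1;
  });
  return report;
}

// Returns each name that appears more than once among the given siblings.
function duplicateNames(siblings: readonly { name: string }[]): string[] {
  const counts = new Map<string, number>();
  for (const sibling of siblings) {
    counts.set(sibling.name, (counts.get(sibling.name) ?? 0) + 1);
  }
  const duplicates: string[] = [];
  counts.forEach((count, name) => {
    if (count > 1) duplicates.push(name);
  });
  return duplicates;
}

// Builds a one-line message suitable for showing to the user.
function summarize(report: RenameReport): string {
  return `Renamed ${report.renamed}, skipped ${report.skippedLocked.length} locked`;
}
```

In a plugin, the summary string could be passed to `figma.notify` so the skip is visible, and a non-empty result from `duplicateNames` could trigger a warning before the new names are committed.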

Both issues were small in isolation. Both would have been invisible to a testing process built around a single clean file, and both were caught directly by testing against structural variety the first pass had lacked.

The General Lesson From This Case

The specific bugs here — synchronous loops on large selections, mishandled component instance names, silent skips on locked layers — are less important than the pattern connecting them. Every one traced back to a test file that didn’t represent the range of files, node types, and selection shapes a real published plugin has to handle. A developer’s own working file is, almost by definition, a poor stand-in for that range, since it reflects one person’s habits and one project’s structure rather than the diversity of an entire user base.

Building a small library of deliberately varied test files — large and messy, instance-heavy, deeply nested, mixed — before publishing costs relatively little time up front and catches a disproportionate share of the issues that would otherwise surface as user complaints after release. Testing node type coverage explicitly, rather than assuming uniform behavior across frames, text, vectors, and instances, closes a second common gap. And treating performance at scale as a required test rather than an optional one addresses the kind of failure that looks like a crash to users even when the plugin is, technically, still working.

None of this requires exotic tooling. It requires treating the test file itself as a variable worth deliberately controlling, rather than defaulting to whatever file happens to be open.

A Testing Checklist Distilled From This Case

Test Category	What to Check	Why It Matters
File size/scale	Large file with thousands of layers	Surfaces performance failures and apparent freezes
Node type coverage	Frames, text, vectors, instances, groups tested separately	Each node type behaves differently under the API
Selection shape	Single, multi, and mixed-level selections, repeated on more than one page	Assumptions about uniform selections often break
Existing state	Locked layers, existing names, variants	Silent skips or overwrites need explicit user feedback
Output correctness	Verify no unintended duplicates or malformed results	Functional success isn’t the same as correct output

If your current testing process resembles the developer’s original approach — one file, one pass, ship it — the fix isn’t more testing in general. It’s testing against the specific kinds of structural variation this case exposed, before a wave of one-star reviews does the exposing for you.