Andrei M. · Automation · 11 min read
Case Study: A Garden Tools Retailer Scraped Supplier Catalogs and Cut Setup Time by 80%
A garden tools retailer was spending two working days per catalog refresh manually copying product data from supplier websites that offered no CSV or API feed. Web scraping via URL import cut that work from 16 hours to about 3.
A garden tools retailer operating a WooCommerce store with roughly 3,800 active SKUs was adding approximately 400 new products per quarter across four suppliers. Two of those suppliers provided regular CSV data feeds. The other two had no structured data export capability — just product pages on their respective websites. For those two suppliers, every new product started with someone reading a web page and typing information into a spreadsheet. The manual data entry was consuming two full working days per catalog refresh cycle.
The Challenge
The retailer’s catalog covered hand tools, power tools, garden machinery, and accessories. The product data requirements for each category were substantial: tools carry regulatory compliance data, technical specifications across multiple dimensions, weight, safety classifications, compatibility matrices, and SKU-to-EAN mappings. A single power tool product might require 35-40 individual data fields to be fully described for sale.
Of the four suppliers, the two with CSV feeds sent files on a weekly schedule. The two without CSV feeds — one a German brand specializing in professional-grade pruning and cutting equipment, the other a domestic manufacturer of soil preparation tools — had no technical integration available. Both declined integration requests citing IT resource constraints.
The manual product data scraping process for these two suppliers worked as follows:
- A catalog assistant opened each supplier product page individually.
- They copied the product name, description, technical specifications, EAN, and any available certification data into a master spreadsheet.
- Images were downloaded manually and renamed to match the SKU naming convention.
- The spreadsheet was then mapped and imported into WooCommerce.
For a seasonal catalog update (typically 120-180 new products per supplier), this process took approximately 8 hours per supplier, or 16 hours total per quarterly cycle. The catalog assistant reported that the most time-consuming part was not the typing but the context-switching: each product page had a slightly different layout, and the specification table structure varied between product categories on both supplier sites.
There were also data quality issues inherent to manual transcription. An internal audit run after the most recent catalog update found 34 products with at least one incorrect technical specification — wrong blade lengths, transposed weight values, or incorrectly recorded power ratings. Of those, 12 had resulted in customer service contacts from buyers who received products with specifications different from what was described.
[SCREENSHOT: MicroPIM URL import interface showing a supplier product page URL being processed, with extracted fields appearing in the field mapping panel on the right]
What They Tried First
The catalog manager had experimented with browser-based copy-paste automation tools before. A Chrome extension that attempted to extract highlighted text into a clipboard format worked for simple product pages but failed on pages where the specification data was rendered inside structured tables or embedded in JavaScript-loaded content. For approximately 40% of the products on both supplier sites, the extension produced incomplete extracts that still required manual correction.
The team also requested a data partnership with both suppliers — a formal arrangement where the retailer would be included in a monthly data feed that the supplier produced for their larger retail partners. The German brand was receptive but indicated the process would take 3-4 months to set up, as it required account manager approval and IT team involvement on their side. The domestic supplier had no such feed program in place.
A third option explored was hiring a freelance data entry specialist to handle the two manual suppliers. Quotes from two freelancers came in at approximately €18-22 per hour. At 16 hours per quarterly cycle, or 64 hours per year, the cost would be roughly €1,150-1,400 per year. That is modest on its own, but the catalog manager noted that the hours per quarterly cycle were expected to increase as the retailer added more products to these supplier lines. The linear cost growth of the freelancer model made it unattractive as a long-term solution.
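The freelancer arithmetic can be checked with a short sketch. The function name and the 24-hour growth scenario are illustrative assumptions; only the hourly rates and the 16-hour cycle come from the quotes above.

```python
def annual_freelancer_cost(hours_per_cycle, hourly_rate_eur, cycles_per_year=4):
    """Annual cost in EUR; it scales linearly with the hours each cycle needs."""
    return hours_per_cycle * cycles_per_year * hourly_rate_eur


low = annual_freelancer_cost(16, 18)    # 1152
high = annual_freelancer_cost(16, 22)   # 1408

# Hypothetical growth scenario: a cycle that grows from 16 to 24 hours
# raises the annual cost by the same 50%.
grown_high = annual_freelancer_cost(24, 22)  # 2112
```

The result lands in roughly the range quoted above, and the last line shows why the model was unattractive: every additional hour of catalog work is billed at the full rate, with no one-time setup to amortize.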
The Solution
The retailer implemented MicroPIM’s URL import feature to handle product data scraping from the two supplier websites. URL import works by taking a list of product page URLs, fetching the page content, parsing the structured data, and mapping extracted fields to the catalog’s attribute schema. The system can handle product data extraction from standard HTML pages, structured specification tables, JSON-LD metadata, and Open Graph tags.
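To make the parsing step more concrete, here is a minimal sketch of how JSON-LD metadata and Open Graph tags can be pulled out of a product page using only the Python standard library. It is not MicroPIM's implementation; the class and function names are assumptions made for illustration.

```python
import json
from html.parser import HTMLParser


class StructuredDataParser(HTMLParser):
    """Collects JSON-LD blocks and Open Graph meta tags from one HTML page."""

    def __init__(self):
        super().__init__()
        self.json_ld = []      # parsed JSON-LD objects, in page order
        self.open_graph = {}   # e.g. {"og:title": "..."}
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta" and (attrs.get("property") or "").startswith("og:"):
            self.open_graph[attrs["property"]] = attrs.get("content") or ""

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed block: skip it rather than fail the whole page


def extract_structured_data(html):
    """Return (json_ld_blocks, open_graph_tags) for one page of HTML."""
    parser = StructuredDataParser()
    parser.feed(html)
    parser.close()
    return parser.json_ld, parser.open_graph
```

Fetching the page is left out on purpose so the sketch runs offline; in practice the HTML string would come from an HTTP request to each product URL in the list.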
Step 1: Mapping the Supplier Page Structure
Before running any import, the team completed a one-time mapping session for each supplier site. MicroPIM’s URL import tool analyzes a sample URL and presents the extractable data elements — page title, meta description, specification table rows, image URLs, and any schema.org markup — alongside the destination fields in the MicroPIM catalog schema.
The German pruning equipment site used consistent specification tables across all product categories, which made mapping straightforward. Field extraction rules were set up in approximately 90 minutes. The domestic soil tools supplier had two different page templates — one for motorized equipment and one for hand tools — each with a different specification structure. This required two separate mapping profiles, which took about 2.5 hours to configure including testing.
Total mapping setup time: 4 hours across both suppliers. This was a one-time investment; as long as the supplier sites did not restructure their product pages, no re-mapping was required.
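A mapping profile can be thought of as a lookup from the labels a supplier uses in its specification table to the attribute names in the catalog schema. The sketch below assumes that shape; the profile name, labels, and attribute names are hypothetical and are not taken from MicroPIM or from either supplier.

```python
from dataclasses import dataclass, field


@dataclass
class MappingProfile:
    """Hypothetical profile: supplier spec-table labels -> catalog attributes."""
    name: str
    field_map: dict = field(default_factory=dict)

    def apply(self, spec_rows):
        """Map (label, value) rows; return (mapped_fields, unmapped_labels)."""
        mapped, unmapped = {}, []
        for label, value in spec_rows:
            key = self.field_map.get(label.strip().lower())
            if key is None:
                unmapped.append(label)  # surfaced so a person can extend the map
            else:
                mapped[key] = value.strip()
        return mapped, unmapped


# One profile per page template; a second one would cover motorized equipment.
hand_tools = MappingProfile(
    name="soil-tools/hand",
    field_map={"weight": "weight", "blade length": "blade_length", "ean": "ean"},
)
```

Keeping the unmapped labels instead of dropping them is a deliberate choice in this sketch: a new row in a supplier's table then shows up during review rather than disappearing silently.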
Step 2: Generating and Running the URL List
For each catalog refresh, the catalog assistant visits each supplier’s new arrivals or updated products section to identify product page URLs for items not yet in the MicroPIM catalog. This takes approximately 30-45 minutes per supplier — a task that is browsing and verification rather than manual data entry.
The URL list is pasted into MicroPIM’s batch import queue. The system processes each URL sequentially, extracting product data according to the stored mapping profile, and creates draft product records in the MicroPIM catalog. For a batch of 150 product URLs, processing completes in roughly 20 minutes.
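The sequential batch step can be sketched as a simple loop. This is an assumption about the general shape of such a queue, not a description of MicroPIM's internals; `run_batch` and the `extract` callable are hypothetical names.

```python
def run_batch(urls, extract):
    """Process URLs one at a time.

    `extract` is any callable that takes a URL and returns a dict of fields.
    Returns (drafts, failures), where failures is a list of (url, reason).
    """
    drafts, failures = [], []
    seen = set()
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue  # skip blank lines and repeated URLs in the pasted list
        seen.add(url)
        try:
            fields = extract(url)
            drafts.append({"source_url": url, "status": "draft", **fields})
        except Exception as exc:  # one bad page should not stop the batch
            failures.append((url, str(exc)))
    return drafts, failures
```

Records are created with a draft status so that nothing reaches the storefront before the review step described next.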
[SCREENSHOT: Batch import queue showing 148 product URLs processed, with field extraction confidence scores and a list of products flagged for manual review where extraction confidence was below threshold]
Step 3: Reviewing and Approving Extracted Data
Not every product page produces a clean extraction. The URL import system applies confidence scoring to each extracted field — based on whether the content matched the expected format (numeric, text, measurement unit) and whether the field was found in the expected location on the page. Products where one or more fields have confidence below the set threshold are placed in a review queue.
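The scoring idea described above, format match plus expected location, can be sketched as follows. The equal 0.5 weights, the 0.75 threshold, and the regular expressions are assumptions chosen for illustration; the actual scoring rules are not documented here.

```python
import re

# Hypothetical format checks per expected field type.
FORMAT_CHECKS = {
    "numeric": re.compile(r"^\d+(\.\d+)?$"),
    "measurement": re.compile(r"^\d+(\.\d+)?\s?(mm|cm|m|g|kg|W|V)$"),
    "text": re.compile(r"\S"),
}


def field_confidence(value, expected_type, found_in_expected_location):
    """Score from 0.0 to 1.0: half for format match, half for location match."""
    score = 0.0
    if value is not None and FORMAT_CHECKS[expected_type].search(value.strip()):
        score += 0.5
    if found_in_expected_location:
        score += 0.5
    return score


def needs_review(field_scores, threshold=0.75):
    """A product is queued for review if any field scores below the threshold."""
    return any(score < threshold for score in field_scores.values())
```

Under these assumptions a value such as "750W - 1100W" fails the measurement format check and scores 0.5 even when it sits in the expected table row, which is enough to send the product to the review queue.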
In the first live batch of 152 products, 18 were flagged for review. The common issues were:
- Three products where the EAN was formatted differently (spaces rather than no delimiter) and needed a simple normalization.
- Eight products where the power rating field contained a range value (“750W - 1100W”) rather than a single figure, requiring a decision about how to record it.
- Seven products from the domestic supplier where the second page template was used but the system had defaulted to the first mapping profile due to a URL pattern overlap.
The review and correction for all 18 flagged products took 45 minutes. The remaining 134 products were accepted without change.
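Each of the three issue types lends itself to a small rule. The sketch below shows one possible treatment of each, under stated assumptions: the EAN check uses the standard EAN-13 check digit, storing both ends of a power range is only one of several ways to record it (the case study does not say which the retailer chose), and the URL patterns and profile names are hypothetical.

```python
import re


def normalize_ean(raw):
    """Strip spaces and hyphens; return the 13-digit EAN, or None if invalid."""
    digits = re.sub(r"[\s-]", "", raw)
    if not re.fullmatch(r"\d{13}", digits):
        return None
    # EAN-13 check digit: weights 1, 3, 1, 3, ... over the first 12 digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return digits if (10 - total % 10) % 10 == int(digits[12]) else None


def parse_power_watts(raw):
    """Return (min_w, max_w); a single value yields min == max."""
    values = [int(v) for v in re.findall(r"(\d+)\s*W", raw)]
    if not values:
        return None
    return min(values), max(values)


# Hypothetical URL patterns. The most specific pattern is listed first so it
# wins when two patterns overlap.
PROFILE_PATTERNS = [
    (re.compile(r"/products/motorized/"), "motorized"),
    (re.compile(r"/products/"), "hand-tools"),
]


def select_profile(url):
    """Return the name of the first profile whose pattern matches the URL."""
    for pattern, profile_name in PROFILE_PATTERNS:
        if pattern.search(url):
            return profile_name
    return None
```

For example, `normalize_ean("4 006381 333931")` returns `"4006381333931"`, and `parse_power_watts("750W - 1100W")` returns `(750, 1100)`. Ordering patterns from most to least specific is one common way to avoid the kind of overlap that sent seven products to the wrong profile.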
Step 4: Image Handling
Images extracted via URL import are downloaded from the supplier server and stored in MicroPIM’s asset library. The naming convention rules in the system rename downloaded images according to the retailer’s SKU-based file naming standard automatically. For the first batch, 152 products generated 487 images, all correctly named and attached to the relevant product records without manual intervention.
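A SKU-based renaming rule can be sketched in a few lines. The retailer's actual naming standard is not given in this case study, so the `<SKU>_<nn><ext>` pattern and the `.jpg` fallback below are assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def sku_image_names(sku, image_urls):
    """Map each image URL to '<SKU>_<nn><ext>', keeping the original extension."""
    names = {}
    for index, url in enumerate(image_urls, start=1):
        # Use only the URL path so query strings do not leak into the extension.
        ext = PurePosixPath(urlparse(url).path).suffix.lower() or ".jpg"
        names[url] = f"{sku}_{index:02d}{ext}"
    return names
```

Numbering by position preserves the supplier's image order, so under this scheme the first image on the page remains the primary product image.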
The Results
The first full quarterly catalog update using URL import for both manual suppliers was completed in 3.1 hours, compared to the previous 16-hour manual process. Measured outcomes:
- Onboarding time reduction: 80.6%. From 16 hours to 3.1 hours per quarterly cycle.
- Data quality improvement: error rate dropped from 34 errors per 150 products to 4. The four remaining errors were all in the flagged review queue and caught before the products were published.
- Image processing time: eliminated. The previous manual download-and-rename workflow for images took approximately 2.5 hours per batch. This is now handled automatically.
- Customer service contacts related to incorrect specifications dropped by 68% in the quarter following the implementation, compared to the same quarter the prior year.
- Estimated annual labor saving: 52 hours based on four quarterly cycles at 13 hours saved per cycle. At a fully-loaded internal cost of approximately €25/hour for the catalog assistant role, this represents roughly €1,300 in labor cost recovered per year, excluding the reduction in customer service costs.
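The headline figures follow directly from the reported hours. The short calculation below reproduces them; the variable names are illustrative, and the inputs are the numbers reported in this case study.

```python
HOURS_BEFORE = 16.0      # manual process, per quarterly cycle
HOURS_AFTER = 3.1        # URL import, per quarterly cycle
CYCLES_PER_YEAR = 4
HOURLY_COST_EUR = 25     # fully loaded cost of the catalog assistant role
SETUP_HOURS = 4.0        # one-time mapping setup across both suppliers

saved_per_cycle = HOURS_BEFORE - HOURS_AFTER            # 12.9 hours
reduction_pct = saved_per_cycle / HOURS_BEFORE * 100    # about 80.6 percent

# The case study rounds 12.9 hours to 13 before annualizing.
annual_hours_saved = round(saved_per_cycle) * CYCLES_PER_YEAR
annual_saving_eur = annual_hours_saved * HOURLY_COST_EUR

# The one-time setup is recovered if it costs less than one cycle's saving.
setup_paid_back_in_first_cycle = SETUP_HOURS < saved_per_cycle
```

With these inputs, `annual_hours_saved` is 52 and `annual_saving_eur` is 1300, matching the figures above.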
The setup for the German supplier’s data feed, which was in progress when MicroPIM was implemented, was eventually completed four months later. The team chose to keep using the URL import for that supplier anyway: the CSV feed the supplier produced used different attribute naming and required its own mapping work, and the URL import was producing cleaner data.
[SCREENSHOT: Completed batch import summary for a quarterly catalog refresh showing 152 products imported, 18 reviewed, 0 errors on final publish, and time elapsed]
Key Takeaways
- Not every supplier will offer a CSV or API feed, and waiting for them to build one is an unreliable dependency. Product data scraping via URL import provides a practical alternative when the data exists on a supplier website but no structured export is available.
- One-time mapping setup amortizes quickly. Four hours of initial configuration saves 13 hours per quarterly cycle, paying back in the first run.
- Confidence-scored review queues are the difference between reliable automated product data extraction and unreliable automation. Flagging uncertain extractions for human review keeps data quality high without requiring full manual review of every record.
- The real cost of manual data entry is not just the labor; it is also the error rate that comes with high-volume repetitive tasks. Transcription errors in technical specifications create downstream customer service costs that are harder to quantify but no less real.
- URL import works best when the supplier site has a consistent page template. Sites with multiple templates per product type require additional mapping profiles, but this is still a one-time setup cost per template, not a per-product cost.
Manual product data scraping from supplier websites is one of the most time-intensive and error-prone tasks in ecommerce catalog management. If you have suppliers who do not offer structured data feeds, MicroPIM’s URL import feature can extract, map, and import product data in bulk from any standard product page. Start a free account and test it against your supplier’s catalog at app.micropim.net/register.
Frequently Asked Questions
Does URL import work on supplier sites that require a login to view product pages?
MicroPIM’s URL import works with publicly accessible product pages. For supplier portals that require authentication, the recommended approach is to use a session cookie or to request a direct data export in any format the supplier can provide — even an HTML page save works as input. The URL import feature is designed for public-facing supplier catalogs and distributor sites rather than gated B2B portals.
What happens when a supplier updates their website layout and the mapping profile breaks?
When the page structure changes, extracted fields will either be missing or will have low confidence scores, which triggers the review queue. The team would notice a higher-than-normal proportion of flagged products in the next batch. The mapping profile can be updated by running the mapping tool against a sample of the new page layout — typically 30-60 minutes of re-mapping work, which is still significantly less than returning to full manual entry.
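A jump in the share of flagged products can be checked automatically against a baseline. The sketch below is an illustrative assumption, not a MicroPIM feature; the baseline uses the 18 of 152 products flagged in the first live batch, and the doubling tolerance is arbitrary.

```python
def layout_drift_suspected(flagged, total, baseline_rate, tolerance=2.0):
    """True if the flagged share exceeds the baseline share by the tolerance factor."""
    if total == 0:
        return False  # nothing processed, so nothing to compare
    return flagged / total > baseline_rate * tolerance


BASELINE_RATE = 18 / 152  # share flagged in the first live batch, about 12%
```

A batch where 70 of 150 products are flagged would trip this check, while a batch that matches the first run would not.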
Can MicroPIM handle specification data that is embedded in images rather than text?
Not directly. URL import extracts text content and structured data from HTML. Specification data that is only available as image content — scanned catalog pages or tables rendered as images — requires OCR processing before import, which is outside the scope of the URL import feature. For this type of supplier content, the recommended path is requesting a text-based source from the supplier.
How does the system handle duplicate products if a URL is included in two import batches?
MicroPIM checks for existing products using EAN as the primary match key during URL import. If a product with the same EAN already exists in the catalog, the system offers the option to update the existing record (useful for refreshing specifications or images) or skip the duplicate. The default behavior can be configured per import profile, so recurring catalog refresh batches can be set to update-on-match automatically.
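The update-or-skip behavior can be sketched with a catalog keyed by EAN. The function name, the `on_match` parameter, and the in-memory dict are assumptions for illustration; they do not reflect MicroPIM's actual configuration options.

```python
def import_product(catalog, incoming, on_match="update"):
    """Insert or merge one product into `catalog`, a dict keyed by EAN.

    Returns "created", "updated", or "skipped".
    """
    ean = incoming["ean"]
    if ean not in catalog:
        catalog[ean] = dict(incoming)
        return "created"
    if on_match == "update":
        catalog[ean].update(incoming)  # refresh fields, keep any not re-sent
        return "updated"
    if on_match == "skip":
        return "skipped"
    raise ValueError(f"unknown on_match policy: {on_match!r}")
```

Merging with `update` rather than replacing the record means fields that were corrected by hand and are absent from the new extraction survive a refresh, which is one reasonable reading of update-on-match.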

