How to Scrape Decathlon Using Playwright in Python for Product Data Retrieval

Decathlon, a renowned sporting goods retailer, offers various sports apparel, footwear, and equipment. This article explores how to scrape apparel data by category using Playwright and Python to collect valuable insights into product trends and pricing from Decathlon's website.

Playwright is a library that controls web browsers like Chromium, Firefox, and WebKit from programming languages such as Python and JavaScript. It's an excellent tool for scraping data from ecommerce websites and automating tasks like form submissions and button clicks. Using Playwright, we will navigate through each category and gather essential product information, including name, price, and description.

This tutorial provides a foundational understanding of using Playwright and Python for scraping Decathlon's website, focusing on extracting key data attributes from individual product pages.

List of Data Fields

  • Product URL
  • Product Name
  • Brand
  • MRP
  • Sale Price
  • Number of Reviews
  • Ratings
  • Color
  • Features
  • Product Information

Below, you'll find a step-by-step guide to scrape Decathlon with Playwright in Python.

Import Necessary Libraries

To begin, we import the necessary libraries that let us interact with the website and retrieve the essential information.

The following libraries serve specific purposes in this scraper:

'random': This library generates random numbers; here it is useful for randomizing the delay between retry attempts.

'asyncio': It manages asynchronous programming in Python, which is required when using Playwright's asynchronous API.

'pandas': This library supports data analysis and manipulation. In this tutorial, it stores and organizes the data acquired from the scraped pages.

'async_playwright': This is the asynchronous API for Playwright, and it drives the browser automation. Its asynchronous nature allows multiple operations to run concurrently, making the scraping process faster and more efficient.

Together, these libraries cover generating randomized delays, managing asynchronous operations, handling data, and automating interactions with web browsers.

Scrape Product URLs

The next step involves extracting the URLs of the apparel products based on their respective categories.

Here, we define the function 'get_product_urls' to retrieve product URLs from a web page. The function harnesses Playwright's browser-automation capabilities to gather the product URLs from the rendered page. It accepts two parameters: 'browser' and 'page,' instances of Playwright's Browser and Page classes, respectively.

The process begins by using 'page.query_selector_all()' to locate all elements on the page containing product links. A for loop then iterates through these elements, extracting the 'href' attribute, which holds each product page's URL.

Additionally, the function checks for a "next" button on the page. If such a button exists, the function clicks it and invokes itself recursively to retrieve URLs from the subsequent page. This recursion continues until all relevant product URLs have been collected.
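A minimal sketch of 'get_product_urls' following these steps could look like this. The CSS selectors ("a.product-card", "a[aria-label='Next']") and the base URL are assumptions for illustration and must be adjusted against the live page:

```python
BASE_URL = "https://www.decathlon.com"  # assumed site prefix for relative links

async def get_product_urls(browser, page):
    product_urls = []
    # Collect every product link currently visible on the page.
    # "a.product-card" is a hypothetical selector.
    for anchor in await page.query_selector_all("a.product-card"):
        href = await anchor.get_attribute("href")
        if href:
            # Relative hrefs need the site prefix prepended.
            product_urls.append(href if href.startswith("http") else BASE_URL + href)
    # If a "next page" button exists, follow it and recurse.
    next_button = await page.query_selector("a[aria-label='Next']")
    if next_button:
        await next_button.click()
        await page.wait_for_load_state("networkidle")
        product_urls += await get_product_urls(browser, page)
    return product_urls
```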

In this scenario, we aim to scrape product URLs categorized by product type. To achieve this, we follow a two-step process. Initially, we click the product category button to reveal the list of available categories. Subsequently, we click on each category to filter and gather the relevant product URLs.

In this context, we utilize the Python function 'filter_products' to filter products on the Decathlon website by their respective categories and furnish a list of product URLs, along with their associated categories.

The process begins by expanding the product category section on the website and clicking the "Show All" button to reveal every available subcategory. The function then iterates through a predefined list of subcategories, clicking the corresponding checkbox for each to apply the desired filter. After selecting a subcategory, the function waits for the page to load and calls 'get_product_urls' to extract that subcategory's list of product URLs.

After processing all the subcategories, the function performs a cleanup operation by clicking the "Clear" button for each subcategory, effectively resetting the filters.
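The filtering logic above can be sketched as follows. Every selector and the subcategory list are hypothetical placeholders; the URL-collecting helper described earlier is passed in as the `get_urls` parameter to keep the sketch self-contained:

```python
SUB_CATEGORIES = ["Jackets", "Trousers", "T-Shirts"]  # assumed subcategories

async def filter_products(browser, page, get_urls):
    data = []
    await page.click("button#category-facet")  # expand the category section
    await page.click("button#show-all")        # reveal all subcategories
    for category in SUB_CATEGORIES:
        await page.click(f"input[name='{category}']")  # apply this filter
        await page.wait_for_load_state("networkidle")  # let results load
        for url in await get_urls(browser, page):
            data.append((url, category))               # keep URL + category
        await page.click(f"button.clear[name='{category}']")  # reset filter
    return data
```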

Scrape Product Name

The subsequent step involves extracting the names of the products from the web pages.

Here we employ an asynchronous function called 'get_product_name,' which accepts a 'page' argument representing a Playwright page object. Within this function, we locate the product name element on the page using the 'query_selector()' method of the 'page' object with an appropriate CSS selector. On locating the element, the function retrieves its text content and returns it as a string.

However, in the event of an exception occurring during the process, such as when the element is not available on the page, the function assigns the 'product_name' variable the value "Not Available."
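A minimal sketch of 'get_product_name' might look like this; "h1.product-title" is an assumed selector:

```python
async def get_product_name(page):
    try:
        element = await page.query_selector("h1.product-title")
        # A missing element is None, so .text_content() raises and we
        # fall through to the default value below.
        product_name = (await element.text_content()).strip()
    except Exception:
        product_name = "Not Available"
    return product_name
```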

Scrape Product Brand

The subsequent step involves extracting the brand information of the products from the web pages.

Much like extracting the product name, the function 'get_brand_name' retrieves the brand name of a product from a web page. It attempts to locate the brand name element using a CSS selector that targets the specific element containing the brand name. When the element is found, the function extracts its text content with the 'text_content()' method and assigns it to the 'brand_name' variable. It's important to note that the brand name may include both the primary brand and a sub-brand, for instance, "Decathlon Wedze," where "Wedze" is one of Decathlon's sub-brands. If an exception occurs while locating or extracting the brand name element, the function defaults to assigning "Not Available" to the brand name.

A similar approach can extract other attributes, such as MRP, sale price, number of reviews, ratings, color, features, and product information. For each attribute, a separate function can be defined, using the 'query_selector' method with 'text_content' or equivalent methods to pinpoint the relevant element on the page and gather the desired information. Remember to adjust the CSS selectors in these functions to match the specific structure of the page being scraped.
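Since these per-attribute functions share the same shape, one way to sketch them is a single generic helper plus a selector table. Every selector below is a hypothetical placeholder to be replaced with the real ones from the page:

```python
# Hypothetical selectors for the remaining attributes.
SELECTORS = {
    "brand":       "span.brand-name",
    "mrp":         "span.price-strikethrough",
    "sale_price":  "span.price-sale",
    "num_reviews": "span.review-count",
    "rating":      "span.rating-value",
    "color":       "span.selected-color",
}

async def get_attribute_text(page, selector):
    """Return the text of the first element matching `selector`,
    or "Not Available" when the element is missing."""
    try:
        element = await page.query_selector(selector)
        return (await element.text_content()).strip()
    except Exception:
        return "Not Available"
```

Each attribute is then one call, for example `brand = await get_attribute_text(page, SELECTORS["brand"])`.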

Scrape MRP of Products

Scrape Sales Price

Scrape Number of Reviews

Scrape Ratings

Scrape Features of Products

Scrape Product Information

The code defines an asynchronous function named get_ProductInformation, which takes a page object as its argument and retrieves product information from Decathlon's website. It iterates through each entry in the product information section, extracting the text content of the "name" and "value" elements with the text_content method. It then removes any newline characters from the collected strings using the replace method and stores each name-value pair in a dictionary named product_information. If an exception is raised, for instance, when an element is missing or its text cannot be extracted, the code assigns the value "Not Available" to the product_information dictionary.
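A sketch of get_ProductInformation following that description is shown below; the row and cell selectors ("div.product-information li", "span.name", "span.value") are assumptions:

```python
async def get_ProductInformation(page):
    product_information = {}
    try:
        # Each specification row carries a "name" cell and a "value" cell.
        for row in await page.query_selector_all("div.product-information li"):
            name_el = await row.query_selector("span.name")
            value_el = await row.query_selector("span.value")
            # Strip newlines and surrounding whitespace from both cells.
            name = (await name_el.text_content()).replace("\n", " ").strip()
            value = (await value_el.text_content()).replace("\n", " ").strip()
            product_information[name] = value
    except Exception:
        product_information["Product Information"] = "Not Available"
    return product_information
```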

Implementing Maximum Retry Limit for Request Retries

In web scraping, request retries are critical in handling temporary network errors or unexpected responses from websites. The primary goal is to reattempt a failed request, increasing the likelihood of a successful outcome.

Before accessing the target URL, the script incorporates a retry mechanism to address potential timeouts. It employs a loop to repeatedly attempt the navigation until the request succeeds or the maximum number of retries is exhausted. If the maximum retry limit is reached without success, the script raises an exception.

This code represents a function that executes a request to a specified link and handles retries in case of failure. This function proves valuable when scraping web pages, as network issues or timeouts can occasionally lead to request failures.

This function requests a specific URL using the 'goto' method of the Playwright page object. If the request fails, the function retries it, up to a maximum of five attempts as set by the constant MAX_RETRIES. Between retries it calls 'asyncio.sleep' with a random delay of 1 to 5 seconds to avoid immediate reattempts; this deliberate pause helps prevent hammering the server and triggering further failures.

The perform_request_with_retry function expects two arguments: 'page' and 'link.' The 'page' argument is the Playwright page object performing the request, while 'link' specifies the URL to request. Continuing with the process, we invoke the functions above and store the extracted data in an initially empty list.
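The retry logic can be sketched like this; the function is deliberately generic, so only the 'goto' call ties it to Playwright:

```python
import asyncio
import random

MAX_RETRIES = 5  # upper bound on navigation attempts

async def perform_request_with_retry(page, link):
    """Navigate to `link`, retrying on failure with a randomized
    1-5 second pause, up to MAX_RETRIES attempts in total."""
    for attempt in range(MAX_RETRIES):
        try:
            await page.goto(link)
            return
        except Exception:
            if attempt == MAX_RETRIES - 1:
                raise  # retries exhausted; surface the failure
            await asyncio.sleep(random.uniform(1, 5))
```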

This Python script employs an asynchronous function called "main" to scrape product information from Decathlon web pages. It uses the Playwright library to launch a Firefox browser, navigate to the Decathlon page, and collect the URL of each product with the "get_product_urls" function, storing them in a list named "product_urls." The script then iterates through each product URL, loads the product page via the "perform_request_with_retry" function, and retrieves details such as the product name, brand, MRP, sale price, number of reviews, ratings, color, features, and product information.

The scraped values are stored as tuples in a list called "data." The script also prints a progress message after every 10 product URLs and a completion message once all URLs have been processed. The "data" list is then converted into a pandas DataFrame and saved as a CSV file with the "to_csv" method. Finally, the browser is closed with "browser.close()," and the script is executed by calling the "main" function via "asyncio.run(main())," running it as an asynchronous coroutine.
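The final persistence step can be illustrated in isolation. The rows and column names below are made up for demonstration; a real run would append one tuple per scraped product:

```python
import pandas as pd

# Illustrative rows; real runs append one tuple per scraped product.
data = [
    ("https://www.decathlon.com/p/jacket", "Quechua", "Hiking Jacket", "49.99"),
    ("https://www.decathlon.com/p/shoes", "Kalenji", "Running Shoes", "39.99"),
]
df = pd.DataFrame(data, columns=["product_url", "brand", "product_name", "sale_price"])
df.to_csv("decathlon_products.csv", index=False)  # persist for later analysis
```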

Conclusion

In today's rapidly evolving business landscape, data is paramount, and web scraping is the gateway to unlocking its full potential. With the correct data and tools, brands can gain profound insights into the market, facilitating informed decisions that drive growth and profitability.

To remain competitive in the modern business world, brands must leverage every available advantage to stay ahead of competitors. This is where web scraping becomes crucial, enabling companies to access vital insights into market trends, pricing strategies, and competitor data.

By harnessing the capabilities of Playwright and Python tools, companies can extract valuable data from websites like Decathlon, obtaining a wealth of information about product offerings, pricing, and other critical metrics. When combined with the ecommerce website data collection services of a leading web scraping company, the results can be truly transformative and game-changing.

At Product Data Scrape, we uphold unwavering ethical standards in every facet of our operations, be it our Competitor Price Monitoring Services or Mobile App Data Scraping. With a worldwide footprint encompassing numerous offices, we steadfastly provide outstanding and transparent services to cater to the varied requirements of our esteemed clientele.
