Python Playwright Web Scraping: Automate Any Website in 2026
You've been there. You open up BeautifulSoup, point it at a site, and get back a skeleton of empty `<div>` tags. The data you need is right there on screen — but the HTML is generated by JavaScript after the page loads, which means traditional scrapers come up empty every time.
That's the wall you hit with a large share of modern websites in 2026. Single-page applications (SPAs), infinite-scroll feeds, login-gated dashboards, dynamically rendered tables — they're all powered by JavaScript frameworks like React, Vue, and Next.js. BeautifulSoup never sees the rendered content because it only reads static HTML.
Python Playwright web scraping automation solves this completely. Playwright controls a real browser, waits for JavaScript to execute, and hands you the fully-rendered DOM to scrape. In this tutorial you'll go from zero to scraping dynamic websites with async Python, handling infinite scroll, bypassing common detection traps, and saving results to CSV — all with working code.
Why Playwright Beats BeautifulSoup and Selenium for Modern Sites
Before writing a single line of code, it's worth understanding why Playwright has become the go-to tool for scraping JavaScript-heavy websites — and how it stacks up against the alternatives.
Playwright vs Selenium: Quick Comparison
| Feature | Playwright | Selenium |
|---|---|---|
| Modern browser support | Chromium, Firefox, WebKit | Chrome, Firefox, Edge |
| Installation complexity | Single pip install + one CLI command | Requires separate WebDriver binaries |
| Async/await support | Native first-class support | Requires third-party wrappers |
| Auto-wait for elements | Built-in smart auto-waiting | Manual WebDriverWait required |
| Screenshot & PDF | Built-in | Limited |
| Speed | Faster (modern protocol) | Slower (older WebDriver protocol) |
| Stealth / anti-detection | Better defaults | More detectable |
| Active development | Microsoft-backed, rapid updates | Mature but slower iteration |
Playwright drives Chromium directly over the Chrome DevTools Protocol (with similar native protocols for Firefox and WebKit), which makes it significantly faster than Selenium's older WebDriver approach. Auto-waiting is the killer feature: instead of manually adding `time.sleep()` calls or configuring explicit waits, Playwright intelligently waits for elements to be visible, enabled, and stable before interacting with them.
BeautifulSoup paired with requests is still perfect for static HTML pages — it's lightweight and fast. But the moment a site requires JavaScript execution, Playwright is the right tool.
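A quick heuristic on the raw HTML can help you decide which tool a page needs before you commit to either. The sketch below is illustrative and stdlib-only; the function name and thresholds are made up, not a library API:

```python
# Heuristic sketch: given raw HTML from a plain HTTP fetch, guess whether
# the page is JavaScript-rendered (needs Playwright) or static (BeautifulSoup
# is enough). Thresholds are rough, illustrative values.
import re

def looks_js_rendered(html: str) -> bool:
    """Rough signal: script tags present but almost no visible text."""
    scripts = len(re.findall(r"<script\b", html, re.IGNORECASE))
    # Strip scripts, styles, and tags to estimate visible text
    text = re.sub(r"<(script|style)\b.*?</\1>", "", html,
                  flags=re.IGNORECASE | re.DOTALL)
    text = re.sub(r"<[^>]+>", "", text)
    visible_words = len(text.split())
    # An SPA shell typically ships a near-empty body plus a JS bundle
    return scripts >= 1 and visible_words < 20

spa_shell = '<html><body><div id="root"></div><script src="/bundle.js"></script></body></html>'
static_page = "<html><body><p>" + "word " * 50 + "</p></body></html>"

print(looks_js_rendered(spa_shell))   # → True
print(looks_js_rendered(static_page)) # → False
```

If the plain fetch already contains the data you want, stick with requests + BeautifulSoup; reach for Playwright only when it doesn't.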
Installation and Setup
Getting Playwright running in Python takes about two minutes.
```bash
pip install playwright
playwright install
```
The second command downloads the browser binaries (Chromium, Firefox, and WebKit). You only need to run it once per environment.
To install just Chromium and keep things lean:
```bash
playwright install chromium
```
Verify your installation works:
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()
```
Run that script and you should see a browser window open, navigate to example.com, and print "Example Domain" to your terminal.
Headless vs Headful Mode
Playwright can run in two modes:
- Headless (`headless=True`, the default): No visible browser window. Runs faster, ideal for production scraping and CI pipelines.
- Headful (`headless=False`): A real browser window opens. Essential for debugging — you can see exactly what Playwright is doing.
```python
# Headless mode (production — faster, no UI)
browser = p.chromium.launch(headless=True)

# Headful mode (debugging — watch the browser work)
browser = p.chromium.launch(headless=False, slow_mo=500)
```
The `slow_mo` parameter adds a delay (in milliseconds) between each action, making headful mode much easier to follow visually. Start every new scraping project in headful mode, then switch to headless once everything works.
Navigating Pages and Waiting for Elements
The most common scraping mistake is not waiting for elements to load. Playwright's built-in auto-wait handles most cases, but you sometimes need to be explicit.
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    # Navigate and wait until the network is idle
    page.goto("https://books.toscrape.com", wait_until="networkidle")

    # Wait for a specific element to appear
    page.wait_for_selector("article.product_pod")

    # Extract all book titles
    titles = page.query_selector_all("article.product_pod h3 a")
    for title in titles:
        print(title.get_attribute("title"))

    browser.close()
```
Key wait strategies:
- `wait_until="networkidle"` — waits until no network requests fire for 500 ms
- `wait_until="domcontentloaded"` — waits for the HTML to parse (faster, less safe)
- `page.wait_for_selector(css)` — waits for a specific element to appear
- `page.wait_for_load_state("networkidle")` — waits after an action triggers loading
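Under the hood, all of these strategies boil down to the same idea: poll a condition until it becomes true or a deadline passes. A stdlib-only illustration of that idea (this is not Playwright's actual implementation; the helper name is made up):

```python
# Minimal poll-until-true loop, the core of any "wait for X" strategy.
import time

def wait_for(condition, timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Retry `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Simulate an element that "appears" 0.3 seconds from now
appears_at = time.monotonic() + 0.3
print(wait_for(lambda: time.monotonic() >= appears_at, timeout=2.0))  # → True
print(wait_for(lambda: False, timeout=0.3))                           # → False
```

Playwright runs this kind of loop for you on every action, which is why you almost never need to write it yourself.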
Extracting Table Data from Dynamic Pages
Dynamic tables are one of the most common scraping targets. Here's how to extract a full HTML table that's rendered by JavaScript:
```python
from playwright.sync_api import sync_playwright
import csv

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://the-internet.herokuapp.com/tables", wait_until="networkidle")

    # Extract table headers
    headers = [th.inner_text() for th in page.query_selector_all("table#table1 thead th")]

    # Extract table rows
    rows = []
    for tr in page.query_selector_all("table#table1 tbody tr"):
        cells = [td.inner_text() for td in tr.query_selector_all("td")]
        rows.append(cells)

    # Save to CSV
    with open("table_data.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)

    print(f"Extracted {len(rows)} rows")
    browser.close()
```
Handling Infinite Scroll
Infinite-scroll pages load content as you scroll down — a pattern used by Twitter/X, LinkedIn, product listing pages, and news feeds. Playwright can simulate scrolling to trigger content loading.
```python
from playwright.sync_api import sync_playwright

def scrape_infinite_scroll(url: str, max_scrolls: int = 10) -> list[str]:
    results = []

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")

        for scroll_num in range(max_scrolls):
            # Collect visible items before scrolling
            items = page.query_selector_all(".item-selector")
            for item in items:
                text = item.inner_text().strip()
                if text and text not in results:
                    results.append(text)

            # Scroll to the bottom of the page
            previous_height = page.evaluate("document.body.scrollHeight")
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")

            # Wait for new content to load
            page.wait_for_timeout(2000)
            new_height = page.evaluate("document.body.scrollHeight")

            # Stop if no new content loaded
            if new_height == previous_height:
                print(f"Reached end of page after {scroll_num + 1} scrolls")
                break

        browser.close()

    return results
```
Replace `.item-selector` with the actual CSS selector for the content you want to extract. The height comparison trick detects when the page stops loading new content.
Taking Screenshots for Debugging and Monitoring
Screenshots are invaluable for debugging scraper failures and building website monitoring tools.
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    # Set viewport size
    page.set_viewport_size({"width": 1280, "height": 800})
    page.goto("https://example.com")

    # Full-page screenshot
    page.screenshot(path="full_page.png", full_page=True)

    # Screenshot of a specific element only
    element = page.query_selector("h1")
    element.screenshot(path="heading.png")

    print("Screenshots saved")
    browser.close()
```
Use full-page screenshots when a scraper starts returning unexpected results — a screenshot will instantly show you whether a login wall, CAPTCHA, or layout change broke your selector.
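One way to make that automatic is a screenshot-on-failure wrapper: capture the page state whenever a scrape raises, then re-raise. A minimal sketch, assuming any Playwright `Page`-like object (the function name, output path, and `h1` selector are illustrative):

```python
# Hedged sketch: preserve what the browser saw at the moment of failure.
# Works with any object exposing goto(), inner_text(), and screenshot().
def scrape_with_screenshot(page, url: str, on_fail_path: str = "failure.png") -> str:
    try:
        page.goto(url)
        return page.inner_text("h1")  # placeholder extraction
    except Exception:
        # Save the failure state before propagating the error
        page.screenshot(path=on_fail_path, full_page=True)
        raise
```

When the scraper dies overnight, `failure.png` tells you instantly whether you hit a CAPTCHA, a login wall, or a redesign.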
Async Scraping for Speed
Playwright's async API lets you scrape multiple pages concurrently, dramatically cutting total runtime for large scraping jobs.
```python
import asyncio
import csv
from playwright.async_api import async_playwright

async def scrape_page(browser, url: str) -> dict:
    page = await browser.new_page()
    try:
        await page.goto(url, wait_until="networkidle", timeout=30000)
        title = await page.title()
        # Add your selector-based extraction here
        content = await page.inner_text("body")
        return {"url": url, "title": title, "length": len(content)}
    except Exception as e:
        print(f"Error scraping {url}: {e}")
        return {"url": url, "title": "ERROR", "length": 0}
    finally:
        await page.close()

async def scrape_all(urls: list[str]) -> list[dict]:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)

        # Limit concurrency to avoid overwhelming the server
        semaphore = asyncio.Semaphore(5)

        async def scrape_with_limit(url):
            async with semaphore:
                return await scrape_page(browser, url)

        results = await asyncio.gather(*[scrape_with_limit(url) for url in urls])
        await browser.close()
        return results

# Run it
urls = [
    "https://books.toscrape.com/catalogue/page-1.html",
    "https://books.toscrape.com/catalogue/page-2.html",
    "https://books.toscrape.com/catalogue/page-3.html",
]

results = asyncio.run(scrape_all(urls))

with open("results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["url", "title", "length"])
    writer.writeheader()
    writer.writerows(results)

print(f"Scraped {len(results)} pages")
```
The `Semaphore(5)` limits concurrent browser pages to 5 at a time. Increase it for faster scraping (within reason), or decrease it if the target site rate-limits you.
Anti-Detection Best Practices
Many websites use bot-detection systems (Cloudflare, DataDome, PerimeterX) that look for signs of browser automation. Here are the most effective countermeasures.
1. Use a realistic user agent:
```python
browser = p.chromium.launch(headless=True)
context = browser.new_context(
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    viewport={"width": 1280, "height": 720},
    locale="en-US",
    timezone_id="America/New_York",
)
page = context.new_page()
```
2. Add human-like delays between actions:
```python
import random

# Random delay between 1 and 3 seconds
page.wait_for_timeout(random.randint(1000, 3000))
```
3. Use stealth mode with playwright-stealth:
```bash
pip install playwright-stealth
```
```python
from playwright_stealth import stealth_sync

page = browser.new_page()
stealth_sync(page)  # Patches browser fingerprint properties
page.goto("https://target-site.com")
```
4. Rotate proxies for large-scale scraping:
```python
browser = p.chromium.launch(
    headless=True,
    proxy={"server": "http://proxy-host:8080", "username": "user", "password": "pass"}
)
```
Important: Always check a website's `robots.txt` and Terms of Service before scraping. Respect rate limits and don't scrape personal data without authorization.
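The `robots.txt` check can be automated with the standard library. A small sketch using `urllib.robotparser` — here the rules are parsed from an inline string so the example is self-contained; against a live site you would call `rp.set_url(".../robots.txt")` followed by `rp.read()`:

```python
# Check whether a URL is allowed before scraping it.
from urllib.robotparser import RobotFileParser

# Example rules, inlined for illustration (normally fetched from the site)
rules = """
User-agent: *
Disallow: /private/
Allow: /
""".strip().splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("my-scraper", "https://example.com/catalogue/"))    # → True
print(rp.can_fetch("my-scraper", "https://example.com/private/data"))  # → False
```

Gating every `page.goto()` behind a `can_fetch()` call is cheap insurance against scraping paths the site has explicitly disallowed.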
Putting It All Together: Full Scraper Template
```python
import asyncio
import csv
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                       "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
            viewport={"width": 1280, "height": 720},
        )
        page = await context.new_page()

        await page.goto("https://books.toscrape.com", wait_until="networkidle")
        await page.wait_for_selector("article.product_pod")

        books = []
        articles = await page.query_selector_all("article.product_pod")

        for article in articles:
            title_el = await article.query_selector("h3 a")
            price_el = await article.query_selector(".price_color")
            rating_el = await article.query_selector(".star-rating")

            title = await title_el.get_attribute("title") if title_el else "N/A"
            price = await price_el.inner_text() if price_el else "N/A"
            rating = await rating_el.get_attribute("class") if rating_el else "N/A"

            books.append({"title": title, "price": price, "rating": rating})

        with open("books.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=["title", "price", "rating"])
            writer.writeheader()
            writer.writerows(books)

        print(f"Saved {len(books)} books to books.csv")
        await browser.close()

asyncio.run(main())
```
Best Practices and Common Mistakes
- Always use context managers — the `with sync_playwright()` and `async with async_playwright()` patterns ensure browsers are properly closed even if your script crashes.
- Set timeouts — the default timeout is 30 seconds. For slow sites, increase it: `page.set_default_timeout(60000)`.
- Handle errors gracefully — wrap `page.goto()` in try/except blocks for production scrapers. Network errors and timeouts are inevitable at scale.
- Don't forget to close pages — in async mode, always call `await page.close()` when done with a page. Unclosed pages consume memory.
- Test selectors in browser DevTools — before writing code, open the browser console and test your CSS/XPath selectors with `document.querySelector()`. This saves hours of debugging.
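The error-handling advice can be condensed into a small retry wrapper around `page.goto()`. This is a sketch under assumptions: the function name, attempt count, and backoff are illustrative, and it works with any object exposing a `goto()` method:

```python
# Retry navigation a few times before giving up, with a growing pause
# between attempts. Returns True on success, False if all attempts fail.
import time

def goto_with_retries(page, url: str, attempts: int = 3, backoff: float = 1.0) -> bool:
    for attempt in range(1, attempts + 1):
        try:
            page.goto(url, wait_until="networkidle")
            return True
        except Exception as e:
            print(f"Attempt {attempt}/{attempts} failed for {url}: {e}")
            if attempt < attempts:
                time.sleep(backoff * attempt)  # linear backoff between tries
    return False
```

In production, call it in place of a bare `page.goto()` and skip (or log) URLs where it returns False instead of letting one flaky page kill the whole run.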
Conclusion
Python Playwright web scraping automation eliminates the biggest limitation of traditional Python scrapers — the inability to execute JavaScript. Whether you're pulling data from React-powered dashboards, scraping infinite-scroll feeds, or monitoring price changes on dynamic e-commerce sites, Playwright gives you a full browser engine to work with.
The async API is the secret weapon here: running five concurrent browser pages can cut a 100-page scraping job from 10 minutes down to 2. Add smart waiting, stealth mode, and proper error handling, and you have a production-ready scraper that can handle almost anything the modern web throws at it.
Start with the sync API to understand the basics, then migrate to async once you need speed. Your BeautifulSoup days aren't over — but for anything JavaScript-rendered, reach for Playwright first.
Frequently Asked Questions
Is Playwright faster than Selenium for web scraping? Yes, in most benchmarks Playwright is 2–3× faster than Selenium. It uses the Chrome DevTools Protocol directly (rather than the older WebDriver protocol), has built-in auto-waiting, and has excellent native async support that allows true concurrent page scraping.
Can Playwright scrape websites protected by Cloudflare?
Playwright alone won't bypass Cloudflare's advanced bot protection. You'll need a combination of playwright-stealth, realistic user agents, residential proxies, and potentially a service like Bright Data or Oxylabs for heavy-duty Cloudflare sites. Always check the site's ToS first.
What Python version does Playwright require in 2026? Playwright requires Python 3.9 or higher (Python 3.7 reached end-of-life in June 2023 and is no longer supported). Python 3.11 or 3.12 is recommended for best performance and compatibility in 2026. Check the Playwright Python docs for the current minimum version.
What's the difference between page.query_selector and page.locator?
`query_selector` returns the first matching DOM element and is more familiar to developers coming from JavaScript. `locator` is Playwright's newer, recommended API — it's lazily evaluated and has built-in retry logic, making it more resilient. For new projects, prefer `locator`.
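A side-by-side sketch of the two styles, reusing the books.toscrape.com selector from earlier in this article (the helper function names are made up for illustration):

```python
# Two ways to pull the `title` attribute from every book link on the page.
def titles_old_style(page):
    # query_selector_all: evaluated immediately; no retry if the DOM changes
    return [el.get_attribute("title")
            for el in page.query_selector_all("article.product_pod h3 a")]

def titles_locator_style(page):
    # locator: lazily evaluated; nothing runs until an action is called,
    # and actions auto-wait and retry. evaluate_all runs JS over all matches.
    return page.locator("article.product_pod h3 a").evaluate_all(
        "els => els.map(e => e.getAttribute('title'))"
    )
```

Both return a list of title strings, but the `locator` version keeps retrying if the elements re-render between the query and the read, which is exactly the failure mode that plagues `query_selector` on dynamic pages.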
Related articles: Python Web Scraping with BeautifulSoup Tutorial, Web Scraping for Beginners with Python, Python API Automation with Requests