Playwright Scraper Skill 🕷️

License: MIT Node.js Playwright

中文文檔 | English

A Playwright-based web scraping OpenClaw Skill with anti-bot protection. Successfully tested on complex websites like Discuss.com.hk.

📦 Installation: See INSTALL.md
📚 Full Documentation: See SKILL.md
💡 Examples: See examples/README.md


✨ Features

  • ✅ Pure Playwright - Modern, powerful, easy to use
  • ✅ Anti-Bot Protection - Hides automation, realistic UA
  • ✅ Verified - 100% success on Discuss.com.hk
  • ✅ Simple to Use - One-line commands
  • ✅ Customizable - Environment variable support

🚀 Quick Start

Installation

npm install
npx playwright install chromium

Usage

# Quick scraping
node scripts/playwright-simple.js https://example.com
 
# Stealth mode (recommended)
node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"

📖 Two Modes

| Mode | Use Case | Speed | Anti-Bot |
|------|----------|-------|----------|
| Simple | Regular dynamic sites | Fast (3-5s) | None |
| Stealth ⭐ | Sites with anti-bot | Medium (5-20s) | Medium-High |

Simple Mode

For sites without anti-bot protection:

node scripts/playwright-simple.js <URL>

Stealth Mode ⭐

For sites with Cloudflare or other anti-bot protection:

node scripts/playwright-stealth.js <URL>

Anti-Bot Techniques:

  • Hide navigator.webdriver
  • Realistic User-Agent (iPhone)
  • Human-like behavior simulation
  • Screenshot and HTML saving support
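The first three techniques can be sketched with Playwright's public API. This is a minimal illustration and not the contents of scripts/playwright-stealth.js: the helper names (`hideWebdriver`, `humanDelay`, `openStealthPage`), the particular iPhone User-Agent string, the viewport size, and the delay range are all assumptions made for the example.

```javascript
// Illustrative sketch only; the real stealth script may work differently.
// Hypothetical iPhone User-Agent chosen for the example.
const IPHONE_UA =
  'Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) ' +
  'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1';

// Executed inside the page before any site script runs:
// makes navigator.webdriver read as undefined.
function hideWebdriver() {
  Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
}

// Random pause in milliseconds: at least minMs and below maxMs.
function humanDelay(minMs = 500, maxMs = 1500, random = Math.random) {
  return Math.floor(minMs + random() * (maxMs - minMs));
}

// `browser` is expected to be a Playwright Browser, e.g. from chromium.launch().
async function openStealthPage(browser, url, waitMs = 5000) {
  const context = await browser.newContext({
    userAgent: IPHONE_UA,
    viewport: { width: 390, height: 844 }, // assumed phone-sized viewport
    isMobile: true,
    hasTouch: true,
  });
  await context.addInitScript(hideWebdriver);
  const page = await context.newPage();
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  // Fixed wait plus a random pause so timing is less uniform.
  await page.waitForTimeout(waitMs + humanDelay());
  return page;
}
```

With Playwright installed, passing the result of `chromium.launch()` to `openStealthPage` would return a page whose HTML can then be read with `page.content()` or captured with `page.screenshot()`.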

🎯 Customization

All scripts support environment variables:

# Show browser
HEADLESS=false node scripts/playwright-stealth.js <URL>
 
# Custom wait time (milliseconds)
WAIT_TIME=10000 node scripts/playwright-stealth.js <URL>
 
# Save screenshot
SCREENSHOT_PATH=/tmp/page.png node scripts/playwright-stealth.js <URL>
 
# Save HTML
SAVE_HTML=true node scripts/playwright-stealth.js <URL>
 
# Custom User-Agent
USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js <URL>
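For readers who want to see how such variables are typically consumed, here is a rough sketch of a parser for the five variables listed above. The function name `readConfig` and the defaults (headless on, a 5000 ms wait, nothing saved) are assumptions for illustration; the actual scripts may parse or default them differently.

```javascript
// Sketch of reading the documented environment variables.
// All default values here are assumptions, not taken from the scripts.
function readConfig(env = process.env) {
  const parsedWait = Number.parseInt(env.WAIT_TIME ?? '', 10);
  return {
    // Only the literal string "false" opens a visible browser window.
    headless: env.HEADLESS !== 'false',
    // Fall back to a hypothetical 5000 ms when WAIT_TIME is missing or invalid.
    waitTime: Number.isFinite(parsedWait) && parsedWait >= 0 ? parsedWait : 5000,
    // null means "do not save a screenshot".
    screenshotPath: env.SCREENSHOT_PATH || null,
    saveHtml: env.SAVE_HTML === 'true',
    // null means "keep the script's built-in User-Agent".
    userAgent: env.USER_AGENT || null,
  };
}
```

Passing the environment as a parameter instead of reading `process.env` directly keeps the function easy to test with plain objects.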

📊 Test Results

| Website | Result | Time |
|---------|--------|------|
| Discuss.com.hk | ✅ 200 OK | 5-20s |
| Example.com | ✅ 200 OK | 3-5s |
| Cloudflare Protected | ✅ Mostly successful | 10-30s |

📁 File Structure

playwright-scraper-skill/
├── scripts/
│   ├── playwright-simple.js       # Simple mode
│   └── playwright-stealth.js      # Stealth mode ⭐
├── examples/
│   ├── discuss-hk.sh              # Discuss.com.hk example
│   └── README.md                  # More examples
├── SKILL.md                       # Full documentation
├── INSTALL.md                     # Installation guide
├── README.md                      # This file
├── README_ZH.md                   # Chinese documentation
├── CONTRIBUTING.md                # Contribution guide
├── CHANGELOG.md                   # Version history
└── package.json                   # npm config

💡 Best Practices

  1. Try web_fetch first - OpenClaw's built-in tool is fastest
  2. Use Simple for dynamic sites - when there is no anti-bot protection
  3. Use Stealth for protected sites ⭐ - the main workhorse
  4. Use specialized skills - for YouTube, Reddit, etc.
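The escalation order in items 1 to 3 can be expressed as a small fallback chain. The sketch below is hypothetical: `scrapeWithFallback` and the `{ name, run }` strategy objects are invented for the example, and the caller would have to supply wrappers around web_fetch, Simple mode, and Stealth mode, since the skill is not known to ship any such wrappers.

```javascript
// Tries each strategy in order and returns the first successful result.
// `strategies` is a caller-supplied list of { name, run } objects (hypothetical shape);
// each `run(url)` should resolve with the page HTML or reject on failure.
async function scrapeWithFallback(url, strategies) {
  const errors = [];
  for (const { name, run } of strategies) {
    try {
      const html = await run(url);
      return { via: name, html };
    } catch (err) {
      // Remember why this strategy failed, then escalate to the next one.
      errors.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`All strategies failed for ${url}: ${errors.join('; ')}`);
}
```

Ordering the list from cheapest to most expensive means the slower Stealth run only happens when the faster options have already failed.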

🐛 Troubleshooting

Getting 403 blocked?

Use Stealth mode:

node scripts/playwright-stealth.js <URL>

Cloudflare challenge?

Increase wait time + headful mode:

HEADLESS=false WAIT_TIME=30000 node scripts/playwright-stealth.js <URL>
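The inline `VAR=value command` form is POSIX shell syntax and does not work in Windows cmd.exe. When the scraper is launched from another Node.js program, the same overrides can be passed through the child process environment instead. The following is a sketch under assumptions: `stealthCommand` and `runStealth` are hypothetical helper names, and the relative script path assumes the current working directory is the repository root.

```javascript
const { spawn } = require('node:child_process');

// Builds the command, arguments, and environment for a stealth run.
// `overrides` holds string values such as { HEADLESS: 'false', WAIT_TIME: '30000' }.
function stealthCommand(url, overrides = {}) {
  return {
    command: process.execPath, // the Node.js binary running this program
    args: ['scripts/playwright-stealth.js', url], // path assumes cwd is the repo root
    env: { ...process.env, ...overrides },
  };
}

// Starts the script as a child process and streams its output to this terminal.
function runStealth(url, overrides) {
  const { command, args, env } = stealthCommand(url, overrides);
  return spawn(command, args, { env, stdio: 'inherit' });
}
```

Calling `runStealth(url, { HEADLESS: 'false', WAIT_TIME: '30000' })` would mirror the shell command above on any platform.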

Playwright not found?

Reinstall:

npm install
npx playwright install chromium

More issues? See INSTALL.md


🀝 Contributing

Contributions welcome! See CONTRIBUTING.md


📄 License

MIT License - See LICENSE