Playwright Scraper Skill
Chinese documentation | English
A Playwright-based web scraping Skill for OpenClaw with anti-bot protection. Tested successfully on complex websites such as Discuss.com.hk.
Installation: See INSTALL.md
Full Documentation: See SKILL.md
Examples: See examples/README.md
Features
- Pure Playwright: modern, powerful, easy to use
- Anti-Bot Protection: hides automation signals and uses a realistic User-Agent
- Verified: 100% success on Discuss.com.hk in testing
- Simple to Use: one-line commands
- Customizable: environment variable support
Quick Start
Installation
npm install
npx playwright install chromium
Usage
# Quick scraping
node scripts/playwright-simple.js https://example.com
# Stealth mode (recommended)
node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"
Two Modes
| Mode | Use Case | Speed | Anti-Bot |
|---|---|---|---|
| Simple | Regular dynamic sites | Fast (3-5s) | None |
| Stealth (recommended) | Sites with anti-bot protection | Medium (5-20s) | Medium-High |
Simple Mode
For sites without anti-bot protection:
node scripts/playwright-simple.js <URL>
Stealth Mode (Recommended)
For sites with Cloudflare or anti-bot protection:
node scripts/playwright-stealth.js <URL>
Anti-Bot Techniques:
- Hide navigator.webdriver
- Realistic User-Agent (iPhone)
- Human-like behavior simulation
- Screenshot and HTML saving support
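The first three techniques above could be wired together roughly as follows. This is a minimal sketch, not the contents of scripts/playwright-stealth.js: the names `hideWebdriver`, `randomDelay`, and `stealthFetch`, the iPhone User-Agent string, the viewport size, and the timing values are all illustrative assumptions. It assumes the `playwright` package is installed as described in Quick Start.

```javascript
// Illustrative sketch only; the real stealth script may work differently.

// Example iPhone User-Agent (assumed value; any realistic mobile UA would do).
const IPHONE_UA =
  'Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) ' +
  'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1';

// Runs in the page before any site script and makes navigator.webdriver read as undefined.
// The optional `nav` parameter exists only so the function can be exercised outside a browser.
function hideWebdriver(nav) {
  const target = nav || navigator;
  Object.defineProperty(target, 'webdriver', { get: () => undefined, configurable: true });
}

// Random integer in [minMs, maxMs], used to vary pauses the way a person would.
function randomDelay(minMs, maxMs) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

// Hypothetical entry point: loads a page with the measures above applied.
async function stealthFetch(url, { headless = true, waitMs = 5000 } = {}) {
  const { chromium } = require('playwright'); // loaded lazily so the helpers work without it
  const browser = await chromium.launch({ headless });
  try {
    const context = await browser.newContext({
      userAgent: IPHONE_UA,
      viewport: { width: 390, height: 844 }, // assumed phone-sized viewport
    });
    await context.addInitScript(hideWebdriver);
    const page = await context.newPage();
    const response = await page.goto(url, { waitUntil: 'domcontentloaded' });
    await page.waitForTimeout(waitMs);
    // Small human-like gesture followed by an irregular pause.
    await page.mouse.move(randomDelay(50, 300), randomDelay(50, 600));
    await page.waitForTimeout(randomDelay(300, 1200));
    return { status: response ? response.status() : null, html: await page.content() };
  } finally {
    await browser.close();
  }
}
```

Calling `stealthFetch('https://example.com')` resolves to the response status and the page HTML. Saving a screenshot or the HTML to disk is left out of this sketch.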
Customization
All scripts support environment variables:
# Show browser
HEADLESS=false node scripts/playwright-stealth.js <URL>
# Custom wait time (milliseconds)
WAIT_TIME=10000 node scripts/playwright-stealth.js <URL>
# Save screenshot
SCREENSHOT_PATH=/tmp/page.png node scripts/playwright-stealth.js <URL>
# Save HTML
SAVE_HTML=true node scripts/playwright-stealth.js <URL>
# Custom User-Agent
USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js <URL>
Test Results
| Website | Result | Time |
|---|---|---|
| Discuss.com.hk | 200 OK | 5-20s |
| Example.com | 200 OK | 3-5s |
| Cloudflare Protected | Mostly successful | 10-30s |
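For reference, the environment variables listed under Customization (HEADLESS, WAIT_TIME, SCREENSHOT_PATH, SAVE_HTML, USER_AGENT) could be parsed along these lines. This is a sketch under assumptions: `readConfig` is a hypothetical helper, and the defaults shown (headless on, a 5000 ms wait) are illustrative rather than confirmed values from the scripts.

```javascript
// Hypothetical helper: turns the documented environment variables into a config object.
// The defaults are assumptions for illustration, not values taken from the real scripts.
function readConfig(env = process.env) {
  const waitMs = Number.parseInt(env.WAIT_TIME ?? '', 10);
  return {
    // HEADLESS=false shows the browser; anything else keeps it hidden.
    headless: env.HEADLESS !== 'false',
    // Fall back to the assumed default when WAIT_TIME is missing or not a number.
    waitMs: Number.isFinite(waitMs) && waitMs >= 0 ? waitMs : 5000,
    // null means "do not save a screenshot".
    screenshotPath: env.SCREENSHOT_PATH || null,
    saveHtml: env.SAVE_HTML === 'true',
    // null means "use the script's built-in User-Agent".
    userAgent: env.USER_AGENT || null,
  };
}
```

Passing the environment in as a parameter keeps the parsing easy to check without touching the real `process.env`.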
File Structure
playwright-scraper-skill/
├── scripts/
│   ├── playwright-simple.js     # Simple mode
│   └── playwright-stealth.js    # Stealth mode (recommended)
├── examples/
│   ├── discuss-hk.sh            # Discuss.com.hk example
│   └── README.md                # More examples
├── SKILL.md                     # Full documentation
├── INSTALL.md                   # Installation guide
├── README.md                    # This file
├── README_ZH.md                 # Chinese documentation
├── CONTRIBUTING.md              # Contribution guide
├── CHANGELOG.md                 # Version history
└── package.json                 # npm config
Best Practices
- Try web_fetch first: OpenClaw's built-in tool is the fastest option
- Use Simple for dynamic sites that have no anti-bot protection
- Use Stealth for protected sites; this is the main workhorse
- Use specialized skills for YouTube, Reddit, etc.
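The order suggested above (cheapest option first, Stealth as the fallback) can be expressed as a small escalation loop. This is a sketch under assumptions: `scrapeWithFallback` and the strategy objects are hypothetical names, each strategy is any async function that resolves to `{ status, html }`, and treating only a 2xx status as success is an illustrative choice. How web_fetch or the two scripts are actually invoked is left to the injected functions.

```javascript
// Hypothetical escalation helper: tries each strategy in order and stops at the first 2xx.
async function scrapeWithFallback(url, strategies) {
  const attempts = [];
  for (const { name, run } of strategies) {
    try {
      const result = await run(url);
      attempts.push({ name, status: result.status });
      if (result.status >= 200 && result.status < 300) {
        // Report which strategy succeeded and what was tried before it.
        return { ...result, strategy: name, attempts };
      }
    } catch (err) {
      // A thrown error counts as a failed attempt; move on to the next strategy.
      attempts.push({ name, error: err.message });
    }
  }
  const error = new Error(`All strategies failed for ${url}`);
  error.attempts = attempts;
  throw error;
}
```

A caller would pass the strategies in escalation order, for example a Simple-mode function followed by a Stealth-mode function, so that a 403 from the first automatically triggers the second.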
Troubleshooting
Getting 403 blocked?
Use Stealth mode:
node scripts/playwright-stealth.js <URL>
Cloudflare challenge?
Increase the wait time and run in headful mode:
HEADLESS=false WAIT_TIME=30000 node scripts/playwright-stealth.js <URL>
Playwright not found?
Reinstall:
npm install
npx playwright install chromium
More issues? See INSTALL.md
Contributing
Contributions welcome! See CONTRIBUTING.md
License
MIT License - See LICENSE