The page.act() method allows you to control the browser using natural language instructions. The AI interprets your instruction, determines what actions to take, and executes them automatically.

Signature

page.act(instruction: string): Promise<ActResult>
Parameters:
  • instruction - Natural language description of what you want to do
Returns: Promise<ActResult>
{
  success: boolean
  actions: Array<{
    type: string        // 'click', 'input', 'goto', etc.
    params: unknown[]
    result?: unknown
    error?: string
  }>
  reasoning?: string
}
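
Written out as TypeScript declarations, the return shape above might look like the following sketch. The fields are transcribed from this page, but the name `ActAction` and the `describeActions` helper are illustrative assumptions, not confirmed exports of the library.

```typescript
// Types transcribed from the documented return shape.
// `ActAction` and `describeActions` are hypothetical names used for illustration.
interface ActAction {
  type: string        // 'click', 'input', 'goto', etc.
  params: unknown[]
  result?: unknown
  error?: string
}

interface ActResult {
  success: boolean
  actions: ActAction[]
  reasoning?: string
}

// Render each action as a one-line summary, flagging any that carry an error.
function describeActions(result: ActResult): string[] {
  return result.actions.map(action => {
    const args = action.params.map(p => JSON.stringify(p)).join(', ')
    const call = `${action.type}(${args})`
    return action.error ? `${call} failed: ${action.error}` : call
  })
}
```

Serializing each parameter with JSON.stringify keeps string arguments quoted, which makes the summary easier to read in logs than a plain join.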

How It Works

When you call page.act():
  1. The AI captures the current page state (DOM, visible elements, form fields, etc.)
  2. Your instruction is sent to the AI provider together with that page context
  3. The AI determines which actions to perform (click, input, scroll, etc.)
  4. The actions are executed sequentially on the page
  5. A result is returned with details about what was done
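
The five steps above can be pictured with a conceptual sketch like the one below. It is not the library's implementation: `actSketch` and the three injected functions (`capture`, `plan`, `execute`) are hypothetical stand-ins, and stopping at the first failed action is an assumption made for the sketch.

```typescript
// Conceptual sketch only. `capture`, `plan`, and `execute` are hypothetical
// stand-ins passed in by the caller, not the library's real internals.
interface PlannedAction { type: string; params: unknown[] }
interface ActionRecord extends PlannedAction { result?: unknown; error?: string }
interface ActResult { success: boolean; actions: ActionRecord[]; reasoning?: string }
interface Plan { actions: PlannedAction[]; reasoning?: string }

async function actSketch(
  instruction: string,
  capture: () => Promise<string>,
  plan: (instruction: string, pageState: string) => Promise<Plan>,
  execute: (action: PlannedAction) => Promise<unknown>,
): Promise<ActResult> {
  // Step 1: capture the current page state.
  const pageState = await capture()
  // Steps 2-3: send instruction plus context, receive the planned actions.
  const { actions, reasoning } = await plan(instruction, pageState)
  const records: ActionRecord[] = []
  // Step 4: execute the actions one after another.
  for (const action of actions) {
    try {
      records.push({ ...action, result: await execute(action) })
    } catch (err) {
      // Assumption: the first failing action ends the run.
      const message = err instanceof Error ? err.message : String(err)
      records.push({ ...action, error: message })
      return { success: false, actions: records, reasoning }
    }
  }
  // Step 5: report what was done.
  return { success: true, actions: records, reasoning }
}
```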

Basic Examples

Simple Click Actions

await page.goto('https://example.com')

// Click a button
await page.act('Click the sign up button')

// Click a link
await page.act('Click the privacy policy link')

// Click by description
await page.act('Click the blue submit button at the bottom')

Form Interactions

// Fill a single field
await page.act('Fill in the email field with [email protected]')

// Fill multiple fields
await page.act('Enter username "johndoe" and password "secret123"')

// Select from dropdown
await page.act('Select "United States" from the country dropdown')

// Check a checkbox
await page.act('Check the "I agree to terms" checkbox')

Intermediate Examples

Multi-Step Workflows

await page.goto('https://example.com/login')

// AI performs multiple actions in sequence
await page.act('Fill in the login form with username "john" and password "pass123", then click login')

Conditional Actions

await page.goto('https://example.com/products')

// AI finds and interacts with dynamic elements
await page.act('If there is a cookie consent banner, click accept')

await page.act('Find the product with the lowest price and click on it')

// Scroll to find elements
await page.act('Scroll down to the footer and click the contact link')

// Navigate through pages
await page.act('Click the next page button')

Advanced Examples

Complex Interactions

await page.goto('https://example.com/checkout')

// Multi-step form completion
const result = await page.act(`
  Fill out the shipping form with:
  - Name: John Doe
  - Address: 123 Main St
  - City: San Francisco
  - ZIP: 94102
  Then click continue
`)

if (result.success) {
  console.log('Form completed successfully')
  console.log('Actions taken:', result.actions)
}

Dynamic Content Handling

// Wait and interact with loaded content
await page.act('Wait for the search results to load, then click the first result')

// Handle popups
await page.act('If a signup popup appears, close it')

// Interact with carousels
await page.act('Click the right arrow on the image carousel 3 times')

Search and Filter

await page.goto('https://example.com/products')

await page.act('Enter "laptop" in the search box and press enter')
await page.act('Filter results by price: $500-$1000')
await page.act('Sort by highest rating')

Best Practices

Write Clear Instructions

Be specific and descriptive. Instead of “click the button”, say “click the blue submit button at the bottom of the form”.
Good:
await page.act('Click the red "Add to Cart" button next to the product image')
await page.act('Fill in the email field in the login form with [email protected]')
Bad:
await page.act('Click it') // Too vague
await page.act('Do the thing') // Not specific

Break Down Complex Tasks

For very complex workflows, break the task into separate calls, one per step:
// Instead of one giant instruction, issue one call per step
await page.act('Fill in the form')
await page.act('Select shipping method')
await page.act('Click continue to payment')
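
When a workflow is split this way, it can help to stop as soon as one step fails rather than continuing on a page in an unexpected state. The sketch below is one way to do that. `runSteps`, `ActPage`, and `StepOutcome` are hypothetical names, and `ActPage` describes only the `act()` method documented on this page.

```typescript
// Minimal structural types covering only what this sketch uses.
interface ActResult {
  success: boolean
  actions: Array<{ type: string; params: unknown[]; result?: unknown; error?: string }>
  reasoning?: string
}
interface ActPage { act(instruction: string): Promise<ActResult> }
interface StepOutcome { instruction: string; result: ActResult }

// Run instructions one at a time and stop at the first failure.
// `runSteps` is a hypothetical helper, not part of the documented API.
async function runSteps(page: ActPage, instructions: string[]): Promise<StepOutcome[]> {
  const outcomes: StepOutcome[] = []
  for (const instruction of instructions) {
    const result = await page.act(instruction)
    outcomes.push({ instruction, result })
    if (!result.success) break // later steps depend on this one
  }
  return outcomes
}
```

The last entry of the returned array tells you which instruction the workflow reached, so a caller can retry or fall back from that step only.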

Handle Errors

Check the result and handle failures:
const result = await page.act('Click the checkout button')

if (!result.success) {
  console.error('Action failed:', result.actions)
  // Try alternative approach or manual automation
}

Combine with Manual Automation

Mix AI actions with traditional automation for best results:
// Use AI for dynamic interactions
await page.act('Accept cookie consent if it appears')

// Use manual automation for precise control
await page.click('#precise-selector')
await page.input('#email', '[email protected]')

// Use AI where the target depends on page content
await page.act('Find and click the product with the best rating')

Common Use Cases

Authentication Flows

await page.goto('https://example.com/login')
await page.act('Click the "Sign in with Google" button')
// Handle OAuth popup...
await page.act('Click the authorize button')

Form Filling

await page.act(`
  Complete the contact form with:
  Name: John Doe
  Email: [email protected]
  Message: I would like more information
  Then submit the form
`)

Shopping and E-commerce

await page.goto('https://shop.example.com')
await page.act('Search for "wireless headphones"')
await page.act('Filter by price under $100')
await page.act('Click on the product with the highest rating')
await page.act('Select black color and add to cart')

Content Navigation

await page.goto('https://news.example.com')
await page.act('Find and click the article about technology')
await page.act('Scroll down and click "Load more comments"')

Error Handling

The ActResult includes details about what happened:
const result = await page.act('Click the submit button')

if (result.success) {
  console.log('✓ Action completed successfully')

  // See what actions were performed
  result.actions.forEach(action => {
    console.log(`${action.type}(${action.params.join(', ')})`)
  })

  // View AI reasoning (if available)
  if (result.reasoning) {
    console.log('AI reasoning:', result.reasoning)
  }
} else {
  console.error('✗ Action failed')

  // Check which step failed
  const failedAction = result.actions.find(a => a.error)
  if (failedAction) {
    console.error('Failed at:', failedAction.type, failedAction.error)
  }
}

Limitations

AI actions are powerful but have limitations:
  • Require API calls (adds latency and cost)
  • May not work on heavily obfuscated UIs
  • Success rate varies by page complexity
  • Not suitable for precise pixel-level interactions
When AI actions fail, fall back to traditional selector-based automation.
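
A fallback of this kind could be wrapped in a small helper, sketched below. `actWithFallback` and `ActPage` are hypothetical names. Whether `page.act()` rejects on provider errors is not stated on this page, so the `try`/`catch` is a defensive assumption.

```typescript
// Minimal structural types covering only what this sketch uses.
interface ActResult {
  success: boolean
  actions: Array<{ type: string; params: unknown[]; result?: unknown; error?: string }>
  reasoning?: string
}
interface ActPage { act(instruction: string): Promise<ActResult> }

// Try the AI instruction first; if it reports failure or throws, run the
// caller-supplied fallback. Returns which path completed the step.
async function actWithFallback(
  page: ActPage,
  instruction: string,
  fallback: () => Promise<void>,
): Promise<'ai' | 'fallback'> {
  try {
    const result = await page.act(instruction)
    if (result.success) return 'ai'
  } catch {
    // Assumption: a thrown error (for example a network failure) is treated
    // the same as a result with success === false.
  }
  await fallback()
  return 'fallback'
}
```

A call might pass 'Click the checkout button' as the instruction and a fallback that performs a selector-based click such as `page.click('#checkout')`, where the selector is a made-up example.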

Performance Tips

  1. Use specific instructions - Reduces AI thinking time
  2. Combine actions - One instruction with a few simple steps is faster than several separate calls
  3. Cache page state - Helps when performing multiple actions on the same page
  4. Use the Haiku model - Faster for simple actions
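
To check whether a tip such as combining actions pays off on your own pages, you can time individual calls. The sketch below is a minimal way to do so. `timedAct` and `ActPage` are hypothetical names, and the measurement uses wall-clock time from `Date.now()`, so it includes network latency to the AI provider.

```typescript
// Minimal structural types covering only what this sketch uses.
interface ActResult {
  success: boolean
  actions: Array<{ type: string; params: unknown[]; result?: unknown; error?: string }>
  reasoning?: string
}
interface ActPage { act(instruction: string): Promise<ActResult> }

// Time a single page.act() call in milliseconds.
// `timedAct` is a hypothetical helper, not part of the documented API.
async function timedAct(
  page: ActPage,
  instruction: string,
): Promise<{ result: ActResult; elapsedMs: number }> {
  const start = Date.now()
  const result = await page.act(instruction)
  return { result, elapsedMs: Date.now() - start }
}
```

Timing one combined instruction against the sum of the equivalent separate calls gives a rough comparison. Repeating the measurement a few times is advisable, since response times from an AI provider can vary between calls.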