# Type-Safe Data Extraction

> Extract structured data from web pages using AI with Zod schema validation

The `page.extract()` method uses AI to extract structured data from web pages. You can optionally provide a Zod schema for type-safe extraction with automatic validation.

## Signature

```typescript theme={null}
// With schema (type-safe)
page.extract<T>(
  instruction: string,
  schema: z.ZodType<T>
): Promise<ExtractResult<T>>

// Without schema (string extraction)
page.extract(
  instruction: string
): Promise<ExtractResult<{ extraction: string }>>
```

**Parameters:**

* `instruction` - Natural language description of what data to extract
* `schema` (optional) - Zod schema for validation and type safety

**Returns:** `Promise<ExtractResult<T>>`

```typescript theme={null}
interface ExtractResult<T> {
  success: boolean
  data?: T              // Typed based on your schema
  error?: string        // Set when extraction or validation fails
  reasoning?: string    // AI's explanation
}
```
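
Because `data` is optional in the result shape, checking `success` alone does not narrow it for TypeScript. A small type guard can keep that check in one place. The following is a minimal sketch: `ExtractResult` is re-declared locally from the shape above (assuming the library's type matches it), and `hasData` is a hypothetical helper, not part of the library.

```typescript
// Local re-declaration of the documented result shape (assumption: the
// library's exported type matches it).
interface ExtractResult<T> {
  success: boolean
  data?: T
  error?: string
  reasoning?: string
}

// Hypothetical helper: true only when the call succeeded AND data is defined,
// which lets TypeScript treat `data` as non-optional afterwards.
function hasData<T>(
  result: ExtractResult<T>
): result is ExtractResult<T> & { data: T } {
  return result.success && result.data !== undefined
}

// Stand-in results for illustration; real ones would come from page.extract().
const ok: ExtractResult<{ extraction: string }> = {
  success: true,
  data: { extraction: 'How to Build Better Software' }
}
const failed: ExtractResult<{ extraction: string }> = {
  success: false,
  error: 'Element not found'
}

if (hasData(ok)) {
  console.log(ok.data.extraction) // `data` is non-optional inside this branch
}
console.log(hasData(failed)) // false
```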

## Installation

You'll need to install Zod for schema-based extraction:

```bash theme={null}
npm install zod
```

## Basic Examples

### Simple Text Extraction

Extract text without a schema:

```typescript theme={null}
await page.goto('https://example.com/article')

const result = await page.extract('Extract the article title')

if (result.success && result.data) {
  console.log(result.data.extraction)
  // "How to Build Better Software"
}
```

### Single Object Extraction

Extract a structured object with type safety:

```typescript theme={null}
import { z } from 'zod'

const productSchema = z.object({
  name: z.string(),
  price: z.number(),
  inStock: z.boolean()
})

await page.goto('https://shop.example.com/product/123')

const result = await page.extract(
  'Extract the product information',
  productSchema
)

if (result.success && result.data) {
  console.log(result.data.name)      // string
  console.log(result.data.price)     // number
  console.log(result.data.inStock)   // boolean
  // TypeScript knows the exact types!
}
```

## Intermediate Examples

### Multiple Fields

Extract complex objects with many fields:

```typescript theme={null}
const userSchema = z.object({
  name: z.string(),
  email: z.string().email(),
  joinDate: z.string(),
  verified: z.boolean(),
  profileImage: z.string().url()
})

await page.goto('https://example.com/profile/johndoe')

const result = await page.extract(
  'Extract user profile information',
  userSchema
)

if (result.success && result.data) {
  const user = result.data
  console.log(`${user.name} (${user.email})`)
  console.log(`Joined: ${user.joinDate}`)
  console.log(`Verified: ${user.verified}`)
}
```

### Optional Fields

Handle optional data with Zod:

```typescript theme={null}
const articleSchema = z.object({
  title: z.string(),
  author: z.string(),
  publishDate: z.string(),
  updatedDate: z.string().optional(),  // May not be present
  tags: z.array(z.string()).optional(),
  readTime: z.number().optional()
})

const result = await page.extract(
  'Extract article metadata',
  articleSchema
)
```

## Advanced Examples

### Array Extraction

Extract lists of items:

```typescript theme={null}
const searchResultsSchema = z.object({
  query: z.string(),
  results: z.array(z.object({
    title: z.string(),
    url: z.string().url(),
    description: z.string(),
    price: z.number().optional()
  })),
  totalResults: z.number()
})

await page.goto('https://example.com/search?q=laptop')

const result = await page.extract(
  'Extract all search results with their details',
  searchResultsSchema
)

if (result.success && result.data) {
  console.log(`Found ${result.data.totalResults} results for "${result.data.query}"`)

  result.data.results.forEach(item => {
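    // price is optional in the schema, so guard against undefined
    const price = item.price !== undefined ? `$${item.price}` : 'price not listed'
    console.log(`${item.title} - ${price}`)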
    console.log(item.url)
  })
}
```

### Nested Objects

Extract complex nested data structures:

```typescript theme={null}
const restaurantSchema = z.object({
  name: z.string(),
  rating: z.number(),
  priceLevel: z.string(),
  address: z.object({
    street: z.string(),
    city: z.string(),
    state: z.string(),
    zip: z.string()
  }),
  hours: z.object({
    monday: z.string(),
    tuesday: z.string(),
    wednesday: z.string(),
    thursday: z.string(),
    friday: z.string(),
    saturday: z.string(),
    sunday: z.string()
  }),
  reviews: z.array(z.object({
    author: z.string(),
    rating: z.number(),
    text: z.string(),
    date: z.string()
  }))
})

const result = await page.extract(
  'Extract complete restaurant information including address, hours, and recent reviews',
  restaurantSchema
)
```

### Table Extraction

Extract data from HTML tables:

```typescript theme={null}
const tableSchema = z.object({
  headers: z.array(z.string()),
  rows: z.array(z.array(z.string()))
})

await page.goto('https://example.com/data')

const result = await page.extract(
  'Extract the pricing table data',
  tableSchema
)

if (result.success && result.data) {
  // Print as CSV
  console.log(result.data.headers.join(','))
  result.data.rows.forEach(row => {
    console.log(row.join(','))
  })
}
```
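
Joining cells with a bare comma produces malformed CSV when a cell itself contains a comma, a double quote, or a line break. The sketch below quotes such cells following the usual RFC 4180 convention and also converts rows into keyed records. `escapeCsvCell`, `toCsv`, and `toRecords` are hypothetical helpers, and the sample table is made up for illustration.

```typescript
// Same shape as the table schema above.
interface TableData {
  headers: string[]
  rows: string[][]
}

// Quote a cell only when it contains a comma, double quote, or line break;
// embedded double quotes are doubled.
function escapeCsvCell(cell: string): string {
  return /[",\r\n]/.test(cell) ? `"${cell.replace(/"/g, '""')}"` : cell
}

// Serialize the header row followed by every data row.
function toCsv(table: TableData): string {
  return [table.headers, ...table.rows]
    .map(row => row.map(escapeCsvCell).join(','))
    .join('\n')
}

// Turn each row into an object keyed by header; missing cells become ''.
function toRecords(table: TableData): Record<string, string>[] {
  return table.rows.map(row => {
    const record: Record<string, string> = {}
    table.headers.forEach((header, i) => {
      record[header] = row[i] ?? ''
    })
    return record
  })
}

// Made-up sample data standing in for result.data.
const sample: TableData = {
  headers: ['Plan', 'Price'],
  rows: [['Starter', '$9'], ['Team, annual', '$1,200']]
}

console.log(toCsv(sample))
console.log(toRecords(sample))
```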

## Zod Schema Primer

### Basic Types

```typescript theme={null}
import { z } from 'zod'

z.string()              // string
z.number()              // number
z.boolean()             // boolean
z.date()                // Date object
z.string().url()        // URL string
z.string().email()      // Email string
z.string().uuid()       // UUID string
z.literal('specific')   // Exact value
z.enum(['a', 'b', 'c']) // One of several values
```

### Optional and Nullable

```typescript theme={null}
z.string().optional()         // string | undefined
z.string().nullable()         // string | null
z.string().nullish()          // string | null | undefined
z.string().default('hello')   // string with default value
```

### Arrays and Objects

```typescript theme={null}
z.array(z.string())           // string[]
z.object({                    // { name: string, age: number }
  name: z.string(),
  age: z.number()
})
```

### Validation

```typescript theme={null}
z.string().min(3)             // At least 3 characters
z.string().max(100)           // At most 100 characters
z.number().positive()         // Must be positive
z.number().int()              // Must be integer
z.number().min(0).max(100)    // Between 0 and 100
```

## Error Handling

Handle validation errors gracefully:

```typescript theme={null}
const schema = z.object({
  price: z.number().positive(),
  email: z.string().email()
})

const result = await page.extract(
  'Extract product price and contact email',
  schema
)

if (result.success && result.data) {
  // Data is valid and typed
  console.log('Price:', result.data.price)
  console.log('Email:', result.data.email)
} else {
  // Extraction failed or validation failed
  console.error('Extraction error:', result.error)

  // See AI reasoning
  if (result.reasoning) {
    console.log('AI reasoning:', result.reasoning)
  }
}
```

## Best Practices

### Be Specific in Instructions

<Tip>
  Clear instructions lead to better extraction results
</Tip>

**Good:**

```typescript theme={null}
await page.extract(
  'Extract the product name, price in USD, and availability status from the product details section',
  productSchema
)
```

**Bad:**

```typescript theme={null}
await page.extract('Get the info', productSchema) // Too vague
```

### Design Schemas Carefully

Match your schema to the actual data structure:

```typescript theme={null}
// If prices include currency symbols, extract them as strings
const priceAsString = z.string() // "$29.99"
// Not: z.number() (would fail validation)

// Or extract as a string and parse it into a number
const priceAsNumber = z.string().transform(str =>
  parseFloat(str.replace(/[$,]/g, ''))
)
```
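
The transform above yields `NaN` for text such as "Call for price", and `NaN` is easy to miss downstream. One hedged alternative is a standalone parser that returns `null` when no number is found. `parsePrice` is a hypothetical helper, and it assumes US-style formatting (comma as thousands separator, dot as decimal point).

```typescript
// Hypothetical helper. Assumes US-style numbers: "," groups thousands and
// "." marks decimals. Returns null when the text holds no parsable number.
function parsePrice(text: string): number | null {
  const match = text.replace(/,/g, '').match(/-?\d+(\.\d+)?/)
  if (!match) return null
  const value = Number(match[0])
  return Number.isFinite(value) ? value : null
}

console.log(parsePrice('$29.99'))          // 29.99
console.log(parsePrice('USD 1,299.00'))    // 1299
console.log(parsePrice('Call for price'))  // null
```

A function like this could be passed to Zod's `.transform()`, for example `z.string().transform(parsePrice)`, in which case the inferred output type becomes `number | null`.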

### Handle Missing Data

Use optional fields for data that might not be present:

```typescript theme={null}
const schema = z.object({
  title: z.string(),
  // These might not always be present
  subtitle: z.string().optional(),
  author: z.string().optional(),
  rating: z.number().optional()
})
```
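
Optional fields arrive as `undefined` when the page lacks them, so consuming code needs a fallback. Here is a minimal sketch using nullish coalescing. The `Article` interface mirrors what the schema above would infer, and `formatArticle` is a hypothetical helper.

```typescript
// Mirrors the shape the schema above would infer.
interface Article {
  title: string
  subtitle?: string
  author?: string
  rating?: number
}

// `??` falls back only on null or undefined, so a rating of 0 is preserved
// (unlike `||`, which would replace it).
function formatArticle(article: Article): string {
  const author = article.author ?? 'Unknown author'
  const rating = article.rating ?? 'unrated'
  return `${article.title} by ${author} (${rating})`
}

console.log(formatArticle({ title: 'Release Notes' }))
// "Release Notes by Unknown author (unrated)"
console.log(formatArticle({ title: 'Review', author: 'Ana', rating: 0 }))
// "Review by Ana (0)"
```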

### Test with Real Pages

Always test extraction with actual pages:

```typescript theme={null}
// Test extraction
const result = await page.extract(instruction, schema)

if (!result.success) {
  console.error('Extraction failed:', result.error)
  // Adjust instruction or schema
}
```
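
When testing against several real pages, it can help to collect a pass/fail report rather than inspect each result by hand. The sketch below is one way to do that under stated assumptions: `checkPages` and `PageReport` are hypothetical names, `ExtractResult` is re-declared from the documented shape, and the `extractFrom` callback stands in for navigating to a URL and calling `page.extract()`.

```typescript
// Local re-declaration of the documented result shape (assumption).
interface ExtractResult<T> {
  success: boolean
  data?: T
  error?: string
  reasoning?: string
}

interface PageReport {
  url: string
  ok: boolean
  error?: string
}

// Run one extraction per URL, in sequence, and record whether each succeeded.
async function checkPages<T>(
  urls: string[],
  extractFrom: (url: string) => Promise<ExtractResult<T>>
): Promise<PageReport[]> {
  const reports: PageReport[] = []
  for (const url of urls) {
    const result = await extractFrom(url)
    reports.push(
      result.success
        ? { url, ok: true }
        : { url, ok: false, error: result.error ?? 'unknown error' }
    )
  }
  return reports
}

// Possible usage with a real page object (not executed here):
// const reports = await checkPages(urls, async url => {
//   await page.goto(url)
//   return page.extract(instruction, schema)
// })
// console.table(reports)
```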

## Common Use Cases

### E-commerce Product Data

```typescript theme={null}
const productSchema = z.object({
  name: z.string(),
  brand: z.string(),
  price: z.number(),
  originalPrice: z.number().optional(),
  discount: z.number().optional(),
  rating: z.number(),
  reviewCount: z.number(),
  inStock: z.boolean(),
  images: z.array(z.string().url()),
  description: z.string()
})

const result = await page.extract(
  'Extract all product details',
  productSchema
)
```

### Article Metadata

```typescript theme={null}
const articleSchema = z.object({
  headline: z.string(),
  subheadline: z.string().optional(),
  author: z.string(),
  publishDate: z.string(),
  readingTime: z.number(),
  tags: z.array(z.string()),
  summary: z.string()
})

const result = await page.extract(
  'Extract article metadata and summary',
  articleSchema
)
```

### Contact Information

```typescript theme={null}
const contactSchema = z.object({
  name: z.string(),
  email: z.string().email().optional(),
  phone: z.string().optional(),
  address: z.string().optional(),
  website: z.string().url().optional(),
  socialMedia: z.object({
    twitter: z.string().optional(),
    linkedin: z.string().optional(),
    facebook: z.string().optional()
  }).optional()
})

const result = await page.extract(
  'Extract contact information from the page',
  contactSchema
)
```

### Reviews and Ratings

```typescript theme={null}
const reviewsSchema = z.object({
  overallRating: z.number(),
  totalReviews: z.number(),
  reviews: z.array(z.object({
    author: z.string(),
    rating: z.number(),
    title: z.string(),
    text: z.string(),
    date: z.string(),
    helpful: z.number().optional()
  }))
})

const result = await page.extract(
  'Extract product reviews and ratings',
  reviewsSchema
)
```

## Performance Tips

1. **Be specific** - Clear instructions reduce processing time
2. **Use appropriate schemas** - Don't over-complicate schemas
3. **Extract once** - Cache results instead of re-extracting
4. **Batch extraction** - Extract multiple fields in one call rather than in separate calls
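
The "extract once" tip can be sketched as a small in-memory cache keyed by URL and instruction. Everything here is an assumption for illustration: `Extractor` stands in for a function that navigates and calls `page.extract()`, `withCache` is a hypothetical wrapper, and `ExtractResult` is re-declared from the documented shape.

```typescript
// Local re-declaration of the documented result shape (assumption).
interface ExtractResult<T> {
  success: boolean
  data?: T
  error?: string
  reasoning?: string
}

// Stand-in for any function that performs an extraction for a URL.
type Extractor<T> = (url: string, instruction: string) => Promise<ExtractResult<T>>

// Wrap an extractor so repeated (url, instruction) pairs reuse the first
// successful result. Failed results are not cached, so they can be retried.
function withCache<T>(extract: Extractor<T>): Extractor<T> {
  const cache = new Map<string, ExtractResult<T>>()
  return async (url, instruction) => {
    const key = JSON.stringify([url, instruction])
    const cached = cache.get(key)
    if (cached) return cached
    const result = await extract(url, instruction)
    if (result.success) cache.set(key, result)
    return result
  }
}

// Demo with a fake extractor that counts how often it is called.
let calls = 0
const fakeExtract: Extractor<{ title: string }> = async () => {
  calls += 1
  return { success: true, data: { title: 'Example' } }
}

const cachedExtract = withCache(fakeExtract)

async function demo(): Promise<number> {
  await cachedExtract('https://example.com', 'Extract the title')
  await cachedExtract('https://example.com', 'Extract the title')
  return calls // the second call is served from the cache
}

demo().then(count => console.log(`extractor calls: ${count}`))
```

This sketch never evicts entries and assumes page content does not change between calls; a real cache would likely need an expiry policy.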

## Limitations

<Warning>
  AI extraction has some limitations:

  * Requires API calls (adds latency and cost)
  * May not work on obfuscated or heavily JavaScript-rendered content
  * Accuracy depends on page structure and instruction clarity
  * Rate limits apply based on your AI provider
</Warning>
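
Transient failures such as provider rate limits are commonly handled by retrying with exponential backoff. The sketch below shows one way to do that under stated assumptions: `retryExtract` is a hypothetical helper, the `attempt` callback stands in for a `page.extract()` call, and the delay values are arbitrary. It retries on any unsuccessful result and does not try to detect rate-limit errors specifically, since the error format is not specified here.

```typescript
// Local re-declaration of the documented result shape (assumption).
interface ExtractResult<T> {
  success: boolean
  data?: T
  error?: string
  reasoning?: string
}

const sleep = (ms: number) =>
  new Promise<void>(resolve => setTimeout(resolve, ms))

// Call `attempt` up to `maxAttempts` times, stopping at the first success.
// Returns the last result, whether successful or not.
async function retryExtract<T>(
  attempt: () => Promise<ExtractResult<T>>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<ExtractResult<T>> {
  let last: ExtractResult<T> = { success: false, error: 'No attempts made' }
  for (let i = 0; i < maxAttempts; i++) {
    last = await attempt()
    if (last.success) return last
    // Wait 1x, 2x, 4x, ... the base delay before the next attempt.
    if (i < maxAttempts - 1) await sleep(baseDelayMs * 2 ** i)
  }
  return last
}

// Possible usage with a real page object (not executed here):
// const result = await retryExtract(() =>
//   page.extract('Extract the article title')
// )
```

Each retry is another AI call, so the attempt limit also caps the added cost.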

## Related

<CardGroup cols={2}>
  <Card title="Natural Language Actions" icon="wand-magic-sparkles" href="./act">
    Perform actions with page.act()
  </Card>

  <Card title="AI Setup" icon="gear" href="./setup">
    Configure AI agents and providers
  </Card>

  <Card title="Best Practices" icon="lightbulb" href="./best-practices">
    Effective AI automation patterns
  </Card>

  <Card title="JavaScript Evaluation" icon="code" href="../page/javascript">
    Manual data extraction with evaluate()
  </Card>
</CardGroup>
