Changes from all commits (52 commits)
321d4c1
feat: Add bulk embedding and ingest mode
cursoragent Dec 9, 2025
a6f8a71
Merge branch 'main' into cursor/bulk-embed-and-pools-f93e
techiejd Dec 16, 2025
f71a674
Better API
techiejd Dec 16, 2025
3dc508d
WIP
techiejd Dec 16, 2025
a773cc1
WIP
techiejd Dec 18, 2025
87f78de
WIP
techiejd Dec 19, 2025
c3ea2ea
Moves metadata into its own table so it doesn't need to be stored in …
techiejd Dec 20, 2025
e847a9b
WIP
techiejd Dec 20, 2025
5a89415
WIP
techiejd Dec 21, 2025
69363a9
WIP
techiejd Dec 21, 2025
59479c2
Merge branch 'main' into cursor/bulk-embed-and-pools-f93e
techiejd Jan 1, 2026
c04f5ad
Merge branch 'main' into cursor/bulk-embed-and-pools-f93e
techiejd Jan 3, 2026
ec579d2
Uses new embeddingConfig API
techiejd Jan 3, 2026
51b4789
Removes first run onInit
techiejd Jan 4, 2026
64b8a8f
Adds better batch streaming
techiejd Jan 4, 2026
73a664f
WIP
techiejd Jan 5, 2026
0ddd738
WIP
techiejd Jan 6, 2026
c97f6ae
WIP
techiejd Jan 6, 2026
58317df
WIP
techiejd Jan 6, 2026
165b0f3
Ends parallelism to see if that's what's failing in GitHub CI
techiejd Jan 6, 2026
c31b006
WIP
techiejd Jan 7, 2026
ea04198
WIP
techiejd Jan 8, 2026
1aaf52c
Adds CI browser
techiejd Jan 8, 2026
91d0bf7
Runs sequentially so the tests pass in CI
techiejd Jan 8, 2026
95bdb71
increases timeout since tests are in parallel now
techiejd Jan 8, 2026
a64b9fa
Merge branch 'main' into cursor/bulk-embed-and-pools-f93e
techiejd Jan 8, 2026
3a7b73c
Better explanation and leaner API
techiejd Jan 9, 2026
306cd31
WIP
techiejd Jan 10, 2026
a8ebcd5
Merge branch 'main' into cursor/bulk-embed-and-pools-f93e
techiejd Jan 10, 2026
b60af9f
WIP
techiejd Jan 11, 2026
313ef3f
adds import map
techiejd Jan 11, 2026
1db2601
assigns the extra funcs to the payload instance
techiejd Jan 11, 2026
2ca94b5
WIP
techiejd Jan 11, 2026
22661e8
WIP
techiejd Jan 12, 2026
3a02647
fixes tests
techiejd Jan 12, 2026
c2d745b
Adds better retry strategy
techiejd Jan 12, 2026
a8efd82
Increases test time
techiejd Jan 12, 2026
5fdd48a
WIP
techiejd Jan 12, 2026
5194ce9
WIP
techiejd Jan 12, 2026
171411a
Fixes tests WIP
techiejd Jan 13, 2026
da8965c
betters embed
techiejd Jan 13, 2026
9861c90
WIP
techiejd Jan 13, 2026
fff3ef5
WIP
techiejd Jan 13, 2026
f061481
Clean up
techiejd Jan 15, 2026
0ecd01c
trying to fix CI tests
techiejd Jan 15, 2026
b3312f3
Clean up
techiejd Jan 15, 2026
8c75e1a
Better bulkEmbedAll
techiejd Jan 15, 2026
2d15b68
new Readme
techiejd Jan 15, 2026
4609ef5
Working on adding migrations
techiejd Jan 16, 2026
540c154
Merge branch 'main' into makingMigrations
techiejd Jan 16, 2026
2c8238a
WIP
techiejd Jan 17, 2026
a5abfdd
WIP
techiejd Jan 17, 2026
98 changes: 96 additions & 2 deletions README.md
@@ -158,7 +158,41 @@ export default buildConfig({

The import map tells Payload how to resolve component paths (like `'payloadcms-vectorize/client#EmbedAllButton'`) to actual React components. Without it, client components referenced in your collection configs won't render.

### 2. Search Your Content
**⚠️ Important:** Run this command:

- After initial plugin setup
- If the "Embed all" button doesn't appear in the admin UI

The import map tells Payload how to resolve component paths (like `'payloadcms-vectorize/client#EmbedAllButton'`) to actual React components. Without it, client components referenced in your collection configs won't render.

### 2. Initial Migration Setup

After configuring the plugin, you need to create an initial migration to set up the IVFFLAT indexes in your database.

**For new setups:**

1. Create your initial Payload migration (this will include the embedding columns via Drizzle schema):

```bash
pnpm payload migrate:create --name initial
```

2. Use the migration CLI helper to add IVFFLAT index setup:

```bash
pnpm payload vectorize:migrate
```

The CLI automatically extracts your static configs from the Payload config and patches the migration file with the necessary IVFFLAT index creation SQL.

3. Review and apply the migration:
```bash
pnpm payload migrate
```

**Note:** The embedding columns are created automatically by Drizzle via the `afterSchemaInitHook`, but the IVFFLAT indexes need to be added via migrations for proper schema management.
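
As a sketch, the migration patched in step 2 typically contains pgvector DDL along these lines. The table and index names below are illustrative assumptions, not the plugin's actual naming; real names derive from your knowledge pool slugs and `schemaName`:

```sql
-- Illustrative only: names depend on your pool slug and schema.
-- Ensure the pgvector extension exists before any vector DDL runs.
CREATE EXTENSION IF NOT EXISTS vector;

-- IVFFLAT index over the embedding column of a hypothetical
-- "main_knowledge_pool_embeddings" table.
CREATE INDEX IF NOT EXISTS "main_knowledge_pool_embedding_ivfflat_idx"
  ON "main_knowledge_pool_embeddings"
  USING ivfflat ("embedding" vector_cosine_ops)
  WITH (lists = 100);
```

The `lists` value comes from your static config's `ivfflatLists`; the operator class depends on the distance metric in use.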

### 3. Search Your Content

The plugin automatically creates a `/api/vector-search` endpoint:

@@ -419,6 +453,67 @@ jobs: {
}
```

## Changing Static Config (ivfflatLists or dims) & Migrations

**⚠️ Important:** Changing `dims` is **destructive** - it requires re-embedding all your data. Changing `ivfflatLists` rebuilds the index (non-destructive but may take time).

When you change static config values (`dims` or `ivfflatLists`):

1. **Update your static config** in `payload.config.ts`:

```typescript
const { afterSchemaInitHook, payloadcmsVectorize } = createVectorizeIntegration({
mainKnowledgePool: {
dims: 1536, // Changed from previous value
ivfflatLists: 200, // Changed from previous value
},
})
```

2. **Create a migration** using the CLI helper:

```bash
pnpm payload vectorize:migrate
```

The CLI will:
- Detect changes in your static configs
- Create a new Payload migration using `payload.db.createMigration`
- Patch it with appropriate SQL:
- **If `ivfflatLists` changed**: Rebuilds the IVFFLAT index with the new `lists` parameter (DROP + CREATE INDEX)
- **If `dims` changed**: Truncates the embeddings table (destructive - you'll need to re-embed)

3. **Review the migration file** in `src/migrations/` - it will be named something like `*_vectorize-config.ts`

4. **Apply the migration**:

```bash
pnpm payload migrate
```

5. **If `dims` changed**: Re-embed all your documents using the bulk embed feature.
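
For intuition, the SQL patched into the migration differs by which value changed. A hypothetical sketch (table and index names are assumptions, not the plugin's actual naming):

```sql
-- ivfflatLists changed (non-destructive): drop and rebuild the index
-- with the new lists parameter.
DROP INDEX IF EXISTS "main_knowledge_pool_embedding_ivfflat_idx";
CREATE INDEX "main_knowledge_pool_embedding_ivfflat_idx"
  ON "main_knowledge_pool_embeddings"
  USING ivfflat ("embedding" vector_cosine_ops)
  WITH (lists = 200);

-- dims changed (destructive): stored vectors have the wrong dimension,
-- so the embeddings table is truncated and must be re-embedded.
TRUNCATE TABLE "main_knowledge_pool_embeddings";
```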

**Schema name qualification:**

The CLI automatically uses the `schemaName` from your Postgres adapter configuration. If you use a custom schema (e.g., `postgresAdapter({ schemaName: 'custom' })`), all SQL in the migration will be properly qualified with that schema name.

**Idempotency:**

Running `pnpm payload vectorize:migrate` multiple times with no config changes will not create duplicate migrations. The CLI detects when no changes are needed and exits early.

**Development workflow:**

During development, you may want to disable Payload's automatic schema push to ensure migrations are used:

- Set `migrations: { disableAutomaticMigrations: true }` in your Payload config, or
- Avoid using `pnpm payload migrate:status --force` which auto-generates migrations

This ensures your vector-specific migrations are properly applied.

**Runtime behavior:**

The `ensurePgvectorArtifacts` function is now **presence-only** - it checks that pgvector artifacts (extension, column, index) exist but does not create or modify them. If artifacts are missing, it throws descriptive errors prompting you to run migrations. This ensures migrations are the single source of truth for schema changes.
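
The presence-only contract can be pictured with a small sketch. The types and function below are hypothetical illustrations, not the plugin's actual exports: the point is that the runtime path only inspects what exists and raises actionable errors, never issuing DDL itself.

```typescript
// Hypothetical sketch of a presence-only artifact check.
// It reports what is missing; it never creates or alters anything.
type ArtifactPresence = {
  extension: boolean // pgvector extension installed?
  column: boolean // embedding column exists?
  index: boolean // IVFFLAT index exists?
}

function assertPgvectorArtifacts(found: ArtifactPresence, pool: string): void {
  const missing: string[] = []
  if (!found.extension) missing.push('pgvector extension')
  if (!found.column) missing.push(`embedding column for pool "${pool}"`)
  if (!found.index) missing.push(`IVFFLAT index for pool "${pool}"`)
  if (missing.length > 0) {
    // Point the user at migrations: they are the single source of truth.
    throw new Error(
      `Missing pgvector artifacts: ${missing.join(', ')}. ` +
        'Run `pnpm payload vectorize:migrate` and `pnpm payload migrate`.',
    )
  }
}
```

A real check would populate `ArtifactPresence` from catalog queries (`pg_extension`, `information_schema.columns`, `pg_indexes`); only the verify-and-error behavior is the point here.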

### Endpoints

#### POST `/api/vector-bulk-embed`
@@ -921,7 +1016,6 @@ Thank you for the stars! The following updates have been completed:

The following features are planned for future releases based on community interest and stars:

- **Migrations for vector dimensions**: Easy migration tools for changing vector dimensions and/or ivfflatLists after initial setup
- **MongoDB support**: Extend vector search capabilities to MongoDB databases
- **Vercel support**: Optimized deployment and configuration for Vercel hosting

22 changes: 14 additions & 8 deletions dev/specs/chunkers.spec.ts
@@ -1,9 +1,8 @@
import { getPayload } from 'payload'
import { beforeAll, describe, expect, test } from 'vitest'
import { describe, expect, test } from 'vitest'
import { chunkText, chunkRichText } from 'helpers/chunkers.js'
import { postgresAdapter } from '@payloadcms/db-postgres'
import { buildDummyConfig, getInitialMarkdownContent, integration } from './constants.js'
import { createTestDb } from './utils.js'
import { createTestDb, initializePayloadWithMigrations, createTestMigrationsDir } from './utils.js'

describe('Chunkers', () => {
test('textChunker', () => {
@@ -17,20 +16,27 @@ describe('Chunkers', () => {
})

test('richTextChunker splits by H2', async () => {
beforeAll(async () => {
createTestDb({ dbName: 'chunkers_test' })
})
const dbName = 'chunkers_test'
await createTestDb({ dbName })
const { migrationsDir } = createTestMigrationsDir(dbName)

const cfg = await buildDummyConfig({
db: postgresAdapter({
extensions: ['vector'],
afterSchemaInit: [integration.afterSchemaInitHook],
migrationDir: migrationsDir,
push: false,
pool: {
connectionString: 'postgresql://postgres:password@localhost:5433/chunkers_test',
connectionString: `postgresql://postgres:password@localhost:5433/${dbName}`,
},
}),
})
const markdownContent = await getInitialMarkdownContent(cfg)
const thisPayload = await getPayload({ config: cfg })

const thisPayload = await initializePayloadWithMigrations({
config: cfg,
key: `chunkers-test-${Date.now()}`,
})
const chunks = await chunkRichText(markdownContent, thisPayload)

expect(chunks.length).toBe(3)
19 changes: 16 additions & 3 deletions dev/specs/extensionFields.spec.ts
@@ -1,9 +1,13 @@
import type { Payload } from 'payload'
import { getPayload } from 'payload'
import { beforeAll, describe, expect, test } from 'vitest'
import { postgresAdapter } from '@payloadcms/db-postgres'
import { buildDummyConfig, integration, plugin } from './constants.js'
import { createTestDb, waitForVectorizationJobs } from './utils.js'
import {
createTestDb,
waitForVectorizationJobs,
initializePayloadWithMigrations,
createTestMigrationsDir,
} from './utils.js'
import { PostgresPayload } from '../../src/types.js'
import { chunkText, chunkRichText } from 'helpers/chunkers.js'
import { makeDummyEmbedDocs, makeDummyEmbedQuery, testEmbeddingVersion } from 'helpers/embed.js'
@@ -15,6 +19,8 @@

beforeAll(async () => {
await createTestDb({ dbName })
const { migrationsDir } = createTestMigrationsDir(dbName)

const config = await buildDummyConfig({
jobs: {
tasks: [],
@@ -39,6 +45,8 @@
db: postgresAdapter({
extensions: ['vector'],
afterSchemaInit: [integration.afterSchemaInitHook],
migrationDir: migrationsDir,
push: false,
pool: {
connectionString: `postgresql://postgres:password@localhost:5433/${dbName}`,
},
@@ -104,7 +112,12 @@
}),
],
})
payload = await getPayload({ config, cron: true })

payload = await initializePayloadWithMigrations({
config,
key: `extension-fields-test-${Date.now()}`,
cron: true,
})
})

test('extension fields are added to the embeddings table schema', async () => {
@@ -163,7 +176,7 @@
},
})

expect(embeddings.docs.length).toBeGreaterThan(0)

[CI check failure on line 179 in dev/specs/extensionFields.spec.ts (GitHub Actions / test): "extension field values are stored with embeddings" failed with AssertionError: expected 0 to be greater than 0 at extensionFields.spec.ts:179:36]
expect(embeddings.docs[0]).toHaveProperty('category', 'tech')
expect(embeddings.docs[0]).toHaveProperty('priority', 5)
})
23 changes: 18 additions & 5 deletions dev/specs/extensionFieldsVectorSearch.spec.ts
@@ -1,8 +1,12 @@
import { getPayload } from 'payload'
import { describe, expect, test } from 'vitest'
import { makeDummyEmbedDocs, makeDummyEmbedQuery, testEmbeddingVersion } from 'helpers/embed.js'
import { buildDummyConfig, DIMS, integration, plugin } from './constants.js'
import { createTestDb, waitForVectorizationJobs } from './utils.js'
import {
createTestDb,
waitForVectorizationJobs,
initializePayloadWithMigrations,
createTestMigrationsDir,
} from './utils.js'
import { postgresAdapter } from '@payloadcms/db-postgres'
import { chunkRichText, chunkText } from 'helpers/chunkers.js'
import { createVectorSearchHandlers } from '../../src/endpoints/vectorSearch.js'
@@ -11,7 +15,9 @@
describe('extensionFields', () => {
test('returns extensionFields in search results with correct types', async () => {
// Create a new payload instance with extensionFields
await createTestDb({ dbName: 'endpoint_test_extension' })
const dbName = 'endpoint_test_extension'
await createTestDb({ dbName })
const { migrationsDir } = createTestMigrationsDir(dbName)
const defaultKnowledgePool: KnowledgePoolDynamicConfig = {
collections: {
posts: {
@@ -89,8 +95,10 @@
db: postgresAdapter({
extensions: ['vector'],
afterSchemaInit: [integration.afterSchemaInitHook],
migrationDir: migrationsDir,
push: false,
pool: {
connectionString: 'postgresql://postgres:password@localhost:5433/endpoint_test_extension',
connectionString: `postgresql://postgres:password@localhost:5433/${dbName}`,
},
}),
plugins: [
@@ -101,7 +109,12 @@
}),
],
})
const payloadWithExtensions = await getPayload({ config: configWithExtensions, cron: true })

const payloadWithExtensions = await initializePayloadWithMigrations({
config: configWithExtensions,
key: `extension-fields-vector-search-test-${Date.now()}`,
cron: true,
})

// Create a post with extension field values
const testQuery = 'Extension fields test content'
@@ -136,7 +149,7 @@
// Verify results contain extensionFields
expect(json).toHaveProperty('results')
expect(Array.isArray(json.results)).toBe(true)
expect(json.results.length).toBeGreaterThan(0)

[CI check failure on line 152 in dev/specs/extensionFieldsVectorSearch.spec.ts (GitHub Actions / test): "returns extensionFields in search results with correct types" failed with AssertionError: expected 0 to be greater than 0 at extensionFieldsVectorSearch.spec.ts:152:33]

// Find a result that matches our post
const matchingResult = json.results.find(
27 changes: 20 additions & 7 deletions dev/specs/failedValidation.spec.ts
@@ -1,12 +1,17 @@
import { postgresAdapter } from '@payloadcms/db-postgres'
import { buildConfig } from 'payload'
import { getPayload } from 'payload'
import { describe, expect, test } from 'vitest'

import { createVectorizeIntegration } from '../../src/index.js'
import { createTestDb, waitForVectorizationJobs } from './utils.js'
import {
createTestDb,
waitForVectorizationJobs,
initializePayloadWithMigrations,
createTestMigrationsDir,
} from './utils.js'

const DIMS = 8
const dbName = 'failed_validation_test'

const embedDocs = async (texts: string[]) => texts.map(() => Array(DIMS).fill(0))
const embedQuery = async (_text: string) => Array(DIMS).fill(0)
@@ -18,8 +23,7 @@
},
})

const buildMalformedConfig = async () => {
await createTestDb({ dbName: 'failed_validation_test' })
const buildMalformedConfig = async (migrationsDir: string) => {
return buildConfig({
jobs: {
tasks: [],
@@ -39,10 +43,12 @@
db: postgresAdapter({
extensions: ['vector'],
afterSchemaInit: [afterSchemaInitHook],
migrationDir: migrationsDir,
push: false,
pool: {
connectionString:
process.env.DATABASE_URI ||
'postgresql://postgres:password@localhost:5433/failed_validation_test',
`postgresql://postgres:password@localhost:5433/${dbName}`,
},
}),
plugins: [
@@ -70,8 +76,15 @@

describe('Validation failures mark jobs as errored', () => {
test('malformed chunk entry fails the vectorize job', async () => {
const config = await buildMalformedConfig()
const payload = await getPayload({ config, cron: true })
await createTestDb({ dbName })
const { migrationsDir } = createTestMigrationsDir(dbName)

const config = await buildMalformedConfig(migrationsDir)
const payload = await initializePayloadWithMigrations({
config,
key: `failed-validation-test-${Date.now()}`,
cron: true,
})

await payload.create({
collection: 'posts',
Expand All @@ -91,7 +104,7 @@
sort: '-createdAt',
})
const failedJob = (res as any)?.docs?.[0]
expect(failedJob.hasError).toBe(true)

[CI check failure on line 107 in dev/specs/failedValidation.spec.ts (GitHub Actions / test): "malformed chunk entry fails the vectorize job" failed with AssertionError: expected false to be true at failedValidation.spec.ts:107:32]
const errMsg = failedJob.error.message
expect(errMsg).toMatch(/chunk/i)
expect(errMsg).toMatch(/Invalid indices: 1/)
Expand Down
21 changes: 18 additions & 3 deletions dev/specs/int.spec.ts
@@ -14,9 +14,14 @@ import { $createHeadingNode } from '@payloadcms/richtext-lexical/lexical/rich-te
import { PostgresPayload } from '../../src/types.js'
import { editorConfigFactory, getEnabledNodes, lexicalEditor } from '@payloadcms/richtext-lexical'
import { DIMS, getInitialMarkdownContent } from './constants.js'
import { createTestDb, waitForVectorizationJobs } from './utils.js'
import {
createTestDb,
waitForVectorizationJobs,
initializePayloadWithMigrations,
createTestMigrationsDir,
} from './utils.js'
import { postgresAdapter } from '@payloadcms/db-postgres'
import { buildConfig, getPayload } from 'payload'
import { buildConfig } from 'payload'
import { createVectorizeIntegration } from 'payloadcms-vectorize'

const embedFn = makeDummyEmbedDocs(DIMS)
Expand All @@ -32,6 +37,8 @@ describe('Plugin integration tests', () => {
beforeAll(async () => {
await createTestDb({ dbName })

const { migrationsDir } = createTestMigrationsDir(dbName)

// Create isolated integration for this test suite
const integration = createVectorizeIntegration({
default: {
@@ -55,6 +62,8 @@
db: postgresAdapter({
extensions: ['vector'],
afterSchemaInit: [integration.afterSchemaInitHook],
migrationDir: migrationsDir,
push: false, // Prevent dev mode schema push - use migrations only
pool: {
connectionString: `postgresql://postgres:password@localhost:5433/${dbName}`,
},
@@ -99,7 +108,13 @@
},
})

payload = await getPayload({ config, key: `int-test-${Date.now()}`, cron: true })
// Initialize Payload with migrations
payload = await initializePayloadWithMigrations({
config,
key: `int-test-${Date.now()}`,
cron: true,
})

markdownContent = await getInitialMarkdownContent(config)
})
