# Full-text search

<a id="database.osfttab"></a>

## osfttab

> **Warning:** This page was automatically generated from the database schema dump and may be incomplete or incorrect.

Full-text search main index table.

### Columns

| Name | Type | Length | Description |
|---|---|---|---|
| `osid` | `int` | — |  |
| `ostype` | `int` | — |  |
| `crts` | `datetime` | — |  |
| `upidxts` | `datetime` | — |  |
| `uptxtts` | `datetime` | — |  |
| `ftidx` | `image` | 2147483647 |  |
| `fttxt` | `image` | 2147483647 |  |

<a id="database.osftcontent"></a>

## osftcontent

> **Warning:** This page was automatically generated from the database schema dump and may be incomplete or incorrect. Columns and data types are confirmed from the schema dump; the read/write paths and the connection to [`FULLTEXTFILEATTACHED`](#fts-option) are additionally reverse-engineered from enaio server logs (version 11.0, build 801/802).

Holds the extractable **full-text content** of a document, which serves as the source for the FTS indexer (`index:<port>`) to populate [osfttab](full_text_search.md#database.osfttab). One row per indexable DMS object (primary key `osid`).

The table is populated in two ways:

* **Client supplies the full text** — on `dms.XMLInsert` / `dms.XMLUpdate` / `dms.XMLImport` with the option `FULLTEXTFILEATTACHED=1`, the client uploads a pre-prepared full-text file (e.g. from an external OCR pipeline). It lands directly in `fulltext_content`.
* **Server-side rendition** — default path (`FULLTEXTFILEATTACHED=0`): the [`os-rendition-cache`](server_cluster_and_configuration.md#database.oscpmqueue) service produces the full text asynchronously via OCR / text extraction and writes the result to this table.

### Columns

| Name | Type | Length | Description |
|---|---|---|---|
| `osid` | `int` | — | DMS object ID (primary key). References the main object table of the respective object type (`objectX.id`). |
| `ostype` | `int` | — | enaio object-type ID (`haupttyp * 65536 + untertyp`, see Object-type ID). |
| `fulltext_content` | `image` | max | Extracted full text as a BLOB (in Oracle the equivalent `BLOB`). Read by the indexer during the FTS index run and handed to the indexer service via [std.GetCPObjectIdxFulltext](../std.md#std.GetCPObjectIdxFulltext) as a temporary file (`<ostemp>\cpb_idx_<GUID>.tmp`). |

### Lifecycle

On every `dms.XMLInsert` / `dms.XMLUpdate` / `dms.XMLImport` for an indexable object, the existing full-text content is deleted **before** a new `RENDITION` message is enqueued in [oscpmqueue](server_cluster_and_configuration.md#database.oscpmqueue):

```sql
-- Before every re-indexing — invoked server-internally by
-- CThreadExecutorData::fnStoreInWork → SetIndexDataChanged
-- → StartRenditionExport → WriteVolltextFileIntoDatabase
DELETE FROM osftcontent WHERE osid = 2676390;
```
Writes (`INSERT` with BLOB content) run server-internally via `CBaseData::Bind` and are therefore only visible in SQL trace logs as bind operations (not as raw SQL statements).

### Related tables

* [osftslog](full_text_search.md#database.osftslog) — status mirror of the FTS pipeline per object (`flag2` state machine).
* [oscpmqueue](server_cluster_and_configuration.md#database.oscpmqueue) — CP message queue that synchronises rendition and indexer jobs.
* [osfttab](full_text_search.md#database.osfttab) — main index table of the full-text search.

### Related Server-API jobs

| Job / code path | Role w.r.t. `osftcontent` |
|---|---|
| [dms.XMLInsert](../dms.md#dms.XMLInsert) / [dms.XMLUpdate](../dms.md#dms.XMLUpdate) / [dms.XMLImport](../dms.md#dms.XMLImport) | Trigger the `DELETE` of the existing row via `CThreadExecutorData::fnStoreInWork → SetIndexDataChanged → StartRenditionExport → WriteVolltextFileIntoDatabase`. With `FULLTEXTFILEATTACHED=1` they additionally `INSERT` the supplied full-text file. `dms.XMLImport` internally branches into the insert or update path and triggers the same lifecycle there. |
| `DoCreateCPMessage` | Right after the `osftcontent` DELETE, a `RENDITION` job is enqueued in [oscpmqueue](server_cluster_and_configuration.md#database.oscpmqueue). |
| [std.GetCPObjectIdxFulltext](../std.md#std.GetCPObjectIdxFulltext) | Indexer read path: fetches the full text from `osftcontent` (or the rendition cache) and delivers it to the indexer service as a temporary file. |
| [std.GetCPObjectInfo](../std.md#std.GetCPObjectInfo) | Returns `HasVolltextFile=True/False` — signals to the indexer whether an entry exists in `osftcontent`. |
| [ado.ExecuteSQL](../ado.md#ado.ExecuteSQL) | Direct SQL path for `DELETE` operations. |

<a id="database.osftslog"></a>

## osftslog

> **Warning:** This page was automatically generated from the database schema dump and may be incomplete or incorrect. Columns and data types are confirmed from the schema dump; the value ranges of `flag1`/`flag2` and the pipeline lifecycle are additionally reverse-engineered from enaio server logs (version 11.0, build 801/802).

Persistent **status mirror** of the enaio® Full Text Search (FTS) pipeline. One row per indexable DMS object (primary key `osid`), living forever — no `DELETE` observed in the analysed logs. **No pull queue**: every access is via `osid` lookup; the actual job queue of the FTS pipeline is [oscpmqueue](server_cluster_and_configuration.md#database.oscpmqueue).

Maintained exclusively by two service workers:

* **`index:<port>`** — FTS indexer service (default port `8045`), consuming the CP queues `FULLTEXTIDX`, `FULLTEXTDOC`, `FULLTEXTFILTER`, `FULLTEXTLOCATION`, `FULLTEXTDELETE`.
* **`os-rendition-cache`** — rendition service, consuming the CP queues `RENDITION`, `RENRESET`.

User clients **never access** this table directly.

### Columns

| Name | Type | Length | Description |
|---|---|---|---|
| `osid` | `int` | not-null | DMS object ID (primary key, unique per row). References the main object table of the respective object type (`objectX.id`). |
| `ostype` | `int` | not-null | enaio object-type ID (`haupttyp * 65536 + untertyp`, see Object-type ID). Observed example values include `17`, `262144`, `262163`, `262248`, `393219`. |
| `date1` | `int` | — | **Trigger time** — point in time of the last object change that required a (re-)indexing. Format: **minutes since Unix epoch** (`1970-01-01 UTC`). |
| `date2` | `int` | — | **Last processing time** of the pipeline (same format as `date1`). Updated on every status change — including partial updates by the rendition worker. On `INSERT`, initially identical to `date1`. |
| `flag1` | `int` | — | **Index-relevance flag**. Consistently `1` in the analysed logs; `0` presumably means "not to be indexed". Set exclusively by full `UPDATE`s of the indexer; partial `UPDATE`s by the rendition worker leave the column untouched. |
| `flag2` | `int` | — | **Pipeline status code** — see [Value range `flag2`](#db-osftslog-flag2). A mix of HTTP-status-code analogues (`200`/`404`/`422`) for terminal states and vendor-specific codes (`1001`–`1006`) for intermediate pipeline phases. |
| `instance` | `nvarchar` | 32 | **Worker instance tag** in the form `'<service-name>:<port>'` (e.g. `'index:8045'`). Varies across multiple indexer instances. |

### Value range `flag2`

| Value | Meaning |
|---|---|
| `0` | Object picked up by the indexer from `FULLTEXTIDX` — waiting for the final indexing run. `INSERT` always sets this value initially. Full `UPDATE` by indexer, preceded by: `std.GetNextCPMessage(QueueNames='FULLTEXTIDX')`. |
| `200` | OK — full-text successfully indexed. Full `UPDATE` by indexer, preceded by: `std.GetNextCPMessage(QueueNames='FULLTEXTDOC')`. |
| `404` | Full-text rendition not (yet) available in the cache. The indexer then triggers a new rendition job via [std.CreateCPMessages](../std.md#std.CreateCPMessages) with `CreateRenditionMessages=true`. **Transient** state. |
| `422` | Rendition is present but the content cannot (meaningfully) be indexed — e.g. a PDF without extractable text (scan without OCR), format incompatibility. **Terminal state** (no automatic retry observed). |
| `1001` | Rendition worker has picked up the job. Partial `UPDATE` by rendition service, preceded by: `std.GetNextCPMessage(QueueNames='RENDITION')`. |
| `1002` | Rendition finished and written to cache (after `std.StoreInCacheByID`). Partial `UPDATE` by rendition service. |
| `1003` | Rendition worker starts the productive step. Deterministically transitions to `1004`. |
| `1004` | Rendition running / result pending. Typical predecessor state of `200`. |
| `1006` | Rendition skipped — no hash, no full-text file available. Retry path. |
> **Note:** Full `UPDATE`s (by the indexer) set `flag1, date1, date2, flag2, instance`. Partial `UPDATE`s (by the rendition service) set only `date2, flag2`.

### Typical lifecycle of an `osid`

```text
INSERT (flag2 = 0, by indexer from FULLTEXTIDX)
  → UPDATE flag2 = 1001   (rendition worker pickup)
  → UPDATE flag2 = 1003 → 1004   (rendition running)
  → UPDATE flag2 = 1002   (rendition in cache)
  → UPDATE flag2 = 200    (indexer from FULLTEXTDOC: successfully indexed)
```
When the object changes, a new `FULLTEXTIDX` entry is created in [oscpmqueue](server_cluster_and_configuration.md#database.oscpmqueue); the lifecycle restarts with `flag2 = 0`.

### Observed SQL patterns

```sql
-- Initial registration (INSERT, by indexer)
INSERT INTO osftslog (osid, ostype, date1, date2, flag1, flag2, instance)
VALUES (2677570, 262248, 29657186, 29657186, 1, 0, 'index:8045');

-- Set terminal state (full UPDATE, by indexer)
UPDATE osftslog
SET date1 = 29657186, flag1 = 1, date2 = 29657186, flag2 = 200, instance = 'index:8045'
WHERE osid = 2677570;

-- Intermediate pipeline phase (partial UPDATE, by rendition service)
UPDATE osftslog SET date2 = 29657186, flag2 = 1002 WHERE osid = 2677570;

-- Status lookup (before every operation)
SELECT * FROM osftslog WHERE osid = 2677570;
```

### Related tables

* [oscpmqueue](server_cluster_and_configuration.md#database.oscpmqueue) — the actual CP message queue of the pipeline (push-pull mechanism).
* [osftcontent](full_text_search.md#database.osftcontent) — full-text content storage (read by the indexer as source).
* [osfttab](full_text_search.md#database.osfttab) — main index table of the full-text search.

### Related Server-API jobs

| Job | Role w.r.t. `osftslog` |
|---|---|
| [std.GetNextCPMessage](../std.md#std.GetNextCPMessage) | Pickup from [oscpmqueue](server_cluster_and_configuration.md#database.oscpmqueue) — starts every pipeline action. The queue name determines which `flag2` values follow. |
| [std.GetCPObjectInfo](../std.md#std.GetCPObjectInfo) | Provides the worker with `HasVolltextFile` and `Hash` — influences whether `1001` → `1006` (skip) or `1003`/`1004` (work) is set. |
| [std.GetCPObjectIdxFulltext](../std.md#std.GetCPObjectIdxFulltext) | Indexer read path: fetches the full-text content (from [osftcontent](full_text_search.md#database.osftcontent) or the cache). Empty result → `flag2 = 404`. |
| [std.CreateCPMessages](../std.md#std.CreateCPMessages) | Called by the indexer when a rendition needs to be re-requested (`CreateRenditionMessages=true`). |
| [std.DispatchCPMessage](../std.md#std.DispatchCPMessage) | Post-processing of a completed message. |
| [std.CPRenditionChanged](../std.md#std.CPRenditionChanged) | Called by the rendition service after `Reason=TEXT` / `PAGECOUNT` / … . Triggers re-indexing (transition `200` → `1004`). |
| [ado.ExecuteSQL](../ado.md#ado.ExecuteSQL) | Direct SQL path through which both workers read and write `osftslog`. |

<a id="database.oscatalogsearch"></a>

## oscatalogsearch

> **Warning:** This page was automatically generated from the database schema dump and may be incomplete or incorrect.

Catalogue search configuration (cached search definitions).

### Columns

| Name | Type | Length | Description |
|---|---|---|---|
| `oskey` | `nvarchar` | 440 |  |
| `oslocname` | `nvarchar` | 10 |  |
| `osvalue` | `nvarchar` | max |  |
| `ostag` | `nvarchar` | max |  |

<a id="database.positivwords"></a>

## positivwords

> **Warning:** This page was automatically generated from the database schema dump and may be incomplete or incorrect.

Positive-word list for the full-text search (counterpart to [negativwords](full_text_search.md#database.negativwords)).

### Columns

| Name | Type | Length | Description |
|---|---|---|---|
| `id` | `int` | — |  |
| `word` | `nvarchar` | 30 |  |

<a id="database.negativwords"></a>

## negativwords

> **Warning:** This page was automatically generated from the database schema dump and may be incomplete or incorrect.

Stop-word list of the full-text search. Words in this table are ignored when building the full-text index (e.g. "der", "die", "das", "und", …).

### Columns

| Name | Type | Length | Description |
|---|---|---|---|
| `id` | `int` | — | Unique ID of the stop-word entry. |
| `word` | `nvarchar` | 30 | The stop word itself. |

### Usage by Server-API jobs

* [Full-text engine (vtx)](../vtx.md) — uses this list implicitly during indexing.
