3.3.3 Defining Vector Conversion

To define a vector conversion, use the SQL function create_vectorizer to create a vectorizer, which is the set of parameters that governs the vector conversion.

Example) Defining a vector conversion

rag_database=> SELECT ai.create_vectorizer(
     'sample_table'::regclass,
     destination => 'sample_embeddings',
     embedding => ai.embedding_ollama('all-minilm', 384),
     chunking => ai.chunking_recursive_character_text_splitter('contents'),
     processing => ai.processing_default(batch_size => 200, concurrency => 1),
     scheduling => pgx_vectorizer.schedule_vectorizer(interval '1 hour'),
     indexing => ai.indexing_hnsw(min_rows => 50000, opclass => 'vector_cosine_ops')
);
create_vectorizer
-------------------
                 1 - ID of the created vectorizer
(1 row)

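A vector conversion definition specifies the table that contains the text data to be vectorized, the embedding model and vector length that directly determine the vector representation, and the preprocessing performed before vector conversion. It can also specify, as a schedule, when vector conversion is performed. To have vector conversion run automatically in the background within Fujitsu Enterprise Postgres, specify pgx_vectorizer.schedule_vectorizer for the scheduling argument.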
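The preprocessing settings can be examined before a vectorizer is created. The following is a minimal sketch, assuming that ai.chunking_recursive_character_text_splitter can be called on its own, that it returns its settings as JSON, and that it accepts chunk_size and chunk_overlap as named arguments. These argument names are taken from the keys shown in the config column of ai.vectorizer, not from a confirmed function signature, so check them against your environment.

```sql
-- Assumption: the helper can be called standalone and returns its JSON settings.
-- Assumption: chunk_size and chunk_overlap are accepted as named arguments;
-- the names mirror the keys stored in the config column of ai.vectorizer.
SELECT ai.chunking_recursive_character_text_splitter(
     'contents',
     chunk_size => 800,
     chunk_overlap => 400
);
```

If the call succeeds, the returned JSON is what would be stored under the "chunking" key of the vectorizer configuration.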

Point

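If you change the name, primary key, column names, or similar properties of the table that contains the text to be vectorized, vector conversion can no longer run correctly. Do not change this information.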

You can check the vector conversion definition that you created in the ai.vectorizer table.

SELECT * FROM ai.vectorizer WHERE view_name = 'sample_embeddings';
id            | 1
source_schema | public
source_table  | sample_table
source_pk     | [{"pknum": 1, "attnum": 1, "attname": "id", "typname": "int4"}]
target_schema | public
target_table  | sample_embeddings_store
view_schema   | public
view_name     | sample_embeddings
trigger_name  | _vectorizer_src_trg_1
queue_schema  | ai
queue_table   | _vectorizer_q_1
config        | {"version": "0.8.0", "chunking": {"chunk_size": 800, "separators": ["\n\n", "\n", ".", "?", "!", " ", ""], "config_type": "chunking", "chunk_column": "contents", "chunk_overlap": 400, "implementation": "recursive_character_text_splitter", "is_separator_regex": false}, "indexing": {"config_type": "indexing", "implementation": "none"}, "embedding": {"model": "all-minilm", "dimensions": 384, "config_type": "embedding", "implementation": "ollama"}, "formatting": {"template": "$chunk", "config_type": "formatting", "implementation": "python_template"}, "processing": {"batch_size": 2000, "concurrency": 1, "config_type": "processing", "implementation": "default"}, "scheduling": {"config_type": "scheduling", "implementation": "none", "schedule_interval": "01:00:00", "extra_implementation": "pgx_vectorizer"}}
disabled      | f
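
Because the config column holds all settings in a single JSON value, it can be hard to read in full. The following sketch extracts a few of the settings shown above into separate columns. It assumes that config is a json or jsonb column, as the output suggests. The column aliases are illustrative only.

```sql
-- Extract selected settings from the config column of ai.vectorizer.
-- Assumption: config is of type json or jsonb, so -> and ->> can be used.
SELECT id,
       format('%I.%I', source_schema, source_table)  AS source,
       config->'embedding'->>'model'                 AS embedding_model,
       (config->'embedding'->>'dimensions')::int     AS dimensions,
       (config->'chunking'->>'chunk_size')::int      AS chunk_size,
       (config->'chunking'->>'chunk_overlap')::int   AS chunk_overlap,
       config->'scheduling'->>'schedule_interval'    AS schedule_interval
FROM ai.vectorizer
WHERE view_name = 'sample_embeddings';
```

The -> operator returns a nested JSON object, and ->> returns the value as text, which is why numeric settings are cast to int.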