给 Postgres 写一个向量插件 - 向量类型
2025/1/8 3:03:31
本文主要是介绍给 Postgres 写一个向量插件 - 向量类型,对大家解决编程问题具有一定的参考价值,需要的程序猿们随着小编来一起学习吧!
在这篇文章中,我们将为 Postgres 实现 vector
类型:
CREATE TABLE items (v vector(3));
Postgres 扩展结构和 pgrx 包装器
在实现它之前,让我们先看看典型的扩展结构,以及 pgrx 如何为我们简化它。
典型的 Postgres 扩展可以大致分为 2 层:
- 实现,通常使用 C 等低级语言完成。
- 将实现粘合到 Postgres 的高级 SQL 语句。
- 指定扩展的一些基本属性的控制文件。
如果你看一下 pgvector 的源代码,这个 3 层结构非常明显,src
目录
用于 C 代码,sql
目录包含更高级的 SQL 胶水,还有一个
.control 文件。那么 pgrx 如何使扩展构建更容易?
- 它用 Rust 包装 Postgres C API
正如我们所说,即使我们用 Rust 构建扩展,Postgres 的 API 仍然是 C,pgrx 尝试将它们包装在 Rust 中,这样我们就不需要为 C 烦恼了。
- 如果可能,使用 Rust 宏生成 SQL 胶水
稍后我们将看到 pgrx 可以自动为我们生成 SQL 胶水。
- pgrx 为我们生成
.control
文件
CREATE TYPE vector
我们来定义我们的 Vector
类型,使用 std::vec::Vec
看起来非常简单,而且由于 vector
需要存储浮点数,我们在这里使用 f64
:
struct Vector { value: Vec<f64> }
然后呢?
用于创建新类型的 SQL 语句是 CREATE TYPE ...
,从它的 文档,我们会知道我们正在实现的 vector
类型是一个 基类型,要创建基类型,需要支持函数 input_function
和 output_function
。而且由于它需要采用使用 modifer 实现的维度参数(vector(DIMENSION)
),因此还需要函数 type_modifier_input_function
和 type_modifier_output_function
。因此,我们需要为我们的 Vector
类型实现这 4 个函数。
input_function
引用文档,
input_function
将类型的外部文本表示转换为为该类型定义的运算符和函数使用的内部表示。输入函数可以声明为采用一个
cstring
类型的参数,也可以声明为采用三个cstring
、oid
、integer
类型的参数。第一个参数是作为 C 字符串的输入文本,第二个参数是类型自己的 OID(数组类型除外,它们接收其元素类型的 OID),第三个是目标列的 typmod(如果已知)(如果未知,则传递 -1)。输入函数必须返回数据类型本身的值。
好的,从文档来看,这个 input_function
用于反序列化,serde
是 Rust 中最流行的反序列化库,所以让我们使用它。对于参数,由于 vector
需要类型修饰符,我们需要它接受 3 个参数。我们的 input_function
如下所示:
#[pg_extern(immutable, strict, parallel_safe, requires = [ "shell_type" ])] fn vector_input( input: &CStr, _oid: pg_sys::Oid, type_modifier: i32, ) -> Vector { let value = match serde_json::from_str::<Vec<f64>>( input.to_str().expect("expect input to be UTF-8 encoded"), ) { Ok(v) => v, Err(e) => { pgrx::error!("failed to deserialize the input string due to error {}", e) } }; let dimension = match u16::try_from(value.len()) { Ok(d) => d, Err(_) => { pgrx::error!("this vector's dimension [{}] is too large", value.len()); } }; // cast should be safe as dimension should be a positive let expected_dimension = match u16::try_from(type_modifier) { Ok(d) => d, Err(_) => { panic!("failed to cast stored dimension [{}] to u16", type_modifier); } }; // check the dimension if dimension != expected_dimension { pgrx::error!( "mismatched dimension, expected {}, found {}", expected_dimension, dimension ); } Vector { value } }
这有一大堆东西,让我们逐一研究一下。
#[pg_extern(immutable, strict, parallel_safe, require = [ "shell_type" ])]
如果你用 pg_extern
标记一个函数,那么 pgrx
会自动为你生成类似 CREATE FUNCTION <你的函数>
的 SQL,immutable, strict, parallel_safe
是你认为你的函数具有的属性,它们与 CREATE FUNCTION
文档 中列出的属性相对应。因为这个 Rust 宏用于生成 SQL,并且 SQL 可以相互依赖,所以这个 requires = [ "shell_type" ]
用于明确这种依赖关系。
shell_type
是另一个定义 shell 类型的 SQL 代码段的名称,什么是 shell 类型?它的行为就像一个占位符,这样我们在完全实现它之前就可以有一个 vector
类型来使用。此 #[pg_extern]
宏生成的 SQL 将是:
CREATE FUNCTION "vector_input"( "input" cstring, "_oid" oid, "type_modifier" INT ) RETURNS vector
如您所见,此函数 RETURNS vector
,但在实现这 4 个必需函数之前,我们如何才能拥有 vector
类型?
Shell 类型正是为此而生!我们可以定义一个 shell 类型(虚拟类型,不需要提供任何函数),并让我们的函数依赖于它:
pgrx
不会为我们定义这个 shell 类型,我们需要在 SQL 中手动执行此操作,如下所示:
extension_sql!( r#"CREATE TYPE vector; -- shell type"#, name = "shell_type" );
extension_sql!()
宏允许我们在 Rust 代码中编写 SQL,然后 pgrx
会将其包含在生成的 SQL 脚本中。name = "shell_type"
指定此 SQL 代码段的标识符,可用于引用它。我们的 vector_input()
函数依赖于此 shell 类型,因此它 requires = [ "shell_type" ]
。
fn vector_input( input: &CStr, _oid: pg_sys::Oid, type_modifier: i32, ) -> Vector {
input
参数是一个表示我们的向量输入文本的字符串,_oid
以 _
为前缀,因为我们不需要它。type_modifier
参数的类型为 i32
,这就是类型修饰符在 Postgres 中的存储方式。当我们实现类型修饰符输入/输出函数时,我们将再次看到它。
let value = match serde_json::from_str::<Vec<f64>>( input.to_str().expect("expect input to be UTF-8coded"), ) { Ok(v) => v, Err(e) => { pgrx::error!("failed to deserialize the input string due to error {}", e) } };
然后我们将 input
转换为 UTF-8 编码的 &str
并将其传递给 serde_json::from_str()
。输入文本应该是 UTF-8 编码的,所以我们应该是安全的。如果在反序列化过程中发生任何错误,只需使用 pgrx::error!()
输出错误,它将在 error
级别记录并终止当前事务。
let dimension = match u16::try_from(value.len()) { Ok(d) => d, Err(_) => { pgrx::error!("此向量的维度 [{}] 太大", value.len()); } }; // cast should be safe as dimension should be a positive let expected_dimension = match u16::try_from(type_modifier) { Ok(d) => d, Err(_) => { panic!("无法将存储的维度 [{}] 转换为 u16", type_modifier); } };
我们支持的最大维度是 u16::MAX
,我们这样做只是因为这是 pgvector 所做的。
// check the dimension if dimension != expected_dimension { pgrx::error!( "mismatched dimension, expected {}, found {}", expected_dimension, dimension ); } Vector { value }
最后,我们检查输入向量是否具有预期的维度,如果没有,则出错。否则,我们返回解析后的向量。
output_function
output_function
执行反向操作,它将给定的向量序列化为字符串。这是我们的实现:
#[pg_extern(immutable, strict, parallel_safe, require = [ "shell_type" ])] fn vector_output(value: Vector) -> CString { let value_serialized_string = serde_json::to_string(&value).unwrap(); CString::new(value_serialized_string).expect("中间不应该有 NUL") }
我们只需序列化 Vec<f64>
并将其返回到 CString
中,非常简单。
type_modifier_input_function
type_modifier_input_function
应该解析输入修饰符,检查解析的修饰符,如果有效,则将其编码为整数,这是 Postgres 存储类型修饰符的方式。
一个类型可以接受多个类型修饰符,用 ,
分隔,这就是我们在这里看到数组的原因。
#[pg_extern(immutable, strict, parallel_safe, requires = [ "shell_type" ])] fn vector_modifier_input(list: pgrx::datum::Array<&CStr>) -> i32 { if list.len() != 1 { pgrx::error!("too many modifiers, expect 1") } let modifier = list .get(0) .expect("should be Some as len = 1") .expect("type modifier cannot be NULL"); let Ok(dimension) = modifier .to_str() .expect("expect type modifiers to be UTF-8 encoded") .parse::<u16>() else { pgrx::error!("invalid dimension value, expect a number in range [1, 65535]") }; dimension as i32 }
实现很简单,vector
类型只接受 1 个修饰符,因此我们验证数组长度是否为 1。如果是,则尝试将其解析为 u16
,如果没有发生错误,则将其作为 i32
返回,以便 Postgres 可以存储它。
type_modifier_output_function
#[pg_extern(immutable, strict, parallel_safe, require = [ "shell_type" ])] fn vector_modifier_output(type_modifier: i32) -> CString { CString::new(format!("({})", type_modifier)).expect("No NUL in the middle") }
type_modifier_output_function
的实现也很简单,只需返回一个格式为 (type_modifier)
的字符串即可。
所有必需的函数都已实现,现在我们终于可以 CREATE
它了!
// create the actual type, specifying the input and output functions extension_sql!( r#" CREATE TYPE vector ( INPUT = vector_input, OUTPUT = vector_output, TYPMOD_IN = vector_modifier_input, TYPMOD_OUT = vector_modifier_output, STORAGE = external ); "#, name = "concrete_type", creates = [Type(Vector)], requires = [ "shell_type", vector_input, vector_output, vector_modifier_input, vector_modifier_output ] );
大家可能对STORAGE = external
这个参数比较陌生,我这里简单介绍一下。 Postgres 默认使用的存储引擎是堆,使用这个引擎,表的磁盘文件被分割成页,页的大小默认为 8192 字节,Postgres 希望至少 4 行可以存储在一个页中,如果某些列太大,无法容纳 4 行,Postgres 会将它们移动到外部表,即 TOAST 表。这个 STORAGE
参数控制 Postgres 在将列移动到 TOAST 表中时的行为。将其设置为 external
意味着如果该列太大,则可以将其移动到 TOAST 表中。
我们的第一次启动!
看起来一切就绪,让我们进行第一次启动吧!
$ cargo pgrx run error[E0277]: the trait bound `for<'a> fn(&'a CStr, Oid, i32) -> Vector: FunctionMetadata<_>` is not satisfied --> pg_extension/src/vector_type.rs:87:1 | 86 | #[pg_extern(immutable, strict, parallel_safe, requires = [ "shell_type" ])] | --------------------------------------------------------------------------- in this procedural macro expansion 87 | fn vector_input(input: &CStr, _oid: pg_sys::Oid, type_modifier: i32) -> Vector { | ^^ the trait `FunctionMetadata<_>` is not implemented for `for<'a> fn(&'a CStr, Oid, i32) -> Vector` | = help: the following other types implement trait `FunctionMetadata<A>`: `unsafe fn() -> R` implements `FunctionMetadata<()>` `unsafe fn(T0) -> R` implements `FunctionMetadata<(T0,)>` `unsafe fn(T0, T1) -> R` implements `FunctionMetadata<(T0, T1)>` `unsafe fn(T0, T1, T2) -> R` implements `FunctionMetadata<(T0, T1, T2)>` `unsafe fn(T0, T1, T2, T3) -> R` implements `FunctionMetadata<(T0, T1, T2, T3)>` `unsafe fn(T0, T1, T2, T3, T4) -> R` implements `FunctionMetadata<(T0, T1, T2, T3, T4)>` `unsafe fn(T0, T1, T2, T3, T4, T5) -> R` implements `FunctionMetadata<(T0, T1, T2, T3, T4, T5)>` `unsafe fn(T0, T1, T2, T3, T4, T5, T6) -> R` implements `FunctionMetadata<(T0, T1, T2, T3, T4, T5, T6)>` and 25 others = note: this error originates in the attribute macro `pg_extern` (in Nightly builds, run with -Z macro-backtrace for more info) error[E0599]: the function or associated item `entity` exists for struct `Vector`, but its trait bounds were not satisfied --> pg_extension/src/vector_type.rs:86:1 | 22 | pub(crate) struct Vector { | ------------------------ function or associated item `entity` not found for this struct because it doesn't satisfy `Vector: SqlTranslatable` ... 86 | #[pg_extern(immutable, strict, parallel_safe, requires = [ "shell_type" ])] | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ function or associated item cannot be called on `Vector` due to unsatisfied trait bounds | = note: the following trait bounds were not satisfied: `Vector: SqlTranslatable` which is required by `&Vector: SqlTranslatable` note: the trait `SqlTranslatable` must be implemented --> $HOME/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pgrx-sql-entity-graph-0.12.9/src/metadata/sql_translatable.rs:73:1 | 73 | pub unsafe trait SqlTranslatable { | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ = help: items from traits can only be used if the trait is implemented and in scope = note: the following traits define an item `entity`, perhaps you need to implement one of them: candidate #1: `FunctionMetadata` candidate #2: `SqlTranslatable` = note: this error originates in the attribute macro `pg_extern` (in Nightly builds, run with -Z macro-backtrace for more info) error[E0277]: the trait bound `Vector: RetAbi` is not satisfied --> pg_extension/src/vector_type.rs:87:73 | 87 | fn vector_input(input: &CStr, _oid: pg_sys::Oid, type_modifier: i32) -> Vector { | ^^^^^^ the trait `BoxRet` is not implemented for `Vector`, which is required by `Vector: RetAbi` | = help: the following other types implement trait `BoxRet`: &'a CStr &'a [u8] &'a str () AnyArray AnyElement AnyNumeric BOX and 33 others = note: required for `Vector` to implement `RetAbi` error[E0277]: the trait bound `Vector: RetAbi` is not satisfied --> pg_extension/src/vector_type.rs:87:80 | 86 | #[pg_extern(immutable, strict, parallel_safe, requires = [ "shell_type" ])] | --------------------------------------------------------------------------- in this procedural macro expansion 87 | fn vector_input(input: &CStr, _oid: pg_sys::Oid, type_modifier: i32) -> Vector { | ________________________________________________________________________________^ 88 | | let value = match serde_json::from_str::<Vec<f64>>( 89 | | input.to_str().expect("expect input to be UTF-8 encoded"), 90 | | ) { ... | 110 | | Vector { value } 111 | | } | |_^ the trait `BoxRet` is not implemented for `Vector`, which is required by `Vector: RetAbi` | = help: the following other types implement trait `BoxRet`: &'a CStr &'a [u8] &'a str () AnyArray AnyElement AnyNumeric BOX and 33 others = note: required for `Vector` to implement `RetAbi` = note: this error originates in the attribute macro `pg_extern` (in Nightly builds, run with -Z macro-backtrace for more info) error[E0277]: the trait bound `fn(Vector) -> CString: FunctionMetadata<_>` is not satisfied --> pg_extension/src/vector_type.rs:114:1 | 113 | #[pg_extern(immutable, strict, parallel_safe, requires = [ "shell_type" ])] | --------------------------------------------------------------------------- in this procedural macro expansion 114 | fn vector_output(value: Vector) -> CString { | ^^ the trait `FunctionMetadata<_>` is not implemented for `fn(Vector) -> CString` | = help: the following other types implement trait `FunctionMetadata<A>`: `unsafe fn() -> R` implements `FunctionMetadata<()>` `unsafe fn(T0) -> R` implements `FunctionMetadata<(T0,)>` `unsafe fn(T0, T1) -> R` implements `FunctionMetadata<(T0, T1)>` `unsafe fn(T0, T1, T2) -> R` implements `FunctionMetadata<(T0, T1, T2)>` `unsafe fn(T0, T1, T2, T3) -> R` implements `FunctionMetadata<(T0, T1, T2, T3)>` `unsafe fn(T0, T1, T2, T3, T4) -> R` implements `FunctionMetadata<(T0, T1, T2, T3, T4)>` `unsafe fn(T0, T1, T2, T3, T4, T5) -> R` implements `FunctionMetadata<(T0, T1, T2, T3, T4, T5)>` `unsafe fn(T0, T1, T2, T3, T4, T5, T6) -> R` implements `FunctionMetadata<(T0, T1, T2, T3, T4, T5, T6)>` and 25 others = note: this error originates in the attribute macro `pg_extern` (in Nightly builds, run with -Z macro-backtrace for more info) error[E0599]: the function or associated item `entity` exists for struct `Vector`, but its trait bounds were not satisfied --> pg_extension/src/vector_type.rs:113:1 | 22 | pub(crate) struct Vector { | ------------------------ function or associated item `entity` not found for this struct because it doesn't satisfy `Vector: SqlTranslatable` ... 113 | #[pg_extern(immutable, strict, parallel_safe, requires = [ "shell_type" ])] | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ function or associated item cannot be called on `Vector` due to unsatisfied trait bounds | = note: the following trait bounds were not satisfied: `Vector: SqlTranslatable` which is required by `&Vector: SqlTranslatable` note: the trait `SqlTranslatable` must be implemented --> $HOME/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pgrx-sql-entity-graph-0.12.9/src/metadata/sql_translatable.rs:73:1 | 73 | pub unsafe trait SqlTranslatable { | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ = help: items from traits can only be used if the trait is implemented and in scope = note: the following traits define an item `entity`, perhaps you need to implement one of them: candidate #1: `FunctionMetadata` candidate #2: `SqlTranslatable` = note: this error originates in the attribute macro `pg_extern` (in Nightly builds, run with -Z macro-backtrace for more info) error[E0277]: the trait bound `Vector: ArgAbi<'_>` is not satisfied --> pg_extension/src/vector_type.rs:114:18 | 113 | #[pg_extern(immutable, strict, parallel_safe, requires = [ "shell_type" ])] | --------------------------------------------------------------------------- in this procedural macro expansion 114 | fn vector_output(value: Vector) -> CString { | ^^^^^ the trait `ArgAbi<'_>` is not implemented for `Vector` | = help: the following other types implement trait `ArgAbi<'fcx>`: &'fcx T &'fcx [u8] &'fcx str *mut FunctionCallInfoBaseData AnyArray AnyElement AnyNumeric BOX and 36 others note: required by a bound in `pgrx::callconv::Args::<'a, 'fcx>::next_arg_unchecked` --> $HOME/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pgrx-0.12.9/src/callconv.rs:931:41 | 931 | pub unsafe fn next_arg_unchecked<T: ArgAbi<'fcx>>(&mut self) -> Option<T> { | ^^^^^^^^^^^^ required by this bound in `Args::<'a, 'fcx>::next_arg_unchecked` = note: this error originates in the attribute macro `pg_extern` (in Nightly builds, run with -Z macro-backtrace for more info)
修复错误!
特征绑定 for<'a> fn(&'a CStr, Oid, i32) -> Vector: FunctionMetadata<_>
不满足
好吧,这个错误的消息相当混乱,但确实有一个有用的提示:
note: the trait `SqlTranslatable` must be implemented --> $HOME/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pgrx-sql-entity-graph-0.12.9/src/metadata/sql_translatable.rs:73:1 | 73 | pub unsafe trait SqlTranslatable { | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ = help: items from traits can only be used if the trait is implemented and in scope = note: the following traits define an item `entity`, perhaps you need to implement one of them: candidate #1: `FunctionMetadata` candidate #2: `SqlTranslatable` = note: this error originates in the attribute macro `pg_extern` (in Nightly builds, run with -Z macro-backtrace for more info)
此注释说我们应该为我们的Vector
类型实现SqlTranslatable
,特征是做什么用的?其文档说此特征代表“可以在 SQL 中表示的值”,这有点令人困惑。让我解释一下。由于 pgrx 会为我们生成 SQL,并且会使用我们的“Vector”类型,因此需要我们告诉它我们的类型在 SQL 中应该被调用什么,也就是将其转换为 SQL。
在 SQL 中,我们的 Vector
类型简称为 vector
,因此我们的实现如下:
unsafe impl SqlTranslatable for Vector { fn argument_sql() -> Result<SqlMapping, ArgumentError> { Ok(SqlMapping::As("vector".into())) } fn return_sql() -> Result<Returns, ReturnsError> { Ok(Returns::One(SqlMapping::As("vector".into()))) } }
特征绑定 Vector: RetAbi
和 Vector: ArgAbi
不满足
我们的 Vector
类型有其内存表示(如何在内存中布局),而 Postgres 有自己的方式(或规则)将事物存储在内存中。 vector_input()
函数向 Postgres 返回一个 Vector
,而 vector_output()
函数接受一个 Vector
参数,该参数将由 Postgres 提供:
fn vector_input(input: &CStr, _oid: pg_sys::Oid, type_modifier: i32) -> Vector fn vector_output(value: Vector) -> CString
使用 Vector
作为函数返回值和参数要求它能够按照 Postgres 的规则表示。Postgres 使用 Datum
,这是所有 SQL 类型的二进制表示。因此,我们需要通过这两个特征在我们的 Vector
类型和 Datum
之间提供往返。
由于我们的 Vector
类型基本上是 std::vec::Vec<f64>
,并且这两个特征是为 Vec
实现的,因此我们可以简单地使用这些实现:
impl FromDatum for Vector { unsafe fn from_polymorphic_datum( datum: pg_sys::Datum, is_null: bool, typoid: pg_sys::Oid, ) -> Option<Self> where Self: Sized, { let value = <Vec<f64> as FromDatum>::from_polymorphic_datum(datum, is_null, typoid)?; Some(Self { value }) } } impl IntoDatum for Vector { fn into_datum(self) -> Option<pg_sys::Datum> { self.value.into_datum() } fn type_oid() -> pg_sys::Oid { rust_regtypein::<Self>() } } unsafe impl<'fcx> ArgAbi<'fcx> for Vector where Self: 'fcx, { unsafe fn unbox_arg_unchecked(arg: ::pgrx::callconv::Arg<'_, 'fcx>) -> Self { unsafe { arg.unbox_arg_using_from_datum().expect("expect it to be non-NULL") } } } unsafe impl BoxRet for Vector { unsafe fn box_into<'fcx>( self, fcinfo: &mut pgrx::callconv::FcInfo<'fcx>, ) -> pgrx::datum::Datum<'fcx> { match self.into_datum() { Some(datum) => unsafe {fcinfo.return_raw_datum(datum)} None => fcinfo.return_null() } } }
FromDatum
和 IntoDatum
也已实现,因为它们将在 RetAbi
和 ArgAbi
实现中使用。
让我们看看是否有任何错误:
$ cargo check $ echo $? 0
太好了,没有错误,让我们再次启动它!
我们第二次启动!
$ cargo pgrx run ... psql (17.2) Type "help" for help. pg_vector_ext=#
是的,“psql”启动没有任何问题!让我们启用扩展并创建一个带有“vector”列的表:
pg_vector_ext=# CREATE EXTENSION pg_vector_ext; CREATE EXTENSION pg_vector_ext=# CREATE TABLE items (v vector(3)); CREATE TABLE
到目前为止一切顺利,现在让我们插入一些数据:
pg_vector_ext=# INSERT INTO items values ('[1, 2, 3]'); ERROR: failed to cast stored dimension [-1] to u16 LINE 1: INSERT INTO items values ('[1, 2, 3]'); ^ DETAIL: 0: std::backtrace_rs::backtrace::libunwind::trace at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/../../backtrace/src/backtrace/libunwind.rs:116:5 ... 46: <unknown> at main.c:197:3
好吧,Postgres 慌了,因为 type_modifier
参数是 -1
,无法转换为 u16
。但是 type_modifier
怎么会是 -1
,当且仅当类型修饰符未知时,它不是 -1
吗?类型修饰符 3
显然存储在元数据中,因此肯定是已知的。
我在这里卡了一段时间,猜猜怎么着,我不是唯一一个。恕我直言,这是一个错误,但并不是每个人都这么认为。让我们接受它只能是 -1
的事实,解决这个问题的方法是创建一个转换函数,将 Vector
转换为 Vector
,您可以在其中访问存储的类型修饰符:
/// Cast a `vector` to a `vector`, the conversion is meaningless, but we do need /// to do the dimension check here if we cannot get the `typmod` value in vector /// type's input function. #[pgrx::pg_extern(immutable, strict, parallel_safe, requires = ["concrete_type"])] fn cast_vector_to_vector(vector: Vector, type_modifier: i32, _explicit: bool) -> Vector { let expected_dimension = u16::try_from(type_modifier).expect("invalid type_modifier") as usize; let dimension = vector.value.len(); if vector.value.len() != expected_dimension { pgrx::error!( "mismatched dimension, expected {}, found {}", type_modifier, dimension ); } vector } extension_sql!( r#" CREATE CAST (vector AS vector) WITH FUNCTION cast_vector_to_vector(vector, integer, boolean); "#, name = "cast_vector_to_vector", requires = ["concrete_type", cast_vector_to_vector] );
在我们的 vector_input()
函数中,如果类型修饰符未知,则不会执行维度检查:
#[pg_extern(immutable, strict, parallel_safe, requires = [ "shell_type" ])] fn vector_input(input: &CStr, _oid: pg_sys::Oid, type_modifier: i32) -> Vector { let value = match serde_json::from_str::<Vec<f64>>( input.to_str().expect("expect input to be UTF-8 encoded"), ) { Ok(v) => v, Err(e) => { pgrx::error!("failed to deserialize the input string due to error {}", e) } }; let dimension = match u16::try_from(value.len()) { Ok(d) => d, Err(_) => { pgrx::error!("this vector's dimension [{}] is too large", value.len()); } }; // check the dimension in INPUT function if we know the expected dimension. if type_modifier != -1 { let expected_dimension = match u16::try_from(type_modifier) { Ok(d) => d, Err(_) => { panic!("failed to cast stored dimension [{}] to u16", type_modifier); } }; if dimension != expected_dimension { pgrx::error!( "mismatched dimension, expected {}, found {}", expected_dimension, dimension ); } } // If we don't know the type modifier, do not check the dimension. Vector { value } }
现在一切都应该正常工作了:
pg_vector_ext=# INSERT INTO items values ('[1, 2, 3]'); INSERT 0 1
INSERT
工作了, 让我们试试 SELECT
:
pg_vector_ext=# SELECT * FROM items; v --------------- [1.0,2.0,3.0] (1 row)
恭喜!您刚刚为 Postgres 添加了 vector
类型支持!以下是我们实现的完整代码:
src/vector_type.rs
//! This file defines a `Vector` type. use pgrx::callconv::ArgAbi; use pgrx::callconv::BoxRet; use pgrx::datum::FromDatum; use pgrx::datum::IntoDatum; use pgrx::extension_sql; use pgrx::pg_extern; use pgrx::pg_sys; use pgrx::pgrx_sql_entity_graph::metadata::ArgumentError; use pgrx::pgrx_sql_entity_graph::metadata::Returns; use pgrx::pgrx_sql_entity_graph::metadata::ReturnsError; use pgrx::pgrx_sql_entity_graph::metadata::SqlMapping; use pgrx::pgrx_sql_entity_graph::metadata::SqlTranslatable; use pgrx::wrappers::rust_regtypein; use std::ffi::CStr; use std::ffi::CString; /// The `vector` type #[derive(Debug, serde::Deserialize, serde::Serialize)] #[serde(transparent)] pub(crate) struct Vector { pub(crate) value: Vec<f64>, } unsafe impl SqlTranslatable for Vector { fn argument_sql() -> Result<SqlMapping, ArgumentError> { Ok(SqlMapping::As("vector".into())) } fn return_sql() -> Result<Returns, ReturnsError> { Ok(Returns::One(SqlMapping::As("vector".into()))) } } impl FromDatum for Vector { unsafe fn from_polymorphic_datum( datum: pg_sys::Datum, is_null: bool, typoid: pg_sys::Oid, ) -> Option<Self> where Self: Sized, { let value = <Vec<f64> as FromDatum>::from_polymorphic_datum(datum, is_null, typoid)?; Some(Self { value }) } } impl IntoDatum for Vector { fn into_datum(self) -> Option<pg_sys::Datum> { self.value.into_datum() } fn type_oid() -> pg_sys::Oid { rust_regtypein::<Self>() } } unsafe impl<'fcx> ArgAbi<'fcx> for Vector where Self: 'fcx, { unsafe fn unbox_arg_unchecked(arg: ::pgrx::callconv::Arg<'_, 'fcx>) -> Self { unsafe { arg.unbox_arg_using_from_datum() .expect("expect it to be non-NULL") } } } unsafe impl BoxRet for Vector { unsafe fn box_into<'fcx>( self, fcinfo: &mut pgrx::callconv::FcInfo<'fcx>, ) -> pgrx::datum::Datum<'fcx> { match self.into_datum() { Some(datum) => unsafe { fcinfo.return_raw_datum(datum) }, None => fcinfo.return_null(), } } } extension_sql!( r#" CREATE TYPE vector; -- shell type "#, name = "shell_type", bootstrap // declare this extension_sql block as the "bootstrap" block so that it happens first in sql generation ); #[pg_extern(immutable, strict, parallel_safe, requires = [ "shell_type" ])] fn vector_input(input: &CStr, _oid: pg_sys::Oid, type_modifier: i32) -> Vector { let value = match serde_json::from_str::<Vec<f64>>( input.to_str().expect("expect input to be UTF-8 encoded"), ) { Ok(v) => v, Err(e) => { pgrx::error!("failed to deserialize the input string due to error {}", e) } }; let dimension = match u16::try_from(value.len()) { Ok(d) => d, Err(_) => { pgrx::error!("this vector's dimension [{}] is too large", value.len()); } }; // check the dimension in INPUT function if we know the expected dimension. if type_modifier != -1 { let expected_dimension = match u16::try_from(type_modifier) { Ok(d) => d, Err(_) => { panic!("failed to cast stored dimension [{}] to u16", type_modifier); } }; if dimension != expected_dimension { pgrx::error!( "mismatched dimension, expected {}, found {}", expected_dimension, dimension ); } } // If we don't know the type modifier, do not check the dimension. Vector { value } } #[pg_extern(immutable, strict, parallel_safe, requires = [ "shell_type" ])] fn vector_output(value: Vector) -> CString { let value_serialized_string = serde_json::to_string(&value).unwrap(); CString::new(value_serialized_string).expect("there should be no NUL in the middle") } #[pg_extern(immutable, strict, parallel_safe, requires = [ "shell_type" ])] fn vector_modifier_input(list: pgrx::datum::Array<&CStr>) -> i32 { if list.len() != 1 { pgrx::error!("too many modifiers, expect 1") } let modifier = list .get(0) .expect("should be Some as len = 1") .expect("type modifier cannot be NULL"); let Ok(dimension) = modifier .to_str() .expect("expect type modifiers to be UTF-8 encoded") .parse::<u16>() else { pgrx::error!("invalid dimension value, expect a number in range [1, 65535]") }; dimension as i32 } #[pg_extern(immutable, strict, parallel_safe, requires = [ "shell_type" ])] fn vector_modifier_output(type_modifier: i32) -> CString { CString::new(format!("({})", type_modifier)).expect("no NUL in the middle") } // create the actual type, specifying the input and output functions extension_sql!( r#" CREATE TYPE vector ( INPUT = vector_input, OUTPUT = vector_output, TYPMOD_IN = vector_modifier_input, TYPMOD_OUT = vector_modifier_output, STORAGE = external ); "#, name = "concrete_type", creates = [Type(Vector)], requires = [ "shell_type", vector_input, vector_output, vector_modifier_input, vector_modifier_output ] ); /// Cast a `vector` to a `vector`, the conversion is meaningless, but we do need /// to do the dimension check here if we cannot get the `typmod` value in vector /// type's input function. #[pgrx::pg_extern(immutable, strict, parallel_safe, requires = ["concrete_type"])] fn cast_vector_to_vector(vector: Vector, type_modifier: i32, _explicit: bool) -> Vector { let expected_dimension = u16::try_from(type_modifier).expect("invalid type_modifier") as usize; let dimension = vector.value.len(); if vector.value.len() != expected_dimension { pgrx::error!( "mismatched dimension, expected {}, found {}", type_modifier, dimension ); } vector } extension_sql!( r#" CREATE CAST (vector AS vector) WITH FUNCTION cast_vector_to_vector(vector, integer, boolean); "#, name = "cast_vector_to_vector", requires = ["concrete_type", cast_vector_to_vector] );
这篇关于给 Postgres 写一个向量插件 - 向量类型的文章就介绍到这儿,希望我们推荐的文章对大家有所帮助,也希望大家多多支持为之网!
- 2025-01-08高效短剧档期管理技巧,如何选择合适的软件?
- 2025-01-08从杂乱到有序:如何利用看板工具高效管理复杂的项目任务清单
- 2025-01-08短剧演员如何在繁忙档期中高效协作?使用管理软件的优势
- 2025-01-08选择清单管理系统的技巧:帮助团队实现高效协作
- 2025-01-08数字化转型中的清单管理工具:选择与应用指南
- 2025-01-08全面解析SaaS工时管理工具:5大看板工具如何提高工作效率
- 2025-01-07Spring Boot 集成 Easysearch 完整指南
- 2025-01-07需求管理工具的功能盘点:从任务分配到优先级管理的全面解析
- 2025-01-07优秀团队的选择:工时管理软件助力效率提升
- 2025-01-07工时管理系统如何助力团队摆脱低效协作?