- 1. API with NestJS #1. Controllers, routing and the module structure
- 2. API with NestJS #2. Setting up a PostgreSQL database with TypeORM
- 3. API with NestJS #3. Authenticating users with bcrypt, Passport, JWT, and cookies
- 4. API with NestJS #4. Error handling and data validation
- 5. API with NestJS #5. Serializing the response with interceptors
- 6. API with NestJS #6. Looking into dependency injection and modules
- 7. API with NestJS #7. Creating relationships with Postgres and TypeORM
- 8. API with NestJS #8. Writing unit tests
- 9. API with NestJS #9. Testing services and controllers with integration tests
- 10. API with NestJS #10. Uploading public files to Amazon S3
- 11. API with NestJS #11. Managing private files with Amazon S3
- 12. API with NestJS #12. Introduction to Elasticsearch
- 13. API with NestJS #13. Implementing refresh tokens using JWT
- 14. API with NestJS #14. Improving performance of our Postgres database with indexes
- 15. API with NestJS #15. Defining transactions with PostgreSQL and TypeORM
- 16. API with NestJS #16. Using the array data type with PostgreSQL and TypeORM
- 17. API with NestJS #17. Offset and keyset pagination with PostgreSQL and TypeORM
- 18. API with NestJS #18. Exploring the idea of microservices
- 19. API with NestJS #19. Using RabbitMQ to communicate with microservices
- 20. API with NestJS #20. Communicating with microservices using the gRPC framework
- 21. API with NestJS #21. An introduction to CQRS
- 22. API with NestJS #22. Storing JSON with PostgreSQL and TypeORM
- 23. API with NestJS #23. Implementing in-memory cache to increase the performance
- 24. API with NestJS #24. Cache with Redis. Running the app in a Node.js cluster
- 25. API with NestJS #25. Sending scheduled emails with cron and Nodemailer
- 26. API with NestJS #26. Real-time chat with WebSockets
- 27. API with NestJS #27. Introduction to GraphQL. Queries, mutations, and authentication
- 28. API with NestJS #28. Dealing in the N + 1 problem in GraphQL
- 29. API with NestJS #29. Real-time updates with GraphQL subscriptions
- 30. API with NestJS #30. Scalar types in GraphQL
- 31. API with NestJS #31. Two-factor authentication
- 32. API with NestJS #32. Introduction to Prisma with PostgreSQL
- 33. API with NestJS #33. Managing PostgreSQL relationships with Prisma
- 34. API with NestJS #34. Handling CPU-intensive tasks with queues
- 35. API with NestJS #35. Using server-side sessions instead of JSON Web Tokens
- 36. API with NestJS #36. Introduction to Stripe with React
- 37. API with NestJS #37. Using Stripe to save credit cards for future use
- 38. API with NestJS #38. Setting up recurring payments via subscriptions with Stripe
- 39. API with NestJS #39. Reacting to Stripe events with webhooks
- 40. API with NestJS #40. Confirming the email address
- 41. API with NestJS #41. Verifying phone numbers and sending SMS messages with Twilio
- 42. API with NestJS #42. Authenticating users with Google
- 43. API with NestJS #43. Introduction to MongoDB
- 44. API with NestJS #44. Implementing relationships with MongoDB
- 45. API with NestJS #45. Virtual properties with MongoDB and Mongoose
- 46. API with NestJS #46. Managing transactions with MongoDB and Mongoose
- 47. API with NestJS #47. Implementing pagination with MongoDB and Mongoose
- 48. API with NestJS #48. Definining indexes with MongoDB and Mongoose
- 49. API with NestJS #49. Updating with PUT and PATCH with MongoDB and Mongoose
- 50. API with NestJS #50. Introduction to logging with the built-in logger and TypeORM
- 51. API with NestJS #51. Health checks with Terminus and Datadog
- 52. API with NestJS #52. Generating documentation with Compodoc and JSDoc
- 53. API with NestJS #53. Implementing soft deletes with PostgreSQL and TypeORM
- 54. API with NestJS #54. Storing files inside a PostgreSQL database
- 55. API with NestJS #55. Uploading files to the server
- 56. API with NestJS #56. Authorization with roles and claims
- 57. API with NestJS #57. Composing classes with the mixin pattern
- 58. API with NestJS #58. Using ETag to implement cache and save bandwidth
- 59. API with NestJS #59. Introduction to a monorepo with Lerna and Yarn workspaces
- 60. API with NestJS #60. The OpenAPI specification and Swagger
- 61. API with NestJS #61. Dealing with circular dependencies
- 62. API with NestJS #62. Introduction to MikroORM with PostgreSQL
- 63. API with NestJS #63. Relationships with PostgreSQL and MikroORM
- 64. API with NestJS #64. Transactions with PostgreSQL and MikroORM
- 65. API with NestJS #65. Implementing soft deletes using MikroORM and filters
- 66. API with NestJS #66. Improving PostgreSQL performance with indexes using MikroORM
- 67. API with NestJS #67. Migrating to TypeORM 0.3
- 68. API with NestJS #68. Interacting with the application through REPL
- 69. API with NestJS #69. Database migrations with TypeORM
- 70. API with NestJS #70. Defining dynamic modules
- 71. API with NestJS #71. Introduction to feature flags
- 72. API with NestJS #72. Working with PostgreSQL using raw SQL queries
- 73. API with NestJS #73. One-to-one relationships with raw SQL queries
- 74. API with NestJS #74. Designing many-to-one relationships using raw SQL queries
- 75. API with NestJS #75. Many-to-many relationships using raw SQL queries
- 76. API with NestJS #76. Working with transactions using raw SQL queries
- 77. API with NestJS #77. Offset and keyset pagination with raw SQL queries
- 78. API with NestJS #78. Generating statistics using aggregate functions in raw SQL
- 79. API with NestJS #79. Implementing searching with pattern matching and raw SQL
- 80. API with NestJS #80. Updating entities with PUT and PATCH using raw SQL queries
- 81. API with NestJS #81. Soft deletes with raw SQL queries
- 82. API with NestJS #82. Introduction to indexes with raw SQL queries
- 83. API with NestJS #83. Text search with tsvector and raw SQL
- 84. API with NestJS #84. Implementing filtering using subqueries with raw SQL
- 85. API with NestJS #85. Defining constraints with raw SQL
- 86. API with NestJS #86. Logging with the built-in logger when using raw SQL
- 87. API with NestJS #87. Writing unit tests in a project with raw SQL
- 88. API with NestJS #88. Testing a project with raw SQL using integration tests
- 89. API with NestJS #89. Replacing Express with Fastify
- 90. API with NestJS #90. Using various types of SQL joins
- 91. API with NestJS #91. Dockerizing a NestJS API with Docker Compose
- 92. API with NestJS #92. Increasing the developer experience with Docker Compose
- 93. API with NestJS #93. Deploying a NestJS app with Amazon ECS and RDS
- 94. API with NestJS #94. Deploying multiple instances on AWS with a load balancer
- 95. API with NestJS #95. CI/CD with Amazon ECS and GitHub Actions
- 96. API with NestJS #96. Running unit tests with CI/CD and GitHub Actions
- 97. API with NestJS #97. Introduction to managing logs with Amazon CloudWatch
- 98. API with NestJS #98. Health checks with Terminus and Amazon ECS
- 99. API with NestJS #99. Scaling the number of application instances with Amazon ECS
- 100. API with NestJS #100. The HTTPS protocol with Route 53 and AWS Certificate Manager
- 101. API with NestJS #101. Managing sensitive data using the AWS Secrets Manager
- 102. API with NestJS #102. Writing unit tests with Prisma
- 103. API with NestJS #103. Integration tests with Prisma
- 104. API with NestJS #104. Writing transactions with Prisma
- 105. API with NestJS #105. Implementing soft deletes with Prisma and middleware
- 106. API with NestJS #106. Improving performance through indexes with Prisma
- 107. API with NestJS #107. Offset and keyset pagination with Prisma
- 108. API with NestJS #108. Date and time with Prisma and PostgreSQL
- 109. API with NestJS #109. Arrays with PostgreSQL and Prisma
- 110. API with NestJS #110. Managing JSON data with PostgreSQL and Prisma
- 111. API with NestJS #111. Constraints with PostgreSQL and Prisma
- 112. API with NestJS #112. Serializing the response with Prisma
- 113. API with NestJS #113. Logging with Prisma
- 114. API with NestJS #114. Modifying data using PUT and PATCH methods with Prisma
- 115. API with NestJS #115. Database migrations with Prisma
- 116. API with NestJS #116. REST API versioning
- 117. API with NestJS #117. CORS – Cross-Origin Resource Sharing
- 118. API with NestJS #118. Uploading and streaming videos
- 119. API with NestJS #119. Type-safe SQL queries with Kysely and PostgreSQL
- 120. API with NestJS #120. One-to-one relationships with the Kysely query builder
- 121. API with NestJS #121. Many-to-one relationships with PostgreSQL and Kysely
It is very common to implement a feature of searching through the contents of the database. In one of the previous articles, we learned how to implement it in a simple way using pattern matching.
Today we take it a step further and learn about the data types explicitly designed for full-text search.
Text Search Types
PostgreSQL provides two data types that help us implement full-text search. They allow us to search through a collection of texts and find the ones that match a given query the most.
tsvector
The tsvector column stores the text in a format optimized for search. To parse a string into the tsvector format, we need the to_tsvector function.
1 |
SELECT to_tsvector('english', 'The quick brown fox quickly jumps over the lazy dog'); |
When we look at the result of the above query, we notice a set of optimizations. One of the most apparent is grouping duplicates. Thanks to using the English dictionary, PostgreSQL noticed that “quick” and “quickly” are two variants of the same word.
Also, using the tsvector type can help us filter out stop words. They are very common, appear in almost every sentence, and don’t have much value when searching through text. Since we used the English dictionary in the above example, PostgreSQL filtered out the words “the” and “over”.
tsquery
The tsquery data type stores the text we want to search for. To transform a string into the tsquery format, we can use the to_tsquery function.
1 |
SELECT to_tsquery('fox'); |
To check if a certain tsvector matches the tsquery, we need to use the @@ operator.
1 |
SELECT to_tsvector('english', 'The quick brown fox quickly jumps over the lazy dog') @@ to_tsquery('fox'); |
true
When doing the above, we can play with the &, |, and ! boolean operators. For example, we can use the ! operator to make sure a given text does not contain a particular word.
1 |
SELECT to_tsvector('english', 'The quick brown fox quickly jumps over the lazy dog') @@ to_tsquery('!cat'); |
true
Check out the official documentation for a good explanation of all available operators.
Another handy function is plainto_tsquery. It takes an unformatted phrase and inserts the & operator between words. Because of that, it is an excellent choice to handle the input from the user.
1 |
SELECT to_tsvector('english', 'The quick brown fox quickly jumps over the lazy dog') @@ plainto_tsquery('brown fox'); |
true
Transforming the existing data
Let’s take a look at our posts table.
1 2 3 4 5 6 |
CREATE TABLE posts ( id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, title text NOT NULL, post_content text NOT NULL, author_id int REFERENCES users(id) NOT NULL ) |
Unfortunately, it does not contain a tsvector column. The most straightforward solution to the above problem is to convert our data to tsvector on the fly.
1 2 |
SELECT * FROM posts WHERE to_tsvector('english', post_content) @@ plainto_tsquery('fox'); |
We can take the above even further and combine the contents of the title and post_content columns to search through both.
1 2 |
SELECT * FROM posts WHERE to_tsvector('english', post_content || ' ' || title) @@ plainto_tsquery('fox'); |
The crucial issue with the above approach is that it causes PostgreSQL to transform the text from every record of the posts database, which can take a substantial amount of time.
Instead, I suggest defining a generated column that contains the data transformed into the tsvector format.
1 2 3 4 |
ALTER TABLE posts ADD COLUMN text_tsvector tsvector GENERATED ALWAYS AS ( to_tsvector('english', post_content || ' ' || title) ) STORED |
If you want to know moure about generated columns, check out Defining generated columns with PostgreSQL and TypeORM
Since we use the STORED keyword, we define a stored generated column that is saved in our database. PostgreSQL updates it automatically every time we modify the post_content and title columns.
We can now use our generated column when making a SELECT query to improve its performance drastically.
1 2 |
SELECT * FROM posts WHERE text_tsvector @@ plainto_tsquery('fox'); |
Ordering the results
So far, we haven’t paid attention to the order of the results of our SELECT query. Sorting the search results based on relevance could help the users quite a bit.
For example, we can indicate that the text from the title column is more important than the post_content column. To do that, let’s change how we create our text_tsvector column and use the setweight function.
1 2 3 4 5 |
ALTER TABLE posts ADD COLUMN text_tsvector tsvector GENERATED ALWAYS AS ( setweight(to_tsvector('english', title), 'A') || setweight(to_tsvector('english', post_content), 'B') ) STORED |
Let’s compare the two following posts after modifying the text_tsvector column:
The combined value of the title and post_content is the same in both posts. However, the text_tsvector takes into account that the title column is more important.
Thanks to the above, we can now use the ts_rank function to order our results based on the weight of each column.
1 2 3 |
SELECT * FROM posts WHERE text_tsvector @@ plainto_tsquery('brown fox') ORDER BY ts_rank(text_tsvector, plainto_tsquery('brown fox')) DESC |
Implementing full-text search with NestJS
Let’s create a migration first to implement the above functionalities in our NestJS project.
1 |
npx knex migrate:make add_post_tsvector |
20221113211441_add_post_tsvector.ts
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
import { Knex } from 'knex'; export async function up(knex: Knex): Promise<void> { await knex.raw(` ALTER TABLE posts ADD COLUMN text_tsvector tsvector GENERATED ALWAYS AS ( setweight(to_tsvector('english', title), 'A') || setweight(to_tsvector('english', post_content), 'B') ) STORED `); return knex.raw(` CREATE INDEX post_text_tsvector_index ON posts USING GIN (text_tsvector) `); } export async function down(knex: Knex): Promise<void> { return knex.raw(` ALTER TABLE posts DROP COLUMN text_tsvector `); } |
The crucial thing to notice above is that we are creating a Generalized Inverted Index (GIN). It works well with text searching and is appropriate when a column contains more than one value. Doing that can speed up our SELECT queries very significantly.
If you want to know more about indexes, check out API with NestJS #82. Introduction to indexes with raw SQL queries
In one of the previous parts of this series, we implemented the support for the search query parameter.
posts.controller.ts
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
import { ClassSerializerInterceptor, Controller, Get, Query, UseInterceptors, } from '@nestjs/common'; import { PostsService } from './posts.service'; import GetPostsByAuthorQuery from './getPostsByAuthorQuery'; import PaginationParams from '../utils/paginationParams'; import SearchPostsQuery from './searchPostsQuery'; @Controller('posts') @UseInterceptors(ClassSerializerInterceptor) export default class PostsController { constructor(private readonly postsService: PostsService) {} @Get() getPosts( @Query() { authorId }: GetPostsByAuthorQuery, @Query() { search }: SearchPostsQuery, @Query() { offset, limit, idsToSkip }: PaginationParams, ) { return this.postsService.getPosts( authorId, offset, limit, idsToSkip, search, ); } // ... } |
Finally, we need to modify the SQL queries that we make in our repository.
posts.repository.ts
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
import { Injectable } from '@nestjs/common'; import DatabaseService from '../database/database.service'; import PostModel from './post.model'; @Injectable() class PostsSearchRepository { constructor(private readonly databaseService: DatabaseService) {} async search( offset = 0, limit: number | null = null, idsToSkip = 0, searchQuery: string, ) { const databaseResponse = await this.databaseService.runQuery( ` WITH selected_posts AS ( SELECT * FROM posts WHERE id > $3 AND text_tsvector @@ plainto_tsquery($4) ORDER BY id ASC OFFSET $1 LIMIT $2 ), total_posts_count_response AS ( SELECT COUNT(*)::int AS total_posts_count FROM posts WHERE text_tsvector @@ plainto_tsquery($4) ) SELECT * FROM selected_posts, total_posts_count_response `, [offset, limit, idsToSkip, searchQuery], ); const items = databaseResponse.rows.map( (databaseRow) => new PostModel(databaseRow), ); const count = databaseResponse.rows[0]?.total_posts_count || 0; return { items, count, }; } // ... } export default PostsSearchRepository; |
Above, we use the keyset pagination that prevents us from sorting the results in a straightforward way. If you want to know more, check out API with NestJS #77. Offset and keyset pagination with raw SQL queries
Summary
In this article, we’ve gone through implementing full-text search in PostgreSQL. To do that, we had to learn about the tsvector and tsquery data types. In addition, we’ve created a stored generated column and a Generalized Inverted Index to improve the performance. By doing all of the above, we’ve created a fast search mechanism that is a good fit for many applications.
Thank You. It was very interesting 😀