How to build your own AI assistant for bookmark searching? | by Jiaqi Chen | Apr, 2024



April 17, 2024


Code: https://github.com/swsychen/Boomark2Sheet_Chromeplugin

To transfer bookmarks into a Google Sheet for further processing, we first need to build a custom Chrome plugin (or extension).

Code structure overview for a Chrome plugin. (Snapshot from the author’s VS Code.)

The most important file for a Chrome extension is manifest.json, which defines the high-level structure and behavior of the plugin. Here we add the permissions needed to use Chrome’s bookmark API and to observe changes in the bookmarks. We also have a field for OAuth2 authentication because we will use the Google Sheets API. You will need to put your own client_id in this field. You can mostly follow the “Set up your environment” section of the linked quickstart to get the client_id and a Google Sheets API key (we will use it later). Two things to watch for:

On the OAuth consent screen, you need to add yourself (your Gmail address) as a test user. Otherwise, you will not be allowed to use the APIs.
In Create OAuth client ID, the application type to choose is Chrome extension (not Web application as in the quickstart link). The Item ID to specify is the plugin ID (it is assigned when you load the plugin, and you can find it in the extension manager).

The blurred part is the Item ID. (Snapshot from the author’s Google Chrome extension manager.)
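To make the manifest description more concrete, here is a minimal sketch of what such a file could contain. It is written in Python (the language of the app later in this post) and simply prints the JSON. The key names are standard Manifest V3 keys, but the extension name, the version, the “identity” permission, and the Sheets scope are my assumptions; check the manifest.json in the GitHub repository for the actual values.

```python
import json

# Hypothetical placeholder: use the client ID from your own Google Cloud project.
CLIENT_ID = "YOUR_CLIENT_ID.apps.googleusercontent.com"


def build_manifest(client_id: str) -> dict:
    """Return a minimal Manifest V3 layout for a bookmark-to-Sheet extension."""
    return {
        "manifest_version": 3,
        "name": "Bookmark2Sheet (sketch)",  # illustrative name, not the author's
        "version": "1.0",
        # "bookmarks" exposes the chrome.bookmarks API. "identity" is an
        # assumption: OAuth tokens are usually obtained via chrome.identity.
        "permissions": ["bookmarks", "identity"],
        "oauth2": {
            "client_id": client_id,
            # Assumed scope: read/write access to Google Sheets.
            "scopes": ["https://www.googleapis.com/auth/spreadsheets"],
        },
        # The background script runs as a service worker in Manifest V3.
        "background": {"service_worker": "background.js"},
        "action": {"default_popup": "popup.html"},
    }


if __name__ == "__main__":
    print(json.dumps(build_manifest(CLIENT_ID), indent=2))
```

The printed JSON can be compared field by field against the real manifest.json; only the client_id has to be your own.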

The core functional file is background.js, which does all the syncing in the background. I’ve prepared the code for you in the GitHub link; the only thing you need to change is the spreadsheetId at the beginning of the JavaScript file. You can find this ID in the sharing link of your Google Sheet (after d/ and before /edit; and yes, you need to create the Google Sheet manually first!):

https://docs.google.com/spreadsheets/d/{spreadsheetId}/edit#gid=0
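If you prefer not to pick the ID out by eye, the rule “after d/ and before /edit” can be sketched as a small helper. The function name and the example ID below are hypothetical; they are not part of the author’s code.

```python
import re

# Captures the path segment that follows "/spreadsheets/d/" in a Sheets URL.
_SHEET_ID_PATTERN = re.compile(r"/spreadsheets/d/([^/?#]+)")


def extract_spreadsheet_id(sharing_link: str) -> str:
    """Return the spreadsheetId embedded in a Google Sheets sharing link."""
    match = _SHEET_ID_PATTERN.search(sharing_link)
    if match is None:
        raise ValueError(f"No spreadsheetId found in: {sharing_link!r}")
    return match.group(1)


if __name__ == "__main__":
    # EXAMPLE_ID_123 is a made-up ID for illustration only.
    link = "https://docs.google.com/spreadsheets/d/EXAMPLE_ID_123/edit#gid=0"
    print(extract_spreadsheet_id(link))
```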

The main logic of the code is to listen for any change in your bookmarks and refresh (clear + write) your Sheet with all the bookmarks you have whenever the plugin is triggered (e.g. when you add a new bookmark). It writes the id, title, and URL of each bookmark into a separate row of the specified Google Sheet.

What it looks like in your Google Sheet. (Snapshot from the author’s Google Sheet.)
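The “write” half of that refresh boils down to flattening the nested bookmark tree into rows. The sketch below shows the idea in Python on a dictionary shaped like the nodes Chrome returns (id, title, an optional url, optional children). The real plugin does this in JavaScript inside background.js; the function names and the sample tree here are hypothetical, and the actual network calls to the Sheets API are left out.

```python
from typing import Dict, Iterator, List


def iter_bookmark_rows(nodes: List[dict]) -> Iterator[List[str]]:
    """Walk the bookmark tree depth-first and yield [id, title, url] rows."""
    for node in nodes:
        # Folders have no "url" key; only real bookmarks become rows.
        if "url" in node:
            yield [node["id"], node.get("title", ""), node["url"]]
        yield from iter_bookmark_rows(node.get("children", []))


def build_write_payload(tree: List[dict]) -> Dict[str, List[List[str]]]:
    """Build a request body with one inner list per Sheet row."""
    return {"values": list(iter_bookmark_rows(tree))}


if __name__ == "__main__":
    # Made-up tree mimicking the nested folder structure of Chrome bookmarks.
    sample_tree = [{"id": "0", "title": "", "children": [
        {"id": "1", "title": "Bookmarks bar", "children": [
            {"id": "5", "title": "Pinecone", "url": "https://www.pinecone.io/"},
        ]},
        {"id": "2", "title": "Other bookmarks", "children": [
            {"id": "7", "title": "Streamlit", "url": "https://streamlit.io/"},
        ]},
    ]}]
    for row in build_write_payload(sample_tree)["values"]:
        print(row)
```

Because the Sheet is cleared before every write, the rows never need to be diffed against the previous state: the full list is simply regenerated each time.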

The last file, popup.html, is not that useful, as it only defines the content shown in the popup window when you click the plugin button in your Chrome browser.

Once all the files are in a single folder, you’re ready to load your plugin:

Go to Extensions>Manage Extensions in your Chrome browser, and turn on Developer mode at the top right of the page.
Click Load unpacked and choose the code folder. Your plugin will then be loaded and running. Click the service worker link to see the log output printed by the code.

Once loaded, the plugin stays operational as long as the Chrome browser is open. It also starts running automatically when you re-open the browser.

Estuary Flow is basically a connector that syncs a database with the data source you provide. In our case, when Estuary Flow syncs data from a Google Sheet into a vector database (Pinecone), it also calls an embedding model to transform the data into embedding vectors, which are then stored in the Pinecone database.

For setting up Estuary Flow and Pinecone, there’s already a fairly comprehensive video tutorial on YouTube: https://youtu.be/qyUmVW88L_A?si=xZ-atgJortObxDi-

But please pay attention! Estuary Flow and Pinecone are under rapid development, and some points in the video have changed by now, which may cause confusion. Here I list some updates to the video so that you can replicate everything easily:

1. (Estuary Flow>create Capture) For row batch size, set a number larger than the total number of rows in your bookmark Google Sheet (e.g. set it to 600 if you already have 400+ rows of bookmarks).

2. (Estuary Flow>create Capture) When setting Target Collections, delete the cursor field “row_id” and add a new one, “ID”, as in the following screenshot. You can keep the namespace empty.

Change Cursor Field. (Snapshot from the Sources page on Estuary Flow, April 2024)

3. (Estuary Flow>create Capture) Then switch to the COLLECTION subtab and press EDIT to change the Key from /row_id to /ID. You should also change the “required” field of the schema code to “ID”, as follows:

Change Key and Schema. (Snapshot from the Sources page on Estuary Flow, April 2024)

    //...skipped
        "URL": {
          "type": "string"
        },
        "row_id": {
          "type": "integer"
        }
      },
      "required": [
        "ID"
      ],
      "type": "object"
    }

After “SAVE AND PUBLISH”, Collections>{your collection name}>Overview>Data Preview will show the correct ID of each bookmark.

4. (Estuary Flow>create Capture) In the last step, you can see an Advanced Specification Editor (at the bottom of the page). Here you can add a field “interval”: 10m to lower the refresh rate to once every 10 minutes (the default is every 5 minutes if not specified). Each refresh calls the OpenAI embedding model to redo all the embeddings, which costs some money. Halving the refresh rate therefore saves half of the money. You can ignore the “backfill” field.

Specify the interval. (Snapshot from the Sources page on Estuary Flow, April 2024)

        //...skipped
            "syncMode": "full_refresh"
          },
          "target": "CJQ/mybookmark/bookmarks_v3"
        }
      ],
      "interval": "10m"
    }
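To see why doubling the interval halves the embedding cost, here is a rough back-of-the-envelope sketch. It assumes, as described above, that every refresh re-embeds every row; the helper names and the small interval parser (which only understands suffixes s, m, and h) are hypothetical and not part of Estuary Flow.

```python
# Seconds per supported unit suffix (assumed subset of interval formats).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def interval_to_seconds(interval: str) -> int:
    """Convert a string such as "10m" into a number of seconds."""
    value, unit = interval[:-1], interval[-1:]
    if unit not in _UNIT_SECONDS or not value.isdigit():
        raise ValueError(f"Unsupported interval: {interval!r}")
    return int(value) * _UNIT_SECONDS[unit]


def refreshes_per_day(interval: str) -> float:
    """Number of full refreshes in 24 hours at the given interval."""
    return 86400 / interval_to_seconds(interval)


def rows_embedded_per_day(interval: str, row_count: int) -> float:
    """Rows sent to the embedding model per day, assuming a full refresh each time."""
    return refreshes_per_day(interval) * row_count


if __name__ == "__main__":
    for interval in ("5m", "10m"):
        print(interval, refreshes_per_day(interval), rows_embedded_per_day(interval, 402))
```

With the default 5-minute interval there are 288 refreshes per day; at 10 minutes there are 144, so the number of rows embedded per day (and with it the cost) is cut in half.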

5. (Estuary Flow>create Materialization) The Pinecone environment is typically “gcp-starter” for a free-tier Pinecone index, or something like “us-east-1-aws” for standard-plan users (I don’t use serverless mode in Pinecone because Estuary Flow has not yet provided a connector for it). The Pinecone index is the index name you chose when creating the index in Pinecone.

6. (Estuary Flow>create Materialization) Here are some tricky parts.

First, select the source capture using the blue “SOURCE FROM CAPTURE” button, and then leave the Pinecone namespace in “CONFIG” EMPTY (the free tier of Pinecone must have an empty namespace).
Second, after pressing “NEXT”, the Advanced Specification Editor of the materialization appears; there you must make sure that the “bindings” field is NOT EMPTY. If it is empty or the field doesn’t exist, fill in the content as in the following screenshot; otherwise, nothing will be sent to Pinecone. You also need to change the “source” field to your own Collection path (the same as the “target” in the previous screenshot). If errors pop up after you press “NEXT” and before you can see the editor, press “NEXT” again, and you will see the Advanced Specification Editor. Then you can specify the “bindings” and press “SAVE AND PUBLISH”. Everything should be fine after this step. The errors occur because the “bindings” were not specified before.
If another error message appears after you have published everything and returned to the Destinations page, telling you that you have not added a collection, simply ignore it as long as the usage in the OVERVIEW histogram is not zero (see the following screenshots). The histogram basically shows how much data has been sent to Pinecone.

Make sure the “bindings” field is filled in like this. (Snapshot from the Destinations page on Estuary Flow, April 2024)

    "bindings": [
      {
        "resource": {},
        "source": "CJQ/mybookmark/bookmarks_v3",
        "fields": {
          "recommended": true
        }
      }
    ],

Don’t panic about the error; press “NEXT” again. (Snapshot from the Destinations page on Estuary Flow, April 2024)

Make sure the usage in OVERVIEW is not empty. (Snapshot from the Destinations page on Estuary Flow, April 2024)
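If you want to double-check the specification before publishing, the two conditions from step 6 (non-empty “bindings”, and a “source” that matches your collection) can be sketched as a small check on the JSON copied out of the editor. This is only an illustration: the function name is hypothetical, and I am assuming that “bindings” sits at the top level of the pasted specification.

```python
import json
from typing import List


def check_bindings(spec: dict, expected_source: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means the spec looks fine."""
    bindings = spec.get("bindings")
    if not bindings:
        # Missing key, None, or [] all mean nothing would be materialized.
        return ['"bindings" is missing or empty, so nothing would be sent to Pinecone']
    sources = [binding.get("source") for binding in bindings]
    if expected_source not in sources:
        return [f'no binding uses "source": "{expected_source}" (found: {sources})']
    return []


if __name__ == "__main__":
    # Paste the specification from the editor here; this one mirrors the post.
    pasted = """
    {
      "bindings": [
        {
          "resource": {},
          "source": "CJQ/mybookmark/bookmarks_v3",
          "fields": {"recommended": true}
        }
      ]
    }
    """
    problems = check_bindings(json.loads(pasted), "CJQ/mybookmark/bookmarks_v3")
    print(problems or "bindings look fine")
```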

7. (Pinecone>create index) Pinecone has introduced a serverless index mode (free, but not yet supported by Estuary Flow), but I don’t use it in this project. Here we still use the pod-based option (no longer free when last checked on April 14, 2024), which is more than enough for our bookmark embedding storage. When creating an index, all you need to set is the index name and dimensions.

8. (Pinecone>Indexes>{Your index}) After you finish creating the Pinecone index and make sure the index name and environment are filled in correctly in the materialization on Estuary Flow, you’re set. In the Pinecone console, go to Indexes>{Your index} and you should see the vector count showing the total number of your bookmarks. It may take a few minutes until Pinecone receives data from Estuary Flow and shows the correct vector count.

Here I have 402 bookmarks, so the vector count shows 402. (Snapshot from Pinecone, April 2024)

Code: https://github.com/swsychen/BookmarkAI_App

We’re almost there! The last step is to build a beautiful interface just like the original ChatGPT. Here we use a very convenient framework called Streamlit, with which we can build an app in only a few lines of code. Langchain is also a user-friendly framework for using any large language model with minimal code.

I’ve also prepared the code for this app for you. Follow the installation and usage guide in the GitHub link and enjoy!

The main logic of the code is:

get user prompt → create a retriever chain with ChatGPT and Pinecone → input the prompt to the chain and get a response → stream the result to the UI

The core part of the code. (Snapshot from the author’s VS Code.)
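To show the shape of that four-stage flow without any API keys, here is a toy sketch in plain Python. It is not the app’s actual code: a keyword-overlap retriever stands in for the Pinecone vector search, a fake generator stands in for ChatGPT, and printing stands in for the Streamlit UI. All names and sample bookmarks are hypothetical.

```python
from typing import Callable, Dict, Iterator, List

Bookmark = Dict[str, str]


def keyword_retriever(bookmarks: List[Bookmark], k: int = 2) -> Callable[[str], List[Bookmark]]:
    """Stand-in for vector search: rank bookmarks by words shared with the query."""
    def retrieve(query: str) -> List[Bookmark]:
        query_words = set(query.lower().split())

        def overlap(bookmark: Bookmark) -> int:
            return len(query_words & set(bookmark["title"].lower().split()))

        ranked = sorted(bookmarks, key=overlap, reverse=True)
        return [b for b in ranked[:k] if overlap(b) > 0]
    return retrieve


def build_prompt(question: str, context: List[Bookmark]) -> str:
    """Combine the retrieved bookmarks and the user question into one prompt."""
    lines = [f"- {b['title']}: {b['url']}" for b in context]
    return "Answer using these bookmarks:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


def fake_llm_stream(prompt: str) -> Iterator[str]:
    """Stand-in for a streaming chat model: echoes the bookmark lines word by word."""
    for line in prompt.splitlines():
        if line.startswith("- "):
            for word in line.split():
                yield word + " "


def answer(question: str, retrieve: Callable[[str], List[Bookmark]],
           llm_stream: Callable[[str], Iterator[str]]) -> Iterator[str]:
    """Prompt -> retrieve -> build chain input -> stream the response."""
    context = retrieve(question)
    yield from llm_stream(build_prompt(question, context))


if __name__ == "__main__":
    bookmarks = [
        {"id": "5", "title": "Pinecone vector database", "url": "https://www.pinecone.io/"},
        {"id": "7", "title": "Streamlit docs", "url": "https://streamlit.io/"},
    ]
    for chunk in answer("vector database", keyword_retriever(bookmarks), fake_llm_stream):
        print(chunk, end="", flush=True)
    print()
```

Keeping the retriever and the model as interchangeable callables mirrors the point of the chain: either piece can be swapped (another vector store, another LLM) without touching the rest of the flow.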

Please note that because Langchain is under active development, the code may be deprecated if you use a version newer than the one stated in requirements.txt. If you want to dig deeper into Langchain and use another LLM for this bookmark search, feel free to look into the official Langchain documentation.