https://blog.entropy.observer/sorting-400-tabs-in-60-seconds/

Ian's blog | entropy.observer

  * Home
  * Blog

Sign in Subscribe
Feb 16, 2023

Sorting 400+ tabs in 60 seconds with JS, Rust & GPT-3: Part 1

Sorting 400+ tabs in 60 seconds with JS, Rust & GPT-3: Part 1

I'm a serial tabbist. I admit it.

Currently, I have about 460 tabs open across 5 brave windows. Let's
not even get started on the bookmarks.

    "B-b-but, they're all necessary! So much knowledge! So many good
    links!"
    - My inner hoarder

Yeah, I'm like an information hamster. I just keep hoarding all the
tabs until I can find enough time to read everything - and open even
more of them on the way. And as one can assume, having so many tabs
can be quite overwhelming, either when I need to find something and
it's lost beyond the borders of the tab bar or when I'm just looking
at the screen and getting the anxious feeling of "having so much to
do" - even when there is nothing to be done.

So, being the lazy hacker I am, instead of actually sorting them,
cleaning them up
or *gulp* simply closing them all, I wondered - why not just let the
machine do the job? Can I have a 1-click solution to all my woes? Can
I Marie-Kondo my inner hoarder into submission by using code?

Luckily for us, there is a giant language model worth billions of
dollars just waiting to eagerly do the job. The idea is simple: Give
GPT3 a list of items and ask it to return a list of categories those
items belong to. Wrap all that up into a chrome extension and let the
magic happen.

So, let's crack our fingers and get coding.. or.. oh... wait..

The sweet taste of complexity

Let's backpedal a bit. So, our plan sounds simple enough. But as it
usually goes in software, we missed out on some key details that are
going to blow up our scope and budget if we don't think about them
properly.

Some of the key issues to think about before we dive into code head
first and find ourselves in a world of regret are:

  * Prompt token limits
    OpenAI's language models have token limits - 2048 or 4096 tokens.
    Since each token is about 4 characters, that limits our prompt
    and response size to 8192/16384 characters respectively.

    There are a few ways we can get around this problem (we'll cover
    all of them):
    - Cutting our prompt into consumable chunks
    - Optimising the data sent to reduce token count
    - Fine-tuning a model for our task
  * API Key security
    Since OpenAI API charges API calls by tokens used, our API key
    needs to be hidden somewhere safe. Hardcoding it in our extension
    is a no-no - unless we really want to pay OpenAI millions of
    dollars in bills because some bored script kiddy decided to
    scrape our key.
  * User privacy
    Tab titles and URL's can reveal sensitive things - private
    documents,
    links, session ID's and a lot of data about a person. We want
    users to be able to trust the extension, so we want to
    open-source it, have it build and deploy from that source and
    make it easy to deploy for others.
  * Ease of update
    Since LLM's can be fickle with their responses and OpenAI API
    could incur us insane usage costs due to simple mistakes, we want
    to have control over updates instead of letting the users do it
    at their whim. That means our most important code cannot reside
    in the extension.

How do we solve those issues?

We'll take a simple route - instead of writing all of the logic in
the extension itself, we'll hide it behind an API - we'll build a
simple backend service that will receive the tab data from the
extension, chunk our prompts, communicate with OpenAI's API and
reduce the data back into a single response. This enables us to both
secure our keys, control our updates and open-source the extension
without giving our secret token away.

To do this, we'll be using Rust - with Axum as our backend framework,
Shuttle as our deployment platform and Github Actions as our CI.

So, before we get into code, let's do some napkin sketches to get an
overview of what we're building:

[sketch](Not a real napkin - made with okso.app, an amazing
whiteboarding app made by Oleksii Trekhleb)


Step 1: Building the Extension

Chromium extension are quite simple to build - they're basically just
tiny webpages that live inside your browser and (with proper
permissions) are given access to your browser by using your browser's
API. We'll be relying on the Chrome API - it's the API Google Chrome
uses - and which many Chromium project based browsers expose (such as
Brave, which I'm using, and even Edge, tho with a different
namespace). Other browsers, like Firefox or Safari aren't built off
of the Chromium project, but provide a quite similar extension API.
If you want to know more about the differences between them, I'd
suggest this MDN article.

Specifically we'll be focusing on these two API's:

  * chrome.tabs - enables us to query tabs our user currently has
    opened
  * chrome.tabGroups - enables us to query existing groups, create
    new ones and move tabs inside them

So let's get to building. To bootstrap our extension, we'll be using
Chrome extension CLI - it will generate the initial project structure
we need.
So, hit the terminal with:

npm install -g chrome-extension-cli
chrome-extension-cli bookie-js
cd bookie-js

Follow the instructions at the end and load the build folder as an
extension - it will allow you to load and test your extension via hot
reload, so every change will be immediately visible.

Now, take a peek inside the structure it generated - most of it is
self-explanatory,

+-- README.md
+-- config
|   +-- paths.js
|   +-- webpack.common.js
|   +-- webpack.config.js
+-- node_modules
+-- package-lock.json
+-- package.json
+-- pbcopy
+-- public
|   +-- icons
|   +-- manifest.json
|   +-- popup.html
+-- src
    +-- background.js
    +-- contentScript.js
    +-- popup.css
    +-- popup.js

We're mostly interested in only three files for now:

public/manifest.json

The manifest is a JSON file which provides the browser with
information about your extension, such as name, it's capabilities,
how it's started, which file to display, scripts to run on pages and
many more. A few fields to note there for us:

  * default_popup - the HTML file to show when the extension icon is
    clicked
  * permissions - we need them to access certain parts of Chrome API
  * host_permissions - a set of URL patterns your extension can
    access

For now, we'll leave it all as it is and come back to it later.

src/popup.html

The starting point of our UI. This HTML pops up when we click the
extension button in the browser, so we'll use it to build a simple
interface here.
We'll have a 'Sort' button that calls our API's /sort endpoint and
returns the result, a loading bar and a simple error box in case
anything goes wrong.
For debugging, we can also have a "Show tabs" button that will show
as a list of all of our tabs. So let's write some simple HTML for it:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Bookie JS</title>
    <link rel="stylesheet" href="popup.css" />
  </head>
  <body>
    <div class="app">
      <div class="button-container">
        <!-- This will call our API -->
      <button id="sortBtn" class="button">Sort my mess</button>
      <div id="loading" class="loading"></div>
      <div id="error" class="error"></div>
    </div>
  </div>
  <script src="popup.js"></script>
</body>
</html>


src/popup.js

This is where our JS will reside. We ain't gonna use no fancy
bulletproof cybernetically CRISPR'd SSSR JavaScript framework, it's
going to be our plain ol' vanilla JS. To update the UI, we will rely
on a simple render(state) function that manipulates DOM elements
using some simple show and hide functions (by changing
element.style.display to block/none).

Now, let's write our thought process down by writing it into
functions:

'use strict';

import './popup.css';

(function () {

const SORT_BTN = 'sortBtn';
const LOADING = 'loading';
const ERROR = 'error';

// get tabs & groups from the API
async function getTabsAndGroups(){};

// call backend with the data
async function callBackendToSort(tabsAndGroups){};

// apply result to browser
async function applySort(sortedCategories){};

//runs our app
async function run(){

 //get tabs
 let tabsAndGroups = await getTabsAndGroups();
 render({loading: false, error: null}

 let btn = document.getElementById('sortBtn')

 //on click, call the API, show loading and apply the results when done
 btn.addEventListener('click',async ()=> {
     render({loading: true, error: null}
      try {
        let result = await callBackendToSort(tabsAndGroups)
        await applySort(result)
        render({loading: false, error: undefined})
      }catch (e){
        render({loading: false, error: e})
      }
 })
}

//load our run function when the content loads
document.addEventListener('DOMContentLoaded', run);

})();

Our first step will be querying the Chrome API for tabs and groups.
As we can see in the docs, we can use chrome.tabs.query to achieve
this.

[Screenshot-2023-02-13-at-15]

So, let's try it:

async function getTabsAndGroups() {
    let chromeTabs = await chrome.tabs.query({})
    console.log(chromeTabs)
  }

Not working? Now, remember that public/manifest.json file? And the
permissions object?

Well, to access tabs, their titles and groups, we'll need to add
matching permissions to it. So open up the manifest.json and under
permissions add "tabs", "tabGroups". Now when installing, chrome can
check your extensions permissions and let the user know what you're
accessing.
But, to be able to access the tabs API, we'll need one other special
permission called host-permissions. It tells the user which websites
the extension is enabled to run on, so if we want to be able to use
it on all tabs we'll need to add the proper URL pattern. So add a new
property to the manifest.json called host-permissions with a pattern
allowing it to match all URL's such as "host_permissions": ["*://*/
*"]. Finally, now we are able to access all of the user's tabs and
groups.

Now that it's working, the data the chrome.tabs.query method returns
will contain a few things we'll need: id, title and groupId. We'll be
using id and title for sorting, and groupId to query existing groups,
so first, we'll map the returned object to a simplified version of
it, using only the properties we need.

To get more data about groups, we'll create tabsForGroups function
which will find all the unique groups and query Chrome API by using
chrome.tabGroups.get(id) to get the title of each group.

async function tabsToGroups(tabs){
  //get all existing groupIds from tabs
  let groupIds = tabs
      .map( (it)=>it.groupId)
      .filter((it)=>it!==null && it!==undefined && it!==-1);

  //push them into a set to get unique ones
  let groups = new Set(groupIds)

  //query chrome API for data about each tab group
  return await Promise.all([...groups]
      .map(async (it) => {
      let item = await chrome.tabGroups.get(it)
        return {
          id: item.id,
          title: item.title
        }
    }));
  }

// now our function can return us all of our tabs and groups
async function getTabsAndGroups() {
    let chromeTabs = await chrome.tabs.query({})
    let tabs = await mapTabs(chromeTabs)
    let tabsWithGroups = await tabsToGroups(tabs)
    let groups =  tabsWithGroups.filter((it)=>it.title.length !== 0);
    return {
      items: tabs,
      categories: groups
    }
  }

Boom, in a few simple steps we have the list of our existing groups
and tabs.
The API calling function is also quite simple. Since our API doesn't
exist yet,
we'll just write a generic POST request to localhost:

async function callBackendToSort(data){
 return await fetch('http://127.0.0.1:8000/sort',{
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify({
        items: data.items,
        categories: data.categories
      })
    })
}

Our render function is quite simple too - we just check the state and
change our UI accordingly.

function render(state){
    if(state.loading){
      show(LOADING)
      hide(SORT_BTN)
      hide(ERROR)
    }else{
      hide(LOADING)
      show(SORT_BTN,true)
    }
    if(state.loading!==true &&
      (state.error!==undefined && state.error!=null)){
      show(ERROR)
      showError(state.error)
    }else
      hide(ERROR)
}

All that's now left to do is implement the applySort function which
will apply our new categories to the browser itself.

The idea is:

  * Check if the group exists
  * If it doesnt, create it
  * Update it's tabs list and title

For this, we have a bit of API research to do - the documentation
covering this part is a bit confusing. You'd expect to be able to
have something like
chrome.tabGroups.create or chrome.tabGroups.update which would change
tabs in the group, but... that's naive thinking.

To create a group we use the API call chrome.tabs.group by NOT
passing the chrome.tabs.group a groupId. Then, the group will be
created and the new groupId returned to you. This is kind of a weird
call by the chrome team - if groups are just containers of tabs, why
would tabs have knowledge and control over them?

Shouldn't the groups be created and managed via groups API?

Oh also, if you want to add tabs to the group, you use the same call
and pass it the array of tabs via tabIds. "Hey can I pass in the
title too since we're already creating and updating the object via
this API call?" No, for that you'll use chrome.tabGroups.update API
call.

I assumed this weird syntax is because groups were a later addon in
chrome so support was retrofitted into the tabs API itself. So let's
test that assumption. Looking at the commit that added groups to the
Tabs API, we can find the same discussion in the comments, leading us
to the Tab Group API proposal. It seems the team decided to split the
responsibilities between tab management and group management. Since
moving a tab is tab management, it's responsibility belongs in the
Tabs API.

The alternative proposal was also discussed (putting that
responsibility in the TabGroups API), along with it's pros and cons:
Screenshot-2023-02-16-at-21.29.44

From my perspective (as the user of the API), the cons list doesn't
seem that bad. Tabs wouldn't need to know about groups, user security
would be increased (extensions would only need tabGroups permission,
reducing the potential area for malicious abuse by extensions) and it
would hide the implementation details, replacing them with an
intuitive API, which is what abstractions are all about. Weird
decision none the less.

But enough talking about the spaghetti, let's write some down.

function applySort(sortedCategories){

/* The response object we want looks like:
{ categories: [
        { category_id: int, category_title: string, items: [int] }
    ] }
*/


  for (i = 0; i < sortedCategories.categories.length; i++) {
     let category = sortedCategories.categories[i]
     let categoryId = category.category_id
     //check if the group with ID exists
     let groupExists = await chrome.tabGroups.get(categoryId)
                                        .catch((e)=>undefined);
      let groupId;
      if(groupExists === undefined)
         //if it doesnt, the chrome.tabs.group returns us an ID
         groupId = await chrome.tabs.group({ tabIds: category.items });
      else {

        //if it does, we use the existing one
        groupId = groupExists.id
        await chrome.tabs.group({groupId: groupId,
                                tabIds: category.items});
      }

      // Set the title of all groups and collapse them
      await chrome.tabGroups.update(groupId, {
        collapsed: true,
        title: category.title
      });


  })
}

With this, our JS extension MVP is done.
- We collect the tabs and groups
- We send them to the API
- We apply the returned sort.

Now, we don't have an API yet, so how do we test it?
We should write down some unit tests, but let's leave that for
another day (no really - a few posts down we'll look into testing a
chrome extension with Jest). For now, we can fake the return of
callBackendToSort function to include a few categories and a few tab
id's - something like this (but with your tab id's):

{
        "categories": [{
                "category_id": 837293848,
                "category_name": "Hacker News",
                "items": [1322973609, 1322973620]
        }, {
                "category_id": 837293850,
                "category_name": "Science",
                "items": [1322973618, 1322973617, 1322973608]
        }, {
                "category_id": 837293851,
                "category_name": "GitHub",
                "items": [1322973619]
        }, {
                "category_id": 837293852,
                "category_name": "Web Development",
                "items": [1322973612, 1322973613, 1322973615, 1322973616]
        }, {
                "category_id": 837293853,
                "category_name": "Web APIs",
                "items": [1322973646]
        }]
}

Now we can move on to the fun parts - building that API, prompt
optimisations, GPT timeouts and fixing mistakes we'll make in the
days of the future past.
Oh and we'll also be adding some more complexity and feature creep,
but more on that later.

Stay tuned for Part 2 where we'll continue our adventure with
everyone's favourite crab - Rust.

[000039]Rusty the crab cute illustration, simple, clean, 2022 (The
Artist Is A Machine)









Previous issue
Browse all issues

Subscribe to Ian's blog | entropy.observer

Sign up now to get access to the library of members-only issues.
Jamie Larson
Subscribe
Ian's blog | entropy.observer (c) 2023

  * Sign up

Powered by Ghost