------------------------------------------- Completed 6/16/2023 ------------------------------------------
Our goal is to find ways to leverage LLM and knowledge base technologies to enhance eCommerce digital experiences.
This investigation will center on capabilities similar to Zillow's AI-powered natural language search.
There are numerous use cases that LLM-based AI can address in the eCommerce space. We are going to focus specifically on the Question and Answer (QA) use case as described by Zillow.
Commerce QA is a unique beast: it blends fluid customer discovery with a very specific, product-centric language built around product providers, catalog categories, products/assets, clients, loyalty, offers, orders, and payments. The customer is not always sure of what they want at the beginning. For example, the Zillow flow must incorporate many customer-specific requirements, some of which the customer might not even be aware of at first.
We believe a pure LLM, even with fine-tuning, will not provide the needed multi-step logical reasoning, awareness of topological factors, or handling of temporal progression.
We believe the solution will incorporate components of both knowledge graphs and natural language search.
This experiment will consist of multiple prototypes that demonstrate the individual component capabilities. Over the next few weeks, we want to better understand the fundamental LLM and graph reasoning building blocks that are needed. The actual end-to-end solution will be demonstrated in a future experiment.
Technologies
Investigate various Large Language Models (LLMs) and their applicability
Focus on the Generative Pre-trained Transformer (GPT) architecture, specifically GPT-4
GPT4All LLM ecosystem: demonstrate multiple LLMs running within a local Windows environment
Use mainstream open-source LLM technology such as the LangChain Python library
Leverage mainstream knowledge graph standards and techniques such as schema.org, RDF2Vec, and POC4Commerce
Evaluate methods of incorporating a commerce knowledge graph into the Question and Answer flow
Experiment demonstrations
Build a robust commerce knowledge graph (ontology), based on POC4Commerce, that incorporates providers, catalog categories, NFT assets, asset availability, offers, clients, orders, and purchases.
Build an LLM-based Question and Answer solution that incorporates embeddings based on our knowledge graph. Investigate fine-tuning to see whether it is viable for the eCommerce QA use case.
Build a natural language knowledge base QA flow and compare it to the LLM QA capabilities.
First demonstration
Build a semantic knowledge base that leverages schema.org, OASIS, GoodRelations, and POC4Commerce schema definitions.
Demonstrate how standard industry schemas can be used in concert with RDF triples within an eCommerce context. Industry models like schema.org shape the ontology (the vocabulary), but the APIs and storage are based on triples that reference that ontology. This is a very important distinction.
Build example RDF data for an eCommerce flow that incorporates providers, catalog categories, NFT assets, asset availability, offers, clients, orders, and purchases.
Query the semantic knowledge base using SPARQL.
Investigate how an LLM can be used to augment the commerce knowledge graph with unstructured domain data.
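A minimal sketch of the ontology-versus-storage distinction and the SPARQL step, assuming a handful of hypothetical triples (the `ex` namespace and the asset, offer, and provider names are invented for illustration). The schema.org IRIs supply only the vocabulary; the data itself is just a list of triples. To stay dependency-free, the sketch evaluates a SPARQL-style basic graph pattern in plain Python instead of calling a real SPARQL engine (such as rdflib or a triple store); the equivalent SPARQL query is shown in a comment.

```python
from typing import Dict, Iterator, List, Optional, Tuple

SCHEMA = "https://schema.org/"                                   # ontology (vocabulary)
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.com/store/"  # hypothetical namespace for our own data

Triple = Tuple[str, str, str]

# Storage is nothing but triples; they *reference* schema.org terms.
TRIPLES: List[Triple] = [
    (EX + "asset42", RDF_TYPE, SCHEMA + "Product"),
    (EX + "asset42", SCHEMA + "name", "Genesis Sword NFT"),
    (EX + "asset42", SCHEMA + "category", EX + "weapons"),
    (EX + "offer7", RDF_TYPE, SCHEMA + "Offer"),
    (EX + "offer7", SCHEMA + "itemOffered", EX + "asset42"),
    (EX + "offer7", SCHEMA + "price", "25.00"),
    (EX + "offer7", SCHEMA + "priceCurrency", "USD"),
    (EX + "offer7", SCHEMA + "seller", EX + "providerA"),
]

def _unify(term: str, value: str, binding: Dict[str, str]) -> bool:
    """Terms starting with '?' are variables; anything else must match exactly."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value

def match(pattern: List[Triple], triples: List[Triple],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every variable binding that satisfies all triple patterns (a basic graph pattern)."""
    binding = {} if binding is None else binding
    if not pattern:
        yield binding
        return
    head, rest = pattern[0], pattern[1:]
    for triple in triples:
        candidate = dict(binding)
        if all(_unify(t, v, candidate) for t, v in zip(head, triple)):
            yield from match(rest, triples, candidate)

# Equivalent SPARQL:
#   PREFIX schema: <https://schema.org/>
#   SELECT ?name ?price WHERE {
#     ?offer a schema:Offer ; schema:itemOffered ?item ; schema:price ?price .
#     ?item schema:name ?name . }
query = [
    ("?offer", RDF_TYPE, SCHEMA + "Offer"),
    ("?offer", SCHEMA + "itemOffered", "?item"),
    ("?offer", SCHEMA + "price", "?price"),
    ("?item", SCHEMA + "name", "?name"),
]
results = [(b["?name"], b["?price"]) for b in match(query, TRIPLES)]
print(results)  # [('Genesis Sword NFT', '25.00')]
```

A real deployment would load the same triples into a triple store and run the commented SPARQL directly; this matcher only covers basic graph patterns (no FILTER, OPTIONAL, or aggregation).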
Second demonstration
Investigate multiple LLMs.
Get an LLM-based AI search engine running that uses the LangChain framework and integrates commerce data via embeddings.
Investigate building a vector store of embeddings via RDF2Vec, and look at other techniques for bringing RDF data into the LLM QA flow.
Investigate various flows and how to incorporate RDF data, including Graph Neural Networks (GNNs). Build out an example QA flow.
Look at various fine-tuning mechanisms for training our LLM: Databricks, Hugging Face, and MosaicML.
Understand model quality, bias, toxicity, and hallucination issues.
Understand the tradeoffs of using open-source local models versus enterprise solutions like Google or OpenAI.
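RDF2Vec first turns the graph into token sequences (random walks that alternate entities and predicates) and then trains a word2vec model on those sequences to obtain entity vectors. The sketch below shows only the walk-extraction step over hypothetical commerce triples; the word2vec training step (for example with gensim's `Word2Vec`) and loading the vectors into a vector store are left out, and the walk depth, walk count, and seed are arbitrary assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical commerce triples, using short prefixed names for readability.
TRIPLES: List[Triple] = [
    ("ex:order1", "schema:acceptedOffer", "ex:offer7"),
    ("ex:offer7", "schema:itemOffered", "ex:asset42"),
    ("ex:offer7", "schema:seller", "ex:providerA"),
    ("ex:asset42", "schema:category", "ex:weapons"),
    ("ex:asset99", "schema:category", "ex:weapons"),
]

def build_adjacency(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each subject to its outgoing (predicate, object) edges."""
    adj: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
    return adj

def random_walks(triples: List[Triple], start: str, depth: int = 4,
                 n_walks: int = 3, seed: int = 0) -> List[List[str]]:
    """RDF2Vec-style walks: [entity, predicate, entity, predicate, ...] token lists."""
    rng = random.Random(seed)  # fixed seed so the walks are reproducible
    adj = build_adjacency(triples)
    walks: List[List[str]] = []
    for _ in range(n_walks):
        walk, node = [start], start
        for _ in range(depth):
            edges = adj.get(node)
            if not edges:  # dead end: stop this walk early
                break
            predicate, obj = rng.choice(edges)
            walk += [predicate, obj]
            node = obj
        walks.append(walk)
    return walks

# Each walk is a "sentence" that a word2vec model would later train on.
for walk in random_walks(TRIPLES, "ex:order1"):
    print(" ".join(walk))
```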
Third demonstration
A QA flow that leverages our knowledge graph and natural language query processing
Generate SPARQL syntax from natural language queries using pre-trained models (leveraging RDF)
Leverage the farm-haystack Python framework
Understand the best ways to store and manage schema and RDF data
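One way to generate SPARQL from a natural language question is to prompt a pre-trained model with the graph vocabulary and a worked example. This is a minimal sketch under stated assumptions: `llm` is a hypothetical callable (prompt in, text out) standing in for whatever model or framework is used, and the schema hint, example query, and `example.com` IRI are invented for illustration. The keyword check is only a cheap guard; generated queries should still be parsed and validated against the endpoint before they are executed.

```python
from typing import Callable

# Hypothetical summary of the graph vocabulary, so the model only uses known terms.
SCHEMA_HINT = """\
PREFIX schema: <https://schema.org/>
Classes: schema:Product, schema:Offer, schema:Organization
Properties: schema:name, schema:price, schema:priceCurrency,
            schema:itemOffered, schema:seller, schema:category
"""

# One worked question/query pair (few-shot example).
FEW_SHOT = """\
Question: Which products are offered by providerA?
SPARQL:
PREFIX schema: <https://schema.org/>
SELECT ?name WHERE {
  ?offer schema:seller <http://example.com/store/providerA> ;
         schema:itemOffered ?item .
  ?item schema:name ?name .
}
"""

def build_prompt(question: str) -> str:
    """Assemble the prompt: instructions, schema, one example, then the user's question."""
    return (
        "Translate the question into a SPARQL query. Use only the schema below.\n\n"
        f"{SCHEMA_HINT}\n{FEW_SHOT}\n"
        f"Question: {question}\nSPARQL:\n"
    )

def question_to_sparql(question: str, llm: Callable[[str], str]) -> str:
    """Ask the (hypothetical) model for a query and reject obviously non-query output."""
    query = llm(build_prompt(question)).strip()
    if "SELECT" not in query.upper():
        raise ValueError("model did not return a SELECT query")
    return query
```

In a test or demo, `llm` can be any function, for example a stub that returns a fixed query, which keeps the prompt construction testable without a model.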
The overview and conclusions for each of the three demonstrations are captured in separate demonstration-specific documents:
GPT4 and LLM search engine with embeddings (need to finish)
Knowledge Graph and Natural Language Question and Answer flow (need to finish)
Our findings were relatively conclusive: an LLM on its own was not adequate. Augmenting the LLM with semantic knowledge is going to be key for an eCommerce QA process. Our best option was to leverage LangChain knowledge base embeddings within the AI natural language search flow. This seems to be a common conclusion for use cases like eCommerce.
Our analysis also showed the incredible pace at which this technology is moving. Every one of our demonstrations required upgrading Python libraries to get the code working, and most of the Python libraries we used were less than a month old. We will need to revisit these technologies and findings every month or so.
We recommend that commerce shops focus on developing rich ontologies and focused, domain-specific knowledge graphs, and build tools to maintain those knowledge graphs over time. Find ways to embed knowledge graph information (JSON-LD) into your site content (semantic web) so that third-party engines can leverage it.
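A minimal sketch of emitting schema.org JSON-LD for a product page, using only Python's standard `json` module. The product name, SKU, price, and seller are hypothetical values; the `Product`, `Offer`, and `Organization` types and the `application/ld+json` script element are the standard schema.org/JSON-LD conventions that search engines read from page HTML.

```python
import json

def product_jsonld(name: str, sku: str, price: float, currency: str, seller: str) -> dict:
    """Build a schema.org Product with a nested Offer as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": "https://schema.org/InStock",
            "seller": {"@type": "Organization", "name": seller},
        },
    }

def to_script_tag(data: dict) -> str:
    """Wrap JSON-LD in the <script> element embedded in the page's HTML."""
    # Escape "</" so a value can never close the script element early ("\/" is valid JSON).
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"

print(to_script_tag(product_jsonld("Genesis Sword NFT", "NFT-0042", 25, "USD", "Provider A")))
```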
We also recommend looking at Natural Language Processing search techniques that leverage your knowledge graph data to enhance your current site search and navigation.
This will all prepare brands for the future. We anticipate that open-source technology for a blended LLM and knowledge graph QA flow will be available within the next few months. The time to market for a solution like this will be determined mostly by a brand's ability to get its data ready for this capability.
It's interesting to note the Google search trends for keywords like "Artificial Intelligence", "semantic web", and "ontology". The initial AI hype has clearly centered on LLM-based Question and Answer flows, while over the past 12 years semantic web and ontology development has simply not caught on. It seems pretty clear that the push for AI-driven use cases will probably reinvigorate the semantic metadata discussion.
https://makersuite.google.com/app/home
PaLM API
Semantic Search Engine
https://www.deepset.ai/blog/how-to-build-a-semantic-search-engine-in-python
LLM Search Solution (good examples)
https://github.com/ray-project/langchain-ray
https://www.anyscale.com/blog/llm-open-source-search-engine-langchain-ray
https://github.com/ggerganov/llama.cpp
https://beebom.com/how-run-chatgpt-like-language-model-pc-offline/
https://blog.replit.com/llm-training
AI engine that extracts information from various metaverse data sources
Supports a query API
https://whaleanalytica.com/metaverse/
Third Experiment: Run the entire system locally
Look at PyTorch (deep learning framework)
LLM Engines
Open-source LLMs
Closed LLMs
OpenAI
Microsoft
Definitions:
A GPT model is a type of neural network that uses the transformer architecture to learn from large amounts of text data.
Moat: Moats are defensibility mechanisms that prevent competitors from copying your product and business.
Datasource
Interesting Projects
GPT/LLM Hackathon
Project
My first working example of an LLM application using LLMChain and OpenAI
https://coinsbench.com/chat-with-your-databases-using-langchain-bb7d31ed2e76
nice overview
https://www.leewayhertz.com/build-private-llm/
Interesting but did not build
Looks like a simple example
Local Project
https://github.com/codemaker2015/sqldatabasechain-langchain-demo
C:\Web3Store\LLMService\venv
python db.py
python app.py
Example of including documents in a use case
https://python.langchain.com/en/latest/use_cases/question_answering.html
Good examples
Embeddings using LangChain and GPT4All
https://artificialcorner.com/gpt4all-is-the-local-chatgpt-for-your-documents-and-it-is-free-df1016bc335
Open-source project that lets you run the engine on your local computer, disconnected from the internet. The demo gave me a great response to my "explain slippery slope" question.
GPT4All provides a chat client that hooks into your local server.
GPT4All provides an easy-to-use client Python library.
GPT4All has a Discord community.
The original GPT4All model architecture is based on LLaMA (GPT4All-J, the model chosen below, is based on GPT-J).
Download and install from this installer (exe): https://gpt4all.io/index.html
GitHub: https://github.com/nomic-ai/gpt4all
It put a shortcut on my Windows desktop, which launches the client application.
It provides about 10 models to choose from. I chose gpt4all-j-v1.3-groovy because it has a commercial-use license and is only 3.53GB in size.
Asked the question "explain slippery slope" and got a great answer.
Knowledge graphs, semantic networks, JSON-LD
An RDF graph is a curated set of facts that links back to schema.org. About 40% of sites support JSON-LD data.
Here is a good overview, but not the answer.
A key issue with AI is hallucination, which we need to watch out for.
Realize that fine-tuning does not add knowledge. Today, you cannot retrain the model; your data has to be added to the model as it is being created.
LLM and Knowledge Graphs for eCommerce
Supervised Fine-Tuning of an LLM
Supervised Training Phase
The fine-tuning step is relatively cheap in terms of computation cost, thanks to techniques like LoRA and QLoRA.
The NaLLM project focuses on fine-tuning with an RDF graph.
Structured (JSON-LD) vs. unstructured (web content) data
Real-time: the LLM reaches out to the graph at query time (LangChain; LlamaIndex, formerly GPT Index, is another option).
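A little arithmetic shows why LoRA is cheap. LoRA freezes a pretrained weight matrix W0 (d x k) and trains a low-rank update, W = W0 + BA (scaled by a constant factor), where B is d x r and A is r x k, so only r(d + k) parameters are trained instead of d * k. The 4096 x 4096 matrix size and the ranks below are illustrative assumptions, not values from our experiments. QLoRA applies the same idea on top of a base model whose frozen weights are quantized to 4 bits.

```python
def full_params(d: int, k: int) -> int:
    """Trainable parameters when fine-tuning the full d x k weight matrix."""
    return d * k

def lora_params(d: int, k: int, r: int) -> int:
    """LoRA trains only B (d x r) and A (r x k); W0 stays frozen."""
    return r * (d + k)

d = k = 4096  # illustrative square projection matrix (assumption)
for r in (4, 8, 16):
    full, lora = full_params(d, k), lora_params(d, k, r)
    print(f"r={r}: full={full} lora={lora} ({full // lora}x fewer)")
# r=4: full=16777216 lora=32768 (512x fewer)
# r=8: full=16777216 lora=65536 (256x fewer)
# r=16: full=16777216 lora=131072 (128x fewer)
```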
The idea behind retrieval-augmented LLM applications like ChatGPT Plugins and LangChain is to avoid relying on internal LLM knowledge only to generate answers.
Generate SPARQL queries in real time.
Implementation of this approach:
https://github.com/tomasonjo/blogs/blob/master/llm/Neo4jOpenAIApoc.ipynb
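The retrieval-augmented idea can be sketched in a few lines: verbalize knowledge graph facts, embed them, retrieve the facts closest to the question, and put only those into the prompt. This is a self-contained sketch, not the LangChain flow or the linked notebook: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory `INDEX` stands in for a vector store, and the facts are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Knowledge graph facts verbalized as sentences (hypothetical data).
FACTS = [
    "Genesis Sword NFT is offered by Provider A for 25.00 USD.",
    "Genesis Sword NFT belongs to the weapons category.",
    "Provider A accepts payment in USD and ETH.",
    "Order 1 purchased the Genesis Sword NFT offer.",
]
INDEX: List[Tuple[str, Counter]] = [(fact, embed(fact)) for fact in FACTS]

def retrieve(question: str, k: int = 2) -> List[str]:
    """Return the k facts most similar to the question."""
    q = embed(question)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [fact for fact, _ in ranked[:k]]

def grounded_prompt(question: str, k: int = 2) -> str:
    """Build a prompt that asks the LLM to answer only from the retrieved facts."""
    context = "\n".join(f"- {fact}" for fact in retrieve(question, k))
    return (
        "Answer using only the facts below. If they are insufficient, say you don't know.\n"
        f"Facts:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

print(grounded_prompt("Which payment currencies does Provider A accept?", k=1))
```

With this data, `retrieve("Which payment currencies does Provider A accept?", k=1)` returns the Provider A payment fact. Swapping `embed` for a real embedding model and `INDEX` for a vector store gives the general shape of the embedding-based search flow described in these notes.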
Amazon Neptune
Neo4j integration into Google AI
Neo4j unveils generative AI features for Google Cloud Vertex
Example of using a graph with Google AI
https://neo4j.com/labs/apoc/5/ml/vertexai/
Step-by-step implementation
https://neo4j.com/blog/use-graphs-for-smarter-ai-with-neo4j-and-google-cloud-vertex-ai/
Data graph catalog
Knowledge Graph Conference
LinkedIn source of data
Code examples