Data & AI Use Case

Posted on February 22, 2023, by Nađa Vukićević


Over the past few years, Data and Artificial Intelligence have become the most trending technology used across all industries. Yes, technology (singular) because Data and AI need to work in synergy to generate the best results. 

By the time "Big Data" became a term of its own, there was already so much data that IT professionals and computer scientists quickly realized that sorting, parsing, and analyzing it all would be impossible for humans. They knew that if anything could help, it would be AI algorithms capable of deriving valuable real-time insights from that much data.

The merge of these two disciplines created a technological boom changing businesses left and right. But how exactly?


AI for Business

Unlike a decade ago, the internet is now filled with data providing important information about consumer habits, likes, dislikes, personal preferences, and activities. This type of data, when paired with AI technology, can help businesses:

  • boost efficiency with various process automation solutions
  • improve the consistency and speed of service
  • use customer insights and reviews to inform decision-making
  • discover opportunities for new products and services
  • detect fraud threats
  • optimize supply chain operations


We could list the benefits of AI for business for days. However, many people would still want a concrete example of how to get a taste of some of these benefits and how AI can improve their business operations.

Here at Quantox Technology, we believe in a practical approach and the power of learning. With more than 550 IT experts, our team includes some of the best mentors in the industry, ready to explain the uses and possibilities of these technologies in the simplest way.


Quantox E-commerce AI Solution Use Case

The Quantox team was introduced to an e-commerce client that needed an AI solution to simplify one part of their workflow. The client ran an online store selling designer furniture and artwork. Naturally, they had set up a webshop with thousands of furniture listings where you can filter your search based on your preferences. For example, you could filter furniture by type (beds, couches, sofas, tables), color, manufacturer, material, and many other specifications.

That meant every product needed a string of descriptive words in its specifications for the search filter to work. For example, if you wanted a black table, you would get a listing of any product that mentions the words "black" and "table." But if you searched for a "Sanne Protin black dining table" and the products were labeled correctly, you would get exactly one result.


However, the client's problem was not labeling products correctly; rather, the labeling process itself was costly and time-consuming. At the time, their products were labeled manually, meaning they had to pay people to input labels one by one.



To fix this problem, the Quantox Data & AI team proposed an automatic labeling system that would save them valuable resources. We created an AI system that takes an ad as input and returns "labels" as output: product type, color, manufacturer, and material. In business terms, the result was a significant decrease in labeling cost and in the time needed to obtain the labels, which sped up the entire listing process and had a considerable, positive impact on their business efficiency.



Translating Human Problems to Data Problems 


So, we set out to build this system. Labeling a product is a simple task for a human but can be difficult for a computer. To successfully translate this business problem into a data problem, we had to ask the right questions.

 What would a human do if they had the task of labeling the products?

The trick is to answer these questions in as much detail as possible. For example, some would say, "well, a human would read the description," but the more precise answer is, "the human would look at the listing." That means our internal conversation went something like this:

Through questions, we slowly realize that humans would look at the image and type in what they see. For example, they see a black table, so they would type that it's a table and it's black. They could also read the description to see who the manufacturer is and what material was used. 



The answers to our questions told us that our system needed to emulate human behavior. Mainly, it needed to be able to "see" images and recognize the objects in them, but also to recognize text, read it, and understand its meaning in order to extract the vital information needed for labeling.

Well, what can "identify" objects in AI? Computer vision certainly can. 

And how can a computer read and understand text just like a human? With Natural Language Processing, of course.



As always, we needed data for our AI algorithms, so we took around 20 thousand pre-existing labels to train our AI models. The diagram below shows how the data trained our models and how they were deployed, monitored, and maintained.

On the left side, you'll notice a very important box saying, "hold out a fraction aside." This means we took a small fraction of data from the 20 thousand label batch (about 10-20%), which is the data we did not show to our model. The reason is simple - we wanted to test our model on that data after it's trained. This way, we can see how our model performs on data it hasn't seen before. 
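This holdout step can be sketched in a few lines of Python. This is a generic illustration with made-up data, not the client's actual pipeline code; the 25% fraction matches the 15,000/5,000 split used later in the article.

```python
import random

def train_test_split(examples, test_fraction=0.25, seed=42):
    """Shuffle labeled examples and hold a fraction aside as test data."""
    rng = random.Random(seed)
    shuffled = examples[:]  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

# With ~20,000 labeled listings and a 25% holdout, this yields
# 15,000 training and 5,000 held-out test examples.
listings = list(range(20_000))
train, test = train_test_split(listings)
print(len(train), len(test))  # 15000 5000
```

The seed makes the split reproducible, so the same examples stay hidden from the model across runs.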

In the middle, you can see that we have two separate models. One that works with images and one that works with text. Each model is tasked to learn different things, and finally, that information is combined to output the desired information. 

The task is simple: "Build a system that says: This listing is selling a lamp, and the color of the lamp is beige-brown-yellow-ish."

We'll learn how this simple task can quickly become a bit more complicated for a computer.

The listings are given as "image + text" pairs, and we're free to choose from a wide arsenal of AI techniques as long as the solution delivers the desired information correctly the majority of the time. In the following sections, we describe some of the methods we opted for.

Our dataset contains around 20 thousand examples similar to this one, containing images of various furniture types: couches, sofas, tables, wardrobes, etc. In addition, the data includes information about where the object is on the image and the product being sold. We used 15,000 examples to train our AI algorithms (training data), while the other 5,000 we used as test data.

Here's an example of what a typical listing looked like.

Disclaimer: The product shown in the photo is for illustration purposes only and does not represent the actual product of our clients. It is used solely to explain and demonstrate the capabilities of our AI models.


Our computer vision model was supposed to extract information from the image. We expected it to localize a lamp and tell us that it is, in fact, a lamp. To do so, our algorithm needed to perform two tasks: object localization and image classification. 

A bounding box consists of 4 numbers representing the coordinates of the box itself:

X_upper_left, Y_upper_left, X_lower_right, Y_lower_right.

Our model, in essence, is a model that tries to predict these four coordinates in an image and assign the label "lamp" to this listing. It does it in such a way that the predicted coordinates should mostly overlap with coordinates provided by humans. At the same time, the product label ("lamp" in this case) should match the human annotations in most cases. 


GOAL: Maximize the overlap and assign the correct label.
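Bounding-box overlap is commonly measured as Intersection over Union (IoU). Whether or not this exact metric was used in the project, it captures the "maximize the overlap" goal:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as
    (x_upper_left, y_upper_left, x_lower_right, y_lower_right)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width/height of the overlapping rectangle (zero if boxes don't touch)
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # 1.0  (identical boxes)
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # 0.0  (no overlap)
```

A score of 1 means the predicted box matches the human-drawn one exactly; training pushes the predictions toward that end of the scale.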
Now that we know what the model should do, let's see how it does it. 



Each image was run through a convolutional neural network (CNN) tasked with learning the features of the image. Neural networks are made of neurons. Neurons group into layers. Layers hold numbers used to process our data (by multiplication and addition). Those numbers represent the knowledge of our network. Intelligence even, if you will.

The CNN model we created contains 53 convolutional layers. Each layer transforms the image by applying a convolution operation to it. Convolution is actually a whole lot of additions and multiplications following the pixel grid of the image, plus some additional rules.

What that means is, at a high level, we plug an original image into the CNN, and it goes through each of these layers sequentially. The output of each layer is a slightly modified version of the original image. These modifications can highlight contours and shapes of interest in the original image or blur and make insignificant contours and shapes disappear.
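As a toy illustration of what one such layer computes (the production CNN is far larger, and deep-learning libraries technically apply cross-correlation, i.e., convolution without flipping the kernel), a single pass over the pixel grid looks like this:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: slide the kernel over the pixel grid,
    multiplying element-wise and summing at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# An edge-detecting kernel responds strongly where pixel values change,
# highlighting the contour between the dark and bright halves.
image = np.array([[0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9]], dtype=float)
vertical_edge = np.array([[-1, 1]], dtype=float)
print(conv2d(image, vertical_edge))  # each row becomes [0, 9, 0]
```

During training, the network adjusts the kernel values itself, which is exactly the "tweaking the parameters of the convolutions" described below.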

We program the network to tweak the convolution operation and figure out what these shapes are on its own. Then, we let the network try multiple times until the results are satisfactory according to a predetermined criterion. This process of tweaking the layers (or, more technically, tweaking the parameters of the convolutions) is known as training the neural network. 

When the image passes all 53 convolutional layers, we add another two layers, which we construct in such a way that we can read what the network thinks the bounding box is and what is the object residing in that bounding box. Finally, we read predictions from these final layers. They're tweaked similarly to convolutional layers during training. When this lamp was successfully detected, we cropped out the predicted bounding box, extracted the dominant colors, and gave it its first label - a "lamp."
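One simple way to extract dominant colors from the cropped bounding box is to coarsely quantize the pixels and count the most common buckets. This is a sketch of the general idea with made-up pixel values; the actual method used in the project isn't specified in the article.

```python
from collections import Counter

def dominant_colors(pixels, n=3, bucket=64):
    """Coarsely quantize RGB pixels and return the n most common
    bucket centers as representative (R, G, B) colors."""
    def quantize(p):
        # Snap each channel to the center of its 64-wide bucket
        return tuple((c // bucket) * bucket + bucket // 2 for c in p)
    counts = Counter(quantize(p) for p in pixels)
    return [color for color, _ in counts.most_common(n)]

# A toy crop: mostly beige-ish pixels plus some dark-shadow outliers.
crop = [(230, 220, 180)] * 80 + [(20, 20, 20)] * 20
print(dominant_colors(crop, n=1))  # [(224, 224, 160)]
```

Quantizing first keeps near-identical shades from splitting the vote across thousands of slightly different RGB triples.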




Now that we know how the computer vision part of the solution works, the question arises: "Why do we need text at all? Isn't computer vision all we need because we got the colors and the product?"

Well, not exactly. What if the lamp was labeled differently, for example, as a "torch"? In this example it isn't, but there could be many other instances where labels are nothing more than misleading synonyms. We can't simply rely on one-word product descriptions coming from unknown annotators who may have labeled these listings for an entirely different purpose. That's why we complement the information extracted from images with the data from text. We shrink the margin of error and try to output results that align with the industry's terminology as much as we can.

Plus, the computer vision model can detect a lamp and might even detect some sort of cat on the bottom of the lamp. Of course, it's essentially wrong detection (because it's not a cat, but a llama), but the text helps us deal with these cases as well, and now we'll see how.

So, let's get technical. Now suppose that our computer vision model detected both a lamp and a cat.

Now we have two detections: a lamp and a cat. Certainly, a cat is not for sale, nor is it a real cat. It's a cat-shaped pedestal for the lamp, which is probably not even a cat but a llama. This is just to give you a sense of ways these models can be wrong. We need to filter the object detection algorithms' outputs by analyzing text. Remember that the listing description states:

"Created and designed by the famous 20th-century art deco designer Jean-Michelle Jarre, this nightlight can be a great addition to your living room or bedroom for only $99.99". 

Our goal is to search for the words "lamp" and "cat" in this text and find a match. But we would fail if we tried to match the exact strings: there are no words "lamp" or "cat" in the description. There is, however, the word "nightlight." You'd agree that refers to the lamp. Of course you would, but how do we build an AI system that figures that out?

That's the goal here. And the tool is, again, a neural network. Specifically, the neural network performing the task of "word embedding," unlike the former network performing the task of "object detection." 

The neural network's inputs are text sequences, and the final output is a numerical representation of the input word. Literally, a sequence of numbers. How many numbers? We can decide that, but the common practice is powers of 2, such as 256 or 512.

So yes, sequences of numbers, like the lottery, but with one crucial property. When two words of similar contexts, such as "nightlight" and "lamp," are represented numerically, they have similar representation, again by some predefined criteria. However, when words of vastly different contexts are represented numerically, these representations are different, such as "lamp" and "truck," for example.  

So we create a neural network consisting of layers, except they're not convolutional. They're different types of layers, each connected to the next, with a final layer from which we read predictions. Like most neural networks, this one also has to predict something: by tweaking the network, it learns which words frequently co-occur, effectively meaning that it learns which words are contextually similar. Its knowledge is represented in its layers, which we then extract from the network and use as our numerical representations of the words themselves.

Once this is trained, we get a numerical representation of all the words from the listing description and all the words from the object detection model, namely "lamp" and "cat." Now we measure the similarity between the word "lamp" and each word from the description individually, and then repeat the process for the word "cat." The similarity score is the cosine (remember trigonometry?) of the angle between the numerical representations of the two words. Cosine similarity ranges from -1 to 1, where values close to 1 mean the words are very similar and values near 0 or below mean they are not.

Now relying on the fact that the listing descriptions are highly likely to mention the product itself or at least some products' synonyms, we can expect that whatever the product is ("lamp" or "cat"), it will have a high similarity score with whatever the product is in the description. 

  • In this case, the words "lamp" and "nightlight" have, say, the highest similarity between all the combinations, so we say - "THE PRODUCT IS THE LAMP."  
  • The word "cat" will have minimal similarity with the words in the description because the description never mentions animals, cats, dogs, pets, or anything within that context, and context is learned in the neural network, so "cat" will end up omitted. That's how we know - "THE PRODUCT IS NOT THE CAT."
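The whole matching step can be sketched with toy vectors. The embedding values below are made up for illustration; real embeddings would have 256+ dimensions and be learned by the word-embedding network.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical 3-D "embeddings": similar contexts get similar vectors.
embeddings = {
    "lamp":       [0.9, 0.1, 0.0],
    "nightlight": [0.8, 0.2, 0.1],
    "cat":        [0.0, 0.1, 0.9],
    "designer":   [0.1, 0.9, 0.2],
}
description_words = ["designer", "nightlight"]
detections = ["lamp", "cat"]

# Keep the detection whose best match in the description is strongest.
best = max(
    detections,
    key=lambda d: max(cosine(embeddings[d], embeddings[w])
                      for w in description_words),
)
print(best)  # lamp
```

Here "lamp" scores high against "nightlight" while "cat" matches nothing in the description, so the cat detection is discarded, just as described above.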

Now let's see the results. 



When all this is said and done, and the models are trained and validated, it's time to put them to the test and see how they behave on data they weren't trained on. Remember that the test data contains the "text + image" and the "location + object" parts. We hide the "location + object" part, obtain predictions from the models, and then compare them with the actual location and object.

When predicting the product, we achieved 91% accuracy on the test set of around 5,000 examples. As for colors, the number is at about 70% on the same dataset, but that is a pessimistic estimate based on sampling and judging by eye whether a prediction is "right" or "wrong." The reason is that people see colors differently. In the case of our lamp, one person might say "beige," while another says "light yellow." If you convert the color names into the RGB system, they can differ vastly, so calculating the Euclidean distance between "beige" and "light yellow" in RGB coordinates can wrongly suggest that the model made a mistake when, in fact, both the model and the labeler are right. We believe the actual number is much higher than 70%, since the errors mostly come from edge cases where you can argue it's the same color.
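The Euclidean distance mentioned above is easy to compute, and it shows how two plausible names for the same shade can sit noticeably far apart in RGB space. The RGB values assigned to the color names below are approximations for illustration.

```python
import math

def rgb_distance(c1, c2):
    """Euclidean distance between two colors in RGB space (0 = identical)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

# Two humans describing the same lamp shade might pick either name.
beige        = (245, 245, 220)
light_yellow = (255, 255, 153)
black        = (0, 0, 0)
white        = (255, 255, 255)

print(rgb_distance(beige, light_yellow))  # ~68: "different" names, same lamp
print(rgb_distance(black, white))         # ~442: the maximum possible distance
```

A naive threshold on this distance would count "beige" vs. "light yellow" as an error even though both descriptions are reasonable, which is why the 70% figure is pessimistic.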



Once all the data had been run through our models and the models were trained and tested, we deployed them to production. This means saving the models in some sort of model storage, which could be a location on our server, on our client's server, or on a cloud storage service of choice. In this case, it was the client's server.

Specifically, we used Flask to create a Python API and Docker for deployment. When the API is called, it triggers a forward pass through the neural networks, which returns some predictions. The application then processes those predictions as described above and returns a response. Finally, the client integrates with our model by calling our API from their application, which gets them the desired information. 
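A minimal sketch of such a Flask API follows. The endpoint name, request fields, and stubbed predictions are invented for illustration; in the real service, the handler would run a forward pass through the trained networks instead of returning fixed values.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/label", methods=["POST"])
def label_listing():
    """Accept a listing as JSON and return predicted labels."""
    listing = request.get_json()
    # In production this would call the deployed models, e.g.:
    # predictions = model.predict(listing["image"], listing["text"])
    predictions = {"product": "lamp", "color": "beige"}  # stubbed output
    return jsonify(predictions)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

The client's application would POST a listing to this endpoint and receive the labels back as JSON, exactly the integration pattern described above.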

They can then process the information however they see fit, e.g., display them in some sort of drop-down menu on their front end or have radio buttons to help their users filter their search when buying furniture. Here's a diagram describing the solution's architecture at the highest level.

And that's it. We described the process of training a computer vision model and a natural language processing model, touched on neural networks a bit, and shed some light on what it means to train one, what the "A" and "I" in "AI" stand for, and how the information obtained from these networks is used in practice to improve the user experience when shopping for furniture online. A lot of math and engineering for such a seemingly simple task, right? However, the work pays off when implemented correctly, especially in the eCommerce industry, where there's a great deal of data and many different types of applications for AI solutions.


Data, AI, and Quantox - a Perfect Recipe for eCommerce Growth!

Transforming your eCommerce business paradigm in 2023 can make your business stand out from the competition by increasing revenue, conversion rates, and the overall popularity of your brand. Integrating AI technology into your eCommerce business gets that much easier with the Quantox Technology team of experts, who have been providing Data and AI services and eCommerce solutions for years and have a proven track record of success. Our team will take the time to understand your unique needs and develop a customized solution that fits them. So, are you ready to unleash the potential of AI for your eCommerce? Let's Talk!