How to use the 'wolkvox vision' component in wolkvox Studio
Introduction
The “wolkvox vision” component lets you apply Artificial Intelligence to analyze images within a conversational flow. It can detect objects and labels, read text in an image, detect faces, and perform advanced analysis guided by your instructions. You can use it to automate validations and data extraction (for example, from documents, screenshots, or receipts) and to enrich customer interactions without manual intervention.
This component is located in the Cognitive group and is available for Interaction, Chat, and CRM + Webhook routing points.
Configuration
- Double-click the component to open its configuration panel.
- Configure its fields:
- In the “Type of wolkvox Vision” field, choose what you need to obtain from the image. The available options are:
- DETECT_LABELS: detects labels/categories of what appears in the image (objects or elements).
- DETECT_TEXT: extracts visible text in the image (useful for signs, screens, printed text, etc.).
- DETECT_FACES: detects faces and provides associated information (for example, face location and estimates such as age/gender).
- OBJECT_LOCALIZATION: locates specific objects and indicates their position within the image.
- TEXT_DETECTION: similar to DETECT_TEXT. It focuses on detecting and extracting text and may provide more structural detail.
- WV_TOTAL_VISION: a more advanced, “guided” analysis, designed for scenarios where you need a more complete and reliable response (usually supported by clear instructions).
- Pro tip: if your goal is to extract specific data (for example, an ID number, an amount, a date, or an entity), WV_TOTAL_VISION combined with clear instructions is usually the best choice.
- In the “Image URL” field, enter the public URL where the image is hosted, or a variable that contains that URL (which one you use depends on your flow and the channel).
- Important: the component must be able to access the image. If the URL is not public or cannot be reached by the service, the analysis may fail or return empty results.
- In the “Instructions” field, write clearly and specifically what you want the AI to extract or report.
- Instruction examples (you can adapt them):
- “Extract: document type, number, full name, and expiration date.”
- “Read the text and provide a 3-line summary.”
- “Identify the main elements of the image and describe them in a list.”
- “Detect if there is a payment receipt and extract: value, reference, and date.”
- Recommendation: ask only for what is necessary and define the expected format (list, fields, short text, etc.) to obtain more consistent responses.
- The “Info Vars” block in the panel lists the variables you can use to consume the result:
- $count_vision: number of results obtained in the analysis.
- $txt_vision: complete analysis response in text.
- $json_vision: complete analysis response in JSON format (array).
- These are the variables you will use later in your flow.
- Click “Save” to apply your changes to the component.
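The following Python sketch is not part of wolkvox Studio. It illustrates three of the practices described in the configuration steps, in code you control, for example a script you use to test image URLs and instructions before placing them in a flow. The sketch covers a quick check that the image URL is not obviously private or unreachable, an instruction that asks only for the needed fields and fixes the response format to one "field: value" line per field, and parsing of the text and JSON results. All function names are hypothetical. The code assumes that the model actually follows the requested format in $txt_vision and that $json_vision is a JSON array whose item structure it does not inspect. The component does not guarantee either assumption. How you pass the flow variables to code like this depends on your setup and is not covered here.

```python
import ipaddress
import json
import urllib.request
from typing import Dict, List, Optional
from urllib.parse import urlparse


def looks_public(url: str) -> bool:
    """Offline sanity check: http(s) scheme, a host, and not a private/loopback IP.
    Passing it does not prove the image can be downloaded."""
    parsed = urlparse(url)
    host = parsed.hostname
    if parsed.scheme not in ("http", "https") or not host or host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a domain name: only a real request can confirm access
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Online check from *this* machine using an HTTP HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):  # URLError, HTTPError and timeouts subclass OSError
        return False


def build_instructions(fields: List[str]) -> str:
    """Hypothetical helper: ask only for the needed fields and fix the output format."""
    template = "\n".join(f"{name}: <value>" for name in fields)
    return (
        f"Extract: {', '.join(fields)}. Reply with exactly one line per field "
        f"in the format below, writing NOT_FOUND when a field is not visible:\n"
        f"{template}"
    )


def parse_txt_vision(txt: str, fields: List[str]) -> Dict[str, Optional[str]]:
    """Map 'field: value' lines from $txt_vision back to the requested fields.
    Keys match case-insensitively; leading bullets are ignored, unknown lines skipped."""
    lookup = {name.lower(): name for name in fields}
    result: Dict[str, Optional[str]] = {name: None for name in fields}
    for line in txt.splitlines():
        key, sep, value = line.partition(":")
        name = lookup.get(key.strip(" -*\t").lower())
        if sep and name is not None:
            value = value.strip()
            result[name] = None if value in ("", "NOT_FOUND") else value
    return result


def count_json_vision(raw: str) -> int:
    """Count the items in $json_vision, which the panel describes as a JSON array."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected $json_vision to be a JSON array")
    return len(data)
```

For example, build_instructions(["value", "reference", "date"]) produces text you could adapt for the “Instructions” field. Applying parse_txt_vision to $txt_vision then returns None for any field that is missing or reported as NOT_FOUND. A True result from is_reachable shows only that the URL works from your own network, not necessarily from the wolkvox service. Some servers also reject HEAD requests, so a False result is a reason to check the URL rather than proof that it is unreachable. If $count_vision refers to the same results as $json_vision, which the panel does not state explicitly, count_json_vision could also serve as a cross-check against it.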
