Gemini Explore (Beta) - User Guide
Version: Beta 1.1
Introduction
Gemini Explore Demo
Network Administrator Installation Guide
Supported Versions
System Requirement
Network Access
Deploy the OVA template
User Information Required
User Instructions - Gemini Explore
Prerequisites before you begin:
Components that comprise Gemini Explore
Gemini Explore - Exploration Dashboard
Gemini Explore - Data Onboarding & Modeling Engine (DOMe)
Explore Cockpit interface
Preparation of your Data Source
CSV Data Sources
Splunk Data Sources
Data Modeling - Basic Rules of the Road
Stage 1: Connecting to Gemini Explore
Stage 2: Selecting the Data Source
Example of a CSV Data Source
Example of a Splunk Data Source
Stage 3: Creating the Model
Creating Nodes
Creating Edges
Adding different Icons to represent Elements on Canvas
Stage 4: Creating the Graph
Stage 5: Creating the Flow
Stage 6: Viewing the Model using Gemini Explore
Editing Sources, Models and Graphs
Clean-up and Removal of Model Data and Components
I have edited my Model or Splunk query and I want to re-ingest for use in Explore
I have renamed my Model and I want to re-ingest data for use in Explore
I want to delete everything and start from scratch
Tutorial 1: Using Gemini Explore with a CSV data source
Step 1: Login and add the data sources
Step 2: Create the Models
Step 3: Create the Graphs
Step 4: Create the Flows
Step 5: Viewing the Models using Gemini Explore
Step 5.1: Working with the Graph Canvas
Step 5.2: Exploring data using the Canvas
Tutorial 2: Using Gemini Explore with a Splunk data source
Step 1: Preparing your Splunk environment
Step 2: Login to Gemini Explore and Perform a Clean-up Operation
Step 3: Add a new Splunk Source
Step 4: Edit the Model
Step 5: Edit the Graph
Step 6: Create the Flows
Step 7: Viewing the Models using Gemini Explore
Tutorial 3: Machine Learning with Gemini Explore
Step 1: Run the following at your Splunk interface
Step 2: Login to Gemini Explore and Modify the Splunk Model
Step 3: Clean-up and Run the Graph Flow based on the new Splunk search
Step 4: Visualise the result in Gemini Explore
Step 5 (optional): Add the Symptoms data into the mix
Troubleshooting Guide
Accessing the Parser Logs
Restarting Gemini Explore
Known Problems/Solutions
Introduction
Gemini Explore is an intuitive visual graph-based data exploration tool that works directly on Splunk or CSV data sources.
Using this dynamic multi-layer visualization tool, the user is able to drill down and interact with their data. This process is intuitive to use, as it mimics the way our brains ‘think’. When we discover something interesting, we instinctively want to know more detail and how it may relate to other datasets. With Explore, users can simply select or double-click an element on the canvas to discover its context and reveal more information.
Gemini Explore Demo
We have made available a Gemini Explore Demo that you are free to use in order to familiarise yourself with this new interactive visual technology. This demo is currently available as a VMware OVA template that you can incorporate into your network.
Please contact any member of the Gemini team to request a Gemini Explore Demo. This will be provided together with sample data - an interesting Covid-19 use case - in order to help you familiarise yourself with the technology.
Network Administrator Installation Guide
The Gemini Explore Demo environment should be created on VMware using the Explore OVA template provided by Gemini Data.
Supported Versions
VMware has various virtualization product lines, but only VMware vSphere has been officially tested and is therefore supported by the Gemini Explore Demo. Specifically, vSphere/ESXi version 6.0 and above are supported.
We will deliver an OVA template built with VMware hardware Version 8. Inquiries regarding compatibility with other VMware product lines, or details regarding login credentials for the Explore Demo, should be directed at [email protected]
System Requirement
The following specifications dictate the minimum recommended system requirements:
CPU: 4 cores
RAM: 8 GB
Disk: 200 GB
NIC: 1 x 1GB Ethernet
Network Access
Ensure the following ports are accessible from the local network.
Gemini Explore: 80:tcp, 8012:tcp, 8015:tcp
Explore Cockpit: 9090:tcp
SSH/SCP: 22:tcp
In environments without DHCP or VM Customization Specifications, initial network configuration can be done through the VM console. The nmtui tool provides a text user interface for configuring and activating your network connections if required. See the following document for help and guidance on using this tool.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_networking/configuring-ip-networking-with-nmtui_configuring-and-managing-networking
Deploy the OVA template
To create a new Virtual Machine, import using the OVA Template supplied. The following guide may be of assistance:
https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.vm_admin.doc/GUID-AFEDC48B-C96F-4088-9C1F-4F0A30E965DE.html
User Information Required
Ensure that you advise your Users of the browser URI values required for access to;
The Gemini Explore interface (std port 80)
The Explore Cockpit interface (port 9090)
User Instructions - Gemini Explore
mailto:[email protected]://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_networking/configuring-ip-networking-with-nmtui_configuring-and-managing-networkinghttps://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_networking/configuring-ip-networking-with-nmtui_configuring-and-managing-networkinghttps://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.vm_admin.doc/GUID-AFEDC48B-C96F-4088-9C1F-4F0A30E965DE.html
-
Prerequisites before you begin:
In order to start exploring data using Gemini Explore, ensure that you have the following from your Network Administrator;
A URI that points to the Gemini Explore interface
A URI that points to the Explore Cockpit interface
Components that comprise Gemini Explore
The Exploration dashboard of Gemini Explore is used to view and analyze your Models, but from here we can also access the Data Onboarding & Modeling Engine (DOMe) used to add and shape your chosen data sources into suitable Models. Another interface, the Explore Cockpit, can be used for network activities and log file access.
Gemini Explore - Exploration Dashboard
This is where you can observe the results of your Gemini DOMe model. Here you can intuitively explore your data source using the visual graph technology built-in to Gemini Explore.
Access to Gemini Explore is granted over a standard browser connection using its IP or DNS name (ie. http://10.10.10.10).
Gemini Explore - Data Onboarding & Modeling Engine (DOMe)
The Explore DOMe interface is available from the Exploration drop-down menu of Gemini Explore. It facilitates the creation of models to use as the basis of your visualization, which in itself involves the creation of a Model, a Graph, and a Flow for each data source.
Explore Cockpit interface
The Explore Cockpit is available over port 9090 (ie. http://10.10.10.10:9090).
It allows you to perform background network tasks, such as joining a local Domain or changing the IP address/hostname. It is also useful for viewing log files, especially to monitor or troubleshoot the modeling process.
Login credentials can be obtained on request from [email protected]
Preparation of your Data Source
Anyone with experience in data modeling knows that the quality of your data source is ultimately responsible for the effectiveness of the outcome. This is also true for Gemini Explore.
CSV Data Sources
Naturally, you will want to review your CSV data source, but we strongly recommend the use of basic file editors such as TextMate, VIM, notepad++, etc., when working with CSV data. Avoid the use of Microsoft Excel at all costs, as this will adversely affect the structure of CSV files, resulting in errors. If you are in any doubt about the validity of your source data, download a copy of the dos2unix utility and run this against all your CSV data sources prior to bringing them into Gemini Explore.
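As a minimal sketch of what this clean-up does (file names here are illustrative), the essential effect of dos2unix is to strip Windows carriage returns, which can also be reproduced with tr:

```shell
# Simulate a CSV saved with Windows (CRLF) line endings
printf 'case,gender\r\n19,male\r\n' > cases_raw.csv
# Strip the carriage returns - in essence what dos2unix does
tr -d '\r' < cases_raw.csv > cases_unix.csv
# With dos2unix installed, "dos2unix cases_raw.csv" converts the file in place
```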
Splunk Data Sources
The use of Splunk as a source for your data offers you the ability to transform the data prior to its ingestion in Gemini Explore, which simply ingests the output of a Splunk search.
Matching field names to other data sources and creating new fields on-the-fly can be achieved using the Splunk SPL search language. Data can be retrieved from a Splunk Index, or you may wish to create a Data Model specifically for your chosen data source. Either way, Splunk offers an easy-to-use interface with many advantages for formatting and transforming your data source correctly for its use in Gemini Explore.
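For example, a search of the following shape (the index and field names are purely illustrative, not from the sample data) renames a field to match a header in another dataset and derives a new field on-the-fly before the results are handed to Gemini Explore:

```
index=covid_tmp sourcetype=csv
| rename patient_id AS case
| eval infection_area=region." (".country.")"
| table case gender citizenship infection_area
```

Only the fields left in the final table/fields clause will be presented to the model.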
Data Modeling - Basic Rules of the Road
You will probably have multiple data sources that you wish to explore interactively using Gemini Explore. During your review of each data source, familiarise yourself with the CSV column headers or Splunk field names, and the data contained within. Knowing your data is fundamental to a successful outcome whatever you do with your data, so ensure that you understand what actually makes up the columns and fields from your datasets.
Consider the following definitions of Modeling when reviewing your data;
Modeling is the process of defining nodes, relationships, and properties.
The end result (a model) is used by Gemini Explore against one or more data sources to create ‘elements’ of the graph.
A model will be applied to every line, row, or event within your data source.
A model should have at least one node, but it needs at least two nodes (source and target) to include a relationship.
A ‘node’ will be created if at least one of its properties has a value for a given row.
A ‘relationship’ will be created if both of its source and target nodes are present, regardless of its properties.
Consider the following in order to complete an effective Model using Gemini Explore;
The headers/fields in your dataset that will become ‘nodes’ on your graph
The headers/fields in your dataset that will be more useful as ‘properties’ of a node.
Tags perform multiple tasks; they define how you visually class and categorize the dataset, and how to merge it with other datasets
The caption will assign the ‘key field’ for your datasets, and works in conjunction with the tags to merge with other datasets.
Consider carefully the relationships that occur between nodes. These will be mapped as ‘edges’ and can transform how your data is represented on the canvas.
This is not an exact science, and it may take a couple of attempts for you to establish useful nodes, properties, and edges. However, this is not generally an issue, as each model can be built and destroyed many times before a final outcome is established. Here are some more general guidance rules to help create the model;
Do not be tempted to create too many Nodes. Each additional node increases the complexity and can obscure the overall effectiveness of the model.
Consider adding a header/field as a property rather than as a node, if it will be more useful as an additional metric or detail that enhances the node itself.
Use tags to categorize data on your Graph. These will create the ‘class' of elements available.
Ensure that an edge (relationship) is described accurately and orientated correctly; convention dictates it is usually in uppercase, using underscores to delimit words (ie. BELONGS_TO)
Consider ingesting no more than a few hundred records from each dataset whilst creating Models to keep the process efficient. Once the models are complete, all the data can then be ingested.
For more insights into constructing models, please refer to the addendum titled ‘Gemini Explore - Modeling FAQ’.
Stage 1: Connecting to Gemini Explore
Access Gemini Explore using the URI from your Network Administrator. (ie. http://)
Default login credentials are as follows:
Username: [email protected]
Password: changeme
Login using the credentials you have been assigned to reveal the following Gemini Explore interface.
From the Exploration menu at the top of the page, select the ‘Data Modeling’ option to add a new data source and create a Model.
Stage 2: Selecting the Data Source
Select the ‘Sources’ menu from the top of the screen to reveal existing sources available for your modeling. To add a new data source, select the 'Add New' button to reveal a choice between a CSV upload or a Splunk source.
Example of a CSV Data Source
From the ‘Type' selector, choose the CSV option to reveal the following;
Use a logical ‘Name’ for your new Data Source.
We provide a CSV upload facility in the form of the ‘Choose file’ button.
The Columns entry is optional, but can be useful for CSVs that have non-human-generated or foreign-language Headers that you wish to modify at source. Note that you will either need to modify every column Header in the data source, or none at all.
Select the delimiters used within the CSV source, to delimit both field boundaries and string selections.
Select the ‘Submit’ button when complete.
Example of a Splunk Data Source
From the ‘Type' selector, choose the Splunk option to reveal the following;
Use a logical name for your Splunk data source and change the ‘Type’ to ‘Splunk’.
For ‘Scheme’, ensure that the https (default) option is selected.
The ‘Host’ and ‘Port’ reflect the IP address or DNS name of your local Splunk instance. Communication is completed using the management port, which defaults to 8089.
The Username and Password credentials are required for access to Splunk. These will need to have admin rights.
Select the Submit button to save.
Note that these Splunk settings - including the login credentials - are not verified at this point. Therefore, do not assume that a successful ‘Save’ is confirming these parameters.
Stage 3: Creating the Model
Select the ‘Models’ menu from the top of the screen to reveal a list of current models. To create a new Model, select the ‘Add New' button.
The example below has been given a ‘Name’ and 'Type’ value consistent with the Data Source, in this case, a CSV file.
Remember that Modeling is the process of defining nodes, relationships, and properties. We have created an enhanced version of the GML language called GML+ in order to produce the Model Mapping. This will become the heart of the Model and should be created in the following format, with help from the guide notes;
Creating Nodes
Add your name or company as the creator, and a relevant version number. These are mandatory values - do not omit.
Within the graph definition, define each required ‘node’, ensuring it has a unique 'id' value.
Tags perform several important tasks within the Model; add a comma-delimited list of required ‘tags’ to create different classes or categories to graphically represent your data, and to merge it with other datasets.
creator ""
version "1.0"
graph [
  node [
    id 1
    tags "*class1, class2, icon"
    *caption "node_header_value"
    *keyfield_name "header_value"
    property1 "header_value"
    property2 "header_value"
    property3 "header_value"
  ]
  node [
    id 2
    tags "*class3, icon"
    *caption "node_header_value"
    property1 "header_value"
  ]
...
The caption refers to the actual column header used to represent the node on the canvas. A tag can be used to rename this on the canvas if required.
Use any random keyfield_name to identify a header value that needs to link with another dataset, but does not necessarily need to be seen on the canvas.
Use of the asterisk (*) here is optional, and has a different effect than when used with a tag. Here it will identify it as a ‘key’ (unique) field. This will reduce complexity at the Graph, deduplicating a many-to-many result where a simpler one-to-many view from this ‘key’ field is more appropriate.
So in order to link with another dataset, use a similarly named *caption (with an * tag), or *keyfield_name (with an * tag), to merge datasets with each other.
Further entries defining column headers will be used to create 'properties'. When a node is selected on the canvas, these can add useful metrics and information visible in the Inspector Panel.
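For instance, to merge two data sources that share a 'case' column, each Model could describe that column with the same starred tag and starred caption, so Explore merges the two datasets on the shared 'case' values (a sketch; the header name is illustrative):

```
node [
  id 1
  tags "*case,person"
  *caption "case"
]
```

This is exactly the pattern used by the two sample mappings in Tutorial 1.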
Creating Edges
...
  edge [
    source 1
    target 2
    label "IS_WITHIN"
    *as description "description"
    property1 "header_value"
  ]
]

...
  edge [
    source 1
    target 2
    label "USEFUL_RELATIONSHIP"
    $single 1
  ]
]
Add ‘edges’ that will form visual ‘relationships’ on the Graph canvas.
Each ‘edge’ defines the source (node id) and the target (node id), and states how the source relates to the target using a ‘label’. Use care to ensure the direction of each relationship is correct. An uppercase format is usually adopted for this.
The asterisk (*) or ‘key’ (unique) attribute used above for nodes can also be used for edges, again to simplify the output.
Use ‘as’ to define a property of the relationship which will be visible when the edge is selected on the Graph canvas. Or simply define the properties as with the nodes, by listing a property label and defining its actual header_value.
And finally, to enforce just a single relationship from one node to another, use the ‘$single 1' param shown in the example (where '1’ = true).
Adding different Icons to represent Elements on Canvas
We have a limited group of icons available for use that help greatly to enrich your Explore canvas. Simply choose an appropriate Icon name from those listed below and add the icon name as an additional tag.
For instance, if you would like to use the ‘Globe icon’ to represent a field called 'country_name', then simply add this as an extra tag;
Important rules for tags:
1. There should be at least one tag defined
2. Tags should not contain a 'space'
3. Each tag will become a Class of data available on the canvas
4. If you want one of the tags to be used to merge this dataset with others, precede it with an 'asterisk'
5. Use an extra tag to define an icon to represent the class.
node [
  id 1
  tags "Country, Globe"
  caption "country_name"
  ...
]
This is the currently available Icon set:
Stage 4: Creating the Graph
Select the ‘Graphs’ menu from the top of the screen to reveal a list of current graphs. To create a new Graph, select the ‘Add New' button.
Use a Graph ‘Name’ that reflects the Data Source and Model created.
Ensure that the correct ‘Source’ is present.
Select the ‘Type' of Graph visualization required. We have selected ‘neo4j’ here.
The ‘Configuration' credentials refer to the graphical interface chosen for Explore. In our example case opposite, we have chosen neo4j, so the URI will refer to Port 7687 of the local Gemini DOMe. Use the same neo4j login credentials here as used to enter the Gemini DOMe interface.
Select the ‘Submit’ button to save the Graph.
Stage 5: Creating the Flow
The final step prior to viewing our data on the Gemini Explore canvas is to create a ‘Flow’.
Select the appropriate Graph from the list in our Graphs dashboard.
This will reveal a Flows panel, from which the ‘Add New’ button can be selected to create a new 'Flow'. Ensure that the appropriate Model has been selected.
If desired, a CRON schedule can be added here to renew the model on a regular basis. This feature is not relevant to a CSV input, but could be useful for a Splunk data source.
If no schedule is required, simply accept the default, '* * * * *', and select the ‘Submit’ button to complete the process.
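The schedule uses the standard five-field CRON format (minute, hour, day of month, month, day of week). For example, to re-run a Splunk-backed Flow every day at 02:30, you might enter (an illustrative value, not from the tutorials):

```
30 2 * * *
```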
A confirmation screen (see below) will enable editing of the Graph or Flow, or a clean-up of the model, should you wish to make further changes or tweaks to the model and overwrite the current environment.
This screen also enables the model to be run against your chosen data source. This is achieved by selecting the ‘Run’ link within the Flows panel. When activated, this will begin the creation process, which can be tracked by both the ‘Model Status’ value and the 'Last Ingested:' count or timestamp.
A final summary confirmation screen will need to be activated before the Model runs. Select the ‘Run’ button here.
The selected ‘Flow’ will change its status to ‘Active' during the build of the model. Use the 'Stop’ button to abandon the build if required.
Stage 6: Viewing the Model using Gemini Explore
To visualize the Model created using Gemini Explore, return to the Exploration Dashboard (below). Login if required. Note that it can take several minutes for a model to build. This can be confirmed when the status of the Flow returns to ‘deactivated’.
Editing Sources, Models and Graphs
Each of the Model components has an ‘Edit’ button. It is very common, for instance, for a Model Mapping to require a ‘tweak’ in order to modify the Graph output or to create an additional Property.
When Models or Sources are edited, a Clean-up operation (see the section below) will need to be completed before data can be re-ingested against the new Model.
If a new Source or Model name is used, then a new Graph and Flow will be required.
If the Component naming is kept consistent, simply re-run all the Graph Flows following a Clean-up and Edit operation.
Clean-up and Removal of Model Data and Components
Because of the inter-related functionality between datasets ingested into Gemini Explore, it is important to remove data in a specific way. Components you need to edit or remove are dependent on what it is you need to achieve.
Choose from the following procedures below, relevant to your requirements.
I have edited my Model or Splunk query and I want to re-ingest for use in Explore
This implies that the data source is still valid, but that the model or Splunk query has been edited since the last time the Graph Flow was run.
Step 1: Perform a Clean-up operation
From the Data Modeling interface, select any Graph from the Graphs dashboard.
Select the Clean-up button, and verify at the warning message with the 'OK' button, to confirm the removal of all Graph data from Explore.
It is important to understand that a clean-up operation will remove ‘ALL’ of the ingested data from Explore, not just the data for the Graph that has been selected. This is down to the complex merging of datasets behind the scenes, which are impossible to separate.
Step 2: Confirm at the Exploration dashboard that all classes of data have been removed.
Step 3: Run all Flows to re-ingest Data. Verify that all classes are now available at the Exploration dashboard.
I have renamed my Model and I want to re-ingest data for use in Explore
If the Model name has changed, this will affect its corresponding Flow. In this case, you will need to ‘Delete’ and recreate the Flow also.
Step 1: Perform a Clean-up operation
From the Data Modeling interface, select any Graph from the Graphs dashboard.
Select the Clean-up button, and verify at the warning message with the 'OK' button, to confirm the removal of all Graph data from Explore.
Step 2: Delete the Flow from the dataset that has a new Model name
Select the relevant Graph and delete its Flow using the ‘Delete’ button in the Flows panel. Confirm the removal of the Flow using the 'OK' button at the following prompt.
Step 3: Confirm at the Exploration dashboard that all classes of data have been removed.
Step 4: Add a new Flow and 'Run' it, to ingest the data. Verify that the classes are now available at the Exploration dashboard.
I want to delete everything and start from scratch
If you want to remove all component traces, this must be done in reverse order to how they were created. In other words, begin deleting the Flow for each Model and work back towards the Source.
Complete a Clean-up operation
Delete the Flow(s) from their Graphs
Delete the Graph(s)
Delete the Model(s)
Delete the Source(s)
Tutorial 1: Using Gemini Explore with a CSV data source
In order to ensure you are familiar with the intuitive and interactive nature of Gemini Explore, we have prepared sample data - two CSV data sources containing Covid-19 data from a single Country, linked by the ‘case’ header. This exercise will show how separate data sources can be linked by a similarly named header to aid visual analysis at the graph canvas.
The process of creating a working graph model is broken down into several stages highlighted in the graphic below. The Source, Model, Graph, and Flow stages will be created in the following steps.
Download the two sample CSV files below; CSV1 and CSV2.
If you want to view these files with Excel or Numbers, please make a copy to protect the original form, and view the file copy.
Step 1: Login and add the data sources
Login to Gemini Explore using the URI given to you by your Network Administrator (ie. http://), with the following credentials:
Username: [email protected]
Password: changeme
Select the Data Modeling option from the Exploration menu.
From the Sources dashboard, select the ‘Add New’ button.
Add a Name of your choice, for example ‘covid-country67_cases’, and change the ‘Type’ to a ‘CSV’ source.
Use the ‘Choose file’ button to locate the CSV1 (covid_country67_cases.csv) file, and select the Submit button to save.
Note that if you open these data sources with a graphical spreadsheet tool such as Microsoft Excel, this can change the underlying CSV structure. It is highly recommended that you download and use the dos2unix facility on each CSV source before using it in Gemini Explore.
Repeat this exercise with CSV2 (remarks_country67.csv), to produce two Data Sources listed at the Sources dashboard (see below).
Step 2: Create the Models
From the Models dashboard, select the ‘Add New’ button.
Add a Name of your choice, ie. ‘covid-country67_model’, and change the ‘Type’ to ‘CSV’.
In the ‘Mapping' panel, copy the following code to create a mapping for our covid_country67_cases.csv file.
creator "country67-cases"
version "1.0"
graph [
  node [
    id 1
    tags "*case,person"
    *caption "case"
    recovered "recovered"
    gender "gender"
    source "source"
  ]
  node [
    id 2
    tags "*country,Globe"
    *caption "citizenship"
  ]
  edge [
    source 1
    target 2
    label "CITIZEN_OF"
  ]
  node [
    id 3
    tags "infection_region"
    *caption "region"
  ]
  edge [
    source 3
    target 1
    label "INFECTED_IN"
  ]
]
Select the Submit button to save the Model.
Repeat the above exercise with the CSV2 data source, to create a second Model (ie. remarks_country67_model) whose ‘Mapping’ should include the following code.
creator "country67-remarks"
version "1.0"
graph [
  node [
    id 1
    tags "*case,person"
    *caption "case"
  ]
  node [
    id 2
    tags "remarks,File"
    caption "remarks"
  ]
  edge [
    source 2
    target 1
    label "IS_FROM"
  ]
]
You should now have two Models listed at the Models dashboard (see below);
Step 3: Create the Graphs
From the Graphs dashboard, select the ‘Add New’ button.
We will require two neo4j graphs, one for CSV1 and another for CSV2.
Use the entries opposite as a guide for the creation of the two Graphs.
Ensure that you select the correct 'Source' from the drop-down list, as this will not automatically populate.
Select the neo4j option from the graph ‘Type’ drop-down.
For the Configuration settings, the ‘URI’ entry should refer to the neo4j port; therefore, please use the following settings:
URI: bolt://neo4j:7687
Username: [email protected]
Password: changeme
You should now have two Graphs listed at the Graphs dashboard (see below);
Step 4: Create the Flows
The final step is to create a ‘Flow’ for each data source. This will be followed by 'running' each Flow to create working Models viewable on the Gemini Explore canvas.
Select the first Graph from the list presented in our Graphs dashboard, and select the ‘Add New’ button from its lower ‘Flows’ panel to create a Flow.
As this is a simple CSV source that does not require scheduling, simply verify that the correct Model has been identified in the Model entry box, and select the ‘Submit’ button to create a simple ‘Flow’.
Observe the new entry that has appeared in the Flows panel, and the appearance of three new links; edit, run, and delete, enabling you to modify, delete, or re-run the model flow.
Select the ‘Run’ link to initiate the build of the Model, confirming with the second ‘Run' button when prompted.
The Model could take a while to build, especially if a huge amount of data is involved. Monitoring this process is achieved by observing the ‘ingested’ value within the Flows panel (see below). The model is complete when the ‘Status’ value is at ‘completed’. The ingested number should be '373' on completion.
If the status is at 'failed’, then refer to the Troubleshooting section for details on accessing the Parser Log file.
Repeat the above process for the second Graph Flow, for the CSV2 (remarks_country67) data source. The ingested number should be '373' on completion.
Step 5: Viewing the Models using Gemini Explore
From the Exploration menu, select Exploration to view the models produced in the ‘Data Elements’ panel. The result should be similar to the following;
Note: The remarks value may vary from the above, but if anything else is missing from the above, please refer to the Troubleshooting section for assistance.
Step 5.1: Working with the Graph Canvas
To begin, let us add the cases to the Graph Canvas. Use the ‘+' button alongside the 'case’ element to add the first 300 elements from your data source to the canvas.
Under some circumstances, it may be better to bring back selective records. To search for one individual case, use the 'show' entry box within the Search Panel. For example, let us search for case 19.
Clear the existing search, and type the name of the element required, in this example ‘case’. As you type, you will be prompted to select an uppercase version of the element concerned.
Select the appropriate uppercase element to reveal a cursor entry. This is where you can add a specific value from your data source, ie. ‘19’.
Select the ‘Apply’ button to confirm your entry. This will bring back case ‘19’ onto your Graph Canvas for analysis.
If you wish to add more values from either this or other elements, repeat the process.
For more details on the functionality of working with the Graph Canvas, please refer to the Gemini Explore - Modeling FAQ document.
Step 5.2: Exploring data using the Canvas
Now that we have one record on the Graph Canvas (19), double-click it to view its immediate connections. The canvas should resemble the following;
Note that the canvas will only display the first 300 elements on the canvas at any one time.
Press 'e' at the canvas at any time to clear.
For more help on working at the graph canvas, refer to the online User Guide, https://support.geminidata.com/docs/gem-explorer-user-guide/
What we can deduce from the Graph in its current form;
Select element ‘19’ with your mouse, and observe the Inspector Panel show that this is a male subject.
Select the interconnecting line - relationship - between the 19 entity and the Taiwan entity, to reveal that this was a citizen of Taiwan.
Select the relationship between the 19 entity and the Central-Changhua entity, to reveal that this was where the subject was infected.
Select the relationship between the 19 entity and the remark beginning ‘White Taxi..’, to reveal detail including that this subject was a taxi driver.
Tutorial 2: Using Gemini Explore with a Splunk data source
Sometimes it may be preferable to use Splunk to ingest your data, as this will enable the additional flexibility of transforming the data prior to its ingestion within Gemini Explore. This might include the need to rename existing fields, or add additional fields on-the-fly using the Splunk SPL language.
For this exercise, we will continue with the theme of Covid-19 data, but you will need your own Splunk environment available, either on your current network or your own local workstation.
Step 1: Preparing your Splunk environment
Add the CSV2 (remarks_country.csv) file downloaded in Tutorial 1 to your Splunk instance, using the Settings / Lookups / Lookup Table menu option. Ensure that you change the permissions to Global, ie. All apps, and give read rights to everyone, to make the data source easier to access from any app.
We would also recommend that you download and install the Machine Learning Toolkit app and its associated Python for Scientific Computing app, which can greatly assist you in leveraging your data to maximize the benefit of Gemini Explore. This installation will be required if you want to complete Tutorial 3.
Verify that the CSV file has been installed correctly by running the following search at your Splunk interface:
| inputlookup remarks_country67.csv
A total of 373 records should be available, resembling the following screen output;
You could also use the Add Data wizard in Splunk to upload your CSV, but please ensure this goes into a temporary index. If you use this option, remember to use 'sourcetype=csv' in searches, instead of the inputlookup command suggested in this tutorial.
Step 2: Login to Gemini Explore and Perform a Clean-up Operation
We should remove the previous 'CSV' Type file entry, so that we can be sure the new Splunk interface is working correctly.
Login to Gemini Explore, and select the Data Modeling option from the Exploration menu.
Select the 'remarks_country67_graph' from the Graphs dashboard.
Select the Clean-up button, and acknowledge the warning message with the 'OK' button, to confirm the removal of all Graph data from Explore.
Step 3: Add a new Splunk Source
We could add a completely new source, but in this case we are dealing with the same CSV file, just from a different source, so we will edit the original Source. From the Sources dashboard, locate the 'remarks_country67' source and select the 'Edit' button.
We will retain the name, but change the 'Type' to 'Splunk'.
For 'Scheme', ensure that https (default) is selected.
The 'Host' and 'Port' reflect the IP address or DNS name of your local Splunk instance. Communication is completed using the management port, which defaults to 8089.
The Username and Password credentials are required for access to Splunk. These will need admin rights.
Select the Submit button to save.
Note that these Splunk settings - including the login credentials - are not verified at this point. Therefore, do not assume that a successful 'Save' confirms these parameters.
It is important to understand that this clean-up operation will remove 'ALL' of the ingested data from Explore, not just the data for the Graph that has been selected. This is due to the complex merging of datasets behind the scenes, which is impossible to separate.
Step 4: Edit the Model
Again, the Model will largely remain the same, as we are dealing with the same source file, so we will edit the original entry. From the Models dashboard, select the 'remarks_country67_model' and choose the 'Edit' button.
We will retain the name, but change the 'Type' to 'Splunk'.
The 'Query' entry box should contain the following Splunk search.
| inputlookup remarks_country67.csv
Retain the 'Mapping' entry, as this will not need to change; it is the same mapping we used for the CSV Type in Tutorial 1.
Note that the caption reflects the exact same header value from the CSV1 data source. This will enable the two datasets to merge on the canvas.
Select the Submit button to save the Model.
Step 5: Edit the Graph
Despite keeping the same name, the Source has changed from CSV to Splunk, so we need to enter Edit mode to acknowledge the new Type.
From the Graphs dashboard, select 'remarks_country67_graph'.
Select the correct 'Source' from the drop-down list.
The Configuration settings will remain the same, so simply select the 'Submit' button to save the Graph.
Step 6: Create the Flows
As a Clean-up operation was completed at the start, we will need to re-run both 'Flows'.
From the Graphs menu, select the covid_country67_graph to reveal its Flow panel at the bottom of the screen. Select the 'Run' option, and accept the confirmation 'Run' button when prompted.
From the Graphs menu, select the remarks_country67_graph to reveal its Flow panel at the bottom of the screen.
Because we have kept the Model name the same, we can simply select the 'Run' option here also; the difference this time, however, will be the Splunk earliest and latest time parameter entries.
With Splunk queries, it is important to use the correct ‘Time Picker’ parameters.
The 'earliest_time' and 'latest_time' entry boxes have been created to replicate the Splunk Time Picker.
Enter the following values here, using the Splunk time modifier syntax:
earliest_time: -48h@h
latest_time: now
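As a side note, the '-48h@h' value can be unpacked as "go back 48 hours, then snap ('@h') down to the start of the hour". The following is a minimal Python sketch of that reading — our own interpretation of the Splunk time modifier, not Splunk code:

```python
# Illustrative reading of the Splunk time modifier '-48h@h':
# subtract 48 hours, then snap ('@h') down to the whole hour.
from datetime import datetime, timedelta

def resolve_minus_48h_snap_hour(now: datetime) -> datetime:
    t = now - timedelta(hours=48)
    return t.replace(minute=0, second=0, microsecond=0)  # the '@h' snap

now = datetime(2020, 6, 1, 14, 37, 22)
print(resolve_minus_48h_snap_hour(now))  # → 2020-05-30 14:00:00
```

The 'now' value for latest_time simply means the moment the search executes.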
Select the ‘ ’ button to initiate the Splunk search and retrieve the data.Run
Progress of the Splunk search query can then be monitored from the Flows panel.
The Status indicates current progress of data ingestion (running), a failure in connection to the Splunk source or in retrieving the search dataset (failed), or confirmation that the search has run successfully (completed).
Progress can also be seen via the counter tracking the number of rows ingested.
If the Status result is 'Failed', please refer to the Troubleshooting section on how to view the Parser Logs.
Step 7: Viewing the Models using Gemini Explore
Switch to the Exploration dashboard to view the model's data produced in the 'Elements' panel. The result should resemble the following. There is no change from the result of Tutorial 1, which is not unexpected of course, but in order to make better use of the power of Splunk, feel free to complete Tutorial 3.
Tutorial 3: Machine Learning with Gemini Explore
There may come a time when you need to explore your data in more detail. Splunk's Machine Learning Toolkit app gives you the ability to apply machine learning preprocessing or feature extraction algorithms to your data prior to its ingestion into Gemini Explore, where it can then interact dynamically with other datasets. The following tutorial gives you just a flavor of what can be achieved using this app.
If you have completed Tutorials 1 & 2, Gemini Explore will currently have access to;
The CSV1 data source (covid_country67_cases.csv)
The Splunk data source ( | inputlookup remarks_country67.csv)
The first 3 rows of data, and the header information, are shown below for closer inspection;
covid_country67_cases.csv
case,dateAnnouced,gender,age,citizenship,region,source,dateOfEntry,onsetDate,discoveryPipeline,recovered,dateOfRecovery,dischargeDate
1,"Tuesday, January 21",Female,5X,Taiwan,South-Kaohsiung,Overseas,20-Jan,11-Jan,,yes,2/6 San Cai Yin,6-Feb
2,"Friday, January 24",Female,5X,China,North-Taipei,Overseas,21-Jan,23-Jan,,,,
3,"Friday, January 24",male,5X,Taiwan,South-Kaohsiung,Overseas,21-Jan,20-Jan,,,,
Splunk search: | inputlookup remarks_country67.csv
case,remarks
1,"The first, case of the first case of imported severe disease (unintubated) in China"
2,"Xiaogang Airport Entry"
3,"1/22 Entering the Kaohsiung Ballroom and, staying in the negative pressure ward for 2 months"
Questions regarding how the two data sources are related, and what can be learned from combining them, become apparent. In this example, the data already lends itself to further scrutiny, as the two sources have been linked by the 'case' value. If we ask the right questions of our data, we can use Gemini Explore to interact visually with the data, drilling down to find patterns or relationships that may otherwise be difficult to spot.
Notice that the remarks data source references a lot of other case numbers within the text. Text data is difficult to work with, but there is a TFIDF feature extraction algorithm in the Splunk MLTK app which may be able to help us find correlations or clusters among the remarks. Or maybe the Splunk cluster command could help, as this also works well with text.
In the following example, we have used the 'TFIDF' algorithm. We decided to first create a combined field, with each case number added to its remark, to improve correlation. The 'TFIDF' algorithm will find the most relevant words in the text (excluding the most common English words). We then apply the 'KMeans' algorithm, which will look for clusters amongst the text. We will explore this briefly in Splunk before using Gemini Explore to probe a little deeper.
Step 1: Run the following at your Splunk interface;
| inputlookup remarks_country67.csv
| eval remarks = 'remarks'." ".'case'
| fit TFIDF remarks stop_words=english
| fit KMeans k=30 remarks_tfidf_*
| dedup 15 cluster
| table cluster case remarks
| sort cluster
Below is an example of the desired output, but bear in mind that the cluster numbers will change every time you run the search.
Search through the resultant clusters. As in our example, you should find a cluster that contains 'case 19' (in this example, 'cluster 22'). This search has highlighted the fact that this male subject, 'case 19', has infected members of his family, namely case 20, case 21, case 22 and case 23.
We should now investigate this data using Gemini Explore.
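Before moving on, the TF-IDF idea behind the fit TFIDF step can be sketched outside Splunk. The following pure-Python example — our own illustration with made-up remark strings, not the MLTK implementation — shows why remarks that share vocabulary end up close together, which is what KMeans then exploits to form clusters:

```python
# Minimal TF-IDF sketch: remarks sharing vocabulary get a higher
# cosine similarity than unrelated remarks.
import math
from collections import Counter

STOP_WORDS = {"the", "of", "in", "a", "and", "by", "to", "for", "with"}

def tfidf(docs):
    """Return one {term: weight} vector per document."""
    tokenized = [[w for w in d.lower().split() if w not in STOP_WORDS]
                 for d in docs]
    n = len(docs)
    df = Counter(t for doc in tokenized for t in set(doc))  # document frequency
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        # term frequency scaled by (smoothed) inverse document frequency
        vectors.append({t: (c / len(doc)) * math.log((1 + n) / (1 + df[t]))
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = lambda x: math.sqrt(sum(w * w for w in x.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

remarks = [
    "family member of case 19",           # similar to the next remark
    "infected by family member case 19",  # similar to the previous remark
    "Xiaogang Airport entry",             # unrelated
]
vecs = tfidf(remarks)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # → True
```

Related remarks score well above zero against each other while the unrelated remark scores zero, so a clustering pass over these vectors naturally groups the family-related cases.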
Step 2: Login to Gemini Explore and Modify the Splunk Model
From the Models dashboard of the Data Modeling interface, select the remarks_country67_model and choose the 'Edit' button.
Amend the Query with the exact Splunk search used in Step 1, and amend the Mapping of the previous model with the addition of the cluster node. Copy and paste from the code box below.
Select the 'Submit' button to save the Model on completion.
creator "country67-splunk-remarks"
version "1.0"
graph [
  node [ id 1 tags "case,person" *caption "case" ]
  node [ id 2 tags "cluster" caption "cluster" ]
  node [ id 3 tags "remarks" caption "remarks" ]
  edge [ source 3 target 1 label "IS_FROM" ]
  edge [ source 2 target 1 label "CLUSTER_CONTAINS" ]
]
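As a rough illustration of what this mapping asks the parser to do — a sketch of our own, not Gemini Explore internals — each search result row yields one node per mapped field, nodes merge on their caption value, and both edges point back at the case node:

```python
# Hypothetical rows from the Step 1 search (illustrative values only).
rows = [
    {"case": "19", "cluster": "22", "remarks": "White Taxi driver"},
    {"case": "20", "cluster": "22", "remarks": "Family member of case 19"},
]

nodes, edges = {}, []
for row in rows:
    for field, tags in [("case", "case,person"),
                        ("cluster", "cluster"),
                        ("remarks", "remarks")]:
        # Nodes merge on their caption value -- this is also how rows
        # from different sources join up on the canvas.
        nodes.setdefault((field, row[field]), {"tags": tags, "caption": row[field]})
    edges.append(("remarks", row["remarks"], "IS_FROM", "case", row["case"]))
    edges.append(("cluster", row["cluster"], "CLUSTER_CONTAINS", "case", row["case"]))

# Both cases share cluster "22", so the cluster node appears once
# and carries two CLUSTER_CONTAINS edges.
print(len(nodes), len(edges))  # → 5 4
```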
Step 3: Clean-up and Run the Graph Flow based on the new Splunk search
Because we have already run a Graph Flow against this model, we should perform a Clean-up operation to remove the current data, and then 'Run' the Graph Flows for all data sources to repopulate the data.
From the Graphs menu, select either of the Graph sources (both will be removed anyway), and select the 'Clean-up' button to remove historic graph data.
Now delete the Flow from the remarks_country67_graph entry and re-create a Flow using appropriate time parameters (ie. -48h@h and now).
Return to the Graphs menu, and 'Run' the Flows for all Graphs listed.
Step 4: Visualise the result in Gemini Explore
Return to the Gemini Exploration dashboard. This should reveal an additional element called 'cluster' following the new Splunk search. (Note that the 'remarks' value may differ from that shown below.)
So let us now explore the data using the Graph canvas. We know from our earlier Splunk search that a cluster will be formed from the relatives of case 19 (remember, the cluster numbers will change each time you run the Splunk search!).
Let's add case 19 to the canvas and double-click it to find its neighbors. When expanded on the canvas, a result similar to that below should be expected where, in my case, cluster 16 is seen to be related.
If we double-click cluster 16, we find the family group of case 19.
Another interesting use-case involves case 56 and case 57, and can be seen below. These two individuals were part of a group of Turkish visitors touring the North of the country, and cluster 18 contains the other delegates of the tour.
Step 5 (optional): Add the Symptoms data into the mix
We have prepared another data source for you that is directly related to the existing datasets. This one brings in Symptoms related to each of the cases.
Here is an example of the Header and first few rows of data from the symptoms_country67.csv data source.
case,symptom,medical history,Source of infection
1,"Fever, cough, shortness of breath, pneumonia",,O (Wuhan)
2,fever,,O (Wuhan)
3,cold,,O (Wuhan)
4,cough,,O (Wuhan)
5,"Fever, muscle soreness",,O (Wuhan)
By using a Splunk search, we can break down the comma-delimited symptoms into individual rows, which can also give us some useful statistics on the most prevalent symptoms.
If you wish to add this new data source, use the following download link, and add it to your local Splunk environment.
If you have added this CSV using the Settings / Lookups / Lookup Table method, then set up a new Source and Model in Gemini Explore to add it into your existing graph data.
If you prefer, use the Data Input wizard to add this CSV data. If this method is chosen, begin the Splunk search with sourcetype=csv instead of the inputlookup command used in our example.
Add a new Source for symptoms_country67, based on our local Splunk environment.
Add a new Model using the Query and Mapping shown below.
| inputlookup symptoms_country67.csv
| makemv delim="," symptom
| mvexpand symptom
| eval symptom=trim(lower(symptom))
| fields case, symptom
creator "country67-splunk-symptoms"
version "1.0"
graph [
  node [ id 1 tags "case,person" *caption "case" ]
  node [ id 2 tags "symptom" caption "symptom" ]
  edge [ source 2 target 1 label "IS_FROM" ]
]
Create a Graph that refers to the above Model.
Create a Flow for the above Graph.
Run the Flow using suitable time parameters (ie. -48h@h and now).
Observe the results at the Exploration dashboard. Symptoms will be connected to each case or person.
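The makemv / mvexpand / eval part of the Query can be mirrored in a few lines of Python, which makes it easy to see what the parser will receive. The sample here is a hypothetical two-row extract in the symptoms_country67.csv layout:

```python
# Pure-Python equivalent of: makemv delim="," symptom | mvexpand symptom
# | eval symptom=trim(lower(symptom)) | fields case, symptom
import csv, io

sample = '''case,symptom,medical history,Source of infection
1,"Fever, cough, shortness of breath, pneumonia",,O (Wuhan)
2,fever,,O (Wuhan)
'''

rows = []
for record in csv.DictReader(io.StringIO(sample)):
    # makemv + mvexpand: split the multivalue field into one row each
    for symptom in record["symptom"].split(","):
        # eval symptom=trim(lower(symptom))
        rows.append({"case": record["case"], "symptom": symptom.strip().lower()})

# Case 1 expands to four rows (fever, cough, shortness of breath,
# pneumonia); case 2 stays as a single row.
print(rows)
```

Once expanded this way, counting the most prevalent symptoms is a simple aggregation over the symptom column.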
Troubleshooting Guide
It may be necessary to access the parser log file in order to troubleshoot the modeling process, if your results are unexpected or if the graph refuses to run. Use the Explore Cockpit interface on port 9090 for analysis.
Accessing the Parser Logs
Login to the Explore Cockpit over port 9090 (ie. http://<hostname>:9090)
From the Host Tab, select Podman Containers from the options menu.
In the main panel, under the heading of Containers, select the explore_parser_1 container, and select the 'Logs' link in order to view the current log file. See below for an example.
Note: The preferred web browser is Mozilla Firefox.
Restarting Gemini Explore
If the environment becomes unstable or unresponsive for any reason, it is possible to perform a 'restart' using the Explore Cockpit interface.
Login to the Explore Cockpit over port 9090 (ie. http://<hostname>:9090)
From the Host Tab, select Overview from the options menu.
In the top right corner of the main panel, you will find a 'Restart' option. Select this, and note that there is a built-in delay of 1 minute before the restart occurs. After a few minutes, refresh the browser to continue.
Note: Although data modeling components are retained during this process, the model flows will need to be re-run to repopulate the Exploration canvas.
Known Problems/Solutions
Problem 1: Error trying to access the Data Modeling section
Solution: If this is the error you receive:
Login to the Explore Cockpit and, from the Host tab, select Podman Containers.
Change the Filter at the top of the main panel to 'Everything'.
Locate the explore_parser_1 container, and use the 'Start' button to restart the container.
Problem 2: Gemini Explore has become unresponsive.
Solution: You will need to restart Gemini Explore.
Follow the 'Restarting Gemini Explore' procedure above.
Problem 3: Error in the Parser Log when trying to load data
Solution: Problems with parsing can be attributed to many issues; a log entry similar to that below will be noted in the parser log.
logging: {"caption":"288","id":"c5b40804-4fa2-4a81-9f6c-59dd1be0c718"}
logging: {"caption":"Group tour with # 250","id":"7700f6e3-207f-4657-b531-78240504d830"}
logging: {"relation_type":"is_related_to","source_id":"7700f6e3-207f-4657-b531-78240504d830","destination_id":"c5b40804-4fa2-4a81-9f6c-59dd1be0c718","direction":1,"id":"5443c6d1-a237-4f31-bae7-9f4e279eca6b"}
logging: {"caption":"289","id":"776b7c85-3cc4-453f-9caf-41832ee06e4f"}
Neo4jError: Invalid input 't': expected whitespace, '.', node labels, '[', "=~", IN, STARTS, ENDS, CONTAINS, IS, '^', '*', '/', '%', '+', '-', '=', '~', "", "!=", '', "=", AND, XOR, OR, ',' or '}' (line 3, column 160 (offset: 178))" MERGE (node :remarks { `caption`:'Mr. (# 293) 3/16, Suspected of being infected by a quarantine husband because he didn't wear a good wife in a mask and gloves' })"
Here are some suggestions to assist, in the order they should be attempted.
1. Check the parser log for an obvious error message (ie. invalid null input), and correct the dataset accordingly.
2. If it is a CSV data source, use the dos2unix utility on the CSV and re-ingest into Gemini Explore.
3. If it is a Splunk query input, check the result in Splunk to ensure that all rows are consistent.
4. The parser is sensitive to certain characters; for instance, an apostrophe within a text string can cause issues, even if the text is within double-quotes. Remove apostrophes from text strings.
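Suggestions 2 and 4 above can be combined into a small pre-ingestion cleaning pass. This sketch is our own helper, not a Gemini Explore utility: it normalises DOS line endings (what dos2unix does) and strips apostrophes from the text before re-ingesting a troublesome CSV:

```python
# Clean a CSV's raw text before re-ingestion:
# 1) convert CRLF/CR line endings to LF (dos2unix behaviour),
# 2) drop apostrophes, which can break the generated MERGE statement.
def sanitise(text: str) -> str:
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # dos2unix
    return text.replace("'", "")  # remove apostrophes from text strings

dirty = "case,remarks\r\n293,\"he didn't wear a mask\"\r\n"
print(sanitise(dirty))
```

Apply it to the file contents as a whole, then save and re-upload the lookup before running the Flow again.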
Problem 4: I can’t see the Parser Log file, it’s stuck at ‘Loading logs…’
Solution: Refresh your browser and try again.
Using the Firefox Browser seems to be the best solution.