Back to Blogs

VQL Survival Guide

Author: Husam Gameel (Oppenheim3r)


1. VQL Design Philosophy & Core Concepts

Before writing queries, you must understand three crucial architectural concepts in VQL:

A. Dynamic Data Sources (Plugins vs. Tables)

In SQL, you query static tables (FROM users). In VQL, the data sources are plugins—which are actual Go functions running inside the Velociraptor agent (FROM pslist(), FROM glob()).

  • Plugins return rows dynamically.
  • Plugins accept keyword arguments (e.g., pslist(pid=1234)).

B. Lazy Evaluation (Doing Less Work)

VQL goes to extreme lengths to postpone evaluation. When you specify a column expression, VQL wraps it in a Lazy Evaluator.

  • If a row is filtered out by a WHERE clause before a column needs to be printed, that column's function is never evaluated.
  • This is critical when using expensive functions like hash() or upload().
💡 Performance Tip: Order your WHERE conditions from cheapest to most expensive (left-to-right). If a cheap string match fails, VQL aborts early and won't run an expensive file hash function on that row!

Bad (Expensive): WHERE hash(path=FullPath).SHA256 = "..." AND FullPath =~ "system32"
Good (Cheap First): WHERE FullPath =~ "system32" AND hash(path=FullPath).SHA256 = "..."

C. The Scope

The scope is a stack-like data structure containing active variables, plugins, and helper functions.

  • When a query executes, nested child scopes are created (e.g., inside foreach() loops).
  • If a variable or column name inside a subquery matches a variable in the parent scope, the inner one masks the outer one. You can use AS to rename variables and avoid collisions.

2. Level 1: Basic Queries & Syntax

Let's master the basic SQL-like skeleton of a VQL query:

SELECT Column1, Column2, Column3 AS CustomName
FROM plugin(arg1="value", arg2=123)
WHERE Column1 =~ "regex_pattern"
ℹ️ Note: Unlike SQL, VQL does not require or allow a semicolon (;) at the end of statements!

2.1 Basic System Information Query

Let's get basic system facts using the info() plugin:

SELECT Hostname, OS, Platform, KernelVersion, Architecture, IsAdmin
FROM info()

Output (JSON format):

[
 {
  "Hostname": "Rick",
  "OS": "windows",
  "Platform": "Microsoft Windows 11 Pro",
  "KernelVersion": "10.0.22631",
  "Architecture": "amd64",
  "IsAdmin": true
 }
]

2.2 Filtering and the Regex Operator (=~)

The regex match operator (=~) is one of the most frequently used operators in DFIR queries. Let's list running processes using pslist(), filtering for Microsoft Edge or Chrome:

SELECT Pid, Name, CommandLine, Exe 
FROM pslist()
WHERE Name =~ "chrome|msedge"

2.3 Defining Stored Queries (LET Variable Binding)

You can declare queries as variables using the LET keyword. Stored queries are lazy and will only be executed when you reference them!

-- Declare the stored query (not evaluated yet!)
LET FindBrowsers = SELECT Pid, Name, Exe 
                   FROM pslist() 
                   WHERE Name =~ "chrome|msedge|firefox"

-- Execute the query and limit the results to 3
SELECT * FROM FindBrowsers 
LIMIT 3

2.4 Multi-line Raw Strings (''')

Windows file paths use backslashes (\). Standard strings require escaping ("C:\\Windows\\System32"). To avoid escaping nightmares, VQL provides raw strings using triple single-quotes ('''). Inside a raw string, everything is literal!

-- Escaping required:
SELECT * FROM glob(globs="C:\\Windows\\System32\\*.exe")

-- No escaping required (Raw string):
SELECT * FROM glob(globs='''C:\Windows\System32\*.exe''')

3. Level 2: Foreach, Memoize & Subqueries

In standard SQL, you combine two tables using JOIN. Because VQL has no static tables or static indexes, it does not have a JOIN operator. Instead, VQL uses the foreach() plugin and memoize() function to combine dynamic data.

3.1 The Mighty foreach() Loop

The foreach() plugin runs an inner query for every row returned by an outer query.

Syntax:

SELECT * FROM foreach(
  row={ SELECT Exe, Pid FROM pslist() },
  query={ SELECT Pid, ModTime, Size, FullPath FROM stat(filename=Exe) }
)

In this query:

  1. row query runs first, yielding process executables (Exe).
  2. For each row, VQL creates a nested scope containing Exe and runs the query subquery.
  3. The stat() plugin takes the Exe variable from the outer row and returns file details.

3.2 Dynamic Join: Processes and Network Connections

Let's write a query to map active network connections (netstat()) to their respective process names (pslist()).

SELECT * FROM foreach(
  row={
    SELECT Family, Type, Laddr.IP AS LocalIP, Laddr.Port AS LocalPort, 
           Raddr.IP AS RemoteIP, Raddr.Port AS RemotePort, Pid
    FROM netstat()
    WHERE RemoteIP != "0.0.0.0" AND RemoteIP != "127.0.0.1"
  },
  query={
    SELECT Name AS ProcessName, Exe AS ProcessPath, LocalIP, LocalPort, RemoteIP, RemotePort, Pid
    FROM pslist(pid=Pid)
  }
)
⚠️ Performance Alert: This query is extremely slow on busy systems! Why? Because for every single socket returned by netstat(), the pslist() plugin runs again to find the process name. If there are 300 sockets, pslist() runs 300 times!

3.3 Solving Performance with Materialization (<=)

To solve the slow nested-query problem, we can materialize the process list into memory using the <= operator.

  • LET x = query (Lazy - runs the query every time it is referenced).
  • LET x <= query (Materialized - runs the query once, caches the results in an in-memory array, and accesses it instantly).
-- Step 1: Cache the process list in memory (runs exactly ONCE)
LET ProcessCache <= SELECT Pid AS ProcessPid, Name AS ProcessName, Exe AS ProcessPath 
                    FROM pslist()

-- Step 2: Query netstat and map using a fast in-memory subquery lookup
SELECT Laddr.IP AS LocalIP, Laddr.Port AS LocalPort, 
       Raddr.IP AS RemoteIP, Raddr.Port AS RemotePort, Pid,
       {
         SELECT ProcessName 
         FROM ProcessCache 
         WHERE ProcessPid = Pid 
       } AS ProcessName
FROM netstat()
WHERE RemoteIP != "0.0.0.0" AND RemoteIP != "127.0.0.1"

This materialized lookup runs in milliseconds instead of seconds!

3.4 Speeding up with memoize()

For large lookups, you can use the memoize() function. It builds an indexed hash map in memory on a specific key:

-- Re-cache every 10 seconds (10000ms)
LET ProcessLookup <= memoize(
  key="ProcessPid",
  query={ SELECT Pid AS ProcessPid, Name, Exe FROM pslist() },
  period=10000
)

-- Access the cached process details using get() in O(1) time!
SELECT Laddr.IP AS LocalIP, Laddr.Port AS LocalPort, Pid,
       get(item=ProcessLookup, field=Pid).Name AS ProcessName
FROM netstat()

4. Level 3: Log Parser (Grok, CSV, and Aggregate Functions)

Log parsing is a fundamental task for detection engineers. VQL handles CSVs, JSONs, XMLs, and unstructured text files beautifully.

4.1 Parsing Structured CSV Logs

Let's parse a CSV log file on the fly using parse_csv().

LET RawLogs = '''Timestamp,User,Action,IPAddress
2026-05-19T10:00:00Z,admin,login_success,192.168.1.50
2026-05-19T10:01:05Z,attacker,login_failed,45.2.3.9
2026-05-19T10:02:10Z,attacker,login_failed,45.2.3.9
2026-05-19T10:03:00Z,attacker,login_success,45.2.3.9
'''

SELECT Timestamp, User, Action, IPAddress
FROM parse_csv(accessor="data", filename=RawLogs)
WHERE Action = "login_success"

4.2 Grouping and Aggregate Analysis (GROUP BY)

Let's group the failed logins by user and calculate counts using the aggregate count() function:

SELECT User, count() AS TotalFailedAttempts
FROM parse_csv(accessor="data", filename=RawLogs)
WHERE Action = "login_failed"
GROUP BY User

Output:

[
 {
  "User": "attacker",
  "TotalFailedAttempts": 2
 }
]

4.3 Parsing Unstructured Logs with grok()

When logs are unstructured text files (like web IIS logs or Linux syslogs), you can extract fields using Grok expressions (named regex patterns).

Let's parse raw web server log lines:

LET WebServerLogs = '''192.168.1.100 - - [19/May/2026:10:00:01 -0700] "GET /index.html HTTP/1.1" 200 452
192.168.1.150 - - [19/May/2026:10:01:45 -0700] "POST /login.php HTTP/1.1" 401 234
192.168.1.180 - - [19/May/2026:10:02:15 -0700] "GET /admin/cmd.aspx?cmd=whoami HTTP/1.1" 404 125
'''

-- Parse lines and extract fields using a custom Grok expression
LET parsed_lines = SELECT grok(
  grok="%{IP:ClientIP} - - \\[%{HTTPDATE:Timestamp}\\] \"%{WORD:Method} %{NOTSPACE:URI} HTTP/%{NUMBER:Version}\" %{NUMBER:StatusCode:int} %{NUMBER:ResponseBytes:int}",
  data=Line
) AS LogRecord
FROM parse_lines(accessor="data", filename=WebServerLogs)

-- Use foreach to deconstruct the parsed dictionaries into individual columns
SELECT * FROM foreach(row=parsed_lines, column="LogRecord")
WHERE StatusCode >= 400

Output:

[
 {
  "ClientIP": "192.168.1.150",
  "Timestamp": "19/May/2026:10:01:45 -0700",
  "Method": "POST",
  "URI": "/login.php",
  "Version": "1.1",
  "StatusCode": 401,
  "ResponseBytes": 234
 },
 {
  "ClientIP": "192.168.1.180",
  "Timestamp": "19/May/2026:10:02:15 -0700",
  "Method": "GET",
  "URI": "/admin/cmd.aspx?cmd=whoami",
  "Version": "1.1",
  "StatusCode": 404,
  "ResponseBytes": 125
 }
]