An Introduction to MinaExplorer’s BigQuery Public Dataset: Medium article by Gareth Davies

One of Mina Protocol’s most prominent innovators is full stack developer, Gareth Davies. A few month’s ago Gareth wrote an article on Medium that went into great detail about Evaluating a Mina staking pool’s performance and this week he has written a new article entitled ‘An Introduction to MinaExplorer’s BigQuery Public Dataset’.

You can read the original article on Medium here, but Gareth has also allowed us under a creative commons licence to publish his work here too (See below).


Mina is a succinct blockchain, which means while we can verify the chain’s current state using a zero-knowledge proof, the prior history is not retained. So, if we want to analyze the chain’s history (for example, to see individual transactions), we need to obtain it from an archive node.

The official implementation of an archive node stores its data to a Postgres database, which typically requires the use of recursive queries to determine the canonical chain information.

See this post on how to set up and configure an archive node for redundancy.

Those who want to query historical data without running their own archive nodes (and associated Mina node(s)) can use historical services like Figment’s DataHub or MinaExplorer’s archive GraphQL API. However, neither offer simple aggregation features to answer questions such as “how many blocks were produced in the last 24 hours”. While obtaining this information via scripting is possible, directly querying a database with a SQL query is more accessible and efficient.

MinaExplorer has published its custom archive node dataset to Google BigQuery as a public dataset to resolve this issue. Google BigQuery is a cloud-based big data analytics web service for processing very large data sets. Data is replicated from MinaExplorer’s database (which stores GraphQL subscriptions) with a small latency of no more than a few minutes.

The schema of the BigQuery dataset matches that of the MinaExplorer GraphQL API. Notably, snarks and user transactions (aka user commands), which are also nested in blocks data, are separated into their own tables for easier querying.

Making your first query

While you can use any GUI that supports BigQuery, such as PopSQLTableau, or DataGrip, we will use the BigQuery console to execute our queries for this article. Access the console via https://console.cloud.google.com/bigquery, create a new project if required, and add the data source by selecting Add Data -> Pin a project -> Enter project name and enter minaexplorer.

While it is not necessary to pin the project, this will allow you to browse the table’s schemas easily.

To create our first query, click Compose New Query in the top right of the console, and enter the following to determine the number of blocks in the database, and click Run.

SELECT
canonical,
COUNT(statehash) as total
FROM
minaexplorer.archive.blocks
GROUP BY
canonical

If all goes well, you should see output similar to the below (your result will differ).

Number of blocks in the database, grouped by canonical status

This query highlights that the database stores all blocks seen. If you are only interested in the canonical chain, filter all of your queries WHERE canonical = true to only return canonical blocks/snarks/transactions.

While MinaExplorer pays to host the dataset, any queries you run against the dataset are charged against your own personal billing account. There is a generous free tier of 1TB of data processing to get started.

The remainder of the article will provide some sample queries to demonstrate the use of the dataset.

Querying Block Data

Each block has a corresponding datetime field (UTC), which we can use to, for example, group canonical blocks by day.

SELECT datetime_trunc(datetime, DAY) as day,
COUNT(statehash) as total_blocks
FROM minaexplorer.archive.blocks
WHERE canonical = true
GROUP BY day
ORDER BY total_blocks DESC

This query returns the following result, which we could additionally visualize in Data Studio by clicking on the Explore Data link.

Number of canonical blocks produced per day

Querying Transaction Data

For transactions, as well as filtering for the canonical status of the transaction, you should also check that the transaction did not fail. You can do this by adding a WHERE failurereason IS NULL filter as this field is only populated if the transaction fails. Let’s find how many transactions have failed and how many of each type:

SELECT failurereason,
COUNT(id) as total
FROM minaexplorer.archive.transactions
WHERE canonical = true
GROUP BY failurereason

Transactions can fail and should filter for failurereason IS NULL

There are currently two failure reasons in the database Amount_insufficient_to_create_account where the receiving account is not in the ledger, and the amount sent does not cover the account creation fee, and Receiver_not_present which occurs when delegating to an account, not in the ledger.

You can also query user transactions via the blocks table, which is stored as nested JSON under blocks->transactions->usercommands. However, using the transactions table for transaction querying is likely simpler, joining on the block statehash if you require additional block data.

Let’s find the largest transactions (in MINA) that occurred over the last 7 days.

SELECT amount / 1000000000 as sent_amount,
`from`,
`to`,
dateTime,
blockheight
FROM minaexplorer.archive.transactions
WHERE canonical = true
AND failurereason IS NULL
AND datetime >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -7 DAY)
ORDER BY amount DESC

Largest transactions over the last 7 days in MINA

The following gist retrieves the last 100 transactions for a public key, both sent and received, with the associated transaction information.

Querying SNARK Data

One major advantage of the MinaExplorer data over the Postgres archive node is the ability to extract individual SNARK information (these are combined to a single fee transfer in the archive node). As with the transaction data, SNARK data is also available, stored as nested JSON, in the blocks table.

Let’s get summary data for all SNARKs included in the canonical chain:

SELECT
COUNT(fee) as total_snarks,
MIN(fee) / 1000000000 as minimum_fee,
MAX(fee) / 1000000000 as maximum_fee,
AVG(fee) / 1000000000 as average_fee
FROM
minaexplorer.archive.snarks
WHERE
canonical = true

Maximum, minimum, and average SNARK fees in the canonical chain

Yes, this block really did have a SNARK included for 700 MINA.

You can use this data to help choose appropriate SNARK fees, for example, by filtering for just the last 24 hours by adding AND datetime >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -24 HOUR) to the WHERE clause. Or via the previous 100 blocks (this query includes a subquery to first determine the latest height):

SELECT COUNT(fee) as total_snarks,
MIN(fee) / 1000000000 as minimum_fee,
MAX(fee) / 1000000000 as maximum_fee,
AVG(fee) / 1000000000 as average_fee,
FROM minaexplorer.archive.snarks as s,
(
SELECT blockheight
FROM minaexplorer.archive.blocks
WHERE canonical = true
ORDER BY blockheight DESC
LIMIT 1
) as b
WHERE canonical = true
AND s.blockheight >= (b.blockheight – 100)

Querying Account Data

The following query to determine the number of accounts in the ledger demonstrates how accounts are added to the ledger. Either via:

  • Included in the Genesis ledger (ledger hash jx7buQVWFLsXTtzRgSxbYcT8EYLS8KCZbLrfDcJxMtyy4thw2Ee).
  • Received a transaction with an amount high enough to cover the ledger creation fee.
  • Was included via a fee transfer (snark work or coinbase receiver) with a high enough amount to cover the ledger creation fee.
WITH ordered_inclusion as (
    SELECT
        `to`,
        datetime,
        ROW_NUMBER() OVER (
            PARTITION BY `to`
            ORDER BY
                datetime
        ) as first_created
    FROM
        (
            SELECT
                `to`,
                datetime,
            FROM
                minaexplorer.archive.transactions
            WHERE
                canonical = true
                AND failurereason iS NULL
            UNION ALL
            SELECT
                `public_key`,
                '2021-03-17T00:00:00.000+00:00'
            FROM
                minaexplorer.archive.ledgers
            WHERE
                ledgerhash = "jx7buQVWFLsXTtzRgSxbYcT8EYLS8KCZbLrfDcJxMtyy4thw2Ee"
            UNION ALL
            SELECT
                transactions.coinbasereceiveraccount.publickey,
                datetime
            FROM
                minaexplorer.archive.blocks
            WHERE
                canonical = true
            UNION ALL
            SELECT
                json_extract_scalar(h, '$.recipient'),
                datetime
            FROM
                minaexplorer.archive.blocks
                LEFT JOIN unnest(json_extract_array(transactions.feetransfer)) as h
            WHERE
                canonical = true
                AND json_extract_scalar(h, '$.recipient') = "{{public_key}}"
        )
)
SELECT
    `to`,
    datetime
FROM
    ordered_inclusion
where
    first_created = 1
ORDER BY
    datetime ASC

view raw

The result of this query determines when the account was first added to the ledger so we can visualize the data grouped by date to produce the following chart (thanks to davidg#1063 for help in generating this query).

Number of accounts in the MINA ledger

We can calculate the balance of any account at any block height with the following query. Replace the variable ${public_key} with the address of the account that you wish to calculate.

SELECT sum(balances.amount) / 1000000000 as current_balance
FROM (
         /* Fee transfers received (positive) - this can be snark work or transaction fees */
         SELECT SUM(CAST(json_extract_scalar(h, '$.fee') as NUMERIC)) as amount
         FROM minaexplorer.archive.blocks
                  LEFT JOIN unnest(json_extract_array(transactions.feetransfer)) as h
         WHERE canonical = true
           AND json_extract_scalar(h, '$.recipient') = '${public_key}'
         UNION ALL
         /* Fee transfers paid out of the received coinbase (negative) - snark work paid for out of coinbase transaction */
         SELECT SUM(CAST(json_extract_scalar(h, '$.fee') as NUMERIC)) * -1 as amount
         FROM minaexplorer.archive.blocks
                  LEFT JOIN unnest(json_extract_array(transactions.feetransfer)) as h
         WHERE canonical = true
           AND json_extract_scalar(h, '$.recipient') != '${public_key}'
           AND blocks.transactions.coinbasereceiveraccount.publickey = '${public_key}'
           AND json_extract_scalar(h, '$.type') = 'Fee_transfer_via_coinbase'
         UNION ALL
         /* Transactions received (positive) */
         SELECT sum(amount) as amount
         FROM minaexplorer.archive.transactions
         WHERE `to` = '${public_key}'
           AND canonical = true
           AND failurereason IS NULL
         UNION ALL
         /* Amount paid for an outgoing transaction (negative). Include a check if the tx failed. */
         SELECT sum(amount) * -1 as amount
         FROM minaexplorer.archive.transactions
         WHERE `from` = '${public_key}'
           AND canonical = true
           AND failurereason IS NULL
         UNION ALL
         /* Transaction fees sent (negative). Always apply even if tx failed. */
         SELECT sum(fee) * -1 as amount
         FROM minaexplorer.archive.transactions
         WHERE `from` = '${public_key}'
           AND canonical = true
         UNION ALL
         /* Coinbase transactions received (positive) */
         SELECT SUM(transactions.coinbase)
         FROM minaexplorer.archive.blocks
         WHERE blocks.transactions.coinbasereceiveraccount.publickey = '${public_key}'
           AND canonical = true
         UNION ALL
         /* Any balance in the Genesis ledger or -1 for ledger creation fee. Genesis ledger is in MINA instead of nanomina */
         (
             WITH genesis_balance AS (
                 select balance * 1000000000
                 from minaexplorer.archive.ledgers
                 WHERE public_key = '${public_key}'
                   AND ledgerhash = 'jx7buQVWFLsXTtzRgSxbYcT8EYLS8KCZbLrfDcJxMtyy4thw2Ee'
             )
             SELECT *
             FROM genesis_balance
             UNION ALL
             SELECT -1000000000
             FROM (
                      SELECT 1
                  )
                      LEFT JOIN genesis_balance ON FALSE
             WHERE NOT EXISTS (
                 SELECT 1
                 FROM genesis_balance
             )
         )
     ) balances

view raw Calculate the balance of any public key. Will return -1 if the address is not in the ledger.

Balances are comprised of:

  • Any balance in the Genesis ledger (or a 1 MINA ledger creation fee).
  • The amount of any outgoing transactions that succeded.
  • Any fees paid for any outgoing transactions.
  • The amount of any incoming transactions received.
  • Any coinbase rewards received.
  • Any fee transfers received (via block production or snark work).

If the account is not in the ledger, the following query will return -1.

Finally, we can determine the vesting slot of any timed account with the following query (replace the variable{public_key}).

SELECT
    timing.initial_minimum_balance,
    timing.cliff_time,
    timing.cliff_amount,
    timing.vesting_period,
    timing.vesting_increment,
    CASE timing.vesting_increment WHEN 0 THEN timing.cliff_time ELSE CAST(
        timing.cliff_time + (
            (
                timing.initial_minimum_balance - timing.cliff_amount
            ) / timing.vesting_increment
        ) * timing.vesting_period AS INT
    ) END AS fully_vested_slot
FROM
    minaexplorer.archive.ledgers
WHERE
    ledgerhash = "jx7buQVWFLsXTtzRgSxbYcT8EYLS8KCZbLrfDcJxMtyy4thw2Ee"
    AND public_key = "${public_key}"

view raw Calculate the unvested slot for any timed public key

Foundation Delegation Analysis

The Foundation delegation program requires calculating the amount to return to the Foundation each epoch from the balance of the pool and the number of blocks produced. While scheduled emails now provide this information, we can independently verify the values in the email using a public key and epoch data using the following query.

We can also confirm the amount sent between the epoch deadlines (change the variables ${public_key} and ${epoch} to match the address and epoch, you wish to check against the emails received.

SELECT SUM(
CAST(json_extract_scalar(h, ‘$.amount’) as NUMERIC)
) / 1000000000 as amount
FROM minaexplorer.archive.blocks
LEFT JOIN unnest(json_extract_array(transactions.usercommands)) as h
WHERE canonical = true
AND json_extract_scalar(h, ‘$.to’) = “${public_key}”
AND protocolstate.consensusstate.slotsincegenesis >= (${epoch} * 7140) + 3501
AND protocolstate.consensusstate.slotsincegenesis <= ((${epoch} + 1) * 7140) + 3500

Read the article on Medium here >


Disclaimer
We are not qualified brokers/dealers, or investment advisors. This website and its associated content is not a service for the giving or receiving of financial or investment advice. None of the content constitutes – or should be understood as constituting – a recommendation to enter in any securities, cryptocurrency, or cryptoasset transactions or to engage in any of the investment strategies presented in my discussions or content provided. We do not provide personalised recommendations or views as to whether a stock, cryptocurrency, cryptoasset or investment approach is suited to the financial needs of a specific individual.

Leave a Reply