Aborted (core dumped) with hafenkran/duckdb-bigquery with nodejs API #104

vwxyzjn opened this issue Jan 10, 2025 · 4 comments
vwxyzjn commented Jan 10, 2025

I have detailed the issue in hafenkran/duckdb-bigquery#58.

The DuckDB CLI and the Python API both work, but the Node.js API fails...

vwxyzjn (Author) commented Jan 10, 2025

duckdb-async also works

import { Database } from "duckdb-async";

async function main() {
    try {
        // Initialize DuckDB with config
        console.log('Connecting to DuckDB...');
        const db = await Database.create('cache.db', {
            allow_unsigned_extensions: 'true'
        });
        
        // Create table if not exists
        console.log('Ensuring local table exists...');
        await db.exec(`
            CREATE TABLE IF NOT EXISTS metrics (
                task_name VARCHAR,
                task_idx BIGINT,
                task_config JSON,
                model_config JSON,
                compute_config VARCHAR,
                metrics JSON,
                run_date VARCHAR,
                num_instances BIGINT,
                processing_time DOUBLE,
                workspace VARCHAR,
                experiment_id VARCHAR,
                eval_sha VARCHAR,
                task_hash VARCHAR,
                model_hash VARCHAR
            );
        `);
        console.log('Table created successfully!');

        // Install and load bigquery extension
        console.log('Setting up bigquery extension...');
        await db.exec('INSTALL bigquery FROM community;');
        await db.exec('LOAD bigquery;');

        // Get total count first
        console.log('Counting total rows to copy...');
        const countResult = await db.all(`
            SELECT COUNT(*) as total_count 
            FROM bigquery_scan('testestes.deletable.model_evaluations');
        `);
        
        const total_to_copy = Number(countResult[0].total_count);
        console.log(`Found ${total_to_copy.toLocaleString()} rows to copy`);

        
    } catch (error) {
        console.error('Error:', error);
        process.exit(1);
    }
}

main();

jraymakers (Contributor) commented
Thanks for the report. Unfortunately it's difficult for me to diagnose, because I don't have access to a BigQuery environment, so I can't run your script. (I unsurprisingly get an error about Google credentials.)

If there is a problem in Node Neo (as opposed to, say, the BigQuery extension), then it should be possible to create a repro without that extension. Do you still see the problem if you alter your example to avoid using that extension, perhaps by first exporting all or part of the data in a separate step?

Also, it would be helpful to know what version of @duckdb/node-api you're using, and on which platform.
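That information can be gathered with something like the following (a sketch; standard npm and node commands):

```shell
# Installed version of the binding.
npm ls @duckdb/node-api

# OS, CPU architecture, and Node version.
node -p "process.platform + ' ' + process.arch + ', node ' + process.version"
```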

carlopi (Collaborator) commented Jan 11, 2025

@vwxyzjn: what platforms are those environments running on?

Could you check the result of PRAGMA platform; for the CLI, the Python API, duckdb-async, and the node-neo API on your machine?

It's unclear what to make of the answer yet, but it could help track this down.
Some general information on the architecture / OS you're running this on would also help (already asked by @jraymakers, I now see...)

vwxyzjn (Author) commented Jan 11, 2025

D PRAGMA platform;
┌──────────────────┐
│     platform     │
│     varchar      │
├──────────────────┤
│ linux_amd64_gcc4 │
└──────────────────┘

Yeah, I feel like the only way to reproduce this is if you create a BigQuery table yourself... Here is the command to create the table on BigQuery:

CREATE OR REPLACE TABLE `ai2-allennlp.deletable.model_evaluations`
AS
SELECT 
  ROW_NUMBER() OVER() as id,
  CONCAT('project_', CAST(FLOOR(RAND() * 100) AS STRING)) as project,
  CONCAT('user_', CAST(FLOOR(RAND() * 1000) AS STRING)) as username,
  CONCAT('model_', CAST(FLOOR(RAND() * 1000) AS STRING)) as model_name,
  CONCAT('run_', CAST(FLOOR(RAND() * 1000) AS STRING)) as run_id,
  CASE CAST(FLOOR(RAND() * 5) AS INT64)
    WHEN 0 THEN 'gsm8k'
    WHEN 1 THEN 'ifeval'
    WHEN 2 THEN 'popqa'
    WHEN 3 THEN 'mmlu:cot::summarize'
    ELSE 'mmlu_abstract_algebra:mc'
  END as task_name,
  RAND() as primary_score
FROM 
  UNNEST(GENERATE_ARRAY(1, 1000000));  -- 1 million rows

Below are the package.json and lock file:
https://gist.github.com/vwxyzjn/289c63935dd24568f4db94c57973eda0
