Competition tasks:

Track 1: Secure Evaluation of DNA-binding Classification Convolutional Neural Network

Topic	Key Points
Title	iDASH 2025 Track 1 – Secure Evaluation of DNA-binding Classification CNN
Objective	Securely evaluate a provided CNN on encrypted DNA sequences using homomorphic encryption (HE); return encrypted prediction identical to plaintext run.
Parties	QE (querying entity – owns sensitive DNA) & CE (computing entity – owns model).
Model	Torch CNN with conv + pooling + FC layers; binary protein-DNA binding output; no retraining needed.
Input	2-column, tab-delimited file; each sequence: 200 nt (A,C,G,T).
Output	Participant-defined text file; one score/vector per sequence (after QE decryption).
Provided	• Torch model + docs • Example file (1 000 seqs/classes) for format only.
Technical Rules	• Implement/approximate every layer (incl. final) under HE Std (Albrecht-19). • Only one linear scaling allowed before/after decryption. • No explicit QE/CE traffic simulation needed.
Evaluation	Test set: 2 000 sequences. Metric: auROC / exp(wall-time / 20 min); >60 min wall-time ⇒ disqualified.
Deliverables	Code/binaries + documentation; BSD 3-Clause license; submission link (TBA).
Timeframe & Limits	≤60 min wall-time for full evaluation; faster yields better score via exponential penalty.
Data/Code Usage	No dataset sharing; no publications before workshop; abide by shared licenses.
Support/FAQ	FAQ Document
Docs & Data link	Documentation and Data

BACKGROUND

As the demand grows for privacy-preserving computation on large machine learning models, such as those used in natural language processing and biomedical analysis, homomorphic encryption (HE) has emerged as a promising approach for enabling secure evaluation of sensitive queries.

In this year's challenge, participants are tasked with developing secure HE-based methods to evaluate a convolutional neural network (CNN) designed to predict protein-DNA interactions from DNA sequences. The goal is to protect the confidentiality of the input data: given an encrypted query DNA sequence, the participants’ methods should securely compute the model’s prediction and return an encrypted output that, once decrypted, matches the result of running the model on the plaintext input.

Building on last year's challenge, we have tuned the model complexity so that this year’s challenge balances practicality and scalability. The CNN model provided includes convolutional and pooling layers, followed by fully connected layers for classification, representing a real-world use case in computational biology. Successfully addressing this challenge will demonstrate the feasibility of secure evaluation for complex bioinformatics models, contributing to the broader goal of deploying privacy-preserving AI in biomedical research and healthcare applications.

GOAL

In this challenge, there are 2 entities:

1) QE: Querying entity with a DNA sequence fragment they would like to classify. The DNA sequence is considered sensitive and cannot be revealed.

2) CE: Computing entity that holds the model.

QE wants to evaluate CE's model to classify whether a specific protein binds to the DNA fragment.

The model takes a DNA sequence (i.e., a sequence of one-letter DNA nucleotides) and estimates binding as a binary variable output.

CE only receives encrypted DNA sequence(s) and the set of keys from QE (except for the secret key), evaluates the model securely on the encrypted DNA sequence, and returns the encrypted binary output to QE. For generality, we assume that DNA sequences are encrypted under the public key of HE. Finally, QE decrypts the results.
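The QE/CE data flow described above can be sketched in a few lines. The example below is only an illustration under loud assumptions: it uses a toy Paillier scheme (additively homomorphic, tiny hard-coded primes) as a stand-in, which would NOT satisfy the HE requirements of this track, and it evaluates a single hypothetical integer-weighted linear layer rather than the provided CNN. All function names and parameters are made up for the sketch.

```python
import math
import secrets

# Toy primes for illustration only; far too small to be secure, and Paillier
# is a stand-in here, not a scheme that meets the track's HE requirements.
P, Q = 1_000_003, 999_983


def keygen(p=P, q=Q):
    """QE side: return (public key n, secret key)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt a (possibly negative) integer m under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # (1 + n)^m mod n^2 equals 1 + m*n mod n^2
    return (1 + (m % n) * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, sk, c):
    """QE side: recover the signed integer plaintext."""
    lam, mu = sk
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m


def ce_linear_layer(n, enc_x, weights, bias):
    """CE side: compute Enc(sum(w_i * x_i) + bias) using only ciphertexts
    and the public key; CE never sees x or the secret key."""
    n2 = n * n
    acc = (1 + (bias % n) * n) % n2          # plaintext bias folded in
    for c, w in zip(enc_x, weights):
        acc = acc * pow(c, w % n, n2) % n2   # Enc(x)^w = Enc(w * x)
    return acc


def demo():
    n, sk = keygen()                              # QE generates keys
    enc_x = [encrypt(n, v) for v in [1, 0, 0, 1]]  # QE encrypts its input
    enc_y = ce_linear_layer(n, enc_x, [3, -2, 5, 4], bias=-1)  # CE evaluates
    return decrypt(n, sk, enc_y)                  # QE decrypts the result
```

In an actual submission the same three roles (key generation and encryption by QE, evaluation by CE, decryption by QE) would be carried out with a scheme and parameters that satisfy the HE Standard cited in the requirements.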

CHALLENGE

In the challenge period, the teams will be provided with:

Protein binding prediction neural network model and related documentation.

A text file with sequences and classes for 1000 DNA sequences. This file is provided only as an example of the input file format that will be used in the Evaluation Stage of the competition. It is not the training dataset for the model, and teams are not required to train the model.

DATA FORMATTING

Input DNA data is formatted as a 2-column tab-delimited text file. Each DNA sequence is 200 nucleotides long (i.e., a sequence of A, C, G, T).
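As a minimal sketch of reading this format, the reader below validates each row against the stated constraints (two tab-separated columns, 200 nucleotides, alphabet A/C/G/T). The column order (sequence first, class second) and the absence of a header row are assumptions, not something stated here; the example file and documentation are authoritative.

```python
import io

SEQ_LEN = 200
ALPHABET = set("ACGT")


def read_sequences(handle, seq_col=0, label_col=1):
    """Yield (sequence, label) pairs from a 2-column tab-delimited stream.

    seq_col/label_col encode the ASSUMED column order; adjust them to
    match the provided example file.
    """
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(fields)}")
        seq = fields[seq_col]
        if len(seq) != SEQ_LEN or not set(seq) <= ALPHABET:
            raise ValueError(f"line {line_no}: not a {SEQ_LEN}-nt A/C/G/T sequence")
        yield seq, fields[label_col]


# Usage with an in-memory stand-in for the input file
sample = io.StringIO("ACGT" * 50 + "\t1\n" + "T" * 200 + "\t0\n")
rows = list(read_sequences(sample))
```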

The model is provided as a torch file with the required documentation to run and explore the architecture and weights.

The output (after decryption) should be a text file that contains a vector for each input DNA sequence, indicating the score for binding. The format of this output can be defined by the participants.

REQUIREMENTS FOR THE SOLUTIONS

The teams must implement (or approximate) each network layer and provide the details of the approximation in the final documentation.

Unlike last year, the solutions are required to implement the final layer.

Solutions do not need to simulate QE and CE or the network traffic explicitly. The consecutive steps (encryption, secure evaluation, and decryption) can be implemented within a single solution.

Preprocessing of plaintext DNA sequence data is not allowed except for one linear scaling operation of the data before and after decryption.

The encryption must satisfy the requirements of the HE white paper: Albrecht et al., Homomorphic Encryption Standard, https://eprint.iacr.org/2019/939.pdf
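To illustrate what approximating a layer can look like: HE schemes evaluate additions and multiplications, so a non-polynomial activation is typically replaced by a polynomial on a bounded interval. The sketch below fits a Chebyshev series to a sigmoid on [-5, 5]. The choice of sigmoid, the interval, and the degree are all assumptions made for illustration; the activations actually used by the provided model are defined in its documentation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def chebyshev_coeffs(f, lo, hi, degree, nodes=64):
    """Coefficients c_0..c_degree of the Chebyshev series of f on [lo, hi],
    computed by sampling f at Chebyshev nodes."""
    half, mid = (hi - lo) / 2.0, (hi + lo) / 2.0
    thetas = [math.pi * (j + 0.5) / nodes for j in range(nodes)]
    values = [f(mid + half * math.cos(t)) for t in thetas]
    coeffs = [
        2.0 / nodes * sum(v * math.cos(k * t) for v, t in zip(values, thetas))
        for k in range(degree + 1)
    ]
    coeffs[0] /= 2.0
    return coeffs


def chebyshev_eval(coeffs, lo, hi, x):
    """Evaluate the series with Clenshaw's recurrence, which needs only
    additions and multiplications (plus scaling by plaintext constants)."""
    t = (2.0 * x - lo - hi) / (hi - lo)  # map x into [-1, 1]
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = c + 2.0 * t * b1 - b2, b1
    return coeffs[0] + t * b1 - b2


# Usage: degree-7 approximation and its worst-case error on a grid
LO, HI, DEGREE = -5.0, 5.0, 7
coeffs = chebyshev_coeffs(sigmoid, LO, HI, DEGREE)
grid = [LO + (HI - LO) * i / 1000 for i in range(1001)]
max_err = max(abs(chebyshev_eval(coeffs, LO, HI, x) - sigmoid(x)) for x in grid)
```

The approximation is only meaningful inside the fitted interval, so the range of each layer's inputs has to be bounded, and the interval, degree, and resulting error are the kind of detail the final documentation is expected to report.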

EVALUATION CRITERIA

In the evaluation period, the submissions will be evaluated with respect to the accuracy of secure classification on a test set of 2000 DNA sequences. All submissions will also be benchmarked with respect to time usage in minutes.

A final score will be used to assign ranks to all submissions:

"auROC per exp(wall time per 20 min)" = auROC / exp(total time in minutes / 20)

* exp(total time in minutes / 20) denotes e^(total time in minutes / 20).

* Submissions that require more than 60 minutes of wall time will be excluded from further evaluation.
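The scoring rule above can be written as a small helper, which is handy for weighing accuracy against run time during development. The function name and the use of `None` to mark an excluded submission are choices made for this sketch, not part of the official evaluation code.

```python
import math


def final_score(auroc, wall_minutes, unit_minutes=20.0, limit_minutes=60.0):
    """Return auROC / exp(wall_minutes / 20), or None when the run
    exceeds the 60-minute wall-time limit and is excluded."""
    if wall_minutes > limit_minutes:
        return None
    return auroc / math.exp(wall_minutes / unit_minutes)


# Usage: the same auROC at different run times
fast = final_score(0.90, 10)   # penalty factor exp(-0.5)
slow = final_score(0.90, 40)   # penalty factor exp(-2)
```

Every additional 20 minutes of wall time divides the score by e, so a run that takes the full 60 minutes keeps only about 5% (e^-3) of its auROC.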

LICENSE

This track requires every participating team to share their code and/or binaries under the BSD 3-Clause Open Source license. The track co-organizers will upload all submitted code and/or binaries to a GitHub repository under the BSD 3-Clause License right after the competition results are announced. By submitting the code and/or binaries, the participants automatically consent to allow the track co-organizers to release the code and/or binaries under the BSD 3-Clause License.

DATA USAGE AND PUBLICATION AGREEMENT

By registering and/or participating in this challenge and receiving restricted access to the challenge dataset, members of all teams agree to abide by the following rules of data usage:

1. They will not share the challenge dataset with others.

2. They will not use the challenge dataset in any publications until after the iDASH25 Workshop concludes.

3. They will adhere to the license terms of the shared code, data, and documentation when they are used before or after the challenge period.

These are set up to ensure fairness among the participating teams.

FAQ: For questions and clarifications, please check and post on our FAQ here

Submission: Please use the link below to submit your solution:

TBA

For any questions, please contact Arif at arif.o.harmanci@uth.tmc.edu

DATA & DOCUMENTATION: Click Here

Track 2: Access Request Recording and Querying for Biomedical Datasets

Goal

To develop blockchain-based smart contracts for managing biomedical data requests, to facilitate the continuity/efficiency of biomedical, clinical, and genomic research [1].

Challenge

Multiple smart contracts must be implemented to manage both data (i.e., data requests) and dictionaries (e.g., list of principal investigators).

All data/dictionaries and intermediary data/dictionaries (e.g., index or cache of the original data/dictionaries) must be stored entirely on-chain via smart contracts (i.e., no off-chain data storage is allowed).

We will provide the skeleton of the smart contracts.

The system must manage data requests from research institutions while enforcing specific data access requirements around dataset ownership, Principal Investigator (PI) credentials, and Data Use Agreements (DUA).

Each participant can determine how each insertion is represented and stored in the smart contracts.

Participants can implement any algorithm to store, retrieve and present the data/dictionaries correctly and efficiently.

Users should be able to query the data from any of the blockchain nodes.

Evaluation Criteria

The data access request system will need to demonstrate satisfactory performance (i.e., 100% accurate query results) on a test dataset, which will be different from the training set provided online.

We will evaluate the efficiency of each solution using insertion and query times.

We will insert data requests for a specified time frame before conducting the queries.

We will only grade the accuracy and storage/retrieval speed on data but not dictionaries (i.e., we will pre-store the dictionaries on-chain before evaluating participants’ smart contracts).

Experimental Setting

Given a set of institutional biomedical data requests, design a time/space efficient data structure and mechanisms to manage (i.e., store and retrieve) these requests based on Ethereum Solidity smart contracts.
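One way to reason about the time/space trade-off before writing Solidity is to prototype the data structure off-chain. The Python model below is a hypothetical sketch: an append-only record list plus an inverted index per (field, value) pair, which corresponds loosely to a dynamic array and a `mapping` in Solidity. The field names (`institution`, `pi`, `dataset`) are invented for illustration; the real request fields are defined by the provided contract skeleton.

```python
from collections import defaultdict


class RequestStore:
    """Append-only store with an inverted index per (field, value) pair.

    Insertion costs one index append per indexed field (extra storage);
    queries then avoid scanning every record.
    """

    def __init__(self, indexed_fields):
        self.indexed_fields = tuple(indexed_fields)
        self.records = []                # record id == position in this list
        self.index = defaultdict(list)   # (field, value) -> ascending ids

    def insert(self, record):
        rid = len(self.records)
        self.records.append(dict(record))
        for field in self.indexed_fields:
            self.index[(field, record[field])].append(rid)
        return rid

    def query(self, **filters):
        """Return, in insertion order, the records matching every filter."""
        unknown = set(filters) - set(self.indexed_fields)
        if unknown:
            raise KeyError(f"fields not indexed: {sorted(unknown)}")
        if not filters:
            return list(self.records)
        # Start from the shortest id list to keep intersections small.
        id_lists = sorted(
            (self.index.get((f, v), []) for f, v in filters.items()), key=len
        )
        hits = set(id_lists[0])
        for ids in id_lists[1:]:
            hits &= set(ids)
        return [self.records[i] for i in sorted(hits)]


# Usage with hypothetical field names
store = RequestStore(["institution", "pi", "dataset"])
store.insert({"institution": "inst-A", "pi": "pi-1", "dataset": "ds-1"})
store.insert({"institution": "inst-B", "pi": "pi-1", "dataset": "ds-2"})
store.insert({"institution": "inst-A", "pi": "pi-2", "dataset": "ds-1"})
by_pi = store.query(pi="pi-1")
by_pi_and_dataset = store.query(pi="pi-1", dataset="ds-2")
```

Indexing more fields speeds up queries at the price of slower, more storage-heavy insertions, and both insertion and query times are measured in this track, so the set of indexed fields is a design decision to benchmark rather than a given.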

Additional Rules

The submission should include 3 files of Solidity source code per team.

All 3 smart contracts must be implemented.

Reuse of any external code/library must follow the license agreements of both the code/library and our track (please see the License section below for more details), and reused code blocks must be clearly and explicitly cited using Solidity comments.

One person can only participate in one team. All team members' names and emails must be listed in a Solidity comment in the submitted code file, and team membership cannot be changed after the submission due date. Multiple submissions are allowed before the due date, but only the last submission will be evaluated. Any solution-related communication across teams must also be disclosed in comments. To ensure fairness, solutions from teams that do not follow the rules above will not be evaluated.

Data Usage and Publication Agreement

By registering and/or participating in this challenge and receiving restricted access to the challenge dataset, members of all teams agree to abide by the following rules of data usage: (1) They will not share the challenge dataset with others. (2) They will not use the challenge dataset in any publications until after the iDASH 2025 Workshop concludes. These are set up to ensure fairness among the participating teams.

License

This track requires every participating team to share their code and/or binaries under the BSD 3-Clause Open Source license. The track co-organizers will upload all submitted code and/or binaries to a GitHub repository under the BSD 3-Clause License right after the competition results are announced. By submitting the code and/or binaries, the participants automatically consent to allow the track co-organizers to release the code and/or binaries under the BSD 3-Clause License.

Data Skeleton

Click Here

FAQ:

For questions and clarifications, please check and post on our FAQ here

References

[1] Yu Y, Edelson M, Pham A, Pekar JE, Johnson B, Post K, Kuo T-T. Distributed, immutable, and transparent biomedical limited data set request management on multi-capacity network. Journal of the American Medical Informatics Association. 2024. doi: 10.1093/jamia/ocae288. PubMed PMID: 39569448.