Competition tasks - IDASH PRIVACY & SECURITY WORKSHOP 2026

Track 1: Secure Gene Expression Generation by Decoder Evaluation

Background

As the demand grows for privacy-preserving computation on large machine learning models—particularly in biomedical and genomics applications—homomorphic encryption (HE) has emerged as a powerful approach for enabling secure evaluation of sensitive data.

In this year’s challenge, participants are tasked with developing secure HE-based methods to evaluate the decoder component of a variational autoencoder (VAE) designed to reconstruct gene expression profiles from latent embeddings. The goal is to protect the confidentiality of the input embeddings: given encrypted latent representations, participants’ methods should securely compute the decoder output, returning encrypted gene expression reconstructions equivalent to those obtained in the plaintext domain.

Unlike prior challenges focused on convolutional architectures, this year emphasizes continuous-valued latent representations and dense neural decoding, which are increasingly common in single-cell and bulk transcriptomics modeling. Successfully addressing this challenge will demonstrate the feasibility of secure evaluation for generative models in biomedical AI, contributing to privacy-preserving analysis pipelines for sensitive omics data.

Goal

In this challenge, there are 2 entities:

1. QE (Querying Entity): Holds latent embeddings. These embeddings are considered sensitive and must remain private.

2. CE (Computing Entity): Holds the trained VAE decoder model.

QE encrypts the latent embeddings using a public-key HE scheme. CE evaluates the VAE decoder entirely in the encrypted domain. CE returns encrypted reconstructed gene expression vectors. QE decrypts the results.

Challenge

During the challenge period, teams will be provided with:

1. A pre-trained VAE decoder model (Python/PyTorch implementation with full documentation).

2. Model parameters (weights and biases for all layers).

3. A sample dataset of embeddings demonstrating the expected input format.

4. Example plaintext outputs for validation and debugging.

Participants are not required to train the model. The focus is entirely on secure inference of the decoder.

Data Formatting

Input: A matrix of latent embeddings (e.g., 2004 samples × latent dimension). Provided as a text or binary file (format specified in documentation). Each row corresponds to a sample embedding vector.

Model: The decoder is provided as a PyTorch model consisting of fully connected layers (and possibly nonlinear activations). Documentation will describe architecture, layer ordering, and parameter usage.

Output: The output (after decryption) should be a matrix of reconstructed gene expression values. Each row corresponds to a sample, and each column corresponds to a gene. Output format can be defined by participants but must align with evaluation scripts.
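As a plaintext point of reference for the shapes described above, the sketch below applies a stack of fully connected layers to a batch of embeddings. It is only an illustration: the layer sizes, the `(in_dim, out_dim)` weight layout, the function name `dense_decoder`, and the square activation are assumptions, not the provided model, whose architecture is defined in the challenge documentation.

```python
import numpy as np

def dense_decoder(z, layers, activation):
    """Plaintext reference: apply fully connected layers to a batch of embeddings.

    z: array of shape (n_samples, latent_dim), one embedding per row.
    layers: list of (W, b) pairs, with W of shape (in_dim, out_dim) (assumed layout).
    activation: elementwise function applied after every layer except the last.
    """
    h = z
    for i, (W, b) in enumerate(layers):
        h = h @ W + b                      # dense layer: matrix product plus bias
        if i < len(layers) - 1:
            h = activation(h)              # nonlinearity between layers only
    return h

# Hypothetical sizes, chosen only to keep the example small.
rng = np.random.default_rng(0)
latent_dim, hidden_dim, n_genes = 8, 16, 32
layers = [
    (rng.normal(size=(latent_dim, hidden_dim)), np.zeros(hidden_dim)),
    (rng.normal(size=(hidden_dim, n_genes)), np.zeros(n_genes)),
]
z = rng.normal(size=(5, latent_dim))       # 5 samples, one per row

# The square function stands in for any polynomial (HE-friendly) activation.
x_hat = dense_decoder(z, layers, activation=lambda h: h * h)
print(x_hat.shape)                          # rows are samples, columns are genes
```

Each dense layer reduces to additions and multiplications, which is why this kind of decoder maps naturally onto HE-compatible operations.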

Requirements for the Solutions

Teams must implement (or approximate) all decoder layers using HE-compatible operations

Any nonlinearities (e.g., ReLU, sigmoid, tanh) must be approximated using polynomial or other HE-friendly methods, and details must be documented

Solutions must perform end-to-end secure evaluation of the decoder

Explicit simulation of QE/CE communication is not required; implementations may be consolidated

Only linear scaling of inputs/outputs is allowed outside the encrypted domain (before encryption and after decryption)

The encryption scheme must comply with the HE standard: Albrecht et al., Homomorphic Encryption Standard (https://eprint.iacr.org/2019/939.pdf)
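To illustrate the requirement that nonlinearities be approximated by polynomials, the sketch below fits a low-degree polynomial to the sigmoid function in plaintext and reports its worst-case error. The choice of sigmoid, the interval [-8, 8], the degree, and the helper name `fit_polynomial` are assumptions made for the example; the actual activations and input ranges depend on the provided model.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_polynomial(fn, lo, hi, degree, n_points=2001):
    """Least-squares fit of fn on [lo, hi] in the Chebyshev basis."""
    xs = np.linspace(lo, hi, n_points)
    return Chebyshev.fit(xs, fn(xs), deg=degree, domain=[lo, hi])

# Assumed interval and degree; both trade accuracy against multiplicative depth.
approx = fit_polynomial(sigmoid, -8.0, 8.0, degree=9)

# Measure the worst-case error on a dense grid inside the fitted interval.
xs = np.linspace(-8.0, 8.0, 1001)
max_err = float(np.max(np.abs(approx(xs) - sigmoid(xs))))
print(f"max abs error on [-8, 8]: {max_err:.4f}")
```

The approximation is only meaningful inside the fitted interval, so the range of each layer's pre-activations should be checked on the sample embeddings before fixing the interval and degree.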

Evaluation Criteria

- Accuracy as average concordance correlation coefficient (CCC) between the known and predicted expression profiles. Given x and y are the known and predicted expression profiles, CCC (ρ_c) is calculated as:

$$ \rho_c = \frac{2\rho \sigma_x \sigma_y} {\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2} $$

where ρ is the Pearson correlation between x and y, σ_x and μ_x are the standard deviation and mean of x, and σ_y and μ_y are the standard deviation and mean of y.

- Runtime (Walltime in minutes)

Time-normalized accuracy $ \left( \frac{\rho_c^2} {\exp(t/20)} \right) $, where t is the runtime in minutes, will be used to rank submissions. Any submission with a runtime above 60 minutes will be disqualified.
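The two formulas above can be checked locally with a few lines of NumPy. The sketch below is not the official evaluation script: the function names are illustrative, the population variance (`ddof=0`) is assumed, and averaging CCC over rows (one profile per sample) is also an assumption.

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient between two profiles."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()              # population variance (assumed)
    cov_xy = np.mean((x - mu_x) * (y - mu_y))    # equals rho * sigma_x * sigma_y
    return 2.0 * cov_xy / (var_x + var_y + (mu_x - mu_y) ** 2)

def mean_ccc(known, predicted):
    """Average CCC over rows; treating each row as one profile is an assumption."""
    return float(np.mean([ccc(k, p) for k, p in zip(known, predicted)]))

def time_normalized_accuracy(rho_c, minutes):
    """Ranking score rho_c^2 / exp(t / 20), with t in minutes."""
    return rho_c ** 2 / np.exp(minutes / 20.0)

known = np.array([[1.0, 2.0, 3.0], [0.5, 1.5, 2.5]])
predicted = known + 1.0                          # perfectly correlated, but shifted
print(mean_ccc(known, predicted))
print(time_normalized_accuracy(0.95, minutes=10.0))
```

Because the score divides by exp(t/20), every additional 20 minutes of runtime reduces it by a factor of e, so a small loss in CCC can be worth a large reduction in runtime.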

License

This track requires every participating team to share their code and/or binaries under the BSD 3-Clause Open Source License.

All submissions will be released in a public repository after the competition. By submitting, participants consent to this release.

Data Usage

By participating, teams agree:

1. Not to share the challenge dataset.

2. Not to use the dataset in publications before the iDASH26 Workshop concludes.

3. To comply with licensing terms of all shared materials.

Contact

For any questions, please contact Arif at arif.o.harmanci@uth.tmc.edu

Data and Model

https://drive.google.com/file/d/1pp6sFy1G65865La7vPq7WLyExrYgXb6n/view?usp=sharing

FAQ

https://docs.google.com/document/d/1Xn4EhrJI3D8GxyBckSosRlSO52Q6D5-usQGSgBWbTwM/edit?usp=sharing

Track 2: Gene-Disease Research Activity Logging System for Collaborative Data Analysis

Goal

To develop a blockchain-based smart contract for recording collaborative data analysis for gene-disease research activities [1] [2].

Experimental Setting

Given a set of collaborative gene-disease analysis activities, design an efficient data structure and mechanisms to manage (i.e., store and retrieve) these requests within an Ethereum Solidity smart contract.

Challenge

Participants will implement a single smart contract to record the gene-disease research activities. All data and intermediary data/dictionaries (e.g., index or cache of the original data) must be stored entirely on-chain via smart contracts (i.e., no off-chain data storage is allowed). We will provide the skeleton of the smart contract. The system must manage the activities from research institutions while enforcing the specific order of the activities (i.e., an institution first asks a research question, followed by each institution providing their local results, and finally the original institution aggregates the results). The results will be evaluated based on the number of complete gene-disease question-and-answer records. Participants can determine how each insertion is represented and stored in the smart contract. Participants can implement any algorithm to store, retrieve and present the data correctly and efficiently. Users should be able to query the data from any of the blockchain nodes.
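The required ordering of activities can be modeled off-chain before writing any Solidity. The Python sketch below is only a model of the logic, not the provided contract skeleton: the class and method names are hypothetical, and it assumes that every registered institution (including the one that asked) must submit a local result before aggregation is allowed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Record:
    asker: str                                   # institution that asked the question
    question: str
    local_results: Dict[str, str] = field(default_factory=dict)
    aggregated: Optional[str] = None             # set once the asker aggregates

class ActivityLog:
    """Toy model of the question -> local results -> aggregation ordering."""

    def __init__(self, institutions):
        self.institutions = set(institutions)
        self.records = {}

    def ask(self, qid, institution, question):
        if institution not in self.institutions or qid in self.records:
            return False
        self.records[qid] = Record(institution, question)
        return True

    def submit_local(self, qid, institution, result):
        rec = self.records.get(qid)
        if rec is None or rec.aggregated is not None:
            return False                         # no such question, or already closed
        if institution not in self.institutions or institution in rec.local_results:
            return False                         # unknown institution or duplicate
        rec.local_results[institution] = result
        return True

    def aggregate(self, qid, institution, result):
        rec = self.records.get(qid)
        if rec is None or rec.aggregated is not None:
            return False
        if institution != rec.asker:
            return False                         # only the asking institution aggregates
        if set(rec.local_results) != self.institutions:
            return False                         # assumed: all local results must be in
        rec.aggregated = result
        return True

    def complete_records(self):
        return [qid for qid, r in self.records.items() if r.aggregated is not None]

log = ActivityLog(["A", "B"])
log.ask("q1", "A", "gene X vs disease Y")
log.submit_local("q1", "A", "local result A")
log.submit_local("q1", "B", "local result B")
log.aggregate("q1", "A", "aggregated result")
print(log.complete_records())
```

In a contract, out-of-order calls would typically be rejected rather than return a flag, and the lookup structures would be chosen with storage and gas cost in mind.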

Evaluation Criteria

The data access request system will need to demonstrate satisfactory performance (i.e., 100% accurate query results for complete records) on a test dataset, which will be different from the training set provided online. We will evaluate the efficiency of each solution using insertion and query performance. We will insert data requests for a specified time frame before conducting the queries.

Additional Rules

Each team should submit one Solidity source code file. Any reuse of external code or libraries must comply with the license agreements of both the code/library and our track (see the License section below), and reused code blocks must be clearly and explicitly cited in Solidity comments. A person may participate in only one team. The names and emails of all team members must be listed in a Solidity comment in the submitted code file, and team membership cannot be changed after the submission due date. Teams may submit multiple times before the due date, but only the last submission will be evaluated. Any solution-related communication across teams must also be disclosed in comments. For fairness, solutions from teams that do not follow these rules will not be evaluated.

License

This track requires every participating team to share their code and/or binaries under the BSD 3-Clause Open Source License. The track co-organizers will upload all submitted code and/or binaries to a GitHub repository under the BSD 3-Clause License right after the competition results are announced. By submitting the code and/or binaries, the participants automatically consent to allow the track co-organizers to release the code and/or binaries under the BSD 3-Clause License.

Data Usage and Publication Agreement

By registering and/or participating in this challenge and receiving restricted access to the challenge dataset, members of all teams agree to abide by the following rules of data usage: (1) They will not share the challenge dataset with others. (2) They will not use the challenge dataset in any publications until after the iDASH 2026 Workshop concludes. These are set up to ensure fairness among the participating teams.

Data and Code Skeleton

https://drive.google.com/file/d/1HG4mOZTM1JNkgnnRTU-8HhIfHQrmmcgU/view?usp=sharing

FAQ

https://docs.google.com/document/d/1gKiK0yP8GVnyQHIp1ZYHFj0XPZ4WBpjYth26tNprOiU/

References

[1] Tsung-Ting Kuo, Anh Pham, Maxim E Edelson, Jihoon Kim, Jason Chan, Yash Gupta, Lucila Ohno-Machado, The R2D2 Consortium, Blockchain-enabled immutable, distributed, and highly available clinical research activity logging system for federated COVID-19 data analysis from multiple institutions, Journal of the American Medical Informatics Association, Volume 30, Issue 6, June 2023, Pages 1167–1178, https://doi.org/10.1093/jamia/ocad049.

[2] B.D. Solomon, A. Nguyen, K.A. Bear, & T.G. Wolfsberg, Clinical Genomic Database, Proc. Natl. Acad. Sci. U.S.A. 110 (24) 9851-9855, https://doi.org/10.1073/pnas.1302575110 (2013).

Track 3: Accelerating MPC-Based Deep Learning for Drug-Target Interaction Prediction

Organized by Haixu Tang (Indiana University), Chenghong Wang (Indiana University), and XiaoFeng Wang (Nanyang Technological University)

Goal

To develop an accelerated two-party MPC implementation of a deep learning drug-target interaction (DTI) predictor that retains prediction accuracy comparable to the original model.

Background

The identification of Drug-Target Interactions (DTI) serves as a foundational pillar in modern pharmacology and drug discovery. By accurately predicting the binding affinity between potential drug candidates and specific protein targets, researchers can significantly narrow down the vast chemical space to identify molecules that effectively modulate biological pathways. This process is essential for understanding the mechanism of action of new therapeutics, repurposing existing drugs for new indications, and anticipating potential off-target effects that might lead to toxicity. As the cost and time required for traditional experimental screening continue to rise, computational DTI prediction has become an indispensable tool for accelerating the delivery of life-saving treatments to patients.

The primary motivation for adopting Secure Multiparty Computation (MPC) in DTI prediction stems from the need for pharmaceutical companies to collaboratively build robust DTI prediction models without exposing their highly sensitive, proprietary internal screening data. As demonstrated by the pioneering work of Brian Hie and colleagues [1], MPC frameworks using secret sharing protocols allow multiple entities to pool the predictive power of their disparate datasets while ensuring that raw chemical structures and protein targets remain encrypted and private. By distributing data as “shares” across non-colluding servers, this approach enables the training of models on a global scale of information that no single company could possess alone, effectively overcoming the “data silo” problem that historically hindered drug discovery.
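The idea of distributing data as shares can be illustrated with additive secret sharing over a fixed ring. The sketch below is a generic textbook construction, not the protocol of [1]: the 64-bit modulus and the function names are assumptions, and all shares live in one process purely for demonstration.

```python
import secrets

MODULUS = 2 ** 64   # assumed ring size; real frameworks choose this with care

def share(value, n_parties=2):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def reconstruct(shares):
    """Recover the secret by adding all shares modulo MODULUS."""
    return sum(shares) % MODULUS

def add_shares(a, b):
    """Each party adds its own shares locally; no communication is needed."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]

a_shares = share(1234)
b_shares = share(4321)
print(reconstruct(add_shares(a_shares, b_shares)))   # sum of the two secrets
```

Addition is local and cheap, while multiplying two shared values requires interaction between the parties (for example, with precomputed multiplication triples), which is where most of the cost of secure neural network evaluation arises.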

Notably, in the original study of the MPC approach, conventional machine learning models were used for DTI prediction. The recent advances of deep neural networks have led to many new models with superior performance. For instance, Convolutional Neural Networks (CNNs) were employed in tools like DeepDTA [2] to model the local patterns in protein sequences and SMILES strings by treating them as structural motifs. More recently, the focus has shifted toward Transformer-based models [3], which utilize sub-structure pattern mining and self-attention mechanisms to capture long-range dependencies. These deep learning advancements offer superior predictive accuracy and scalability, though they pose significant computational challenges when integrated into MPC frameworks. While MPC is traditionally CPU-bound due to complex secret sharing logic, the integration of high-throughput hardware such as GPUs or Data Processing Units (DPUs) provides a path to significantly reduce latency and increase throughput.

Challenge

This challenge focuses on accelerating Secure Multiparty Computation (MPC) for DTI prediction by combining algorithmic approximation with GPU parallelization. Specifically, we invite participating teams to implement a two-party secret-sharing framework that approximates the DTI predictor DeepDTAGen [4]. The objective is to achieve prediction accuracy comparable to the original DeepDTAGen model while significantly improving computational efficiency through hardware-aware GPU optimization and algorithmic shortcuts.

Evaluation Criteria

We will use two evaluation criteria:

1. The accuracy of the MPC implementation of the DTI prediction, measured as the average of the sensitivity and specificity.

2. The fold of acceleration of the MPC implementation on a specific test platform (virtual machine) of two workstations, each with an H100 GPU (detailed configuration will be available later).

A solution qualifies if its prediction accuracy on the test data is no more than 2% lower than that of the original DeepDTAGen model. Qualified solutions will be ranked by their fold of acceleration.
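The criteria above can be sketched as follows. This is not the official scoring code: the function names are illustrative, the 2% threshold is read as an absolute difference in accuracy, the baseline runtime used for the fold of acceleration is an assumption, and the way predictions are turned into binary labels is not specified here.

```python
def balanced_accuracy(y_true, y_pred):
    """Average of sensitivity and specificity for binary labels (1 = interaction)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0

def is_qualified(mpc_accuracy, reference_accuracy, tolerance=0.02):
    """Assumed reading: at most 2 percentage points below the reference model."""
    return mpc_accuracy >= reference_accuracy - tolerance

def fold_acceleration(baseline_seconds, optimized_seconds):
    """Speedup relative to a baseline runtime (the baseline is an assumption)."""
    return baseline_seconds / optimized_seconds

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
acc = balanced_accuracy(y_true, y_pred)
print(acc, is_qualified(acc, reference_accuracy=0.76), fold_acceleration(120.0, 8.0))
```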

Experimental Setting

The revised DeepDTAGen implementation and model weights are available at https://github.com/Lang280/DeepDTAGen#. Please use the two sample test datasets to evaluate the model performance: 1) Davis test set: https://drive.google.com/file/d/154sC0nQ4-Cr54vAVMZjbOhPlS3eaFoK-/view?usp=drive_link and 2) KIBA test set: https://drive.google.com/file/d/1XYEx1-zRmoqiAMWFesEntq_MKi-3zhin/view?usp=drive_link. A held-out test dataset (not shared with participating teams) will be used to evaluate submitted solutions. In the submitted solution, each team should describe the expected data format for running the program.

Test Platform

TBD

FAQ

https://docs.google.com/document/d/1Ntqy-_CVMB-T_48Zsp54ZPIxrB39ENNRyKukguM16ac/edit?usp=sharing

References

[1] Hie, B., Cho, H. and Berger, B., 2018. Realizing private and practical pharmacological collaboration. Science, 362(6412), pp.347-350.

[2] Öztürk, Hakime, Arzucan Özgür, and Elif Ozkirimli. "DeepDTA: deep drug–target binding affinity prediction." Bioinformatics 34, no. 17 (2018): i821-i829.

[3] Zhang, P., Wei, Z., Che, C. and Jin, B., 2022. DeepMGT-DTI: Transformer network incorporating multilayer graph information for Drug–Target interaction prediction. Computers in biology and medicine, 142, p.105214.

[4] Shah, P.M., Zhu, H., Lu, Z., Wang, K., Tang, J. and Li, M., 2025. DeepDTAGen: a multitask deep learning framework for drug-target affinity prediction and target-aware drugs generation. Nature Communications, 16(1), p.5021.
