RSGDream Conference

Predicting gene expression from random promoter sequences

Our paper has been published in BMC Bioinformatics! [paper]

We developed a transformer-based model that predicts gene expression from promoter sequence data. The work earned third place at the RSGDream Conference in Las Vegas, 2022. [code]

This was my first academic conference, and I found it thoroughly captivating. Discussing our research during coffee breaks, listening to keynote speeches, and comparing our approach with those of fellow researchers were invigorating moments that solidified my passion for academia.

RSGDream 2022, Las Vegas

We began with data preprocessing. I noticed that the same segments were repeated at the start and end of the DNA sequences; removing them improved performance by 5%. On the modeling side, we started with N-gram models, then explored RNN, LSTM, GRU, and transformer-based models, and subsequently looked into efficient models for DNA sequence analysis. To reduce variance, we adopted the masking technique from BERT models. While pretrained embeddings are abundant for language tasks, the same does not hold for DNA sequence data, so we had to construct embeddings from scratch.
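To make the preprocessing and masking steps concrete, here is a minimal sketch, not our actual pipeline. It assumes the repeated segments are an identical prefix and suffix shared by every sequence, that sequences are tokenized into overlapping k-mers (the choice of k-mers and of k=3 is an illustrative assumption), and that masking always substitutes a "[MASK]" token, which simplifies the BERT recipe. The function names strip_shared_flanks, kmer_tokens, and mask_tokens are hypothetical.

```python
import os
import random


def strip_shared_flanks(seqs):
    """Remove the prefix and suffix shared by every sequence.

    Assumption: the repeated segments are identical across all sequences.
    """
    # os.path.commonprefix compares strings character by character.
    prefix = os.path.commonprefix(seqs)
    # Reverse each sequence so the same helper finds the shared suffix.
    suffix = os.path.commonprefix([s[::-1] for s in seqs])[::-1]
    return [s[len(prefix):len(s) - len(suffix)] for s in seqs]


def kmer_tokens(seq, k=3):
    """Split a sequence into overlapping k-mers (hypothetical tokenization)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", rng=None):
    """Simplified BERT-style masking.

    Each token is replaced by mask_token with probability mask_rate.
    labels holds the original token at masked positions and None elsewhere,
    so a model can be trained to recover only the hidden tokens.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            masked.append(mask_token)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


if __name__ == "__main__":
    # Toy sequences: shared prefix "TGCAT" and shared suffix "GGTT".
    raw = ["TGCATAAACGGTT", "TGCATCGTAGGTT"]
    trimmed = strip_shared_flanks(raw)
    tokens = kmer_tokens(trimmed[0])
    masked, labels = mask_tokens(tokens, rng=random.Random(0))
    print(trimmed, tokens, masked, labels)
```

Because no pretrained embeddings were available, the k-mer vocabulary produced by a tokenizer like this would be mapped to integer ids and fed to an embedding layer trained together with the rest of the model.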

Secured third place
A small insight emerged: theoretical understanding and practical execution are equally important.