2009 Volume 18 Issue 1

Gao Jie, Xu Zhen-Yuan. 2009: Chaos game representation (CGR)-walk model for DNA sequences, Chinese Physics B, 18(1): 370-376.

Chaos game representation (CGR)-walk model for DNA sequences

  • Available Online: 30/01/2009
  • Fund Project: the National Natural Science Foundation of China (Grant 60575038) and the Natural Science Foundation of Jiangnan University, China (Grant 20070365)



Abstract: Chaos game representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, in order to determine the coordinates of their positions in a continuous space. This distribution of positions has two features: it is unique, and the source sequence can be recovered from the coordinates, so that the distance between positions may serve as a measure of similarity between the corresponding sequences. A CGR-walk model for DNA sequences is proposed based on the CGR coordinates. The CGR coordinates are converted into a time series, and a long-memory ARFIMA(p, d, q) model, where ARFIMA stands for autoregressive fractionally integrated moving average, is introduced into DNA sequence analysis. This model is applied to simulate real CGR-walk sequence data from ten genomic sequences. Remarkable long-range correlations are uncovered in the data, and the results are fitted reasonably well by the ARFIMA(p, d, q) model.
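The two properties mentioned in the abstract, iterative mapping and recoverability of the source sequence, can be illustrated with a minimal sketch. It assumes the commonly used CGR convention (start at the centre of the unit square and move halfway towards the corner of each nucleotide); the corner assignment and the function names `cgr_coordinates`, `decode_last_point`, and `frac_diff_weights` are illustrative assumptions, not taken from the paper. The last function lists the coefficients of the fractional differencing operator (1 - B)^d that gives ARFIMA its long memory; how the authors build the CGR-walk series and estimate d is not stated in the abstract and is not reproduced here.

```python
from fractions import Fraction

# Assumed corner assignment (a common CGR convention, not confirmed by the abstract).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_coordinates(seq):
    """Return the CGR point reached after each nucleotide of seq."""
    x, y = Fraction(1, 2), Fraction(1, 2)  # start at the centre of the square
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway towards the corner
        points.append((x, y))
    return points


def decode_last_point(point, length):
    """Recover a sequence of the given length from its final CGR point."""
    by_corner = {corner: base for base, corner in CORNERS.items()}
    x, y = point
    bases = []
    for _ in range(length):
        # The half of the square the point lies in identifies the last corner.
        cx, cy = int(x > Fraction(1, 2)), int(y > Fraction(1, 2))
        bases.append(by_corner[(cx, cy)])
        x, y = 2 * x - cx, 2 * y - cy  # undo the half-step
    return "".join(reversed(bases))


def frac_diff_weights(d, n):
    """First n coefficients of (1 - B)**d, with B the backshift operator."""
    weights = [1.0]
    for k in range(1, n):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


if __name__ == "__main__":
    seq = "GATTACA"
    points = cgr_coordinates(seq)
    print(points[-1])
    print(decode_last_point(points[-1], len(seq)))
    print(frac_diff_weights(0.3, 5))
```

Exact rational arithmetic (`Fraction`) is used so that decoding is lossless; with floating-point coordinates the recovery would only be reliable for sequences short enough to fit the available precision. For d = 1 the weights reduce to ordinary first differencing, while for fractional d they decay slowly, which is what produces long-range dependence.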
