zoukankan      html  css  js  c++  java
  • 如何提取大序列文件的子集

    有时候需要比较大的序列文件里提取出它的子集,通常是根据序列号来操作的,上代码:

    #存入字典
    def read_to_dict(in_file):
        sequence = {}
        ac = ''
        seq = ''
        for line in open(in_file):
            if line.startswith('>') and seq != '':
                sequence[ac] = seq
                seq = ''
            if line.startswith('>'):
                ac = line.strip().split()[1] #处理序列key命名,视情况而定
            else:
                seq = seq + line.strip()
        sequence[ac] = seq
        return sequence
    
    #待提取序列号读取入列表
    def read_to_list(in_file2):
        list2 = []
        with open(in_file2, 'r', encoding='utf-8') as list1:
            for line in list1:
                list2.append(line.strip())
        return list2
    
    #提取
    other_features = read_to_dict('other_genomic.fasta')
    list = read_to_list('search_results.txt')
    handle = open("selected.txt", "w")
    for acc in list:
        if acc in other_features.keys():
            handle.write('>' + acc + '
    ' + other_features[acc] + '
    ')
        else:
            print(acc)
    handel.close()

    原文件:

    >ABC102 ABC102 SPXID:S000121252, Chr I from 608-776, Genome Release 55-0-1, ABC, “Autonomously Replicating Sequence”
    TATACACTTATGTCAATATTACAGAAAAATCCCCACAAAAATCACCTAAACATAAAAATATTCTACTTTT

    待提取序列:

    ABC301
    ABC305
    ABC306
    ABC901

    输出结果:

    >ABC301
    TATACACTTATGTCAATATTACAGAAAAATCCCCACAAAAATCACCTAAACATAAAAATATTCTACTTTT

    BioPython里也有类似的操作,但是输出效果不好,不如上面的灵活。

    >>> from Bio import SeqIO
    >>> uniprot = SeqIO.index("other_genomic.fasta", "fasta")
    >>> handle = open("selected.txt", "w")
    >>> for acc in ["ABC301", "ABC305", "ABC306", "ABC901", ... "ABC998"]:
    ...     handle.write(uniprot.get_raw(acc))
    >>> handle.close()


    原文链接:https://blog.csdn.net/liuninghua521/article/details/106780991

  • 相关阅读:
    图算法(一)——基本图算法(BFS,DFS及其应用)(1)
    设置drawable图片
    view的加载
    onSaveInstanceState和onRestoreInstanceState
    android中的生命周期(新增2个函数)
    int integer string间的转换
    andorid中Html.fromHtml方法
    android获取inflater
    android中获取string字符串的方法
    c#基础试题
  • 原文地址:https://www.cnblogs.com/LQZ888/p/13153724.html
Copyright © 2011-2022 走看看