supamas: 2010

Thursday, 21 January 2010

Assignment 6

Select one sequence of interest from the database (the sequence should be longer than 300 base pairs), run BLAST searches with it, and answer the following questions:
a. What are the differences between the 6 BLASTs (blastn, blastp, blastx, tblastn, tblastx, PSI-BLAST)?
b. Use your sequence to run 3 of the 6 BLASTs and discuss: "What are the strengths and weaknesses of the BLASTs you have selected?"
c. Show the first hit from each BLAST with its identity and/or similarity scores.
d. Summarize the results from the 3 BLASTs you selected.

BLAST, or Basic Local Alignment Search Tool, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. A BLAST search lets a researcher compare a query sequence with a library or database of sequences and identify library sequences that resemble the query above a certain threshold. For example, after discovering a previously unknown gene in the mouse, a scientist will typically run a BLAST search against the human genome to see whether humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on sequence similarity. BLAST is actually a family of programs, including:
1. blastn (Nucleotide-nucleotide BLAST)
This program, given a DNA query, returns the most similar DNA sequences from the DNA database that the user specifies.
2. blastp (Protein-protein BLAST)
This program, given a protein query, returns the most similar protein sequences from the protein database that the user specifies.
3. blastx (Nucleotide 6-frame translation-protein)
This program compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.
4. tblastn (Protein-nucleotide 6-frame translation)
This program compares a protein query against a nucleotide sequence database translated in all six reading frames.
5. tblastx (Nucleotide 6-frame translation-nucleotide 6-frame translation)
This program is the slowest of the BLAST family. It translates the query nucleotide sequence in all six possible frames and compares it against the six-frame translations of a nucleotide sequence database. The purpose of tblastx is to find very distant relationships between nucleotide sequences.
6. PSI-BLAST (Position-Specific Iterative BLAST)
This program is used to find distant relatives of a protein. First, a list of all closely related proteins is created. These proteins are combined into a general “profile” sequence, which summarises significant features present in these sequences. A query against the protein database is then run using this profile, and a larger group of proteins is found. This larger group is used to construct another profile, and the process is repeated. By including related proteins in the search, PSI-BLAST is much more sensitive in picking up distant evolutionary relationships than a standard protein-protein BLAST.
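The main difference between the six programs is the type of the query and the type of the database. As a minimal sketch of that rule (the dictionary and function names are made up for illustration and are not part of any BLAST software), the choice can be written as a lookup table:

```python
# Hypothetical helper: which BLAST programs fit a given query/database pair.
# The pairing follows the descriptions in the list above.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("protein", "protein"): ["blastp", "PSI-BLAST"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
}


def candidate_programs(query_type, database_type):
    """Return the BLAST programs that accept this query/database combination."""
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"unknown combination: {key}")
    return BLAST_PROGRAMS[key]


# A protein query (such as BRCA1) searched against a DNA database:
print(candidate_programs("protein", "nucleotide"))
```

Within one cell of the table, the choice depends on how distant the expected relationship is: tblastx and PSI-BLAST are the more sensitive, and slower, options.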

To compare the strengths and weaknesses of the individual BLAST programs, blastp, PSI-BLAST and tblastn were chosen here and run with the same query sequence:
gi6552299 refNP_009225.1 breast cancer 1, early onset isoform 1 [Homo sapiens] 1863 aa
MDLSALRVEE VQNVINAMQK ILECPICLEL IKEPVSTKCD HIFCKFCMLK LLNQKKGPSQ
CPLCKNDITK RSLQESTRFS QLVEELLKII CAFQLDTGLE YANSYNFAKK ENNSPEHLKD
EVSIIQSMGY RNRAKRLLQS EPENPSLQET SLSVQLSNLG TVRTLRTKQR IQPQKTSVYI
ELGSDSSEDT VNKATYCSVG DQELLQITPQ GTRDEISLDS AKKAACEFSE TDVTNTEHHQ
PSNNDLNTTE KRAAERHPEK YQGSSVSNLH VEPCGTNTHA SSLQHENSSL LLTKDRMNVE
KAEFCNKSKQ PGLARSQHNR WAGSKETCND RRTPSTEKKV DLNADPLCER KEWNKQKLPC
SENPRDTEDV PWITLNSSIQ KVNEWFSRSD ELLGSDDSHD GESESNAKVA DVLDVLNEVD
EYSGSSEKID LLASDPHEAL ICKSERVHSK SVESNIEDKI FGKTYRKKAS LPNLSHVTEN
LIIGAFVTEP QIIQERPLTN KLKRKRRPTS GLHPEDFIKK ADLAVQKTPE MINQGTNQTE
QNGQVMNITN SGHENKTKGD SIQNEKNPNP IESLEKESAF KTKAEPISSS ISNMELELNI
HNSKAPKKNR LRRKSSTRHI HALELVVSRN LSPPNCTELQ IDSCSSSEEI KKKKYNQMPV
RHSRNLQLME GKEPATGAKK SNKPNEQTSK RHDSDTFPEL KLTNAPGSFT KCSNTSELKE
FVNPSLPREE KEEKLETVKV SNNAEDPKDL MLSGERVLQT ERSVESSSIS LVPGTDYGTQ
ESISLLEVST LGKAKTEPNK CVSQCAAFEN PKGLIHGCSK DNRNDTEGFK YPLGHEVNHS
RETSIEMEES ELDAQYLQNT FKVSKRQSFA PFSNPGNAEE ECATFSAHSG SLKKQSPKVT
FECEQKEENQ GKNESNIKPV QTVNITAGFP VVGQKDKPVD NAKCSIKGGS RFCLSSQFRG
NETGLITPNK HGLLQNPYRI PPLFPIKSFV KTKCKKNLLE ENFEEHSMSP EREMGNENIP
STVSTISRNN IRENVFKEAS SSNINEVGSS TNEVGSSINE IGSSDENIQA ELGRNRGPKL
NAMLRLGVLQ PEVYKQSLPG SNCKHPEIKK QEYEEVVQTV NTDFSPYLIS DNLEQPMGSS
HASQVCSETP DDLLDDGEIK EDTSFAENDI KESSAVFSKS VQKGELSRSP SPFTHTHLAQ
GYRRGAKKLE SSEENLSSED EELPCFQHLL FGKVNNIPSQ STRHSTVATE CLSKNTEENL
LSLKNSLNDC SNQVILAKAS QEHHLSEETK CSASLFSSQC SELEDLTANT NTQDPFLIGS
SKQMRHQSES QGVGLSDKEL VSDDEERGTG LEENNQEEQS MDSNLGEAAS GCESETSVSE
DCSGLSSQSD ILTTQQRDTM QHNLIKLQQE MAELEAVLEQ HGSQPSNSYP SIISDSSALE
DLRNPEQSTS EKAVLTSQKS SEYPISQNPE GLSADKFEVS ADSSTSKNKE PGVERSSPSK
CPSLDDRWYM HSCSGSLQNR NYPSQEELIK VVDVEEQQLE ESGPHDLTET SYLPRQDLEG
TPYLESGISL FSDDPESDPS EDRAPESARV GNIPSSTSAL KVPQLKVAES AQSPAAAHTT
DTAGYNAMEE SVSREKPELT ASTERVNKRM SMVVSGLTPE EFMLVYKFAR KHHITLTNLI
TEETTHVVMK TDAEFVCERT LKYFLGIAGG KWVVSYFWVT QSIKERKMLN EHDFEVRGDV
VNGRNHQGPK RARESQDRKI FRGLEICCYG PFTNMPTDQL EWMVQLCGAS VVKELSSFTL
GTGVHPIVVV QPDAWTEDNG FHAIGQMCEA PVVTREWVLD SVALYQCQEL DTYLIPQIPH
SHY

>>> 1. Results from blastp (protein-protein BLAST)

>refNP_009225.1 breast cancer 1, early onset isoform 1 [Homo sapiens]
spP38398.2BRCA1_HUMAN RecName: Full=Breast cancer type 1 susceptibility protein; AltName:
Full=RING finger protein 53
gbAAA73985.1 breast and ovarian cancer susceptibility [Homo sapiens]
9 more sequence titles
gbAAC37594.1 BRCA1 [Homo sapiens]
gbAAP12647.1 breast cancer 1, early onset [Homo sapiens]
gbABA29208.1 breast cancer 1 early onset [Homo sapiens]
gbABA29211.1 breast cancer 1 early onset [Homo sapiens]
gbABA29214.1 breast cancer 1 early onset [Homo sapiens]
gbABA29220.1 breast cancer 1 early onset [Homo sapiens]
gbABA29223.1 breast cancer 1 early onset [Homo sapiens]
gbABA29226.1 breast cancer 1 early onset [Homo sapiens]
dbjBAG10985.1 breast cancer type 1 susceptibility protein [synthetic construct]
Length=1863

GENE ID: 672 BRCA1 breast cancer 1, early onset [Homo sapiens]
(Over 100 PubMed links)

Score = 3844 bits (9969), Expect = 0.0, Method: Compositional matrix adjust.
Identities = 1863/1863 (100%), Positives = 1863/1863 (100%), Gaps = 0/1863 (0%)

>>> 2. Results from PSI-BLAST (Position-Specific Iterated BLAST)

>refNP_009225.1 breast cancer 1, early onset isoform 1 [Homo sapiens]
spP38398.2BRCA1_HUMAN RecName: Full=Breast cancer type 1 susceptibility protein; AltName:
Full=RING finger protein 53
gbAAA73985.1 breast and ovarian cancer susceptibility [Homo sapiens]
9 more sequence titles
gbAAC37594.1 BRCA1 [Homo sapiens]
gbAAP12647.1 breast cancer 1, early onset [Homo sapiens]
gbABA29208.1 breast cancer 1 early onset [Homo sapiens]
gbABA29211.1 breast cancer 1 early onset [Homo sapiens]
gbABA29214.1 breast cancer 1 early onset [Homo sapiens]
gbABA29220.1 breast cancer 1 early onset [Homo sapiens]
gbABA29223.1 breast cancer 1 early onset [Homo sapiens]
gbABA29226.1 breast cancer 1 early onset [Homo sapiens]
dbjBAG10985.1 breast cancer type 1 susceptibility protein [synthetic construct]
Length=1863

GENE ID: 672 BRCA1 breast cancer 1, early onset [Homo sapiens]
(Over 100 PubMed links)

Score = 3844 bits (9969), Expect = 0.0, Method: Compositional matrix adjust.
Identities = 1863/1863 (100%), Positives = 1863/1863 (100%), Gaps = 0/1863 (0%)

>>> 3. Results from tblastn

>dbjAB385129.1 Synthetic construct DNA, clone: pF1KB5593, Homo sapiens BRCA1
gene for breast cancer type 1 susceptibility protein, complete
cds, without stop codon, in Flexi system
Length=5606

Score = 3578 bits (9279), Expect = 0.0, Method: Compositional matrix adjust.
Identities = 1863/1863 (100%), Positives = 1863/1863 (100%), Gaps = 0/1863 (0%)
Frame = +1

All three BLAST programs chosen for this test accept a protein sequence as input. blastp and PSI-BLAST returned the same score, 3844 bits (9969), with E value = 0.0 and a subject length of 1863. This agreement is expected, because the first PSI-BLAST iteration is an ordinary blastp search; the profile only comes into play from the second iteration onward. tblastn was the exception: its score was 3578 bits (9279), the subject length was 5606 (nucleotides), and the hit was in frame +1. So even with the same query data (amino acids), programs with different comparison purposes (blastp and PSI-BLAST vs tblastn) give different results.

It is also worth noting that blastp, PSI-BLAST and PHI-BLAST all belong to the protein-comparison group but use different algorithms. blastp uses the same seed-and-extend approach as blastn: it looks for words of length W (set to 3 in blastp) that score at least the threshold T, and uses them as seeds for the comparison. PSI-BLAST uses a PSSM (position-specific scoring matrix), while PHI-BLAST uses a Pattern-Hit Initiated search.
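Each score above is reported twice, as a bit score and as a raw score in parentheses. The two are related by the standard conversion S' = (lambda * S - ln K) / ln 2. The sketch below assumes lambda = 0.267 and K = 0.041, the values usually quoted for gapped BLOSUM62 with gap costs 11/1. These parameters do not appear in the output above, and compositional matrix adjustment can shift them, so treat the result as an approximation.

```python
import math

# Assumed Karlin-Altschul parameters (gapped BLOSUM62, gap open 11, extend 1).
# They are not taken from the BLAST output shown above.
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Convert a raw alignment score S to a bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)


for program, raw in [("blastp / PSI-BLAST", 9969), ("tblastn", 9279)]:
    print(f"{program}: raw {raw} -> about {bit_score(raw):.1f} bits")
```

Under these assumptions the raw scores 9969 and 9279 come out within about one bit of the reported 3844 and 3578.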

Saturday, 16 January 2010

Assignment 5

Please use bioinformatics tools to design the following items:
1. The real-time PCR primer and probe set(s) that can be used to distinguish 2009 Swine-Origin Influenza A (H1N1) from other influenza subtypes. Please also describe which gene(s)/region(s) you chose, and give the reason why.
2. The conventional PCR and sequencing primer set that can be used to identify the oseltamivir resistance-associated NA gene mutation N1: H274Y.
Note:
a) Please show the size of the PCR product and locate the positions of the PCR primers, probe, and sequencing primer used in #1 and #2.
b) What kind/type/name of programs did you use for #1 and #2?
Hence: a) The algorithms used to design PCR primers, real-time PCR primers, and sequencing primers are different.

Answer to question 1
For question 1, the hemagglutinin (HA) gene was chosen because HA is a protein expressed on the viral envelope; the HA of swine influenza is H1.
Procedure
1. Find the mRNA sequence and amino acid sequence of swine influenza hemagglutinin (HA) at http://www.ncbi.nlm.nih.gov/ by searching Nucleotide for Influenza A virus (2009(H1N1)) segment 4 hemagglutinin (HA) gene.
2. Then select mRNA (39 mRNA records are available) and pick 2 swine-origin records from 2009 and 3 others, with the following GenBank accession numbers:
Year 2009
Selected: 1. GQ149662.1 >> Influenza A virus (A/Mexico/4108/2009(H1N1)) segment 4 hemagglutinin (HA) gene, complete cds
2. GQ150342.1 >> Influenza A virus (A/Nonthaburi/102/2009(H1N1)) segment 4 hemagglutinin (HA) gene, complete cds
3. GQ117100.2 >> Influenza A virus (A/Ohio/07/2009(H1N1)) segment 4 hemagglutinin (HA) gene, complete cds
3. Go to http://www.ebi.ac.uk/Tools/clustalw2/index.html, paste the nucleotide sequences in FASTA format into the input box, and click run.
4. The multiple alignment shows that bases 1-15 are missing in sequences 1 and 3, while sequence 2 is complete, as shown in the figure.

5. Select nucleotide positions 651-950 for designing the real-time PCR primers and probe. Go to http://frodo.wi.mit.edu/primer3/, paste nucleotides 651-950 into the input box, select pick left primer, pick right primer and pick hybridization probe, then click Pick Primers, as shown in the figure.

6. This gives the required real-time PCR primer and probe set(s), as shown in the figure.
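Step 5 cuts a region out of the HA sequence using 1-based, inclusive positions (651-950), whereas most programming languages count from 0 and exclude the end index. A minimal sketch of that extraction, using a made-up placeholder sequence rather than the real GenBank record:

```python
def slice_1based(sequence, start, end):
    """Return positions start..end (1-based, inclusive) of a sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("positions out of range")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


# Placeholder sequence for illustration only; in practice this would be the
# HA coding sequence taken from the FASTA file.
ha_sequence = "ACGT" * 300
target_region = slice_1based(ha_sequence, 651, 950)
print(len(target_region))  # 300 bases go into the primer design tool
```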

Answer to question 2
1. Retrieve the neuraminidase nucleotide sequences from GenBank. The GenBank accession numbers are: swine-origin influenza neuraminidase: GU371257.1; oseltamivir resistance: GU371269.1
2. Run the alignment at http://www.ebi.ac.uk/Tools/emboss/align/ by pasting the amino acid sequences into the input boxes. The alignment shows the H >> Y substitution at amino acid position 274 and at amino acid position 828, as shown in the figure.

3. Select a sequence region that covers amino acid position 828.
4. Go to http://frodo.wi.mit.edu/primer3/, paste the selected sequence into the input box, select pick left primer and pick right primer, then click Pick Primers.
5. This gives the conventional PCR primers, as shown in the figure.

6. Go back to step 4 and reset the primer Tm min, opt and max to 45, 50 and 55, respectively. This gives the sequencing primer, as shown in the figure.
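Step 6 narrows the allowed melting temperature window to 45-55 for the sequencing primer. To get a feel for what that window means, the sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C), a rough rule of thumb for short oligos. This is a deliberately simple stand-in: Primer3 uses a more detailed thermodynamic model, so its Tm values will differ. The primer sequences here are invented for illustration.

```python
def wallace_tm(primer):
    """Rough melting temperature (degrees C) by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def within_tm_window(primer, tm_min=45, tm_max=55):
    """True if the estimated Tm falls inside the min..max window."""
    return tm_min <= wallace_tm(primer) <= tm_max


# Invented example primers, not the ones designed in this assignment.
for candidate in ["ATGCATGCATGCATGC", "GCGCGCGCGCGCGCGCGCGC"]:
    print(candidate, wallace_tm(candidate), within_tm_window(candidate))
```

The first candidate (8 A/T and 8 G/C) lands at 48 and passes; the second is all G/C and is far above the window.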

Thursday, 7 January 2010

Assignment 4

The function of a protein is a direct consequence of its 3-D structure (shape), which establishes the logical link Sequence >> Structure >> Function.
It is now a central concept of bioinformatics devoted to molecular biology. As a consequence, an increasing proportion of the bioinformatics pie is now devoted to the development of tools to navigate between sequences and 3-D structures. (This specialized area is called structural bioinformatics.)
Please use the following sequence to explain this concept.
Procedure
1. Go to the website http://www.expasy.org/tools/dna.html

2. Then paste in the unknown sequence and click TRANSLATE SEQUENCE.

3. The translation from DNA sequence >> amino acids gives 6 open reading frames (ORFs) in total. Choose the ORF that translates into the longest polypeptide chain, which here is 5'3' Frame 1.
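The six-frame translation in step 3 can be sketched in a few lines. This is an illustration of the idea, not the code behind the ExPASy tool: it assumes the standard genetic code, labels frames in the same 5'3' / 3'5' style, and treats an "ORF" as the longest stretch that starts at M and contains no stop codon. The DNA string is a toy example, not the unknown sequence from the assignment.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; '*' marks a stop, 'X' an unknown codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Translate both strands in all three offsets."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"5'3' Frame {offset + 1}"] = translate(dna[offset:])
        frames[f"3'5' Frame {offset + 1}"] = translate(rc[offset:])
    return frames


def longest_orf(protein):
    """Longest stretch that starts with M and contains no stop codon."""
    best = ""
    for segment in protein.split("*"):
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


frames = six_frames("ATGGCCTAA")  # toy sequence
best_frame = max(frames, key=lambda name: len(longest_orf(frames[name])))
print(best_frame, longest_orf(frames[best_frame]))
```

For a real sequence, the frame chosen this way is the one to carry forward to the structure and function steps.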