RE: OT: Perl problem
- To: ukha_d@xxxxxxx
- Subject: RE: OT: Perl problem
- From: Tony Butler <lists@xxxxxxx>
- Date: Thu, 17 Jul 2003 13:18:36 +0100
- Mailing-list: list ukha_d@xxxxxxx; contact ukha_d-owner@xxxxxxx
- References: <AD8A0CAFA084D7118B2C00500465E03813859B@ZENO>
- Reply-to: ukha_d@xxxxxxx
Quoting "Ward, David" <David.Ward@xxxxxxx>:
> Go on then I'll ask. I'm dying to know what you are doing with the data?
Helping my bro with his MSc [not because I like him, but because I have
to maintain the illusion that I know _everything_ and am thus vastly
superior to him :)], which involves scanning the human genome for
microRNAs. (They interfere with the normal DNA -> RNA -> protein
production by sticking to RNAs and preventing them from being translated
into protein. They are being looked at as a possible new form of gene
therapy.)
He needs to read in the human genome (the 15Mb file was just a "test"
one - the real one is 90Mb+!), looking for these little buggers and
spitting out a file formatted as described in the initial problem.
It gets worse though, so if you want to show even more Perl skill, there
is a step previous to this:
As you may or may not know, most of the human genome appears to have no
function/unknown function/doesn't encode any genes. The main task is to
write a program that can look through the non-coding parts of the genome
for microRNA, but in order to do that, the program needs to be able to
tell coding from non-coding DNA.
What we have is a file containing the DNA sequence, and a file of
"co-ordinates" which tells you that there is a gene at position
2234-2595, 2699-3050 etc.
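A minimal sketch of reading such co-ordinates, assuming they appear as
plain "start-end" pairs as in the example above (the real file layout is
not shown here, so the regex is a guess). The names parse_ranges and
merge_ranges are made up for illustration. Merging overlapping ranges
first means each stretch of gene only has to be handled once.

```perl
use strict;
use warnings;

# Pull every "start-end" pair out of a chunk of text.
# ASSUMPTION: pairs look like "2234-2595"; adjust the regex to the real file.
sub parse_ranges {
    my ($text) = @_;
    my @ranges;
    while ($text =~ /(\d+)\s*-\s*(\d+)/g) {
        my ($start, $end) = ($1, $2);
        ($start, $end) = ($end, $start) if $start > $end;  # tolerate reversed pairs
        push @ranges, [$start, $end];
    }
    return sort { $a->[0] <=> $b->[0] } @ranges;           # sorted by start
}

# Merge overlapping or touching ranges (input must be sorted by start).
sub merge_ranges {
    my @sorted = @_;
    my @merged;
    for my $r (@sorted) {
        if (@merged && $r->[0] <= $merged[-1][1] + 1) {
            $merged[-1][1] = $r->[1] if $r->[1] > $merged[-1][1];
        } else {
            push @merged, [@$r];                           # copy the pair
        }
    }
    return @merged;
}

my @genes = merge_ranges(parse_ranges("2234-2595, 2699-3050"));
print join(", ", map { "$_->[0]-$_->[1]" } @genes), "\n";  # 2234-2595, 2699-3050
```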
The cunning plan is to replace all these genes with XXX's, or even just
to cut them out of the file completely, so that the resulting file
contains only non-coding sequences, possibly with XXX's that can be
ignored.
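Both variants of the plan can be sketched with Perl's built-in substr,
assuming the whole sequence fits in memory as one string with no line
breaks, and that positions are 1-based and inclusive (neither is stated
above, so treat both as assumptions). mask_genes and cut_genes are
hypothetical names, and each range is a [start, end] pair.

```perl
use strict;
use warnings;

# Overwrite each gene with X's; the sequence keeps its length, so all
# downstream positions still line up with the co-ordinate file.
# ASSUMPTION: positions are 1-based, inclusive, and start >= 1.
sub mask_genes {
    my ($seq, @ranges) = @_;
    my $len = length $seq;
    for my $r (@ranges) {
        my ($start, $end) = @$r;
        next if $start > $len;
        $end = $len if $end > $len;               # clamp to end of sequence
        my $n = $end - $start + 1;
        substr($seq, $start - 1, $n, 'X' x $n);   # 4-arg substr replaces in place
    }
    return $seq;
}

# Cut the genes out instead. Working from the highest start downwards
# keeps the earlier offsets valid. ASSUMPTION: ranges do not overlap.
sub cut_genes {
    my ($seq, @ranges) = @_;
    for my $r (sort { $b->[0] <=> $a->[0] } @ranges) {
        my ($start, $end) = @$r;
        next if $start > length $seq;
        substr($seq, $start - 1, $end - $start + 1, '');
    }
    return $seq;
}

my @genes = ([3, 5], [8, 9]);
print mask_genes("ACGTACGTAC", @genes), "\n";     # ACXXXCGXXC
print cut_genes("ACGTACGTAC", @genes), "\n";      # ACCGC
```

Masking has the advantage that positions reported later still refer to
the original genome; after cutting, every position past the first gene
has shifted.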
Said data is then fed into the microRNA-identifying proggie, whose
output then goes into the routines donated by Ant, Keiran et al earlier.
I will not tell you the current solution my bro is using, as it is too
embarrassing to reveal in public :)
My own thought is that he should open the file for random read/write and
either overwrite the coding (gene) sequences with X's, or just cut them
out of the file completely.
Again, is this possible in Perl (I assume it must be) and how would one
go about it?
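For what it is worth, Perl can open a file for random read/write with
the '+<' mode and jump around in it with seek. The sketch below
overwrites ranges with X's in place. It assumes a raw sequence file with
no header and no line breaks, so that base N sits at byte offset N - 1;
a FASTA-style file with wrapped lines would need the offsets adjusted.
mask_in_place is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Overwrite bases start..end (1-based, inclusive) with X's directly on disk,
# without reading the whole file into memory.
# ASSUMPTION: raw sequence file, one byte per base, no newlines or header.
sub mask_in_place {
    my ($path, @ranges) = @_;
    open(my $fh, '+<', $path) or die "Cannot open $path: $!";
    binmode $fh;
    my $size = -s $fh;
    for my $r (@ranges) {
        my ($start, $end) = @$r;
        next if $start > $size;
        $end = $size if $end > $size;             # never grow the file
        seek($fh, $start - 1, 0) or die "seek failed: $!";
        print {$fh} 'X' x ($end - $start + 1) or die "write failed: $!";
    }
    close($fh) or die "close failed: $!";
    return;
}

# Small demonstration on a throwaway file.
my ($tmp, $path) = tempfile();
print {$tmp} "ACGTACGTAC";
close $tmp;
mask_in_place($path, [3, 5], [8, 9]);
open(my $in, '<', $path) or die "Cannot reopen $path: $!";
my $result = <$in>;
close $in;
unlink $path;
print "$result\n";                                # ACXXXCGXXC
```

Cutting the genes out cannot be done this way: removing bytes from the
middle of a file shifts everything after them, so that variant means
copying the non-coding stretches into a new file.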
cheers guys,
Tony.