The UK Home Automation Archive

RE: OT: Perl problem


  • To: ukha_d@xxxxxxx
  • Subject: RE: OT: Perl problem
  • From: Tony Butler <lists@xxxxxxx>
  • Date: Thu, 17 Jul 2003 13:18:36 +0100
  • Mailing-list: list ukha_d@xxxxxxx; contact ukha_d-owner@xxxxxxx
  • References: <AD8A0CAFA084D7118B2C00500465E03813859B@ZENO>
  • Reply-to: ukha_d@xxxxxxx

Quoting "Ward, David" <David.Ward@xxxxxxx>:
> Go on then I'll ask.  I'm dying to know what you are doing with the data?

Helping my bro with his MSc [Not coz I like him, but coz I have to maintain the
illusion that I know _everything_ and am thus vastly superior to him :)], which
involves scanning the human genome for MicroRNAs. (They interfere with the
normal DNA -> RNA -> Protein production by sticking to RNAs and preventing
them from being translated into protein. They are being looked @ as a possible
new form of gene therapy.)

He needs to read in the human genome (the 15Mb file was just a "test" one -
the real one is 90Mb+!) looking for these little buggers and spitting out a
file formatted as described in the initial problem.

It gets worse though, so if u want to show even more perl skill, there is a
step previous to this:

As you may or may not know, most of the human genome appears to have no
function/unknown function/doesn't encode any genes. The main task is to write
a program that can look through the non-coding parts of the genome for
MicroRNA, but in order to do that, the program needs to be able to tell coding
from non-coding DNA.
What we have is a file containing the DNA sequence, and a file of
"co-ordinates" which tells you that there is a gene at position 2234-2595,
2699-3050 etc.

The cunning plan is to replace all these genes with XXX's, or even just to cut
them out of the file completely, so that the resulting file contains only
noncoding sequences, possibly with XXX's that can be ignored.
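
To make the plan concrete, here is a rough sketch of both variants on an in-memory string. It assumes the co-ordinates are 1-based, inclusive and non-overlapping, and that the whole sequence fits in memory; the names mask_genes and cut_genes are made up for illustration, not taken from any existing code.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each [start, end] range with X's.
# Assumes 1-based, inclusive co-ordinates (an assumption, not confirmed).
sub mask_genes {
    my ($seq, $ranges) = @_;
    for my $r (@$ranges) {
        my ($start, $end) = @$r;
        my $len = $end - $start + 1;
        # substr is 0-based, hence the "- 1"
        substr($seq, $start - 1, $len) = 'X' x $len;
    }
    return $seq;
}

# Hypothetical helper: remove each [start, end] range entirely.
sub cut_genes {
    my ($seq, $ranges) = @_;
    # Cut from the highest start downwards so earlier offsets stay valid.
    for my $r (sort { $b->[0] <=> $a->[0] } @$ranges) {
        my ($start, $end) = @$r;
        substr($seq, $start - 1, $end - $start + 1) = '';
    }
    return $seq;
}

my $seq   = 'AACCGGTTAACC';
my @genes = ([3, 5], [9, 10]);
print mask_genes($seq, \@genes), "\n";   # AAXXXGTTXXCC
print cut_genes($seq, \@genes), "\n";    # AAGTTCC
```

Masking keeps every remaining base at its original position, which may matter if later steps report positions; cutting loses that.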

Said data is then fed into the MicroRNA identifying proggie, whose output then
goes into the routines donated by Ant, Keiran et al earlier.

I will not tell u the current solution my bro is using, as it is too
embarrassing to reveal in public :)
My own thought is that he should open the file for random read/write and either
overwrite the valid sequences with X's, or just cut them out of the file
completely.
Again, is this possible in perl (I assume it must be) and how would one go
about it????
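
For the overwrite idea, a minimal sketch of what I have in mind, using Perl's open in '+<' (read/write) mode plus seek. It assumes the co-ordinates are 1-based and inclusive, and that a position in the sequence equals a byte offset in the file (no header line, no line breaks), which may well not hold for the real data. mask_file_in_place is a made-up name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: overwrite each [start, end] range with X's, in place.
# Assumes file byte offsets equal sequence positions (an assumption).
sub mask_file_in_place {
    my ($path, $ranges) = @_;
    open(my $fh, '+<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    for my $r (@$ranges) {
        my ($start, $end) = @$r;
        my $len = $end - $start + 1;
        # Offsets are 0-based; whence 0 means "from the start of the file".
        seek($fh, $start - 1, 0) or die "Cannot seek in $path: $!";
        print {$fh} 'X' x $len or die "Cannot write to $path: $!";
    }
    close($fh) or die "Cannot close $path: $!";
    return;
}

# Tiny demo on a throwaway file.
my ($tmp_fh, $tmp_path) = tempfile();
print {$tmp_fh} 'AACCGGTTAACC';
close($tmp_fh) or die "Cannot close $tmp_path: $!";

mask_file_in_place($tmp_path, [[3, 5], [9, 10]]);

open(my $in, '<', $tmp_path) or die "Cannot open $tmp_path: $!";
my $result = <$in>;
close($in);
unlink $tmp_path;
print "$result\n";   # AAXXXGTTXXCC
```

Overwriting works in place because the file stays the same length. Cutting shrinks the file, so that would more naturally be done by copying the wanted stretches into a new file.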

cheers guys,

Tony.

