The UK Home Automation Archive


The UKHA-ARCHIVE IS CEASING OPERATIONS 31 DEC 2024



RE: Re: OT: Javascript Reg's -> ASP.NET to clean MSWord HTML



I guess I would have done something similar...

    Private Function CleanHtml(ByVal strHtml As String) As String
        strHtml = strHtml.Replace("/<\?xml[^>]*>/ig", "")
        strHtml = strHtml.Replace("/<\/?[a-z]+:[^>]*>/ig", "")
        strHtml = strHtml.Replace("/<span[^>]*><\/span[^>]*>/ig", "")
        strHtml = strHtml.Replace("/<span><span>/ig", "<span>")
        strHtml = strHtml.Replace("/<\/span><\/span>/ig", "</span>")
        Return strHtml
    End Function

What is not working?  Is the JS actually looking for the literal text
"/<\?xml[^>]*>/ig", or is that some pattern-matching syntax used in
JavaScript?  I ask because the first argument to the JavaScript "replace"
command had no quotes around it in your example...

The more that I look at it, the more the first arguments look like Perl
pattern constructs, and not literal strings.  For example, the "[a-z]" in
the 2nd replace command...

OK, I've just done a little research, and found out some more info.

I am going to use the first "replace string" as an example:
/<\?xml[^>]*>/ig

This says, in English:
"Look for a string that starts with '<?xml', then has any number of
additional characters that are not '>', and then has '>' at the end.
Make this search case-insensitive (the 'i' flag) and global (the 'g'
flag), so every occurrence is replaced rather than just the first."
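
A small sketch of that reading in VB.NET, assuming the Regex class from System.Text.RegularExpressions is available. The module name and the sample string are made up for illustration. The JavaScript / delimiters are dropped from the pattern, the 'i' flag becomes RegexOptions.IgnoreCase, and 'g' needs no equivalent because Regex.Replace already replaces every match.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module XmlTagDemo
    Sub Main()
        ' Hypothetical sample input with two differently cased "<?xml" tags
        Dim sample As String = "<?xml:namespace prefix=""o"" /><p>Hi</p><?XML version=""1.0""?>"

        ' Same pattern as the JavaScript one, without the / delimiters and flags
        Dim cleaned As String = Regex.Replace(sample, "<\?xml[^>]*>", "", RegexOptions.IgnoreCase)

        Console.WriteLine(cleaned) ' prints <p>Hi</p>
    End Sub
End Module
```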

So to duplicate this, I would replace the first "replace" with:

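    Dim intStart As Integer
    Dim intEnd As Integer

    ' InStr is 1-based and returns 0 when nothing is found;
    ' CompareMethod.Text makes the search case-insensitive
    intStart = InStr(1, strHtml, "<?xml", CompareMethod.Text)
    Do While intStart > 0
        intEnd = InStr(intStart + 5, strHtml, ">")
        ' No closing ">": stop rather than loop forever
        If intEnd = 0 Then Exit Do
        ' Remove is 0-based, so shift the 1-based InStr result down by one;
        ' strings are immutable, so the result must be assigned back
        strHtml = strHtml.Remove(intStart - 1, intEnd - intStart + 1)
        intStart = InStr(intStart, strHtml, "<?xml", CompareMethod.Text)
    Loop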


And so on... Essentially, for as long as it can find "<?xml" in the string,
it will look for the right pattern and remove it.

Tedious, and a good reason why people go to JavaScript or Perl for heavy
text processing, but it should work fine.
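
For comparison, here is a rough sketch of the whole cleanup using the .NET Regex class instead of hand-written loops. It assumes System.Text.RegularExpressions is available and that the patterns from this thread carry over once the JavaScript / delimiters and flags are stripped. I have not checked that against real Word output. The module name and the sample input are hypothetical, and the two steps marked "Missing" in the quoted code are left out because their patterns are not shown.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WordHtmlCleaner
    ' Sketch only: patterns copied from the thread, minus the / delimiters.
    ' The JavaScript "i" flag maps to RegexOptions.IgnoreCase; "g" is implicit.
    Function CleanHtml(ByVal strHtml As String) As String
        Dim opts As RegexOptions = RegexOptions.IgnoreCase
        strHtml = Regex.Replace(strHtml, "<\?xml[^>]*>", "", opts)
        strHtml = Regex.Replace(strHtml, "</?[a-z]+:[^>]*>", "", opts)
        strHtml = Regex.Replace(strHtml, "<span[^>]*></span[^>]*>", "", opts)
        strHtml = Regex.Replace(strHtml, "<span><span>", "<span>", opts)
        strHtml = Regex.Replace(strHtml, "</span></span>", "</span>", opts)
        Return strHtml
    End Function

    Sub Main()
        ' Hypothetical Word-style fragment
        Dim sample As String = "<o:p></o:p><span><span>Text</span></span>"
        Console.WriteLine(CleanHtml(sample)) ' prints <span>Text</span>
    End Sub
End Module
```

The "\/" escapes in the JavaScript patterns are only there because "/" is the delimiter, so a plain "/" is enough in the .NET pattern strings.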

For info about the pattern matching in JavaScript, I looked at:
http://tinyurl.com/227l4

Hope this helps, and sorry to bore you all,

Glenn Sullivan, MCSE+I  MCDBA
David Clark Company Inc.

-----Original Message-----
From: Chris Bond [mailto:chris@xxxxxxx]
Sent: Wednesday, January 28, 2004 12:03 PM
To: ukha_d@yahoogroups.com
Subject: [ukha_d] Re: OT: Javascript Reg's -> ASP.NET to clean MSWord HTML


> What does your VB.Net code look like now?
>
> If you are just looking for "/<\?xml[^>]*>/ig" and replacing it with "",
> then this should work, but it looks like the js replace routine uses
> Perl-like parsing syntax, which I'm not familiar with...


    Private Function CleanHtml(ByVal strHtml As String) As String
        Dim strTemp As String
        strTemp = Regex.Replace(strHtml, "/<\?xml[^>]*>/ig", "")
        strTemp = Regex.Replace(strTemp, "/<\/?[a-z]+:[^>]*>/ig", "")
        '// Missing 1
        '// Missing 2
        strTemp = Regex.Replace(strTemp, "/<span[^>]*><\/span[^>]*>/ig", "")
        strTemp = Regex.Replace(strTemp, "/<span[^>]*><\/span[^>]*>/ig", "")
        strTemp = Regex.Replace(strTemp, "/<span><span>/ig", "<span>")
        strTemp = Regex.Replace(strTemp, "/<\/span><\/span>/ig", "</span>")
        Return strTemp
    End Function

Obviously that isn't working properly though.




