RE: Re: OT: Javascript Reg's -> ASP.NET to clean MSWord HTML
I guess I would have done something similar...
Private Function CleanHtml(ByVal strHtml As String) As String
    strHtml = strHtml.Replace("/<\?xml[^>]*>/ig", "")
    strHtml = strHtml.Replace("/<\/?[a-z]+:[^>]*>/ig", "")
    strHtml = strHtml.Replace("/<span[^>]*><\/span[^>]*>/ig", "")
    strHtml = strHtml.Replace("/<span><span>/ig", "<span>")
    strHtml = strHtml.Replace("/<\/span><\/span>/ig", "</span>")
    Return strHtml
End Function
What is not working? Is the JS actually looking for the literal string
"/<\?xml[^>]*>/ig", or is that some pattern-matching syntax used in
JavaScript? I ask because the first argument to the JavaScript "replace"
call had no quotes around it in your example...
The more I look at it, the more the first arguments look like Perl-style
pattern constructs rather than literal strings. For example, the "[a-z]"
in the second replace command...
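One quick way to test that suspicion is to run the second pattern both ways. String.Replace searches for the exact characters, slashes and "/ig" included, so it never matches real Word markup, while Regex.Replace treats the same text (minus the JavaScript "/.../ig" wrapper) as a pattern. This is a minimal sketch assuming the standard System.Text.RegularExpressions namespace; the module name and the sample HTML are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical demo module: literal String.Replace versus Regex.Replace
Module LiteralVersusRegex
    ' Made-up fragment of Word-style markup containing an Office namespace tag
    Public Const Sample As String = "<p>Hello<o:p></o:p></p>"

    ' String.Replace looks for the literal text "/<\/?[a-z]+:[^>]*>/ig",
    ' which never occurs in the HTML, so the input comes back unchanged
    Public Function LiteralReplace(ByVal html As String) As String
        Return html.Replace("/<\/?[a-z]+:[^>]*>/ig", "")
    End Function

    ' Regex.Replace treats the text between the slashes as a pattern:
    ' "<", an optional "/", one or more letters, ":", then anything up to ">"
    Public Function RegexReplace(ByVal html As String) As String
        Return Regex.Replace(html, "<\/?[a-z]+:[^>]*>", "", RegexOptions.IgnoreCase)
    End Function
End Module
```

With the sample above, LiteralReplace returns the input untouched, while RegexReplace strips both <o:p> and </o:p> and leaves "<p>Hello</p>".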
OK, I've just done a little research, and found out some more info.
I am going to use the first "replace string" as an example:
/<\?xml[^>]*>/ig
This says, in English:
"Look for a string that starts with '<?xml', followed by any number of
characters that are not '>', and ending with '>'. Make the search
case-insensitive (the 'i' flag), and replace every match rather than just
the first one (the 'g', or global, flag)."
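For comparison, .NET can express the same rule directly with Regex.Replace: drop the "/" delimiters and the flags, turn "i" into RegexOptions.IgnoreCase, and leave "g" out entirely, because Regex.Replace already replaces every match. A minimal sketch; the module and function names are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical helper covering only the first pattern, /<\?xml[^>]*>/ig
Module XmlDeclarationCleaner
    Public Function RemoveXmlDeclarations(ByVal html As String) As String
        ' Pattern: "<?xml", then any characters except ">", then ">"
        ' "i" flag  -> RegexOptions.IgnoreCase
        ' "g" flag  -> no option needed; Regex.Replace replaces all matches
        Return Regex.Replace(html, "<\?xml[^>]*>", "", RegexOptions.IgnoreCase)
    End Function
End Module
```

Given the made-up input "<?xml version=""1.0""?><p>a</p><?XML:NAMESPACE prefix=o ?>", both declarations are removed (the second only because of IgnoreCase), leaving "<p>a</p>".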
So to duplicate this, I would replace the first "replace" with:
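Dim intStart As Integer
Dim intEnd As Integer
' IndexOf is 0-based like Remove; OrdinalIgnoreCase mirrors the "i" flag
intStart = strHtml.IndexOf("<?xml", StringComparison.OrdinalIgnoreCase)
Do While intStart >= 0
    ' Find the first ">" after the "<?xml" prefix
    intEnd = strHtml.IndexOf(">"c, intStart + 5)
    ' No closing ">": stop instead of looping forever
    If intEnd < 0 Then Exit Do
    ' Strings are immutable, so the result of Remove must be assigned back
    strHtml = strHtml.Remove(intStart, intEnd - intStart + 1)
    intStart = strHtml.IndexOf("<?xml", StringComparison.OrdinalIgnoreCase)
Loop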
And so on... Essentially, for as long as it can find "<?xml" in the
string, it looks for the next '>' and removes everything from "<?xml" up
to and including that '>'.
Tedious, and a good reason why people go to JavaScript or Perl for heavy
text processing, but it should work fine.
For info about the pattern matching in JavaScript, I looked at:
http://tinyurl.com/227l4
Hope this helps, and sorry to bore you all,
Glenn Sullivan, MCSE+I MCDBA
David Clark Company Inc.
-----Original Message-----
From: Chris Bond [mailto:chris@xxxxxxx]
Sent: Wednesday, January 28, 2004 12:03 PM
To: ukha_d@yahoogroups.com
Subject: [ukha_d] Re: OT: Javascript Reg's -> ASP.NET to clean MSWord HTML
> What does your VB.Net code look like now?
>
> If you are just looking for "/<\?xml[^>]*>/ig" and replacing it with "",
> then this should work, but it looks like the js replace routine uses
> Perl-like parsing syntax, which I'm not familiar with...
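The "Perl-like" part is mostly the /pattern/flags wrapper: .NET's Regex takes the bare pattern, with flags passed separately as RegexOptions. Below is a minimal sketch of a hypothetical helper, FromJsLiteral, that unwraps a JavaScript literal so the original strings could be reused. It assumes only the "i" and "g" flags appear and does not translate any other JavaScript-specific syntax; it also ignores the fact that a JavaScript replace without "g" changes only the first match, since Regex.Replace always replaces all of them.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical helper: converts a JavaScript literal such as "/abc/ig"
' into a .NET Regex. Only the "i" and "g" flags are accepted.
Module JsRegexLiteral
    Public Function FromJsLiteral(ByVal literal As String) As Regex
        Dim lastSlash As Integer = literal.LastIndexOf("/"c)
        If Not literal.StartsWith("/") OrElse lastSlash <= 0 Then
            Throw New ArgumentException("Not a /pattern/flags literal: " & literal)
        End If

        ' Text between the first and last "/" is the pattern itself
        Dim pattern As String = literal.Substring(1, lastSlash - 1)
        Dim flags As String = literal.Substring(lastSlash + 1)

        Dim options As RegexOptions = RegexOptions.None
        For Each flag As Char In flags
            Select Case flag
                Case "i"c
                    options = options Or RegexOptions.IgnoreCase
                Case "g"c
                    ' Nothing to do: Regex.Replace already replaces every match
                Case Else
                    Throw New ArgumentException("Unsupported flag: " & flag)
            End Select
        Next

        Return New Regex(pattern, options)
    End Function
End Module
```

Usage would look like FromJsLiteral("/<\?xml[^>]*>/ig").Replace(strTemp, ""), keeping the strings exactly as they appear in the JavaScript version.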
Private Function CleanHtml(ByVal strHtml As String) As String
    Dim strTemp As String
    strTemp = Regex.Replace(strHtml, "/<\?xml[^>]*>/ig", "")
    strTemp = Regex.Replace(strTemp, "/<\/?[a-z]+:[^>]*>/ig", "")
    '// Missing 1
    '// Missing 2
    strTemp = Regex.Replace(strTemp, "/<span[^>]*><\/span[^>]*>/ig", "")
    strTemp = Regex.Replace(strTemp, "/<span[^>]*><\/span[^>]*>/ig", "")
    strTemp = Regex.Replace(strTemp, "/<span><span>/ig", "<span>")
    strTemp = Regex.Replace(strTemp, "/<\/span><\/span>/ig", "</span>")
    Return strTemp
End Function
Obviously that isn't working properly, though.
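A likely reason is that each pattern passed to Regex.Replace still includes the JavaScript "/.../ig" wrapper, so .NET looks for a literal "/" at the start and the letters "/ig" at the end, which Word's HTML never contains. Below is a sketch of the same function with the wrapper removed and the "i" flag passed as RegexOptions.IgnoreCase. It covers only the patterns quoted in this thread (the two lines marked "Missing" are not reconstructed), the module name is hypothetical, and the comment on the repeated empty-span line is a guess at why the original runs it twice.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical module holding a .NET version of the quoted CleanHtml
Module WordHtmlCleaner
    Public Function CleanHtml(ByVal strHtml As String) As String
        Const opts As RegexOptions = RegexOptions.IgnoreCase
        Dim strTemp As String = strHtml

        ' <?xml ...> declarations
        strTemp = Regex.Replace(strTemp, "<\?xml[^>]*>", "", opts)
        ' Namespaced Office tags such as <o:p> and </o:p>
        strTemp = Regex.Replace(strTemp, "</?[a-z]+:[^>]*>", "", opts)
        ' Empty <span ...></span> pairs; the second pass removes pairs that
        ' only became empty after the first pass (nested empty spans)
        strTemp = Regex.Replace(strTemp, "<span[^>]*></span[^>]*>", "", opts)
        strTemp = Regex.Replace(strTemp, "<span[^>]*></span[^>]*>", "", opts)
        ' Collapse doubled span tags
        strTemp = Regex.Replace(strTemp, "<span><span>", "<span>", opts)
        strTemp = Regex.Replace(strTemp, "</span></span>", "</span>", opts)

        Return strTemp
    End Function
End Module
```

Since Regex.Replace replaces every match by default, nothing corresponds to the "g" flag.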