– Using table-valued functions in SQL convert HTML to plaintext
Execute this code in SQL
CREATE FUNCTION [dbo].[Html_To_Plaintext_ITVF] (@HTMLText nvarchar(max), @linefeed nvarchar(10)) RETURNS TABLE AS RETURN WITH cteTagsToReplaceWithLF AS ( SELECT a.tag , ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS R FROM ( VALUES ('</p>'), ('</tr>'), ('<br>'), ('<br />'), ('</li>'),(' ') ) a(tag) ) , cteWithLineFeeds(n, Html) AS ( SELECT 1, REPLACE(@HTMLText, cteTags.tag, @linefeed) FROM cteTagsToReplaceWithLF cteTags WHERE cteTags.R = 1 UNION ALL SELECT n + 1, REPLACE(ctelf.Html, cteTags.tag, @linefeed) FROM cteWithLineFeeds ctelf JOIN cteTagsToReplaceWithLF cteTags ON ctelf.n = cteTags.R ) , cteHtml (i, HtmlText) AS ( SELECT TOP 1 0, Html FROM cteWithLineFeeds ORDER BY n DESC UNION ALL SELECT i + 1, CONVERT(nvarchar(MAX), STUFF(HtmlText, CHARINDEX(N'<', HtmlText), CHARINDEX(N'>', HtmlText, CHARINDEX(N'<', HtmlText)) - CHARINDEX(N'<', HtmlText) + 1, '')) FROM cteHtml WHERE CHARINDEX('<', HtmlText) > 0 AND CHARINDEX('>', HtmlText, CHARINDEX('<', HtmlText)) > 0 AND CHARINDEX('>', HtmlText, CHARINDEX('<', HtmlText)) - CHARINDEX('<', HtmlText) > 0 ) SELECT TOP 1 LTRIM(RTRIM(HtmlText)) AS PlainText FROM cteHtml ORDER BY i DESC Go
Let’s take a look at what’s going on. The first thing we see is a CTE with the name cteTagsToReplaceWithLF.
SELECT a.Tagtoreplace, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS Row FROM ( VALUES ('</p>'), ('</tr>'), ('<br>'), ('<br />'), ('</li>'),(' ') ) a(Tagtoreplace)
You can just execute this on any database to get a two-column result. The first column has a sequential number, and the second column has one of the tags that we defined. These are the tags that will be replaced by linefeeds.
The second CTE on the list replaces our HTML tags with linefeed characters.
Now Execute this function:
select * from Html_To_Plaintext_ITVF('<p><strong>Fruits</strong></p><ul><li><strong>Apple</strong></li><li><strong>Mango</strong></li><li><strong>Cherry</strong></li> </ul> <p> </p> <p> </p> <ol> <li><strong>Veggies</strong></li> <li><strong>Potatoes</strong></li> </ol>','')
Result :
– Using a User-Defined Function, Convert HTML to PlainText
The following User Defined Function accepts HTML input and returns only TEXT. Before passing as input to function, all single quotes in HTML should be replaced with two single quotes (not double quotes).
CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX)) RETURNS VARCHAR(MAX) AS BEGIN DECLARE @Start INT DECLARE @End INT DECLARE @Length INT SET @Start = CHARINDEX('<',@HTMLText) SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText)) SET @Length = (@End - @Start) + 1 WHILE @Start > 0 AND @End > 0 AND @Length > 0 BEGIN SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'') SET @Start = CHARINDEX('<',@HTMLText) SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText)) SET @Length = (@End - @Start) + 1 END RETURN LTRIM(RTRIM(@HTMLText)) END GO
Execute the above function as follows:
SELECT dbo.udf_StripHTML('<b>The CodeHubs </b> <a href="http://thecodehubs.com">Thecodehubs.com</a>') as [PlainText]