
Jsoup Whitelist: Sanitizing HTML Input

 


  • Jsoup has many features, such as parsing HTML documents, searching inside the DOM, manipulating DOM elements, and cleaning up untrusted HTML.
  • Jsoup provides a whitelist feature for sanitizing (cleaning) HTML.
  • A whitelist defines which tags and attributes are allowed to pass through the cleaner; everything else is discarded.
  • Download link for the jar (jsoup-1.7.1.jar): http://jsoup.org/download
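
To illustrate the whitelist idea, here is a minimal sketch that runs an untrusted snippet through Jsoup.clean with the built-in basic whitelist. The class name SanitizeSketch and the sample markup are made up for this example, and it assumes a jsoup release that still ships org.jsoup.safety.Whitelist (newer releases rename it to Safelist).

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

// Hypothetical helper class, not part of the original demo.
class SanitizeSketch {

    // Cleans untrusted HTML with the built-in "basic" whitelist.
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Whitelist.basic());
    }

    public static void main(String[] args) {
        // The onclick attribute and the script element are not in the basic whitelist.
        String dirty = "<p onclick=\"steal()\">Hello <b>world</b></p>"
                + "<script>alert('xss')</script>";
        // Only the whitelisted p and b tags should remain in the result.
        System.out.println(sanitize(dirty));
    }
}
```

The text inside allowed tags is kept, while the event handler attribute and the script block should be dropped.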

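Methods in Whitelist:-

  • addAttributes(String tag, String... keys) : Allows the listed attributes on the given tag; the tag itself is also added to the allowed tags.
  • none() : Only text nodes are allowed. All HTML elements are removed.
  • simpleText() : Allows only simple text formatting tags: b, em, i, strong, u.
  • basic() : Allows the tags a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, strike, strong, sub, sup, u, ul. Links (a href) may use the http, https, ftp and mailto protocols and get an enforced rel=nofollow attribute. Images are not allowed.
  • basicWithImages() : Like basic, plus the img tag with its usual attributes; the src must point to http or https.
  • relaxed() : Allows a wide range of text and structural body HTML (for example headings, tables, div and img). It does not enforce rel=nofollow on links.
  • addProtocols(String tag, String key, String... protocols) : Restricts the URL in the given attribute of the tag to the listed protocols. The tag and attribute must also be allowed (for example via addAttributes).
  • preserveRelativeLinks(boolean preserve) : When true, relative links are kept as written instead of being converted to absolute URLs. A base URI still has to be passed to Jsoup.clean so the link's protocol can be checked.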
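
The builder-style methods above can be combined into one custom whitelist. The following is only a sketch: the class name CustomWhitelistSketch, the method names, the sample markup and the base URI http://example.com/ are assumptions made for illustration, and it again assumes a jsoup version that still provides Whitelist.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

// Hypothetical example class showing a hand-built whitelist.
class CustomWhitelistSketch {

    // Allows p, em, links and images; URLs must resolve to http or https.
    static Whitelist linkAndImagePolicy() {
        return new Whitelist()
                .addTags("p", "em")
                .addAttributes("a", "href")          // also allows the a tag
                .addAttributes("img", "src", "alt")  // also allows the img tag
                .addProtocols("a", "href", "http", "https")
                .addProtocols("img", "src", "http", "https")
                .preserveRelativeLinks(true);        // keep relative URLs as written
    }

    // The base URI is used to resolve relative URLs before the protocol check.
    static String sanitize(String html, String baseUri) {
        return Jsoup.clean(html, baseUri, linkAndImagePolicy());
    }

    public static void main(String[] args) {
        String html = "<p><a href='/about' onclick='x()'>About</a> "
                + "<a href='javascript:alert(1)'>bad</a> "
                + "<img src='logo.png' alt='Logo'></p>";
        System.out.println(sanitize(html, "http://example.com/"));
    }
}
```

With this policy the relative href and src should survive unchanged, while the onclick attribute and the javascript: link target are dropped because they are not on the whitelist.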


Testing WhiteList :-

  • The JSoupWhiteListDemo.java file is as follows:
package com.sandeep.jsoup.whitelist;

import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class JSoupWhiteListDemo {
 
 public static void  main(String [] args){
  
  String inputString = "<title>My Page</title><ul><li><em>Sandeep</em></li><li><em>Surabhi</em></li>"
    + "<li><img src='mySnamp.png'></li><li><a href='https:\\loremipsumdollar.com'>click me</a></li></ul>";
  
  /*for simpleText method*/
  String outputString = Jsoup.clean(inputString, Whitelist.simpleText());
  System.out.println("SIMPLETEXT OUTPUT : " + outputString);
  
  /*for basic method*/
  outputString = Jsoup.clean(inputString, Whitelist.basic());
  System.out.println("BASIC OUTPUT : " + outputString);
  
  /*for basicWithImages method*/
  outputString = Jsoup.clean(inputString, Whitelist.basicWithImages());
  System.out.println("BASICWITHIMAGES OUTPUT : " + outputString);
  
  /*for none method*/
  outputString = Jsoup.clean(inputString, Whitelist.none());
  System.out.println("NONE OUTPUT : " + outputString);
  
  /*for relaxed method*/
  outputString = Jsoup.clean(inputString, Whitelist.relaxed());
  System.out.println("RELAXED OUTPUT : " + outputString);
  
  /*for addAttributes method*/
  Whitelist customwhitelist1 = new Whitelist();
  customwhitelist1.addAttributes("img", "src");
  outputString = Jsoup.clean(inputString, customwhitelist1);
  System.out.println("ADDATRIBUTE OUTPUT : " + outputString);
  
  
  /*for addProtocols method*/
  Whitelist customwhitelist2 = new Whitelist();
  customwhitelist2.addProtocols("a", "href", "ftp", "http");
  outputString = Jsoup.clean(inputString, customwhitelist2);
  System.out.println("addProtocols OUTPUT : " + outputString);
  

 }

}
  

Output:-

SIMPLETEXT OUTPUT : My Page
<em>Sandeep</em>
<em>Surabhi</em>click me
BASIC OUTPUT : My Page
<ul>
 <li><em>Sandeep</em></li>
 <li><em>Surabhi</em></li>
 <li></li>
 <li><a href="https:\loremipsumdollar.com" rel="nofollow">click me</a></li>
</ul>
BASICWITHIMAGES OUTPUT : My Page
<ul>
 <li><em>Sandeep</em></li>
 <li><em>Surabhi</em></li>
 <li><img /></li>
 <li><a href="https:\loremipsumdollar.com" rel="nofollow">click me</a></li>
</ul>
NONE OUTPUT : My PageSandeepSurabhiclick me
RELAXED OUTPUT : My Page
<ul>
 <li><em>Sandeep</em></li>
 <li><em>Surabhi</em></li>
 <li><img /></li>
 <li><a href="https:\loremipsumdollar.com">click me</a></li>
</ul>
ADDATRIBUTE OUTPUT : My PageSandeepSurabhi
<img src="mySnamp.png" />click me
addProtocols OUTPUT : My PageSandeepSurabhiclick me
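
In the BASICWITHIMAGES and RELAXED outputs the img tag comes back without its src. A likely reason is that mySnamp.png is a relative URL and no base URI was passed, so the cleaner cannot match it against the allowed http/https protocols. The sketch below passes a base URI to the three-argument Jsoup.clean; the class name BaseUriSketch and the base URI http://example.com/ are assumptions for illustration only.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

// Hypothetical example class, not part of the original demo.
class BaseUriSketch {

    // Cleans with basicWithImages and resolves relative URLs against baseUri.
    static String cleanWithBase(String html, String baseUri) {
        return Jsoup.clean(html, baseUri, Whitelist.basicWithImages());
    }

    public static void main(String[] args) {
        String html = "<ul><li><img src='mySnamp.png'></li></ul>";
        // The relative src should be resolved to an absolute http URL and kept.
        System.out.println(cleanWithBase(html, "http://example.com/"));
    }
}
```

Because preserveRelativeLinks is off by default, the src in the result should be rewritten to the absolute form http://example.com/mySnamp.png.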