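'''parsing xml/xhtml'''

'''BeautifulSoup: let's see how it works:'''

License, download, documentation, credits etc.: http://www.crummy.com/software/BeautifulSoup/

'''parsing xhtml'''

Take the following file, teste_parser.html:
{{{
<html>
<head><title>teste do parser</title></head>
<body>
<table id="table1">
<tr><td>linha1 celula1</td><td>linha1 celula2</td></tr>
<tr><td>linha2 celula1</td><td>linha2 celula2</td></tr>
<tr><td>linha3 celula1</td><td>linha3 celula2</td></tr>
</table>
<form method="post">
<select>
<option value="1"></option>
<option value="2"></option>
<option value="3"></option>
</select>
</form>
</body>
</html>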
}}}

'''in IDLE'''
{{{
Python 2.4.1 (#2, May 5 2005, 11:32:06)
[GCC 3.3.5 (Debian 1:3.3.5-12)] on linux2
Type "copyright", "credits" or "license()" for more information.
IDLE 1.1.1
>>> from BeautifulSoup import BeautifulSoup
>>> arq=file('teste_parser.html')
>>> tree=BeautifulSoup(arq.read())
>>> tree('title')
[<title>teste do parser</title>]
>>> tree('title')[0]
<title>teste do parser</title>
>>> tree('title')[0].string
'teste do parser'
>>> len(tree('table')[0]('td'))
6
>>> it=len(tree('table')[0]('td'))
>>> it
6
>>> for i in range(it): print tree('table')[0]('td')[i].string

linha1 celula1
linha1 celula2
linha2 celula1
linha2 celula2
linha3 celula1
linha3 celula2
>>> #exploring attributes
>>> tree('table')[0]['id']
'table1'
>>> #exploring a form
>>> for i in range(len(tree('form'))): print tree('form')[i]
>>> for i in range(len(tree('form'))):
        for j in range(len(tree('form')[i]('select'))):
            for k in range(len(tree('form')[i]('select')[j]('option'))):
                print tree('form')[i]('select')[j]('option')[k]['value']

1
2
3
>>> #of course the madness above could use some 'refactoring'
>>> #changing an attribute
>>> tree('form')[0]['method']
'post'
>>> tree('form')[0]['method']='get'
>>> tree('form')[0]['method']
'get'
>>> #adding an attribute
>>> tree('form')[0]['enctype']='multipart/form-data'
>>> print tree('form')[0]
>>> #power!!!! as Bidú (my brother-in-law) says
>>> #Luiz Antonio de Campos
>>> #the BeautifulSoup module has 3 other subclasses; check the URL above
>>> #also try it with malformed xhtml (e.g. xxxxxxxxxxxxxx); it will still get it right
>>> #as long as the malformation is not too 'sloppy'
}}}
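The session above admits that the triple-indexed loop could use a refactor. As a hedged sketch, here is the same walk-through (title, cells, an attribute, option values, changing and adding attributes) done in Python 3 with the standard library's `xml.etree.ElementTree` instead of BeautifulSoup. This is a swapped-in technique: ElementTree only accepts well-formed XHTML, so it does not forgive bad markup the way BeautifulSoup does. The `XHTML` string is an assumed stand-in modeled on teste_parser.html. The key change is iterating over elements directly instead of counting them with `range(len(...))` and indexing.

{{{#!python
import xml.etree.ElementTree as ET

# Assumed stand-in for teste_parser.html. It has no xmlns declaration, so
# plain tag names work; with the XHTML namespace, tags would have to be
# written as '{http://www.w3.org/1999/xhtml}td' and so on.
XHTML = """<html>
<head><title>teste do parser</title></head>
<body>
<table id="table1">
<tr><td>linha1 celula1</td><td>linha1 celula2</td></tr>
<tr><td>linha2 celula1</td><td>linha2 celula2</td></tr>
<tr><td>linha3 celula1</td><td>linha3 celula2</td></tr>
</table>
<form method="post">
<select><option value="1"/><option value="2"/><option value="3"/></select>
</form>
</body>
</html>"""

root = ET.fromstring(XHTML)

# Text of <title> (path relative to the <html> root element).
title = root.find("head/title").text

# Text of every <td> in the first <table>, in document order.
table = root.find(".//table")
cells = [td.text for td in table.iter("td")]

# Reading an attribute.
table_id = table.get("id")

# Refactored triple loop: walk form -> select -> option directly,
# keeping the same nesting but without any index arithmetic.
option_values = [
    option.get("value")
    for form in root.iter("form")
    for select in form.iter("select")
    for option in select.iter("option")
]

# Changing an existing attribute and adding a new one.
form = root.find(".//form")
form.set("method", "get")
form.set("enctype", "multipart/form-data")

if __name__ == "__main__":
    print(title)
    for text in cells:
        print(text)
    print(table_id)
    print(option_values)
    print(ET.tostring(form, encoding="unicode"))
}}}

If the form/select grouping does not matter, `[o.get("value") for o in root.iter("option")]` collects the same values in one pass.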
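The closing comments suggest trying malformed xhtml. BeautifulSoup is not part of the standard library. As an illustration of the same idea, Python 3's `html.parser.HTMLParser` does not raise on unclosed tags, so a small handler can still recover the cell text. The `CellCollector` class, its repair rule (a new `<td>` or `<tr>`, or a closing `</table>`, ends the open cell), and the malformed snippet are all hypothetical. This is a sketch, not how BeautifulSoup repairs markup internally.

{{{#!python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Hypothetical helper: collects the text of each <td>, even when
    </td> or </tr> are missing."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._current = None  # text pieces of the cell being read, or None

    def _close_cell(self):
        # Store the open cell, if any, and mark that no cell is open.
        if self._current is not None:
            self.cells.append("".join(self._current).strip())
            self._current = None

    def handle_starttag(self, tag, attrs):
        # A new cell or row implicitly closes a cell left open.
        if tag in ("td", "tr"):
            self._close_cell()
        if tag == "td":
            self._current = []

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._close_cell()

    def handle_data(self, data):
        # Text outside any <td> (e.g. newlines between tags) is ignored.
        if self._current is not None:
            self._current.append(data)

    def close(self):
        super().close()
        self._close_cell()  # flush a cell still open at end of input


# Malformed on purpose: no </td>, no </tr>, no </table>.
MALFORMED = """<table id="table1">
<tr><td>linha1 celula1<td>linha1 celula2
<tr><td>linha2 celula1<td>linha2 celula2
"""

parser = CellCollector()
parser.feed(MALFORMED)
parser.close()

if __name__ == "__main__":
    for text in parser.cells:
        print(text)
}}}

Unlike BeautifulSoup, `HTMLParser` builds no tree. It only reports tags and text as events, so the repair policy is entirely up to the handler.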