Posted 2004-12-16 00:00:00

A script I knocked together in a moment to extract the URLs that have appeared on the site. If a link has text in its title attribute, the script also uses that text as the link's description.
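```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read HTML from standard input, one line at a time.
while (my $linea = <STDIN>) {
    # Each external link opened in a new window:
    # <a href="http..." target="_blank" title="...">text</a>
    # With /g in scalar context, every match on the line is visited in turn.
    while ($linea =~ /<a href="(http[^"]+?)" target="_blank" title="([^"]*?)">(.*?)<\/a>/gi) {
        my ($url, $titulo, $texto) = ($1, $2, $3);
        # If the title attribute is empty, fall back to the link text.
        $titulo = $texto if !$titulo;
        print "<li><a href=\"$url\" target=\"_blank\" title=\"$titulo\">$titulo</a>.\n";
    }
}
```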
How could the script be made to work whether or not the target attribute is present, and regardless of the order of the attributes?
Keith Amling (23/09/2005, 13:25): Correctly handling arbitrary HTML with a regex is a little difficult. The most robust way to do this (at least with a regex) is to first match against /<a ([^>]*)>/ and then match the captured attribute string against /title="([^"]*)"/ and /href="([^"]*)"/ separately to extract the title and URL. Extracting links from HTML is a hard problem, and the correct (although slow) answer is to use HTML::Parser. I apologize for the English; I can pretend to read Spanish, but not to write it.
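Keith's two-step suggestion also answers the question in the post: once href and title are looked up independently inside the tag, neither the order of the attributes nor the presence of target="_blank" matters. What follows is a minimal sketch, not the author's code. The `extract_links` sub and the sample HTML are hypothetical, and the sketch assumes double-quoted attribute values and absolute http(s) URLs, as the original script did.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return one <li> line per http(s) link in $html.
sub extract_links {
    my ($html) = @_;
    my @items;
    # Step 1: grab each opening <a ...> tag plus its link text.
    # /s lets the link text span several lines; [^>]* already does.
    while ($html =~ /<a\s+([^>]*)>(.*?)<\/a>/gis) {
        my ($attrs, $text) = ($1, $2);
        # Step 2: look up each attribute on its own, so order is irrelevant
        # and target="_blank" is no longer required.
        my ($url)   = $attrs =~ /\bhref\s*=\s*"([^"]*)"/i;
        my ($title) = $attrs =~ /\btitle\s*=\s*"([^"]*)"/i;
        # Keep only absolute http/https links, like the original script.
        next unless defined $url && $url =~ /^http/i;
        # Fall back to the link text when the title is missing or empty.
        $title = $text unless defined $title && length $title;
        push @items, qq{<li><a href="$url" target="_blank" title="$title">$title</a>.\n};
    }
    return @items;
}

# Hypothetical sample: one link without target, one with attributes reordered.
my $sample = <<'HTML';
<p><a href="http://example.com/">Example</a> and
<a title="Perl docs" target="_blank" href="http://perldoc.perl.org/">perldoc</a></p>
HTML

print extract_links($sample);
```

To run it over a real page, read all of standard input at once with `my $html = do { local $/; <STDIN> };` and print `extract_links($html)`. Reading the whole input in one go also catches anchors whose tag or text spans several lines, which a line-by-line loop misses. Single-quoted or unquoted attributes, comments and scripts still fall outside what these regexes handle, so for arbitrary markup a real parser such as HTML::Parser remains the safer choice.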
Saiyine (24/09/2005, 00:51): Hi! Yeah, the truth is that this is just a quick-and-dirty hack, and right now I can't remember why I wrote it. Maybe for a module for this website in its pre-.com era? I don't know. You're absolutely right: the best way is to reuse the code that someone, a better programmer than I am, wrote and put into HTML::Parser. It's simply that I don't like using external libraries for this kind of thing. I apologize for my English too!