Skåne Sjælland Linux User Group - http://www.sslug.dk

Re: [PROGRAMMERING] Extracting URLs from an ASCII file



On 14/07/2013, at 01.37, Jens Bang <sslug@sslug> wrote:

> I have a rather large ASCII file which, among other things, contains quite a lot of URLs. The format of the file is that each URL is prefixed by either
> 	"url":
> or
> 	"referrer":
> and immediately after that comes the URL, enclosed in double quotes. E.g.:
> 	"url":"http://www.google.com";
> 
> My programming experience is in C, C++ and Pascal. And although I could easily write a program in one of those languages that pulls the URLs out of the file, there must be easier and faster solutions. What would you suggest? And where can I find information on how to do it?

Sorry if this mail arrives twice; I had some trouble with an old "mail.tele.dk" address and a new(er) gmail.com address.

Various scripting languages ought to be an obvious fit for this, but I'm sure you could also put together a command line with sed/awk, e.g. something along the lines of
	
	awk -F'"' '{for(i=1;i<=NF-2;i++) if(($i=="url"||$i=="referrer") && $(i+1)==":") print $(i+2)}' urltest.txt

but offhand, the following Python should be able to handle it:

https://gist.github.com/jinie/5992705

==============================
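#!/usr/bin/env python
import re
import sys

# Match "url":"..." or "referrer":"..." and capture the text between the quotes.
# [^"]* stops at the closing quote instead of running to the last quote on the line.
ex = re.compile(r'"(url|referrer)":"([^"]*)"')
with open(sys.argv[1]) as f:
    for line in f:
        # A line may contain several URLs, or none at all
        for m in ex.finditer(line):
            print(m.group(2))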
==============================
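If the real file turns out to have spaces around the colon, or backslash-escaped quotes inside a URL (as JSON would have), the pattern above stops in the wrong place. A somewhat more tolerant variant might look like the sketch below. The helper name extract_urls and the assumptions about whitespace and escapes are illustrative only, not part of the gist; escape sequences such as \" or \/ are returned as-is, not decoded.

```python
#!/usr/bin/env python
import re
import sys

# Assumed, more tolerant pattern: allows whitespace around the colon and
# JSON-style backslash escapes (e.g. \") inside the quoted URL.
URL_RE = re.compile(r'"(?:url|referrer)"\s*:\s*"((?:[^"\\]|\\.)*)"')


def extract_urls(line):
    """Hypothetical helper: return every URL after "url": or "referrer": in one line."""
    return [m.group(1) for m in URL_RE.finditer(line)]


if __name__ == "__main__":
    # Process every file named on the command line, one line at a time,
    # so even a very large file never has to fit in memory.
    for path in sys.argv[1:]:
        with open(path) as f:
            for line in f:
                for url in extract_urls(line):
                    print(url)
```

Saved as, say, extract_urls.py (a made-up name), it could be run as: python extract_urls.py bigfile.txt | sort -u, where sort -u drops duplicate URLs.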

/Jimmy


Questions about the web-pages to <www_admin>. Last modified 2013-08-01, 02:05 CEST