 
Skåne Sjælland Linux User Group - http://www.sslug.dk
 

Re: [PROGRAMMERING] Extracting URLs from an ASCII file



I'll reply to everyone in one go here.

Once again I've proven that I shouldn't write emails at some ungodly hour of the night. Especially not when I'm trying to describe a problem.

The file contains other things besides the URLs I need to extract, including other text enclosed in quotation marks. And it isn't divided into lines. In fact, there isn't a single line break in the whole file, which is 2.2 MB in size.

Because of this missing information, none of the suggested solutions worked. But a couple of them came close.

I ended up throwing together a quick Pascal program that extracted the URLs I needed. But it looks like it would be a good idea to take a closer look at awk/sed before next time.
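The thread doesn't say how the Pascal program worked. One possible approach, sketched here in Python as an illustration only, is a small scanner that walks the text character by character, collects every double-quoted string, and keeps those that look like URLs. Because it never looks at line breaks, a single 2.2 MB line is no problem. Assumptions (not from the original): URLs are double-quoted, start with http:// or https://, and quotes inside a string are backslash-escaped. The function names quoted_strings and extract_urls are hypothetical.

```python
import sys


def quoted_strings(text):
    """Yield the contents of every double-quoted string in text."""
    buf = []
    inside = False    # are we between an opening and a closing quote?
    escaped = False   # was the previous character a backslash?
    for ch in text:
        if not inside:
            if ch == '"':
                inside = True
                buf = []
        elif escaped:
            # Keep the escaped character as-is (a simplification, not
            # full JSON unescaping: \" gives " and \/ gives /).
            buf.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            inside = False
            yield "".join(buf)
        else:
            buf.append(ch)


def extract_urls(text):
    """Return the quoted strings that look like http(s) URLs."""
    return [s for s in quoted_strings(text)
            if s.startswith(("http://", "https://"))]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read the whole file at once; 2.2 MB fits comfortably in memory.
    with open(sys.argv[1], encoding="utf-8") as f:
        for url in extract_urls(f.read()):
            print(url)
```

For example, extract_urls('{"url":"https://example.com/a","title":"hi"}') returns ['https://example.com/a']. Unlike a regex keyed on field names, this picks up any quoted http(s) string, which may or may not be what you want.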

Thanks for the help, all three of you. :-)

On 14/07/13 03:13, Jimmy Selgen Nielsen wrote:
Various "script" languages ought to be an obvious fit for this, but I'm sure you could also put together a command line with sed/awk, e.g. something like
	
	sed 's/"//g' urltest.txt | awk -F: '{for(i=2;i<=NF;i++) print $i" "}'

This one doesn't quite work. Among other things, I've noticed that it prints https on its own and the rest of the URL (without the colon) separately.
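The splitting problem is easy to reproduce. This sketch uses a made-up sample record shaped like the "url":"..." pairs matched by the Python regex in this thread. It shows what awk -F: sees once sed has stripped the quotes, and one alternative that splits only on the ":" sequence between a key and its value (assuming no value itself contains that sequence):

```python
# Made-up sample record; the "url" key is borrowed from the regex in the thread.
record = '"url":"https://example.com/page"'

# What the sed + awk -F: pipeline effectively does: drop quotes, split on ":".
fields = record.replace('"', "").split(":")
print(fields)  # ['url', 'https', '//example.com/page']

# Splitting once on the key/value separator '":"' keeps the URL in one piece.
key, value = record.split('":"', 1)
print(key.strip('"'), value.strip('"'))  # url https://example.com/page
```

The colon after the URL scheme is just another field separator to awk, which is why https ends up in a field of its own.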


but off the top of my head, the following Python should be able to handle it

https://gist.github.com/jinie/5992705

==============================
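#!/usr/bin/env python
import re
import sys

# Match "url":"..." or "referer":"..." and capture the quoted value.
# [^"]* stops at the closing quote instead of running on to the last one.
ex = re.compile(r'"(url|referer)":"([^"]*)"')
with open(sys.argv[1]) as f:
    for line in f:            # reads the file one line at a time
        m = ex.search(line)   # finds only the first match on each line
        if m:                 # skip lines with no match
            print(m.group(2))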
==============================

This one looked like it would have worked if it hadn't assumed there would be line breaks in the file.
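For a file with no line breaks, a variant of the script that reads the whole file and collects every match, rather than the first match per line, may be closer to what was needed. A minimal sketch, still assuming the values are written as "url":"..." or "referer":"..." with no escaped quotes inside them; the name extract is hypothetical:

```python
#!/usr/bin/env python3
import re
import sys

# [^"]* stops at the closing quote. A greedy (.*) would run on to the last
# quote on the line, which in a one-line file means the end of the file.
PATTERN = re.compile(r'"(?:url|referer)":"([^"]*)"')


def extract(text):
    """Return every url/referer value in text, regardless of line breaks."""
    # With a single capturing group, findall returns that group's strings.
    return PATTERN.findall(text)


if __name__ == "__main__" and len(sys.argv) > 1:
    # 2.2 MB fits comfortably in memory, so read it all at once.
    with open(sys.argv[1], encoding="utf-8") as f:
        for url in extract(f.read()):
            print(url)
```

Run it as python3 script.py urltest.txt. Reading the whole file with f.read() sidesteps the line question entirely; for files too large for memory a chunked reader would be needed, which is more work to get right at chunk boundaries.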

--

        |\     _,,,---,,_       Greetings, Jens
 ZZZzz /,`.-'`'    -.  ;-;;,_
      |,4-  ) )-,_. ,\ (  `'-'  sslug@sslug
     '---''(_/--'  `-'\_)
----------------------------------------------------
Been there, done that, got the T-shirt.


 

 
 
Questions about the web-pages to <www_admin>. Last modified 2013-08-01, 02:05 CEST