Python 2024. 1. 16. 23:56

Python has four main built-in container types for working with data: list, tuple, dict, and set.

Lists, tuples, and strings are sequences. The items in a sequence are called elements. Every sequence is ordered, and each element has an index that starts at 0.
dictionary → a key-value mapping type
set → a collection with no order and no index
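
As a quick orientation, the sketch below builds one small object of each kind and shows which of them support positional indexing. All variable names here are illustrative and not part of the original notes.

```python
# One small object of each container type (names are illustrative).
a_list = [1, 2, 'a']            # ordered, mutable
a_tuple = (1, 2, 'a')           # ordered, immutable
a_dict = {'one': 1, 'two': 2}   # key-value pairs, looked up by key
a_set = {1, 2, 2, 3}            # unique elements only

# Sequences (list, tuple, str) are indexed from 0.
first_items = (a_list[0], a_tuple[0], 'abc'[0])
print(first_items)              # (1, 1, 'a')

print(a_dict['two'])            # 2
print(len(a_set))               # 3, the duplicate 2 is stored once

# A set has no positions, so indexing it raises TypeError.
try:
    a_set[0]
    set_is_indexable = True
except TypeError:
    set_is_indexable = False
print(set_is_indexable)         # False
```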

Let's look at each in more detail.

List

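A list usually holds data of the same kind, although mixing types is allowed.

A list can hold elements of various types, such as numbers and strings.

list1 = [1,2,3,4,5,6,11,'a','b']

A list is mutable: its contents can be changed after creation.

Indexing and slicing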

list1 = [1,2,3,4,5,6,11,'a','b']

#indexing
print(list1[0]) # index 0: 1
print(list1[2:5]) # indexes 2 through 4: [3, 4, 5]
print(list1[:7]) # from the start through index 6
print(list1[4:]) # from index 4 to the end
print(list1[-1]) # last element: b
print(list1[-4:]) # from the fourth-from-last element to the end: [6, 11, 'a', 'b']

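#slicing
list12 = [1, 2, 3, 4, 5, 6, 'c', 'a', 'b', 11, 12, 13] # same value as list1 + list2, which is built in the next section
print(list12)
print(list12[:]) # a (shallow) copy of the list
print(list12[::2]) # step of 2: takes indexes 0, 2, 4, 6, ...
list12[0:3] = ['one','two','three'] # replace a slice
print(list12)
list12[0:3] = [] # delete a slice
print(list12)
del list12[::2] # delete the elements at even indexes
print(list12)
del list12[:] # delete all elements
print(list12) # []
del list12 # delete the list object itself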

"""
    [1, 2, 3, 4, 5, 6, 'c', 'a', 'b', 11, 12, 13]
    [1, 2, 3, 4, 5, 6, 'c', 'a', 'b', 11, 12, 13]
    [1, 3, 5, 'c', 'b', 12]
    ['one', 'two', 'three', 4, 5, 6, 'c', 'a', 'b', 11, 12, 13]
    [4, 5, 6, 'c', 'a', 'b', 11, 12, 13]
    [5, 'c', 'b', 12]
    []
 """

Adding, concatenating, and printing list values

list1 = [1,2,3,4,5,6,11,'a','b']

#mutable
list1[6] = 'c'

# append: add values to a list
list2 = []
for item in range(11,14,1):
    list2 += [item] # same effect as list2.append(item)
print(list2) #[11,12,13]

list3 = []
list3 += 'Python Kooc' # += extends the list with each character of the string
print(list3, len(list3)) #['P', 'y', 't', 'h', 'o', 'n', ' ', 'K', 'o', 'o', 'c'] 11

list3 += ('M','o','o') # extend the existing list with the elements of a tuple
print(list3) #['P', 'y', 't', 'h', 'o', 'n', ' ', 'K', 'o', 'o', 'c', 'M', 'o', 'o']

# concatenate lists
list12 = list1 + list2
print(list12)

# print each index together with its element
for i in range(len(list12)):
    print(f'({i}, {list12[i]})', end=' ')

Sorting and searching lists of numbers and strings

Sorting lists of numbers and strings

# sorting a list of numbers
numbers = [7,2,5,4,3,6,1,9,8,15,11]

# sort(): sorts in place, in ascending order
numbers.sort()
print(numbers) #[1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 15]

# sort(reverse=True) sorts in descending order; reverse() reverses the current order in place
numbers.sort(reverse=True) # [15, 11, 9, 8, 7, 6, 5, 4, 3, 2, 1]
numbers.reverse()
print(numbers) #[1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 15]

# sorting a list of strings
a_string = 'J I love you, and do you love me?'

# split(): splits the string on whitespace and returns the words as a list
a_string = a_string.split()
print(a_string,type(a_string)) #['J', 'I', 'love', 'you,', 'and', 'do', 'you', 'love', 'me?'] <class 'list'>

# sort(key=len): sort by length, shortest first; the sort is stable, so equal-length items keep their original order
a_string.sort(key=len)
print(a_string) #['J', 'I', 'do', 'and', 'you', 'me?', 'love', 'you,', 'love']

# sorted(): returns a new list; strings sort alphabetically with uppercase first, numbers sort in ascending order
b_string = sorted(a_string)
print(b_string) #['I', 'J', 'and', 'do', 'love', 'love', 'me?', 'you', 'you,']

x = [7,2,5,4,3,6,1,9,8,15,11]
y = sorted(x) # returns a new sorted list
z = reversed(x) # returns a reverse iterator
print(x)
print(y)
print(list(z))
print(x) # x itself is unchanged

"""
[7, 2, 5, 4, 3, 6, 1, 9, 8, 15, 11]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 15]
[11, 15, 8, 9, 1, 6, 3, 4, 5, 2, 7]
[7, 2, 5, 4, 3, 6, 1, 9, 8, 15, 11]
"""

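Searching a list and adding or removing values

#searching
print(x.index(3)) # index of the first element equal to 3
print(12 in x) # is 12 in x?
print(12 not in x)
print('Love' in a_string) # case-sensitive, so 'Love' does not match 'love'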

"""
    4
    False
    True
    False
"""

# insert an element at a specific position
print(a_string)
a_string.insert(3,"him")
print(a_string) #['J', 'I', 'do', 'him', 'and', 'you', 'me?', 'love', 'you,', 'love']

# append or remove a specific string
a_string.append('hate') # added at the end
a_string.remove('him') # removes the first matching element
print(a_string) #['J', 'I', 'do', 'and', 'you', 'me?', 'love', 'you,', 'love', 'hate']

# add several elements (list operators)
print(a_string + ['I','and']) # the original is unchanged: remember that operators work on lists
a_string.extend(['I','and']) # modifies the original in place
print(a_string)

# remove all elements of the list
a_string.clear()

# copy a list
x_copied = x.copy()
print(x_copied)
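
The difference between copying a list and merely giving it a second name is easy to miss, so here is a minimal sketch of it. The names alias, copied, nested, and shallow are made up for this illustration.

```python
x = [7, 2, 5]

alias = x          # same list object, just a second name
copied = x.copy()  # new list; x[:] and list(x) behave the same way

x.append(99)
print(alias)       # [7, 2, 5, 99]
print(copied)      # [7, 2, 5]

# The copy is shallow: nested lists are still shared.
nested = [[1, 2], [3, 4]]
shallow = nested.copy()
shallow[0].append(0)
print(nested)      # [[1, 2, 0], [3, 4]]
```

If the inner lists must be independent as well, copy.deepcopy from the standard library is the usual tool.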

Multi-dimensional lists

#matrix expression of list
a= [[1,2,3,4],[5,6,7,8],[9,10,11,12]]
print(a) # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

for row in a:
    for col in row:
        print(col, end=" ")
    print()

"""
1 2 3 4 
5 6 7 8 
9 10 11 12 
"""

Tuple

Like a list, a tuple can hold elements of several data types.

Unlike a list, a tuple is immutable: its elements cannot be reassigned.

t1 = 10,20,30,'John'
t1 += (40,50) # builds a new tuple and rebinds t1
print(t1)
t3 = 'kim', 'park', 'kwon', [80,90,100]

# convert a list to a tuple
t2 = tuple([1,2,3,4])
print(t2)

# a tuple can also hold a list as one of its elements
print(t3, len(t3),t3[3], t3[3][1])
    # ('kim', 'park', 'kwon', [80, 90, 100]) 4 [80, 90, 100] 90
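
What "immutable" does and does not cover can be checked directly. The sketch below is illustrative only: it shows that a slot cannot be reassigned, that a list stored inside the tuple can still change, and that += produces a new tuple.

```python
t3 = ('kim', 'park', 'kwon', [80, 90, 100])

# Reassigning a slot is not allowed.
try:
    t3[0] = 'lee'
    reassigned = True
except TypeError:
    reassigned = False
print(reassigned)            # False

# The list stored in the tuple is itself mutable.
t3[3].append(70)
print(t3)                    # ('kim', 'park', 'kwon', [80, 90, 100, 70])

# += builds a new tuple and rebinds the name; the old object is untouched.
t1 = (10, 20)
old = t1
t1 += (30,)
print(t1, old, t1 is old)    # (10, 20, 30) (10, 20) False
```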

unpacking (important!)

Unpacking is not limited to tuples: any sequence in Python can be unpacked.

cf. It is used most often when loading a dataset and splitting it into training data and labels.

#unpacking: pull the elements of a tuple out into separate names
t4 = (('kim', 'park', 'kwon'), [80,90,100])
print(len(t4)) # 2
last_name, grades = t4
print(last_name, grades,type(last_name), type(grades))
    #('kim', 'park', 'kwon') [80, 90, 100] <class 'tuple'> <class 'list'>

grade1,grade2, grade3 = grades
print(grade1,grade2,grade3) # 80 90 100

first, second, third, fourth = 'WIFI'
print(first, second, third, fourth) #W I F I
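
The dataset use case mentioned above might look roughly like this. load_data is a hypothetical stand-in for whatever loader is actually used, and the numbers are made-up sample values.

```python
def load_data():
    """Hypothetical loader: returns (features, labels) as a tuple."""
    features = [[5.1, 3.5], [6.2, 2.9], [4.7, 3.2]]
    labels = [0, 1, 0]
    return features, labels

# Unpack the returned tuple into two names in one statement.
X, y = load_data()
print(len(X), y)             # 3 [0, 1, 0]

# A starred name collects whatever is left over.
first, *rest = y
print(first, rest)           # 0 [1, 0]

# The number of names must match the number of elements.
try:
    a, b = y
    matched = True
except ValueError:
    matched = False
print(matched)               # False
```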

Enumerate

enumerate yields (index, value) pairs, so you can get both at once without indexing by hand in a for loop.

# enumerate
colors = ['red', 'green', 'blue']
print(list(enumerate(colors)))
print(tuple(enumerate(colors)))
"""
    [(0, 'red'), (1, 'green'), (2, 'blue')]
    ((0, 'red'), (1, 'green'), (2, 'blue'))
"""
# print a bar chart using enumerate and unpacking
bar = [19,8,15,7,11]
for index, value in enumerate(bar):
    print(f'{index:>5}{value:>8}   {"*"*value}')

    """
    0      19   *******************
    1       8   ********
    2      15   ***************
    3       7   *******
    4      11   ***********
    """

List comprehension

A Python syntax that puts the loop inside the list itself, so that several lines of code shrink to one.

💡 [expression for item in iterable if condition]

a_string = 'hello?'

#list comprehension
list_range = list(range(0,10,2))
print(list_range) #[0, 2, 4, 6, 8]

list_com = [i for i in range(0,10,2)]
print(list_com) #[0, 2, 4, 6, 8]

list_com2 = [i for i in range(0,10) if i % 2 != 0] # a list of odd numbers only
print(list_com2) #[1,3,5,7,9]

list_op = [i**2 for i in range(0,10,2)]
print(list_op) #[0, 4, 16, 36, 64]

list_up = [i.upper() for i in a_string]
print(list_up) #['H', 'E', 'L', 'L', 'O', '?']

list_gen = (i**2 for i in range(10) if i % 2 == 0) # generator expression: creates a generator object instead of building a list
print(list_gen) #<generator object <genexpr> at 0x102ba4740>
print(list(list_gen)) #[0, 4, 16, 36, 64]

list_a = list(range(0,8,2))
list_c = [i>3 for i in list_a]
print(list_c) #[False, False, True, True]
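
To make the template concrete, here is a small sketch that writes the same logic first as an ordinary loop and then as a comprehension. The variable names are illustrative.

```python
# Loop version: filter, transform, append.
squares_loop = []
for i in range(10):
    if i % 2 == 0:
        squares_loop.append(i ** 2)

# Comprehension version of the same logic.
squares_comp = [i ** 2 for i in range(10) if i % 2 == 0]

print(squares_loop)                   # [0, 4, 16, 36, 64]
print(squares_loop == squares_comp)   # True

# A condition placed before `for` is a conditional expression:
# it picks a value for every item instead of filtering items out.
labels = ['even' if i % 2 == 0 else 'odd' for i in range(4)]
print(labels)                         # ['even', 'odd', 'even', 'odd']
```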

Lambda, filter, map, zip

filter

filter is a built-in function. It returns a lazy iterator rather than a finished list: a template for producing values, not the values themselves.
filter is a higher-order function: it takes a function as an argument.
The function passed to filter takes one argument and returns True when the condition is met.
filter yields only the elements for which the condition holds, much like a generator.

list3 = list(range(0,25))
print(list3)

def is_multiple(x):
    return x % 3 == 0 # returns True for multiples of 3

# filter is lazy: it does not build the result up front. It takes a function as an argument and applies it to each element on demand.
print(list(filter(is_multiple, list3))) # prints a list of the multiples of 3
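
One practical consequence of filter returning an iterator is that the result can be consumed only once. The following sketch illustrates this; first_pass and second_pass are names chosen for the example.

```python
def is_multiple(x):
    return x % 3 == 0

multiples = filter(is_multiple, range(10))
print(type(multiples).__name__)   # filter

first_pass = list(multiples)      # consumes the iterator
second_pass = list(multiples)     # nothing left to yield
print(first_pass)                 # [0, 3, 6, 9]
print(second_pass)                # []
```

If the values are needed more than once, store the list rather than the filter object.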

lambda

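Anonymous functions

#lambda function: an anonymous (unnamed) function
# A lambda handles many small function tasks in a single expression; with filter, it goes where the function argument is expected.

list4 = list(filter(lambda x: x % 3 == 0, list3)) # each x from list3 passes through the filter; only the multiples of 3 are kept
print(list4) # prints a list of the multiples of 3

car = ['Santafe', 'Mini', 'pony']
print(min(car)) # the element that comes first alphabetically (uppercase sorts before lowercase): Mini
print(max(car,key = lambda i: i.lower())) # compare in lowercase and print the element that comes last alphabetically: Santafe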

map

map also returns a lazy iterator (lazy evaluation).
It maps elements one-to-one.

#map
print(list(map(lambda x: x**2, list4))) # square each value of list4, keeping the same order
print(list(map(lambda x: x**2, filter(lambda x: x%3 ==0, list3))))
print([i**2 for i in list4])
print([x**2 for x in list3 if x%3 == 0])

"""
[0, 9, 36, 81, 144, 225, 324, 441, 576]
[0, 9, 36, 81, 144, 225, 324, 441, 576]
[0, 9, 36, 81, 144, 225, 324, 441, 576]
[0, 9, 36, 81, 144, 225, 324, 441, 576]
The outputs are identical, so prefer a list comprehension over piling up lambda and map.
"""

⇒ A list comprehension is often much simpler than lambda, map, and filter, so use comprehensions wherever they fit.
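
Lazy evaluation can be made visible by recording when the mapped function is actually called. This is only an illustrative sketch: the calls list and the square function are introduced here for the demonstration.

```python
calls = []

def square(x):
    calls.append(x)      # record each call so laziness is visible
    return x ** 2

squares = map(square, [1, 2, 3])
print(calls)             # [] : nothing has been computed yet

print(next(squares))     # 1, only the first element is computed
print(calls)             # [1]

print(list(squares))     # [4, 9], the rest is computed on demand
print(calls)             # [1, 2, 3]

# map accepts several iterables and pairs their elements one-to-one.
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))
print(sums)              # [11, 22, 33]
```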

zip

Mostly used in loops that take elements from two sequences in pairs.

#zip: pulls elements out of two list objects pair by pair
players = ['Ryu', 'de', 'jikey' ,'eru']; goals = [7,20,15,7]
for last_name, goal in zip(players, goals):
    print(f'{last_name} : {goal}')
    
"""
Ryu : 7
de : 20
jikey : 15
eru : 7
"""

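Dictionary

Stores data as key-value pairs.
A key and its value are separated by a colon, and pairs are separated by commas.
A dict cannot be indexed by position or sliced directly, but its items can be once they are converted to a list (since Python 3.7 a dict does preserve insertion order).
Keys must be unique, and they must be immutable (hashable).
A dictionary is written with curly braces (list → square brackets, tuple → parentheses).

countries = {'Korea':'kr', 'Japan':'jp', 'China':'cn', 'France':'fr'}
print(countries)
print(len(countries))
print(countries['Korea']) # a dict is indexed by key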

"""
{'Korea': 'kr', 'Japan': 'jp', 'China': 'cn', 'France': 'fr'}
4
kr
"""

# Dictionary operations
#searching
print('korea' in countries) # False: keys are case-sensitive

#update
countries['Korea'] = 82
countries.update(Japan = '81')
print(countries) #{'Korea': 82, 'Japan': '81', 'China': 'cn', 'France': 'fr'}

#append
countries['Canada'] = 'ca' # Canada is not in the dict yet, so the pair is added

#delete: remove a pair
del countries['Korea']
print(countries) #{'Japan': '81', 'China': 'cn', 'France': 'fr', 'Canada': 'ca'}

#get() returns the value for a key
print(countries.get('Japan')) #81

# looking up a missing key
print(countries.get('Korea')) # None: get() returns None instead of raising
try:
    print(countries['Korea'])
except KeyError as err:
    print('KeyError:', err) # KeyError: 'Korea'
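
The rule that keys must be unique and immutable can be tried out directly. The capitals dict below is made up for this sketch and is not part of the original example.

```python
# Immutable values such as str, int, and tuple work as keys.
capitals = {('Korea', 'kr'): 'Seoul', ('Japan', 'jp'): 'Tokyo'}
print(capitals[('Korea', 'kr')])        # Seoul

# A list is mutable, hence unhashable, and is rejected as a key.
try:
    capitals[['China', 'cn']] = 'Beijing'
    list_key_ok = True
except TypeError:
    list_key_ok = False
print(list_key_ok)                       # False

# Assigning to an existing key overwrites; keys are never duplicated.
capitals[('Japan', 'jp')] = 'Kyoto'
print(len(capitals))                     # 2

# get() takes a default to return when the key is missing.
print(capitals.get(('France', 'fr'), 'unknown'))   # unknown
```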

Iteration and extraction

#Iterating elements in a dict
countries = {'Korea':'kr', 'Japan':'jp', 'China':'cn', 'France':'fr'}
for country, code in countries.items(): # items() yields (key, value) pairs, which are unpacked into two names; the same idea as enumerate on a list or tuple
    print(f'The country code of {country} is {code}.')

    """
    The country code of Korea is kr.
    The country code of Japan is jp.
    The country code of China is cn.
    The country code of France is fr.
    """
# keys only
for country in countries.keys():
    print(country, end=' ') #Korea Japan China France
print()

# values only
for codes in countries.values():
    print(codes, end=' ') #kr jp cn fr
print()

c_codes = {'Korea':82, 'Japan':81, 'Taiwan':886, 'Finland':358}
print(list(c_codes.keys())) #['Korea', 'Japan', 'Taiwan', 'Finland']
print(list(c_codes.values())) #[82, 81, 886, 358]
l_countries = list(c_codes.items()) # a list whose elements are (key, value) tuples
print(l_countries)#[('Korea', 82), ('Japan', 81), ('Taiwan', 886), ('Finland', 358)]
print(l_countries[0]) # the list of items can be indexed
print(l_countries[:2]) # and sliced

# zip: like items(), pairs each key with its value and assigns them separately
for i, j in zip(countries.keys(), countries.values()):
    print(f'{i} : {j}')

sorting and comprehension expressions in dict

#1. switch
switched = {code: country for country, code in c_codes.items()} # swap each country (key) with its code (value)
print(sorted(switched.items()))

#2. build a new dict from an existing dict
temperature = {'Japan':[23,34,26], 'Korea':[22,33,25]}
temp_mean = {k: sum(v)/len(v) for k, v in temperature.items()}
# each pair is unpacked: k gets the key and v gets the list; k is kept as-is and v is replaced by the mean of its elements
print(temp_mean)

#3. combined with lambda
s_value = {k: v for k, v in sorted(c_codes.items(), key=lambda country: country[1])}
"""Sort the items using the value as the sort key, then build a new dict, s_value, from the sorted pairs.
   country[1] is the value (country[0] is the key).
   
"""
print(s_value)

Output

[(81, 'Japan'), (82, 'Korea'), (358, 'Finland'), (886, 'Taiwan')]
{'Japan': 27.666666666666668, 'Korea': 26.666666666666668}
{'Japan': 81, 'Korea': 82, 'Finland': 358, 'Taiwan': 886}

Using a dict for word extraction

dream = """I have a dream that one day this nation will rise up and live out the true meaning of its creed We hold these truths to be self-evident that all men are created equal

I have a dream that one day on the red hills of Georgia the sons of former slaves and the sons of former slave owners will be able to sit down together at the table of brotherhood

I have a dream that one day even the state of Mississippi a state sweltering with the heat of injustice sweltering with the heat of oppression, will be transformed into an oasis of freedom and justice

I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character

I have a dream today

I have a dream that one day, down in Alabama, with its vicious racists, with its governor having his lips dripping with the words of interposition and nullification one day right there in Alabama little black boys and black girls will be able to join hands with little white boys and white girls as sisters and brothers
"""

dream_words = dream.split() # split on whitespace
print(dream_words)  #['I', 'have', 'a', 'dream', 'that', 'one', 'day', 'this', ...

wordlist = {}
for word in dream_words: # take the words one at a time and count how often each one occurs
    if word in wordlist:
        wordlist[word] +=1
    else:
        wordlist[word] = 1

print(wordlist) #{'I': 6, 'have': 6, 'a': 8, 'drea ...
# sort by value in descending order
swl_byvalue = {k: v for k,v in sorted(wordlist.items(), key=lambda i:i[1],reverse=True)}
print(swl_byvalue) #{'of': 12, 'the': 11, 'a': 8, 'a...

print(f'{"WORD":<16}COUNT')

for word, count in swl_byvalue.items(): # unpack word and count
    if count >= 6: # print words that occur at least 6 times
        print(f'{word:<16} {count}')
print('\nNumber of unique words: ', len(wordlist))

"""
WORD            COUNT
of               12
the              11
a                8
and              7
I                6
have             6
dream            6
that             6
one              6
will             6
with             6

Number of unique words:  104

"""

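The Counter class in the collections module

Counter replaces the counting loop above: it counts repeated items and returns the result as a dict-like object.

wordlist = {}
for word in dream_words: # take the words one at a time and count how often each one occurs
    if word in wordlist:
        wordlist[word] +=1
    else:
        wordlist[word] = 1

Rewritten with Counter

from collections import Counter

wordlist1 = Counter(dream_words)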
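
As a small self-contained sketch of what Counter offers beyond the plain dict, the example below uses a short made-up sentence in place of the speech text.

```python
from collections import Counter

words = 'the cat and the dog and the bird'.split()   # illustrative text
counts = Counter(words)

print(counts['the'])            # 3
print(counts['fish'])           # 0, missing keys count as zero
print(counts.most_common(2))    # [('the', 3), ('and', 2)]
print(len(counts))              # 5 unique words
```

most_common(n) returns the n most frequent items already sorted by count, which covers the sorting step done by hand in the earlier dict version.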

Set

A collection of data
Contains only unique elements (no duplicates)
Unordered → no indexing or slicing
Rarely used in machine learning

# a set consists of unique elements only (no duplicates)

# like a dict, a set is written with {}, but it does not hold key-value pairs
Months = {'January','February','March','April','May','June'}
print(Months) # the printed order is arbitrary

# create a set with set()
digits = set(list(range(10))+list(range(5,15)))
print(digits) # the duplicates are gone: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}

# remove a specific element
digits.remove(8)

# remove all elements
digits.clear()
print(digits) #set()

# add elements: update() adds every item of an iterable
digits.update(range(10))
print(digits) #{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

#comparisons
print({1,2,3} <= {1,2,3,4,5,6,7}) # <= : is the left set a subset of the right one? Yes, so True is printed
print({1,2,3} | {1,2,3,4,5,6,7}) # union
print({1,2,3} & {1,2,3,4,5,6,7})  # intersection
print({1,2,3} - {1,2,3,4,5,6,7})  # difference
print({1,2,3,4,5,6,7} - {1,2,3} )  # difference
print(set(range(10)) - set(range(5,15))) # difference

"""
True
{1, 2, 3, 4, 5, 6, 7}
{1, 2, 3}
set()
{4, 5, 6, 7}
{0, 1, 2, 3, 4}
"""

#comprehension
odds = {i for i in digits if i%2 != 0}
print(odds) #{1, 3, 5, 7, 9}
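
A few everyday set idioms, sketched with made-up data: removing duplicates from a list, removing an element that may not exist, and the symmetric difference operator.

```python
# Removing duplicates; set order is not guaranteed, so sort for display.
tags = ['ml', 'python', 'ml', 'data', 'python']
unique_tags = sorted(set(tags))
print(unique_tags)                 # ['data', 'ml', 'python']

# remove() raises KeyError for a missing element; discard() does not.
digits = set(range(5))
digits.discard(99)                 # silently ignored
try:
    digits.remove(99)
    removed = True
except KeyError:
    removed = False
print(removed)                     # False

# Symmetric difference: elements in exactly one of the two sets.
only_one_side = {1, 2, 3} ^ {3, 4}
print(sorted(only_one_side))       # [1, 2, 4]
```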

Reference lecture

https://kooc.kaist.ac.kr/python4ai/joinLectures/47074

Practical Python for AI Coding (인공지능 코딩을 위한 실용 파이썬), course introduction : edwith

- Prof. Kwon Youngsun (권영선), KAIST School of Business and Technology Management
